> From: zfs-discuss-boun...@opensolaris.org
> [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Nico Williams
> I've wanted a system where dedup applies only to blocks being written
> that have a good chance of being dups of others.
> I think one way to do this would be to keep a scalable Bloom filter
> (on disk) into which one inserts block hashes.
> To decide if a block needs dedup one would first check the Bloom
> filter, then if the block is in it, use the dedup code path,
How is this different from, or better than, the existing dedup architecture?
If you found that some block about to be written in fact matches the hash of
an existing block on disk, then you've already determined it's a duplicate
block, exactly as you would if you had dedup enabled. In that situation, gosh,
it sure would be nice to have the extra information, such as the reference
count and a pointer to the duplicate block, both of which exist in the dedup
table.
In other words, that is exactly how the existing dedup is already architected.
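For reference, here is a minimal Python sketch of the flow described in the
quoted proposal, as I read it. Everything in it is hypothetical: the
BloomFilter class, write_block(), and the dict standing in for the dedup table
are illustration only, not ZFS code, and the filter is fixed-size rather than
the scalable on-disk one proposed.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter (hypothetical; not the scalable on-disk variant)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive the bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def write_block(data, bloom, ddt, store):
    """Hypothetical write path; returns 'plain', 'ddt-new', or 'ddt-dup'."""
    checksum = hashlib.sha256(data).digest()
    if checksum not in bloom:
        # Hash never seen: record it in the filter and skip the dedup table.
        bloom.add(checksum)
        store.append(data)
        return "plain"
    # Probably seen before (or a false positive): take the dedup code path.
    if checksum in ddt:
        ddt[checksum]["refcount"] += 1
        return "ddt-dup"
    store.append(data)
    ddt[checksum] = {"refcount": 1, "index": len(store) - 1}
    return "ddt-new"
```

Writing the same block three times through this sketch returns "plain", then
"ddt-new", then "ddt-dup". A filter hit says only that the hash was probably
seen before; it carries no reference count and no pointer, so the first copy
is never located and the block ends up stored twice.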
> The nice thing about this is that Bloom filters can be sized to fit in
> main memory, and will be much smaller than the DDT.
If you're storing all the hashes of all the blocks, how is that going to be
smaller than the DDT storing all the hashes of all the blocks?
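On the size question, a rough calculation may help. A Bloom filter does not
store the hashes themselves; it sets a few bits per inserted hash, and the
standard sizing formula is m = -n * ln(p) / (ln 2)^2 bits for n items at
false-positive rate p. The sketch below compares that with a table holding one
256-bit hash per block, which is only a lower bound for a dedup table entry
that also carries a reference count and block pointer. The function names, the
block count, and the 1% false-positive rate are my assumptions, not figures
from this thread.

```python
import math


def bloom_bytes(num_blocks, false_positive_rate):
    """Bytes for an optimally sized Bloom filter: m = -n * ln(p) / (ln 2)^2 bits."""
    bits = -num_blocks * math.log(false_positive_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


def table_bytes(num_blocks, bytes_per_entry):
    """Bytes for a table that keeps one fixed-size entry per block."""
    return num_blocks * bytes_per_entry


if __name__ == "__main__":
    n = 10**9          # assumed number of unique blocks
    p = 0.01           # assumed acceptable false-positive rate
    hash_only = 32     # a 256-bit hash alone, with no refcount or pointer
    print("Bloom filter:", bloom_bytes(n, p), "bytes")
    print("Hash-only table:", table_bytes(n, hash_only), "bytes")
```

Under these assumptions the filter needs a little under 10 bits per block,
against 256 bits per block for the hashes alone. The price is that the filter
can only answer "probably seen" or "definitely not seen".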
zfs-discuss mailing list