I've wanted a system where dedup applies only to blocks being written
that have a good chance of being dups of others.
I think one way to do this would be to keep a scalable Bloom filter
(on disk) into which one inserts block hashes.
To decide whether a block needs dedup, one would first check the Bloom
filter: if the block's hash is in it, take the dedup code path;
otherwise take the non-dedup code path and insert the hash into the
Bloom filter. This means that the filesystem would store *two* copies
of any deduplicatious block, with one of those not being in the DDT.
This would allow most writes of non-duplicate blocks to be faster than
normal dedup writes, though still slower than normal non-dedup writes,
since the Bloom filter lookup and insert add some cost.
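To make the idea concrete, here's a rough sketch in Python. The
`BloomFilter` class, the `write_block` router, and the `ddt_write` /
`plain_write` callbacks are all hypothetical names for illustration;
a real implementation would live in the ZFS write pipeline and use a
scalable (growable) filter rather than this fixed-size one.

```python
import hashlib

class BloomFilter:
    """Minimal fixed-size Bloom filter (illustration only, not scalable)."""
    def __init__(self, m_bits, k_hashes):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions via double hashing from one SHA-256 digest.
        h = hashlib.sha256(key).digest()
        h1 = int.from_bytes(h[:8], "big")
        h2 = int.from_bytes(h[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))

def write_block(data, bloom, ddt_write, plain_write):
    """Route a block write: dedup path only if the hash was seen before."""
    h = hashlib.sha256(data).digest()
    if h in bloom:
        ddt_write(data)    # probable duplicate: pay the DDT cost
    else:
        bloom.add(h)       # first sighting: remember it, skip the DDT
        plain_write(data)
```

Note the consequence described above falls out directly: the first
write of a block always takes `plain_write`, so one copy of every
deduplicatious block ends up outside the DDT.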
The nice thing about this is that Bloom filters can be sized to fit in
main memory, and will be much smaller than the DDT.
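The size difference is easy to estimate from the standard Bloom filter
formula, m = -n ln p / (ln 2)^2 bits for n keys at false-positive rate
p. The block count and per-entry DDT cost below are assumptions for
illustration, not measured figures:

```python
import math

def bloom_bits(n_keys, fp_rate):
    """Optimal Bloom filter size in bits for n keys at false-positive rate p."""
    return math.ceil(-n_keys * math.log(fp_rate) / (math.log(2) ** 2))

# Hypothetical pool: 2^32 blocks at a 1% false-positive rate.
n = 2 ** 32
gib = bloom_bits(n, 0.01) / 8 / 2 ** 30
print(round(gib, 1))  # ~4.8 GiB of filter
```

At roughly 10 bits per block, that's a few GiB for billions of blocks,
versus a DDT that costs a few hundred bytes per entry, i.e. orders of
magnitude larger for the same population.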
It's very likely that this is a bit too obvious to just work.
Of course, it is easier to just use flash. It's also easier to just
not dedup: the most highly deduplicatious data (VM images) is
relatively easy to manage using clones and snapshots, to a point.
zfs-discuss mailing list