On Thu, Dec 17, 2009 at 03:32:21PM +0100, Kjetil Torgrim Homme wrote:
> if the hash used for dedup is completely separate from the hash used for
> data protection, I don't see any downsides to computing the dedup hash
> from uncompressed data.  why isn't it?

Hash and checksum functions are slow (hash functions are slower, but
either way you'll be loading large blocks of data, which sets a floor
for cost).  Duplicating work is bad for performance.  Using the same
checksum for integrity protection and dedup is an optimization, and a
very nice one at that.  Having separate checksums would require making
blkptr_t larger, which imposes its own costs.

There are lots of trade-offs here.  Using the same checksum/hash for
integrity protection and dedup is a great solution.

If you use a non-cryptographic checksum algorithm then you'll
want to enable verification for dedup.  That's all.
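To make the idea concrete, here's a minimal Python sketch (a hypothetical
DedupStore class, not ZFS code): a single per-block checksum both indexes
the dedup table and protects integrity on read, and a verify flag models
byte-comparison on dedup hits, which you'd want with a weak checksum.

```python
import hashlib

class DedupStore:
    """Toy block store: one checksum serves both integrity and dedup."""

    def __init__(self, verify=False):
        self.blocks = {}      # checksum -> block data (the dedup table)
        self.verify = verify  # byte-compare on dedup hits (for weak checksums)

    def checksum(self, block):
        # Stand-in for a per-block checksum; a real non-cryptographic
        # choice (e.g. fletcher4) would make verify= essential.
        return hashlib.sha256(block).hexdigest()

    def write(self, block):
        key = self.checksum(block)
        existing = self.blocks.get(key)
        if existing is not None:
            # Dedup hit: with a weak checksum, collisions are possible,
            # so optionally confirm the bytes really match before sharing.
            if not self.verify or existing == block:
                return key
        self.blocks[key] = block
        return key

    def read(self, key):
        block = self.blocks[key]
        # The same checksum that keyed the dedup table protects reads.
        assert self.checksum(block) == key, "corruption detected"
        return block
```

Writing the same block twice returns the same key and stores one copy;
the checksum computed once at write time does double duty at read time.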

Nico
-- 
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss