On Thu, Dec 17, 2009 at 03:32:21PM +0100, Kjetil Torgrim Homme wrote:
> if the hash used for dedup is completely separate from the hash used
> for data protection, I don't see any downsides to computing the dedup
> hash from uncompressed data. why isn't it?
Hash and checksum functions are slow (cryptographic hash functions more so, but either way you have to read the whole block, which sets a floor on the cost). Duplicating that work is bad for performance: computing a separate dedup hash over the uncompressed data would mean hashing every block twice. Reusing the integrity-protection checksum as the dedup hash avoids that, and it's a very nice optimization. Separate checksums would also require making blkptr_t larger, which imposes its own costs.

There are lots of trade-offs here, but using the same checksum/hash for integrity protection and dedup is a great solution. If you use a non-cryptographic checksum algorithm, then you'll want to enable verification for dedup.
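To make that last point concrete, here's a sketch of the relevant knobs (the dataset name tank/fs is hypothetical):

    # enable dedup; blocks whose checksums match are shared
    zfs set dedup=on tank/fs

    # with verification: a byte-for-byte comparison is done before
    # two blocks are actually shared, which is what you want if the
    # checksum isn't collision-resistant
    zfs set dedup=verify tank/fs

The trade-off verify exposes is an extra read of the existing block on every dedup hit, in exchange for never sharing blocks on a false checksum match.

That's all.

Nico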