Although, as you pointed out, the chance of a SHA-256 collision is minimal,
it can still happen, and that would mean the dedup algorithm discards a
block that it only thinks is a duplicate.
It is probably better anyway to do a byte-for-byte comparison
when the hashes match, to be sure that the blocks are really identical.
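
To make the difference concrete, here is a minimal Python sketch of what
verify buys you. It is a toy model, not how ZFS implements its dedup table:
the ToyDedupStore class, the weak_digest function and the (digest, index)
reference format are all made up for illustration, and the one-byte
checksum is deliberately weak so that a collision is easy to trigger.

```python
import hashlib


def sha256_digest(block: bytes) -> bytes:
    """Strong hash: collisions are not expected in practice."""
    return hashlib.sha256(block).digest()


def weak_digest(block: bytes) -> bytes:
    """Deliberately weak one-byte checksum (hypothetical, for the demo only)."""
    return bytes([sum(block) % 256])


class ToyDedupStore:
    """Toy dedup table; an illustration, not ZFS's real data structure."""

    def __init__(self, digest=sha256_digest, verify=True):
        self.digest = digest
        self.verify = verify
        self.table = {}  # digest -> list of distinct blocks with that digest

    def write(self, block: bytes):
        """Store a block and return a (digest, index) reference to it."""
        key = self.digest(block)
        candidates = self.table.setdefault(key, [])
        for index, stored in enumerate(candidates):
            # Without verify, a digest match alone is trusted.
            # With verify, the stored bytes must also be identical.
            if not self.verify or stored == block:
                return key, index
        # No acceptable duplicate found: keep the new block.
        candidates.append(block)
        return key, len(candidates) - 1

    def read(self, ref):
        key, index = ref
        return self.table[key][index]


# b"ab" and b"ba" have the same byte sum, so weak_digest collides on them.
lossy = ToyDedupStore(digest=weak_digest, verify=False)
lossy.write(b"ab")
print(lossy.read(lossy.write(b"ba")))  # b'ab': the second block was lost

safe = ToyDedupStore(digest=weak_digest, verify=True)
safe.write(b"ab")
print(safe.read(safe.write(b"ba")))    # b'ba': verify caught the collision
```

The price of the safe variant is the extra comparison against stored data
on every digest match, which on a real pool means reading the existing
block back from disk.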
The funny thing here is that ZFS tries to solve all sorts of data integrity
issues with checksumming, self-healing, etc.,
and yet a hash collision in the dedup algorithm can cause
loss of data if dedup is configured wrongly.
Anyway, thanks for bringing up the subject. Now I know that if I
enable the dedup feature, I must set it to sha256,verify.
On Wed, Jul 11, 2012 at 10:41 AM, Ferenc-Levente Juhos wrote:
> I was under the impression that the hash (or checksum) used for data
> integrity is the same as the one used for deduplication,
> but now I see that they are different.
> On Wed, Jul 11, 2012 at 10:23 AM, Sašo Kiselkov <skiselkov...@gmail.com> wrote:
>> On 07/11/2012 09:58 AM, Ferenc-Levente Juhos wrote:
>> > Hello all,
>> > what about the fletcher2 and fletcher4 algorithms? According to the zfs
>> > page on oracle, fletcher4 is the current default.
>> > Shouldn't the fletcher algorithms be much faster than any of the SHA
>> > algorithms?
>> > On Wed, Jul 11, 2012 at 9:19 AM, Sašo Kiselkov <skiselkov...@gmail.com
>> Fletcher is a checksum, not a hash. It can and often will produce
>> collisions, so you need to set your dedup to verify (do a bit-by-bit
>> comparison prior to deduplication) which can result in significant write
>> amplification (every write is turned into a read and potentially another
>> write in case verify finds the blocks are different). With hashes, you
>> can leave verify off, since hashes are extremely unlikely (~10^-77) to
>> produce collisions.
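
For reference, the ~10^-77 figure quoted above is simply 2^-256, the
chance that two specific different blocks share a 256-bit digest. The
short Python sketch below reproduces that number and also estimates the
chance of any collision across a whole pool using the usual birthday
bound. It assumes the digest behaves like a uniformly random 256-bit
value, and the pool size (2^35 blocks, about 4 PiB at 128 KiB per block)
is an arbitrary example, not a figure from this thread.

```python
from fractions import Fraction

HASH_BITS = 256  # SHA-256 digest length in bits

# Probability that two specific, different blocks have the same digest,
# assuming digests are uniformly distributed (an idealization).
pair_probability = Fraction(1, 2 ** HASH_BITS)


def birthday_bound(num_blocks: int, bits: int = HASH_BITS) -> float:
    """Upper bound on the chance of any collision among num_blocks blocks.

    Uses the union bound: (number of block pairs) / 2^bits.
    """
    pairs = num_blocks * (num_blocks - 1) // 2
    return float(Fraction(pairs, 2 ** bits))


print(f"one pair of blocks:  {float(pair_probability):.2e}")
print(f"pool of 2^35 blocks: {birthday_bound(2 ** 35):.2e}")
```

Even at that pool size the bound stays many orders of magnitude below
any realistic hardware error rate, which is the argument for leaving
verify off with a strong hash. Verify remains the only protection if
that idealized assumption about the hash ever fails.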
zfs-discuss mailing list