I'm pushing the send button too often, but yes: considering what was said,
a byte-for-byte comparison should be mandatory when deduplicating, and
therefore a "lighter" hash or checksum algorithm
would suffice to reduce the number of dedup candidates. Overall,
deduping would then be "bulletproof" and faster.
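
A minimal sketch of that two-stage idea, in Python rather than anything
ZFS-specific. The class DedupStore and its members are hypothetical names
made up for illustration, and zlib.adler32 is only a stand-in for a cheap
Fletcher-style checksum; this is not how ZFS implements dedup.

```python
import zlib
from typing import Callable, Dict, List


class DedupStore:
    """Toy block store: a cheap checksum narrows the candidates,
    and a byte-for-byte comparison makes the final decision."""

    def __init__(self, checksum: Callable[[bytes], int] = zlib.adler32):
        self.checksum = checksum
        self.blocks: List[bytes] = []               # unique blocks actually stored
        self.by_checksum: Dict[int, List[int]] = {}  # checksum -> block indices

    def write(self, data: bytes) -> int:
        """Store data and return the index of the block that holds it."""
        key = self.checksum(data)
        # Only blocks with the same checksum are dedup candidates.
        for idx in self.by_checksum.get(key, []):
            if self.blocks[idx] == data:  # the "verify" step
                return idx                # true duplicate: reuse existing block
        # No candidate matched byte-for-byte, so this is new data,
        # even if its checksum collided with an existing block.
        self.blocks.append(data)
        idx = len(self.blocks) - 1
        self.by_checksum.setdefault(key, []).append(idx)
        return idx


if __name__ == "__main__":
    # Deliberately weak checksum (block length) to force collisions.
    store = DedupStore(checksum=len)
    first = store.write(b"aaaa")
    second = store.write(b"bbbb")   # collides with b"aaaa" but is kept
    third = store.write(b"aaaa")    # real duplicate, deduplicated
    print(first, second, third, len(store.blocks))
```

In this toy the comparison is a memory compare; on a real pool the
candidate block would have to be read back from disk first, so the
speed argument depends on how often checksums match.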
On Wed, Jul 11, 2012 at 10:50 AM, Ferenc-Levente Juhos
> Actually, although as you pointed out the chance of a sha256
> collision is minimal, it can still happen, and that would mean
> that the dedup algorithm discards a block that it thinks is a duplicate.
> It's probably better anyway to do a byte-for-byte comparison
> if the hashes match, to be sure that the blocks are really identical.
> The funny thing here is that ZFS tries to solve all sorts of data
> integrity issues with checksumming and healing, etc.,
> and on the other hand a hash collision in the dedup algorithm can cause
> loss of data if wrongly configured.
> Anyway, thanks for bringing up the subject; now I know that if I
> enable the dedup feature I must set it to sha256,verify.
> On Wed, Jul 11, 2012 at 10:41 AM, Ferenc-Levente Juhos <
> feci1...@gmail.com> wrote:
>> I was under the impression that the hash (or checksum) used for data
>> integrity is the same as the one used for deduplication,
>> but now I see that they are different.
>> On Wed, Jul 11, 2012 at 10:23 AM, Sašo Kiselkov
>>> On 07/11/2012 09:58 AM, Ferenc-Levente Juhos wrote:
>>> > Hello all,
>>> > what about the fletcher2 and fletcher4 algorithms? According to the zfs man
>>> > page on oracle, fletcher4 is the current default.
>>> > Shouldn't the fletcher algorithms be much faster than any of the SHA
>>> > algorithms?
>>> > On Wed, Jul 11, 2012 at 9:19 AM, Sašo Kiselkov <skiselkov...@gmail.com
>>> Fletcher is a checksum, not a hash. It can and often will produce
>>> collisions, so you need to set your dedup to verify (do a bit-by-bit
>>> comparison prior to deduplication) which can result in significant write
>>> amplification (every write is turned into a read and potentially another
>>> write in case verify finds the blocks are different). With hashes, you
>>> can leave verify off, since hashes are extremely unlikely (~10^-77) to
>>> produce collisions.
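
For reference, the ~10^-77 figure quoted above matches 2^-256, the chance
that two given distinct blocks share a 256-bit hash if the hash behaves
like a uniform random function (an idealizing assumption). A rough sketch
of that arithmetic, together with the usual birthday approximation for a
pool holding many blocks; the name birthday_bound is only illustrative.

```python
import math

HASH_BITS = 256  # output size of sha256

# Chance that two specific, distinct blocks get the same hash,
# assuming the hash output is uniformly random.
pair_probability = 2.0 ** -HASH_BITS


def birthday_bound(num_blocks: int, bits: int = HASH_BITS) -> float:
    """Approximate upper bound on the chance of any collision among
    num_blocks distinct blocks: (number of pairs) / 2**bits."""
    pairs = num_blocks * (num_blocks - 1) // 2
    return pairs / 2 ** bits  # exact integers in, float out


if __name__ == "__main__":
    print("log10 of per-pair probability:", math.log10(pair_probability))
    # 2**35 blocks of 128 KiB each is 4 PiB of unique data.
    print("bound for 2**35 blocks:", birthday_bound(2 ** 35))
```

The per-pair number is what the thread quotes; the bound over all pairs
grows with the square of the block count but stays negligible at any
realistic pool size under the same assumption.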
zfs-discuss mailing list