On 07/11/12 02:10, Sašo Kiselkov wrote:
> Oh jeez, I can't remember how many times this flame war has been going
> on on this list. Here's the gist: SHA-256 (or any good hash) produces a
> near uniform random distribution of output. Thus, the chances of getting
> a random hash collision are around 2^-256 or around 10^-77.
I think you're correct that most users don't need to worry about this --
sha-256 dedup without verification is not going to cause trouble for them.
But your analysis is off. You're citing the chance that two blocks picked at
random will have the same hash. But that's not what dedup does; it compares
the hash of a new block to a possibly-large population of other hashes, and
that gets you into the realm of "birthday problem" or "birthday paradox".
See http://en.wikipedia.org/wiki/Birthday_problem for formulas.
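
To put rough numbers on that, here is a minimal sketch of the usual birthday approximation, p ~= 1 - exp(-n(n-1)/2N) with N = 2^bits. It assumes the hash output is uniformly random and that no adversary is picking the blocks. The function name and the example block count are made up for illustration; nothing here is taken from the zfs code.

```python
import math


def birthday_collision_probability(num_blocks: int, hash_bits: int = 256) -> float:
    """Approximate chance of at least one collision among num_blocks hashes.

    Assumes an ideal hash with uniformly random hash_bits-bit output.
    """
    # Number of distinct pairs of blocks, kept as an exact integer.
    pairs = num_blocks * (num_blocks - 1) // 2
    # Expected number of colliding pairs; int / int division yields a float.
    expected_collisions = pairs / 2 ** hash_bits
    # 1 - exp(-x), computed via expm1 so tiny values are not rounded to zero.
    return -math.expm1(-expected_collisions)


if __name__ == "__main__":
    # Hypothetical pool holding 10**12 unique blocks.
    print(birthday_collision_probability(10 ** 12))
```

With 10^12 unique blocks this comes out on the order of 10^-54. The result scales with the square of the block count, so the population size matters far more than the single-pair figure suggests.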
So, maybe somewhere between 10^-50 and 10^-55 for there being at least one
collision in really large collections of data - still not likely enough to
Of course, that assumption goes out the window if you're concerned that an
adversary may develop practical ways to find collisions in sha-256 within the
deployment lifetime of a system. sha-256 is, more or less, a scaled-up sha-1,
and sha-1 is known to be weaker than the ideal 2^80 strength you'd expect from
160 bits of hash; the best credible attack is somewhere around 2^57.5 (see
On a somewhat less serious note, perhaps zfs dedup should contain "chinese
lottery" code (see http://tools.ietf.org/html/rfc3607 for one explanation)
which asks the sysadmin to report a detected sha-256 collision to
eprint.iacr.org or the like...
zfs-discuss mailing list