On 07/11/2012 12:32 PM, Ferenc-Levente Juhos wrote:
> Saso, I'm not flaming at all; I happen to disagree. I understand the
> chances are very, very slim, but as one poster already said, this is how
> the lottery works. I'm not saying one should run an exhaustive search with
> trillions of computers just to produce a SHA-256 collision.
> If I wanted an exhaustive search I would generate all the numbers from
> 0 to 2**256 and I would definitely get at least one collision.
> Put another way: by generating all the possible 256-bit (32-byte)
> blocks plus one more, you will definitely get a collision. This is much
> more credible than the analogy with the age of the universe and atoms
> picked at random.
First of all, I never said that the chance is zero. It's definitely
non-zero, but claiming that it is analogous to the lottery is just not
appreciating the scale of the difference.
Next, your proposed method of finding hash collisions is naive in
assuming that you merely need to generate 256-bit numbers. First of all,
the smallest blocks in ZFS are 1k (IIRC), i.e. 8192 bits. Next, you fail
to appreciate the difficulty of even enumerating 2^256 256-bit numbers.
Here's how much memory you'd need just to store them:
2^256 * 32 bytes ~= 2^261 bytes
Memory may be cheap, but not that cheap...
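The arithmetic above is easy to sanity-check. A minimal sketch (my own illustration, not part of the original argument) that also puts the figure next to a familiar unit:

```python
# Storage needed to enumerate every 256-bit value, 32 bytes each.
values = 2 ** 256
bytes_needed = values * 32  # 2^256 * 2^5 = 2^261 bytes

# Sanity check the exponent arithmetic.
assert bytes_needed == 2 ** 261

# For scale: one zebibyte (ZiB) is 2^70 bytes; total worldwide storage
# is estimated at only a handful of zettabytes.
zebibytes = bytes_needed // 2 ** 70
print(f"2^261 bytes = {zebibytes} ZiB")
```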
> The fact is it can happen: it's entirely possible that there are two
> JPEGs in the universe with different content and the same hash.
> I can't prove the existence of those, but you can't deny it.
I'm not denying that. Read what I said.
> The fact is that ZFS and everyone using it try to correct data
> degradation, e.g. caused by cosmic rays, and on the other hand they're
> using probability calculations (no matter how slim the chances are)
> that could potentially discard valid data.
> You can come up with more universe-and-atom theories and with the
> age of the universe, etc. The fact remains the same.
This is just a profoundly naive statement. You always need to make
trade-offs between practicality and performance. How "slim" the chances
are has a very real impact on engineering decisions.
> And each generation was convinced that their current best checksum or
> hash algorithm was the best and would be the best forever. MD5 has
> demonstrated that that's not the case. Time will tell what becomes of
> SHA-256, but why take any chances?
You really don't understand much about hash algorithms, do you? MD5 has
*very* good safety against random collisions - that's why it's still
used for archive integrity checking (ever wonder what algorithm Debian
and RPM packages use?). The reason it has been replaced by newer
algorithms has nothing to do with its chance of random hash collisions,
but with deliberate collision attacks on security protocols built on
top of it. Also, the reason we don't use MD5 in ZFS is that SHA-256 is
faster and has a larger output space (thus further lowering the odds of
random collisions).
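To put a number on "the odds of random collisions": the standard birthday-bound approximation says that n uniformly random b-bit hashes collide with probability roughly n^2 / 2^(b+1). A quick sketch of that estimate (my own illustration; the 2^64-block figure is just an example, not from the thread):

```python
def collision_probability(n: int, bits: int = 256) -> float:
    """Birthday-bound approximation: P(collision) ~= n^2 / 2^(bits+1).

    Valid when n is far below 2^(bits/2), which is the case here.
    """
    return n * n / 2 ** (bits + 1)

# Even with 2^64 distinct blocks (an absurdly large pool), the chance
# of a single random SHA-256 collision is about 2^-129.
p = collision_probability(2 ** 64)
print(f"P(collision) ~= {p:.3e}")  # on the order of 10^-39
```

This is the scale the "age of the universe" analogies are trying to convey: the probability is non-zero, but it is many orders of magnitude below the chance of undetected hardware corruption.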
zfs-discuss mailing list