On Wed, 16 Feb 2005, Craig J. Lindstrom wrote:

> By very definition collisions will occur in any digest algorithm. The
> only way to have no collisions would be to have a hash as long as the
> data, then have the algorithm perfectly distribute the hashes. So
> collisions are extremely common (in a statistical sense). The real notion
> behind a hash is that if there are differences in a stream of data the
> resulting hash will have a very high probability of not matching the
> original hash. What this document is saying is that the hash does not
> provide a perfectly random distribution of hash values across input data;
> there is some relation or predictability in the algorithm that can reduce
> the effectiveness of the hash. In other words, brute-forcing a hash
> requires fewer iterations than 2^HashBitsSize.
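The pigeonhole argument above is easy to see concretely. A quick Python sketch (the one-byte "hash" is a deliberately tiny truncation of SHA-256, purely for illustration, not any real algorithm): with only 256 possible digest values, 257 distinct inputs are guaranteed to produce a collision.

```python
import hashlib

# Truncate SHA-256 to a single byte: only 256 possible digest values.
# (An artificially tiny hash for illustration -- not a real construction.)
def tiny_hash(data: bytes) -> int:
    return hashlib.sha256(data).digest()[0]

# Pigeonhole principle: 257 distinct inputs into 256 buckets must collide.
seen = {}
collision = None
for i in range(257):
    msg = str(i).encode()
    d = tiny_hash(msg)
    if d in seen:
        collision = (seen[d], msg)
        break
    seen[d] = msg

# This is guaranteed, not merely probable.
assert collision is not None
assert tiny_hash(collision[0]) == tiny_hash(collision[1])
```

In practice the loop finds a collision long before input 257, which is the birthday effect the next paragraph quantifies.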
Correct, depending on your definition of "brute force". If you want to find
a preimage to match a hash output you're given (preimage resistance), or
find a second preimage which has the same hash value as some fixed value
you don't control (second preimage resistance), that should take 2^k work
for a k-bit hash. But the birthday paradox says that finding a pair of
values which collide (collision resistance) only takes about 2^(k/2)
operations. This page describes these three properties:

http://en.wikipedia.org/wiki/Cryptographic_hash_function

> It is not "broken" or completely useless (at least not according to this).

You might be correct if collision resistance weren't important, but it is.
Many cryptosystems, like digital signatures, rely on that property if they
are to work as advertised. I say you /might/ be correct in that case
because we still don't know enough about these attacks to tell whether
they'll also break second preimage or preimage resistance. As yet, we have
no evidence to suggest that that's the case, although the attacks do seem
to do more than produce vanilla collisions -- they're finding multiple
collisions, and may allow some flexibility in choosing preimages.

> And even if a collision can be created, the colliding data stream would
> have to be similar enough in context to the original that it could be
> used in "context" and the tampering not be noticed. Imagine a hash of a
> spreadsheet that collides with the hash of another spreadsheet that is
> similar enough to the original but contains the forgery needed to exploit
> the spreadsheet with the intended alteration and nothing else that would
> indicate a change. When you look at the data in context the probability
> of a meaningful collision is greatly reduced.
>
> So I'm not so worried, especially for data that does not need a super
> high degree of authenticity. If I had data that sensitive I would send
> multiple hashes with different algorithms. The probability of a collision
> across two algorithms is much smaller.
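To make the 2^(k/2) birthday bound above concrete, here's a small Python sketch. It truncates SHA-256 to 24 bits (again, purely for illustration) so the search finishes quickly: a collision turns up after roughly 2^12 ≈ 4096 evaluations, while a preimage search on the same 24-bit hash would expect about 2^24 ≈ 16 million.

```python
import hashlib

def trunc_hash(data: bytes, nbytes: int = 3) -> bytes:
    """SHA-256 truncated to nbytes, so a collision search is tractable.
    (Illustrative only -- nobody should deploy a 24-bit hash.)"""
    return hashlib.sha256(data).digest()[:nbytes]

def birthday_collision(nbytes: int = 3):
    """Find two distinct inputs with equal truncated hashes.

    For a k-bit output this takes roughly 2^(k/2) evaluations (the
    birthday bound), not the 2^k a preimage search would need. With
    nbytes=3 (k=24), that's about 2^12 ~ 4096 tries on average."""
    seen = {}
    i = 0
    while True:
        msg = i.to_bytes(8, "big")
        h = trunc_hash(msg, nbytes)
        if h in seen:
            return seen[h], msg
        seen[h] = msg
        i += 1

m1, m2 = birthday_collision()
assert m1 != m2 and trunc_hash(m1) == trunc_hash(m2)
```

Note that the colliding messages found this way are arbitrary counter values, which is why collision attacks matter most where the attacker controls both documents (e.g. getting one signed and substituting the other).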
Unfortunately, neither of these gives as much protection as we'd hope.
Recent research suggests that collisions on the concatenation of hashes are
much easier to find than naive brute force would suggest, and the Kaminsky
paper shows that you actually have quite a lot of flexibility in where you
put the "garbage" which causes a collision. He came up with two different
mp3 files which play just fine but have different contents. Weird, huh?

That's why I keep going on about how important these breaks are.
Cryptosystems are filled with weird things that happen when you break even
a seemingly insignificant assumption; that's why proofs of security are
becoming so important in crypto papers.

-J

--------------------
BYU Unix Users Group                    http://uug.byu.edu/
The opinions expressed in this message are the responsibility of their
author. They are not endorsed by BYU, the BYU CS Department or BYU-UUG.
___________________________________________________________________
List Info: http://uug.byu.edu/cgi-bin/mailman/listinfo/uug-list
