On Thu, Jan 24, 2013 at 11:09 AM, Dave Angel <d...@davea.name> wrote:
> I certainly can't disagree that it's easy to produce a very long hash that
> isn't at all secure. But I would disagree that longer hashes
> *automatically* reduce chances of collision.
Sure. But by and large, longer hashes give you a better chance of avoiding collisions.

Caveat: I am not a cryptography expert. My statements are based on my own flawed understanding of what's going on. I use the stuff but I don't invent it.

> Wikipedia - http://en.wikipedia.org/wiki/Cryptographic_hash_function
>
> seems to say that there are four requirements.
> it is easy to compute the hash value for any given message
> it is infeasible to generate a message that has a given hash
> it is infeasible to modify a message without changing the hash
> it is infeasible to find two different messages with the same hash
>
> Seems to me a small hash wouldn't be able to meet the last 3 conditions.

True, but the definition of "small" is tricky. Of course the one-byte hash I proposed isn't going to be difficult to break, since you can just brute-force a bunch of message changes until you find one that has the right hash. But it's more about the cascade effect: any given message has an equal probability of having any of the possible hashes. Make a random change, get another random hash. So for a perfect one-byte hash, you have exactly one chance in 256 of getting any particular hash.

By comparison, a simple/naive hash that just XORs together all the byte values fails these checks. Even if you XOR the message together 64 bytes at a time (thus producing a 512-bit hash), you'll still be insecure, because it's easy to predict what hash you'll get after making a particular change.

This property (that the output can't be predicted short of brute force) doesn't change as worldwide computing power improves. A hashing function might go from being "military-grade security" to "home-grade security" to "two-foot fence around your property", while still being impossible to predict without brute-forcing.
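To illustrate the cascade effect described above, here's a rough Python sketch (my own illustration, not something from the thread). It assumes SHA-256 as a stand-in for a "perfect" hash, and `bit_difference` and `flipped_bits` are hypothetical helper names. It flips a single input bit and counts how many of the 256 output bits change; for a good hash that should average out to roughly half.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length digests."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def flipped_bits(message: bytes) -> int:
    """Flip the lowest bit of the last byte and count changed SHA-256 output bits."""
    tweaked = message[:-1] + bytes([message[-1] ^ 0x01])
    return bit_difference(hashlib.sha256(message).digest(),
                          hashlib.sha256(tweaked).digest())

# Arbitrary sample messages; each one differs from its tweak by one input bit.
samples = [f"Pay Chris ${n}".encode() for n in range(100, 200)]
average = sum(flipped_bits(m) for m in samples) / len(samples)
print(f"average output bits flipped: {average:.1f} of 256")
```

A one-bit change in the input scrambles about half the output, which is what "make a random change, get another random hash" looks like in practice.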
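The point that "small" is tricky can be sketched the same way. Again, this is only an illustration under assumptions: `truncated_hash` and `brute_force_preimage` are hypothetical helpers, and keeping the top n bits of SHA-256 stands in for a perfect n-bit hash. Because each tweak of the message produces an effectively random hash, hitting a chosen n-bit value takes about 2**n attempts on average. That's trivial for 8 bits and still cheap for 16, and it is the attacker's budget that decides what counts as "small".

```python
import hashlib

def truncated_hash(message: bytes, bits: int) -> int:
    """Keep only the top `bits` bits of SHA-256 (illustrative stand-in for a perfect small hash)."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return digest >> (256 - bits)

def brute_force_preimage(base: bytes, target: int, bits: int):
    """Append a counter to the message until its truncated hash equals `target`.

    Returns (message, attempts). Expect roughly 2**bits attempts on average.
    """
    limit = 64 * 2 ** bits  # chance of failing is about e**-64, i.e. negligible
    for attempt in range(1, limit + 1):
        candidate = base + b" #" + str(attempt).encode()
        if truncated_hash(candidate, bits) == target:
            return candidate, attempt
    raise RuntimeError("no preimage found within limit")

for bits in (8, 12, 16):
    msg, attempts = brute_force_preimage(b"Pay Chris $100", target=0x2A, bits=bits)
    print(f"{bits:2d}-bit hash: matched after {attempts} attempts -> {msg!r}")
```

The exact attempt counts vary from run to run, but they scale with 2**bits. Each extra bit doubles the expected work.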
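And here's a rough sketch of the naive XOR hash, showing why it fails even at 512 bits. `xor_fold_hash` is a hypothetical name, and the sample message is arbitrary; the 64-byte block size matches the 512-bit example above. Since message byte i only ever affects hash byte i % 64, you can predict the new hash after a change without recomputing it. You can also build a collision by making the same change at two positions 64 bytes apart.

```python
def xor_fold_hash(message: bytes, block_size: int = 64) -> bytes:
    """Naive hash: XOR the message together block_size bytes at a time."""
    result = bytearray(block_size)
    for i, byte in enumerate(message):
        result[i % block_size] ^= byte
    return bytes(result)

original = b"Pay Chris $100 on Thursday. " * 4  # 112 bytes, arbitrary sample
h1 = xor_fold_hash(original)

# Prediction: XORing byte i with d XORs hash byte i % 64 with the same d.
i, d = 10, 0x01
modified = bytearray(original)
modified[i] ^= d
predicted = bytearray(h1)
predicted[i % 64] ^= d
print("prediction correct:", xor_fold_hash(bytes(modified)) == bytes(predicted))

# Collision: the same change at i and i + 64 cancels out in the hash.
collider = bytearray(original)
collider[i] ^= 0x20
collider[i + 64] ^= 0x20
print("different message:", bytes(collider) != original)
print("same hash:", xor_fold_hash(bytes(collider)) == h1)
```

Nothing here needs brute force, so all three of the "infeasible" requirements fall over regardless of how long the output is.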
But when an algorithm is found that generates collisions faster than the hash size indicates, it effectively shrinks the hash to the size that matches the attack's cost. MD5 is 128-bit, but (if I understand the Wikipedia note correctly) a known attack cuts that to 20.96 bits of "real hash size". So MD5 is still better than a perfect 16-bit hash, but not as good as a perfect 32-bit hash. (And on today's hardware, that's not good enough.)

http://en.wikipedia.org/wiki/Collision_resistant

ChrisA
--
http://mail.python.org/mailman/listinfo/python-list