Les Hughes wrote:
Greg Keogh wrote:

The strings produce quite different MD5 hashes, as you would expect, but when I XOR “fold” the buffers down to 4 bytes I get the same result for both. This seems statistically infeasible. Using SHA1 you get collisions with "7Fw" and "X9z". I have a dozen other examples.
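Greg doesn't show his folding code, so this is only a rough sketch. It assumes the common reading of XOR "folding": split the digest into 4-byte chunks and XOR them together. The helper names `xor_fold` and `folded_hash` are hypothetical, and the chunk-wise fold is a guess at what he actually did.

```python
import hashlib


def xor_fold(digest: bytes, width: int = 4) -> bytes:
    """XOR successive width-byte chunks of a digest into one width-byte value.

    Assumption: this chunk-wise fold is a guess at the folding described above.
    """
    if len(digest) % width:
        raise ValueError("digest length must be a multiple of width")
    out = bytearray(width)
    for start in range(0, len(digest), width):
        for j in range(width):
            out[j] ^= digest[start + j]
    return bytes(out)


def folded_hash(text: str, algo: str = "md5") -> bytes:
    """Hash text with hashlib (md5 gives 16 bytes, sha1 gives 20), then fold to 4 bytes."""
    digest = hashlib.new(algo, text.encode("ascii")).digest()
    return xor_fold(digest)


if __name__ == "__main__":
    for s in ("7Fw", "X9z"):
        # Print the 4-byte folded SHA1 value of each string as hex.
        print(s, folded_hash(s, "sha1").hex())
```

Whether these two particular strings print the same value depends on the exact fold used; the point is that every result lives in a space of only 2^32 values.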


Take the string "7Fw". Each character is a byte (it could be ASCII, but whatever...).

You are trying to turn 3 bytes (2^24, roughly 17 million combinations) into a 4-byte hash (2^32, roughly 4.3 billion combinations) using an algorithm designed to work on strings of almost unlimited length.

This means there is a 1/250 chance (0.4%) that strings of 3 bytes will have the same 4-byte hash.
Actually, who needs approximations? 3 bytes into 4 bytes is exactly 1 in 256: the output is one byte (8 bits) wider than the input, and 2^8 = 256.
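The ratio can be checked by counting the two spaces directly. This is just a sketch of the arithmetic above; the variable names are illustrative.

```python
# Number of distinct 3-byte inputs versus distinct 4-byte (32-bit) outputs.
inputs = 256 ** 3    # 2**24 = 16,777,216
outputs = 256 ** 4   # 2**32 = 4,294,967,296

# The output space is 256 times larger: one extra byte is 8 bits, and 2**8 = 256.
ratio = outputs // inputs
print(f"{inputs:,} inputs, {outputs:,} outputs, ratio 1 in {ratio}")
```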

Again: you might want an even distribution, but the hash function is trying to spread values evenly over strings of unknown and unlimited length, so a bias like this for 3-character strings is to be expected.
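One standard way to put a number on "expected", which goes beyond the ratio argument above, is the birthday bound: if n inputs are hashed uniformly at random into m values, the expected number of colliding pairs is n(n-1)/(2m). The sketch below assumes the folded hash behaves like a uniform 32-bit function; `expected_collisions` is a hypothetical helper name.

```python
from math import comb


def expected_collisions(n_inputs: int, n_outputs: int) -> float:
    """Expected number of colliding pairs when n_inputs values are hashed
    uniformly at random into n_outputs buckets (birthday-bound estimate)."""
    return comb(n_inputs, 2) / n_outputs


HASH_SPACE = 2 ** 32  # 4-byte folded hash, assumed uniform

# Every possible 3-byte string.
print(round(expected_collisions(256 ** 3, HASH_SPACE)))
# Only 3-character strings drawn from the 95 printable ASCII characters.
print(round(expected_collisions(95 ** 3, HASH_SPACE)))
```

Under that assumption this gives about 32,768 colliding pairs across all 3-byte strings and about 86 across printable 3-character strings, so a dozen collisions is well within what a uniform 32-bit hash would produce.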
--
Les Hughes