Les Hughes wrote:
Greg Keogh wrote:
The strings produce quite different MD5 hashes as you would expect,
but when I XOR “fold” the buffers down I get the same 4 byte result.
This seems statistically infeasible. Using SHA1 you get collisions
with "7Fw" and "X9z". I have a dozen other examples.
Take the string "7Fw" and let's say each character in it is one
byte. (could be ASCII, but whatever...)
You are trying to turn 3 bytes (of which there are ~17 million
combinations) into a 4 byte hash (of which there are ~4 billion
combinations) using an algorithm which works for strings of near
infinite length.
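This means the 3 byte strings can cover at most about 1/250 (0.4%) of
the possible 4 byte hashes, and with ~17 million of them it is no
surprise that some strings of 3 bytes have the same 4 byte hash.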
Actually, who needs approximations? 3 bytes into 4 bytes is exactly 1 in
256 (a difference of 8 bits!)
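
As a quick sanity check of those numbers, here is a minimal Python sketch. The variable names are made up for illustration, and the last figure assumes an idealized hash that spreads its output uniformly at random, which the XOR-folded MD5 or SHA1 value may or may not do.

```python
# Count the possible 3-byte inputs and 4-byte outputs.
inputs = 256 ** 3      # 2**24 = 16,777,216 (~17 million)
outputs = 256 ** 4     # 2**32 = 4,294,967,296 (~4 billion)

# Ratio of output space to input space: exactly 256, i.e. 8 bits.
ratio = outputs // inputs

# Assumption: an ideal uniform hash. The usual birthday estimate for the
# expected number of colliding pairs among n inputs and m outputs is
# n * (n - 1) / (2 * m).
expected_pairs = inputs * (inputs - 1) / (2 * outputs)

if __name__ == "__main__":
    print(inputs, outputs, ratio, round(expected_pairs))
```

Under that assumption the estimate comes out in the tens of thousands of colliding pairs across all 3 byte strings, so finding a dozen of them is not remarkable by itself.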
Again: while you might want an even distribution, the hash function
is trying to spread its output evenly over strings of unknown and
unlimited length, so a bias such as this for 3 chars is to be expected.
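
For anyone who wants to reproduce the comparison from the question, here is a rough Python sketch of XOR folding. It assumes the "fold" means XORing successive 4 byte chunks of the digest together and that the strings are encoded as UTF-8; the original post does not say how the folding or the encoding was done, and `xor_fold` and `folded_hash` are hypothetical names. It makes no claim about what the two strings actually produce.

```python
import hashlib


def xor_fold(digest: bytes, width: int = 4) -> bytes:
    """XOR successive width-byte chunks of digest together (assumed scheme)."""
    folded = bytearray(width)
    for i, b in enumerate(digest):
        # Byte i of the digest lands on position i mod width of the result.
        folded[i % width] ^= b
    return bytes(folded)


def folded_hash(text: str, algorithm: str = "sha1") -> bytes:
    """Hash text (UTF-8 encoded, an assumption) and fold it down to 4 bytes."""
    digest = hashlib.new(algorithm, text.encode("utf-8")).digest()
    return xor_fold(digest)


if __name__ == "__main__":
    # Print the folded SHA1 values for the two strings from the question.
    for s in ("7Fw", "X9z"):
        print(s, folded_hash(s).hex())
```

Looping this over all 3 byte strings and counting repeated folded values would show how often collisions really occur for a given algorithm.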
--
Les Hughes
[email protected]