Folks, for the maths curious I found a seemingly ridiculous coincidence in
the MD5 and SHA1 hash algorithms. Here's what I do in my test:
Get the ASCII bytes of a string.
Hash the bytes using MD5 or SHA1.
"Fold" the 16 or 20 byte hash down to 4 bytes with XOR to make an Int32.
By "fold" I mean that I make a short buffer out of a long one by doing this
loop over the long buffer: fold[i%4] ^= buff[i]. Doing this is questionable,
but some gut feeling told me that this would make more randomness and better
distribute the 4 byte hash. It turns out that this "folding" actually
produces a weird effect.
I found two strings that produce the same hash this way. Here is the output
of my test code:
"123" --> Hash 202CB962AC59075B964B07152D234B70 --> Fold 371DF25C
"3Px" --> Hash 02BAB61AA407FD643066D193A1C668B1 --> Fold 371DF25C
The strings produce quite different MD5 hashes as you would expect, but when
I XOR "fold" the buffers down I get the same 4 byte result. This seems
statistically infeasible. Using SHA1 you get collisions with "7Fw" and
"X9z". I have a dozen other examples.
Greg
using System;
using System.Security.Cryptography;
using System.Text;
namespace collider
{
class Program
{
static void Main(string[] args)
{
HashAlgorithm md5 = MD5.Create();
HashAlgorithm sha1 = SHA1.Create();
ShowCollision(md5, "123", "3Px");
ShowCollision(sha1, "7Fw", "X9z");
}
private static void ShowCollision(HashAlgorithm hasher, string s1,
string s2)
{
byte[] hash = hasher.ComputeHash(StringBytes(s1));
byte[] fold = FoldBuffer(hash, 4);
Trace("\n\"{0}\" --> Hash {1} --> Fold {2}", s1,
PrintBuff(hash), PrintBuff(fold));
hash = hasher.ComputeHash(StringBytes(s2));
fold = FoldBuffer(hash, 4);
Trace("\"{0}\" --> Hash {1} --> Fold {2}", s2, PrintBuff(hash),
PrintBuff(fold));
}
private static string PrintBuff(byte[] buffer)
{
return BitConverter.ToString(buffer).Replace("-", "");
}
private static byte[] FoldBuffer(byte[] buffer, int length)
{
byte[] fold = new byte[length];
for (int i = 0; i < buffer.Length; i++)
{
fold[i % length] ^= buffer[i];
}
return fold;
}
private static byte[] StringBytes(string s)
{
return Encoding.ASCII.GetBytes(s);
}
private static void Trace(string format, params object[] args)
{
Console.WriteLine(format, args);
}
}
}