On 2 May 2012 17:09, Greg Keogh <[email protected]> wrote:
> I'm back home. I hashed all 3-char strings from a pool of 63 characters
> (250047 total) into a 4-byte hash of 2^32 possible values. This means I
> tested 0.0058% of the space.
>
> I assumed that the MD5 and SHA1 hash rounds were so effective that they
> acted like a really good randomiser and that the hashes would be well
> distributed. So well distributed in fact that I expected no collisions at
> all. It looks like the "avalanche effect" isn't as strong as I expected
> where small bit changes in the input are supposed to significantly alter the
> output. I was quite shocked to see whole blocks of identical output bytes
> for different inputs.
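A quick sanity check on the "no collisions" expectation: if n inputs are mapped uniformly at random onto 2^32 values, the birthday approximation gives about n(n-1)/2 / 2^32 expected colliding pairs. For 63^3 = 250047 inputs that is roughly 7.3, and for the 95^3 = 857375 printable-ASCII strings mentioned in the reply it is roughly 85.6. So even an ideal random function would produce a handful of collisions at this scale. The Python sketch below computes that estimate and repeats the exhaustive test. The helper names are hypothetical, and it assumes the "4-byte hash" is the first four bytes of each digest; the thread does not say how the digests were reduced to 4 bytes, so exact counts may differ from those reported.

```python
import hashlib
import itertools
import zlib
from collections import Counter


def expected_collisions(n: int, bits: int) -> float:
    """Birthday approximation: expected number of colliding pairs when n
    distinct inputs are mapped uniformly at random onto 2**bits values."""
    return n * (n - 1) / 2 / 2 ** bits


# Hypothetical helpers. Assumption: the "4-byte hash" is the first four
# bytes of the digest, read as a big-endian unsigned integer.
def truncated_md5(data: bytes) -> int:
    return int.from_bytes(hashlib.md5(data).digest()[:4], "big")


def truncated_sha1(data: bytes) -> int:
    return int.from_bytes(hashlib.sha1(data).digest()[:4], "big")


def crc32_value(data: bytes) -> int:
    # zlib.crc32 already returns an unsigned 32-bit value in Python 3.
    return zlib.crc32(data)


def collision_stats(alphabet: str, length: int, hash_fn) -> Counter:
    """Hash every `length`-character string over `alphabet` and return a
    Counter mapping group size -> number of hash values shared by exactly
    that many inputs. Size 1 means no collision, size 2 a duplicate,
    size 3 a triplicate, and so on."""
    per_value = Counter(
        hash_fn("".join(chars).encode("ascii"))
        for chars in itertools.product(alphabet, repeat=length)
    )
    return Counter(per_value.values())


if __name__ == "__main__":
    printable = "".join(chr(c) for c in range(32, 127))  # ASCII 32-126: 95 chars
    n = len(printable) ** 3
    print(f"{n} inputs, expected colliding pairs: {expected_collisions(n, 32):.1f}")
    for name, fn in (("MD5", truncated_md5), ("SHA1", truncated_sha1),
                     ("CRC32", crc32_value)):
        stats = collision_stats(printable, 3, fn)
        print(name, dict(sorted(stats.items())))
```

The CRC32 result is not evidence of better mixing: CRC-32 is an affine function of its input, so two distinct inputs of the same length, each at most 32 bits long, can never share a checksum. Zero CRC32 collisions over all 3-byte strings is therefore guaranteed by construction.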

I'm not sure what you mean by "whole blocks": I hashed all 3-character
strings of printable ASCII (codes 32-126, i.e. 95^3 = 857375 strings) and
found exactly 73 collisions for MD5 and 79 for SHA1, and every one of them
was a simple duplicate: no triplicates or larger groups.

> CRC32 checksums of the same 250047 inputs produce no collisions.
>
> I'll look into this more when I get some hobby time (maybe next Xmas!)
>
> Greg
>

--
Regards,
Mark Hurd, B.Sc.(Ma.)(Hons.)
