On 14.11.2012 01:41, Richard Baron Penman wrote:

> I found the MD5 and SHA hashes slow to calculate.

Slow? For URLs? Are you kidding? How many URLs per second do you want
to calculate?

> The builtin hash is fast but I was concerned about collisions. What
> rate of collisions could I expect?

MD5 has 16 bytes (128 bit), SHA1 has 20 bytes (160 bit). Using the
birthday paradox and some approximations, I can tell you that with the
full MD5 you'd need around 2.609e16 hashes in the same namespace to get
a one-in-a-million chance of a collision. That is, 26090000000000000
filenames.

For SHA1 this number rises even further: you'd need around 1.71e21, or
1710000000000000000000, hashes in one namespace for the same
one-in-a-million chance.

I really have no clue how many URLs you want to hash, but it seems to
be LOTS, since the speed of MD5 seems to be an issue for you. Let me
estimate that you'd want to calculate a million hashes per second.
With MD5 you'd then have about 827 years before the namespace is full
enough to reach the one-in-a-million chance.

If you need even more hashes (say a million million per second), I'd
suggest you go with SHA-1, which gives you 54 years to reach the
one-in-a-million chance.

Then again, if you went for a million million hashes per second, Python
would probably not be the language of your choice.

Best regards,
Johannes

-- 
>> Where EXACTLY did you predict the quake again?
> At least not publicly!
Ah, the latest and to this day most brilliant stunt of our great
cosmologists: the secret prediction.
 - Karl Kaos on Rüdiger Thomas in dsa <hidbv3$om2$1...@speranza.aioe.org>
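
The figures above can be checked with the usual birthday approximation,
p ~= 1 - exp(-n**2 / (2 * N)), solved for n. What follows is only a
minimal sketch: the function names and the 365.25-day year are
assumptions made here for illustration and do not come from the post
itself.

```python
import math

# Assumption: one year is taken as 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def hashes_for_collision_probability(bits, p):
    """Approximate number of uniformly random hashes of the given bit
    width needed before the chance of at least one collision reaches p.

    Solves the birthday approximation p ~= 1 - exp(-n**2 / (2 * N))
    for n, where N = 2**bits is the size of the hash space.
    """
    space = 2 ** bits
    return math.sqrt(2 * space * -math.log1p(-p))


def years_to_reach(n_hashes, hashes_per_second):
    """Years needed to produce n_hashes at a constant hashing rate."""
    return n_hashes / hashes_per_second / SECONDS_PER_YEAR


md5_n = hashes_for_collision_probability(128, 1e-6)
sha1_n = hashes_for_collision_probability(160, 1e-6)

print("MD5 : %.3e hashes, %.0f years at 1e6 hashes/s"
      % (md5_n, years_to_reach(md5_n, 1e6)))
print("SHA1: %.3e hashes, %.0f years at 1e12 hashes/s"
      % (sha1_n, years_to_reach(sha1_n, 1e12)))
```

The same helper can be called with a different bit width or target
probability. Note that it assumes the hash outputs behave like uniformly
random values, which is a reasonable model for MD5 and SHA-1 on ordinary
(non-adversarial) URLs.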