On Thu, Jul 26, 2012 at 9:30 AM, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
> What happens if you get a collision?
>
> That is, you have two different long identifiers:
>
> a.b.c.d...something
> a.b.c.d...anotherthing
>
> which by bad luck both hash to the same value:
>
> a.b.c.d.$AABB99
> a.b.c.d.$AABB99
>
> (or whatever).

The odds of a given pair of identifiers having the same digest to 10 hex digits are 1 in 16^10, or about 1 in 1.1 trillion. If you bought one lottery ticket a day at those odds, you would win approximately once every 3 billion years.

But a hash collision alone is not enough. The two identifiers also have to match exactly in the first 21 (or 30, or whatever) characters of their actual names, and both have to be long enough to invoke the truncating scheme in the first place.

The Oracle backend for Django uses this same approach with an MD5 sum to ensure that identifiers will be no more than 30 characters long (a hard limit imposed by Oracle). It actually truncates the hash to 4 digits, though, not 10. This hasn't caused any problems that I'm aware of.
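
For illustration, here is a rough sketch of that kind of truncate-and-hash scheme in Python, along with the arithmetic behind the odds quoted above. It is not Django's actual code: the names truncate_name and pair_collision_odds, their defaults, and the sample identifiers are made up for this example, and the odds assume the digests are uniformly distributed.

```python
import hashlib


def truncate_name(name, max_len=30, hash_len=4):
    """Shorten `name` to at most `max_len` characters (hypothetical helper).

    Names that already fit are returned unchanged. Longer names keep their
    first (max_len - hash_len) characters and get the first `hash_len` hex
    digits of the MD5 digest of the full name appended.
    """
    if len(name) <= max_len:
        return name
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()[:hash_len]
    return name[:max_len - hash_len] + digest


def pair_collision_odds(hash_len):
    """Return N where two given names share a hex suffix with odds 1 in N."""
    return 16 ** hash_len


# Two long identifiers that share a prefix longer than the kept portion.
long_a = "a.b.c.d." + "x" * 30 + "something"
long_b = "a.b.c.d." + "x" * 30 + "anotherthing"

print(truncate_name(long_a))  # 26 characters of prefix + 4 hex digits
print(truncate_name(long_b))  # same prefix, suffix from a different digest

odds = pair_collision_odds(10)
print(odds)                   # 1099511627776, about 1.1 trillion
print(odds / 365.25 / 1e9)    # billions of years at one draw per day, roughly 3
```

With hash_len=4 the odds for a given pair drop to 1 in 16^4 = 65536, but the two names still have to agree on every character of the kept prefix before a suffix collision matters.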