Raymond Hettinger <raymond.hettin...@gmail.com> added the comment:

ISTM the collision of "hash((1,0,0)) == hash((1,-2,-2))" also stems from 
integers hashing to themselves and that twos-complement arithmetic relation, -x 
== ~x+1.  Further, that example specifically exploits that "hash(-1) == 
hash(-2)" due to how error codes.  In a way, tuple hashing is incidental.

Occasional collisions aren't really a big deal -- it is normal for dicts to 
have some collisions and to resolve them.  I would be more concerned is 
non-contrived datasets had many elements that gave exactly the same hash value 
resulting in a pile-up in one place.

FWIW, the current algorithm also has some advantages that we don't want to 
lose. In particular, the combination of the int hash and tuple hash are good at 
producing distinct values for non-negative numbers:

    >>> from itertools import product
    >>> len(set(map(hash, product(range(100), repeat=4))))
    100000000

The current hash is in the pretty-darned-good category and so we should be 
disinclined to change battle-tested code that is known to work well across a 
broad range of use cases.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34751>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to