Raymond Hettinger <raymond.hettin...@gmail.com> added the comment:
ISTM the collision of "hash((1,0,0)) == hash((1,-2,-2))" also stems from integers hashing to themselves and that twos-complement arithmetic relation, -x == ~x+1. Further, that example specifically exploits that "hash(-1) == hash(-2)" due to how error codes. In a way, tuple hashing is incidental. Occasional collisions aren't really a big deal -- it is normal for dicts to have some collisions and to resolve them. I would be more concerned is non-contrived datasets had many elements that gave exactly the same hash value resulting in a pile-up in one place. FWIW, the current algorithm also has some advantages that we don't want to lose. In particular, the combination of the int hash and tuple hash are good at producing distinct values for non-negative numbers: >>> from itertools import product >>> len(set(map(hash, product(range(100), repeat=4)))) 100000000 The current hash is in the pretty-darned-good category and so we should be disinclined to change battle-tested code that is known to work well across a broad range of use cases. ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue34751> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com