[Larry Hastings] > I see--it's avoiding the Birthday Paradox. It /tends/ to, yes. This wasn't a design goal of the string hash, it's just a property observed after it was adopted, and appreciated much later ;-)
It's much clearer for Python's small-int hash, where hash(i) == i for i != -1. That is, nearly all "small enough" integers are their own "hash codes". That guarantees no collisions whatsoever in a dict keyed by a contiguous range of small integers (excluding -1), no matter how large the range. Read the comments in dictobject.c for more on this. The predictability of such hash schemes has both good & bad implications for dict performance, and Python's dict conflict-resolution strategies are fancier than most to mitigate the possible bad implications. > Collisions are actually more likely if the numbers are totally random than > if the numbers are, because of a feeble hash algorithm, relatively > consecutive. ;) Right. In a "good" (cryptographically speaking) hash function, a 1-bit change in the input "should" change about half the output bits (in the hash code), making collisions much more likely when the keys differ little in the low bits. The important point is that the cost of building and accessing string-keyed dicts is more important (in Python) than the cost of just hashing strings, and collision resolution is a real expense. _______________________________________________ Python-3000 mailing list Python-3000@python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com