[Kyle] > ... > For some reason, I had assumed in the back of my head (without > giving it much thought) that the average collision rate would be the > same for set items and dict keys. Thanks for the useful information.
I know the theoretical number of probes for dicts, but not for sets anymore. The latter use a mix of probe strategies now, "randomish" jumps (same as for dicts) but also purely linear ("up by 1") probing to try to exploit L1 cache. It's not _apparent_ to me that the mix actually helps ;-) I just trust that Raymond timed stuff until he was sure it does. > ... > Ah, I forgot to consider how the hash function actually works for continuous > integers. A better comparison to demonstrate the collision differences would > likely use random strings. And, at an extreme, a class with a __hash__ that always returns the same integer. > Also, I believe that max "reasonable" integer range of no collision > is (-2305843009213693951, 2305843009213693951), ... Any range that does _not_ contain both -2 and -1 (-1 is an annoying special case, with hash(-1) == hash(-2) == -2), and spans no more than sys.hash_info.modulus integers. Apart from that, the sign and magnitude of the start of the range don't matter; e.g., >>> len(set(hash(i) for i in range(10**5000, 10**5000 + 1000000))) 1000000 >>> len(set(hash(i) for i in range(-10**5000, -10**5000 + 1000000))) 1000000 _______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/FXTRZLOHSWYV5KEGJ4DYNS7HOMUOPVE5/ Code of Conduct: http://python.org/psf/codeofconduct/