[Kyle]
> ...
> For some reason, I had assumed in the back of my head (without
> giving it much thought) that the average collision rate would be the
> same for set items and dict keys. Thanks for the useful information.

I know the theoretical number of probes for dicts, but not for sets
anymore.  The latter use a mix of probe strategies now, "randomish"
jumps (same as for dicts) but also purely linear ("up by 1") probing
to try to exploit L1 cache.

It's not _apparent_ to me that the mix actually helps ;-)  I just
trust that Raymond timed stuff until he was sure it does.

> ...
> Ah, I forgot to consider how the hash function actually works for continuous
> integers. A better comparison to demonstrate the collision differences would
> likely use random strings.

And, at an extreme, a class with a __hash__ that always returns the
same integer.

> Also, I believe that max "reasonable" integer range of no collision
> is (-2305843009213693951, 2305843009213693951), ...

Any range  that does _not_ contain both -2 and -1 (-1 is an annoying
special case, with hash(-1) == hash(-2) == -2), and spans no more than
sys.hash_info.modulus integers.  Apart from that, the sign and
magnitude of the start of the range don't matter; e.g.,

>>> len(set(hash(i) for i in range(10**5000, 10**5000 + 1000000)))
1000000
>>> len(set(hash(i) for i in range(-10**5000, -10**5000 + 1000000)))
1000000
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/FXTRZLOHSWYV5KEGJ4DYNS7HOMUOPVE5/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to