Sorry!  A previous attempt to reply got sent before I typed anything :-(

Very briefly:

> >>> timeit.timeit("set(i for i in range(1000))", number=100_000)

[and other examples using a range of integers]

The collision resolution strategy for sets evolved to be fancier than
for dicts, to reduce cache misses.  This is important for sets because
the _only_ interesting thing about an element wrt a set is whether or
not the set contains it.   Lookup speed is everything for sets.

If you use a contiguous range of "reasonable" integers for keys, the
integer hash function is perfect:  there's never a collision.  So any
such test misses all the work Raymond did to speed set lookups.
String keys have sufficiently "random" hashes to reliably create
collisions, though.  Cache misses can, of course, have massive effects
on timing.

> Add (much faster for dicts):
> >>> timeit.timeit("s = set(); s.add(0)", number=100_000_000)
> 13.330938750001224
> >>> timeit.timeit("d = {}; d[0] = None", number=100_000_000)
> 5.788865385999088

In the former case you're primarily measuring the time to look up the
"add" method of sets (that's more expensive than adding 0 to the set).
A better comparison would, e.g., move `add = s.add` to a setup line,
and use plain "add(0)" in the loop.

That's it!
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/5UKYDB36WDRRIYOLRD52QS4I43SCAAJ6/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to