On 15.09.16 19:13, Antoine Pitrou wrote:
Since this micro-benchmark creates the keys in order just before
filling the dict with them, randomizing the insertion order destroys
the temporal locality of object header accesses when iterating over the
dict keys. *This* looks like the right explanation, not branch
mispredicts due to NULL checks.

This also shows that a micro-benchmark that merely looks ok can actually
be a terrible proxy of actual performance.

Thanks you for great explanation Antoine! I came to the same conclusions about randomized integers example, but didn't notice that this also is a main cause of the speed up of strings example.

As a further validation of this theory, let's dramatically decrease the
working set size on the initial benchmark:

$ ./python -m timeit -s "d=dict.fromkeys(map(str,range(10**3)))"
"list(d)"

-> Python 3.5: 100000 loops, best of 3: 10.9 usec per loop
-> Python 3.6: 100000 loops, best of 3: 9.72 usec per loop

When the working set fits in the cache, this micro-benchmark is
only 12% slower on 3.5 compared to 3.6.
*This* much smaller difference (a mere 1.2ns difference per dict
element) could be attributed to eliminating the NULL checks, or to any
other streamlining of the core iteration logic.

Yet one example, with random hashes and insertion order independent from the creation order.

$ ./python -m timeit -s "import random; a = list(map(str, range(10**6))); random.shuffle(a); d = dict.fromkeys(a)" -- "list(d)"

Python 3.5: 180, 180, 180 msec per loop
Python 3.6: 171, 172, 171 msec per loop

Python 3.6 is 5% faster and this looks closer to the actual performance.


_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to