[Eric]

>> My understanding is that the all-int-keys case is an outlier.  This is due
>> to how ints hash, resulting in fewer collisions and a mostly
>> insertion-ordered hash table.  Consequently, I'd expect the above
>> microbenchmark to give roughly the same result between 3.5 and 3.6, which
>> it did.


[Antoine]
> Dict iteration shouldn't have any dependence on collisions or insertion
> order.  It's just a table scan, both in 3.5 and 3.6.

Eric is correct on this one.  The consecutive hashes make a huge difference for 
Python 3.5.   While there is a table full table scan, the check for NULL 
entries becomes a predictable branch when all the keys are in consecutive 
positions.   There is an astonishingly well written stack overflow post that 
explains this effect clearly: http://stackoverflow.com/questions/11227809 

With normal randomized keys, Python 3.6 loop is dramatically better that Python 
3.5:

~/py36 $ ./python.exe -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" 
"list(d)"
100 loops, best of 3: 12.3 msec per loop

~/py35 $ ./python.exe -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" 
"list(d)"
10 loops, best of 3: 54.7 msec per loop

Repeating the timings, I get consistent results: 12.0 vs 46.7 and 12.0 vs 52.2 
and 11.5 vs 44.8. 


Raymond



P.S. Timings are from fresh builds on Mac OS X 10.11.6 running on a 2.6 Ghz 
Haswell i7 with 16Gb of 1600 Mhz ram:  $ ./configure CC=gcc-6 && make



_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to