> On Sep 14, 2016, at 11:31 PM, Serhiy Storchaka <storch...@gmail.com> wrote:
> Note that this is made at the expense of the 20% slowing down an iteration.
> $ ./python -m timeit -s "d = dict.fromkeys(range(10**6))" -- "list(d)"
> Python 3.5: 66.1 msec per loop
> Python 3.6: 82.5 msec per loop

A range of consecutive integers which have consecutive hash values is a really 
weak and non-representative basis for comparison.
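To see why: CPython (as an implementation detail) hashes small non-negative integers to themselves, so `range(10**6)` fills the table with perfectly consecutive hash values. String keys, which are far more typical of real workloads, hash to scattered randomized values. A quick check:

```python
# CPython implementation detail: small non-negative ints hash to themselves,
# so dict.fromkeys(range(10**6)) sees perfectly consecutive hash values.
print(all(hash(i) == i for i in range(1000)))      # True on CPython

# String keys hash to scattered, randomized values (PYTHONHASHSEED),
# which is far more representative of real dictionaries:
print([hash(s) % 8 for s in map(str, range(8))])   # slots in no particular order
```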

Something like this will reveal the true and massive improvement in iteration:

     $ ./python.exe -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"

There are two reasons for the significant improvement in iteration speed:

1) The dense key table is smaller (no intervening NULL entries) so we do fewer 
total memory fetches to loop over the keys, values, or items.

2) The loop over the dense table no longer makes frequent, unpredictable tests 
for NULL entries.  (To better understand why this matters and how major the 
impact is, see http://stackoverflow.com/questions/11227809 ).
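Both effects fall out of the layout change. Here is a simplified pure-Python model of the compact design (a sketch for illustration only, not CPython's actual C code): a small sparse index table maps hash slots to positions in a dense entries list, so iteration just walks the dense list with no empty-slot checks.

```python
# Simplified model of the compact dict layout (illustrative, not CPython's code).
class CompactDict:
    EMPTY = -1

    def __init__(self, size=8):
        self.indices = [self.EMPTY] * size   # small, sparse index table
        self.entries = []                    # dense list of (hash, key, value)

    def _probe(self, key):
        # Linear probing for simplicity; CPython uses a perturbed probe sequence.
        i = hash(key) % len(self.indices)
        while True:
            idx = self.indices[i]
            if idx == self.EMPTY or self.entries[idx][1] == key:
                return i
            i = (i + 1) % len(self.indices)

    def __setitem__(self, key, value):
        i = self._probe(key)
        if self.indices[i] == self.EMPTY:
            self.indices[i] = len(self.entries)      # point at the new entry
            self.entries.append((hash(key), key, value))
        else:
            self.entries[self.indices[i]] = (hash(key), key, value)

    def __iter__(self):
        # Iteration walks the dense entries list: no NULL tests, no gaps.
        for _h, key, _v in self.entries:
            yield key

d = CompactDict()
for k in "abc":
    d[k] = ord(k)
print(list(d))   # ['a', 'b', 'c'] -- insertion order, straight from the dense list
```

The old design instead iterated the full sparse slot table, testing each slot for NULL; those tests are data-dependent branches the CPU cannot predict well.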

Your mileage will vary depending on the size of dictionary and whether the old 
dictionary would have densely packed the keys (as in Serhiy's 
non-representative example).


P.S.  Algorithmically, the compact dict seems to be mostly where it needs to be 
(modulo some implementation bugs that are being ironed out).  However, the code 
hasn't been tuned and polished as much as the old implementation, so there is 
still room for its timings to improve.  Dict copies should end up being faster 
(fewer bytes copied and a predictable test for NULLs).  Resizes should be much 
faster (only the small index table needs to be updated, while the 
keys/values/hashes don't get moved).  In complex apps, the memory savings ought 
to translate into better cache performance (that doesn't show up much in tight 
benchmark loops but tends to make a difference in real code).
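In that simplified model, the cheap resize is easy to see: only the small index table is rebuilt from the stored hashes, while the dense entries array is never touched (again, a sketch with a hypothetical helper, not CPython's code):

```python
def resized_indices(entries, new_size):
    """Rebuild only the index table from stored hashes (hypothetical helper).

    The dense entries list -- keys, values, and hashes -- is never moved;
    linear probing is used here for simplicity.
    """
    EMPTY = -1
    indices = [EMPTY] * new_size
    for n, (h, _key, _value) in enumerate(entries):
        i = h % new_size
        while indices[i] != EMPTY:        # probe to a free slot
            i = (i + 1) % new_size
        indices[i] = n                    # only a small int is written
    return indices

entries = [(hash(k), k, ord(k)) for k in "abc"]
indices = resized_indices(entries, 16)
# Only 16 small index slots were written; no entry was rehashed or copied.
```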
Python-Dev mailing list