> On Sep 14, 2016, at 11:31 PM, Serhiy Storchaka <storch...@gmail.com> wrote:
> Note that this is made at the expense of the 20% slowing down an iteration.
> $ ./python -m timeit -s "d = dict.fromkeys(range(10**6))" -- "list(d)"
> Python 3.5: 66.1 msec per loop
> Python 3.6: 82.5 msec per loop
A range of consecutive integers, whose hash values are also consecutive, is a
weak and non-representative basis for comparison.
Something like this will reveal the true and massive improvement in iteration
speed:
$ ./python.exe -m timeit -s "d = dict.fromkeys(map(str, range(10**6)))" -- "list(d)"
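The difference between the two benchmarks comes down to how the keys hash. In
CPython, small non-negative integers hash to themselves, so `range(n)` keys
fill consecutive hash slots; string hashes are randomized and scatter across
the table, which is closer to real workloads. A quick check:

```python
# Small non-negative ints hash to themselves in CPython, so range(n)
# keys occupy consecutive slots -- an artificially friendly layout for
# the old sparse table.
int_hashes = [hash(i) for i in range(5)]
print(int_hashes)  # [0, 1, 2, 3, 4]

# String hashes are randomized (see PYTHONHASHSEED), so the slots they
# occupy are scattered -- the more representative case.
str_hashes = [hash(str(i)) for i in range(5)]
print(str_hashes)
```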
There are two reasons for the significant improvement in iteration speed:
1) The dense key table is smaller (no intervening NULL entries) so we do fewer
total memory fetches to loop over the keys, values, or items.
2) The loop over the dense table no longer makes frequent, unpredictable tests
for NULL entries. (To better understand why this matters and how major the
impact is, see http://stackoverflow.com/questions/11227809 ).
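Both effects can be seen in a toy model of the two layouts (the names here are
illustrative only, not CPython's actual internal structures):

```python
# Toy model of the two dict layouts (illustrative, not CPython's real
# internals).

KEYS = ("a", "b", "c")

# Old (pre-3.6) layout: one sparse table, mostly NULL (None here).
sparse_table = [None] * 8
for key in KEYS:
    slot = hash(key) % 8
    while sparse_table[slot] is not None:   # linear probe on collision
        slot = (slot + 1) % 8
    sparse_table[slot] = key

# Iterating must test every slot for NULL -- an unpredictable branch.
old_iter = [k for k in sparse_table if k is not None]

# New (3.6) layout: a small sparse index table plus a dense entry array.
entries = []                                # dense, insertion-ordered
indices = [None] * 8                        # sparse, holds small ints
for key in KEYS:
    slot = hash(key) % 8
    while indices[slot] is not None:        # linear probe on collision
        slot = (slot + 1) % 8
    indices[slot] = len(entries)
    entries.append(key)

# Iterating just walks the dense array: fewer fetches, no NULL tests.
new_iter = list(entries)

assert sorted(old_iter) == sorted(new_iter) == ["a", "b", "c"]
assert new_iter == ["a", "b", "c"]          # insertion order preserved
```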
Your mileage will vary depending on the size of the dictionary and whether the
old dictionary would have densely packed the keys (as in Serhiy's benchmark
with consecutive integer keys).
P.S. Algorithmically, the compact dict seems to be mostly where it needs to be
(modulo some implementation bugs that are being ironed out). However, the code
hasn't been tuned and polished as much as the old implementation, so there is
still room for its timings to improve. Dict copies should end up being faster
(fewer bytes copied and a predictable test for NULLs). Resizes should be much
faster (only the small index table needs to be updated, while the
keys/values/hashes don't get moved). In complex apps, the memory savings ought
to translate into better cache performance (that doesn't show up much in tight
benchmark loops but tends to make a difference in real code).
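The resize claim can be sketched with the same toy model (names are
illustrative, not CPython's real structures): on resize, only the small index
table is rebuilt at the larger size; the dense array of entries never moves.

```python
# Toy sketch of a compact-dict resize (illustrative names only).
entries = ["a", "b", "c"]            # dense array: untouched by resize

def build_indices(entries, size):
    """Rebuild the sparse index table for a given table size."""
    indices = [None] * size
    for pos, key in enumerate(entries):
        slot = hash(key) % size
        while indices[slot] is not None:     # linear probe on collision
            slot = (slot + 1) % size
        indices[slot] = pos                  # store position, not the entry
    return indices

small = build_indices(entries, 8)
large = build_indices(entries, 16)   # "resize": only indices rebuilt

# The keys/values/hashes kept their positions and insertion order.
assert entries == ["a", "b", "c"]
assert sorted(i for i in large if i is not None) == [0, 1, 2]
```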
Python-Dev mailing list