Inada Naoki <songofaca...@gmail.com> added the comment:

> I do not like how much code is needed for such minor optimization.

PR 12307 is +46 lines.  Benefit is about 10ns when first insertion.

$ cpython/python.opt -m perf timeit --compare-to master/python.opt 
--python-names master:empty-dict2 --duplicate 100 'd={};d["a"]=1'
master: ..................... 62.6 ns +- 1.1 ns
empty-dict2: ..................... 52.5 ns +- 1.0 ns

Mean +- std dev: [master] 62.6 ns +- 1.1 ns -> [empty-dict2] 52.5 ns +- 1.0 ns: 
1.19x faster (-16%)


While "1.19x faster" is significant, I agree that this optimization is "minor".
_PyDict_NewPresized() is used for important cases.  So this optimization 
doesn't work in most case.


But I came up with an idea.  _PyDict_NewPresized(n) skips preallocation when n 
fit in minsize table.
https://github.com/python/cpython/pull/12307/commits/93906f1568b08c59e0afddfc98070827c65cd0f9

With this idea, "first insertion optimization" works.  Additionally, it may 
help branch prediction
for `while (newsize < minsize) { newsize <<= 1 }` loop.

Total diff is still +46 lines.  But it affects more cases.


$ alias compare='cpython/python.opt -m perf timeit --compare-to 
master/python.opt --python-names master:empty-dict2'

$ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4)'  # No table
Mean +- std dev: [master] 65.4 ns +- 2.3 ns -> [empty-dict2] 64.5 ns +- 1.5 ns: 
1.01x faster (-1%)

$ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4, a=1)'  # minsize 
table is allocated
Mean +- std dev: [master] 152 ns +- 3 ns -> [empty-dict2] 144 ns +- 4 ns: 1.05x 
faster (-5%)

$ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4, a=1,b=2)'
Mean +- std dev: [master] 211 ns +- 3 ns -> [empty-dict2] 186 ns +- 3 ns: 1.13x 
faster (-12%)

$ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4, a=1,b=2,c=3)'
Mean +- std dev: [master] 248 ns +- 3 ns -> [empty-dict2] 223 ns +- 3 ns: 1.11x 
faster (-10%)

$ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4, 
a=1,b=2,c=3,d=4,e=5)'
Mean +- std dev: [master] 327 ns +- 6 ns -> [empty-dict2] 301 ns +- 6 ns: 1.09x 
faster (-8%)

$ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4, 
a=1,b=2,c=3,d=4,e=5,f=6)'  # minsize*2 table is allocated
Mean +- std dev: [master] 431 ns +- 8 ns -> [empty-dict2] 406 ns +- 8 ns: 1.06x 
faster (-6%)


Now I think PR 12307 is not so minor.  Of course, same idea can be applied to 
PR 12308.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue30040>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to