Inada Naoki <songofaca...@gmail.com> added the comment:
> I do not like how much code is needed for such minor optimization. PR 12307 is +46 lines. Benefit is about 10ns when first insertion. $ cpython/python.opt -m perf timeit --compare-to master/python.opt --python-names master:empty-dict2 --duplicate 100 'd={};d["a"]=1' master: ..................... 62.6 ns +- 1.1 ns empty-dict2: ..................... 52.5 ns +- 1.0 ns Mean +- std dev: [master] 62.6 ns +- 1.1 ns -> [empty-dict2] 52.5 ns +- 1.0 ns: 1.19x faster (-16%) While "1.19x faster" is significant, I agree that this optimization is "minor". _PyDict_NewPresized() is used for important cases. So this optimization doesn't work in most case. But I came up with an idea. _PyDict_NewPresized(n) skips preallocation when n fit in minsize table. https://github.com/python/cpython/pull/12307/commits/93906f1568b08c59e0afddfc98070827c65cd0f9 With this idea, "first insertion optimization" works. Additionally, it may help branch prediction for `while (newsize < minsize) { newsize <<= 1 }` loop. Total diff is still +46 lines. But it affects more cases. $ alias compare='cpython/python.opt -m perf timeit --compare-to master/python.opt --python-names master:empty-dict2' $ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4)' # No table Mean +- std dev: [master] 65.4 ns +- 2.3 ns -> [empty-dict2] 64.5 ns +- 1.5 ns: 1.01x faster (-1%) $ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4, a=1)' # minsize table is allocated Mean +- std dev: [master] 152 ns +- 3 ns -> [empty-dict2] 144 ns +- 4 ns: 1.05x faster (-5%) $ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4, a=1,b=2)' Mean +- std dev: [master] 211 ns +- 3 ns -> [empty-dict2] 186 ns +- 3 ns: 1.13x faster (-12%) $ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4, a=1,b=2,c=3)' Mean +- std dev: [master] 248 ns +- 3 ns -> [empty-dict2] 223 ns +- 3 ns: 1.11x faster (-10%) $ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4, a=1,b=2,c=3,d=4,e=5)' Mean +- std dev: [master] 327 ns +- 6 ns -> [empty-dict2] 301 ns +- 6 ns: 1.09x faster (-8%) $ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4, a=1,b=2,c=3,d=4,e=5,f=6)' # minsize*2 table is allocated Mean +- std dev: [master] 431 ns +- 8 ns -> [empty-dict2] 406 ns +- 8 ns: 1.06x faster (-6%) Now I think PR 12307 is not so minor. Of course, same idea can be applied to PR 12308. ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue30040> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com