[issue26280] ceval: Optimize list[int] (subscript) operation similarly to CPython 2.7
Zach Byrne added the comment: The new patch "subscr2" removes the tuple block, and addresses Victor's comments. This one looks a little faster, down to 0.0215 usec for the same test. -- Added file: http://bugs.python.org/file42049/subscr2.patch ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue26280> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue26280] ceval: Optimize list[int] (subscript) operation similarly to CPython 2.7
Zach Byrne added the comment: Is it worth handling the exception, or should we just let it take the slow path and have it caught by PyObject_GetItem()? We're still making sure the index is in bounds. Also, where would be an appropriate place to put a macro for adjusting negative indices?

--
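For reference, the adjustment such a macro would perform is tiny; in Python terms it is just this (a sketch of the semantics, not the C macro itself):

```python
def adjust_index(i, length):
    # A negative index counts from the end of the sequence, so the
    # macro's whole job is one conditional add before the bounds check.
    if i < 0:
        i += length
    return i
```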
[issue26280] ceval: Optimize list[int] (subscript) operation similarly to CPython 2.7
Zach Byrne added the comment: Here's a patch that looks like Victor's from the duplicate, but with tuples covered as well. I ran some straightforward micro-benchmarks but haven't bothered running the benchmark suite yet. Unsurprisingly, the optimized paths are faster, and the others take a penalty.

byrnez@byrnez-laptop:~/git/python$ ./python.orig -m timeit -s "l = [1,2,3,4,5,6]" "l[3]"
1000 loops, best of 3: 0.0306 usec per loop
byrnez@byrnez-laptop:~/git/python$ ./python -m timeit -s "l = [1,2,3,4,5,6]" "l[3]"
1000 loops, best of 3: 0.0243 usec per loop
byrnez@byrnez-laptop:~/git/python$ ./python.orig -m timeit -s "l = (1,2,3,4,5,6)" "l[3]"
1000 loops, best of 3: 0.0291 usec per loop
byrnez@byrnez-laptop:~/git/python$ ./python -m timeit -s "l = (1,2,3,4,5,6)" "l[3]"
1000 loops, best of 3: 0.0241 usec per loop
byrnez@byrnez-laptop:~/git/python$ ./python.orig -m timeit -s "l = 'asdfasdf'" "l[3]"
1000 loops, best of 3: 0.034 usec per loop
byrnez@byrnez-laptop:~/git/python$ ./python -m timeit -s "l = 'asdfasdf'" "l[3]"
1000 loops, best of 3: 0.0366 usec per loop
byrnez@byrnez-laptop:~/git/python$ ./python.orig -m timeit -s "l = [1,2,3,4,5,6]" "l[:3]"
1000 loops, best of 3: 0.124 usec per loop
byrnez@byrnez-laptop:~/git/python$ ./python -m timeit -s "l = [1,2,3,4,5,6]" "l[:3]"
1000 loops, best of 3: 0.125 usec per loop

--
Added file: http://bugs.python.org/file41939/subscr1.patch
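For readers following along, the shape of the fast path being discussed is roughly the following. This is a Python sketch of the C logic, not the patch itself: the real code uses PyList_CheckExact/PyTuple_CheckExact and direct item access, and the function name here is illustrative.

```python
def fast_subscr(container, index):
    # Fast path: exact list/tuple with an int index that is in bounds
    # after negative-index adjustment, mirroring the direct-access
    # branch added to BINARY_SUBSCR in ceval.c.
    if type(container) in (list, tuple) and type(index) is int:
        i = index + len(container) if index < 0 else index
        if 0 <= i < len(container):
            return container[i]  # stands in for PyList_GET_ITEM
    # Slow path: fall back to the generic lookup (PyObject_GetItem),
    # which is why str and other types take a small penalty.
    return container[index]
```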
[issue26280] ceval: Optimize [] operation similarly to CPython 2.7
Zach Byrne added the comment: I'm attaching output from a selection of the benchmarks. I'm counting non-builtins and slices, but for everything, not just lists and tuples. Quick observation: math workloads seem list-heavy, text workloads seem dict-heavy, and tuples are usually somewhere in the middle.

--
Added file: http://bugs.python.org/file41826/subscr_stats.txt
[issue26280] ceval: Optimize [] operation similarly to CPython 2.7
Zach Byrne added the comment: Ok, I've started on the instrumenting. Thanks for that head start; it would have taken me a while to figure out where to call the stats dump function from. Fun fact: BINARY_SUBSCR is called 717 times just starting python.

--
[issue26280] ceval: Optimize [] operation similarly to CPython 2.7
Zach Byrne added the comment: I'll put together something comprehensive in a bit, but here's a quick preview:

$ ./python
Python 3.6.0a0 (default, Feb 4 2016, 20:08:03) [GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
Total BINARY_SUBSCR calls: 726
List BINARY_SUBSCR calls: 36
Tuple BINARY_SUBSCR calls: 103
Dict BINARY_SUBSCR calls: 227
Unicode BINARY_SUBSCR calls: 288
Bytes BINARY_SUBSCR calls: 68
[-1] BINARY_SUBSCR calls: 0

$ python bm_elementtree.py -n 100 --timer perf_counter
...[snip]...
Total BINARY_SUBSCR calls: 1078533
List BINARY_SUBSCR calls: 513
Tuple BINARY_SUBSCR calls: 1322
Dict BINARY_SUBSCR calls: 1063075
Unicode BINARY_SUBSCR calls: 13150
Bytes BINARY_SUBSCR calls: 248
[-1] BINARY_SUBSCR calls: 0

Lib/test$ ../../python -m unittest discover
...[snip]...^C  <== I got bored waiting
KeyboardInterrupt
Total BINARY_SUBSCR calls: 4732885
List BINARY_SUBSCR calls: 1418730
Tuple BINARY_SUBSCR calls: 1300717
Dict BINARY_SUBSCR calls: 1151766
Unicode BINARY_SUBSCR calls: 409924
Bytes BINARY_SUBSCR calls: 363029
[-1] BINARY_SUBSCR calls: 26623

So dict seems to be the winner here.

--
keywords: +patch
Added file: http://bugs.python.org/file41814/26280_stats.diff
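The instrumentation itself amounts to per-type counters bumped inside BINARY_SUBSCR. A Python-level sketch of the same bookkeeping (names are hypothetical; the real patch increments C counters in ceval.c):

```python
from collections import Counter

subscr_stats = Counter()

def counted_subscr(container, index):
    # Tally the container type the way the stats patch tallies
    # list/tuple/dict/str/bytes operands, plus the special [-1] case.
    subscr_stats[type(container).__name__] += 1
    if index == -1:
        subscr_stats['[-1]'] += 1
    return container[index]
```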
[issue26280] ceval: Optimize [] operation similarly to CPython 2.7
Zach Byrne added the comment: One thing I forgot to do was count slices.

--
[issue26280] ceval: Optimize [] operation similarly to CPython 2.7
Zach Byrne added the comment: Yury, are you going to tackle this one, or would you like me to?

--
[issue21955] ceval.c: implement fast path for integers with a single digit
Zach Byrne added the comment:

> Could you please take a look at the updated patch?

Looks ok to me, for whatever that's worth.

--
[issue21955] ceval.c: implement fast path for integers with a single digit
Zach Byrne added the comment:

> I don't understand what this table means (why 4 columns?). Can you explain what you did?

Yury suggested running perf.py twice with the binaries swapped. So "faster" and "slower" underneath "Baseline Reference" are runs where the unmodified python binary was the first argument to perf, and "Modified Reference" is where the patched binary is the first argument, i.e. "perf.py -r -b all python patched_python" vs "perf.py -r -b all patched_python python". bench_results.txt has the actual output in it, and the "slower in the right column" comment was referring to the contents of that file, not the table. Sorry for the confusion.

--
[issue21955] ceval.c: implement fast path for integers with a single digit
Zach Byrne added the comment: I ran 6 benchmarks on my work machine (not the same one as the last set) overnight: two with just the BINARY_ADD change, two with just the BINARY_SUBSCR change, and two with both. I'm attaching the output from all my benchmark runs, but here are the highlights. In this table I've flipped the results for running the modified build as the reference, but in the new attachment, slower in the right column means faster, I think :)

BINARY_ADD build:
  Faster (baseline reference): chameleon_v2, chaos, django, etree_generate, fannkuch, formatted_logging, go, json_load, regex_compile, simple_logging, spectral_norm
  Slower (baseline reference): etree_parse, nbody, normal_startup, pickle_dict, pickle_list, regex_effbot
  Faster (modified reference): chameleon_v2, fannkuch, normal_startup, nqueens, regex_compile, spectral_norm, unpickle_list
  Slower (modified reference): call_simple, nbody, pickle_dict, regex_v8

BINARY_SUBSCR build:
  Faster (baseline reference): chameleon_v2, chaos, etree_generate, fannkuch, fastpickle, hexiom2, json_load, mako_v2, meteor_contest, nbody, regex_v8, spectral_norm
  Slower (baseline reference): call_simple, go, pickle_list, telco
  Faster (modified reference): 2to3, call_method_slots, chaos, fannkuch, formatted_logging, go, hexiom2, mako_v2, meteor_contest, nbody, normal_startup, nqueens, pickle_list, simple_logging, spectral_norm, telco
  Slower (modified reference): etree_parse, json_dump_v2, pickle_dict

BOTH build:
  Faster (baseline reference): chameleon_v2, chaos, etree_generate, etree_process, fannkuch, fastunpickle, float, formatted_logging, hexiom2, nbody, nqueens, regex_v8, spectral_norm
  Slower (baseline reference): call_simple, etree_parse, pathlib, pickle_list
  Faster (modified reference): chameleon_v2, chaos, etree_generate, etree_process, fannkuch, float, formatted_logging, go, hexiom2, nbody, normal_startup, nqueens
  Slower (modified reference): fastpickle, pickle_dict, pickle_list, telco
[issue21955] ceval.c: implement fast path for integers with a single digit
Zach Byrne added the comment: I took another look at this, and tried applying it to 3.6 and running the latest benchmarks. It applied cleanly, and the benchmark results were similar; this time unpack_sequence and spectral_norm were slower. spectral_norm makes sense: it's doing lots of FP addition. The UNPACK_SEQUENCE instruction already has optimizations for unpacking lists and tuples onto the stack, and running dis on the test showed that it's completely dominated by calls to UNPACK_SEQUENCE, LOAD_FAST, and STORE_FAST, so I still don't know what's going on there.

--
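The dominance of those opcodes is easy to confirm with the dis module; for example, disassembling a multiple-assignment statement of the kind the unpack_sequence benchmark runs (a small sketch, not the benchmark itself):

```python
import dis

# Compile a statement like the ones in the benchmark's inner loop and
# list the opcodes it produces: a load of the sequence, one
# UNPACK_SEQUENCE, then one store per target name.
code = compile('a, b, c = seq', '<bench>', 'exec')
ops = [ins.opname for ins in dis.get_instructions(code)]
```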
[issue21955] ceval.c: implement fast path for integers with a single digit
Zach Byrne added the comment: Anybody still looking at this? I can take another stab at it if it's still in scope.

--
[issue21955] ceval.c: implement fast path for integers with a single digit
Zach Byrne added the comment:

> Can you figure why unpack_sequence and other benchmarks were slower?

I didn't look really closely. A few of the slower ones were floating-point heavy, which would incur the slow-path penalty, but I can dig into unpack_sequence.

--
[issue21955] ceval.c: implement fast path for integers with a single digit
Zach Byrne added the comment: I haven't looked at it since I posted the benchmark results for 21955_2.patch.

--
[issue21955] ceval.c: implement fast path for integers with a single digit
Zach Byrne added the comment: I ran the whole benchmark suite. There are a few that are slower: call_method_slots, float, pickle_dict, and unpack_sequence.

Report on Linux zach-vbox 3.2.0-24-generic-pae #39-Ubuntu SMP Mon May 21 18:54:21 UTC 2012 i686 i686
Total CPU cores: 1

### 2to3 ###
24.789549 -> 24.809551: 1.00x slower

### call_method_slots ###
Min: 1.743554 -> 1.780807: 1.02x slower
Avg: 1.751735 -> 1.792814: 1.02x slower
Significant (t=-26.32)
Stddev: 0.00576 -> 0.01823: 3.1660x larger

### call_method_unknown ###
Min: 1.828094 -> 1.739625: 1.05x faster
Avg: 1.852225 -> 1.806721: 1.03x faster
Significant (t=2.28)
Stddev: 0.01874 -> 0.24320: 12.9783x larger

### call_simple ###
Min: 1.353581 -> 1.263386: 1.07x faster
Avg: 1.397946 -> 1.302046: 1.07x faster
Significant (t=24.28)
Stddev: 0.03667 -> 0.03154: 1.1629x smaller

### chaos ###
Min: 1.199377 -> 1.115550: 1.08x faster
Avg: 1.230859 -> 1.146573: 1.07x faster
Significant (t=16.24)
Stddev: 0.02663 -> 0.02525: 1.0544x smaller

### django_v2 ###
Min: 2.682884 -> 2.633110: 1.02x faster
Avg: 2.747521 -> 2.690486: 1.02x faster
Significant (t=9.90)
Stddev: 0.02744 -> 0.03010: 1.0970x larger

### fastpickle ###
Min: 1.751475 -> 1.597340: 1.10x faster
Avg: 1.771805 -> 1.613533: 1.10x faster
Significant (t=64.81)
Stddev: 0.01177 -> 0.01263: 1.0727x larger

### float ###
Min: 1.254858 -> 1.293067: 1.03x slower
Avg: 1.336045 -> 1.365787: 1.02x slower
Significant (t=-3.30)
Stddev: 0.04851 -> 0.04135: 1.1730x smaller

### json_dump_v2 ###
Min: 17.871819 -> 16.968647: 1.05x faster
Avg: 18.428747 -> 17.483397: 1.05x faster
Significant (t=4.10)
Stddev: 1.60617 -> 0.27655: 5.8078x smaller

### mako ###
Min: 0.241614 -> 0.231678: 1.04x faster
Avg: 0.253730 -> 0.240585: 1.05x faster
Significant (t=8.93)
Stddev: 0.01912 -> 0.01327: 1.4417x smaller

### mako_v2 ###
Min: 0.225664 -> 0.213179: 1.06x faster
Avg: 0.234850 -> 0.225984: 1.04x faster
Significant (t=10.12)
Stddev: 0.01379 -> 0.01391: 1.0090x larger

### meteor_contest ###
Min: 0.777612 -> 0.758924: 1.02x faster
Avg: 0.799580 -> 0.780897: 1.02x faster
Significant (t=3.97)
Stddev: 0.02482 -> 0.02212: 1.1221x smaller

### nbody ###
Min: 0.969724 -> 0.883935: 1.10x faster
Avg: 0.996416 -> 0.918375: 1.08x faster
Significant (t=12.65)
Stddev: 0.02426 -> 0.03627: 1.4951x larger

### nqueens ###
Min: 1.142745 -> 1.128195: 1.01x faster
Avg: 1.296659 -> 1.162443: 1.12x faster
Significant (t=2.75)
Stddev: 0.34462 -> 0.02680: 12.8578x smaller

### pickle_dict ###
Min: 1.433264 -> 1.467394: 1.02x slower
Avg: 1.468122 -> 1.506908: 1.03x slower
Significant (t=-7.20)
Stddev: 0.02695 -> 0.02691: 1.0013x smaller

### raytrace ###
Min: 5.454853 -> 5.538799: 1.02x slower
Avg: 5.530943 -> 5.676983: 1.03x slower
Significant (t=-8.64)
Stddev: 0.05152 -> 0.10791: 2.0947x larger

### regex_effbot ###
Min: 0.205875 -> 0.194776: 1.06x faster
Avg: 0.28 -> 0.198759: 1.06x faster
Significant (t=5.10)
Stddev: 0.01305 -> 0.01112: 1.1736x smaller

### regex_v8 ###
Min: 0.141628 -> 0.133819: 1.06x faster
Avg: 0.147024 -> 0.140053: 1.05x faster
Significant (t=2.72)
Stddev: 0.01163 -> 0.01388: 1.1933x larger

### richards ###
Min: 0.734472 -> 0.727501: 1.01x faster
Avg: 0.760795 -> 0.743484: 1.02x faster
Significant (t=3.50)
Stddev: 0.02778 -> 0.02127: 1.3061x smaller

### silent_logging ###
Min: 0.344678 -> 0.336087: 1.03x faster
Avg: 0.357982 -> 0.347361: 1.03x faster
Significant (t=2.76)
Stddev: 0.01992 -> 0.01852: 1.0755x smaller

### simple_logging ###
Min: 1.104831 -> 1.072921: 1.03x faster
Avg: 1.146844 -> 1.117068: 1.03x faster
Significant (t=4.02)
Stddev: 0.03552 -> 0.03848: 1.0833x larger

### spectral_norm ###
Min: 1.710336 -> 1.688910: 1.01x faster
Avg: 1.872578 -> 1.738698: 1.08x faster
Significant (t=2.35)
Stddev: 0.40095 -> 0.03331: 12.0356x smaller

### tornado_http ###
Min: 0.849374 -> 0.852209: 1.00x slower
Avg: 0.955472 -> 0.916075: 1.04x faster
Significant (t=4.82)
Stddev: 0.07059 -> 0.04119: 1.7139x smaller

### unpack_sequence ###
Min: 0.30 -> 0.20: 1.52x faster
Avg: 0.000164 -> 0.000174: 1.06x slower
Significant (t=-13.11)
Stddev: 0.00011 -> 0.00013: 1.2256x larger

### unpickle_list ###
Min: 1.333952 -> 1.212805: 1.10x faster
Avg: 1.373228 -> 1.266677: 1.08x faster
Significant (t=16.32)
Stddev: 0.02894 -> 0.03597: 1.2428x larger

--
[issue21955] ceval.c: implement fast path for integers with a single digit
Zach Byrne added the comment: I did something similar to BINARY_SUBSCR after looking at the 2.7 source as Raymond suggested. Hopefully I got my binaries straight this time :) The new patch includes Victor's inlining and my new subscript changes.

Platform of campaign orig:
Python version: 3.5.0a0 (default:c8ce5bca0fcd+, Jul 15 2014, 18:11:28) [GCC 4.6.3]
Timer precision: 6 ns
Date: 2014-07-21 20:28:30

Platform of campaign patch:
Python version: 3.5.0a0 (default:c8ce5bca0fcd+, Jul 21 2014, 20:21:20) [GCC 4.6.3]
Timer precision: 20 ns
Date: 2014-07-21 20:28:39

Tests              | orig        | patch
-------------------+-------------+--------------
1+2                | 118 ns (*)  | 103 ns (-13%)
1+2 ran 100 times  | 7.28 us (*) | 5.93 us (-19%)
x[1]               | 120 ns (*)  | 98 ns (-19%)
x[1] ran 100 times | 7.35 us (*) | 5.31 us (-28%)
-------------------+-------------+--------------
Total              | 14.9 us (*) | 11.4 us (-23%)

--
Added file: http://bugs.python.org/file36021/21955_2.patch
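For context, "single digit" here means the int's magnitude fits in one digit of CPython's internal representation (30 bits on typical 64-bit builds, 15 on some 32-bit builds). The fast path being inlined is, in rough Python terms (an illustrative sketch, not the C code):

```python
PyLong_SHIFT = 30                    # bits per digit on typical builds
DIGIT_MAX = (1 << PyLong_SHIFT) - 1

def is_single_digit(v):
    # Mirrors the C-side check that the int uses at most one digit
    # (roughly Py_ABS(Py_SIZE(op)) <= 1).
    return type(v) is int and -DIGIT_MAX <= v <= DIGIT_MAX

def fast_add(left, right):
    if is_single_digit(left) and is_single_digit(right):
        # Two single-digit ints cannot overflow a C long, so the
        # inlined fast path adds them directly.
        return left + right
    return left + right              # slow path: generic PyNumber_Add
```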
[issue21955] ceval.c: implement fast path for integers with a single digit
Zach Byrne added the comment: Well, don't I feel silly. I confirmed both my regression and the inline speedup using the benchmark Victor added. I wonder if I got my binaries backwards in my first test...

--
[issue21955] ceval.c: implement fast path for integers with a single digit
Zach Byrne added the comment: So I'm trying something pretty similar to Victor's pseudo-code and just using timeit to look for speedups:

timeit('x+x', 'x=10', number=1000)

before:
1.193423141393
1.1988609210002323
1.1998214110003573
1.206968028999654
1.2065417159997196

after:
1.1698650090002047
1.170515890227
1.1752884750003432
1.174481861933
1.1741297110002051
1.176042264782

Small improvement. Haven't looked at optimizing BINARY_SUBSCR yet.

--
keywords: +patch
nosy: +zbyrne
Added file: http://bugs.python.org/file35961/21955.patch
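Eyeballing individual runs like this is noisy; timeit.repeat with min() gives a steadier comparison of the same statement (a sketch of the measurement, not the patch):

```python
import timeit

# Run the micro-benchmark several times and keep the best result; the
# minimum is the figure least disturbed by scheduling noise, which is
# why the timeit docs recommend it over the average.
times = timeit.repeat('x + x', setup='x = 10', number=1000, repeat=5)
best = min(times)
```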
[issue21323] CGI HTTP server not running scripts from subdirectories
Zach Byrne added the comment: Done and done.

--
[issue21323] CGI HTTP server not running scripts from subdirectories
Zach Byrne added the comment: Hi, I'm new. I wrote a test for nested directories under cgi-bin and got that to pass, without failing the test added for issue 19435, by undoing most of the changes to run_cgi() but building path from the values in self.cgi_info. Thoughts?

--
keywords: +patch
nosy: +zbyrne
Added file: http://bugs.python.org/file35908/21323.patch
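For anyone reading along: in http.server's CGIHTTPRequestHandler, self.cgi_info is a (directory, rest-of-path) pair produced by is_cgi(). Rebuilding the script path from it looks roughly like this (a hypothetical helper sketching the idea, not the patch itself):

```python
import posixpath

def script_path(cgi_info):
    # cgi_info is the (dir, rest) pair; joining the two halves restores
    # the full script path, including any nested directories under
    # cgi-bin, without the per-component rewriting run_cgi() was doing.
    directory, rest = cgi_info
    script = rest.partition('?')[0]   # drop a query string, if any
    return posixpath.join(directory, script)
```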