[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-09-01 Thread STINNER Victor
STINNER Victor added the comment: I split the giant patch into smaller patches that are easier to review. The first part (_PyObject_FastCall, _PyObject_FastCallDict) is already merged. Other issues were opened to implement the full feature. I am now closing this issue. -- resolution: -> fixed

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-05-25 Thread STINNER Victor
STINNER Victor added the comment: I fixed even more issues with my benchmark setup. Results should be even more reliable. Moreover, I fixed multiple reference leaks in the code which introduced performance regressions. I started to write articles to explain how to run stable

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-05-20 Thread STINNER Victor
STINNER Victor added the comment: > unpickle_list: 1.11x faster This result was unfair: my fastcall branch contained the optimization from issue #27056. I just pushed this optimization into the default branch. I ran the benchmark again: the result is now "not significant", as expected,

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-05-19 Thread STINNER Victor
STINNER Victor added the comment: > In short, I replayed exactly the same scenario. And... Only raytrace remains slower, (...) Oh, it looks like the reference binary calls the garbage collector less frequently than the patched Python. In the patched Python, collections of the generation 2
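One way to check how often the garbage collector runs is to read the per-generation counters exposed by the `gc` module before and after a workload. The following is only a minimal sketch of that idea, not the method used in the comment above: the helper name `collections_per_generation` and the allocation workload are assumptions made for illustration.

```python
import gc


def collections_per_generation():
    """Return the number of collections run so far for each GC generation."""
    return [stats["collections"] for stats in gc.get_stats()]


before = collections_per_generation()
# Hypothetical workload: allocate many container objects so the GC may run.
data = [[i] for i in range(200_000)]
after = collections_per_generation()

# Difference per generation; the values depend on the interpreter and GC settings.
delta = [b - a for a, b in zip(before, after)]
print("collections per generation during workload:", delta)
```

Running the same script under a reference build and a patched build, and comparing the printed counters, would show whether one of them triggers collections more often.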

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-05-19 Thread STINNER Victor
STINNER Victor added the comment:
> Result of the benchmark suite:
>
> slower (3):
>
> * raytrace: 1.06x slower
> * etree_parse: 1.03x slower
> * normal_startup: 1.02x slower
Hum, I recompiled the patched Python, again with PGO+LTO, and ran the same benchmark with the same command. In short, I

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-05-19 Thread STINNER Victor
STINNER Victor added the comment: Status of my FASTCALL implementation (34456cce64bb.patch):
* Add METH_FASTCALL calling convention to C functions, similar to METH_VARARGS|METH_KEYWORDS
* Clinic uses METH_FASTCALL when possible (it may use METH_FASTCALL for all cases in the future)
*

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-05-19 Thread STINNER Victor
STINNER Victor added the comment: New patch: 34456cce64bb.patch
$ diffstat 34456cce64bb.patch
 .hgignore | 3
 Makefile.pre.in | 37
 b/Doc/includes/shoddy.c | 2
 b/Include/Python.h

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-05-19 Thread STINNER Victor
Changes by STINNER Victor: Removed file: http://bugs.python.org/file42898/34456cce64bb.diff

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-05-19 Thread STINNER Victor
Changes by STINNER Victor: Added file: http://bugs.python.org/file42898/34456cce64bb.diff

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-05-19 Thread STINNER Victor
STINNER Victor added the comment: Hi, I made progress on my FASTCALL branch. I removed tp_fastnew, tp_fastinit and tp_fastnew fields from PyTypeObject to replace them with new type flags (e.g. Py_TPFLAGS_FASTNEW) to avoid code duplication and reduce the memory footprint. Before, each function

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-05-09 Thread Jakub Stasiak
Changes by Jakub Stasiak: -- nosy: +jstasiak

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-04-29 Thread STINNER Victor
STINNER Victor added the comment: > Results look like noise. As I wrote, it's really hard to get a reliable benchmark result. I did my best. See also discussions about the CPython benchmark suite on the speed list: https://mail.python.org/pipermail/speed/ I'm not sure that you will get less

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-04-29 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Results look like noise. Some tests become slower, others become faster. If results on a different machine show the same sets of tests slowing down and speeding up, this is likely not noise.

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-04-29 Thread STINNER Victor
STINNER Victor added the comment: > Could you repeat the benchmarks on a different computer? Preferably with a different CPU or compiler. Sorry, I don't really have the bandwidth to repeat the benchmarks. PGO+LTO compilation is slow and running the benchmark suite in rigorous mode is very slow. What

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-04-29 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Could you repeat the benchmarks on a different computer? Preferably with a different CPU or compiler.

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-04-29 Thread STINNER Victor
STINNER Victor added the comment: > Results of the CPython benchmark suite. Reference = default branch at rev 496e094f4734, patched: fastcall fork at rev 2b4b7def2949. Oh, I forgot to mention that I modified perf.py to run each benchmark using 10 fresh processes to test multiple random
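The idea of running each benchmark in several fresh processes can be sketched roughly as follows. This is a minimal illustration, not the actual perf.py modification: the helper name `run_in_fresh_processes`, the timed statement, and the use of `timeit` inside each child process are all assumptions made for the example.

```python
import statistics
import subprocess
import sys


# Hypothetical helper (not part of perf.py): time one statement in several
# fresh interpreter processes, so each measurement starts from a new process.
def run_in_fresh_processes(stmt, processes=10, loops=1000):
    code = (
        "import timeit; "
        f"print(timeit.timeit({stmt!r}, number={loops}))"
    )
    timings = []
    for _ in range(processes):
        out = subprocess.run(
            [sys.executable, "-c", code],
            check=True, capture_output=True, text=True,
        ).stdout
        timings.append(float(out))
    return timings


# Use a small process count here to keep the example quick.
timings = run_in_fresh_processes("sorted([3, 1, 2])", processes=3, loops=1000)
print("median:", statistics.median(timings), "min:", min(timings))
```

Collecting one timing per process makes it possible to look at the spread between processes instead of relying on a single run.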

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-04-29 Thread STINNER Victor
STINNER Victor added the comment: Results of the CPython benchmark suite. Reference = default branch at rev 496e094f4734, patched: fastcall fork at rev 2b4b7def2949. I ran into many issues getting reliable benchmark output: * https://mail.python.org/pipermail/speed/2016-April/000329.html *

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-04-24 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I think you can simplify the patch by dropping keyword argument support from fastcall. Then you can decrease _PyStack_SIZE to 4 (a larger size would serve only 1.7% of calls), and maybe refactor the code since an array of 4 pointers consumes less C stack than

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-04-24 Thread STINNER Victor
STINNER Victor added the comment: > Thus I think we need to optimize only cases of calling with a small number (0-3) of positional arguments. My code is optimized for up to 10 positional arguments: with 0..10 arguments, the C stack is used to hold the array of PyObject*. For more arguments, an

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-04-24 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I have collected statistics about the use of CALL_FUNCTION* opcodes in compiled code while running the CPython test suite. According to them, 99.4% of emitted opcodes are CALL_FUNCTION, and 89% of emitted CALL_FUNCTION opcodes have only positional arguments,
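Statistics of this kind can be approximated by walking compiled code objects with the `dis` module. The sketch below is not the script used for the numbers above; the function name `count_call_opcodes` and the sample source are hypothetical. It counts opcodes statically in one compiled snippet, and opcode names differ between CPython versions (for example CALL_FUNCTION in older releases, CALL in newer ones), so it matches on the common prefix.

```python
import dis
from collections import Counter


def count_call_opcodes(code):
    """Count opcodes whose name starts with CALL in a code object and its nested code objects."""
    counts = Counter()
    for instr in dis.get_instructions(code):
        # Match on the prefix because the exact opcode names depend on
        # the CPython version (CALL_FUNCTION, CALL_FUNCTION_KW, CALL, ...).
        if instr.opname.startswith("CALL"):
            counts[instr.opname] += 1
    # Recurse into nested code objects (function bodies, comprehensions, ...).
    for const in code.co_consts:
        if hasattr(const, "co_code"):
            counts.update(count_call_opcodes(const))
    return counts


# Hypothetical sample: a function body containing three calls.
source = "def f(a, b):\n    return max(a, b) + len(str(a))\n"
counts = count_call_opcodes(compile(source, "<example>", "exec"))
print(counts)
```

Applying the same function to every module compiled during a test run would give an aggregate distribution of call opcodes.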

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-04-22 Thread STINNER Victor
STINNER Victor added the comment: Results of the CPython benchmark suite on revision 6c376e866330 of https://hg.python.org/sandbox/fastcall compared to CPython 3.6 at revision 496e094f4734. It's surprising that call_simple is 1.08x slower in fastcall. This slowdown is not acceptable

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-04-22 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Could you compare filter(), map() and sorted() performance with your patch and with the issue23507 patch?
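A comparison like the one requested could be run with a small in-process micro-benchmark executed under each interpreter build. This is only a sketch under stated assumptions: the statements, the data size, and the repeat counts are illustrative choices, not the benchmarks used in this issue.

```python
import timeit

# Hypothetical micro-benchmarks: each statement makes a built-in call a
# Python callback many times, which is the kind of call path discussed here.
SETUP = "data = list(range(1000))"
STATEMENTS = {
    "filter": "list(filter(lambda x: x % 2, data))",
    "map": "list(map(lambda x: x + 1, data))",
    "sorted": "sorted(data, key=lambda x: -x)",
}

results = {}
for name, stmt in STATEMENTS.items():
    # Keep the minimum of several repeats to reduce the effect of noise.
    results[name] = min(timeit.repeat(stmt, setup=SETUP, repeat=5, number=200))

for name, seconds in results.items():
    print(f"{name}: {seconds:.6f} s for 200 loops")
```

Running the script once per build and comparing the printed timings side by side gives a rough per-function comparison.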

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-04-22 Thread STINNER Victor
STINNER Victor added the comment: For more fun, a comparison between Python 2.7 / 3.4 / 3.6 / 3.6 FASTCALL. Table columns: Tests | py27 | py34 | py36 |

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-04-22 Thread STINNER Victor
STINNER Victor added the comment: Some microbenchmarks: bench_fast.py. == Python 3.6 / Python 3.6 FASTCALL == Table columns: Tests | /tmp/default | /tmp/fastcall

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-04-22 Thread STINNER Victor
STINNER Victor added the comment: Related issue: issue #23507, "Tuple creation is too slow".

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-04-22 Thread STINNER Victor
STINNER Victor added the comment: Changes in my current implementation, ad4a53ed1fbf.diff. The good thing is that all changes are internal (really?). Even if you don't modify your C extensions (nor your Python code), you should benefit from the new fast call in *a lot* of cases. IMHO the best

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-04-22 Thread STINNER Victor
Changes by STINNER Victor: Added file: http://bugs.python.org/file42566/ad4a53ed1fbf.diff

[issue26814] [WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments

2016-04-21 Thread STINNER Victor
STINNER Victor added the comment: I created a repository. I will work there and run some experiments. It will help to get a better idea of the concrete performance. Once I have a better view of all the changes required to get the best performance everywhere, I will start a discussion to see