[issue28839] _PyFunction_FastCallDict(): replace PyTuple_New() with PyMem_Malloc()

2017-01-10 Thread STINNER Victor
STINNER Victor added the comment: OK, I give up on that one. I don't think that it's worth it. -- resolution: -> rejected; status: open -> closed

[issue28839] _PyFunction_FastCallDict(): replace PyTuple_New() with PyMem_Malloc()

2017-01-02 Thread STINNER Victor
STINNER Victor added the comment: fastcalldict-4.patch: Rebased patch.
kw1: Median +- std dev: [ref] 290 ns +- 3 ns -> [patch] 253 ns +- 21 ns: 1.14x faster (-13%)
kw5: Median +- std dev: [ref] 438 ns +- 33 ns -> [patch] 394 ns +- 27 ns: 1.11x faster (-10%)
kw10: Median +- std dev: [ref] 663

[issue28839] _PyFunction_FastCallDict(): replace PyTuple_New() with PyMem_Malloc()

2017-01-02 Thread STINNER Victor
STINNER Victor added the comment: I pushed the two obvious and safe optimizations of fastcalldict-3.patch.

[issue28839] _PyFunction_FastCallDict(): replace PyTuple_New() with PyMem_Malloc()

2017-01-02 Thread Roundup Robot
Roundup Robot added the comment:
New changeset 5f7cd3b6c9b1 by Victor Stinner in branch 'default': Issue #28839: Optimize function_call() https://hg.python.org/cpython/rev/5f7cd3b6c9b1
New changeset f9dd607dc04c by Victor Stinner in branch 'default': Optimize _PyFunction_FastCallDict() when

[issue28839] _PyFunction_FastCallDict(): replace PyTuple_New() with PyMem_Malloc()

2017-01-02 Thread STINNER Victor
STINNER Victor added the comment: Quick update on the fastcall work.
> I pushed the change b9c9691c72c5 to replace PyObject_CallFunctionObjArgs() with _PyObject_CallNoArg() or _PyObject_CallArg1(). These functions are simpler and don't allocate memory on the C stack.
Using

[issue28839] _PyFunction_FastCallDict(): replace PyTuple_New() with PyMem_Malloc()

2016-12-01 Thread STINNER Victor
STINNER Victor added the comment: I pushed the change b9c9691c72c5 to replace PyObject_CallFunctionObjArgs() with _PyObject_CallNoArg() or _PyObject_CallArg1(). These functions are simpler and don't allocate memory on the C stack. I made similar to PyObject_CallFunctionObjArgs() in Python 3.6

[issue28839] _PyFunction_FastCallDict(): replace PyTuple_New() with PyMem_Malloc()

2016-12-01 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: The problem with C stack overflow is not new, but your patch may make it worse (I don't know if it actually makes it worse). Py_EnterRecursiveCall() is used for limiting the Python stack. It can't prevent C stack overflow.

[issue28839] _PyFunction_FastCallDict(): replace PyTuple_New() with PyMem_Malloc()

2016-12-01 Thread STINNER Victor
STINNER Victor added the comment:
> I agree with Josh, PyTuple_New() can be faster than PyMem_Malloc() due to tuple free list.
According to benchmarks, PyTuple_New() is slower than PyMem_Malloc(). That's not surprising to me: using a tuple object requires extra work:
* Track and then

[issue28839] _PyFunction_FastCallDict(): replace PyTuple_New() with PyMem_Malloc()

2016-12-01 Thread STINNER Victor
STINNER Victor added the comment: Serhiy: "small_stack increases C stack consumption even for calls without keyword arguments. This is serious problem since we can't control stack overflow." This problem is not new and is worked around by the Py_EnterRecursiveCall() macro, which counts the depth
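The depth counter mentioned above is visible from Python code as the recursion limit. The following is only a minimal sketch, not code from the issue: the function names `recurse` and `hits_limit` are made up for illustration. It shows that the interpreter stops on a call count, which is the basis of Serhiy's objection that the counter knows nothing about how many bytes of C stack each call consumes.

```python
import sys


def recurse(depth=0):
    # Hypothetical helper: recurses forever until the interpreter's
    # call-depth counter refuses to go any deeper.
    return recurse(depth + 1)


def hits_limit():
    # The interpreter raises RecursionError once the counted depth
    # exceeds sys.getrecursionlimit(); it does not measure stack bytes.
    try:
        recurse()
    except RecursionError:
        return True
    return False


print("recursion limit:", sys.getrecursionlimit())
print("limit enforced:", hits_limit())
```

Because the limit is a count of nested calls, a change that makes each C frame larger reduces the safety margin without the counter noticing.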

[issue28839] _PyFunction_FastCallDict(): replace PyTuple_New() with PyMem_Malloc()

2016-12-01 Thread STINNER Victor
STINNER Victor added the comment:
> Note: Using a simple printf() in the C code, I noticed that it is not uncommon that _PyFunction_FastCallDict() is called with an empty dictionary for keyword arguments.
Simplified Python example where _PyFunction_FastCallDict() is called with an empty

[issue28839] _PyFunction_FastCallDict(): replace PyTuple_New() with PyMem_Malloc()

2016-12-01 Thread STINNER Victor
STINNER Victor added the comment: (Oops, I attached the wrong benchmark script. It's now fixed.) -- Added file: http://bugs.python.org/file45721/bench_fastcalldict.py

[issue28839] _PyFunction_FastCallDict(): replace PyTuple_New() with PyMem_Malloc()

2016-12-01 Thread STINNER Victor
Changes by STINNER Victor: Removed file: http://bugs.python.org/file45720/bench_fastcalldict.py

[issue28839] _PyFunction_FastCallDict(): replace PyTuple_New() with PyMem_Malloc()

2016-12-01 Thread STINNER Victor
STINNER Victor added the comment: bench_fastcalldict.py: hardcore microbenchmark on _PyFunction_FastCallDict(). Pass keyword arguments to the tp_init slot of a Python constructor. Result for 1, 5 and 10 keyword arguments:
kw1: Median +- std dev: [ref] 329 ns +- 21 ns -> [patch] 306 ns +- 17
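The attached bench_fastcalldict.py is not reproduced in this thread. As a rough illustration of the kind of measurement described (keyword arguments passed to the `__init__` of a Python class, for 1, 5 and 10 keywords), here is a sketch using only the standard `timeit` module. The class `Point`, the helpers `make_kwargs` and `bench`, and the iteration count are assumptions made for this example; the numbers it prints are not comparable with the figures quoted above.

```python
import timeit


class Point:
    # Hypothetical class: calling Point(**kwargs) goes through the
    # type's tp_init slot, which ends up calling this Python __init__.
    def __init__(self, **kwargs):
        self.kwargs = kwargs


def make_kwargs(n):
    # Build n keyword arguments named k0, k1, ...
    return {"k%d" % i: i for i in range(n)}


def bench(n, number=20000):
    # Return a rough average time per call in nanoseconds.
    kwargs = make_kwargs(n)
    total = timeit.timeit(lambda: Point(**kwargs), number=number)
    return total / number * 1e9


results = {n: bench(n) for n in (1, 5, 10)}
for n, ns in results.items():
    print("kw%d: %.0f ns per call" % (n, ns))
```

A single `timeit` run gives no median or standard deviation, so a difference of around 10% like the one discussed here would probably need repeated runs on a quiet machine to be visible.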

[issue28839] _PyFunction_FastCallDict(): replace PyTuple_New() with PyMem_Malloc()

2016-12-01 Thread STINNER Victor
STINNER Victor added the comment: Patch version: fix the "if (0)" to use the small stack allocated on the C stack. -- Added file: http://bugs.python.org/file45719/fastcalldict-3.patch

[issue28839] _PyFunction_FastCallDict(): replace PyTuple_New() with PyMem_Malloc()

2016-12-01 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I agree with Josh, PyTuple_New() can be faster than PyMem_Malloc() due to the tuple free list. small_stack increases C stack consumption even for calls without keyword arguments. This is a serious problem since we can't control stack overflow.

[issue28839] _PyFunction_FastCallDict(): replace PyTuple_New() with PyMem_Malloc()

2016-11-30 Thread Josh Rosenberg
Josh Rosenberg added the comment: Minor correction: No allocation when small stack used, so you'd only see (possibly) regressions with 6+ keyword arguments (assuming the tuple free list applies for tuples that large). Admittedly a minor concern; keyword processing is already pretty slow, and

[issue28839] _PyFunction_FastCallDict(): replace PyTuple_New() with PyMem_Malloc()

2016-11-30 Thread Josh Rosenberg
Josh Rosenberg added the comment: Given you can't avoid the refcounting overhead, how much does this really help? Are there meaningful benefits in microbenchmarks? I'd worry that unconditional allocation from PyMem_Malloc might lose out relative to PyTuple_New, which is likely to not involve

[issue28839] _PyFunction_FastCallDict(): replace PyTuple_New() with PyMem_Malloc()

2016-11-30 Thread STINNER Victor
STINNER Victor added the comment: fastcalldict.patch avoided INCREF/DECREF on keyword keys and values. This is wrong: we must hold strong references because the keyword dictionary can be technically modified: see issue #2016 and test_extcall. Hum, I'm quite sure that it's not the first time
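To illustrate how a keyword dictionary can be modified while a call is in progress, here is a sketch in the spirit of the test_extcall case referred to above. It is written from memory, not copied from the test suite, and the names `Name`, `kwargs` and `f` are chosen for this example. A `str` subclass used as a dictionary key deletes itself from the dictionary whenever it is compared, and the interpreter compares it while matching keyword names against parameter names.

```python
kwargs = {}


class Name(str):
    # Hypothetical key type: comparing it has the side effect of
    # removing the key from the dictionary being unpacked.
    def __eq__(self, other):
        try:
            del kwargs[self]
        except KeyError:
            pass
        return str.__eq__(self, other)

    def __hash__(self):
        return str.__hash__(self)


def f(a, b):
    return (a, b)


kwargs[Name("a")] = 1
kwargs[Name("b")] = 2

# The call must still see both values even though the keys remove
# themselves from `kwargs` during keyword matching.
result = f(**kwargs)
print(result)
```

If the callee only borrowed the keys and values from the dictionary, deleting an entry in the middle of the call could free an object that is still in use, which is why strong references have to be held.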

[issue28839] _PyFunction_FastCallDict(): replace PyTuple_New() with PyMem_Malloc()

2016-11-30 Thread STINNER Victor
New submission from STINNER Victor: Attached patch is a minor optimization for _PyFunction_FastCallDict(): avoid the creation of a tuple to pass keyword arguments, use a simple C array allocated by PyMem_Malloc(). It also uses a small stack of 80 bytes (2*5*sizeof(PyObject*)) allocated on the