[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-02-06 Thread STINNER Victor

Changes by STINNER Victor :


--
resolution:  -> fixed
stage:  -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-02-01 Thread STINNER Victor

STINNER Victor added the comment:

"The default branch is now as good as Python 3.4, in terms of stack consumption, 
and Python 3.4 was the Python version which used the least stack memory 
according to my tests."

I consider that the initial issue is now fixed, so I close the issue.

Thanks Serhiy for the tests, reviews, ideas and obviously the bug report ;-) I 
had never looked at the stack usage before.

--




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-11 Thread STINNER Victor

STINNER Victor added the comment:

I also ran the reliable performance benchmark suite with LTO+PGO. There is no 
significant performance change on these benchmarks:
https://speed.python.org/changes/?rev=b9404639a18c&exe=5&env=speed-python

The largest change is on scimark_lu (-13%), but there was a hiccup on the 
previous change, which is probably a small instability in the benchmark. It's 
not a speedup caused by these changes.

The second largest change is on spectral_norm: +9%. But this benchmark is known 
to be unstable; there was already a small peak previously. Again, I don't think 
that it's related to the changes.

--




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-10 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Awesome! You are great Victor!

--




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-10 Thread STINNER Victor

STINNER Victor added the comment:

Result of attached bench_recursion-2.py comparing before/after the 3 changes 
reducing the stack consumption:

test_python_call: Median +- std dev: [a30cdf366c02] 512 us +- 12 us -> 
[6478e6d0476f] 467 us +- 21 us: 1.10x faster (-9%)
test_python_getitem: Median +- std dev: [a30cdf366c02] 485 us +- 26 us -> 
[6478e6d0476f] 437 us +- 18 us: 1.11x faster (-10%)
test_python_iterator: Median +- std dev: [a30cdf366c02] 1.15 ms +- 0.04 ms -> 
[6478e6d0476f] 1.03 ms +- 0.06 ms: 1.12x faster (-10%)
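
The "Nx faster (-P%)" figures can be cross-checked from the medians. A minimal sketch, assuming the ratio is old/new for a speedup (new/old for a slowdown) and the percentage is the change relative to the old median; describe_change() is a hypothetical helper written for this check, not part of the perf module:

```python
def describe_change(old, new):
    """Format a before/after timing pair like the result lines above.

    Assumption: the ratio is old/new when the new timing is faster
    (new/old when it is slower), and the percentage is the change
    relative to the old timing.
    """
    percent = (new - old) / old * 100
    if new < old:
        return f"{old / new:.2f}x faster ({percent:+.0f}%)"
    return f"{new / old:.2f}x slower ({percent:+.0f}%)"


# Medians in microseconds, copied from the results above.
print("test_python_call:", describe_change(512, 467))
print("test_python_getitem:", describe_change(485, 437))
```

Under these assumptions the helper reproduces the ratios and percentages quoted for the first two tests.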

At least, it doesn't seem to be slower. Maybe the speedup comes from 
call_function() inlining. This function was probably already inlined in a 
PGO build.

The script was written by Serhiy in issue #29227; I modified it to use the 
Runner.timeit() API for convenience.

--
Added file: http://bugs.python.org/file46249/bench_recursion-2.py




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-10 Thread STINNER Victor

STINNER Victor added the comment:

I pushed 3 changes:

* rev b9404639a18c: Issue #29233: call_method() now uses _PyObject_FastCall()
* rev 8481c379e2da: Issue #29227: inline call_function()
* rev 6478e6d0476f: Issue #29234: disable _PyStack_AsTuple() inlining


Before (rev a30cdf366c02):

test_python_call: 7175 calls before crash, stack: 1168 bytes/call
test_python_getitem: 6235 calls before crash, stack: 1344 bytes/call
test_python_iterator: 5344 calls before crash, stack: 1568 bytes/call

=> total: 18754 calls, 4080 bytes


With these 3 changes (rev 6478e6d0476f):

test_python_call: 8587 calls before crash, stack: 976 bytes/call
test_python_getitem: 9189 calls before crash, stack: 912 bytes/call
test_python_iterator: 7936 calls before crash, stack: 1056 bytes/call

=> total: 25712 calls, 2944 bytes


The default branch is now as good as Python 3.4, in terms of stack consumption, 
and Python 3.4 was the Python version which used the least stack memory 
according to my tests.

I didn't touch the _PY_FASTCALL_SMALL_STACK value, it's still 5 arguments (40 
bytes). So my changes should not impact performance.
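
The totals and the overall gain can be recomputed from the per-test figures. A minimal sketch of that arithmetic; the dictionary layout and the totals() helper are illustrative choices, only the numbers come from the message:

```python
# (calls before crash, stack bytes per call), copied from the message above.
before = {
    "test_python_call": (7175, 1168),
    "test_python_getitem": (6235, 1344),
    "test_python_iterator": (5344, 1568),
}
after = {
    "test_python_call": (8587, 976),
    "test_python_getitem": (9189, 912),
    "test_python_iterator": (7936, 1056),
}


def totals(results):
    """Return (total calls, total bytes/call) summed over the three tests."""
    calls = sum(c for c, _ in results.values())
    stack = sum(b for _, b in results.values())
    return calls, stack


calls_before, stack_before = totals(before)
calls_after, stack_after = totals(after)

# Relative change, in percent, compared with the "before" revision.
stack_change = (stack_after - stack_before) / stack_before * 100
calls_change = (calls_after - calls_before) / calls_before * 100

print(f"stack: {stack_before} -> {stack_after} bytes ({stack_change:+.0f}%)")
print(f"calls: {calls_before} -> {calls_after} ({calls_change:+.0f}%)")
```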

--




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-10 Thread STINNER Victor

STINNER Victor added the comment:

It seems that the subfunc.patch approach, using the "no inline" attribute, helps.

--




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-10 Thread STINNER Victor

STINNER Victor added the comment:

I created the issue #29227 "Reduce C stack consumption in function calls" which 
contains a first simple patch with a significant effect on the C stack.

--




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-10 Thread STINNER Victor

STINNER Victor added the comment:

Stack used by each C function of test_python_call.

3.4:

(a) method_call: 64

(b) PyObject_Call: 48
(b) function_call: 160
(b) PyEval_EvalCodeEx: 176

(c) PyEval_EvalFrameEx: 256
(c) call_function: 0
(c) do_call: 0
(c) PyObject_Call: 48

(d) slot_tp_call: 64
(d) PyObject_Call: 48

=> total: 864


default:

(a) method_call: 80

(b) _PyObject_FastCallDict: 64
(b) _PyFunction_FastCallDict: 208
(b) _PyEval_EvalCodeWithName: 176

(c) _PyEval_EvalFrameDefault: 320
(c) call_function: 80
(c) _PyObject_FastCallKeywords: 80

(d) slot_tp_call: 64
(d) PyObject_Call: 48

=> total: 1120


Groups of functions, 3.4 => default:

(a) 64 => 80 (+16)
(b) 384 => 448 (+64)
(c) 304 => 480 (+176)
(d) 112 => 112 (=)


I used gdb:

(gdb) set $last=0
(gdb) define size
> print $last - (uintptr_t)$rsp
> set $last = (uintptr_t)$rsp
> down
> end
(gdb) up
(gdb) up
(gdb) up
(... until a first method_call ...)
(gdb) size
(gdb) size
...

--




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-10 Thread STINNER Victor

STINNER Victor added the comment:

> no_small_stack-2.patch decreases it only by 6% (with possible performance 
> loss).

Yeah, if we want to come back to Python 3.4 efficiency, we need to find the 
other functions which now use more stack memory ;-) The discussed "small 
stack" buffers are only responsible for 96 bytes, not a big deal compared to the 
total of 4080 bytes.

--




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-10 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Thus Python 3.6 stack usage is about 20% larger than Python 3.5 and about 40% 
larger than Python 3.4. This is significant. :-(

no_small_stack-2.patch decreases it only by 6% (with possible performance loss).
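
These percentages can be checked against the totals measured elsewhere in this thread (sum of bytes/call over the three tests). A short sketch, assuming "Python 3.6" here refers to the default branch total of 4080 bytes; percent_larger() is an illustrative helper:

```python
# Sum of bytes/call over the three tests, from the measurements in this thread.
TOTAL_BYTES = {"3.4": 2944, "3.5": 3360, "default": 4080}
NO_SMALL_STACK_2_GAIN = 240  # bytes saved by no_small_stack-2.patch


def percent_larger(new, old):
    """How much larger `new` is than `old`, in percent."""
    return (new - old) / old * 100


vs_35 = percent_larger(TOTAL_BYTES["default"], TOTAL_BYTES["3.5"])
vs_34 = percent_larger(TOTAL_BYTES["default"], TOTAL_BYTES["3.4"])
patch_gain = NO_SMALL_STACK_2_GAIN / TOTAL_BYTES["default"] * 100

print(f"default vs 3.5: {vs_35:+.0f}%")
print(f"default vs 3.4: {vs_34:+.0f}%")
print(f"no_small_stack-2.patch: -{patch_gain:.0f}%")
```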

--




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-10 Thread STINNER Victor

STINNER Victor added the comment:

no_small_stack-2.patch has a very bad impact on performance:

haypo@speed-python$ python3 -m perf compare_to 
2017-01-04_12-02-default-ee1390c9b585.json 
no_small_stack-2_refee1390c9b585.json -G --min-speed=5

Slower (59):
- telco: 15.7 ms +- 0.5 ms -> 23.4 ms +- 0.3 ms: 1.49x slower (+49%)
- scimark_sor: 393 ms +- 6 ms -> 579 ms +- 10 ms: 1.47x slower (+47%)
- json_loads: 56.9 us +- 0.9 us -> 83.1 us +- 2.4 us: 1.46x slower (+46%)
- unpickle_pure_python: 698 us +- 10 us -> 984 us +- 10 us: 1.41x slower (+41%)
- scimark_lu: 424 ms +- 22 ms -> 585 ms +- 33 ms: 1.38x slower (+38%)
- chameleon: 22.4 ms +- 0.2 ms -> 30.8 ms +- 0.3 ms: 1.38x slower (+38%)
- xml_etree_generate: 212 ms +- 3 ms -> 291 ms +- 4 ms: 1.37x slower (+37%)
- xml_etree_process: 177 ms +- 3 ms -> 240 ms +- 3 ms: 1.35x slower (+35%)
- raytrace: 1.04 sec +- 0.01 sec -> 1.40 sec +- 0.02 sec: 1.35x slower (+35%)
- logging_simple: 27.9 us +- 0.4 us -> 37.4 us +- 0.5 us: 1.34x slower (+34%)
- pickle_pure_python: 1.02 ms +- 0.01 ms -> 1.37 ms +- 0.02 ms: 1.34x slower (+34%)
- logging_format: 33.3 us +- 0.4 us -> 44.5 us +- 0.7 us: 1.34x slower (+34%)
- xml_etree_iterparse: 195 ms +- 5 ms -> 259 ms +- 7 ms: 1.32x slower (+32%)
- chaos: 236 ms +- 3 ms -> 306 ms +- 3 ms: 1.30x slower (+30%)
- regex_compile: 380 ms +- 3 ms -> 494 ms +- 5 ms: 1.30x slower (+30%)
- pathlib: 42.3 ms +- 0.5 ms -> 55.0 ms +- 0.6 ms: 1.30x slower (+30%)
- django_template: 364 ms +- 5 ms -> 471 ms +- 4 ms: 1.29x slower (+29%)
- call_method: 11.2 ms +- 0.2 ms -> 14.4 ms +- 0.2 ms: 1.29x slower (+29%)
- hexiom: 18.4 ms +- 0.2 ms -> 23.7 ms +- 0.2 ms: 1.29x slower (+29%)
- call_method_slots: 11.0 ms +- 0.3 ms -> 14.1 ms +- 0.1 ms: 1.28x slower (+28%)
- richards: 147 ms +- 4 ms -> 188 ms +- 5 ms: 1.28x slower (+28%)
- html5lib: 207 ms +- 7 ms -> 262 ms +- 6 ms: 1.27x slower (+27%)
- genshi_text: 71.5 ms +- 1.3 ms -> 90.3 ms +- 1.1 ms: 1.26x slower (+26%)
- deltablue: 14.2 ms +- 0.2 ms -> 17.9 ms +- 0.4 ms: 1.26x slower (+26%)
- genshi_xml: 164 ms +- 2 ms -> 207 ms +- 3 ms: 1.26x slower (+26%)
- sympy_str: 429 ms +- 5 ms -> 539 ms +- 4 ms: 1.25x slower (+25%)
- go: 493 ms +- 5 ms -> 619 ms +- 7 ms: 1.25x slower (+25%)
- mako: 35.4 ms +- 1.5 ms -> 44.2 ms +- 1.2 ms: 1.25x slower (+25%)
- sympy_expand: 959 ms +- 10 ms -> 1.19 sec +- 0.01 sec: 1.24x slower (+24%)
- nqueens: 215 ms +- 2 ms -> 268 ms +- 1 ms: 1.24x slower (+24%)
(...)

Benchmark ran on speed-python with PGO+LTO, Linux configured for benchmarks 
using python3 -m perf system tune.

--




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-10 Thread STINNER Victor

STINNER Victor added the comment:

Python 3.4 (rev 6340c9fcc111):

test_python_call: 9700 calls before crash, stack: 864 bytes/call
test_python_getitem: 8314 calls before crash, stack: 1008 bytes/call
test_python_iterator: 7818 calls before crash, stack: 1072 bytes/call

=> total: 25832 calls, 2944 bytes

Python 2.7 (rev 0d4e0a736688):

test_python_call: 6162 calls before crash, stack: 1360 bytes/call
test_python_getitem: 5952 calls before crash, stack: 1408 bytes/call
test_python_iterator: 5885 calls before crash, stack: 1424 bytes/call

=> total: 17999 calls, 4192 bytes

Nice. At least, Python 3.7 is better than Python 2.7 (4080 bytes <
4192 bytes) :-) Python 3.4 stack usage was very low, and lower than
Python 3.5.

--




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-10 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

What are the results with 3.4? There were several issues about stack overflow in 
3.5 (issue25222, issue28179, issue28913).

--




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-10 Thread STINNER Victor

STINNER Victor added the comment:

Python 3.5 (revision 8125d9a8152b), before all fastcall changes:

test_python_call: 8314 calls before crash, stack: 1008 bytes/call
test_python_getitem: 7483 calls before crash, stack: 1120 bytes/call
test_python_iterator: 6802 calls before crash, stack: 1232 bytes/call

=> total: 22599 calls, 3360 bytes

--




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-10 Thread STINNER Victor

STINNER Victor added the comment:

> no_small_stack.patch:

Oops, you should read no_small_stack-2.patch in my previous message ;-)

--




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-10 Thread STINNER Victor

STINNER Victor added the comment:

no_small_stack-2.patch: Remove all "small_stack" buffers.

Reference

test_python_call: 7175 calls before crash, stack: 1168 bytes/call
test_python_getitem: 6235 calls before crash, stack: 1344 bytes/call
test_python_iterator: 5344 calls before crash, stack: 1568 bytes/call

=> total: 18754 calls, 4080 bytes

no_small_stack.patch

test_python_call: 7482 calls (+307) before crash, stack: 1120 bytes/call (-48)
test_python_getitem: 6715 calls (+480) before crash, stack: 1248 bytes/call 
(-96)
test_python_iterator: 5693 calls (+349) before crash, stack: 1472 bytes/call 
(-96)

=> total: 19890 calls (+1136), 3840 bytes (-240)

The total gain is the removal of 5 small buffers of 48 bytes: 240 bytes.
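
One possible reading of the 48-byte figure, offered as an assumption rather than something this message confirms: a buffer of _PY_FASTCALL_SMALL_STACK = 5 pointers takes 40 bytes on a 64-bit build, and if the compiler keeps frame sizes at multiples of 16 bytes, removing it frees 48 bytes. The sketch below works through that arithmetic; the alignment value and the align_up() helper are assumptions:

```python
POINTER_SIZE = 8       # bytes per PyObject* on a 64-bit build (assumption)
SMALL_STACK_ITEMS = 5  # _PY_FASTCALL_SMALL_STACK at the time of this thread
STACK_ALIGNMENT = 16   # assumed frame-size granularity on x86-64


def align_up(size, alignment):
    """Round `size` up to the next multiple of `alignment`."""
    return (size + alignment - 1) // alignment * alignment


buffer_bytes = SMALL_STACK_ITEMS * POINTER_SIZE
freed_per_buffer = align_up(buffer_bytes, STACK_ALIGNMENT)

# Per-test savings reported above, in bytes/call.
savings = {
    "test_python_call": 48,
    "test_python_getitem": 96,
    "test_python_iterator": 96,
}
# Number of buffers each test would have lost under this reading.
buffers_removed = {name: saved // freed_per_buffer
                   for name, saved in savings.items()}

print(f"buffer: {buffer_bytes} bytes, freed per buffer: {freed_per_buffer} bytes")
print("buffers removed:", buffers_removed)
print("total saved:", sum(savings.values()), "bytes")
```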

--
Added file: http://bugs.python.org/file46240/no_small_stack-2.patch




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-10 Thread STINNER Victor

STINNER Victor added the comment:

stack_overflow_28870-sp.py: script using testcapi_stack_pointer.patch to 
compute the usage of the C stack. Results of this script.

(*) Reference

test_python_call: 7175 calls before crash, stack: 1168 bytes/call
test_python_getitem: 6235 calls before crash, stack: 1344 bytes/call
test_python_iterator: 5344 calls before crash, stack: 1568 bytes/call

=> total: 18754 calls, 4080 bytes

(1) no_small_stack.patch

test_python_call: 7175 calls before crash, stack: 1168 bytes/call
test_python_getitem: 6547 calls before crash, stack: 1280 bytes/call
test_python_iterator: 5572 calls before crash, stack: 1504 bytes/call

=> total: 19294 calls, 3952 bytes

test_python_call is clearly not impacted by no_small_stack.patch.

test_python_call loops on method_call():

method_call()
=> _PyObject_Call_Prepend()
=> _PyObject_FastCallDict()
=> _PyFunction_FastCallDict()
=> _PyEval_EvalCodeWithName()
=> PyEval_EvalFrameEx()
=> _PyEval_EvalFrameDefault()
=> call_function()
=> _PyObject_FastCallKeywords()
=> slot_tp_call()
=> PyObject_Call()
=> method_call()
=> (...)

_PyObject_Call_Prepend() is in the middle of the chain. This function uses a 
"small stack" of _PY_FASTCALL_SMALL_STACK "PyObject*" items. We can clearly see 
the impact of modifying _PY_FASTCALL_SMALL_STACK on the maximum number of 
test_python_call calls before crash in msg285057.
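
For orientation, the kind of Python-level recursion that produces such a chain can be sketched as below. This is a bounded illustration, not Serhiy's stack_overflow.py: the class names and the depth argument are hypothetical, and the depth stays far below the recursion limit so nothing crashes.

```python
class RecursiveCall:
    """Instance whose __call__ calls the instance again.

    Calling an instance of a Python class goes through the type's
    tp_call slot to the bound __call__ method, so every level passes
    through the method-call machinery once more.
    """

    def __call__(self, remaining, depth=0):
        if remaining == 0:
            return depth
        return self(remaining - 1, depth + 1)


class RecursiveGetitem:
    """Same idea for subscription: __getitem__ indexes the instance again."""

    def __getitem__(self, remaining):
        if remaining == 0:
            return 0
        return 1 + self[remaining - 1]


# 100 levels is well within the default recursion limit.
print(RecursiveCall()(100))
print(RecursiveGetitem()[100])
```

The real tests recurse until the C stack overflows; this sketch only shows the shape of the recursion.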

--
Added file: http://bugs.python.org/file46239/stack_overflow_28870-sp.py




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-10 Thread STINNER Victor

STINNER Victor added the comment:

testcapi_stack_pointer.patch: add _testcapi.stack_pointer() function.

--
Added file: http://bugs.python.org/file46238/testcapi_stack_pointer.patch




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-09 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I'm not sure that the result of pyobjectl_callfunctionobjargs_stacksize() has 
a direct relation to stack consumption in test_python_call, test_python_getitem 
and test_python_iterator. Try to measure the stack consumption in these cases. 
This can be done with a _testcapi helper that just returns the value of the 
stack pointer. Run all three tests with a fixed level of recursion and measure 
the difference between stack pointers.

It would also be nice to measure the performance effect of the patches.
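
The stack-pointer measurement suggested above can be sketched as follows. No such _testcapi helper is assumed to exist here, so the sketch takes the stack-pointer reader as a parameter and demonstrates the arithmetic with a simulated stack; FakeStack, recurse_to() and the 1168-byte frame size are illustrative assumptions.

```python
def bytes_per_call(get_sp, recurse_to, depth):
    """Estimate the stack bytes consumed per recursion level.

    get_sp: callable returning the current stack pointer as an int.
    recurse_to: callable(depth, leaf) that recurses `depth` levels and
        returns leaf() from the deepest level.
    Assumes the stack grows toward lower addresses, as on x86-64.
    """
    top = get_sp()
    bottom = recurse_to(depth, get_sp)
    return (top - bottom) / depth


class FakeStack:
    """Stand-in for a real stack: each level consumes `frame_size` bytes."""

    def __init__(self, frame_size, base=0x7FFF_0000_0000):
        self.frame_size = frame_size
        self.sp = base

    def stack_pointer(self):
        return self.sp

    def recurse_to(self, depth, leaf):
        if depth == 0:
            return leaf()
        self.sp -= self.frame_size      # enter one level
        try:
            return self.recurse_to(depth - 1, leaf)
        finally:
            self.sp += self.frame_size  # leave the level again


fake = FakeStack(frame_size=1168)
estimate = bytes_per_call(fake.stack_pointer, fake.recurse_to, depth=50)
print(f"{estimate:.0f} bytes/call")
```

With a real helper, get_sp would read the C stack pointer and recurse_to would be one of the recursive tests run to a fixed depth.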

--




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-09 Thread STINNER Victor

STINNER Victor added the comment:

Impact of the _PY_FASTCALL_SMALL_STACK constant:

* _PY_FASTCALL_SMALL_STACK=1: 528 bytes/call

test_python_call 7376
test_python_getitem 6544
test_python_iterator 5572
=> total: 19 492

* _PY_FASTCALL_SMALL_STACK=3: 528 bytes/call

test_python_call 7272
test_python_getitem 6464
test_python_iterator 5512
=> total: 19 248

* _PY_FASTCALL_SMALL_STACK=5 (current value): 560 bytes/call

test_python_call 7172
test_python_getitem 6232
test_python_iterator 5344
=> total: 18 748

* _PY_FASTCALL_SMALL_STACK=10: 592 bytes/call

test_python_call 6984
test_python_getitem 5952
test_python_iterator 5132
=> total: 18 068

Increasing _PY_FASTCALL_SMALL_STACK has a clear effect on the total. Total 
decreases when _PY_FASTCALL_SMALL_STACK increases.


---

no_small_stack.patch with _PY_FASTCALL_SMALL_STACK=3: 368 bytes/call

test_python_call 7272
test_python_getitem 6628
test_python_iterator 5632
=> total: 19 532

--




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-09 Thread STINNER Victor

STINNER Victor added the comment:

I modified Serhiy's stack_overflow.py of #28858:
* re-run each test 10 times and show the maximum depth
* only test: ['test_python_call', 'test_python_getitem', 'test_python_iterator']

Maximum number of Python calls before a crash.

(*) Reference (unpatched): 560 bytes/call

test_python_call 7172
test_python_getitem 6232
test_python_iterator 5344
=> total: 18 748

(1) no_small_stack.patch: 368 bytes/call

test_python_call 7172 (=)
test_python_getitem 6544 (+312)
test_python_iterator 5572 (+228)
=> total: 19 288

(2) less_stack.patch: 384 bytes/call

test_python_call 7272 (+100)
test_python_getitem 6384 (+152)
test_python_iterator 5456 (+112)
=> total: 19 112

(3) subfunc.patch: 496 bytes/call

test_python_call 7272 (+100)
test_python_getitem 6712 (+480)
test_python_iterator 6020 (+678)
=> total: 20 004

(4) alloca.patch: 528 bytes/call

test_python_call 7272 (+100)
test_python_getitem 6464 (+232)
test_python_iterator 5752 (+408)
=> total: 19 488

Patches sorted by bytes/call, from best to worst: no_small_stack.patch (368) > 
less_stack.patch (384) > subfunc.patch (496) > alloca.patch (528) > reference 
(560).

Patches sorted by number of calls before crash: subfunc.patch (20 004) > 
alloca.patch (19 488) > no_small_stack.patch (19 288) > less_stack.patch (19 
112) > reference (18 748).

I expected a correlation between the bytes/call measured by 
testcapi_stacksize.patch and the number of calls before a crash, but I fail to 
see an obvious correlation :-/
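
The missing correlation can be quantified with a Pearson coefficient over the five configurations. A rough sketch, with the caveat that five data points is a tiny sample; the pearson() helper is written out by hand here and the totals are recomputed from the per-test depths listed above:

```python
from math import sqrt

# bytes/call reported by testcapi_stacksize.patch and the per-test maximum
# depths (call, getitem, iterator), copied from the message above.
RESULTS = {
    "reference": (560, [7172, 6232, 5344]),
    "no_small_stack.patch": (368, [7172, 6544, 5572]),
    "less_stack.patch": (384, [7272, 6384, 5456]),
    "subfunc.patch": (496, [7272, 6712, 6020]),
    "alloca.patch": (528, [7272, 6464, 5752]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


stack_bytes = [b for b, _ in RESULTS.values()]
total_calls = [sum(depths) for _, depths in RESULTS.values()]
r = pearson(stack_bytes, total_calls)
print(f"Pearson r = {r:.2f}")
```

If smaller frames translated directly into more calls before a crash, r would be strongly negative; on these figures it comes out close to zero.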

Maybe the compiler is smarter than what I would expect and emits efficient code 
to be able to use less stack memory?

Maybe the Linux kernel does weird things which makes the behaviour on 
stack-overflow non-obvious :-)

At least, I would expect that no_small_stack.patch would be the clear winner, 
since it has the smallest usage of C stack.

--
Added file: http://bugs.python.org/file46230/stack_overflow_28870.py




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-09 Thread Xiang Zhang

Changes by Xiang Zhang :


--
nosy: +xiang.zhang




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-02 Thread STINNER Victor

STINNER Victor added the comment:

In Python 3.5, PyObject_CallFunctionObjArgs() calls objargs_mktuple() which 
uses Py_VA_COPY(countva, va) and creates a tuple. The tuple constructor uses a 
free list to reduce the cost of heap memory allocations.

--




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-02 Thread STINNER Victor

STINNER Victor added the comment:

no_small_stack.patch: And now something completely different, a patch to remove 
the "small stack" allocated on the C stack and always use heap memory. FYI I 
created no_small_stack.patch from less_stack.patch.

As expected, the stack usage is lower:

* less_stack.patch: 384 bytes/call
* no_small_stack.patch: 368 bytes/call

I didn't check the performance of no_small_stack.patch yet.

--
Added file: http://bugs.python.org/file46120/no_small_stack.patch




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2017-01-02 Thread STINNER Victor

STINNER Victor added the comment:

testcapi_stacksize.patch: add 
_testcapi.pyobjectl_callfunctionobjargs_stacksize(), function used to measure 
the stack consumption.

--
Added file: http://bugs.python.org/file46119/testcapi_stacksize.patch




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2016-12-15 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I have tested all three patches with the stack_overflow.py script. The only 
tests affected are the recursive Python implementations of __call__, 
__getitem__ and __iter__.

                      unpatched  less_stack  alloca  subfunc

test_python_call           9696        9876    9880     9876
test_python_getitem        9884       10264    9880    10688
test_python_iterator       7812        8052    8312     8872

--




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2016-12-15 Thread STINNER Victor

STINNER Victor added the comment:

For comparison, Python 3.5 (before fast calls) uses 448 bytes of C stack per 
call. Python 3.5 uses a tuple allocated in the heap memory.

--




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2016-12-15 Thread STINNER Victor

STINNER Victor added the comment:

I also tried Serhiy's approach, splitting the function into subfunctions, but 
the result is not as good as expected: 496 bytes. See attached subfunc.patch.

--
Added file: http://bugs.python.org/file45918/subfunc.patch




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2016-12-15 Thread STINNER Victor

STINNER Victor added the comment:

I also tried to use alloca(): see attached alloca.patch. But the result is 
quite bad: 528 bytes of stack memory per call. I only attach the patch to 
discuss the issue, but I now dislike the option: the result is bad, it's less 
portable and more dangerous.

--
Added file: http://bugs.python.org/file45917/alloca.patch




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2016-12-15 Thread STINNER Victor

STINNER Victor added the comment:

I don't propose to add _testcapi.pyobjectl_callfunctionobjargs_stacksize(). 
It's just to test the patch. I'm using it with:

$ ./python -c 'import _testcapi; n=100; print(_testcapi.pyobjectl_callfunctionobjargs_stacksize(n) / (n+1))'
384.0

The value of n has no impact on the stack, it gives the same value with n=0.

--




[issue28870] Reduce stack consumption of PyObject_CallFunctionObjArgs() and like

2016-12-15 Thread STINNER Victor

Changes by STINNER Victor :


--
title: Refactor PyObject_CallFunctionObjArgs() and like -> Reduce stack 
consumption of PyObject_CallFunctionObjArgs() and like
