[issue29263] Implement LOAD_METHOD/CALL_METHOD for C functions

2017-02-02 Thread INADA Naoki

Changes by INADA Naoki :


--
dependencies:  -Add tp_fastcall to PyTypeObject: support FASTCALL calling 
convention for all callable objects
resolution:  -> fixed
stage:  -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29263] Implement LOAD_METHOD/CALL_METHOD for C functions

2017-02-02 Thread Roundup Robot

Roundup Robot added the comment:


New changeset 48bde5d2cc5b9e98f55cd23ee57f3996d81caeb5 by INADA Naoki in branch 
'master':
Issue #29263: LOAD_METHOD support for C methods
https://github.com/python/cpython/commit/48bde5d2cc5b9e98f55cd23ee57f3996d81caeb5


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29263] Implement LOAD_METHOD/CALL_METHOD for C functions

2017-02-02 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 135a9a0c09f9 by INADA Naoki in branch 'default':
Issue #29263: LOAD_METHOD support for C methods
https://hg.python.org/cpython/rev/135a9a0c09f9

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29263] Implement LOAD_METHOD/CALL_METHOD for C functions

2017-02-02 Thread STINNER Victor

STINNER Victor added the comment:

loadmethod-methoddescr-2.patch LGTM.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29263] Implement LOAD_METHOD/CALL_METHOD for C functions

2017-02-02 Thread INADA Naoki

Changes by INADA Naoki :


Added file: http://bugs.python.org/file46493/loadmethod-methoddescr-2.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29263] Implement LOAD_METHOD/CALL_METHOD for C functions

2017-02-02 Thread STINNER Victor

STINNER Victor added the comment:

Naoki> I confirmed bm_mako performance degrade is caused by L1 cache miss.

I know this performance instability very well, the issue is called "code 
placement":
https://haypo.github.io/journey-to-stable-benchmark-deadcode.html

I tried to fight it with GCC __attribute__((hot)) in the issue #28618, but it 
doesn't fix the issue (at least, not completely):
http://bugs.python.org/issue28618#msg281459

In my experience, the best fix is PGO. Slowly, I consider that it's worthless 
to try to "fight" against code placement, and that benchmark results are only 
reliable when PGO compilation was used. Otherwise, you should ignore small 
performance differences. Problem: What is the threshold? 5%? 10%? I already 
noticed a difference up to 70% only caused by code placement!
https://haypo.github.io/analysis-python-performance-issue.html

--

I ran benchmarks on loadmethod-methoddescr.patch on haypo@speed-python with 
LTO+PGO. If you only show performance difference of at least 5%, only 3 
benchmarks are significant and are faster:
---
haypo@speed-python$ python3 -m perf compare_to ~/benchmarks/*762a93935afd*json 
loadmethod-methoddesc_ref_762a93935afd.json  -G --min-speed=5
Faster (3):
- regex_v8: 50.3 ms +- 0.4 ms -> 43.2 ms +- 0.3 ms: 1.17x faster (-14%)
- scimark_monte_carlo: 230 ms +- 6 ms -> 208 ms +- 4 ms: 1.11x faster (-10%)
- scimark_lu: 390 ms +- 17 ms -> 370 ms +- 13 ms: 1.05x faster (-5%)

Benchmark hidden because not significant (61): (...)
---

In my experience, regex_v8 and scimark_* benchmarks are not really reliable. 
I'm not surprised to not see a major speedup on performance, since the speedup 
on microbenchmarks is only 10% faster.

IHMO 10% faster on method calls is significant enough, since it's a core, very 
common, and widely used Python feature.

--

To be more explicitl:  loadmethod-methoddescr.patch LGTM except of minor 
comments on the review.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29263] Implement LOAD_METHOD/CALL_METHOD for C functions

2017-02-02 Thread INADA Naoki

INADA Naoki added the comment:

I confirmed bm_mako performance degrade is caused by L1 cache miss.
(I didn't use --with-optimization)
https://gist.github.com/methane/b33edbf1f123ae026e704b0e005c3606

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29263] Implement LOAD_METHOD/CALL_METHOD for C functions

2017-02-02 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
nosy: +serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29263] Implement LOAD_METHOD/CALL_METHOD for C functions

2017-02-02 Thread STINNER Victor

STINNER Victor added the comment:

I tried to benchmark b''.decode(encoding='ascii'): CALL_METHOD is not used for 
this call, but CALL_FUNCTION_KW :-/ So the call is not affected by your patch.

I also ran a quick benchmark on loadmethod-methoddescr.patch:

b''.decode(): 71.1 ns +- 0.5 ns -> 65.4 ns +- 0.2 ns: 1.09x faster (-8%)
b''.decode('utf8'): 92.8 ns +- 0.4 ns -> 85.5 ns +- 0.3 ns: 1.09x faster (-8%)

I confirm the speedup.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29263] Implement LOAD_METHOD/CALL_METHOD for C functions

2017-02-01 Thread INADA Naoki

INADA Naoki added the comment:

microbench:
$ ./python.patched -m perf timeit --compare-to `pwd`/python.default -s "d = 
b''" -- "d.decode('ascii')"
python.default: . 109 ns +- 1 ns
python.patched: . 102 ns +- 1 ns

Median +- std dev: [python.default] 109 ns +- 1 ns -> [python.patched] 102 ns 
+- 1 ns: 1.08x faster (-7%)

$ ./python.patched -m perf timeit --compare-to `pwd`/python.default -s "l = 
[42]" -- "l.count(42)"
python.default: . 66.0 ns +- 0.9 ns
python.patched: . 60.0 ns +- 0.3 ns

Median +- std dev: [python.default] 66.0 ns +- 0.9 ns -> [python.patched] 60.0 
ns +- 0.3 ns: 1.10x faster (-9%)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29263] Implement LOAD_METHOD/CALL_METHOD for C functions

2017-02-01 Thread INADA Naoki

INADA Naoki added the comment:

Here is the result. I'll investigate mako result.

+ ../python.default -m perf compare_to default.json patched.json -G 
--min-speed=1
Slower (20):
- mako: 40.9 ms +- 0.4 ms -> 44.7 ms +- 0.6 ms: 1.09x slower (+9%)
- chaos: 298 ms +- 3 ms -> 308 ms +- 2 ms: 1.03x slower (+3%)
- telco: 19.7 ms +- 0.6 ms -> 20.2 ms +- 0.5 ms: 1.03x slower (+3%)
- scimark_sor: 478 ms +- 7 ms -> 491 ms +- 11 ms: 1.03x slower (+3%)
- xml_etree_process: 238 ms +- 3 ms -> 245 ms +- 4 ms: 1.03x slower (+3%)
- logging_simple: 31.1 us +- 0.3 us -> 31.9 us +- 0.3 us: 1.03x slower (+3%)
- logging_format: 36.4 us +- 0.3 us -> 37.2 us +- 0.4 us: 1.02x slower (+2%)
- nqueens: 264 ms +- 3 ms -> 270 ms +- 3 ms: 1.02x slower (+2%)
- fannkuch: 1.07 sec +- 0.02 sec -> 1.09 sec +- 0.03 sec: 1.02x slower (+2%)
- django_template: 401 ms +- 5 ms -> 409 ms +- 3 ms: 1.02x slower (+2%)
- call_method_slots: 14.6 ms +- 0.2 ms -> 14.9 ms +- 0.1 ms: 1.02x slower (+2%)
- meteor_contest: 199 ms +- 2 ms -> 203 ms +- 2 ms: 1.02x slower (+2%)
- logging_silent: 735 ns +- 9 ns -> 746 ns +- 11 ns: 1.02x slower (+2%)
- sqlalchemy_declarative: 314 ms +- 4 ms -> 318 ms +- 4 ms: 1.01x slower (+1%)
- genshi_text: 89.1 ms +- 1.3 ms -> 90.2 ms +- 1.3 ms: 1.01x slower (+1%)
- unpack_sequence: 131 ns +- 2 ns -> 133 ns +- 2 ns: 1.01x slower (+1%)
- call_method: 15.1 ms +- 0.2 ms -> 15.3 ms +- 0.1 ms: 1.01x slower (+1%)
- scimark_monte_carlo: 263 ms +- 6 ms -> 266 ms +- 6 ms: 1.01x slower (+1%)
- unpickle_pure_python: 834 us +- 16 us -> 843 us +- 15 us: 1.01x slower (+1%)
- pathlib: 51.2 ms +- 0.9 ms -> 51.7 ms +- 0.7 ms: 1.01x slower (+1%)

Faster (11):
- scimark_lu: 457 ms +- 49 ms -> 407 ms +- 7 ms: 1.12x faster (-11%)
- chameleon: 30.2 ms +- 0.6 ms -> 28.8 ms +- 0.4 ms: 1.05x faster (-5%)
- xml_etree_parse: 314 ms +- 12 ms -> 303 ms +- 8 ms: 1.04x faster (-4%)
- sqlite_synth: 10.3 us +- 0.2 us -> 9.94 us +- 0.25 us: 1.03x faster (-3%)
- pickle_pure_python: 1.28 ms +- 0.02 ms -> 1.25 ms +- 0.02 ms: 1.03x faster 
(-3%)
- xml_etree_iterparse: 223 ms +- 5 ms -> 218 ms +- 7 ms: 1.02x faster (-2%)
- scimark_fft: 730 ms +- 19 ms -> 716 ms +- 18 ms: 1.02x faster (-2%)
- nbody: 243 ms +- 5 ms -> 239 ms +- 6 ms: 1.02x faster (-2%)
- tornado_http: 377 ms +- 4 ms -> 373 ms +- 4 ms: 1.01x faster (-1%)
- sympy_expand: 1.02 sec +- 0.01 sec -> 1.01 sec +- 0.01 sec: 1.01x faster (-1%)
- unpickle: 32.1 us +- 0.3 us -> 31.8 us +- 0.2 us: 1.01x faster (-1%)

Benchmark hidden because not significant (33):

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29263] Implement LOAD_METHOD/CALL_METHOD for C functions

2017-02-01 Thread INADA Naoki

INADA Naoki added the comment:

I'm running pyperformance.

--
Added file: http://bugs.python.org/file46487/loadmethod-methoddescr.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29263] Implement LOAD_METHOD/CALL_METHOD for C functions

2017-02-01 Thread STINNER Victor

STINNER Victor added the comment:

> * Create unbound PyCFunction and cache it

Does it mean create a new private type, similar to PyCFunction_Type 
(PyCFunctionObject) but without the self attribute?

IMHO creating a new type is complex. I'm not sure that it's worth it, 
especially if the _PyMethodDescr_FastCallKeywords() idea is simpler to 
implement and "fast enough".

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29263] Implement LOAD_METHOD/CALL_METHOD for C functions

2017-02-01 Thread STINNER Victor

STINNER Victor added the comment:

INADA Naoki: "LOAD_METHOD support was based on tp_fastcall. (...) Other ideas 
to support LOAD_METHOD for C function are: * Add 
_PyMethodDescr_FastCallKeywords and call it in call_function."

It seems like a very small subset of tp_fastcall. If it provides a significant 
speedup, I really like the idea :-) I would be happy to see a "subset of 
tp_fastcall" merged. At least my whole tp_fastcall experiment wouldn't be 
worthless :-)

What would be the main change which would provide a speedup? Avoid the creation 
of a temporary tuple to pass positional arguments?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29263] Implement LOAD_METHOD/CALL_METHOD for C functions

2017-02-01 Thread Yury Selivanov

Yury Selivanov added the comment:

> * Add _PyMethodDescr_FastCallKeywords and call it in call_function.

This option seems to be the easiest to implement.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29263] Implement LOAD_METHOD/CALL_METHOD for C functions

2017-02-01 Thread INADA Naoki

INADA Naoki added the comment:

LOAD_METHOD support was based on tp_fastcall.

Without LOAD_METHOD support, methoddescr_get is called and
stack layout after LOAD_METHOD is:

  NULL, (bound)PyCFunction, arg1, arg2, ... argN

With LOAD_METHOD support, stack layout is:

  PyMethodDescrObject, self, arg1, ... argN

And tp_call (or tp_fastcall) of method descriptor is called when CALL_METHOD.

Without tp_fastcall, it means pack tuple happens.
It is more heavy than creating temporary PyCFunction.

---

Other ideas to support LOAD_METHOD for C function are:

* Add _PyMethodDescr_FastCallKeywords and call it in call_function.
* Create unbound PyCFunction and cache it

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29263] Implement LOAD_METHOD/CALL_METHOD for C functions

2017-02-01 Thread STINNER Victor

STINNER Victor added the comment:

INADA Naoki: "Maybe, we should do: * Make clinic use more METH_FASTCALL * Use 
clinic more in builtin methods; before trying this optimization."

I created the issue #29286 "Use METH_FASTCALL in str methods" which is now 
fixed. I started to propose patches for other funnctions, types and modules.

Would you mind to rebase your loadmethod-fastcall.patch patch Naoki?

With my change 0b219348ec9e, I'm not sure that your change on 
methoddescr_call() is still needed.

I tried to rebase your change, but I failed to get the "1.10x faster" result. 
Maybe I did something wrong, maybe my change 0b219348ec9e already optimized 
method calls for C functions?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29263] Implement LOAD_METHOD/CALL_METHOD for C functions

2017-01-18 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 0b219348ec9e by Victor Stinner in branch 'default':
Optimize methoddescr_call(): avoid temporary PyCFunction
https://hg.python.org/cpython/rev/0b219348ec9e

--
nosy: +python-dev

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29263] Implement LOAD_METHOD/CALL_METHOD for C functions

2017-01-13 Thread INADA Naoki

INADA Naoki added the comment:

Maybe, we should do

* Make clinic use more METH_FASTCALL
* Use clinic more in builtin methods

before trying this optimization.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29263] Implement LOAD_METHOD/CALL_METHOD for C functions

2017-01-13 Thread INADA Naoki

INADA Naoki added the comment:

This is proof of concept patch to support LOAD_METHOD & CALL_METHOD for 
METH_FASTCALL methods.
(This patch should be applied after fastcall.patch)

$ ./python -m perf timeit --compare-to `pwd`/python-fastcall -s "d = b''" -- 
"d.decode()"
python-fastcall: . 91.0 ns +- 1.0 ns
python: . 80.3 ns +- 0.3 ns

Median +- std dev: [python-fastcall] 91.0 ns +- 1.0 ns -> [python] 80.3 ns +- 
0.3 ns: 1.13x faster

$ ./python -m perf timeit --compare-to `pwd`/python-fastcall -s "d = b''" -- 
"d.decode('ascii')"
python-fastcall: . 116 ns +- 1 ns
python: . 106 ns +- 1 ns

Median +- std dev: [python-fastcall] 116 ns +- 1 ns -> [python] 106 ns +- 1 ns: 
1.10x faster


Since PyCFunction is lighter than PyMethodObject, performance benefit seems 
smaller than
Python method (up to 20%).

Sadly speaking, there are only few METH_FASTCALL in builtin type.

--
keywords: +patch
Added file: http://bugs.python.org/file46286/loadmethod-fastcall.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29263] Implement LOAD_METHOD/CALL_METHOD for C functions

2017-01-13 Thread STINNER Victor

Changes by STINNER Victor :


--
dependencies: +Add tp_fastcall to PyTypeObject: support FASTCALL calling 
convention for all callable objects

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue29263] Implement LOAD_METHOD/CALL_METHOD for C functions

2017-01-13 Thread STINNER Victor

New submission from STINNER Victor:

The issue #29259 implements tp_fastcall on method_descriptor 
(PyMethodDescr_Type). According to http://bugs.python.org/issue26110#msg283093 
it would allow to implement LOAD_METHOD and CALL_METHOD for C functions.

--
messages: 285414
nosy: haypo, inada.naoki, yselivanov
priority: normal
severity: normal
status: open
title: Implement LOAD_METHOD/CALL_METHOD for C functions
type: performance
versions: Python 3.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com