[issue38436] Improved performance for list addition.

2019-10-16 Thread Brandt Bucher
Brandt Bucher added the comment: I'm going to go ahead and close this, since the payoff doesn't seem to be worth the effort. Thanks for the benchmarking and feedback!
  resolution: -> rejected
  stage: patch review -> resolved
  status: open -> closed

[issue38436] Improved performance for list addition.

2019-10-11 Thread Brandt Bucher
Brandt Bucher added the comment: I went ahead and ran an instrumented build on some random production code (mostly financial data processing), just because I was curious:
  BINARY_ADD ops: 3,720,776
  BINARY_ADD ops with two lists: 100,452 (2.7% of total)
  BINARY_ADD with new fast path: 26,357
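The counters above can be put in proportion with a few lines of arithmetic. This is only a sketch: it assumes the third counter is the number of list additions that took the new fast path, and the helper name `percent` is made up for illustration.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return part / whole * 100

# Counters copied from the comment above.
binary_add_total = 3_720_776
binary_add_two_lists = 100_452
binary_add_fast_path = 26_357  # assumed: list additions that hit the fast path

list_share = percent(binary_add_two_lists, binary_add_total)
fast_share_of_list_adds = percent(binary_add_fast_path, binary_add_two_lists)
fast_share_of_all_adds = percent(binary_add_fast_path, binary_add_total)

print(f"list + list among all BINARY_ADD: {list_share:.1f}%")
print(f"fast path among list + list:      {fast_share_of_list_adds:.1f}%")
print(f"fast path among all BINARY_ADD:   {fast_share_of_all_adds:.2f}%")
```

Under that reading, about a quarter of the list additions in this workload would have used the fast path, which is well under 1% of all BINARY_ADD operations.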

[issue38436] Improved performance for list addition.

2019-10-11 Thread Brandt Bucher
Brandt Bucher added the comment: Serhiy, here are the better performance measurements:
  $ ./python.exe -m pyperf timeit --duplicate=1000 -s z=0 z+0  # list-add
  .
  Mean +- std dev: 17.3 ns +- 0.3 ns
  $ ./python.exe -m pyperf timeit --duplicate=1000 -s z=0 z+0  # master

[issue38436] Improved performance for list addition.

2019-10-11 Thread Serhiy Storchaka
Serhiy Storchaka added the comment:
  > I do not see any significant change in + operator timing on my machine
  > (again, just a rough test):
Because the time includes the time of iterating, which can be significant in comparison with adding two integers. Please use the pyperf module with
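The point about iteration time can be illustrated with the standard library alone. The sketch below is not pyperf; it only imitates the idea behind duplicating the statement so that the timing loop's own overhead is spread over many copies. The helper `ns_per_statement` and its parameters are hypothetical names chosen for this example.

```python
import timeit

def ns_per_statement(stmt, setup="pass", duplicate=1, number=20_000, repeat=5):
    """Estimate nanoseconds per execution of stmt.

    The statement is repeated `duplicate` times inside the timed body, so the
    per-iteration cost of the timing loop is shared by all copies.
    """
    body = "\n".join([stmt] * duplicate)
    best = min(timeit.repeat(body, setup=setup, number=number, repeat=repeat))
    return best / (number * duplicate) * 1e9

# One copy per loop iteration: the loop overhead is a large part of the result.
plain = ns_per_statement("z+0", setup="z=0")

# 1000 copies per loop iteration: the loop overhead is mostly amortized away.
amortized = ns_per_statement("z+0", setup="z=0", duplicate=1000, number=200)

print(f"1 copy per loop:     {plain:.1f} ns")
print(f"1000 copies per loop: {amortized:.1f} ns")
```

The numbers vary by machine and run. The amortized figure is usually the smaller one, since it is closer to the cost of the addition itself.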

[issue38436] Improved performance for list addition.

2019-10-11 Thread Brandt Bucher
Brandt Bucher added the comment: ...and obviously the gains are more pronounced for more/longer lists. In general I'm not married to this change, though. If the consensus is "not worth it", I get it. But it seems like too easy of a win to me.

[issue38436] Improved performance for list addition.

2019-10-11 Thread Brandt Bucher
Brandt Bucher added the comment: Thanks, Pablo, for providing that. So the changes look like mostly a wash on these benchmarks. Serhiy: I do not see any significant change in + operator timing on my machine (again, just a rough test):
  $ ./python.exe -m timeit -s z=0 z+0  # master

[issue38436] Improved performance for list addition.

2019-10-11 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: I have repeated the benchmark in the speed.python.org server with CPU isolation + PGO + LTO:
Slower (21):
- xml_etree_iterparse: 127 ms +- 2 ms -> 131 ms +- 2 ms: 1.03x slower (+3%)
- xml_etree_parse: 195 ms +- 1 ms -> 200 ms +- 2 ms: 1.03x slower

[issue38436] Improved performance for list addition.

2019-10-11 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: How often do you need to add three or more lists in comparison with other uses of the "+" operator? How large is the benefit of this optimization? How much does it slow down other uses of the "+" operator? It would be nice if you provided some numbers. More

[issue38436] Improved performance for list addition.

2019-10-10 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: From the PR: Were the benchmarks executed with CPU isolation? I am a bit suspicious of the
- pickle_list: 3.63 us +- 0.08 us -> 3.78 us +- 0.10 us: 1.04x slower (+4%)
If not, can you repeat them with CPU isolation and affinity? Check this for more

[issue38436] Improved performance for list addition.

2019-10-10 Thread Brandt Bucher
Change by Brandt Bucher:
  keywords: +patch
  pull_requests: +16289
  stage: -> patch review
  pull_request: https://github.com/python/cpython/pull/16705

[issue38436] Improved performance for list addition.

2019-10-10 Thread Brandt Bucher
New submission from Brandt Bucher: The attached PR adds a fast path for BINARY_ADD instructions involving two lists, where the left list has a refcount of exactly 1. In this case, we instead do a PySequence_InPlaceConcat operation. This has the effect of avoiding quadratic complexity for
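To make the complexity argument concrete, here is a pure-Python sketch that counts element copies when many lists are joined. It is not the PR's C code. It assumes the case in question is a left operand nobody else references, such as the intermediate result of a chained a + b + c (the submission text is cut off above, so this is my reading). The function names are made up for illustration.

```python
def concat_with_plus(chunks):
    """Join lists with `+`, where every addition builds a brand-new list."""
    result = []
    copied = 0
    for chunk in chunks:
        copied += len(result) + len(chunk)  # a new list copies both operands
        result = result + chunk
    return result, copied

def concat_in_place(chunks):
    """Join lists by extending the accumulated list in place."""
    result = []
    copied = 0
    for chunk in chunks:
        copied += len(chunk)  # only the right operand is copied (reallocation ignored)
        result += chunk
    return result, copied

chunks = [[i] * 10 for i in range(100)]
plus_result, plus_copies = concat_with_plus(chunks)
inplace_result, inplace_copies = concat_in_place(chunks)

print(plus_copies, inplace_copies)
```

Both functions return the same list, but the first copies a number of elements that grows quadratically with the number of chunks, while the second grows linearly. Extending in place is only safe when no other reference to the left list exists, which is why the fast path checks for a refcount of exactly 1.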