[issue41902] Micro optimization for range.index if step is 1

2020-10-20 Thread Dong-hee Na
Change by Dong-hee Na : -- resolution: -> fixed stage: patch review -> resolved status: open -> closed ___ Python tracker ___ ___

[issue41902] Micro optimization for range.index if step is 1

2020-10-20 Thread Dong-hee Na
Dong-hee Na added the comment: New changeset c0f22fb8b3006936757cebb959cee94e285bc503 by Dong-hee Na in branch 'master': bpo-41902: Micro optimization for range.index if step is 1 (GH-22479) https://github.com/python/cpython/commit/c0f22fb8b3006936757cebb959cee94e285bc503 --

[issue41902] Micro optimization for range.index if step is 1

2020-10-20 Thread Inada Naoki
Inada Naoki added the comment: New changeset 25492a5b59c5b74328278f195540e318ab87674f by Dong-hee Na in branch 'master': bpo-41902: Micro optimization for compute_item of range (GH-22492) https://github.com/python/cpython/commit/25492a5b59c5b74328278f195540e318ab87674f -- nosy:

[issue41902] Micro optimization for range.index if step is 1

2020-10-02 Thread hai shi
Change by hai shi : -- nosy: +shihai1991 ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue41902] Micro optimization for range.index if step is 1

2020-10-02 Thread Pablo Galindo Salgado
Pablo Galindo Salgado added the comment: I would need to carefully look at the PRs to estimate the maintenance cost, but it seems to me initially that all these operations happen very rarely in regular code and probably will have no impact on macro benchmarks. In general, I dislike branches

[issue41902] Micro optimization for range.index if step is 1

2020-10-02 Thread Dong-hee Na
Dong-hee Na added the comment: @vstinner @pablo @mark On my local machine (without cpu isolation), PR 22480 does not affect performance issues. import pyperf runner = pyperf.Runner() runner.timeit(name="bench long divide", stmt=""" for i in range(1, 256): a = 1 // i

[issue41902] Micro optimization for range.index if step is 1

2020-10-01 Thread Dong-hee Na
Change by Dong-hee Na : -- pull_requests: +21509 pull_request: https://github.com/python/cpython/pull/22492 ___ Python tracker ___

[issue41902] Micro optimization for range.index if step is 1

2020-10-01 Thread Dong-hee Na
Dong-hee Na added the comment: Mean +- std dev: [master-compute] 317 ns +- 3 ns -> [bpo-41902-compute] 287 ns +- 6 ns: 1.11x faster (-10%) -- ___ Python tracker ___

[issue41902] Micro optimization for range.index if step is 1

2020-10-01 Thread Dong-hee Na
Dong-hee Na added the comment: I found another optimal case and this looks more practical usecase than range.index -- Added file: https://bugs.python.org/file49485/bench_range_compute_item.py ___ Python tracker

[issue41902] Micro optimization for range.index if step is 1

2020-10-01 Thread Dong-hee Na
Dong-hee Na added the comment: > s there reason to believe that range.index is performed often enough that > it's worth the extra code and maintenance cost here? There are at least 3 reasons 1. pointer comparaition optimization quite common usecase in CPython codebase. e.x) list.count 2. if

[issue41902] Micro optimization for range.index if step is 1

2020-10-01 Thread Ammar Askar
Ammar Askar added the comment: Out of curiosity, what is the motivation for this optimization? Is there reason to believe that range.index is performed often enough that it's worth the extra code and maintenance cost here? -- nosy: +ammar2 ___

[issue41902] Micro optimization for range.index if step is 1

2020-10-01 Thread STINNER Victor
STINNER Victor added the comment: Mark: "I'd much prefer not. Every extra fast path check costs time for the general case, and there's no strong reason to expect people to be dividing by one. The range code seems like the right place for this optimization, not the long-divide code." In

[issue41902] Micro optimization for range.index if step is 1

2020-10-01 Thread Mark Dickinson
Mark Dickinson added the comment: > Do we need another fast-path in long_div(a, b) when b == _PyLong_One? Just > return a in this case. I'd much prefer not. Every extra fast path check costs time for the general case, and there's no strong reason to expect people to be dividing by one. The

[issue41902] Micro optimization for range.index if step is 1

2020-10-01 Thread Dong-hee Na
Dong-hee Na added the comment: PR 22479 and PR 22480 are different approaches. I (and Victor) want to check which approach might be better. PR 22480 would affect overall long division performance PR 22479 assumes that step=1 case is very often (e.g range(100), range(-100, 100)) --

[issue41902] Micro optimization for range.index if step is 1

2020-10-01 Thread Dong-hee Na
Change by Dong-hee Na : -- pull_requests: +21498 pull_request: https://github.com/python/cpython/pull/22480 ___ Python tracker ___

[issue41902] Micro optimization for range.index if step is 1

2020-10-01 Thread Dong-hee Na
Dong-hee Na added the comment: For the record with optimization Mean +- std dev: [range_master] 112 ns +- 3 ns -> [range_opt] 99.3 ns +- 1.8 ns: 1.13x faster (-12%) -- ___ Python tracker

[issue41902] Micro optimization for range.index if step is 1

2020-10-01 Thread Dong-hee Na
Dong-hee Na added the comment: > do you build Python with PGO+LTO optimizations Nope, but I will try to run the benchmark > What is your OS and CPU? Do you use CPU isolation on Linux? macOS, intel i9 with 8 cores -- ___ Python tracker

[issue41902] Micro optimization for range.index if step is 1

2020-10-01 Thread STINNER Victor
STINNER Victor added the comment: About your benchmark, do you build Python with PGO+LTO optimizations? What is your OS and CPU? Do you use CPU isolation on Linux? It would be good to use PGO+LTO optimizations. -- ___ Python tracker

[issue41902] Micro optimization for range.index if step is 1

2020-10-01 Thread Dong-hee Na
Dong-hee Na added the comment: For practical use case >>> a = range(-2, 10) >>> a.index(2) -- ___ Python tracker ___ ___

[issue41902] Micro optimization for range.index if step is 1

2020-10-01 Thread STINNER Victor
STINNER Victor added the comment: PR 22479 avoids calling PyNumber_FloorDivide(a, b) if b == 1 (if b == _PyLong_One): it makes range.index(a, 1) call 214 ns faster. I'm surprised that PyNumber_FloorDivide(a, 1) takes 214 ns. Does it spend most time in binary_op1()? Or

[issue41902] Micro optimization for range.index if step is 1

2020-10-01 Thread Dong-hee Na
Dong-hee Na added the comment: For optimization case, >>> a = range(0, 1) >>> a.index(3) 3 -- ___ Python tracker ___ ___

[issue41902] Micro optimization for range.index if step is 1

2020-10-01 Thread Dong-hee Na
Change by Dong-hee Na : -- keywords: +patch pull_requests: +21496 stage: -> patch review pull_request: https://github.com/python/cpython/pull/22479 ___ Python tracker ___

[issue41902] Micro optimization for range.index if step is 1

2020-10-01 Thread Dong-hee Na
Change by Dong-hee Na : Added file: https://bugs.python.org/file49483/bench_range_step.py ___ Python tracker ___ ___ Python-bugs-list

[issue41902] Micro optimization for range.index if step is 1

2020-10-01 Thread Dong-hee Na
New submission from Dong-hee Na : if we declare range without setting step, we don't have to process divide operation. here is the benchmark result both setting step and without step, With my patch, there is no performance degrade with range.index when the step is not one and showing 19%