[issue46235] Do all ref-counting at once for sequence multiplication

2022-01-07 Thread Tim Peters


Change by Tim Peters :


--
assignee:  -> Dennis Sweeney
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46235] Do all ref-counting at once for sequence multiplication

2022-01-07 Thread Tim Peters


Tim Peters  added the comment:


New changeset ad1d5908ada171eff768291371a80022bfad4f04 by Dennis Sweeney in 
branch 'main':
bpo-46235: Do all ref-counting at once during list/tuple multiplication 
(GH-30346)
https://github.com/python/cpython/commit/ad1d5908ada171eff768291371a80022bfad4f04


--
nosy: +tim.peters

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46235] Do all ref-counting at once for sequence multiplication

2022-01-03 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

Benchmarks are definitively better with the most recent change (in which the 
major data-copying loop is between parts of the same buffer):

-- MSVC 

Slower (3):
- (None,) * 2: 54.0 ns +- 0.7 ns -> 63.4 ns +- 2.5 ns: 1.17x slower
- [None] * 2: 61.6 ns +- 3.3 ns -> 69.0 ns +- 1.4 ns: 1.12x slower
- [None] * 10: 72.3 ns +- 2.2 ns -> 73.9 ns +- 2.3 ns: 1.02x slower

Faster (33):
- [None] * 1: 25.1 us +- 0.2 us -> 12.8 us +- 0.2 us: 1.96x faster
- (None,) * 1: 26.7 us +- 0.2 us -> 15.2 us +- 0.2 us: 1.76x faster
- list(range(1000)) * 100: 248 us +- 95 us -> 146 us +- 50 us: 1.70x faster
- list(range(1000)) * 10: 16.8 us +- 0.2 us -> 10.4 us +- 0.1 us: 1.61x faster
- list(range(100)) * 100: 13.8 us +- 0.3 us -> 8.60 us +- 0.09 us: 1.60x faster
- list(range(10)) * 100: 1.41 us +- 0.02 us -> 924 ns +- 15 ns: 1.53x faster
- list(range(100)) * 10: 1.47 us +- 0.10 us -> 981 ns +- 14 ns: 1.50x faster
- tuple(range(10)) * 1: 255 us +- 91 us -> 171 us +- 51 us: 1.49x faster
- [None] * 100: 355 ns +- 11 ns -> 245 ns +- 7 ns: 1.45x faster
- (None,) * 100: 357 ns +- 4 ns -> 261 ns +- 3 ns: 1.37x faster
- tuple(range(1000)) * 10: 17.8 us +- 0.3 us -> 13.3 us +- 0.1 us: 1.34x faster
- list(range(10)) * 1: 205 us +- 49 us -> 157 us +- 95 us: 1.31x faster
- list(range(1000)) * 2: 3.32 us +- 0.07 us -> 2.63 us +- 0.03 us: 1.26x faster
- tuple(range(100)) * 100: 15.9 us +- 0.2 us -> 13.0 us +- 0.1 us: 1.22x faster
- list(range(1000)) * 1: 33.9 ms +- 0.9 ms -> 28.2 ms +- 1.6 ms: 1.20x 
faster
- ["Python", "Perl"] * 1: 31.4 us +- 4.3 us -> 26.2 us +- 0.7 us: 1.20x 
faster
- tuple(range(1000)) * 1: 33.8 ms +- 0.8 ms -> 28.3 ms +- 1.3 ms: 1.19x 
faster
- tuple(range(100)) * 10: 1.68 us +- 0.02 us -> 1.42 us +- 0.01 us: 1.18x faster
- ["Python", "Perl"] * 100: 442 ns +- 67 ns -> 376 ns +- 6 ns: 1.18x faster
- tuple(range(10)) * 100: 1.63 us +- 0.02 us -> 1.40 us +- 0.02 us: 1.17x faster
- tuple(range(1000)) * 2: 3.61 us +- 0.06 us -> 3.11 us +- 0.06 us: 1.16x faster
- list(range(10)) * 10: 238 ns +- 8 ns -> 206 ns +- 6 ns: 1.16x faster
- list(range(100)) * 1: 3.15 ms +- 0.13 ms -> 2.73 ms +- 0.15 ms: 1.15x 
faster
- list(range(100)) * 2: 370 ns +- 12 ns -> 323 ns +- 6 ns: 1.14x faster
- tuple(range(100)) * 1: 3.24 ms +- 0.13 ms -> 2.85 ms +- 0.13 ms: 1.14x 
faster
- ("Python", "Perl") * 1: 31.1 us +- 0.3 us -> 27.4 us +- 0.3 us: 1.14x 
faster
- ("Python", "Perl") * 100: 438 ns +- 7 ns -> 389 ns +- 9 ns: 1.13x faster
- ["Python", "Perl"] * 10: 86.1 ns +- 4.2 ns -> 77.3 ns +- 2.4 ns: 1.11x faster
- list(range(10)) * 2: 81.5 ns +- 2.8 ns -> 77.0 ns +- 2.1 ns: 1.06x faster
- ("Python", "Perl") * 10: 92.6 ns +- 7.8 ns -> 88.4 ns +- 4.8 ns: 1.05x faster
- ["Python", "Perl"] * 2: 63.9 ns +- 2.6 ns -> 61.3 ns +- 1.7 ns: 1.04x faster
- tuple(range(100)) * 2: 414 ns +- 19 ns -> 404 ns +- 6 ns: 1.02x faster
- tuple(range(10)) * 10: 258 ns +- 7 ns -> 253 ns +- 8 ns: 1.02x faster

Benchmark hidden because not significant (4): ("Python", "Perl") * 2, 
tuple(range(10)) * 2, (None,) * 10, tuple(range(1000)) * 100

Geometric mean: 1.21x faster


 GCC PGO on WSL 

Slower (4):
- [None] * 10: 58.1 ns +- 0.7 ns -> 59.4 ns +- 0.9 ns: 1.02x slower
- ("Python", "Perl") * 2: 48.9 ns +- 0.6 ns -> 49.9 ns +- 0.6 ns: 1.02x slower
- [None] * 1000: 1.56 us +- 0.02 us -> 1.59 us +- 0.02 us: 1.02x slower
- [None] * 2: 51.9 ns +- 0.5 ns -> 52.4 ns +- 0.8 ns: 1.01x slower

Faster (44):
- list(range(100)) * 1: 1.98 ms +- 0.33 ms -> 976 us +- 43 us: 2.03x faster
- list(range(1000)) * 100: 168 us +- 2 us -> 84.8 us +- 1.0 us: 1.98x faster
- list(range(1000)) * 10: 15.9 us +- 0.2 us -> 8.47 us +- 0.20 us: 1.88x faster
- list(range(100)) * 100: 13.1 us +- 0.2 us -> 7.02 us +- 0.12 us: 1.87x faster
- list(range(1000)) * 1000: 1.80 ms +- 0.05 ms -> 986 us +- 12 us: 1.83x faster
- list(range(100)) * 1000: 140 us +- 2 us -> 79.1 us +- 0.9 us: 1.77x faster
- list(range(10)) * 1: 136 us +- 2 us -> 77.5 us +- 1.0 us: 1.76x faster
- list(range(1000)) * 1: 26.0 ms +- 0.6 ms -> 15.2 ms +- 0.3 ms: 1.71x 
faster
- list(range(10)) * 1000: 12.2 us +- 0.2 us -> 7.18 us +- 0.08 us: 1.70x faster
- tuple(range(1000)) * 100: 167 us +- 2 us -> 106 us +- 1 us: 1.58x faster
- tuple(range(1000)) * 1000: 1.78 ms +- 0.03 ms -> 1.13 ms +- 0.01 ms: 1.58x 
faster
- tuple(range(10)) * 1: 148 us +- 7 us -> 95.5 us +- 1.1 us: 1.55x faster
- list(range(10)) * 100: 1.19 us +- 0.10 us -> 795 ns +- 20 ns: 1.49x faster
- tuple(range(1000)) * 10: 15.9 us +- 0.4 us -> 10.6 us +- 0.2 us: 1.49x faster
- list(range(100)) * 10: 1.17 us +- 0.03 us -> 797 ns +- 17 ns: 1.47x faster
- (None,) * 1: 27.5 us +- 0.3 us -> 18.9 us +- 0.1 us: 1.46x faster
- tuple(range(100)) * 1000: 138 us +- 2 us -> 95.4 us +- 0.7 us: 1.45x faster
- tuple(range(100)) * 1: 1.52 ms +- 0.02 ms -> 1.05 ms +- 0.01 ms: 1.45x 
faster
- tuple(range(10)) * 1000: 12.9 us +- 0.2 us -> 9.28 us +- 0.18 us: 

[issue46235] Do all ref-counting at once for sequence multiplication

2022-01-03 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

Hm... the effects are completely different on GCC. This is with 
--enable-optimizations and GCC on WSL:

Slower (29):
- tuple(range(100)) * 10: 1.22 us +- 0.02 us -> 2.29 us +- 0.05 us: 1.87x slower
- tuple(range(100)) * 2: 345 ns +- 7 ns -> 629 ns +- 5 ns: 1.82x slower
- tuple(range(1000)) * 2: 3.03 us +- 0.06 us -> 5.32 us +- 0.05 us: 1.76x slower
- tuple(range(10)) * 100: 1.25 us +- 0.08 us -> 2.18 us +- 0.03 us: 1.74x slower
- tuple(range(10)) * 1000: 12.9 us +- 0.2 us -> 20.8 us +- 0.2 us: 1.61x slower
- tuple(range(100)) * 100: 13.0 us +- 0.2 us -> 20.9 us +- 0.1 us: 1.60x slower
- tuple(range(100)) * 1000: 138 us +- 2 us -> 211 us +- 2 us: 1.52x slower
- ("Python", "Perl") * 1000: 3.18 us +- 0.03 us -> 4.69 us +- 0.03 us: 1.47x 
slower
- tuple(range(100)) * 1: 1.52 ms +- 0.02 ms -> 2.21 ms +- 0.02 ms: 1.45x 
slower
- tuple(range(10)) * 1: 148 us +- 7 us -> 211 us +- 2 us: 1.43x slower
- tuple(range(1000)) * 10: 15.9 us +- 0.4 us -> 22.3 us +- 0.2 us: 1.40x slower
- ("Python", "Perl") * 1: 33.8 us +- 0.8 us -> 46.5 us +- 0.6 us: 1.38x 
slower
- ("Python", "Perl") * 100: 428 ns +- 9 ns -> 577 ns +- 6 ns: 1.35x slower
- tuple(range(1000)) * 100: 167 us +- 2 us -> 218 us +- 2 us: 1.30x slower
- tuple(range(10)) * 2: 85.5 ns +- 3.9 ns -> 111 ns +- 3 ns: 1.30x slower
- (None,) * 100: 306 ns +- 4 ns -> 396 ns +- 3 ns: 1.29x slower
- tuple(range(1000)) * 1000: 1.78 ms +- 0.03 ms -> 2.27 ms +- 0.02 ms: 1.27x 
slower
- (None,) * 1000: 2.66 us +- 0.02 us -> 3.25 us +- 0.03 us: 1.22x slower
- tuple(range(10)) * 10: 236 ns +- 17 ns -> 287 ns +- 2 ns: 1.22x slower
- tuple(range(1000)) * 1: 23.2 ms +- 0.4 ms -> 28.1 ms +- 0.3 ms: 1.21x 
slower
- ("Python", "Perl") * 10: 88.6 ns +- 2.7 ns -> 106 ns +- 3 ns: 1.19x slower
- (None,) * 1: 27.5 us +- 0.3 us -> 31.5 us +- 0.2 us: 1.14x slower
- ("Python", "Perl") * 2: 48.9 ns +- 0.6 ns -> 54.7 ns +- 0.8 ns: 1.12x slower
- (None,) * 10: 63.4 ns +- 1.7 ns -> 70.0 ns +- 0.6 ns: 1.10x slower
- (None,) * 2: 46.7 ns +- 0.5 ns -> 49.8 ns +- 0.5 ns: 1.06x slower
- list(range(10)) * 2: 73.5 ns +- 1.2 ns -> 76.6 ns +- 2.3 ns: 1.04x slower
- [None] * 10: 58.1 ns +- 0.7 ns -> 60.3 ns +- 0.7 ns: 1.04x slower
- ["Python", "Perl"] * 2: 56.1 ns +- 0.6 ns -> 57.4 ns +- 0.9 ns: 1.02x slower
- [None] * 2: 51.9 ns +- 0.5 ns -> 52.3 ns +- 0.5 ns: 1.01x slower

Faster (18):
- list(range(100)) * 1: 1.98 ms +- 0.33 ms -> 953 us +- 14 us: 2.08x faster
- list(range(1000)) * 100: 168 us +- 2 us -> 86.8 us +- 0.9 us: 1.93x faster
- list(range(1000)) * 10: 15.9 us +- 0.2 us -> 8.40 us +- 0.13 us: 1.89x faster
- list(range(100)) * 100: 13.1 us +- 0.2 us -> 7.41 us +- 0.09 us: 1.77x faster
- list(range(1000)) * 1000: 1.80 ms +- 0.05 ms -> 1.04 ms +- 0.04 ms: 1.73x 
faster
- list(range(100)) * 1000: 140 us +- 2 us -> 82.2 us +- 4.9 us: 1.70x faster
- list(range(1000)) * 1: 26.0 ms +- 0.6 ms -> 15.4 ms +- 0.3 ms: 1.69x 
faster
- ["Python", "Perl"] * 1: 33.8 us +- 0.3 us -> 21.2 us +- 0.2 us: 1.59x 
faster
- ["Python", "Perl"] * 1000: 3.41 us +- 0.03 us -> 2.17 us +- 0.02 us: 1.57x 
faster
- list(range(10)) * 1000: 12.2 us +- 0.2 us -> 8.11 us +- 0.21 us: 1.51x faster
- list(range(10)) * 1: 136 us +- 2 us -> 91.7 us +- 1.0 us: 1.49x faster
- list(range(10)) * 100: 1.19 us +- 0.10 us -> 806 ns +- 15 ns: 1.47x faster
- list(range(1000)) * 2: 3.05 us +- 0.07 us -> 2.08 us +- 0.04 us: 1.46x faster
- list(range(100)) * 10: 1.17 us +- 0.03 us -> 850 ns +- 16 ns: 1.38x faster
- ["Python", "Perl"] * 100: 440 ns +- 8 ns -> 325 ns +- 5 ns: 1.35x faster
- list(range(10)) * 10: 172 ns +- 2 ns -> 148 ns +- 2 ns: 1.16x faster
- list(range(100)) * 2: 320 ns +- 19 ns -> 289 ns +- 3 ns: 1.11x faster
- ["Python", "Perl"] * 10: 85.0 ns +- 3.1 ns -> 79.7 ns +- 2.4 ns: 1.07x faster

Benchmark hidden because not significant (3): [None] * 100, [None] * 1000, 
[None] * 1

Geometric mean: 1.01x slower

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46235] Do all ref-counting at once for sequence multiplication

2022-01-02 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

# benchmarking script

from pyperf import Runner
runner = Runner()

for n in [2, 10, 100, 10_000]:
for A in [
'[None]',
'["Python", "Perl"]',
'list(range(10))',
'list(range(100))',
'list(range(1000))',
'(None,)',
'("Python", "Perl")',
'tuple(range(10))',
'tuple(range(100))',
'tuple(range(1000))',
]:
runner.timeit(
name=f"{A} * {n}",
setup=f"x = {A}",
stmt=f"x * {n}",
)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46235] Do all ref-counting at once for sequence multiplication

2022-01-02 Thread Dennis Sweeney


Change by Dennis Sweeney :


--
keywords: +patch
pull_requests: +28558
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/30346

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46235] Do all ref-counting at once for sequence multiplication

2022-01-02 Thread Dennis Sweeney


New submission from Dennis Sweeney :

Some benchmarks for this change are below. The case with the largest speedup, 
`[None] * 1`, is the case that I would consider the most important: 
initializing counters of the form `[0] * N` is very common in my experience.

The following were taken from a PGO build on Windows without CPU isolation, so 
take them with a grain of salt:

Slower (16):
- [None] * 2: 61.6 ns +- 3.3 ns -> 71.0 ns +- 1.5 ns: 1.15x slower
- list(range(10)) * 1: 205 us +- 49 us -> 232 us +- 93 us: 1.13x slower
- ["Python", "Perl"] * 1: 31.4 us +- 4.3 us -> 34.4 us +- 0.2 us: 1.10x 
slower
- list(range(100)) * 2: 370 ns +- 12 ns -> 395 ns +- 14 ns: 1.07x slower
- list(range(10)) * 2: 81.5 ns +- 2.8 ns -> 86.9 ns +- 2.4 ns: 1.07x slower
- [None] * 10: 72.3 ns +- 2.2 ns -> 75.8 ns +- 2.1 ns: 1.05x slower
- ["Python", "Perl"] * 100: 442 ns +- 67 ns -> 463 ns +- 4 ns: 1.05x slower
- tuple(range(10)) * 2: 88.2 ns +- 2.9 ns -> 91.9 ns +- 2.7 ns: 1.04x slower
- ("Python", "Perl") * 10: 92.6 ns +- 7.8 ns -> 96.0 ns +- 4.5 ns: 1.04x slower
- ["Python", "Perl"] * 10: 86.1 ns +- 4.2 ns -> 89.1 ns +- 2.7 ns: 1.04x slower
- (None,) * 10: 69.4 ns +- 1.9 ns -> 71.2 ns +- 3.1 ns: 1.03x slower
- ["Python", "Perl"] * 2: 63.9 ns +- 2.6 ns -> 65.3 ns +- 1.4 ns: 1.02x slower
- (None,) * 2: 54.0 ns +- 0.7 ns -> 55.1 ns +- 2.1 ns: 1.02x slower
- ("Python", "Perl") * 2: 57.5 ns +- 3.0 ns -> 58.6 ns +- 2.4 ns: 1.02x slower
- list(range(10)) * 10: 238 ns +- 8 ns -> 242 ns +- 7 ns: 1.02x slower
- tuple(range(100)) * 2: 414 ns +- 19 ns -> 420 ns +- 6 ns: 1.02x slower

Faster (22):
- [None] * 1: 25.1 us +- 0.2 us -> 12.8 us +- 0.2 us: 1.97x faster
- tuple(range(10)) * 1: 255 us +- 91 us -> 176 us +- 77 us: 1.45x faster
- tuple(range(1000)) * 10: 17.8 us +- 0.3 us -> 12.4 us +- 0.1 us: 1.44x faster
- [None] * 100: 355 ns +- 11 ns -> 251 ns +- 7 ns: 1.41x faster
- list(range(1000)) * 10: 16.8 us +- 0.2 us -> 12.5 us +- 0.1 us: 1.34x faster
- tuple(range(100)) * 100: 15.9 us +- 0.2 us -> 12.7 us +- 0.2 us: 1.26x faster
- tuple(range(10)) * 100: 1.63 us +- 0.02 us -> 1.33 us +- 0.01 us: 1.23x faster
- tuple(range(1000)) * 2: 3.61 us +- 0.06 us -> 2.93 us +- 0.03 us: 1.23x faster
- list(range(100)) * 100: 13.8 us +- 0.3 us -> 11.2 us +- 0.1 us: 1.23x faster
- tuple(range(100)) * 10: 1.68 us +- 0.02 us -> 1.40 us +- 0.01 us: 1.20x faster
- tuple(range(1000)) * 100: 240 us +- 87 us -> 200 us +- 102 us: 1.20x faster
- tuple(range(1000)) * 1: 33.8 ms +- 0.8 ms -> 28.3 ms +- 1.5 ms: 1.19x 
faster
- list(range(1000)) * 1: 33.9 ms +- 0.9 ms -> 28.6 ms +- 1.4 ms: 1.19x 
faster
- list(range(100)) * 10: 1.47 us +- 0.10 us -> 1.25 us +- 0.01 us: 1.18x faster
- list(range(10)) * 100: 1.41 us +- 0.02 us -> 1.22 us +- 0.02 us: 1.16x faster
- list(range(100)) * 1: 3.15 ms +- 0.13 ms -> 2.79 ms +- 0.13 ms: 1.13x 
faster
- list(range(1000)) * 2: 3.32 us +- 0.07 us -> 2.94 us +- 0.04 us: 1.13x faster
- tuple(range(100)) * 1: 3.24 ms +- 0.13 ms -> 2.88 ms +- 0.14 ms: 1.13x 
faster
- (None,) * 1: 26.7 us +- 0.2 us -> 24.1 us +- 0.2 us: 1.11x faster
- tuple(range(10)) * 10: 258 ns +- 7 ns -> 249 ns +- 6 ns: 1.04x faster
- ("Python", "Perl") * 100: 438 ns +- 7 ns -> 436 ns +- 7 ns: 1.01x faster
- ("Python", "Perl") * 1: 31.1 us +- 0.3 us -> 30.9 us +- 0.2 us: 1.01x 
faster

Benchmark hidden because not significant (2): list(range(1000)) * 100, (None,) 
* 100

Geometric mean: 1.10x faster

--
components: Interpreter Core
messages: 409546
nosy: Dennis Sweeney
priority: normal
severity: normal
status: open
title: Do all ref-counting at once for sequence multiplication
type: performance
versions: Python 3.11

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com