[issue24076] sum() several times slower on Python 3 64-bit

2021-09-22 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

Always happy to help :)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24076] sum() several times slower on Python 3 64-bit

2021-09-22 Thread Stefan Behnel


Stefan Behnel  added the comment:

Sorry for that, Pablo. I knew exactly where the problem was, the second I read 
your notification. Thank you for resolving it so quickly.

--




[issue24076] sum() several times slower on Python 3 64-bit

2021-09-21 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:


New changeset 1c7e98dc258a0e7ccd2325a1aefc4aa2de51e1c5 by Pablo Galindo Salgado 
in branch 'main':
bpo-24076: Fix reference in sum() introduced by GH-28469 (GH-28493)
https://github.com/python/cpython/commit/1c7e98dc258a0e7ccd2325a1aefc4aa2de51e1c5


--




[issue24076] sum() several times slower on Python 3 64-bit

2021-09-21 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

Sorry, I meant PR 28493

--




[issue24076] sum() several times slower on Python 3 64-bit

2021-09-21 Thread Pablo Galindo Salgado


Pablo Galindo Salgado  added the comment:

Opened #28493 to fix the refleak

--




[issue24076] sum() several times slower on Python 3 64-bit

2021-09-21 Thread Pablo Galindo Salgado


Change by Pablo Galindo Salgado :


--
pull_requests: +26888
pull_request: https://github.com/python/cpython/pull/28493




[issue24076] sum() several times slower on Python 3 64-bit

2021-09-21 Thread Pablo Galindo Salgado

Pablo Galindo Salgado  added the comment:

Unfortunately commit debd80403721b00423680328d6adf160a28fbff4 introduced a 
reference leak:


❯ ./python -m test test_grammar -R :
0:00:00 load avg: 2.96 Run tests sequentially
0:00:00 load avg: 2.96 [1/1] test_grammar
beginning 9 repetitions
123456789
.
test_grammar leaked [12, 12, 12, 12] references, sum=48
test_grammar failed (reference leak)

== Tests result: FAILURE ==

1 test failed:
test_grammar

Total duration: 1.1 sec
Tests result: FAILURE

debd80403721b00423680328d6adf160a28fbff4 is the first bad commit
commit debd80403721b00423680328d6adf160a28fbff4
Author: scoder 
Date:   Tue Sep 21 11:01:18 2021 +0200

bpo-24076: Inline single digit unpacking in the integer fastpath of sum() 
(GH-28469)

 .../Core and Builtins/2021-09-20-10-02-12.bpo-24076.ZFgFSj.rst |  1 +
 Python/bltinmodule.c   | 10 +-
 2 files changed, 10 insertions(+), 1 deletion(-)
 create mode 100644 Misc/NEWS.d/next/Core and 
Builtins/2021-09-20-10-02-12.bpo-24076.ZFgFSj.rst

--
nosy: +pablogsal




[issue24076] sum() several times slower on Python 3 64-bit

2021-09-21 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Thank you again Stefan. Now no doubts are left.

BTW, pyperf gives more stable results. I use it if I have any doubts (either 
the results of timeit are not stable or the difference is less than, say, 10%).
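
The rule of thumb above (distrust a comparison when the timings are unstable or the difference is under roughly 10%) can be sketched with the standard library alone. This is not how pyperf works internally; the helper names `relative_spread` and `is_conclusive` are hypothetical, and comparing the difference against the observed spread is an assumption made for illustration.

```python
import timeit


def relative_spread(timings):
    """Spread of repeated timings relative to the fastest run."""
    fastest = min(timings)
    return (max(timings) - fastest) / fastest


def is_conclusive(old_timings, new_timings, threshold=0.10):
    """Hypothetical check: the best-of-N difference must reach the
    threshold (10% by default) and exceed the noise seen in either run."""
    old_best = min(old_timings)
    new_best = min(new_timings)
    difference = abs(old_best - new_best) / old_best
    noise = max(relative_spread(old_timings), relative_spread(new_timings))
    return difference >= threshold and difference > noise


# Collect repeated timings for one build; a second build would be
# measured the same way and the two lists passed to is_conclusive().
data = list(range(10_000))
timings = timeit.repeat("sum(data)", globals={"data": data},
                        repeat=5, number=200)
print(f"spread across repeats: {relative_spread(timings):.1%}")
```

When a comparison fails a check like this, that is the situation in which switching to pyperf is suggested above.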

--




[issue24076] sum() several times slower on Python 3 64-bit

2021-09-21 Thread Stefan Behnel


Change by Stefan Behnel :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed




[issue24076] sum() several times slower on Python 3 64-bit

2021-09-21 Thread Stefan Behnel


Stefan Behnel  added the comment:

Old, with PGO:
$ ./python -m timeit -s 'd = list(range(2**61, 2**61 + 1))' 'sum(d)'
1000 loops, best of 5: 340 usec per loop
$ ./python -m timeit -s 'd = list(range(2**30, 2**30 + 1))' 'sum(d)'
2000 loops, best of 5: 114 usec per loop
$ ./python -m timeit -s 'd = list(range(2**29, 2**29 + 1))' 'sum(d)'
5000 loops, best of 5: 73.4 usec per loop
$ ./python -m timeit -s 'd = list(range(1))' 'sum(d)'
5000 loops, best of 5: 73.3 usec per loop
$ ./python -m timeit -s 'd = [0] * 1' 'sum(d)'
5000 loops, best of 5: 78.7 usec per loop


New, with PGO:
$ ./python -m timeit -s 'd = list(range(2**61, 2**61 + 1))' 'sum(d)'
1000 loops, best of 5: 305 usec per loop
$ ./python -m timeit -s 'd = list(range(2**30, 2**30 + 1))' 'sum(d)'
2000 loops, best of 5: 115 usec per loop
$ ./python -m timeit -s 'd = list(range(2**29, 2**29 + 1))' 'sum(d)'
5000 loops, best of 5: 52.4 usec per loop
$ ./python -m timeit -s 'd = list(range(1))' 'sum(d)'
5000 loops, best of 5: 54 usec per loop
$ ./python -m timeit -s 'd = [0] * 1' 'sum(d)'
5000 loops, best of 5: 45.8 usec per loop

The results are a bit more mixed with PGO optimisation (I tried a couple of 
times), not sure why. Might just be normal fluctuation, bad benchmark value 
selection, or accidental PGO tuning, can't say. In any case, the 1-digit case 
(1, 2**29) is again about 28% faster and none of the other cases seems 
(visibly) slower.

I think this is a very clear net-win.
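
The benchmark start values (1, 2**29, 2**30, 2**61) appear to be chosen to sit on either side of the internal digit boundaries. A small sketch of that arithmetic, assuming 30-bit digits (the `PyLong_SHIFT` of typical 64-bit builds; `sys.int_info.bits_per_digit` reports the real value for a given interpreter). The helper `digit_count` is a hypothetical name.

```python
import sys


def digit_count(value, bits_per_digit=30):
    """Number of internal digits needed for abs(value).

    bits_per_digit=30 is an assumption; zero needs no digits at all.
    """
    bits = abs(value).bit_length()
    return (bits + bits_per_digit - 1) // bits_per_digit


print("this interpreter uses", sys.int_info.bits_per_digit, "bits per digit")
for label, value in [("1", 1), ("2**29", 2**29),
                     ("2**30", 2**30), ("2**61", 2**61)]:
    print(f"{label:>6}: {digit_count(value)} digit(s)")
```

Under that assumption only the first two values take the single-digit path that the patch inlines, which matches where the timings improved.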

--




[issue24076] sum() several times slower on Python 3 64-bit

2021-09-21 Thread Stefan Behnel


Stefan Behnel  added the comment:

Hmm, thanks for insisting, Serhiy. I was accidentally using a debug build this 
time. I'll make a PGO build and rerun the microbenchmarks.

--




[issue24076] sum() several times slower on Python 3 64-bit

2021-09-21 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Thank you. Could you please test PGO builds?

--




[issue24076] sum() several times slower on Python 3 64-bit

2021-09-21 Thread Stefan Behnel


Stefan Behnel  added the comment:

Original:
$ ./python -m timeit -s 'd = list(range(2**61, 2**61 + 1))' 'sum(d)'
500 loops, best of 5: 712 usec per loop
$ ./python -m timeit -s 'd = list(range(2**30, 2**30 + 1))' 'sum(d)'
2000 loops, best of 5: 149 usec per loop
$ ./python -m timeit -s 'd = list(range(2**29, 2**29 + 1))' 'sum(d)'
2000 loops, best of 5: 107 usec per loop
$ ./python -m timeit -s 'd = list(range(1))' 'sum(d)'
2000 loops, best of 5: 107 usec per loop

New:
$ ./python -m timeit -s 'd = list(range(2**61, 2**61 + 1))' 'sum(d)'
500 loops, best of 5: 713 usec per loop
$ ./python -m timeit -s 'd = list(range(2**30, 2**30 + 1))' 'sum(d)'
2000 loops, best of 5: 148 usec per loop
$ ./python -m timeit -s 'd = list(range(2**29, 2**29 + 1))' 'sum(d)'
5000 loops, best of 5: 77.4 usec per loop
$ ./python -m timeit -s 'd = list(range(1))' 'sum(d)'
5000 loops, best of 5: 77.2 usec per loop

Seems to be 28% faster for the single digit case and exactly as fast as before 
with larger integers.
Note that these are not PGO builds.
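
For reference, the "28% faster" figure corresponds to the reduction in time per loop in the single-digit rows (107 usec before, 77.4 usec after). A short sketch of the arithmetic; the function names are hypothetical.

```python
def time_reduction(old, new):
    """Fraction of the original time per loop that was saved."""
    return (old - new) / old


def throughput_gain(old, new):
    """Fractional increase in loops per unit of time."""
    return old / new - 1.0


# Single-digit case from the non-PGO timings above, in usec per loop.
old_usec, new_usec = 107.0, 77.4
print(f"time per loop reduced by {time_reduction(old_usec, new_usec):.1%}")
print(f"throughput increased by {throughput_gain(old_usec, new_usec):.1%}")
```

Read as a time reduction the change is about 28%; read as throughput it is about 38%, so the two conventions should not be mixed when comparing results.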

--




[issue24076] sum() several times slower on Python 3 64-bit

2021-09-21 Thread Stefan Behnel


Change by Stefan Behnel :


--
Removed message: https://bugs.python.org/msg402301




[issue24076] sum() several times slower on Python 3 64-bit

2021-09-21 Thread Stefan Behnel


Stefan Behnel  added the comment:

Original:
$ ./python -m timeit -s 'd = list(range(2**61, 2**61 + 1))' 'sum(d)'
500 loops, best of 5: 712 usec per loop
$ ./python -m timeit -s 'd = list(range(2**30, 2**30 + 1))' 'sum(d)'
2000 loops, best of 5: 149 usec per loop
$ ./python -m timeit -s 'd = list(range(2**29, 2**29 + 1))' 'sum(d)'
2000 loops, best of 5: 107 usec per loop
$ ./python -m timeit -s 'd = list(range(1))' 'sum(d)'
2000 loops, best of 5: 107 usec per loop

New:
$ ./python -m timeit -s 'd = list(range(2**61, 2**61 + 1))' 'sum(d)'
500 loops, best of 5: 713 usec per loop
$ ./python -m timeit -s 'd = list(range(2**30, 2**30 + 1))' 'sum(d)'
2000 loops, best of 5: 148 usec per loop
$ ./python -m timeit -s 'd = list(range(2**29, 2**29 + 1))' 'sum(d)'
5000 loops, best of 5: 77.4 usec per loop
$ ./python -m timeit -s 'd = list(range(1))' 'sum(d)'
5000 loops, best of 5: 77.2 usec per loop

Seems to be 28% faster for the single digit case and exactly as fast as before 
with larger integers.
Note that these are not PGO builds.

--




[issue24076] sum() several times slower on Python 3 64-bit

2021-09-21 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

What are microbenchmark results for PR 28469 in comparison with the baseline?

--




[issue24076] sum() several times slower on Python 3 64-bit

2021-09-21 Thread Stefan Behnel


Stefan Behnel  added the comment:


New changeset debd80403721b00423680328d6adf160a28fbff4 by scoder in branch 
'main':
bpo-24076: Inline single digit unpacking in the integer fastpath of sum() 
(GH-28469)
https://github.com/python/cpython/commit/debd80403721b00423680328d6adf160a28fbff4


--




[issue24076] sum() several times slower on Python 3 64-bit

2021-09-20 Thread Guido van Rossum


Guido van Rossum  added the comment:

Sounds good, you have my blessing.

--




[issue24076] sum() several times slower on Python 3 64-bit

2021-09-20 Thread Stefan Behnel


Stefan Behnel  added the comment:

> The patch looks fine, but it looks a bit like benchmark chasing. Is the speed 
> of builtin sum() of a sequence of integers important enough to do this bit of 
> inlining?

Given that we already accepted essentially separate loops for the int, float 
and everything else cases, I think the answer is that it doesn't add much to 
the triplication.


> It may break if we change the internals of Py_Long, as Mark Shannon has been 
> wanting to do for a while

I would assume that such a structural change would come with suitable macros to 
unpack the special 0-2 digit integers. Those would then apply here, too. As it 
stands, there are already some modules distributed over the source tree that 
use direct digit access: ceval.c, _decimal.c, marshal.c. They are easy to find 
with grep and my PR just adds one more.

--




[issue24076] sum() several times slower on Python 3 64-bit

2021-09-20 Thread Guido van Rossum


Guido van Rossum  added the comment:

The patch looks fine, but it looks a bit like benchmark chasing. Is the speed 
of builtin sum() of a sequence of integers important enough to do this bit of 
inlining? (It may break if we change the internals of Py_Long, as Mark Shannon 
has been wanting to do for a while -- see 
https://github.com/faster-cpython/ideas/issues/42.)

--




[issue24076] sum() several times slower on Python 3 64-bit

2021-09-20 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

> I created a PR from my last patch, inlining the unpacking 
> of single digit integers. 

Thanks, that gets to the heart of the issue.

I marked the PR as approved (though there is a small coding nit you may want to 
fix).

--




[issue24076] sum() several times slower on Python 3 64-bit

2021-09-20 Thread Stefan Behnel


Stefan Behnel  added the comment:

I created a PR from my last patch, inlining the unpacking of single digit 
integers. Since most integers should fit into a single digit these days, this 
is as fast a path as it gets.

https://github.com/python/cpython/pull/28469

--
versions: +Python 3.11 -Python 3.7




[issue24076] sum() several times slower on Python 3 64-bit

2021-09-20 Thread Stefan Behnel


Change by Stefan Behnel :


--
pull_requests: +26868
stage: needs patch -> patch review
pull_request: https://github.com/python/cpython/pull/28469




[issue24076] sum() several times slower on Python 3 64-bit

2021-09-20 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

> OTOH on my Mac I find that 3.10 with PGO is still 
> more than twice as slow as 2.7.

> Thinking about it that's a bit odd, since (presumably) 
> the majority of the work in sum() involves a long int result 
> (even though the values returned by range() all fit in 30 bits, 
> the sum quickly exceeds that).

The actual accumulation of a long int result is still as fast as it ever was.

The main difference from Py2.7 isn't the addition; it is that detecting and 
extracting a small int addend has become expensive.


-- Python 2 fastpath --

if (PyInt_CheckExact(item)) {                   // Very cheap
    long b = PyInt_AS_LONG(item);               // Very cheap
    long x = i_result + b;                      // Very cheap
    if ((x^i_result) >= 0 || (x^b) >= 0) {      // Semi cheap
        i_result = x;                           // Zero cost
        Py_DECREF(item);                        // Most expensive step, but still cheap
        continue;
    }
}

-- Python 3 fastpath --

if (PyLong_CheckExact(item) || PyBool_Check(item)) {        // Cheap
    long b = PyLong_AsLongAndOverflow(item, &overflow);     // Super expensive
    if (overflow == 0 &&                                    // Branch predictable test
        (i_result >= 0 ? (b <= LONG_MAX - i_result)         // Slower but better test
                       : (b >= LONG_MIN - i_result)))
    {
        i_result += b;                                      // Very cheap
        Py_DECREF(item);
        continue;
    }
}

-- Supporting function --

long
PyLong_AsLongAndOverflow(PyObject *vv, int *overflow)   // OMG, this does a lot of work
{
    /* This version by Tim Peters */
    PyLongObject *v;
    unsigned long x, prev;
    long res;
    Py_ssize_t i;
    int sign;
    int do_decref = 0; /* if PyNumber_Index was called */

    *overflow = 0;
    if (vv == NULL) {
        PyErr_BadInternalCall();
        return -1;
    }

    if (PyLong_Check(vv)) {
        v = (PyLongObject *)vv;
    }
    else {
        v = (PyLongObject *)_PyNumber_Index(vv);
        if (v == NULL)
            return -1;
        do_decref = 1;
    }

    res = -1;
    i = Py_SIZE(v);

    switch (i) {
    case -1:
        res = -(sdigit)v->ob_digit[0];
        break;
    case 0:
        res = 0;
        break;
    case 1:
        res = v->ob_digit[0];
        break;
    default:
        sign = 1;
        x = 0;
        if (i < 0) {
            sign = -1;
            i = -(i);
        }
        while (--i >= 0) {
            prev = x;
            x = (x << PyLong_SHIFT) | v->ob_digit[i];
            if ((x >> PyLong_SHIFT) != prev) {
                *overflow = sign;
                goto exit;
            }
        }
        /* Haven't lost any bits, but casting to long requires extra
         * care (see comment above).
         */
        if (x <= (unsigned long)LONG_MAX) {
            res = (long)x * sign;
        }
        else if (sign < 0 && x == PY_ABS_LONG_MIN) {
            res = LONG_MIN;
        }
        else {
            *overflow = sign;
            /* res is already set to -1 */
        }
    }
  exit:
    if (do_decref) {
        Py_DECREF(v);
    }
    return res;
}
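
The two fast paths above also differ in how they detect overflow of the C accumulator. The sketch below models both tests in Python to show that they accept and reject the same additions. It assumes a 64-bit C `long` and two's-complement wraparound for the Python 2 style test; signed overflow is undefined behaviour in C, which is presumably why the newer test is labelled "better". All names here are hypothetical.

```python
LONG_BITS = 64                      # assumption: 64-bit C long
LONG_MAX = 2 ** (LONG_BITS - 1) - 1
LONG_MIN = -(2 ** (LONG_BITS - 1))


def wrap_to_long(value):
    """Wrap an arbitrary Python int the way two's-complement addition would."""
    return (value - LONG_MIN) % 2 ** LONG_BITS + LONG_MIN


def py2_style_fits(i_result, b):
    """Add first, then check the sign bits of the wrapped result."""
    x = wrap_to_long(i_result + b)
    return (x ^ i_result) >= 0 or (x ^ b) >= 0


def py3_style_fits(i_result, b):
    """Check the remaining headroom before adding, so nothing overflows."""
    if i_result >= 0:
        return b <= LONG_MAX - i_result
    return b >= LONG_MIN - i_result


for i_result, b in [(5, 7), (LONG_MAX, 1), (LONG_MIN, -1), (LONG_MAX, -1)]:
    print(i_result, b, py2_style_fits(i_result, b), py3_style_fits(i_result, b))
```

Neither test is the expensive part of the loop; the cost discussed above lies in extracting `b` from the PyLong in the first place.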

--




[issue24076] sum() several times slower on Python 3 64-bit

2021-09-18 Thread Guido van Rossum


Guido van Rossum  added the comment:

@Stefan
> FWIW, a PGO build of Py3.7 is now about 20% *faster* here than my Ubuntu 
> 16.04 system Python 2.7

Does that mean we can close this issue? Or do I misunderstand what you are 
comparing? 32 vs. 64 bits? PGO vs. non-PGO?

OTOH on my Mac I find that 3.10 with PGO is still more than twice as slow as 
2.7.

Thinking about it that's a bit odd, since (presumably) the majority of the work 
in sum() involves a long int result (even though the values returned by range() 
all fit in 30 bits, the sum quickly exceeds that).

(earlier)
> I suspect that adding a free-list for single-digit PyLong objects (the most 
> common case) would provide some visible benefit.

If my theory is correct that wouldn't help this particular case, right?

FWIW just "for i in [x]range(15, 10**9, 15): pass" is about the same speed in 
Python 2.7 as in 3.11.
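
On the question of how quickly the running total outgrows 30 bits: a rough sketch, assuming the benchmark range `range(15, 10**9, 15)` quoted in this message and a 64-bit C `long` for the accumulator. The helper name `terms_until_at_least` is hypothetical.

```python
LONG_MAX = 2 ** 63 - 1          # assumption: 64-bit C long
ONE_DIGIT_LIMIT = 2 ** 30       # assumption: 30-bit PyLong digits

values = range(15, 10 ** 9, 15)


def terms_until_at_least(limit, seq):
    """Count how many leading terms are summed before the total reaches limit."""
    total = 0
    for count, value in enumerate(seq, start=1):
        total += value
        if total >= limit:
            return count
    return None


# Closed form for an arithmetic series: (first + last) * n / 2.
full_total = (values[0] + values[-1]) * len(values) // 2

print("terms in the range:", len(values))
print("terms until the total needs more than one digit:",
      terms_until_at_least(ONE_DIGIT_LIMIT, values))
print("final total fits in a 64-bit long:", full_total <= LONG_MAX)
```

Under those assumptions the total leaves the single-digit range early on, yet the final result still fits in a 64-bit `long`, so the accumulator itself need not become a multi-digit PyLong for this input.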

--
nosy: +gvanrossum




[issue24076] sum() several times slower on Python 3 64-bit

2018-08-12 Thread Antoine Pitrou


Change by Antoine Pitrou :


--
nosy:  -pitrou




[issue24076] sum() several times slower on Python 3 64-bit

2018-08-12 Thread Stefan Behnel


Stefan Behnel  added the comment:

FWIW, a PGO build of Py3.7 is now about 20% *faster* here than my Ubuntu 16.04 
system Python 2.7, and for some (probably unrelated) reason, the system Python 
3.5 is another 2% faster on my side.

IMHO, the only other thing that seems obvious to try would be to inline the 
unpacking of single digit PyLongs into sum(). I attached a simple patch that 
does that, in case someone wants to test it out. For non-PGO builds, it's about 
17% faster for me. Didn't take the time to benchmark PGO builds with it.

--
versions:  -Python 3.5
Added file: https://bugs.python.org/file47748/unpack_single_digits.patch




[issue24076] sum() several times slower on Python 3 64-bit

2017-04-11 Thread Louie Lu

Changes by Louie Lu :


--
title: sum() several times slower on Python 3 -> sum() several times slower on 
Python 3 64-bit
