[issue44376] Improve performance of integer exponentiation

2021-06-12 Thread Tim Peters


Tim Peters  added the comment:

Closing this now because the pull request did, I believe, all that can be done 
at the function level. Exponents of 1 and 2 are well within a factor of 2 of 
repeated multiplication now, and it's essentially a tie at exponent 3 now. 
Above that, pow() wins now. On my box.

Doing better would require a smarter compiler, which, e.g., knew that `pow(x, 
2)` is the same as `x*x`. But, as is, `pow` is just another identifier to 
CPython's compiler, and may refer to any code at all. `i**2` isn't really much 
better, because CPython just redirects to type(i)'s __pow__ function at 
runtime. Which, again to the compiler, may refer to any code at all.

`pow()` is quite an involved function, needing to cater to all sorts of things, 
including possible reduction by the optional modulus, and possibly negative 
exponents.

`pow(i, 2)` (same as `i**2` under the covers) does exactly one Python-int 
multiplication now, same as `i*i`. That's fast. In either case overheads 
account for the bulk of the elapsed time. The overhead of going around the eval 
loop an "extra" time (in `i*i`) and doing another name lookup is simply smaller 
than all the overheads `pow()` incurs to _deduce_, at runtime, that it's only 
being asked to do one multiply.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44376] Improve performance of integer exponentiation

2021-06-12 Thread Tim Peters


Tim Peters  added the comment:


New changeset 9d8dd8f08aae4ad6e73a9322a4e9dee965afebbc by Tim Peters in branch 
'main':
bpo-44376 - reduce pow() overhead for small exponents (GH-26662)
https://github.com/python/cpython/commit/9d8dd8f08aae4ad6e73a9322a4e9dee965afebbc


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44376] Improve performance of integer exponentiation

2021-06-10 Thread Tim Peters


Tim Peters  added the comment:

This is a stab at reducing overhead for small exponents, along the lines I 
sketched:

https://github.com/python/cpython/pull/26662

Unfortunately, I've been unable to convince BPO and GitHub to recognize that 
the PR is related to this report. Did something basic change?

Incidentally, at first this change triggered rare shutdown deaths due to 
negative refcounts, in the collection of small integer objects. That was a 
head-scratcher! Turns that was, I believe, due to a "temp = NULL" line missing 
from earlier code introduced to implement powers of modular inverses.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44376] Improve performance of integer exponentiation

2021-06-10 Thread Tim Peters


Change by Tim Peters :


--
keywords: +patch
pull_requests: +25248
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/26662

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44376] Improve performance of integer exponentiation

2021-06-10 Thread Tim Peters


Tim Peters  added the comment:

Under the released 3.9.5 for 64-bit Win10, raising to the power 2 is clearly 
much slower than multiplying directly:

C:\Windows\System32>py -3 -m timeit -s "x=151" "x*x"
1000 loops, best of 5: 30 nsec per loop

C:\Windows\System32>py -3 -m timeit -s "x=151" "x**2"
100 loops, best of 5: 194 nsec per loop

Since the multiplication itself is cheap, overheads must account for it. 
Offhand, looks to me like the `x**2` spelling is actually doing 31 
multiplications under the covers, although most of them are as cheap as 
Python-int multiplies get.

for (i = Py_SIZE(b) - 1; i >= 0; --i) {
digit bi = b->ob_digit[i];

for (j = (digit)1 << (PyLong_SHIFT-1); j != 0; j >>= 1) {
MULT(z, z, z);
if (bi & j)
MULT(z, a, z);
}
}

Python ints on a 64-bit box are stored internally in base 2**30 (PyLong_SHIFT 
is 30). z starts life at 1. The first 28 trips through the loop are chewing up 
the 28 leading zero bits in exponent 2, so MULT(z, z, z) multiplies 1 by 1 to 
get 1, 28 times. Then again on the 29th iteration, but now "bi & j" is finally 
true (we finally found the leading one bit in exponent 2), so z is replaced by 
1 times the base = the base. On the final, 30th, iteration, MULT(z, z, z) 
replaces z with its square, and we're done.

It would probably be worthwhile to add code special-casing the leading Python 
"digit" of the exponent, fast-forwarding without any multiplies to the leading 
one bit, and setting z directly to the base then.

--
nosy: +tim.peters

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44376] Improve performance of integer exponentiation

2021-06-10 Thread Mark Dickinson


Mark Dickinson  added the comment:

I can't reproduce this on my Mac laptop (using Python builds from MacPorts). 
Numbers for both x**2 and x*x are fairly stable across Python 3.2 to Python 
3.10. There's some variation, but nothing close to the same extent that Steven 
is seeing.

Here are my raw numbers:

lovelace:cpython mdickinson$ /opt/local/bin/python3.2 -m timeit -s "x=115" "x*x"
1000 loops, best of 3: 0.031 usec per loop
lovelace:cpython mdickinson$ /opt/local/bin/python3.3 -m timeit -s "x=115" "x*x"
1000 loops, best of 3: 0.0297 usec per loop
lovelace:cpython mdickinson$ /opt/local/bin/python3.4 -m timeit -s "x=115" "x*x"
1000 loops, best of 3: 0.0286 usec per loop
lovelace:cpython mdickinson$ /opt/local/bin/python3.5 -m timeit -s "x=115" "x*x"
1000 loops, best of 3: 0.03 usec per loop
lovelace:cpython mdickinson$ /opt/local/bin/python3.6 -m timeit -s "x=115" "x*x"
1000 loops, best of 3: 0.0312 usec per loop
lovelace:cpython mdickinson$ /opt/local/bin/python3.7 -m timeit -s "x=115" "x*x"
1000 loops, best of 5: 28.7 nsec per loop
lovelace:cpython mdickinson$ /opt/local/bin/python3.8 -m timeit -s "x=115" "x*x"
1000 loops, best of 5: 32 nsec per loop
lovelace:cpython mdickinson$ /opt/local/bin/python3.9 -m timeit -s "x=115" "x*x"
1000 loops, best of 5: 33.5 nsec per loop
lovelace:cpython mdickinson$ /opt/local/bin/python3.10 -m timeit -s "x=115" 
"x*x"
1000 loops, best of 5: 32.3 nsec per loop
lovelace:cpython mdickinson$ /opt/local/bin/python3.2 -m timeit -s "x=115" 
"x**2"
100 loops, best of 3: 0.249 usec per loop
lovelace:cpython mdickinson$ /opt/local/bin/python3.3 -m timeit -s "x=115" 
"x**2"
100 loops, best of 3: 0.224 usec per loop
lovelace:cpython mdickinson$ /opt/local/bin/python3.4 -m timeit -s "x=115" 
"x**2"
100 loops, best of 3: 0.221 usec per loop
lovelace:cpython mdickinson$ /opt/local/bin/python3.5 -m timeit -s "x=115" 
"x**2"
100 loops, best of 3: 0.213 usec per loop
lovelace:cpython mdickinson$ /opt/local/bin/python3.6 -m timeit -s "x=115" 
"x**2"
100 loops, best of 3: 0.235 usec per loop
lovelace:cpython mdickinson$ /opt/local/bin/python3.7 -m timeit -s "x=115" 
"x**2"
100 loops, best of 5: 204 nsec per loop
lovelace:cpython mdickinson$ /opt/local/bin/python3.8 -m timeit -s "x=115" 
"x**2"
100 loops, best of 5: 217 nsec per loop
lovelace:cpython mdickinson$ /opt/local/bin/python3.9 -m timeit -s "x=115" 
"x**2"
100 loops, best of 5: 245 nsec per loop
lovelace:cpython mdickinson$ /opt/local/bin/python3.10 -m timeit -s "x=115" 
"x**2"
100 loops, best of 5: 230 nsec per loop

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44376] Improve performance of integer exponentiation

2021-06-10 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Is it because exponentiation becomes slower or because multiplication becomes 
faster? What are absolute numbers?

--
nosy: +serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44376] Improve performance of integer exponentiation

2021-06-10 Thread Karthikeyan Singaravelan


Change by Karthikeyan Singaravelan :


--
nosy: +mark.dickinson

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44376] Improve performance of integer exponentiation

2021-06-10 Thread Steven D'Aprano


New submission from Steven D'Aprano :

Naively, I assumed that `x**2` would be faster than `x*x` as there is only one 
name lookup in the first, and two in the second. But it is slower.

The performance of `x**2` relative to `x*x` has gradually deteriorated compared 
to `x*x` over many versions.

I have found the ratio of `x**2` to `x*x` using timeit:

pythonX.Y -m timeit -s "x=115" "x**2"
pythonX.Y -m timeit -s "x=115" "x*x"

for various X.Y:

2.4: 1.1  # ratio of time for x**2 / time for x*x
2.5: 1.5
2.6: 1.0
2.7: 1.6
3.2: 4.2
3.3: 4.2
3.5: 3.8
3.7: 5.9
3.9: 7.3


In the 2.x series, performance was quite close. In 3.x, the ratio has increased 
significantly. Either integer multiplication has gotten much faster, or 
exponentiation much slower, or both.

Shockingly (to me at least), an exponent of 1 is an order of magnitude slower 
than an multiplicand of 1:

2.7: 1.3  # ratio of time for x**1 / time for x*1
3.9: 10.2


Even an exponent of 10 is a little slower than repeated multiplication in 3.9:

`x*x*x*x*x*x*x*x*x*x` is slightly faster than `x**10`.


It would be nice if we could improve the performance of exponentiation.


(OS: Linux)

--
components: Interpreter Core
messages: 395527
nosy: steven.daprano
priority: normal
severity: normal
status: open
title: Improve performance of integer exponentiation
type: performance
versions: Python 3.11

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com