[issue46406] optimize int division

2022-01-24 Thread Tim Peters


Tim Peters  added the comment:


New changeset 7c26472d09548905d8c158b26b6a2b12de6cdc32 by Tim Peters in branch 
'main':
bpo-46504: faster code for trial quotient in x_divrem() (GH-30856)
https://github.com/python/cpython/commit/7c26472d09548905d8c158b26b6a2b12de6cdc32


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46406] optimize int division

2022-01-24 Thread Tim Peters


Change by Tim Peters :


--
pull_requests: +29038
pull_request: https://github.com/python/cpython/pull/30856




[issue46406] optimize int division

2022-01-23 Thread Mark Dickinson


Change by Mark Dickinson :


--
stage: patch review -> resolved
status: open -> closed




[issue46406] optimize int division

2022-01-23 Thread Mark Dickinson


Mark Dickinson  added the comment:


New changeset c7f20f1cc8c20654e5d539552604362feb9b0512 by Gregory P. Smith in 
branch 'main':
bpo-46406: Faster single digit int division. (#30626)
https://github.com/python/cpython/commit/c7f20f1cc8c20654e5d539552604362feb9b0512


--




[issue46406] optimize int division

2022-01-16 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
nosy: +rhettinger, tim.peters




[issue46406] optimize int division

2022-01-16 Thread Gregory P. Smith


Gregory P. Smith  added the comment:

I tested my PR branch on 32-bit arm (raspbian bullseye) and the microbenchmark 
timing shows no change (within the noise across repeated runs).  That is 
unsurprising, as division is entirely different on 32-bit arm.

Raspbian uses armv6 for compatibility with the original rpi and rpi0.  armv6 
does not have an integer division instruction (how RISCy of it).  But that 
doesn't make a difference in this code, as the final 32-bit arm ISA, armv7-a, 
only has a 32:32 divider.  (armv8, aka aarch64, is 64-bit and uses UDIV, as one 
would expect.)

Anyway, that satisfies me that it isn't making anything worse elsewhere.

--




[issue46406] optimize int division

2022-01-16 Thread Gregory P. Smith


Change by Gregory P. Smith :


--
nosy: +mark.dickinson




[issue46406] optimize int division

2022-01-16 Thread Gregory P. Smith


Gregory P. Smith  added the comment:

The PR was directly inspired by Mark Dickinson's code in the email thread, 
which used __asm__ to get the instruction he wanted.  There is usually a 
way to make the compiler actually do what you intend.  This appears to be it.
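
For readers unfamiliar with the operation being tuned, here is a minimal Python model of dividing a multi-digit integer by a single digit. It is only a sketch of the schoolbook technique, not the C code from the PR: the 30-bit digit width and the names `SHIFT`, `MASK`, `to_digits`, `from_digits` and `divrem1` are assumptions chosen for illustration.

```python
SHIFT = 30                  # assumed digit width in bits (illustrative)
MASK = (1 << SHIFT) - 1     # largest value a single digit can hold


def to_digits(x):
    """Split a non-negative int into base-2**SHIFT digits, least significant first."""
    digits = []
    while True:
        digits.append(x & MASK)
        x >>= SHIFT
        if x == 0:
            return digits


def from_digits(digits):
    """Rebuild an int from digits stored least significant first."""
    x = 0
    for d in reversed(digits):
        x = (x << SHIFT) | d
    return x


def divrem1(digits, n):
    """Divide a digit list by the single digit n.

    Returns (quotient digits, remainder). Works from the most significant
    digit down, carrying the remainder into the next step.
    """
    assert 0 < n <= MASK
    out = [0] * len(digits)
    rem = 0
    for i in reversed(range(len(digits))):
        # rem < n <= MASK, so this two-digit dividend stays below 2**60
        # and the quotient digit stays below 2**SHIFT.
        dividend = (rem << SHIFT) | digits[i]
        out[i], rem = divmod(dividend, n)
    return out, rem
```

Each loop step performs one division of a two-digit dividend by a one-digit divisor, which is the step a C compiler can map onto a single hardware divide when the code is written so that it recognizes the pattern.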

Interestingly, experimenting on godbolt.org with small code snippets rather 
than the entire longobject.c to check various compilers' output does not always 
yield as nice a result.  (clang 11+ showed promise there, but this change 
benefits gcc equally well in real-world CPython microbenchmark timeit 
tests.)  https://godbolt.org/z/63eWPczjx was my playground code.

```
$ ./b-clang13/python -m timeit -n 150 -s 'x = 10**1000; r=x//10; assert r == 10**999, r' 'x//17'
150 loops, best of 5: 450 nsec per loop
$ ./b-clang13-new-basic-divrem1/python -m timeit -n 150 -s 'x = 10**1000; r=x//10; assert r == 10**999, r' 'x//17'
150 loops, best of 5: 375 nsec per loop
$ ./b-gcc9/python -m timeit -n 150 -s 'x = 10**1000; r=x//10; assert r == 10**999, r' 'x//17'
150 loops, best of 5: 448 nsec per loop
$ ./b-gcc9-new-basic-divrem1/python -m timeit -n 150 -s 'x = 10**1000; r=x//10; assert r == 10**999, r' 'x//17'
150 loops, best of 5: 370 nsec per loop
```

That's on an AMD zen3 (x86_64).  I also tested with other divisors; 17 is not 
specialized by the compiler.  These were not --enable-optimizations builds, 
though the results on those remain similar for non-specialized divisors (x//10 
becomes a specialized case when using -fprofile-values on gcc9).
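
The same measurement can also be driven from inside Python with the standard `timeit` module, which makes it easy to loop over several divisors. This is a rough sketch that mirrors the command-line options used above; the helper name `time_floor_div` and the choice of divisors are illustrative assumptions, and the absolute numbers will vary by machine and build.

```python
import timeit


def time_floor_div(divisor, loops=150, repeat=5):
    """Return the best per-loop time in nanoseconds for x // divisor.

    Mirrors `python -m timeit -n 150` with its default of 5 repeats.
    """
    timer = timeit.Timer(
        stmt=f"x // {divisor}",
        setup="x = 10**1000; r = x // 10; assert r == 10**999, r",
    )
    # repeat() returns the total time for `loops` executions, once per repeat.
    best_total = min(timer.repeat(repeat=repeat, number=loops))
    return best_total / loops * 1e9


# Illustrative divisors; 17 matches the run shown above.
for d in (3, 10, 17):
    print(f"x//{d}: {time_floor_div(d):.0f} nsec per loop")
```

Running this script under each build being compared gives per-divisor timings that can be set side by side.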

Performance tests using other architectures forthcoming.

A pyperformance suite run on a benchmark-stable host is worthwhile. I don't 
actually expect this to show up as significant in most things there; we'll see.

The new code is not any more difficult to maintain than the previous code 
regardless.

--




[issue46406] optimize int division

2022-01-16 Thread Gregory P. Smith


Change by Gregory P. Smith :


--
keywords: +patch
pull_requests: +28829
stage: needs patch -> patch review
pull_request: https://github.com/python/cpython/pull/30626




[issue46406] optimize int division

2022-01-16 Thread Gregory P. Smith


New submission from Gregory P. Smith :

Based on a python-dev thread, we've come up with faster int division code for 
CPython's bignums.

https://mail.python.org/archives/list/python-...@python.org/thread/ZICIMX5VFCX4IOFH5NUPVHCUJCQ4Q7QM/#NEUNFZU3TQU4CPTYZNF3WCN7DOJBBTK5

Filing this issue for starters to attach a PR to.  Details forthcoming.

--
messages: 410735
nosy: gregory.p.smith
priority: normal
severity: normal
stage: needs patch
status: open
title: optimize int division
type: performance
versions: Python 3.11
