[issue46558] Quadratic time internal base conversions

2022-01-30 Thread Tim Peters


Tim Peters  added the comment:

> todecstr treats it as an "input" conversion instead, ...

Worth pointing this out since it doesn't seem widely known: "input" base 
conversions are _generally_ faster than "output" ones. Working in the 
destination base (or a power of it) is generally simpler.

In the math.factorial(1_000_000) example, it takes CPython more than 3x longer 
for str() to convert it to base 10 than for int() to reconstruct the bigint 
from that string. Not an O() thing (they're both quadratic time in CPython 
today).
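
A quick way to reproduce the comparison (a rough sketch, not code from the
original message; absolute times and exact ratios vary with machine and
CPython version):

    import math
    import sys
    import time

    if hasattr(sys, "set_int_max_str_digits"):
        sys.set_int_max_str_digits(0)      # lift the int<->str digit cap on newer CPythons

    n = math.factorial(1_000_000)          # about 5.5 million decimal digits

    t0 = time.perf_counter()
    s = str(n)                             # "output" conversion: int -> decimal string
    t1 = time.perf_counter()
    m = int(s)                             # "input" conversion: decimal string -> int
    t2 = time.perf_counter()

    assert m == n
    print("str():", round(t1 - t0, 1), "seconds;  int():", round(t2 - t1, 1), "seconds")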

--




[issue46558] Quadratic time internal base conversions

2022-01-30 Thread Tim Peters


Tim Peters  added the comment:

The factorial of a million is much smaller than the case I was looking at. Here 
are rough timings on my box, for computing the decimal string from the bigint 
(and, yes, they all return the same string):

native:    475    seconds (about 8 minutes)
numeral:    22.3  seconds
todecstr:    4.10 seconds
gmp:         0.74 seconds

"They recursively split the bigint into halves using % 10^n at each recursion 
step". That's the standard trick for "output" conversions. Beyond that, there 
are different ways to try to use "fat" multiplications instead of division. The 
recursive splitting all on its own can help, but dramatic speedups need 
dramatically faster multiplication.
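
For concreteness, a minimal sketch of that standard trick (my illustration, not
V8's or CPython's code): split on a power of 10 near the middle, recurse on both
halves, and zero-pad the low half.

    def int_to_dec(n):
        # Divide-and-conquer "output" conversion: split by 10**(digits//2).
        def rec(x, digits):
            # x < 10**digits; return exactly `digits` characters, zero-padded.
            if digits <= 9:
                return format(x, "0%dd" % digits)
            lo_digits = digits // 2
            hi, lo = divmod(x, 10 ** lo_digits)
            return rec(hi, digits - lo_digits) + rec(lo, lo_digits)

        if n == 0:
            return "0"
        # Overestimate the digit count from the bit length; rec() pads and
        # lstrip() removes any leading zeros that result.
        digits = int(n.bit_length() * 0.30103) + 1
        return rec(n, digits).lstrip("0")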

todecstr treats it as an "input" conversion instead, using the `decimal` module 
to work mostly _in_ base 10. That, by itself, reduces the role of division (to 
none at all in the Python code), and `decimal` has a more advanced 
multiplication algorithm than CPython's bigints have.

--




[issue46558] Quadratic time internal base conversions

2022-01-30 Thread Carl Friedrich Bolz-Tereick


Carl Friedrich Bolz-Tereick  added the comment:

Somebody pointed me to V8's implementation of str(bigint) today:

https://github.com/v8/v8/blob/main/src/bigint/tostring.cc

They say that they can compute str(factorial(1_000_000)) (which is 5.5 million 
decimal digits) in 1.5s:

https://twitter.com/JakobKummerow/status/1487872478076620800

As far as I understand the code (I suck at C++) they recursively split the 
bigint into halves using % 10^n at each recursion step, but pre-compute and 
cache the divisors' inverses.
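
If I'm reading that right, the cached-inverse part is the classic trick of
replacing division by a fixed d with a multiply by a precomputed scaled
reciprocal. A rough Python sketch of the idea (names and details are mine, not
V8's):

    def make_divmod_by(d):
        # Precompute a scaled reciprocal of d once; each later "division" by d
        # is then a big multiply plus a shift, with at most one correction.
        k = 2 * d.bit_length()
        r = (1 << k) // d                  # the cached "inverse", computed once
        def divmod_by(x):
            assert 0 <= x < (1 << k)       # i.e. roughly x < d*d
            q = (x * r) >> k               # quotient estimate, off by at most 1
            rem = x - q * d
            if rem >= d:
                q += 1
                rem -= d
            return q, rem
        return divmod_by

    # e.g. peel off 1000 decimal digits at a time:
    divmod_by_10_1000 = make_divmod_by(10 ** 1000)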

--
nosy: +Carl.Friedrich.Bolz




[issue46558] Quadratic time internal base conversions

2022-01-29 Thread Tim Peters


Tim Peters  added the comment:

Addendum: the "native" time (for the built-in str(a)) in the msg above turned out 
to be over 3 hours and 50 minutes.

--




[issue46558] Quadratic time internal base conversions

2022-01-29 Thread Tim Peters


Tim Peters  added the comment:

The test case here is a = (1 << 100000000) - 1, a solid string of 100 million 1 
bits. The goal is to convert to a decimal string.

Methods:

native: str(a)

numeral: the Python numeral() function from bpo-3451's div.py, after adapting it to 
use the Python divmod_fast() from the same report's fast_div.py.

todecstr: from the Python file attached to this report.

gmp: str() applied to gmpy2.mpz(a).

Timings:

native: don't know; gave up after waiting over 2 1/2 hours.
numeral: about 5 1/2 minutes.
todecstr: under 30 seconds. (*)
gmp: under 6 seconds.

So there's room for improvement ;-)

But here's the thing: I've lost count of how many times someone has whipped up 
a pure-Python implementation of a bigint algorithm that leaves CPython in the 
dust. And they're generally pretty easy in Python.

But then they die there, because converting to C is soul-crushing, losing the 
beauty and elegance and compactness to mountains of low-level details of 
memory-management, refcounting, and checking for errors after every tiny 
operation.

So a new question in this endless dilemma: _why_ do we need to convert to C? 
Why not leave the extreme cases to Python code that's far easier to write and 
maintain? When we're cutting runtime from hours down to minutes, refusing to 
settle for 2 minutes because it may be theoretically possible to cut that to 1 
minute by resorting to C is focusing on entirely the wrong end.


(*) I hope this algorithm tickles you by defying expectations ;-) It 
essentially stands `numeral()` on its head by splitting the input by a power of 
2 instead of by a power of 10. As a result _no_ divisions are used. But instead 
of shifting decimal digits into place, it has to multiply the high-end pieces 
by powers of 2. That seems insane on the face of it, but hard to argue with the 
clock ;-) The "tricks" here are that the O(log log n) powers of 2 needed can be 
computed efficiently in advance of any splitting, and that all the heavy 
arithmetic is done _in_ the `decimal` module, which implements 
fancier-than-Karatsuba multiplication and whose values can be converted to 
decimal strings very quickly.
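
To make the shape of that concrete, here is a stripped-down sketch (my
simplification, not the attached todecstr.py, which is more careful about
which powers it precomputes and how): split in the binary domain, recombine
with exact Decimal arithmetic, and never divide.

    import decimal
    from functools import lru_cache

    ctx = decimal.getcontext()
    ctx.prec = decimal.MAX_PREC    # keep every multiply and add below exact
    ctx.Emax = decimal.MAX_EMAX    # allow enormous integer values

    @lru_cache(maxsize=None)
    def dec_pow2(e):
        # 2**e as an exact Decimal, built from multiplications only.
        if e <= 512:
            return decimal.Decimal(1 << e)
        half = e // 2
        return dec_pow2(half) * dec_pow2(e - half)

    def to_decimal(x, width):
        # x viewed as a `width`-bit chunk: split it in half as bits, then
        # recombine as Decimals: hi * 2**half + lo.
        if width <= 1024:
            return decimal.Decimal(x)
        half = width // 2
        hi, lo = x >> half, x & ((1 << half) - 1)
        return to_decimal(hi, width - half) * dec_pow2(half) + to_decimal(lo, half)

    def todecstr_sketch(n):
        # assumes n >= 0
        return str(to_decimal(n, n.bit_length())) if n else "0"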

--




[issue46558] Quadratic time internal base conversions

2022-01-27 Thread Tim Peters


Tim Peters  added the comment:

Changed the code so that inner() only references one of the O(log log n) powers 
of 2 we actually precomputed. (It could get lost before if `lo` was non-zero but, 
within its conceptual `n`-bit width, had at least one leading zero bit; now we 
_pass_ the conceptual width instead of computing it on the fly.)

--
Added file: https://bugs.python.org/file50595/todecstr.py




[issue46558] Quadratic time internal base conversions

2022-01-27 Thread Tim Peters


Tim Peters  added the comment:

Dennis, partly, although that was more aimed at speeding division, while the 
approach here doesn't use division at all.

However, thinking about it, the implementation I attached doesn't actually work for 
many cases (it doesn't build as much of the power tree in advance as may be 
needed). Which I missed because all the test cases I tried had mountains of 
trailing 0 or 1 bits, not mixtures.

So I'm closing this anyway, at least until I can dream up an approach that 
always works. Thanks!

--
resolution:  -> wont fix
stage:  -> resolved
status: open -> closed




[issue46558] Quadratic time internal base conversions

2022-01-27 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

Is this similar to https://bugs.python.org/issue3451 ?

--
nosy: +Dennis Sweeney




[issue46558] Quadratic time internal base conversions

2022-01-27 Thread Tim Peters


New submission from Tim Peters :

Our internal base conversion algorithms between power-of-2 and non-power-of-2 
bases are quadratic time, and that's been annoying forever ;-) This applies to 
int<->str and int<->decimal.Decimal conversions. Sometimes the conversion is 
implicit, like when comparing an int to a Decimal.

For example:

>>> a = 1 << 1000000000 # yup! a billion and one bits
>>> s = str(a)

I gave up after waiting for over 8 hours, and the computation apparently can't 
be interrupted.
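
The quadratic growth is easy to see at much smaller sizes. A rough
illustration (not from the original report; absolute times depend heavily on
machine and CPython version, but doubling the bit count roughly quadruples the
time):

    import sys
    import time

    if hasattr(sys, "set_int_max_str_digits"):
        sys.set_int_max_str_digits(0)   # lift the int<->str digit cap on newer CPythons

    for bits in (1_000_000, 2_000_000, 4_000_000):
        n = (1 << bits) - 1
        start = time.perf_counter()
        str(n)
        print(bits, "bits ->", round(time.perf_counter() - start, 2), "seconds")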

In contrast, using the function in the attached todecstr.py gets the result in 
under a minute:

>>> a = 1 << 1000000000
>>> s = todecstr(a)
>>> len(s)
301029996

That builds an equal decimal.Decimal in a "clever" recursive way, and then just 
applies str to _that_.

That's actually a best case for the function, which gets major benefit from the 
mountains of trailing 0 bits. A worst case is all 1-bits, but that still 
finishes in under 5 minutes:

>>> a = 1 << 1000000000
>>> s2 = todecstr(a - 1)
>>> len(s2)
301029996
>>> s[-10:], s2[-10:]
('1787109376', '1787109375')

A similar kind of function could certainly be written to convert from Decimal 
to int much faster, but it would probably be less effective. These things avoid 
explicit division entirely, but fat multiplies are key, and Decimal implements 
a fancier * algorithm than Karatsuba.
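
One plausible shape for that reverse direction (a guess at how it could look,
not code from this report): pull the digits out of the Decimal as a string,
then rebuild the int by recursive splitting, so the heavy work is again big
multiplies rather than division.

    def digits_to_int(s):
        # Reconstruct an int from a decimal digit string by divide and conquer.
        # (A real version would cache the powers of 10 it needs.)
        if len(s) <= 9:
            return int(s)
        half = len(s) // 2
        return digits_to_int(s[:half]) * 10 ** (len(s) - half) + digits_to_int(s[half:])

    def decimal_to_int(d):
        # Assumes d is a non-negative, integer-valued Decimal with exponent 0,
        # so that str(d) is a plain run of digits.
        return digits_to_int(str(d))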

Not for the faint of heart ;-)

--
components: Interpreter Core
files: todecstr.py
messages: 411962
nosy: tim.peters
priority: normal
severity: normal
status: open
title: Quadratic time internal base conversions
type: performance
Added file: https://bugs.python.org/file50593/todecstr.py
