[issue47121] math.isfinite() can raise exception when called on a number

2022-04-05 Thread Tim Peters


Tim Peters  added the comment:

I'll testify that I won't volunteer one second of my time pursuing these 
abstract "purity" crusades ;-) `isfinite()` et alia were added to supply 
functions defined by current standards to work on IEEE floating-point values. 
People working with floats have reasonable expectations that Python will supply 
workalikes for industry-standard float functions.

If we had to cater to all possible "numberish" types, we'd never add anything 
new again :-( Doing that in a principled way requires dedicated new dunder 
methods, and that's a high bar.

That said, I extended math.log() decades ago to work on giant integers. It 
was useful for real projects I was working on at the time - I was scratching my 
own itches.

--

___
Python tracker 
<https://bugs.python.org/issue47121>
___



[issue47145] Improve graphlib.TopologicalSort by removing the prepare step

2022-03-30 Thread Tim Peters


Tim Peters  added the comment:

I believe I'm elaborating on your "footgun".

It doesn't matter to me whether we pick some scheme and document it, _if_ that 
scheme is incoherent, impossible to remember, error-prone, etc.

That's how, e.g., regular expression syntax was designed. "I know! ++*??+) 
doesn't have a coherent meaning now, so let's say it means to match a prime 
number of the pattern it follows" ;-)

That is, an error-prone extension is worse than no extension at all.

The algorithm now isn't guessing at anything: it's saying up front that two 
tasks are the same task if and only if they compare equal. Context, execution 
history, ..., nothing else is relevant. It's simple. Complicating a simple 
thing may open new possibilities, but creates new traps too.

One trap is pretty obviously making "the rules" for when two tasks are the same 
task depend on the execution history at the time the question is asked.

That goes away, though, if the current rule is retained ("the same iff =="), 
but can be explicitly overridden by .forget() (or some such).

That doesn't make it a no-brainer, though. For example, do

.add(A, B)

and run until A and B are marked done. Now do

.add(B, C)

What then? We're back in a guessing game again. We've just been told that B 
depends on C first. But we already did B. Depending on _domain_ knowledge we 
cannot have, that may or may not be worthy of raising an exception.
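
For concreteness, here's a sketch (using the class's real name, 
graphlib.TopologicalSorter, and made-up string nodes) of how the current 
prepare()-based API answers that today - by refusing to accept new edges at all 
once sorting has begun:

from graphlib import TopologicalSorter

ts = TopologicalSorter()
ts.add("A", "B")                # A depends on B
ts.prepare()
while ts.is_active():
    for node in ts.get_ready():
        ts.done(node)           # B is handed out first, then A

try:
    ts.add("B", "C")            # told "too late" that B depends on C
except ValueError as e:
    print(e)                    # the graph is frozen once prepare() has been called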

You can make up any rule you want about that and arbitrarily declare victory, 
but the chance that a user will realize it's not the behavior _their_ domain 
needs is approximately 0 until after they've been burned by it. So now it's 
error-prone, at least to them.

FWIW, in that specific case I'd say "tough beans - you told us too late that B 
depends on C to stop B the first time, but now that you've told us we'll 
respect it in future".

Another twist: assuming that's "the rule", what does

.add(B, C)

really mean? If B really is "the same task" as it was the first time around, 
well, it's already been marked done. Are we supposed to do it _again_ now? Why? 
Why not?

It's hard for me to find any _compelling_ sense here - just masses of 
more-or-less arbitrary implementation decisions. In which case "hard to 
remember" follows soon after.

None of that haunts the current API. It all follows quite directly from what 
"depends on" means to essentially everyone.

--

___
Python tracker 
<https://bugs.python.org/issue47145>
___



[issue47145] Improve graphlib.TopologicalSort by removing the prepare step

2022-03-30 Thread Tim Peters


Tim Peters  added the comment:

Various kinds of tasks:

- "Power switch must be on." Needs to be done the first time. _May_ need to be 
done again later (if some later task turns the power off again). Can be done 
any number of times without harm (beyond the expense of checking), so long as 
the switch is on.

- "Ensure gas tank is full." Probably needs to be done anew for every new added 
task that depends on it.

- "Remove outermost layer of skin." Probably shouldn't be done more than once 
;-)

- "Flip Najdorf switch." Who knows? Doing it a second time - or failing to do 
it a second time - may be necessary, harmless, or deadly.

I'd rather not bother with any of this dicey guessing. While some dynamism may 
be attractive, what it all "should mean" appears to be a rat's nest, and 
depends on domain knowledge graphlib can't possibly have.

I doubt there is a compelling default. As is, two tasks are considered to be 
"the same" task if and only if they compare equal, so that's the least 
_surprising_ default for tasks added later. "Remove outermost layer of skin"

"Two tasks that compare equal may or may not be considered the same task, 
depending on the execution history at the time the question is posed" is at 
best expedient, at worst disastrous.

--

___
Python tracker 
<https://bugs.python.org/issue47145>
___



[issue47114] random.choice and random.choices have different distributions

2022-03-24 Thread Tim Peters


Tim Peters  added the comment:

Definitely a duplicate, and I doubt Mark or Raymond will change their mind.

One observation: while floats are not uniformly dense in [0, 1), random() 
results are uniformly spaced. Each is of the form I / 2**53 for an integer I in 
range(2**53).
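
A quick sketch of a way to see that (scaling by 2**53 is exact in binary 
floating point, so I can be recovered and the value rebuilt exactly):

import random

r = random.random()
I = int(r * (1 << 53))      # exact: r is I / 2**53 for some I in range(2**53)
assert 0 <= I < (1 << 53)
assert r == I / (1 << 53)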

--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue47114>
___



[issue47080] Use atomic groups to simplify fnmatch

2022-03-21 Thread Tim Peters


Change by Tim Peters :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue47080>
___



[issue47080] Use atomic groups to simplify fnmatch

2022-03-21 Thread Tim Peters


Tim Peters  added the comment:


New changeset 5c3201e146b251017cd77202015f47912ddcb980 by Tim Peters in branch 
'main':
bpo-47080: Use atomic groups to simplify fnmatch (GH-32029)
https://github.com/python/cpython/commit/5c3201e146b251017cd77202015f47912ddcb980


--

___
Python tracker 
<https://bugs.python.org/issue47080>
___



[issue47080] Use atomic groups to simplify fnmatch

2022-03-21 Thread Tim Peters


Change by Tim Peters :


--
keywords: +patch
pull_requests: +30118
stage: needs patch -> patch review
pull_request: https://github.com/python/cpython/pull/32029

___
Python tracker 
<https://bugs.python.org/issue47080>
___



[issue47080] Use atomic groups to simplify fnmatch

2022-03-20 Thread Tim Peters


New submission from Tim Peters :

I added some excruciatingly obscure technical tricks to ensure that 
fnmatch.py's regexps can't fall into exponential-time match failures.

It's hard to stop re from useless backtracking. But the new "atomic groups" 
make that easy instead in some cases, and make it trivial in the cases fnmatch 
needs.
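
For illustration only (a hedged sketch, not fnmatch's actual translation): 
(?>...) is the new atomic-group syntax in re. Once the group has matched, the 
engine never backtracks into it, which is what kills the exponential blowup on 
failing matches:

import re

plain  = re.compile(r'(?:a+)+b')      # classic catastrophic backtracking on failure
atomic = re.compile(r'(?>(?:a+)+)b')  # atomic group: commit to the first way it matched

s = 'a' * 25                          # no trailing 'b', so neither pattern can match
print(atomic.match(s))                # None, essentially instantly
# plain.match(s) also returns None eventually, but only after exploring an
# exponential number of ways to split the run of a's between the nested
# quantifiers - that's the blowup the fnmatch regexps have to avoid.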

Of course addressing this has to wait for the atomic groups PR to get merged 
(GH-31982).

--
assignee: tim.peters
components: Library (Lib)
messages: 415662
nosy: tim.peters
priority: normal
severity: normal
stage: needs patch
status: open
title: Use atomic groups to simplify fnmatch
type: behavior
versions: Python 3.11

___
Python tracker 
<https://bugs.python.org/issue47080>
___



[issue47037] Build problems on Windows

2022-03-16 Thread Tim Peters


Tim Peters  added the comment:

Christian, yes, but only in a debug build. See Eryk Sun's message a bit above: 
the machinery to prevent this is already present, but isn't getting called 
early enough.

--

___
Python tracker 
<https://bugs.python.org/issue47037>
___



[issue47037] Build problems on Windows

2022-03-16 Thread Tim Peters


Tim Peters  added the comment:

BTW, the frequency of this new failure mode appears to be vastly increased if 
running the debug tests with "-j0". For example, the box popping up is reliably 
the very first sign of life if I run this from the PCBuild directory:

rt -q -d -j0

--

___
Python tracker 
<https://bugs.python.org/issue47037>
___



[issue47037] Build problems on Windows

2022-03-16 Thread Tim Peters


Tim Peters  added the comment:

Actually, I see this ('Debug Assertion Failed!' in the same spot) every time I 
try to run the test suite with a debug build, starting yesterday.

Didn't have time to track it down. It _appeared_ to an obscure consequence of 
this commit:

"""
$ git log timemodule.c
commit a4674f0194067a801f6c6bdb4fc6448e3a40e069
Author: Christian Heimes 
Date:   Tue Mar 15 22:55:35 2022 +0200

bpo-40280: Detect presence of time.tzset and thread_time clock (GH-31898)
"""

--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue47037>
___



[issue46990] Surprising list overallocation from .split()

2022-03-11 Thread Tim Peters


Tim Peters  added the comment:

Well, that's annoying ;-) In context, the OP was saving a list of 10 million 
splits. So each overallocation by a single element burned 80 million bytes of 
RAM. Overallocating by 7 burned 560 million bytes.

Which is unusual. Usually a split result is short-lived, consumed once then 
thrown away.

OTOH, the overwhelming motivation for overallocating at all is to achieve O(1) 
amortized time after a long _sequence_ of appends, and split results typically 
aren't appended to at all. split() appears to be using it as a timing 
micro-optimization for tiny lists instead.

So, like I said, it's annoying ;-) For "small" lists, split() really shouldn't 
overallocate at all (because, as before, split results are rarely appended to). 
A compromise could be to save pointers to the first N (12, whatever) instances 
of the splitting string in a stack ("auto") vector, before any list object (or 
result string object) is created. If it's out of stuff to do before reaching N, 
fine, build a result out of exactly what was found. If there's more to do, 
build a result from the first N, and go on as currently (letting PyList_Append 
deal with it - overallocation is huge in percentage terms when the list is 
short, but not so much as the list gets longer).

--

___
Python tracker 
<https://bugs.python.org/issue46990>
___



[issue46990] Surprising list overallocation from .split()

2022-03-11 Thread Tim Peters


Change by Tim Peters :


--
type: behavior -> resource usage

___
Python tracker 
<https://bugs.python.org/issue46990>
___



[issue46990] Surprising list overallocation from .split()

2022-03-11 Thread Tim Peters


New submission from Tim Peters :

When looking into a StackOverflow question about surprisingly high memory use, 
I stumbled into this (under 3.10.1, Win64):

>>> import sys
>>> s = "1 2 3 4 5".split()
>>> s
['1', '2', '3', '4', '5']
>>> sys.getsizeof(s)
152
>>> _ - sys.getsizeof([])
96
>>> 96 / 8
12.0

That is, we allocated enough space in the list to store 12(!) elements, despite 
that only 5 are used. Other ways of building a 5-element list I've tried 
overallocate by at most 3 slots:

>>> sys.getsizeof([ch for ch in "12345"])
120
>>> sys.getsizeof([1, 2, 3, 4, 5])
120

(and 120 - 56 = 64, room for 8 pointers)

Then there's this curiosity, which allocates space for exactly the 5 needed:

>>> sys.getsizeof(list(tuple("1 2 3 4 5".split())))
96

(and 96 - 56 = 40, room for the 5 pointers needed)

I don't expect this to be consistent, but allocating space for 12 when only 5 
are needed is unreasonable. Even allocating space for 8 is pushing it ;-)
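
A small helper to reproduce those counts (assuming a 64-bit build, where each 
list slot is an 8-byte pointer and sys.getsizeof([]) is the fixed header cost; 
exact numbers can differ across versions and platforms):

import sys

def allocated_slots(lst):
    # pointer slots actually allocated for the list, not just the ones in use
    return (sys.getsizeof(lst) - sys.getsizeof([])) // 8

print(allocated_slots("1 2 3 4 5".split()))               # 12 on the build above
print(allocated_slots([ch for ch in "12345"]))            # 8
print(allocated_slots(list(tuple("1 2 3 4 5".split()))))  # 5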

--
components: Interpreter Core
messages: 414942
nosy: tim.peters
priority: normal
severity: normal
status: open
title: Surprising list overallocation from .split()
type: behavior
versions: Python 3.11

___
Python tracker 
<https://bugs.python.org/issue46990>
___



[issue46868] Improve performance of math.prod with bignums (and functools.reduce?)

2022-02-28 Thread Tim Peters


Tim Peters  added the comment:

> the total number of trailing 1 bits in the integers from 1
> through N inclusive is N - N.bit_count()

Sorry, that's the total number of trailing 0 bits. The total number of trailing 
1 bits is (N+1) - (N+1).bit_count().
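
Both identities are easy to check by brute force (a throwaway sketch; 
int.bit_count() needs Python 3.10+):

def trailing(i, bit):
    count = 0
    while i & 1 == bit:
        count += 1
        i >>= 1
    return count

for N in range(1, 2000):
    assert sum(trailing(i, 0) for i in range(1, N + 1)) == N - N.bit_count()
    assert sum(trailing(i, 1) for i in range(1, N + 1)) == (N + 1) - (N + 1).bit_count()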

--

___
Python tracker 
<https://bugs.python.org/issue46868>
___



[issue46868] Improve performance of math.prod with bignums (and functools.reduce?)

2022-02-28 Thread Tim Peters


Tim Peters  added the comment:

About runtime, you're right. I did a ballpark "OK, if there are N incoming 
values, the inner loop has to go around, for each one, looking for a NULL, 
across a vector of at most log2(N) entries. So N * log2(N)". But, in fact, it's 
highly skewed toward getting out early, and 2*N is an upper bound on the total 
number of inner loop iterations. Strongly related to that: the total number of 
trailing 1 bits in the integers from 1 through N inclusive is N - N.bit_count().

For the rest, I'll only repeat that if this goes in, it should be as a new 
function. Special-casing, e.g., math.prod() is a Bad Idea. We can have no idea 
in advance whether the iterable is type-homogenous, or even whether the __mul__ 
methods the types involved implement are even intended to be associative. 

functools.reduce() clearly documents strict "left to right" evaluation.

But a new treereduce() can do anything it feels like promising.

--

___
Python tracker 
<https://bugs.python.org/issue46868>
___



[issue46868] Improve performance of math.prod with bignums (and functools.reduce?)

2022-02-28 Thread Tim Peters


Tim Peters  added the comment:

Too abstract for me to find compelling. I suspect the title of this report 
referenced "math.prod with bignums" because it's the only actual concrete use 
case you had ;-)

Here's another: math.lcm. That can benefit for the same reason as math.prod - 
provoking Karatsuba multiplication. However, applying lcm to a largish 
collection of ints is so rare I can't recall ever seeing it done.

Here's a related anti-example: math.gcd. Tree reduction hurts that. It 
typically falls to 1 quickly, and tree reduction just delays that.

So I'm at best -0 on this, and I'll stop now.

For reference, here's a Python implementation that strives to match 
functools.reduce's signature and endcase behaviors. It accepts any iterable, 
and requires temp space at most about log2(number of elements the iterable 
delivers in all).

That memory frugality adds a log2 factor to the runtime. The O() speed penalty 
could be eliminated by using temp memory that grows to about half the number of 
elements in the iterable.

def treereduce(function, iterable, initializer=None):
    levels = []
    if initializer is not None:
        levels.append(initializer)
    NULL = object()
    for y in iterable:
        for i, x in enumerate(levels):
            if x is NULL:
                levels[i] = y
                break
            y = function(x, y)
            levels[i] = NULL
        else:
            levels.append(y)
    y = NULL
    for x in levels:
        if x is not NULL:
            y = x if y is NULL else function(x, y)
    if y is NULL:
        raise TypeError("treereduce() of empty iterable with no initial value")
    return y

Then, for example,

>>> treereduce(lambda x, y: f"({x}+{y})", "abcdefghijk")
'((((a+b)+(c+d))+((e+f)+(g+h)))+((i+j)+k))'

--

___
Python tracker 
<https://bugs.python.org/issue46868>
___



[issue46868] Improve performance of math.prod with bignums (and functools.reduce?)

2022-02-27 Thread Tim Peters


Tim Peters  added the comment:

Hi. "It's pretty good for a lot of things" is precisely what I'm questioning. 
Name some, and please be specific ;-)

Tree reduction is very popular in the parallel processing world, for obvious 
reasons. But we're talking about a single thread here, and there aren't all 
that many associative binary operators (or, depending on implementation, they 
may need also to be commutative).

I gave floating * as an example: tree reduction buys nothing for accuracy, and 
would be both slower and consume more memory. Other associative operators for 
which all the same are true include bitwise &, bitwise |, bitwise ^, min, and 
max.

Floating + can benefit for accuracy, but we already have math.fsum() for that, 
which delivers the most accurate possible result.

The only unaddressed example we have here so far that could actually benefit is 
bigint *. If that's the real use case, fine, but there are better ways to 
address that case.

I've searched in vain for other languages that try to address this "in 
general", except in the context of parallelization. As Guido will tell you, the 
only original idea in Python is adding an "else" clause to loops ;-)

In any case, I don't believe this belongs hiding in reduce(). As above, it's a 
net loss for most binary operators. Too close to "attractive nuisance" 
territory. Adding a new, e.g., treereduce() would be better - provided that it 
is in fact "pretty good for a lot of things".

--

___
Python tracker 
<https://bugs.python.org/issue46868>
___



[issue46868] Improve performance of math.prod with bignums (and functools.reduce?)

2022-02-26 Thread Tim Peters


Tim Peters  added the comment:

I don't know that there's a good use case for this. For floating addition, 
tree-based summation can greatly reduce total roundoff error, but that's not 
true of floating multiplication.

The advantage for prod(range(2, 50001)) doesn't really stem from turning it 
into a tree reduction, but instead for that specific input sequence "the tree" 
happens to do a decent job of keeping multiplicands more-or-less balanced in 
size (bit length), which eventually allows Karatsuba multiplication to come 
into play. If CPython didn't implement Karatsuba multiplication, tree reduction 
wouldn't improve speed: if the inputs are in sequence xs, regardless how 
school-book multiplication is grouped, or rearranged, the time needed is 
proportional to

sum(a * b for a, b in combinations([x.bit_length() for x in xs], 2))

So if the real point is to speed large products of integers, a different 
approach is wanted: keep track of intermediate products' bit lengths, and at 
each step strive to multiply partial products near the same length. This will 
reliably get Karatsuba into play if possible, and caters too to that Karatsuba 
is most valuable on multiplicands of the same bit length. When any of that 
happens from blind tree reduction, it's down to luck.

I've seen decent results from doing that with a fixed, small array A, which 
(very roughly speaking) combines "the next" integer `i` to multiply with the 
array entry A[i.bit_length().bit_length()] (and continues that with the new 
partial product, & so on, until hitting an array slot containing 1).
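
A rough pure-Python sketch of that bucketing idea (invented names here, not the 
code alluded to above): partial products live in a small array indexed by 
bit_length().bit_length(), so only values of roughly similar size ever get 
multiplied together, which is what lets Karatsuba kick in:

from math import prod

def balanced_prod(xs):
    buckets = [1] * 70                 # plenty of slots: the index is about log2(log2(x))
    for x in xs:
        i = x.bit_length().bit_length()
        while buckets[i] != 1:         # fold in a similarly sized partial product
            x *= buckets[i]
            buckets[i] = 1
            i = x.bit_length().bit_length()
        buckets[i] = x
    return prod(buckets)

assert balanced_prod(range(2, 10001)) == prod(range(2, 10001))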

--

___
Python tracker 
<https://bugs.python.org/issue46868>
___



[issue45735] Promise the long-time truth that `args=list` works

2022-02-25 Thread Tim Peters


Tim Peters  added the comment:


New changeset e466faa9df9a1bd377d9725de5484471bc4af8d0 by Charlie Zhao in 
branch 'main':
bpo-45735: Promise the long-time truth that `args=list` works (GH-30982)
https://github.com/python/cpython/commit/e466faa9df9a1bd377d9725de5484471bc4af8d0


--

___
Python tracker 
<https://bugs.python.org/issue45735>
___



[issue46812] Thread starvation with threading.Condition

2022-02-20 Thread Tim Peters


Tim Peters  added the comment:

Unassigning myself - I have no insight into this.

I suspect the eternally contentious issue 7946 is related.

--

___
Python tracker 
<https://bugs.python.org/issue46812>
___



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-13 Thread Tim Peters


Tim Peters  added the comment:

> It's nice that _maintain_shutdown_locks() gets
> called in _stop(), but the more important call site is in
> _set_tstate_lock().

I didn't have that in mind at all. What at the Python level cares whether the 
thread is alive? Well, is_alive() does, and it calls _wait_for_tstate_lock(). 
__repr__() does, and it calls is_alive() (which I added years ago, else repr 
could keep claiming a thread was alive weeks ;-) after it actually ended). 
join() does, and it calls _wait_for_tstate_lock().

Anything else? Not off the top of my head. The point is that if 
_wait_for_tstate_lock() fails to call _stop() for some reason when it "should 
have" called it, it's not fatal. Anything that _cares_ will call it again. 
Indeed, it was all designed so that it could be called any number of times, 
redundantly or not, and without any explicit synchronization.

For the rest, I value clarity above all else here. This code has a long history 
of being the subject of bug reports. The cost of an "extra" call to .locked() 
is just trivial, especially given that calls to _wait_for_tstate_lock() are 
typically rare.

--

___
Python tracker 
<https://bugs.python.org/issue46726>
___



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-13 Thread Tim Peters


Tim Peters  added the comment:

>> is there a bulletproof way to guarantee that `self._stop()` gets 
>> called if the acquire_and_release() succeeds? 

> I don't think it's critical.

Agreed! Anything at the Python level that cares whether the thread is still 
alive will call _wait_for_tstate_lock() again, and get another chance each time 
to notice that the tstate lock has been freed.

Instead of:

try:
    if lock.acquire_and_release(block, timeout):
        self._stop()
except:
    if not lock.locked():
        self._stop()

I'd prefer:

try:
    lock.acquire_and_release(block, timeout)
finally:
    if not lock.locked():
        self._stop()

Because, at the Python level, whether acquire_and_release() succeeded at its 
task depends entirely on whether the lock is free afterward. That's what we're 
_really_ looking for, and is so on all possible platforms.

--

___
Python tracker 
<https://bugs.python.org/issue46726>
___



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-12 Thread Tim Peters


Tim Peters  added the comment:

While bundling the lock.release() into C makes that bulletproof, is there a 
bulletproof way to guarantee that `self._stop()` gets called if the 
acquire_and_release() succeeds? Offhand, I don't see a reason for why that 
isn't just as vulnerable to getting skipped due to an unfortunate signal.

Indeed, huge mounds of threading.py can leave things in insane states in the 
presence of by-magic exceptions. Even if code is very careful to put crucial 
cleanup code in `finally` blocks so it gets executed "no matter what", there's 
nothing to stop code in `finally` blocks from getting skipped over due to 
by-magic exceptions too.

It's an eternal marvel that anything ever works at all ;-)

--

___
Python tracker 
<https://bugs.python.org/issue46726>
___



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-12 Thread Tim Peters


Tim Peters  added the comment:

> Maybe add an `acquire_and_release()` method

Bingo - that should do the trick, in an "obviously correct" way. Of course it's 
of limited applicability, but fine by me.

Will you open a PR with this code?

--

___
Python tracker 
<https://bugs.python.org/issue46726>
___



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-12 Thread Tim Peters


Tim Peters  added the comment:

Na, we've been doing excruciatingly clever stuff to deal with thread shutdown 
for decades, and it always proves to be wrong in some way. Even if code like

except:
    if lock.locked():
        lock.release()
        self._stop()
    raise

did work as hoped for, it would still be broken, but in a different way: 
suppose we really did acquire the tstate lock because the thread actually ended 
while we were in acquire(). But then a signal dumped us into the `except` block 
before doing the release. Oops! There's nothing I can see to stop _another_ 
(say) KeyboardInterrupt preventing us from doing the "failsafe" release too. So 
the lock remains locked forever after (we hold the lock now, and missed our 
chances to release it). And I think that's pretty likely: if I don't see an 
instant response to Ctrl-C, I'm likely to do another very soon after.

So I don't think catching exceptions can be made to work for this. Or `finally` 
blocks either. Indeed, it appears that any way whatsoever of spelling 
`lock.release()` in Python can be defeated by an unfortunately timed signal.

Which isn't unique to this code, of course. The failure modes of this code just 
happen to be unusually visible ;-)

Two other approaches come to mind:

- Wrap everything needed in a custom C function. CPython can't interfere with 
its control flow.

- Add new sys gimmicks to suppress and re-enable raising Python level 
exceptions for signals. Then, e.g., something here like:

with sys.delay_signal_exceptions():
    # Dead simple code, like _before_ we "fixed" it ;-)
    # In particular, while Ctrl-C may terminate the `acquire()` call,
    # KeyboardInterrupt will not be raised until the `with` block
    # exits.
    # Possibly intractable: arranging then for the traceback to
    # point at the code where the exception would have been raised
    # had temporary suspension not been enabled. Then again, since
    # it's not _actually_ raised there at the Python level, maybe
    # it's a Good Thing to ignore.
    if lock.acquire(block, timeout):
        lock.release()
        self._stop()

The second way is more general, but would probably require a PEP.

--

___
Python tracker 
<https://bugs.python.org/issue46726>
___



[issue46726] Thread spuriously marked dead after interrupting a join call

2022-02-12 Thread Tim Peters


Tim Peters  added the comment:

Eryk, I don't think that workaround is solid on Windows in all cases. For 
example, if .join() is called with a timeout, the same timeout is passed to 
lock.acquire(block, timeout). If the acquire() in fact times out, but the store 
to the `acquired` variable is interrupted, `if _WINDOWS and acquired is None` 
will succeed, despite that the lock is still locked. Then we go on to - again - 
incorrectly release the lock and call _stop().

But please don't "repair" that: platform-specific tricks aren't on a path to an 
actual solution ;-) If the latter requires some new C code, fine.

--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue46726>
___



[issue46639] Ceil division with math.ceildiv

2022-02-07 Thread Tim Peters


Tim Peters  added the comment:

The `decimal` module intends to be a faithful implementation of external 
standards. The identity

x == (x // y) * y + x % y

isn't a minor detail, it's the dog on which all else is but a tail ;-)

It's why Guido picked -7 // 4 = -2 in Python[1]. That's really not all that 
useful on its own, and does surprise people. But it's necessary so that the 
identity implies -7 % 4 = 1, and _that's_ what drove it all, the desire that 
doing mod wrt a positive int always give a non-negative result.

The identity also implies that, under truncating division, mod returns a result 
with the sign of the dividend rather than of the divisor.
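
A tiny worked example to make the linkage concrete (floor division as in 
Python, truncating division as in C; the identity holds either way, but the 
sign of the mod differs):

import math

x, y = -7, 4

q, r = divmod(x, y)          # Python: floor division
print(q, r)                  # -2 1
assert x == q * y + r

qt = math.trunc(x / y)       # C-style: truncating division
rt = x - qt * y
print(qt, rt)                # -1 -3, remainder takes the sign of the dividend
assert x == qt * y + rt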

`decimal` implements what the external standards demand integer division do, 
and also integer modulus. The decimal context `divmod()` method returns both at 
once.

The behavior of int div and mod are linked by all standards I'm aware of, 
including by C89 (which didn't define what integer division did in all cases, 
but did demand that - whatever it did - % had to work in a way consistent with 
the named identity).

[1] 
http://python-history.blogspot.com/2010/08/why-pythons-integer-division-floors.html

--

___
Python tracker 
<https://bugs.python.org/issue46639>
___



[issue46667] SequenceMatcher & autojunk - false negative

2022-02-07 Thread Tim Peters


Tim Peters  added the comment:

We can't change defaults without superb reason - Python has millions of users, 
and changing the output of code "that works" is almost always a non-starter.

Improvements to the docs are welcome.

In your example, try running this code after using autojunk=True:

pending = ""
for ch in first:
    if ch in sm.bpopular:
        if pending:
            print(repr(pending))
            pending = ""
    else:
        pending += ch
print(repr(pending))

That shows how `first` is effectively broken into tiny pieces given that the 
"popular" characters act like walls. Here's the start of the output:

'\nUN'
'QUESTR'
'NG\nL'
'x'
'f'
'.'
'L'
'b'
"'"
'x'
'v'
'1500'
','

and on & on. 'QUESTR' is the longest common contiguous substring remaining.

--

___
Python tracker 
<https://bugs.python.org/issue46667>
___



[issue46667] SequenceMatcher & autojunk - false negative

2022-02-06 Thread Tim Peters


Tim Peters  added the comment:

SequenceMatcher looks for the longest _contiguous_ match. "UNIQUESTRING" isn't 
the longest by far when autojunk is False, but is the longest when autojunk is 
True. All those bpopular characters then effectively prevent finding a longer 
match than 'QUESTR' (capital 'I' is also in bpopular) directly.

The effects of autojunk can be surprising, and it would have been better if it 
were False by default. But I don't see anything unexpected here. Learn from 
experience and force it to False yourself ;-) BTW, it was introduced as a way 
to greatly speed comparing files of code, viewing them as sequences of lines. 
In that context, autojunk is rarely surprising and usually helpful. But it more 
often backfires when comparing strings (viewed as sequences of characters) :-(
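
A tiny reconstruction of that kind of surprise (made-up strings, not the OP's 
data; autojunk only kicks in when the second sequence has at least 200 
elements):

from difflib import SequenceMatcher

a = "x" * 100 + "UNIQUESTRING"
b = "UNIQUESTRING" + "x" * 300       # 'x' is "popular" in b, so autojunk ignores it

default = SequenceMatcher(None, a, b)
nojunk  = SequenceMatcher(None, a, b, autojunk=False)
print(default.find_longest_match(0, len(a), 0, len(b)).size)   # 12 ("UNIQUESTRING")
print(nojunk.find_longest_match(0, len(a), 0, len(b)).size)    # 100 (the run of x's)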

--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue46667>
___



[issue46639] Ceil division with math.ceildiv

2022-02-05 Thread Tim Peters


Tim Peters  added the comment:

I expect "obviousness" is mostly driven by background here. You know, e.g., 
that ceil(x) = -floor(-x) for any real x, and the application to integer 
division is just a special case of that. I expect programmers mostly don't know 
that, though. And Python having floor integer division is unusual among 
programming languages. Everyone coming from, say, C, has seen the (i + j - 1)/j 
"idiom" over and over, where "the usual" truncating integer division is the 
rule (and they know too that `i` and `j` are positive). Familiarity breeds 
"obviousness" too :-)

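For the record, the two spellings agree; a quick check (math.ceil on the float 
quotient is exact for values this small):

import math

for x in range(-20, 21):
    for y in range(1, 8):
        assert -(-x // y) == math.ceil(x / y)
        if x >= 0:
            # the C-style idiom, as usually written with non-negative operands
            assert (x + y - 1) // y == -(-x // y)
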
--

___
Python tracker 
<https://bugs.python.org/issue46639>
___



[issue46187] Optionally support rounding for math.isqrt()

2022-02-05 Thread Tim Peters


Tim Peters  added the comment:

I've been keeping my eyes open. The only marginally relevant thing I've 
noticed is Hart's "one line factoring" method:

http://wrap.warwick.ac.uk/54707/1/WRAP_Hart_S1446788712000146a.pdf

That wants the _ceiling_ of the square root. And in another place it wants to 
know "is it a perfect square?". But the part that wants the ceiling doesn't 
care whether it's a perfect square, while the part asking "is it a perfect 
square?" doesn't care what the ceiling may be.
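
Both pieces are already easy to spell in terms of math.isqrt as it stands (a 
sketch, with invented helper names):

import math

def ceil_isqrt(n):
    # ceiling of the square root, for n >= 0
    return 0 if n == 0 else 1 + math.isqrt(n - 1)

def is_perfect_square(n):
    return n >= 0 and math.isqrt(n) ** 2 == n

assert ceil_isqrt(16) == 4 and ceil_isqrt(17) == 5
assert is_perfect_square(16) and not is_perfect_square(17)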

--

___
Python tracker 
<https://bugs.python.org/issue46187>
___



[issue46639] Ceil division with math.ceildiv

2022-02-05 Thread Tim Peters


Tim Peters  added the comment:

GMP's mpz has 18 functions of this form. These are the 6 "ceiling" flavors:

c_divmod
c_div
c_mod

c_divmod_2exp
c_div_2exp
c_mod_2exp

The suggestion here is for c_div.

There are 6 more for floor rounding (with prefix "f_" instead of "c_"), and 
another 6 for truncation ("to-zero" rounding, with prefix "t_"). Curiously 
enough, there's no direct support for any form of "round to nearest".

So that's where this ends ;-)

I personally almost never use -(-x // y). Because it's a bit obscure, and in 
real life y is always known to be positive, so the nearly obvious (x + y - 1) 
// y works fine.

--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue46639>
___



[issue46627] Regex hangs indefinitely

2022-02-03 Thread Tim Peters


Tim Peters  added the comment:

Introducing some kind of optional timeout is too involved to just drop in 
without significant discussion and design effort first. If you want to pursue 
this, please bring it up on the python-ideas mailing list.

My first take: it wouldn't really help, because nobody would use it until after 
it was too late ;-) So, in the absence of someone jumping on it right away, I'm 
closing this one too as "won't fix". If the idea gains traction, we can, of 
course, reopen this.

BTW, Jeffrey Friedl's book "Mastering Regular Expressions" teaches how to write 
bulletproof regexps. It's not always obvious at first glance, but it's an art 
that can be mastered.

--
nosy: +tim.peters
resolution:  -> wont fix
stage:  -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue46627>
___



[issue46615] Use-after-free by mutating set during set operations

2022-02-03 Thread Tim Peters


Tim Peters  added the comment:

Raised the priority back to normal.

I agree with Dennis's observation that PyDict_Next is safe, provided it's used 
as intended: it returns borrowed references, but to things that absolutely are 
legitimate at the time. In the presence of mutations, *what* it returns isn't 
defined at all, but I don't see a way for it to blow up (unless its caller 
screws up by believing it owns the references). It doesn't assume anything 
about the structure of the dict across calls.

--
nosy: +tim.peters
priority: low -> normal

___
Python tracker 
<https://bugs.python.org/issue46615>
___



[issue46618] Exponent operator(**) interpreter issue

2022-02-02 Thread Tim Peters


Tim Peters  added the comment:

Exponentiation has higher precedence (binds more tightly) than unary minus, so 
the expression groups as -(2**2).

Virtually all computer languages (those that _have_ an exponentiation operator) 
do the same. For example, here from wxMaxima:

(%i1) -2**2;
(%o1) -4
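
The same comparison in Python itself:

>>> -2 ** 2      # parsed as -(2 ** 2)
-4
>>> (-2) ** 2    # parentheses make the unary minus bind first
4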

Closing as not-a-bug.

--
nosy: +tim.peters
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed
versions:  -Python 3.10

___
Python tracker 
<https://bugs.python.org/issue46618>
___



[issue46558] Quadratic time internal base conversions

2022-01-30 Thread Tim Peters


Tim Peters  added the comment:

> todecstr treats it as an "input" conversion instead, ...

Worth pointing this out since it doesn't seem widely known: "input" base 
conversions are _generally_ faster than "output" ones. Working in the 
destination base (or a power of it) is generally simpler.

In the math.factorial(1000000) example, it takes CPython more than 3x longer 
for str() to convert it to base 10 than for int() to reconstruct the bigint 
from that string. Not an O() thing (they're both quadratic time in CPython 
today).
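
A rough way to see the gap on any box (a sketch only; absolute times and the 
exact ratio vary by version and machine):

import time

n = 1 << 500_000             # about 150,000 decimal digits

t0 = time.perf_counter()
s = str(n)                   # "output" conversion, binary -> decimal
t1 = time.perf_counter()
m = int(s)                   # "input" conversion, decimal -> binary
t2 = time.perf_counter()

assert m == n
print(f"str: {t1 - t0:.2f}s   int: {t2 - t1:.2f}s")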

--

___
Python tracker 
<https://bugs.python.org/issue46558>
___



[issue46558] Quadratic time internal base conversions

2022-01-30 Thread Tim Peters


Tim Peters  added the comment:

The factorial of a million is much smaller than the case I was looking at. Here 
are rough timings on my box, for computing the decimal string from the bigint 
(and, yes, they all return the same string):

native:    475    seconds (about 8 minutes)
numeral:    22.3  seconds
todecstr:    4.10 seconds
gmp:         0.74 seconds

"They recursively split the bigint into halves using % 10^n at each recursion 
step". That's the standard trick for "output" conversions. Beyond that, there 
are different ways to try to use "fat" multiplications instead of division. The 
recursive splitting all on its own can help, but dramatic speedups need 
dramatically faster multiplication.
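
For reference, the standard trick reads roughly like this in Python (a 
much-simplified sketch, not the numeral() code from bpo-3451; it still leans on 
CPython's quadratic division - the structure is the point):

def to_dec(n):
    def inner(m, w):
        # decimal digits of m, zero-padded to width w (requires 0 <= m < 10**w)
        if w <= 100:
            return str(m).zfill(w)
        half = w // 2
        hi, lo = divmod(m, 10 ** half)
        return inner(hi, w - half) + inner(lo, half)

    if n < 0:
        return "-" + to_dec(-n)
    w = int(n.bit_length() * 0.30103) + 1   # safe overestimate of the digit count
    return inner(n, w).lstrip("0") or "0"

assert to_dec(3 ** 1000) == str(3 ** 1000)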

todecstr treats it as an "input" conversion instead, using the `decimal` module 
to work mostly _in_ base 10. That, by itself, reduces the role of division (to 
none at all in the Python code), and `decimal` has a more advanced 
multiplication algorithm than CPython's bigints have.

--

___
Python tracker 
<https://bugs.python.org/issue46558>
___



[issue46558] Quadratic time internal base conversions

2022-01-29 Thread Tim Peters


Tim Peters  added the comment:

Addendum: the "native" time (for built in str(a)) in the msg above turned out 
to be over 3 hours and 50 minutes.

--

___
Python tracker 
<https://bugs.python.org/issue46558>
___



[issue3451] Asymptotically faster divmod and str(long)

2022-01-29 Thread Tim Peters


Tim Peters  added the comment:

Ha! This will never die. More discussion in bpo-46558.

Ya, I already closed it, but don't want to. I opened it to begin with to record 
an int->str method that doesn't use division, so it didn't really belong on 
this report.

What if we _didn't_ convert these things to C? That's the primary question the 
new report gets around to asking. These things are far easier to write, 
understand, and maintain in Python, and converting to C is generally only 
improving a constant factor, or _maybe_ removing a log(n) factor. When we're 
looking at massive O() improvements, squeezing those out is of comparatively 
little value.

--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue3451>
___



[issue46558] Quadratic time internal base conversions

2022-01-29 Thread Tim Peters


Tim Peters  added the comment:

The test case here is a = (1 << 10**8) - 1, a solid string of 100 million 1 
bits. The goal is to convert to a decimal string.

Methods:

native: str(a)

numeral: the Python numeral() function from bpo-3451's div.py after adapting to 
use the Python divmod_fast() from the same report's fast_div.py.

todecstr: from the Python file attached to this report.

gmp: str() applied to gmpy2.mpz(a).

Timings:

native: don't know; gave up after waiting over 2 1/2 hours.
numeral: about 5 1/2 minutes.
todecstr: under 30 seconds. (*)
gmp: under 6 seconds.

So there's room for improvement ;-)

But here's the thing: I've lost count of how many times someone has whipped up 
a pure-Python implementation of a bigint algorithm that leaves CPython in the 
dust. And they're generally pretty easy in Python.

But then they die there, because converting to C is soul-crushing, losing the 
beauty and elegance and compactness to mountains of low-level details of 
memory-management, refcounting, and checking for errors after every tiny 
operation.

So a new question in this endless dilemma: _why_ do we need to convert to C? 
Why not leave the extreme cases to far-easier to write and maintain Python 
code? When we're cutting runtime from hours down to minutes, we're focusing on 
entirely the wrong end to not settle for 2 minutes because it may be 
theoretically possible to cut that to 1 minute by resorting to C.


(*) I hope this algorithm tickles you by defying expectations ;-) It 
essentially stands `numeral()` on its head by splitting the input by a power of 
2 instead of by a power of 10. As a result _no_ divisions are used. But instead 
of shifting decimal digits into place, it has to multiply the high-end pieces 
by powers of 2. That seems insane on the face of it, but hard to argue with the 
clock ;-) The "tricks" here are that the O(log log n) powers of 2 needed can be 
computed efficiently in advance of any splitting, and that all the heavy 
arithmetic is done _in_ the `decimal` module, which implements 
fancier-than-Karatsuba multiplication and whose values can be converted to 
decimal strings very quickly.
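
A much-simplified sketch of that idea (this is not the attached todecstr.py - 
for one thing, it caches powers of 2 lazily instead of precomputing just the 
O(log log n) ones that are needed - but it shows where the work goes: every big 
multiply happens inside decimal, and no division is used anywhere):

import decimal
from functools import lru_cache

def int_to_decstr(n):
    decimal.setcontext(decimal.Context(prec=decimal.MAX_PREC,
                                       Emax=decimal.MAX_EMAX))
    D = decimal.Decimal

    @lru_cache(maxsize=None)
    def pow2(k):                    # 2**k as an exact Decimal
        if k < 1024:
            return D(1 << k)
        half = k // 2
        return pow2(half) * pow2(k - half)

    def inner(m, bits):             # Decimal equal to m, where m has at most `bits` bits
        if bits < 1024:
            return D(m)
        half = bits // 2
        return (inner(m >> half, bits - half) * pow2(half)
                + inner(m & ((1 << half) - 1), half))

    if n < 0:
        return "-" + int_to_decstr(-n)
    return str(inner(n, n.bit_length() or 1))

assert int_to_decstr(1 << 100_000) == str(1 << 100_000)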

--

___
Python tracker 
<https://bugs.python.org/issue46558>
___



[issue46558] Quadratic time internal base conversions

2022-01-27 Thread Tim Peters


Tim Peters  added the comment:

Changed the code so that inner() only references one of the O(log log n) powers 
of 2 we actually precomputed (it could get lost before if `lo` was non-zero but 
within `n` had at least one leading zero bit - now we _pass_ the conceptual 
width instead of computing it on the fly).

--
Added file: https://bugs.python.org/file50595/todecstr.py

___
Python tracker 
<https://bugs.python.org/issue46558>
___



[issue46558] Quadratic time internal base conversions

2022-01-27 Thread Tim Peters


Tim Peters  added the comment:

Dennis, partly, although that was more aimed at speeding division, while the 
approach here doesn't use division at all.

However, thinking about it, the implementation I attached doesn't actually work for 
many cases (it doesn't build as much of the power tree in advance as may be 
needed). Which I missed because all the test cases I tried had mountains of 
trailing 0 or 1 bits, not mixtures.

So I'm closing this anyway, at least until I can dream up an approach that 
always works. Thanks!

--
resolution:  -> wont fix
stage:  -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue46558>
___



[issue46558] Quadratic time internal base conversions

2022-01-27 Thread Tim Peters


New submission from Tim Peters :

Our internal base conversion algorithms between power-of-2 and non-power-of-2 
bases are quadratic time, and that's been annoying forever ;-) This applies to 
int<->str and int<->decimal.Decimal conversions. Sometimes the conversion is 
implicit, like when comparing an int to a Decimal.

For example:

>>> a = 1 << 10**9 # yup! a billion and one bits
>>> s = str(a)

I gave up after waiting for over 8 hours, and the computation apparently can't 
be interrupted.

In contrast, using the function in the attached todecstr.py gets the result in 
under a minute:

>>> a = 1 << 10**9
>>> s = todecstr(a)
>>> len(s)
301029996

That builds an equal decimal.Decimal in a "clever" recursive way, and then just 
applies str to _that_.

That's actually a best case for the function, which gets major benefit from the 
mountains of trailing 0 bits. A worst case is all 1-bits, but that still 
finishes in under 5 minutes:

>>> a = 1 << 10**9
>>> s2 = todecstr(a - 1)
>>> len(s2)
301029996
>>> s[-10:], s2[-10:]
('1787109376', '1787109375')

A similar kind of function could certainly be written to convert from Decimal 
to int much faster, but it would probably be less effective. These things avoid 
explicit division entirely, but fat multiplies are key, and Decimal implements 
a fancier * algorithm than Karatsuba.

Not for the faint of heart ;-)

--
components: Interpreter Core
files: todecstr.py
messages: 411962
nosy: tim.peters
priority: normal
severity: normal
status: open
title: Quadratic time internal base conversions
type: performance
Added file: https://bugs.python.org/file50593/todecstr.py

___
Python tracker 
<https://bugs.python.org/issue46558>
___



[issue46407] optimizing `1 << n` or `2 ** n` and modulo-only operations

2022-01-27 Thread Tim Peters


Tim Peters  added the comment:

I only merged the split-off PR that added new remainder-only functions. Still 
thinking about the `1 << n` and `2**n` one.

--
assignee:  -> tim.peters
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue46407>
___



[issue46407] optimizing `1 << n` or `2 ** n` and modulo-only operations

2022-01-27 Thread Tim Peters


Tim Peters  added the comment:


New changeset f10dafc430279b4e6cf5b981ae3d1d76e8f431ad by Crowthebird in branch 
'main':
bpo-46407: Optimizing some modulo operations (GH-30653)
https://github.com/python/cpython/commit/f10dafc430279b4e6cf5b981ae3d1d76e8f431ad


--

___
Python tracker 
<https://bugs.python.org/issue46407>
___



[issue45735] Promise the long-time truth that `args=list` works

2022-01-26 Thread Tim Peters


Tim Peters  added the comment:

Charliz, please do! I have no idea why Raymond just stopped. He even deleted 
his initial message here, saying "I relied on this for many years.  So, yet it 
would be nice to guarantee it :-)".

Best I can tell, nothing has changed: lots of people have relied on this 
behavior for years, and the docs should say it's fine.

--

___
Python tracker 
<https://bugs.python.org/issue45735>
___



[issue46524] test_peg_generator takes 8 minutes on Windows

2022-01-25 Thread Tim Peters


Tim Peters  added the comment:

As a general thing, I expect people on Windows always run the tests with 
multiple processes. In which case it would be generally helpful to start the 
longest-running tests first. As is, because of its late-in-the-alphabet name, 
"test_peg_generator" gets started late in the process and so typically keeps 
running for minutes and minutes after all other tests have completed. So just 
renaming it to  "test_0peg_generator" would cut minutes off the Windows wall 
clock time ;-)

--

___
Python tracker 
<https://bugs.python.org/issue46524>
___



[issue46524] test_peg_generator takes 8 minutes on Windows

2022-01-25 Thread Tim Peters


Change by Tim Peters :


--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue46524>
___



[issue46504] Faster code for trial quotient in x_divrem

2022-01-24 Thread Tim Peters


Change by Tim Peters :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue46504>
___



[issue46504] Faster code for trial quotient in x_divrem

2022-01-24 Thread Tim Peters


Tim Peters  added the comment:


New changeset 7c26472d09548905d8c158b26b6a2b12de6cdc32 by Tim Peters in branch 
'main':
bpo-46504: faster code for trial quotient in x_divrem() (GH-30856)
https://github.com/python/cpython/commit/7c26472d09548905d8c158b26b6a2b12de6cdc32


--

___
Python tracker 
<https://bugs.python.org/issue46504>
___



[issue46406] optimize int division

2022-01-24 Thread Tim Peters


Tim Peters  added the comment:


New changeset 7c26472d09548905d8c158b26b6a2b12de6cdc32 by Tim Peters in branch 
'main':
bpo-46504: faster code for trial quotient in x_divrem() (GH-30856)
https://github.com/python/cpython/commit/7c26472d09548905d8c158b26b6a2b12de6cdc32


--

___
Python tracker 
<https://bugs.python.org/issue46406>
___



[issue46504] Faster code for trial quotient in x_divrem

2022-01-24 Thread Tim Peters


Change by Tim Peters :


--
assignee:  -> tim.peters
nosy: +gregory.p.smith, mark.dickinson

___
Python tracker 
<https://bugs.python.org/issue46504>
___



[issue46504] Faster code for trial quotient in x_divrem

2022-01-24 Thread Tim Peters


Change by Tim Peters :


--
keywords: +patch
pull_requests: +29037
stage: needs patch -> patch review
pull_request: https://github.com/python/cpython/pull/30856

___
Python tracker 
<https://bugs.python.org/issue46504>
___



[issue46406] optimize int division

2022-01-24 Thread Tim Peters


Change by Tim Peters :


--
pull_requests: +29038
pull_request: https://github.com/python/cpython/pull/30856

___
Python tracker 
<https://bugs.python.org/issue46406>
___



[issue46504] Faster code for trial quotient in x_divrem

2022-01-24 Thread Tim Peters


Change by Tim Peters :


--
versions: +Python 3.11

___
Python tracker 
<https://bugs.python.org/issue46504>
___



[issue46504] Faster code for trial quotient in x_divrem

2022-01-24 Thread Tim Peters


New submission from Tim Peters :

x_divrem1() was recently (bpo-46406) changed to generate faster code for 
division, essentially nudging optimizing compilers into recognizing that modern 
processors compute the quotient and remainder with a single machine instruction.

The same can be done for x_divrem(), although it's less valuable there because 
the HW division generally accounts for a much smaller percent of its total 
runtime.

Still, it does cut a multiply and subtract out of the loop, and makes the code 
more obvious (since it brings x_divrem1() and x_divrem() back into synch).

--
components: Interpreter Core
messages: 411505
nosy: tim.peters
priority: normal
severity: normal
stage: needs patch
status: open
title: Faster code for trial quotient in x_divrem
type: performance

___
Python tracker 
<https://bugs.python.org/issue46504>
___



[issue46488] listsort.txt wrongly assumes you cannot calculate leading zeros in O(1) time.

2022-01-23 Thread Tim Peters


Tim Peters  added the comment:

For any fixed width integer type, the worst case of the dead simple loop (all 
bits are zero) is a fixed upper bound. So you don't mean "constant bounded" 
either. You mean something more like "clever C code that usually runs faster 
than the obvious loop".

See my "if it's not unbearably pedantic" comment earlier ;-) Again, by 
"primitive" I meant HW-level primitive. I agree that's not wholly clear from 
what I wrote, but really don't care - it's a trivial point that makes no 
difference in context. The lack of an integer type in C wide enough to support 
the division the paper uses is _the_ deal-breaker. That C doesn't define a 
count-leading-zero function either is just a flea on the tail of that dog.

--

___
Python tracker 
<https://bugs.python.org/issue46488>
___



[issue37295] Possible optimizations for math.comb()

2022-01-23 Thread Tim Peters


Tim Peters  added the comment:

OK, here's the last version I had. Preconditions are that d > 0, n > 0, and n % 
d == 0.

This version tries to use the narrowest possible integers on each step. The 
lowermost `good_bits` of dinv at the start of the loop are correct already.

Taking out all the modular stuff, the body of the loop boils down to just

dinv *= 2 - dinv * d

For insight, if

dinv * d = 1 + k*2**i

for some k and i (IOW, if dinv * d = 1 modulo 2**i), then

2 - dinv * d = 1 - k*2**i

and so dinv times that equals 1 - k**2 * 2**(2*i). Or, IOW, the next value of 
dinv is such that d * dinv = 1 modulo 2**(2*i) - it's good to twice as many 
bits.

def ediv(n, d):
    assert d

    def makemask(n):
        return (1 << n) - 1

    if d & 1 == 0:
        ntz = (d & -d).bit_length() - 1
        n >>= ntz
        d >>= ntz
    bits_needed = n.bit_length() - d.bit_length() + 1
    good_bits = 3
    dinv = d & 7
    while good_bits < bits_needed:
        twice = min(2 * good_bits, bits_needed)
        twomask = makemask(twice)
        fac2 = dinv * (d & twomask)
        fac2 &= twomask
        fac2 = (2 - fac2) & twomask
        dinv = (dinv * fac2) & twomask
        good_bits = twice
    goodmask = makemask(bits_needed)
    return ((dinv & goodmask) * (n & goodmask)) & goodmask
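
A quick sanity check of the sketch above (illustrative values only):

>>> ediv(15, 3)
5
>>> ediv(123456789 * 987654321, 987654321)
123456789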

--

___
Python tracker 
<https://bugs.python.org/issue37295>
___



[issue46488] listsort.txt wrongly assumes you cannot calculate leading zeros in O(1) time.

2022-01-23 Thread Tim Peters


Tim Peters  added the comment:

I'm not inclined to change anything here. It's a trivial point, and by 
"primitive" I had in mind a dedicated hardware instruction, blazing fast. Yes, 
I was aware of long-winded ways of doing it for specific fixed integer widths. 
But that's not what `O(1)` means. A dead obvious loop testing each bit, one at 
a time, starting with the MSB, until finding the first bit set, is also O(1) 
for any fixed-width int size.

So I'm not doing anything here. If someone else creates a PR with text they 
want to see instead, I'll review it, and if it's not unbearably pedantic ;-) 
I'll merge it.

--

___
Python tracker 
<https://bugs.python.org/issue46488>
___



[issue37295] Possible optimizations for math.comb()

2022-01-22 Thread Tim Peters


Tim Peters  added the comment:

Ya, I don't expect anyone will check in a change without doing comparative 
timings in C first. Not worried about that.

I'd be happy to declare victory and move on at this point ;-) But that's me. 
Near the start of this, I noted that we just won't compete with GMP's vast 
array of tricks.

I noted that they use a special routine for division when it's known in advance 
that the remainder is 0 (as it is, e.g., in every division performed by "our" 
recursion (which GMP also uses, in some cases)).

But I didn't let on just how much that can buy them. Under 3.10.1 on my box, 
the final division alone in math.factorial(1000000) // 
math.factorial(500000)**2 takes over 20 seconds. But a pure Python 
implementation of what I assume (don't know for sure) is the key idea(*) in 
their exact-division algorithm does the same thing in under 0.4 seconds. Huge 
difference - and while the pure Python version only ever wants the lower N bits 
of an NxN product, there's no real way to do that in pure Python except via 
throwing away the higher N bits of a double-width int product. In C, of course, 
the high half of the bits wouldn't be computed to begin with.

(*) Modular arithmetic again. Given n and d such that it's known n = q*d for 
some integer q, shift n and d right until d is odd. q is unaffected. A good 
upper bound on the bit length of q is then n.bit_length() - d.bit_length() + 1. 
Do the remaining work modulo 2 raised to that power. Call that base B.

We "merely" need to solve for q in the equation n = q*d (mod B). Because d is 
odd, I = pow(d, -1, B) exists. Just multiply both sides by I to get n * I = q 
(mod B).
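
To make that concrete, here's a minimal pure-Python sketch of the idea
(illustrative only - it leans on Python's built-in pow(d, -1, B) for the
inverse, not the division-free iteration described next):

def exact_div(n, d):
    # Assumed preconditions (not checked): n, d > 0 and d divides n exactly.
    while d & 1 == 0:       # shed d's trailing zero bits; the quotient is unchanged
        n >>= 1
        d >>= 1
    bits = n.bit_length() - d.bit_length() + 1   # upper bound on the quotient's bit length
    B = 1 << bits
    return (n * pow(d, -1, B)) & (B - 1)         # q = n * d**-1  (mod B)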

No divisions of any kind are needed. More, there's also a very efficient, 
division-free algorithm for finding an inverse modulo a power of 2. To start 
with, every odd int is its own inverse mod 8, so we start with 3 good bits. A 
modular Newton-like iteration can double the number of correct bits on each 
iteration.

But I won't post code (unless someone asks) because I don't want to encourage 
anyone :-)

--

___
Python tracker 
<https://bugs.python.org/issue37295>
___



[issue46071] Graphlib documentation (edge direction)

2022-01-21 Thread Tim Peters


Tim Peters  added the comment:

> I know that there are many different ways to represent
> a graph, but your graph format *is just plain wrong.*

Yet when I offered to support an option to support the graph format you insist 
is uniquely "right", you poo-poo'ed the idea. So what is your real complaint? I 
don't know. Here you once again complain about the graph format, after 
rejecting an idea to address that directly.

Sorry, but you don't make sense to me. 

> if you're willing to rewrite all of the documentation you
> probably can explain this in a clear and simple way, and
> without breaking compatibility.

I don't see that happening. The docs are quite clear to me already, but I 
recognize that I'm too close to them to be objective about that. Instead I'm 
going on ample evidence from mailing lists, blogs, and StackOverflow that yours 
is the first complaint I've seen that the basic functionality here is any way 
unclear or ill-documented. People pick it up and use it productively at once. 
Nobody else has appeared to be confused.

Where things get unclear is limited to novel applications of multiprocessing. 
But people always stumble over multiprocessing, even for something as simple 
and bulletproof as a multiprocessing.Queue. Comes with the territory.

Finally, the distinction between "predecessors" and "dependencies" here strikes 
me as being uselessly pedantic. Nobody on the planet has an actual problem with 
recognizing that, e.g., "a" is predecessor of "n" in the string "pedantic". 
That's so in both technical language and plain English. Same here.

Still, if you want to rewrite the docs, go for it. I'll stay out of it 
completely.

--

___
Python tracker 
<https://bugs.python.org/issue46071>
___



[issue46071] Graphlib documentation (edge direction)

2022-01-21 Thread Tim Peters


Tim Peters  added the comment:

Perhaps you've overlooking something so obvious to the module authors that I 
haven't thought to mention it?

The purpose here is to compute a linear order. Now not even you ;-) can pretend 
to be confused about what "predecessor" and "successor" mean in a linear order. 
There's no graph involved in the final result (unless you mentally impose a 
trivial linear graph on the result sequence).

And that's how to view the `graph` argument _as intended_. It maps an element 
to the elements that must precede it _in the result_. The element's mandatory 
predecessors in any acceptable imposed total ordering.

From that intended view, this is just way off base:

> I wish the docs started by saying:
>
> The graph is a dict of {'start_node': ['end_nodes',]}
> The topological sorter puts the end_nodes before their start_nodes.
>   [note: this is what the code currently does]

The words "start" and "end" there would be clearer as "banana" and "beeswax" 
;-) End of what? Start of what? The "start" node has to appear _after_ all the 
"end" nodes? In what other context does the "start" appear after the "end"?

As is, the code puts the "start" node after all its declared mandatory 
predecessors. Exactly as the word "predecessors" suggests should happen.


[Dennis]
> David suggests:
>
> * static_order returns [A, B]
> * Conceptual/abstract directed graph direction is B --> A
> * mapping to be passed is {B: [A]}

IF that's what he's suggesting (I don't know), that's not a matter of 
preference, it's just plain wrong. The only possible topsort of the DAG  B --> 
A is the sequence [B, A]. For which see absolutely any text defining the terms.

There are many ways the "mapping to be passed" could be defined. David is 
certainly right that, by convention, most graphs in Python are represented by 
successor dicts. But he apparently doesn't want an option for the topsorter to 
accept such a dict either. (An irony in passing: under the covers, the class 
converts the predecessor-dict `graph` argument to a successor-dict 
representation.)

At this point, I not only don't know what he wants, his position gets less 
clear to me over time.

--

___
Python tracker 
<https://bugs.python.org/issue46071>
___



[issue46071] Graphlib documentation (edge direction)

2022-01-21 Thread Tim Peters


Tim Peters  added the comment:

Now you may be getting somewhere ;-) Complaining about the docs wasn't getting 
traction because they're using standard terminology with the standard meanings, 
and tell the plain truth about what the class requires and delivers.

You wish it required something else instead. Fine: be direct about that! 
Suggest a feature enhancement rather than contorting the docs to pretend the 
class "really" requires what you _wish_ it required and is computing "the 
reverse" of what it claims to be computing. That won't be accepted  because 
it's not true.

How about, e.g., suggesting instead a new optional constructor argument, like 
`successor_graph=False`, which can be passed as `True` to say that the `graph` 
argument is a successor dict rather than the default predecessor dict?

I wouldn't oppose that, and it would be easy to implement (indeed, way back 
when here I already posted code to do it).

--

___
Python tracker 
<https://bugs.python.org/issue46071>
___



[issue46071] Graphlib documentation (edge direction)

2022-01-20 Thread Tim Peters


Tim Peters  added the comment:

>> the meanings of "predecessor" and "successor" are
>> universally agreed upon

> I disagree.

I can post literally hundreds of citations that all agree: in u -> v, u is a 
direct predecessor of v, and v is a direct successor of u.

Here's one:

http://pages.cs.wisc.edu/~vernon/cs367/notes/13.GRAPH.html

Here's another from a Python context:

https://networkx.org/documentation/stable/reference/classes/generated/networkx.DiGraph.predecessors.html

"""
A predecessor of n is a node m such that there exists a directed edge from m to 
n.
"""

On & on & on. Hence my "universal".

Can you link to any that disagree?

As to the meaning of "point to", in "u -> v" it's obvious that the arrow points 
_from_ u _to_ v. I very strongly doubt you can find a credible source disputing 
that either.

--

___
Python tracker 
<https://bugs.python.org/issue46071>
___



[issue46071] Graphlib documentation (edge direction)

2022-01-20 Thread Tim Peters


Tim Peters  added the comment:

I think you should give up on this. But suit yourself ;-)

Exactly the same information is conveyed whether representing a graph by 
successor or predecessor dicts. Some operations are more convenient in one 
representation than the other, but each is easily derived from the other.
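
For example, a few lines suffice to flip between the two forms (an
illustrative helper, not part of graphlib):

def invert_edges(graph):
    flipped = {node: [] for node in graph}
    for node, others in graph.items():
        for other in others:
            flipped.setdefault(other, []).append(node)
    return flipped

preds = {'A': ['B']}           # the edge B -> A: A's only predecessor is B
succs = invert_edges(preds)    # {'A': [], 'B': ['A']}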

For your

> The issue with the current form is that you can not traverse
> the graph, at least not forwards.

is the equal and opposite complaint that with "your" (successor) form, you 
cannot traverse the graph, at least not backwards.

What of it? _Some_ packages maintain both predecessor and successor dicts in 
lockstep. For example, Kosaraju's elegant algorithm for finding strongly 
connected components requires efficient traversal in both directions.

The topsort algorithm couldn't care less how you want to represent your graphs. 
But it does have a specific representation it requires if you want to use the 
optional `graph=` constructor argument. It's documented and works fine if used 
as documented.

> What is important about the topo-sort is that it makes all of the
> edges point in the *same* direction.

Not following that. topsort applies to directed graphs, where the meanings of 
"predecessor" and "successor" are universally agreed upon. By definition, a 
topsort must begin with an element having no predecessors. The directions of 
the arrows are crucial to that.

> It doesn't actually matter which direction that is. And credit where due, the
> library picked the more-useful direction. It was a pragmatic choice, driven
> by the real use case of dependency resolution.

Not really. It was driven by the definition of what "topological sort" means. 
Most of the API was actually driven by making it easy to use correctly 
(race-free, deadlock-free) in a dynamic (there's no way to guess from one run 
to the next which order will end up being used) multiprocessing context, 
something few topsort packages even attempt to address.

> But having the graph with arrows pointing in the wrong direction is
> going to cause endless confusion.

Starting when? ;-) Seriously, this is the first complaint I've seen, and you're 
not actually confused.

> For an example of the confusion this causes, look at the API itself. The "add"
> method explicitly calls out the fact that you can add nodes with no 
> predecessors.

And what about that is confusing? If every node has to have a predecessor, a 
topsort can't even exist!

More generally, any usable implementation of "directed graph" has to allow for 
isolated nodes too. Historically, the docs pointed that out to contrast it with 
other APIs under consideration that required special tricks to convince the 
function that a graph had isolated nodes. In this API, it's dead easy: just add 
the node with no predecessors.

> This is obviously also possible with the graph argument as it currently is.

Of course. Are you going to seriously contend this is "a problem", but the 
obvious workalike for successor dicts would not be "a problem"? (In a successor 
dict, it's equally easy to specify a node with no successors. Good! For a total 
ordering to exist, there must be at least one node with no predecessor and at 
least one with no successor - these aren't "problems", they're essentials.)

In any case, the docs say what was intended and implemented from the start: the 
optional `graph` argument is a predecessor dict representation. You're not 
required to like that decision, but there's no case here - to my eyes - made 
for saying "it's a bug".

--

___
Python tracker 
<https://bugs.python.org/issue46071>
___



[issue46071] Graphlib documentation (edge direction)

2022-01-20 Thread Tim Peters


Tim Peters  added the comment:

I'm going to leave this to Pablo - adding the `graph` argument was his idea ;-)

It would, I think, have been better if this argument had been named, say, 
"preds" instead of "graph".

The docs, to my eyes, are entirely clear about that `graph` is a representation 
based on mapping a node to its predecessors:

"""
If the optional graph argument is provided it must be a dictionary representing 
a directed acyclic graph where the keys are nodes and the values are iterables 
of all predecessors of that node in the graph (the nodes that have edges that 
point to the value in the key). 
"""

but it could perhaps be usefully emphasized that "the usual" graph 
representation in Python maps a node to its successors instead.

The stuff about "direction" continues to make no sense to me, though. The class 
computes the (forward!) topsort of the graph passed to it, given that the graph 
is presented as mapping nodes to predecessors. It's only "reversed" if you 
refuse to believe the docs, and insist that `graph` is mapping a node to its 
successors. But it's not.

>>> import graphlib
>>> ts = graphlib.TopologicalSorter({'A' : ['B']})
>>> list(ts.static_order())
['B', 'A']

Nothing is "reversed". The argument passed is the predecessor form of the graph 
B -> A, and [B, A] is the forward topsort of that graph.

--

___
Python tracker 
<https://bugs.python.org/issue46071>
___



[issue46071] Graphlib documentation (edge direction)

2022-01-20 Thread Tim Peters


Tim Peters  added the comment:

For the purpose of topological sorting, yes, it's by far most natural to give, 
for each node, the collection of that node's predecessors. And that's the way 
topsort applications typically collect their data to begin with, like, "here, 
for each recipe, is a list of ingredients needed first" or "before we can begin 
this job, here are the other jobs that need to complete first". or, as in 
makefiles, "here's a list of the targets that need to be built before we can 
start building the current target".

I definitely don't want to change the wording, because it's bog standard as is. 
For example, Wikipedia's "topological sorting" entry is typical:

"""
In computer science, a topological sort or topological ordering of a directed 
graph is a linear ordering of its vertices such that for every directed edge uv 
from vertex u to vertex v, u comes before v in the ordering. 
"""

It's what topsort _means_. We did not intend to implement a "reverse topsort" 
algorithm, and I see no attraction to doing so, neither in pretending we did so.

If the way the user collects their data stores only successor links (which, as 
above, seems odd in applications that actually use topsorts), then they need 
something like this instead:

ts = TopologicalSorter()
for u, successors in my_forward_graph.items():
    for v in successors:
        ts.add(v, u)

But that's something I never needed.
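
Spelled out with a concrete successor dict (illustrative):

from graphlib import TopologicalSorter

my_forward_graph = {'B': ['A']}        # the edge B -> A, as a successor dict
ts = TopologicalSorter()
for u, successors in my_forward_graph.items():
    for v in successors:
        ts.add(v, u)
print(list(ts.static_order()))         # ['B', 'A']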

--

___
Python tracker 
<https://bugs.python.org/issue46071>
___



[issue37295] Possible optimizations for math.comb()

2022-01-14 Thread Tim Peters


Tim Peters  added the comment:

Another trick, building on the last one: computing factorial(k) isn't cheap, in 
time or space, and neither is dividing by it. But we know it will entirely 
cancel out. Indeed, for each outer loop iteration, prod(p) is divisible by the 
current k. But, unlike as in Stefan's code, which materializes range(n, n-k, 
-1) as an explicit list, we have no way to calculate "in advance" which 
elements of p[] are divisible by what.

What we _can_ do is march over all of p[], and do a gcd of each element with 
the current k. If greater than 1, it can be divided out of both that element of 
p[], and the current k. Later, rinse, repeat - the current k must eventually be 
driven to 1 then.

But that slows things down: gcd() is also expensive.

But there's a standard trick to speed that too: as in serious implementations 
of Pollard's rho factorization method, "chunk it". That is, don't do it on 
every outer loop iteration, but instead accumulate the running product of 
several denominators first, then do the expensive gcd pass on that product.

Here's a replacement for "the main loop" of the last code that delays doing 
gcds until the running product is at least 2000 bits:

    fold_into_p(n)

    kk = 1
    for k in range(2, k+1):
        n -= 1
        # Merge into p[].
        fold_into_p(n)
        # Divide by k.
        kk *= k
        if kk.bit_length() < 2000:
            continue
        for i, pi in enumerate(p):
            if pi > 1:
                g = gcd(pi, kk)
                if g > 1:
                    p[i] = pi // g
                    kk //= g
                    if kk == 1:
                        break
        assert kk == 1
    showp()
    return prod(x for x in p if x > 1) // kk

That runs in under half the time (for n=1000000, k=500000), down to under 7.5 
seconds. And, of course, the largest denominator consumes only about 2000 bits 
instead of 500000!'s 8,744,448 bits.

Raising the kk bit limit from 2000 to 10000 cuts another 2.5 seconds off, down 
to about 5 seconds.

Much above that, it starts getting slower again.

Seems too hard to out-think! And highly dubious to fine-tune it based on a 
single input case ;-)

Curious: at a cutoff of 10000 bits, we're beyond the point where Karatsuba 
would have paid off for computing denominator partial products too.

--
versions:  -Python 3.11

___
Python tracker 
<https://bugs.python.org/issue37295>
___



[issue37295] Possible optimizations for math.comb()

2022-01-13 Thread Tim Peters


Tim Peters  added the comment:

I was thinking about

comb(1000000, 500000)

The simple "* --n / ++k" loop does 499,999 each of multiplication and division, 
and in all instances the second operand is a single Python digit. Cheap as can 
be.

In contrast, despite that it short-circuits all "small k" cases, comb_pole2.py 
executes

return C(n, j) * C(n-j, k-j) // C(k, j)   # Recursive case

7,299,598 times. Far more top-level arithmetic operations, including 
extra-expensive long division with both operands multi-digit. At the very top 
of the tree, "// C(500000, 250000)" is dividing out nearly half a million bits.

But a tiny Python function coding the simple loop takes about 150 seconds, 
while Raymond's C() about 30 (under the released 3.10.1). The benefits from 
provoking Karatsuba are major.

Just for the heck of it, I coded a complication of the simple loop that just 
tries to provoke Karatsuba on the numerator (prod(range(n, n-k, -1))), and 
dividing that just once at the end, by factorial(k). That dropped the time to 
about 17.5 seconds. Over 80% of that time is spent computing the "return" 
expression, where len(p) is 40 and only 7 entries pass the x>1 test:

return prod(x for x in p if x > 1) // factorial(k)

That is, with really big results, almost everything is mostly noise compared to 
the time needed to do the relative handful of * and // on the very largest 
intermediate results.

Stefan's code runs much faster still, but requires storage proportional to k. 
The crude hack I used needs a fixed (independent of k and n) and small amount 
of temp storage, basically grouping inputs into buckets roughly based on the 
log _of_ their log, and keeping only one int per bucket.

Here's the code:

def pcomb2(n, k):
    from math import prod, factorial

    assert 0 <= k <= n
    k = min(k, n-k)
    PLEN = 40 # good enough for ints with over a trillion bits
    p = [1] * PLEN

    def fold_into_p(x):
        if x == 1:
            return
        while True:
            i = max(x.bit_length().bit_length() - 5, 0)
            if p[i] == 1:
                p[i] = x
                break
            x *= p[i]
            p[i] = 1

    def showp():
        for i in range(PLEN):
            pi = p[i]
            if pi > 1:
                print(i, pi.bit_length())

    for i in range(k):
        fold_into_p(n)
        n -= 1
    showp()
    return prod(x for x in p if x > 1) // factorial(k)

I'm not sure it's practical. For example, while the list p[] can be kept quite 
small, factorial(k) can require a substantial amount of temp storage of its own 
- factorial(500000) in the case at hand is an int approaching 9 million bits.

Note: again in the case at hand, computing it via

factorial(1000000) // factorial(500000)**2

takes about 10% longer than Raymond's C() function.

--

___
Python tracker 
<https://bugs.python.org/issue37295>
___



[issue46020] Optimize long_pow for the common case

2022-01-12 Thread Tim Peters


[issue37295] Possible optimizations for math.comb()

2022-01-12 Thread Tim Peters


Tim Peters  added the comment:

A feature of the current code is that, while the recursion tree can be very 
wide, it's not tall, with max depth proportional to the log of k. But it's 
proportional to k in the proposal (the C(n-j, k-j) term's second argument goes 
down by at most 20 per recursion level).

So, e.g., C(1000000, 500000) dies with RecursionError in Python; in C, whatever 
platform-specific weird things can happen when the C stack is blown.

The width of the recursion tree could be slashed in any case by "just dealing 
with" (say) k <= 20 directly, no matter how large n is. Do the obvious loop 
with k-1 multiplies, and k-1 divides by the tiny ints in range(2, k+1). Note 
that "almost all" the calls in your "trace for the old code" listing are due to 
recurring for k <= 20.  Or use Stefan's method: if limited to k <= 20, it only 
requires a tiny precomputed table of the 8 primes <= 20, and a stack array to 
hold range(n, n-k, -1); that can be arranged to keep Karatsuba in play if n is 
large.

An irony is that the primary point of the recurrence is to get Karatsuba 
multiplication into play, but the ints involved in computing C(240, 112) are 
nowhere near big enough to trigger that.

To limit recursion depth, I think you have to change your approach to decide in 
advance the deepest you're willing to risk letting it go, and keep the current 
j = k // 2 whenever repeatedly subtracting 20 could exceed that.

--

___
Python tracker 
<https://bugs.python.org/issue37295>
___



[issue37295] Possible optimizations for math.comb()

2022-01-11 Thread Tim Peters


Tim Peters  added the comment:

Just noting that comb_pole.py requires a development version of Python to run 
(under all released versions, a byteorder argument is required for int.{to, 
from}_byte() calls).

--

___
Python tracker 
<https://bugs.python.org/issue37295>
___



[issue46020] Optimize long_pow for the common case

2022-01-11 Thread Tim Peters


Tim Peters  added the comment:

GH_30555 helps a bit by leaving the giant-exponent table of small odd powers as 
uninitialized stack trash unless it's actually used.

--

___
Python tracker 
<https://bugs.python.org/issue46020>
___



[issue46020] Optimize long_pow for the common case

2022-01-11 Thread Tim Peters


Change by Tim Peters :


--
keywords: +patch
pull_requests: +28756
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/30555

___
Python tracker 
<https://bugs.python.org/issue46020>
___



[issue46235] Do all ref-counting at once for sequence multiplication

2022-01-07 Thread Tim Peters


Change by Tim Peters :


--
assignee:  -> Dennis Sweeney
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue46235>
___



[issue46235] Do all ref-counting at once for sequence multiplication

2022-01-07 Thread Tim Peters


Tim Peters  added the comment:


New changeset ad1d5908ada171eff768291371a80022bfad4f04 by Dennis Sweeney in 
branch 'main':
bpo-46235: Do all ref-counting at once during list/tuple multiplication 
(GH-30346)
https://github.com/python/cpython/commit/ad1d5908ada171eff768291371a80022bfad4f04


--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue46235>
___



[issue46187] Optionally support rounding for math.isqrt()

2022-01-04 Thread Tim Peters


Tim Peters  added the comment:

> Is
>
>  i, rem = isqrt_rem(n)
>  i + (rem != 0)
>
> better than
>
>   (isqrt(n<<2) + 1) >> 1
>
> or
>
>   n and isqrt(n-1) + 1
>
> ?

Define "better"? The first way is by far the most obvious of the three, and the 
second way the least obvious. The first way also "wins" on being  a variation 
of a uniform pattern that can deliver the floor, ceiling, or rounded result, 
depending on which simple comparison result is added. It's not "a trick" - it's 
the opposite of clever ;-)

The first way is also unique in being the only one of the three that does _not_ 
do any Python-level arithmetic on integers as wide as `n`. `i` and `rem` are no 
more than about half the bit length of `n`. `n << 2` and `n - 1` in the others 
have to create new int objects at least as wide as `n`.

--

___
Python tracker 
<https://bugs.python.org/issue46187>
___



[issue46233] Minor speedup for bigint squaring

2022-01-03 Thread Tim Peters


Change by Tim Peters :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue46233>
___



[issue46233] Minor speedup for bigint squaring

2022-01-03 Thread Tim Peters


Tim Peters  added the comment:


New changeset 3aa5242b54b0627293d95cfb4a26b2f917f667be by Tim Peters in branch 
'main':
bpo-46233: Minor speedup for bigint squaring (GH-30345)
https://github.com/python/cpython/commit/3aa5242b54b0627293d95cfb4a26b2f917f667be


--

___
Python tracker 
<https://bugs.python.org/issue46233>
___



[issue46187] Optionally support rounding for math.isqrt()

2022-01-03 Thread Tim Peters


Tim Peters  added the comment:

I've made several good-faith efforts to find any hint of demand for rounded 
isqrt on the web; I've found no direct support for it in other 
languages/environments(*); and the one use case you presented isn't compelling. 
Building static tables to help implement extended precision functions via 
scaled integer arithmetic is quite niche, and anyone numerically knowledgeable 
enough to pull that off will have no trouble figuring out how to roll their own 
isqrt rounding. For them, isqrtrem would make it dead easy.

Close to no demonstrated demand, and no demonstrated prior art, coupled with 
slowing down _every_ isqrt() call to cater to an overwhelmingly unused 
possibility (and, to boot, via the slowest possible way of specifying an 
option, on what should be a fast function), leaves me -1. No, I'm not a fan of 
the "big"/"little" mandatory arguments to int.{from,to}_bytes() either.

+0 on isqrtrem, though, because it follows established practice in a related 
context (mpz), and also offers efficient support for the related use case I 
actually found some (albeit small) demand for ("and was it a perfect square?").

(*) Other languages with an integer square root include Common Lisp, Julia, and 
Maple's programming language. None have an option for rounding. Maple's doesn't 
define the rounding used, other than to promise the result is less than 1 away 
from the infinitely precise result. Common Lisp and Julia return the floor. A 
number of environments essentially inherit isqrt from their Lisp base (e.g., 
wxMaxima).

--

___
Python tracker 
<https://bugs.python.org/issue46187>
___



[issue46020] Optimize long_pow for the common case

2022-01-03 Thread Tim Peters


Tim Peters  added the comment:

I was suprised that

https://bugs.python.org/issue44376 

managed to get i**2 to within a factor of 2 of i*i's speed. The overheads of 
running long_pow() at all are high! Don't overlook that initialization of stack 
variables at the start, like

PyLongObject *z = NULL;  /* accumulated result */

isn't free - code has to be generated to force zeroes into those variables. The 
initialization of `table[]` alone requires code to fill 256 memory bytes with 
zeroes (down to 128 on current main branch). Nothing is free.

We can't sanely move the `table` initialization expense into the "giant k-ary 
window exponentiation" block either, because every bigint operation can fail 
("out of memory"), and the macros for doing the common ones (MULT and REDUCE) 
can do "goto Error;", and that common exit code has no way to know what is or 
isn't initialized. We can't let it see uninitialized stack trash.

The exit code in turn has a string of things like

Py_DECREF(a);
Py_DECREF(b);
Py_XDECREF(c);

and those cost cycles too, including tests and branches.

So the real "outrage" to me is why x*x took 17.6 nsec for x == 10 in the 
original report. That's many times longer than the HW takes to do the actual 
multiply. Whether it's spelled x*x or x**2, we're overwhelming timing 
overheads. `pow()` has many because it's a kind of Swiss army knife doing all 
sorts of things; what's `x*x`'s excuse? ;-)

--

___
Python tracker 
<https://bugs.python.org/issue46020>
___



[issue46233] Minor speedup for bigint squaring

2022-01-02 Thread Tim Peters


Change by Tim Peters :


--
keywords: +patch
pull_requests: +28557
stage: needs patch -> patch review
pull_request: https://github.com/python/cpython/pull/30345

___
Python tracker 
<https://bugs.python.org/issue46233>
___



[issue46233] Minor speedup for bigint squaring

2022-01-02 Thread Tim Peters


New submission from Tim Peters :

longobject.c's x_mul()'s special code for squaring gets kind of sloppy at the 
end of a digit pass, doing a useless add of 0 and an "extra" test for carry. 
Easily cleaned up.

I think the underlying cause is that the HAC algorithm description it was 
modeled on was quite "hand wavy" about how badly, and exactly when, the carry 
can exceed a single digit. Things are better-behaved at the end of a digit pass.

--
assignee: tim.peters
components: Interpreter Core
messages: 409534
nosy: tim.peters
priority: normal
severity: normal
stage: needs patch
status: open
title: Minor speedup for bigint squaring
type: performance
versions: Python 3.11

___
Python tracker 
<https://bugs.python.org/issue46233>
___



[issue46218] Change long_pow() to sliding window algorithm

2022-01-02 Thread Tim Peters


Change by Tim Peters :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue46218>
___



[issue46218] Change long_pow() to sliding window algorithm

2022-01-02 Thread Tim Peters


Tim Peters  added the comment:


New changeset 863729e9c6f599286f98ec37c8716e982c4ca9dd by Tim Peters in branch 
'main':
bpo-46218: Change long_pow() to sliding window algorithm (GH-30319)
https://github.com/python/cpython/commit/863729e9c6f599286f98ec37c8716e982c4ca9dd


--

___
Python tracker 
<https://bugs.python.org/issue46218>
___



[issue46218] Change long_pow() to sliding window algorithm

2022-01-01 Thread Tim Peters


Tim Peters  added the comment:

Since cutting the window size by 1 cut the size of the dynamic precomputed 
table in half, the overhead of the sliding window method was correspondingly 
reduced. It can pay off at exponents of half the bit length now, so that 
threshold was also changed.

--

___
Python tracker 
<https://bugs.python.org/issue46218>
___



[issue46218] Change long_pow() to sliding window algorithm

2022-01-01 Thread Tim Peters


Tim Peters  added the comment:

Nope, boosting the window size to 6 doesn't appear to really help at all, at 
least not on my box. Regardless of the window size, we have to do a bigint 
square for every bit position in the exponent. I don't know of any way to speed 
that (short of speeding multiplication, and we already make a special case of 
squaring).

So that dominates. Even for an exponent of a million solid 1-bits, the overall 
timing difference appears insignificant.

So I cut the window size back to 5 bits again. That has the benefit of cutting 
the old table size in half, which also cuts the precomputation time to fill it 
about in half, which should have some minor benefit for saner (smaller) 
exponents.

As a sanity check, I also tried cutting the window size to 3 bits. That had an 
obvious bad timing impact on huge exponents of solid 1 bits.

--

___
Python tracker 
<https://bugs.python.org/issue46218>
___



[issue46218] Change long_pow() to sliding window algorithm

2021-12-31 Thread Tim Peters


Change by Tim Peters :


--
keywords: +patch
pull_requests: +28535
stage: needs patch -> patch review
pull_request: https://github.com/python/cpython/pull/30319

___
Python tracker 
<https://bugs.python.org/issue46218>
___



[issue46218] Change long_pow() to sliding window algorithm

2021-12-31 Thread Tim Peters


New submission from Tim Peters :

As discussed on python-dev, it would be nice to break long_pow()'s reliance on 
that the number of bits in a CPython long "digit" is a multiple of 5.

Moving to the sliding-window algorithm would do that (it has to be able to 
cross digit boundaries to fill the current window), and would be better on some 
other counts too: the precomputed table can be half the size (or we can add an 
extra bit to the window for the same table size), and it can - depending on 
exponent bit patterns - sometimes reduce the number of multiplies needed too.
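
For reference, a rough pure-Python sketch of left-to-right sliding-window
exponentiation (illustrative only; the actual long_pow() code differs in many
details, and this version assumes e >= 0 and m > 1):

def pow_sliding(x, e, m, w=5):
    # Table of the odd powers x**1, x**3, ..., x**(2**w - 1), all mod m.
    x %= m
    xx = x * x % m
    table = [x]
    for _ in range((1 << (w - 1)) - 1):
        table.append(table[-1] * xx % m)

    bits = bin(e)[2:]
    result = 1
    i, n = 0, len(bits)
    while i < n:
        if bits[i] == '0':
            result = result * result % m      # a lone 0 bit: just square
            i += 1
        else:
            # Grab the longest window (at most w bits) that ends with a 1 bit.
            j = min(i + w, n)
            while bits[j - 1] == '0':
                j -= 1
            for _ in range(j - i):
                result = result * result % m
            result = result * table[int(bits[i:j], 2) >> 1] % m
            i = j
    return result

assert pow_sliding(3, 12345, 1000003) == pow(3, 12345, 1000003)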

So I intend to do that, and bump the window size from 5 to 6 (which would yield 
a significant, although modest, speed improvement for giant-exponent cases).

--
assignee: tim.peters
components: Interpreter Core
messages: 409446
nosy: tim.peters
priority: normal
severity: normal
stage: needs patch
status: open
title: Change long_pow() to sliding window algorithm
type: performance
versions: Python 3.11

___
Python tracker 
<https://bugs.python.org/issue46218>
___



[issue46187] Optionally support rounding for math.isqrt()

2021-12-31 Thread Tim Peters


Tim Peters  added the comment:

The name "isqrtrem" works fine for me too, and, indeed, is even more 
reminiscent of mpz's workalike function.

> the number of times I've needed rounded integer square root
> is small compared with the number of times I've needed rounded
> integer division

Speaking of which, the analogy to divmod extends just beyond the lack of an 
underscore in the name ;-) divmod() allows easy emulation of any division 
rounding mode, and an instant answer to "was the division exact?". isqrtrem() 
would support the same for square roots.

--

___
Python tracker 
<https://bugs.python.org/issue46187>
___



[issue46187] Optionally support rounding for math.isqrt()

2021-12-31 Thread Tim Peters


Tim Peters  added the comment:

Suppose we added isqrt_rem(n), returning the integer pair (i, rem) such that

n == i**2 + rem
0 <= rem <= 2*i

Then:

- Want the floor of sqrt(n)? i.
- The ceiling? i + (rem != 0).
- Rounded? i + (rem > i).
- Is n a perfect square? not rem.

That's how mpz addresses these, although it has a different function to compute 
the floor without returning the remainder too.

I wouldn't object to that - just +0, though. Depending on implementation 
details, which I haven't investigated, it may or may not be materially faster 
than doing:

def isqrt_rem(n):
    return (i := isqrt(n)), n - i*i

myself.
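
A worked example of the four rules above (illustrative):

from math import isqrt

n = 40                      # sqrt(40) is about 6.325
i = isqrt(n)
rem = n - i*i               # i == 6, rem == 4
print(i)                    # floor:   6
print(i + (rem != 0))       # ceiling: 7
print(i + (rem > i))        # rounded: 6
print(not rem)              # perfect square? False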

--

___
Python tracker 
<https://bugs.python.org/issue46187>
___



[issue46187] Optionally support rounding for math.isqrt()

2021-12-30 Thread Tim Peters


Tim Peters  added the comment:

I'm not rejecting anything. Mark wrote the function, and I'll defer to him.

Yes, you added `initial` to `accumulate()`, and I spent hours in all explaining 
to you why it was an utterly natural addition, conspicuous by absence, 
including writing up several real-life use cases that just happened to pop up 
while that discussion dragged on. Thanks! I've used it many more times since it 
was added. Adding it brought accumulate() into line with other languages' 
workalikes. Python was the oddball there.

OTOH, not even gmp's mpz supports rounded isqrt directly, just truncated, and a 
_different_ function that returns not only isqrt(n), but also "the remainder" 
(n - isqrt(n)**2). The latter shows they're not averse to adding cruft that can 
easily be done - which makes it all the more suggestive that they _didn't_ also 
add other rounding modes.

Note: the primary intended "use case" for the version with the remainder 
appears to be a fast way to detect if the argument was a perfect square. As 
mentioned earlier, the lack of a builtin way to do that is the only complaint 
I've seen before about Python's isqrt.

Mark didn't mention his use case for rounded isqrt, I don't have any myself, 
and your only named use case was "building a table of square roots incorporated 
in arbitrary precision functions implemented with scaled integer arithmetic". 
It's very hard to believe that minor speed boosts make any difference at all 
when building static tables, so "it's faster built in" falls flat for me.

To the contrary, adding _any_ optional argument (be it a string or of any other 
type) complicates the calling sequence and requires burning cycles on every 
invocation just to work out which arguments _were_ passed. So, despite that I 
have no expectation of ever asking for a rounded isqrt, adding the possibility 
would cost _me_ cycles on every isqrt() call I already have. None of which are 
in one-shot table-builders, but mostly in loops.

If there's a _compelling_ use case for rounded isqrt, I haven't seen one, but 
I'd be much happier to see it added it as a new function of its own. Like, 
e.g., gmp added mpz_sqrtrem() rather than add an annoying "mode" argument to 
the widely used mpz_sqrt().

I don't really object to string arguments per se, but in the context of what 
one hopes to be blazing fast integer functions, throwing in string comparisons 
just to set a flag seems ludicrously expensive. _PyUnicode_EqualToASCIIId() is 
not my idea of "cheap" ;-)

--

___
Python tracker 
<https://bugs.python.org/issue46187>
___



[issue46187] Optionally support rounding for math.isqrt()

2021-12-30 Thread Tim Peters


Tim Peters  added the comment:

Pretend this were the itertools module and you were responding to someone 
suggesting a new optional argument to one of its functions. You have tons of 
experience rejecting those, and much the same arguments apply, from "not every 
one-line function needs to be built in" to "in seventeen centuries this is the 
first time someone asked for it" ;-)

Seriously, the only complaint I've seen about isqrt() before is that there 
isn't a version that raises an exception if the argument isn't a perfect 
square. That's so niche I wouldn't support adding that to the core either. 
Sure, sometimes that's what someone may want, and that's fine - they can do it 
on their own.

--

___
Python tracker 
<https://bugs.python.org/issue46187>
___



[issue46187] Optionally support rounding for math.isqrt()

2021-12-30 Thread Tim Peters


Tim Peters  added the comment:

[Mark]
> The idea is no more than: "compute an extra bit, then use
> that extra bit to determine which way to round".

Thanks! Despite that this is what the derivation I sketched boiled down to, I 
was still missing how general the underlying principle was. Duh! Indeed, 
nothing special about sqrt here. So I'm delightfully surprised again :-)
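
For isqrt that principle looks like this (illustrative): isqrt(n << 2) is
floor(2*sqrt(n)), whose low bit is the extra bit, so adding 1 and shifting
right rounds to nearest.

from math import isqrt

def isqrt_rounded(n):
    # Compute one extra bit, then use it to decide which way to round.
    return (isqrt(n << 2) + 1) >> 1

for n in (15, 16, 17, 24, 26):
    print(n, isqrt_rounded(n))   # 4 4 4 5 5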

--

___
Python tracker 
<https://bugs.python.org/issue46187>
___



[issue37295] Possible optimizations for math.comb()

2021-12-30 Thread Tim Peters


Tim Peters  added the comment:

> Aargh! That is of course what I meant, but not in fact
> what I timed. :-(

!!! Even more baffling then. Seems like the code posted got out of 
math_comb_impl() early here:

if (overflow || ki > ni) {
    result = PyLong_FromLong(0);
    goto done;
}

67 out of every 68 times comb() was called, before any actual ;-) computation 
was even tried. Yet one way was significantly faster than the other overall, 
despite that they were so rarely executed at all?

Something ... seems off here ;-)

--

___
Python tracker 
<https://bugs.python.org/issue37295>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37295] Possible optimizations for math.comb()

2021-12-30 Thread Tim Peters


Tim Peters  added the comment:

[Mark]
> I ran some timings for comb(k, 67) on my macOS / Intel MacBook Pro,
> using timeit to time calls to a function that looked like this:
>
> def f(comb):
>     for k in range(68):
>         for _ in range(256):
>             comb(k, 67)
>             comb(k, 67)
>             ...  # 64 repetitions of comb(k, 67) in all

I'm assuming you meant to write comb(67, k) instead, since the comb(k, 67) 
given is 0 at all tested k values except for k=67, and almost never executes 
any of the code in question.

It's surprising to me that even the long-winded popcount code was faster! The 
other way needs to read up 3 1-byte values from a trailing zero table, but the 
long-winded popcount emulation needs to read up 4 4-byte mask constants (or are 
they embedded in the instruction stream?), in addition to doing many more 
bit-fiddling operations (4 shifts, 4 "&" masks, 3 add/subtract, and a multiply 
- compared to just 2 add/subtract).

So if the results are right, Intel timings make no sense to me at all ;-)

--

___
Python tracker 
<https://bugs.python.org/issue37295>
___



[issue37295] Possible optimizations for math.comb()

2021-12-29 Thread Tim Peters


Tim Peters  added the comment:

[Tim]
> That's [Mark's code] cheaper than almost every case handled by
> perm_comb_small()'s current ... "iscomb" loop.

Although I should clarify they're aimed at different things, and don't overlap 
all that much. Mark's code, & even more so Raymond's extension, picks on small 
"n" and then looks for the largest "k" such that comb(n, k) can be done with 
supernatural speed.

But the existing perm_comb_small() picks on small "k" and then looks for the 
largest "n" such that "the traditional" one-at-a-time loop can complete without 
ever overflowing a C uint64 along the way.

The latter is doubtless more valuable for perm_comb_small(), since its 
recursive calls cut k roughly in half, and the first such call doesn't reduce n 
at all.

But where they do overlap (e.g., comb(50, 15)), Mark's approach is much faster, 
so that should be checked first.

--

___
Python tracker 
<https://bugs.python.org/issue37295>
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37295] Possible optimizations for math.comb()

2021-12-29 Thread Tim Peters


Tim Peters  added the comment:

A practical caution about this in comb_with_side_limits.py:

Pmax = 25   # 25 41
Fmax = Pmax

It's true then that 25! == F[25] << S[25], but that's so in Python. The result 
has 84 bits, so 64-bit C arithmetic isn't enough.
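
A quick check of that in Python (illustrative):

from math import factorial

f = factorial(25)
shift = (f & -f).bit_length() - 1     # trailing zero bits: 22
odd_part = f >> shift                 # the odd part F[25] would hold
print(f.bit_length(), shift, odd_part.bit_length())   # 84 22 62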

That's apparently why mathmodule.c's static SmallFactorials[] table ends at 20 
(the largest n such that n! fits in 64 bits).

(Well, actually, it ends at 12 on Windows, where sizeof(long) is 4, even on 
"modern" 64-bit boxes)

I would, of course, use uint64_t for these things - "long" and "long long" are 
nuisances, and not even attractive ones ;-) While support (both software and 
HW) for 64-bit ints progressed steadily in the 32-bit era, the same does not 
appear to be true of 128-bit ints in the 64-bit era. Looks to me like 64-bit 
ints will become as much a universal HW truth as, say, 2's-complement, and byte 
addresses, have become.

--

___
Python tracker 
<https://bugs.python.org/issue37295>
___



[issue37295] Possible optimizations for math.comb()

2021-12-29 Thread Tim Peters


Tim Peters  added the comment:

[Serhiy]
> It can benefit larger arguments if move the code in
> perm_comb_small().

Seems pretty clear that the newer code really belongs there instead.

> And perhaps loops in perm_comb_small() could be optimized 
> by using precalculated values for some products.

For all n <= 67, the newer code computes comb(n, k), for all k, in a small and 
fixed number of operations, all in platform C arithmetic. Two multiplies, a 
shift, and some fiddling to compute the shift count. No divisions. That's 
cheaper than almost every case handled by perm_comb_small()'s current

k < Py_ARRAY_LENGTH(fast_comb_limits)
   && n <= fast_comb_limits[k])

"iscomb" loop. That loop is constrained by that all intermediate results have 
to fit in a C unsigned long long, but the newer code only needs to know that 
the _final_ result fits (intermediate overflows are both expected and harmless 
- indeed, they're _necessary_ for the modular arithmetic to work as intended).
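
A rough pure-Python reconstruction of that scheme (illustrative only - the
table names beyond F and S, and all details, are guesses, not the actual C
code): write n! = F[n] << S[n] with F[n] odd, keep Finv[n] = F[n]**-1 mod
2**64, and then comb(n, k) is two modular multiplies plus a shift.

M = 1 << 64
fac = [1]
for i in range(1, 68):
    fac.append(fac[-1] * i)
S = [(f & -f).bit_length() - 1 for f in fac]   # trailing zero counts of n!
F = [f >> s for f, s in zip(fac, S)]           # odd parts of n!
Finv = [pow(f, -1, M) for f in F]              # inverses of the odd parts mod 2**64

def comb64(n, k):
    # Only the final result must fit; intermediates are reduced mod 2**64.
    q = F[n] * Finv[k] % M * Finv[n - k] % M
    return q << (S[n] - S[k] - S[n - k])

from math import comb
assert all(comb64(n, k) == comb(n, k) for n in range(68) for k in range(n + 1))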

--

___
Python tracker 
<https://bugs.python.org/issue37295>
___


