[issue37295] Possible optimizations for math.comb()

2022-01-23 Thread Tim Peters
Tim Peters added the comment: OK, here's the last version I had. Preconditions are that d > 0, n > 0, and n % d == 0. This version tries to use the narrowest possible integers on each step. The lowermost `good_bits` of dinv at the start of the loop are correct already. Taking out all the
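A minimal sketch of this kind of exact-division-by-modular-inverse scheme (this is not Tim's code; exact_div and its structure are illustrative assumptions): for odd d the inverse of d modulo a power of two exists, and each Newton step dinv = dinv * (2 - d*dinv) roughly doubles the number of correct low-order bits, so the quotient can be recovered with a masked multiplication once enough bits are correct.

    def exact_div(n, d):
        # Sketch only; assumes d > 0, n > 0, and n % d == 0, as stated above.
        shift = (d & -d).bit_length() - 1       # power of two shared by n and d
        n >>= shift
        d >>= shift                             # d is odd from here on
        bits = n.bit_length()                   # enough bits to hold the quotient
        dinv, good_bits = 1, 1                  # d * 1 == 1 (mod 2): one bit correct
        while good_bits < bits:
            good_bits *= 2
            mask = (1 << good_bits) - 1
            dinv = (dinv * (2 - d * dinv)) & mask   # Newton step doubles correct bits
        return (n * dinv) & ((1 << bits) - 1)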

[issue37295] Possible optimizations for math.comb()

2022-01-23 Thread Raymond Hettinger
Raymond Hettinger added the comment:
> But I won't post code (unless someone asks)
Okay, I'll ask.

[issue37295] Possible optimizations for math.comb()

2022-01-22 Thread Tim Peters
Tim Peters added the comment: Ya, I don't expect anyone will check in a change without doing comparative timings in C first. Not worried about that. I'd be happy to declare victory and move on at this point ;-) But that's me. Near the start of this, I noted that we just won't compete with

[issue37295] Possible optimizations for math.comb()

2022-01-19 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: comb(n, k) can be computed as perm(n, k) // factorial(k).

$ ./python -m timeit -r1 -n1 -s 'from math import comb' "comb(100, 50)"
recursive: 1 loop, best of 1: 9.16 sec per loop
iterative: 1 loop, best of 1: 164 sec per loop
$ ./python -m timeit
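Spelled out as a minimal Python sketch of that identity (comb_via_perm is an illustrative name, not code from any patch here):

    from math import perm, factorial

    def comb_via_perm(n, k):
        # comb(n, k) == perm(n, k) // factorial(k); use the smaller of k and n - k
        k = min(k, n - k)
        return perm(n, k) // factorial(k)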

[issue37295] Possible optimizations for math.comb()

2022-01-19 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: All this should be tested with the C implementation because the relative cost of operations is different in C and Python. I have tested Raymond's idea about using an iterative algorithm for small k.

$ ./python -m timeit -s 'from math import comb' "comb(3329023,

[issue37295] Possible optimizations for math.comb()

2022-01-14 Thread Tim Peters
Tim Peters added the comment: Another trick, building on the last one: computing factorial(k) isn't cheap, in time or space, and neither is dividing by it. But we know it will entirely cancel out. Indeed, for each outer loop iteration, prod(p) is divisible by the current k. But, unlike as
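For reference, a minimal sketch of the familiar running-product loop in which that cancellation happens implicitly, so factorial(k) is never materialized (this is the textbook formulation, not Tim's code):

    def comb_running(n, k):
        k = min(k, n - k)
        result = 1
        for i in range(1, k + 1):
            # result * (n - k + i) is exactly divisible by i: after this step
            # result is the integer C(n - k + i, i)
            result = result * (n - k + i) // i
        return result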

[issue37295] Possible optimizations for math.comb()

2022-01-13 Thread Tim Peters
Tim Peters added the comment: I was thinking about comb(100, 50). The simple "* --n / ++k" loop does 499,999 each of multiplication and division, and in all instances the second operand is a single Python digit. Cheap as can be. In contrast, despite that it short-circuits all

[issue37295] Possible optimizations for math.comb()

2022-01-12 Thread Raymond Hettinger
Change by Raymond Hettinger: Added file: https://bugs.python.org/file50559/comb_pole2.py

[issue37295] Possible optimizations for math.comb()

2022-01-12 Thread Raymond Hettinger
Raymond Hettinger added the comment:

    def comb64(n, k):
        'comb(n, k) in multiplicative group modulo 64-bits'
        return (F[n] * Finv[k] * Finv[n-k] & (2**64-1)) << (S[n] - S[k] - S[n - k])

    def comb_iterative(n, k):
        'Straight multiply and divide when k is small.'
        result = 1
        for r

[issue37295] Possible optimizations for math.comb()

2022-01-12 Thread Raymond Hettinger
Raymond Hettinger added the comment: Okay, will set a cap on the n where a fixed j is used. Also, making a direct computation for k<20 is promising.

[issue37295] Possible optimizations for math.comb()

2022-01-12 Thread Raymond Hettinger
Change by Raymond Hettinger: Removed message: https://bugs.python.org/msg410440

[issue37295] Possible optimizations for math.comb()

2022-01-12 Thread Raymond Hettinger
Raymond Hettinger added the comment: ISTM that the asymptotic benefits of Karatsuba multiplication are mostly lost when each of the factors has to be built up recursively prior to the multiply. Also, the benefits of Karatsuba only start to appear at around 400-bit numbers. For us, we don't

[issue37295] Possible optimizations for math.comb()

2022-01-12 Thread Tim Peters
Tim Peters added the comment: A feature of the current code is that, while the recursion tree can be very wide, it's not tall, with max depth proportional to the log of k. But it's proportional to k in the proposal (the C(n-j, k-j) term's second argument goes down by at most 20 per

[issue37295] Possible optimizations for math.comb()

2022-01-12 Thread Raymond Hettinger
Raymond Hettinger added the comment: Ran some timings on the pure python version with and without the precomputed diagonal: [C(n, k) for k in range(n+1)]

             New Code    Baseline
    n=85     156 usec    160 usec
    n=137    632 usec    1.82

[issue37295] Possible optimizations for math.comb()

2022-01-11 Thread Raymond Hettinger
Change by Raymond Hettinger: Removed file: https://bugs.python.org/file50556/comb_pole.py

[issue37295] Possible optimizations for math.comb()

2022-01-11 Thread Raymond Hettinger
Raymond Hettinger added the comment: Just posted an update that runs on 3.8 or later. Added file: https://bugs.python.org/file50557/comb_pole.py

[issue37295] Possible optimizations for math.comb()

2022-01-11 Thread Tim Peters
Tim Peters added the comment: Just noting that comb_pole.py requires a development version of Python to run (under all released versions, a byteorder argument is required for int.{to,from}_bytes() calls).

[issue37295] Possible optimizations for math.comb()

2022-01-11 Thread Raymond Hettinger
Raymond Hettinger added the comment: One fixup:

    - j = min(k // 2, FixedJ)
    + j = FixedJ if k > FixedJ else k // 2

With that fix, the number of 64-bit mod arithmetic calls drops to 3, 4, and 20 for C(200,100), C(225,112), and C(250,125). That compares to 115, 150, and 193 calls in

[issue37295] Possible optimizations for math.comb()

2022-01-11 Thread Raymond Hettinger
Change by Raymond Hettinger: Removed file: https://bugs.python.org/file50555/comb_pole.py

[issue37295] Possible optimizations for math.comb()

2022-01-11 Thread Raymond Hettinger
Raymond Hettinger added the comment: I've been experimenting with a modification of Serhiy's recurrence relation, using a different value of j rather than j=k//2. The current design splits off three ways when it recurses, so the number of calls grows quickly. For C(200,100), C(225,112),
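For readers following along, a hedged sketch of the recurrence in question, C(n, k) == C(n, j) * C(n-j, k-j) // C(k, j), with j capped at a fixed value as in the experiment above (comb_recursive and FixedJ=20 are illustrative assumptions, not the code attached to this issue):

    from math import comb

    def comb_recursive(n, k, FixedJ=20):
        k = min(k, n - k)
        if k < 2:
            return 1 if k == 0 else n
        j = FixedJ if k > FixedJ else k // 2
        # C(n, k) * C(k, j) == C(n, j) * C(n-j, k-j), so the division is exact
        return (comb_recursive(n, j, FixedJ)
                * comb_recursive(n - j, k - j, FixedJ)
                // comb(k, j))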

[issue37295] Possible optimizations for math.comb()

2022-01-09 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: New changeset 2d787971c65b005d0cce219399b9a8e2b70d4ef4 by Serhiy Storchaka in branch 'main': bpo-37295: Use constant-time comb() and perm() for larger n depending on k (GH-30305)

[issue37295] Possible optimizations for math.comb()

2021-12-31 Thread Mark Dickinson
Mark Dickinson added the comment: New changeset 0b58bac3e7877d722bdbd3c38913dba2cb212f13 by Mark Dickinson in branch 'main': bpo-37295: More direct computation of power-of-two factor in math.comb (GH-30313) https://github.com/python/cpython/commit/0b58bac3e7877d722bdbd3c38913dba2cb212f13

[issue37295] Possible optimizations for math.comb()

2021-12-31 Thread Mark Dickinson
Mark Dickinson added the comment:
> I'd be happy to change the implementation to use the trailing zero counts as suggested.
Done in GH-30313 (though this will conflict with Serhiy's PR).

[issue37295] Possible optimizations for math.comb()

2021-12-31 Thread Mark Dickinson
Change by Mark Dickinson: pull_requests: +28529, pull_request: https://github.com/python/cpython/pull/30313

[issue37295] Possible optimizations for math.comb()

2021-12-30 Thread Raymond Hettinger
Raymond Hettinger added the comment:
> I'd be happy to change the implementation to use the trailing zero counts as suggested.
Thanks. I think that is a portability win and will make the code a lot easier to explain.

[issue37295] Possible optimizations for math.comb()

2021-12-30 Thread Mark Dickinson
Change by Mark Dickinson: Added file: https://bugs.python.org/file50531/driver.py

[issue37295] Possible optimizations for math.comb()

2021-12-30 Thread Mark Dickinson
Mark Dickinson added the comment: Thanks Tim for spotting the stupid mistake. The reworked timings are a bit more ... plausible. tl;dr: On my machine, Raymond's suggestion gives a 2.2% speedup in the case where POPCNT is not available, and a 0.45% slowdown in the case that it _is_

[issue37295] Possible optimizations for math.comb()

2021-12-30 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: PR 30305 applies Mark's algorithm for larger n (up to 127) depending on k, as was suggested by Raymond. Note that it uses a different table for limits, which maps k to the maximal n.

[issue37295] Possible optimizations for math.comb()

2021-12-30 Thread Serhiy Storchaka
Change by Serhiy Storchaka: pull_requests: +28518, pull_request: https://github.com/python/cpython/pull/30305

[issue37295] Possible optimizations for math.comb()

2021-12-30 Thread Tim Peters
Tim Peters added the comment:
> Aargh! That is of course what I meant, but not in fact what I timed. :-(
!!! Even more baffling then. Seems like the code posted got out of math_comb_impl() early here:

    if (overflow || ki > ni) {
        result = PyLong_FromLong(0);

[issue37295] Possible optimizations for math.comb()

2021-12-30 Thread Mark Dickinson
Mark Dickinson added the comment:
> I'm assuming you meant to write comb(67, k) instead
Aargh! That is of course what I meant, but not in fact what I timed. :-( I'll redo the timings. Please disregard the previous message.

[issue37295] Possible optimizations for math.comb()

2021-12-30 Thread Tim Peters
Tim Peters added the comment:
[Mark]
> I ran some timings for comb(k, 67) on my macOS / Intel MacBook Pro,
> using timeit to time calls to a function that looked like this:
>
> def f(comb):
>     for k in range(68):
>         for _ in range(256):
>             comb(k, 67)
>             comb(k,

[issue37295] Possible optimizations for math.comb()

2021-12-30 Thread Mark Dickinson
Mark Dickinson added the comment:
> So which of xor-popcount and add-up-trailing-zero-counts is faster may well depend on platform.
I ran some timings for comb(k, 67) on my macOS / Intel MacBook Pro, using timeit to time calls to a function that looked like this:

    def f(comb):
        for

[issue37295] Possible optimizations for math.comb()

2021-12-29 Thread Tim Peters
Tim Peters added the comment:
[Tim]
> That's [Mark's code] cheaper than almost every case handled by perm_comb_small()'s current ... "iscomb" loop.
Although I should clarify they're aimed at different things, and don't overlap all that much. Mark's code, & even more so Raymond's extension,

[issue37295] Possible optimizations for math.comb()

2021-12-29 Thread Tim Peters
Tim Peters added the comment: A practical caution about this in comb_with_side_limits.py:

    Pmax = 25   # 25 41
    Fmax = Pmax

It's true then that 25! == F[25] << S[25], but that's so in Python. The result has 84 bits, so 64-bit C arithmetic isn't enough. That's

[issue37295] Possible optimizations for math.comb()

2021-12-29 Thread Tim Peters
Tim Peters added the comment:
[Serhiy]
> It can benefit larger arguments if the code is moved into perm_comb_small().
Seems pretty clear that the newer code really belongs there instead.
> And perhaps loops in perm_comb_small() could be optimized by using precalculated values for some

[issue37295] Possible optimizations for math.comb()

2021-12-29 Thread Raymond Hettinger
Raymond Hettinger added the comment:
> Did you try running that?
Yes. The table size was extended beyond what we have now. See attached file. Added file: https://bugs.python.org/file50526/comb_with_side_limits.py

[issue37295] Possible optimizations for math.comb()

2021-12-29 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I think Raymond means extending the tables to TableSize=101. It can benefit larger arguments if the code is moved into perm_comb_small(). And perhaps loops in perm_comb_small() could be optimized by using precalculated values for some products.

[issue37295] Possible optimizations for math.comb()

2021-12-29 Thread Tim Peters
Tim Peters added the comment: About:

    TableSize = 101
    limits = bytearray(TableSize)
    for n in range(0, TableSize):
        for k in range(0, n+1):
            if comb(n, k) != comb_small(n, k):

(and regardless of whether the last line is replaced with the later correction): Did
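A hedged reconstruction of how such a limits table could be filled in, folding in the "% Modulus" correction from the next message (comb_small is assumed to be the 64-bit table-based helper posted elsewhere in this thread, and the break/else bookkeeping is an editorial guess, not code from the issue):

    from math import comb

    Modulus = 2**64
    TableSize = 101
    limits = bytearray(TableSize)
    for n in range(TableSize):
        for k in range(n + 1):
            if comb(n, k) != comb_small(n, k) % Modulus:
                break                  # first k at which the 64-bit scheme fails
        else:
            k = n + 1                  # every k worked for this n
        limits[n] = k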

[issue37295] Possible optimizations for math.comb()

2021-12-29 Thread Raymond Hettinger
Raymond Hettinger added the comment:

    - if comb(n, k) != comb_small(n, k):
    + if comb(n, k) != comb_small(n, k) % Modulus:

[issue37295] Possible optimizations for math.comb()

2021-12-28 Thread Raymond Hettinger
Raymond Hettinger added the comment: Also, it would help Serhiy's divide and conquer algorithm if the fast cases included the sides of Pascal's triangle rather than just the top:

    if n < TableSize and k < limits[n]:
        return comb_small(n, k)
    return comb_slow(n, k)

Build the table

[issue37295] Possible optimizations for math.comb()

2021-12-28 Thread Tim Peters
Tim Peters added the comment: A timing confounder: I see that CPython's _Py_popcount32() only tries to use the relevant blazing fast hardware instruction if defined(__clang__) || defined(__GNUC__). On Windows, it's a long-winded bit-fiddling dance. So which of xor-popcount and

[issue37295] Possible optimizations for math.comb()

2021-12-28 Thread Tim Peters
Tim Peters added the comment: Raymond, using the count of trailing zeroes instead is exactly what I suggested before, here: https://bugs.python.org/issue37295#msg409000 So Mark & I already batted that around. For whatever reasons he had, though, he stuck with the xor-popcount approach.

[issue37295] Possible optimizations for math.comb()

2021-12-28 Thread Raymond Hettinger
Raymond Hettinger added the comment: The shift table is an array of uint8_t, so it would be tiny (nearly fitting in one cache line).

[issue37295] Possible optimizations for math.comb()

2021-12-28 Thread Raymond Hettinger
Raymond Hettinger added the comment: It's a little late, but I had a thought that the code could be made more general, slightly faster, and much easier to understand if the popcount logic were to be replaced with a table that records how many bits of the factorial were shifted out to make it

[issue37295] Possible optimizations for math.comb()

2021-12-28 Thread Mark Dickinson
Mark Dickinson added the comment: New changeset 02b5417f1107415abaf81acab7522f9aa84269ea by Mark Dickinson in branch 'main': bpo-37295: Speed up math.comb(n, k) for 0 <= k <= n <= 67 (GH-30275) https://github.com/python/cpython/commit/02b5417f1107415abaf81acab7522f9aa84269ea

[issue37295] Possible optimizations for math.comb()

2021-12-27 Thread Mark Dickinson
Change by Mark Dickinson: pull_requests: +28490, pull_request: https://github.com/python/cpython/pull/30275

[issue37295] Possible optimizations for math.comb()

2021-12-24 Thread Tim Peters
Tim Peters added the comment: If people are keen to speed comb() for larger arguments, the approach in Stefan's "comb_with_primes.py" is very much worth looking at. I've played with that idea before, and it runs circles around any other approach I've seen. The only divisions it does are of
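For context, a hedged sketch of the general prime-factorization idea (this mirrors the spirit of an approach like comb_with_primes.py, not its actual code): by Legendre's formula the exponent of a prime p in m! is sum(m // p**i), so each prime's exponent in C(n, k) follows by subtraction and the result is assembled almost entirely by multiplication.

    from math import prod

    def primes_upto(n):
        # plain sieve of Eratosthenes; good enough for a sketch
        if n < 2:
            return []
        sieve = bytearray([1]) * (n + 1)
        sieve[0] = sieve[1] = 0
        for p in range(2, int(n ** 0.5) + 1):
            if sieve[p]:
                sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
        return [p for p in range(n + 1) if sieve[p]]

    def legendre(m, p):
        # exponent of prime p in m!
        e = 0
        while m:
            m //= p
            e += m
        return e

    def comb_primes(n, k):
        # exponent of p in C(n, k) is e_p(n!) - e_p(k!) - e_p((n-k)!)
        return prod(p ** (legendre(n, p) - legendre(k, p) - legendre(n - k, p))
                    for p in primes_upto(n))

A tuned version would combine the per-prime factors more carefully than a plain left-to-right prod(), but the sketch shows why the approach needs essentially no big-integer division.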

[issue37295] Possible optimizations for math.comb()

2021-12-23 Thread Raymond Hettinger
Raymond Hettinger added the comment:
[Mark]
> Should I code up my suggestion in a PR,
Yes, go for it.

[issue37295] Possible optimizations for math.comb()

2021-12-23 Thread Tim Peters
Tim Peters added the comment: Please don't use "long long". It usually introduces platform dependence where none is intended, or helpful. PEP 7 requires that the compiler in use supply the fixed-width integer types defined by C99's stdint.h[1]. These are: int8_t, int16_t, int32_t, int64_t

[issue37295] Possible optimizations for math.comb()

2021-12-23 Thread Mark Dickinson
Mark Dickinson added the comment: Raymond: how do you want to proceed on this? Should I code up my suggestion in a PR, or are you already working on it?

[issue37295] Possible optimizations for math.comb()

2021-12-22 Thread Tim Peters
Tim Peters added the comment: No problem, Mark! I just prefer the shallowest approaches that are "good enough". If it's materially faster to use xors and a popcount instead, fine by me, just provided a comment points to a clue about why that works. BTW, the later xor version was clearer to

[issue37295] Possible optimizations for math.comb()

2021-12-22 Thread Mark Dickinson
Mark Dickinson added the comment:
[Tim]
> The justification for the shift count isn't self-evident, and appears to me to be an instance of the generalization of Kummer's theorem to multinomial coefficients.
Not sure there's any generalisation here: I think it *is* just Kummer's theorem.

[issue37295] Possible optimizations for math.comb()

2021-12-22 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: long long is at least 64 bits, so we can safely use PyLong_FromLongLong() for int64_t.

[issue37295] Possible optimizations for math.comb()

2021-12-21 Thread Pablo Galindo Salgado
Change by Pablo Galindo Salgado: nosy: -pablogsal

[issue37295] Possible optimizations for math.comb()

2021-12-21 Thread Tim Peters
Tim Peters added the comment: I see no use of 128-bit ints in the CPython core. Advice: forget it. int64_t and uint64_t are required by C99, and are used in many places in the core. Advice: use freely. Note that if tables of "odd part mod 2**64" and "number of trailing zeroes" are used up

[issue37295] Possible optimizations for math.comb()

2021-12-21 Thread Raymond Hettinger
Raymond Hettinger added the comment: Am I correct in my understanding that 64-bit ints are always available, that 128-bit ints aren't universal, and that #ifdefs would be needed to extend the range of the table for systems that support it?

[issue37295] Possible optimizations for math.comb()

2021-12-21 Thread Raymond Hettinger
Raymond Hettinger added the comment:
> Finv = [pow(fodd, -1, 2**64) for fodd in Fodd]
This is a good trick. I had already experimented with separating factorials into an odd component and a shift count, but failed to get a speed-up because the divisions were slow. Having a table of

[issue37295] Possible optimizations for math.comb()

2021-12-21 Thread Tim Peters
Tim Peters added the comment: Clever, Mark! Very nice. The justification for the shift count isn't self-evident, and appears to me to be an instance of the generalization of Kummer's theorem to multinomial coefficients. I think it would be clearer at first sight to rely instead on that

[issue37295] Possible optimizations for math.comb()

2021-12-21 Thread Mark Dickinson
Mark Dickinson added the comment: That computation of the shift can be simplified to require only one popcount operation. With F and Finv as before:

    def comb_small(n, k):
        assert 0 <= k <= n <= Nmax
        return (F[n] * Finv[k] * Finv[n-k] % 2**64) << (k ^ n ^ (n-k)).bit_count()
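A quick brute-force check of the identity behind that shift (Kummer's theorem for p=2: the exponent of 2 in C(n, k) is the number of carries when adding k and n-k in binary, which equals popcount(k ^ n ^ (n-k))); the range of 200 below is arbitrary:

    from math import comb

    for n in range(200):
        for k in range(n + 1):
            c = comb(n, k)
            exponent_of_two = (c & -c).bit_length() - 1
            assert exponent_of_two == (k ^ n ^ (n - k)).bit_count()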

[issue37295] Possible optimizations for math.comb()

2021-12-21 Thread Mark Dickinson
Mark Dickinson added the comment: One approach that avoids the use of floating-point arithmetic is to precompute the odd part of the factorial of n modulo 2**64, for all small n. If we also precompute the inverses, then three lookups and two 64x64->64 unsigned integer multiplications gets
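A hedged sketch of how such tables could be precomputed (the names F, Finv and S follow other messages in this thread, and Nmax = 67 matches the limit in the change that was eventually committed; the layout actually used in the C code may differ):

    Nmax = 67
    S = [0] * (Nmax + 1)        # S[n]: number of factors of two in n!
    F = [1] * (Nmax + 1)        # F[n]: odd part of n!, reduced mod 2**64
    for i in range(1, Nmax + 1):
        odd, e = i, 0
        while odd % 2 == 0:
            odd //= 2
            e += 1
        S[i] = S[i - 1] + e
        F[i] = (F[i - 1] * odd) % 2**64
    Finv = [pow(f, -1, 2**64) for f in F]   # inverses of the odd parts mod 2**64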

[issue37295] Possible optimizations for math.comb()

2021-12-20 Thread Raymond Hettinger
Raymond Hettinger added the comment:
> Just curious about why the focus on the newer exp2 and log2?
No particular reason, those happened to give slightly better results on macOS. Across compilers, plain exp() is likely the most mature. The quality of log() is irrelevant because it isn't used.

[issue37295] Possible optimizations for math.comb()

2021-12-20 Thread Tim Peters
Tim Peters added the comment: Just curious about why the focus on the newer exp2 and log2? Logs are logs, and the "natural" `exp()` and `log()` should work just as well. On Windows, exp2() is particularly poor for now (I've seen dozens of cases of errors over 2 ulp, with exp2(x) very much

[issue37295] Possible optimizations for math.comb()

2021-12-20 Thread Raymond Hettinger
Change by Raymond Hettinger: Removed message: https://bugs.python.org/msg408963

[issue37295] Possible optimizations for math.comb()

2021-12-20 Thread Raymond Hettinger
Raymond Hettinger added the comment: Stand by. I think I can implement this using only bit integer arithmetic. Will post tomorrow.

[issue37295] Possible optimizations for math.comb()

2021-12-20 Thread Raymond Hettinger
Raymond Hettinger added the comment:
> The problem here is that C gives no guarantees about accuracy of either log2 or exp2
* The input table has hard-wired constants so there is no dependency on log2(). The inputs can be as exact as pi, tau, and e.
* The C library's exp2() function

[issue37295] Possible optimizations for math.comb()

2021-12-20 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: factorial(49) has 163 significant binary digits. It cannot be represented in floating-point without losses. And C functions log2() and exp2() can add additional errors, depending on platform. We rely on the assumption that all errors will be compensated,

[issue37295] Possible optimizations for math.comb()

2021-12-20 Thread Mark Dickinson
Mark Dickinson added the comment:
> we can get faster code by using a small (3Kb) table of factorial logarithms
The problem here is that C gives no guarantees about accuracy of either log2 or exp2, so we'd be playing a guessing game about how far we can go before the calculation becomes

[issue37295] Possible optimizations for math.comb()

2021-12-18 Thread Raymond Hettinger
Raymond Hettinger added the comment: For the small cases (say n < 50), we can get faster code by using a small (3Kb) table of factorial logarithms:

    double lf[50] = [log2(factorial(n)) for n in range(50)];

Then comb() and perm() can be computed quickly and in constant time using
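A hedged Python sketch of that idea (the proposal itself is for C; comb_log is an illustrative name, and as other messages in this thread point out, whether round() recovers the exact integer near the top of the table depends on floating-point accuracy):

    from math import factorial, log2

    N = 50
    lf = [log2(factorial(i)) for i in range(N)]   # table of log2(n!)

    def comb_log(n, k):
        # comb(n, k) = n! / (k! * (n-k)!), done in log2 space and rounded back
        return round(2.0 ** (lf[n] - lf[k] - lf[n - k]))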

[issue37295] Possible optimizations for math.comb()

2021-12-18 Thread Raymond Hettinger
Change by Raymond Hettinger: Removed message: https://bugs.python.org/msg408865

[issue37295] Possible optimizations for math.comb()

2021-12-18 Thread Raymond Hettinger
Raymond Hettinger added the comment: For the small cases (say n < 50), we can get faster code by using a small (3Kb) table of factorial logarithms:

    double lf[50] = [log2(factorial(n)) for n in range(50)];

Then comb() and perm() can be computed quickly and in constant time using the C99

[issue37295] Possible optimizations for math.comb()

2021-12-05 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: New changeset 60c320c38e4e95877cde0b1d8562ebd6bc02ac61 by Serhiy Storchaka in branch 'main': bpo-37295: Optimize math.comb() and math.perm() (GH-29090) https://github.com/python/cpython/commit/60c320c38e4e95877cde0b1d8562ebd6bc02ac61

[issue37295] Possible optimizations for math.comb()

2021-11-14 Thread Stefan Pochmann
Stefan Pochmann added the comment: Turns out for n=100_000, k=50_000, about 87% of my factors are 1, so they don't even need to be turned into Python ints for multiplication, improving the multiplication part to 3.05 ms. And a C++ version to produce the factors took 0.85 ms. Updated

[issue37295] Possible optimizations for math.comb()

2021-11-14 Thread Stefan Pochmann
Stefan Pochmann added the comment: And for Raymond's case 4), about running very long and not responding to SIGINT, with n=1_000_000 and k=500_000:

    150.91 seconds  math.comb(n, k)
     39.11 seconds  factorial(n) // (factorial(k) * factorial(n-k))
      0.40 seconds  mycomb(n, k)
      0.14 seconds

[issue37295] Possible optimizations for math.comb()

2021-11-14 Thread Stefan Pochmann
Stefan Pochmann added the comment: I wrote a Python solution ("mycomb") that computes comb(100_000, 50_000) faster, maybe of interest:

    1510.4 ms  math.comb(n, k)
     460.8 ms  factorial(n) // (factorial(k) * factorial(n-k))
      27.5 ms  mycomb(n, k)
       6.7 ms  *estimation* for mycomb if written

[issue37295] Possible optimizations for math.comb()

2021-11-13 Thread Raymond Hettinger
Raymond Hettinger added the comment: These speedups all seem to be significant and worth doing.

[issue37295] Possible optimizations for math.comb()

2021-11-12 Thread Serhiy Storchaka
Change by Serhiy Storchaka: versions: +Python 3.11 -Python 3.8, Python 3.9

[issue37295] Possible optimizations for math.comb()

2021-10-20 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: And with optimization of math.perm() for small arguments:

$ ./python -m pyperf timeit -s 'from math import perm' 'perm(30, 14)'
Mean +- std dev: 524 ns +- 43 ns -> 66.7 ns +- 4.6 ns: 7.85x faster
$ ./python -m pyperf timeit -s 'from math import perm'

[issue37295] Possible optimizations for math.comb()

2021-10-20 Thread Serhiy Storchaka
Change by Serhiy Storchaka: pull_requests: +27356, pull_request: https://github.com/python/cpython/pull/29090

[issue37295] Possible optimizations for math.comb()

2021-10-20 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: The divide-and-conquer approach works pretty well for larger n. For results slightly out of the 64-bit range:

$ ./python -m pyperf timeit -s 'from math import comb' 'comb(63, 31)'
Mean +- std dev: 2.80 us +- 0.14 us -> 388 ns +- 19 ns: 7.22x faster
$ ./python

[issue37295] Possible optimizations for math.comb()

2021-10-18 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Microbenchmarks:

$ ./python -m pyperf timeit -s 'from math import comb' '[comb(n, k) for n in range(63) for k in range(n+1)]'
Mean +- std dev: 1.57 ms +- 0.07 ms -> 209 us +- 11 us: 7.53x faster
$ ./python -m pyperf timeit -s 'from math import comb'

[issue37295] Possible optimizations for math.comb()

2021-10-18 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Here is a more optimized PR inspired by PR 29020. It would be too long to explain how PR 29020 can be improved, so I wrote a new PR. Basically it implements Raymond's idea #1, but supports n>62 for smaller k. How to calculate limits:

    import math
    n = m =
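The quoted snippet is cut off above; one plausible reading (an assumption, not Serhiy's actual computation) is that for each small k the table records the largest n for which comb(n, k) still fits in 64 bits:

    from math import comb

    def largest_n(k, bound=2**64):
        # assumed criterion: largest n with comb(n, k) < bound
        lo = hi = k
        while comb(hi, k) < bound:      # exponential search for an upper bracket
            hi *= 2
        while lo + 1 < hi:              # then bisect between lo and hi
            mid = (lo + hi) // 2
            if comb(mid, k) < bound:
                lo = mid
            else:
                hi = mid
        return lo

    limits = {k: largest_n(k) for k in range(2, 68)}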

[issue37295] Possible optimizations for math.comb()

2021-10-18 Thread Serhiy Storchaka
Change by Serhiy Storchaka: pull_requests: +27302, pull_request: https://github.com/python/cpython/pull/29030

[issue37295] Possible optimizations for math.comb()

2021-10-18 Thread Marco Cognetta
Change by Marco Cognetta: keywords: +patch, nosy: +mcognetta, nosy_count: 6.0 -> 7.0, pull_requests: +27293, stage: -> patch review, pull_request: https://github.com/python/cpython/pull/29020

[issue37295] Possible optimizations for math.comb()

2019-06-18 Thread Serhiy Storchaka
Serhiy Storchaka added the comment:
> Me, I'd wait for them to complain, and encourage _them_ to learn something useful by writing a patch to fix it :-)
+1!

[issue37295] Possible optimizations for math.comb()

2019-06-17 Thread Tim Peters
Tim Peters added the comment: In real life, I expect 99.999%+ of calls will be made with small arguments, so (1) is worth it. I like Mark's suggestion to use uint64_t so the acceptable range doesn't depend on platform. At least in the world I live in, 32-bit boxes are all but extinct

[issue37295] Possible optimizations for math.comb()

2019-06-17 Thread Raymond Hettinger
Raymond Hettinger added the comment: FWIW, here's a rough implementation of (3). It is limited to a specific range to avoid excess memory usage. For comb(50_000, 25_000), it gives a three-fold speedup:

    if (k_long > 20 && k_long > n_long / 3 && n_long <= 10) {
        PyObject *den

[issue37295] Possible optimizations for math.comb()

2019-06-16 Thread Boštjan Mejak
Boštjan Mejak added the comment: Performance improvements are what a beta build exists for in the first place. nosy: +PedanticHacker

[issue37295] Possible optimizations for math.comb()

2019-06-16 Thread Mark Dickinson
Mark Dickinson added the comment: (1), (4) and (5) sound good to me. For (1), it might make sense to ignore the 32-bit vs. 64-bit distinction and use `uint64_t` for the internal computations. Then we can do up to n = 62 regardless of platform. (2) feels like too much extra complication to

[issue37295] Possible optimizations for math.comb()

2019-06-16 Thread Raymond Hettinger
Raymond Hettinger added the comment: Optimizations 2 and 3 look something like this:

    bc75 = [comb(75, r) for r in range(75//2+1)]
    bc150 = [comb(150, r) for r in range(150//2+1)]
    bc225 = [comb(225, r) for r in range(225//2+1)]

    def comb(n, k):
        if n < 0 or k < 0:
            raise

[issue37295] Possible optimizations for math.comb()

2019-06-15 Thread Raymond Hettinger
New submission from Raymond Hettinger: The implementation of math.comb() is nice in that it limits the size of intermediate values as much as possible and it avoids unnecessary steps when min(k,n-k) is small with respect to k. There are some ways it can be made better or faster: 1) For