Re: [go-nuts] Any interest in nat.mulRange simplification/optimization?

2024-01-13 Thread Bakul Shah
Fair enough! I replied as I found a factor of 60 a bit surprising. Though I shouldn't have been -- my n-digits-of-pi Scheme program was consistently about 16-17 times slower than the gmp-based one (at least up to 1e9), using the same algorithm (Chudnovsky's), which is also highly dependent on

Re: [go-nuts] Any interest in nat.mulRange simplification/optimization?

2024-01-13 Thread Rob Pike
This is getting interesting but maybe only for me, so I'll stop here. I did some profiling and the majority of the time is spent in math/big.addMulVVW. Thinking I might be doing something stupid, I compared it with the Go implementation at

Re: [go-nuts] Any interest in nat.mulRange simplification/optimization?

2024-01-13 Thread Rob Pike
Oh, I did say my implementation was straightforward. It's free of any clever multiplication algorithms or mathematical delights. It could easily be giving up 10x or more for that reason alone. And I haven't even profiled it yet. -rob On Sat, Jan 13, 2024 at 7:04 PM Bakul Shah wrote: > FYI

Re: [go-nuts] Any interest in nat.mulRange simplification/optimization?

2024-01-13 Thread Bakul Shah
FYI Julia (on M1 MBP) seems much faster: julia> @which factorial(big(1)) factorial(x::BigInt) in Base.GMP at gmp.jl:645 julia> @time begin; factorial(big(1)); 1; end 27.849116 seconds (1.39 M allocations: 11.963 GiB, 0.22% gc time) Probably they use Schönhage-Strassen

Re: [go-nuts] Any interest in nat.mulRange simplification/optimization?

2024-01-12 Thread Rob Pike
Thanks for the tip. A fairly straightforward implementation of this algorithm gives me about a factor of two speedup for pretty much any value. I went up to 1e8!, which took about half an hour compared to nearly an hour for MulRange. I'll probably stick it in ivy after a little more tuning. I may

Re: [go-nuts] Any interest in nat.mulRange simplification/optimization?

2024-01-09 Thread John Jannotti
I wrote the single allocation version and added it to the issue, but Bakul is absolutely right that the recursive solution has merit when the input is large enough. By building up large values on each "side" of the recursion, Karatsuba gets used for the larger multiplies and the recursive version

Re: [go-nuts] Any interest in nat.mulRange simplification/optimization?

2024-01-08 Thread Bakul Shah
For that you may wish to explore Peter Luschny's "prime swing" factorial algorithm and variations! https://oeis.org/A000142/a000142.pdf And implementations in various languages including go: https://github.com/PeterLuschny/Fast-Factorial-Functions > On Jan 8, 2024, at 9:22 PM, Rob Pike wrote:

Re: [go-nuts] Any interest in nat.mulRange simplification/optimization?

2024-01-08 Thread Rob Pike
Here's an example where it's the bottleneck: ivy factorial !1e7 1.20242340052e+65657059 )cpu 1m10s (1m10s user, 167.330ms sys) -rob On Tue, Jan 9, 2024 at 2:21 PM Bakul Shah wrote: > Perhaps you were thinking of this? > > At iteration number k, the value xk contains O(klog(k)) digits,

Re: [go-nuts] Any interest in nat.mulRange simplification/optimization?

2024-01-08 Thread Bakul Shah
Perhaps you were thinking of this? At iteration number k, the value x_k contains O(k log(k)) digits, thus the computation of x_{k+1} = k * x_k has cost O(k log(k)). Finally, the total cost with this basic approach is O(2 log(2) + ... + n log(n)) = O(n^2 log(n)). A better approach is the binary splitting: it just
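The binary splitting quoted above can be sketched as a standalone function with the public math/big API (an illustrative version, not the internal nat.mulRange code):

```go
package main

import (
	"fmt"
	"math/big"
)

// prodRange computes a * (a+1) * ... * b by splitting the range in
// half, so the two sides of each multiplication stay comparable in
// size and subquadratic multiplication can pay off, instead of the
// O(n^2 log(n)) cost of multiplying a huge accumulator by one word
// at a time.
func prodRange(a, b uint64) *big.Int {
	switch {
	case a > b:
		return big.NewInt(1) // empty product
	case a == b:
		return new(big.Int).SetUint64(a)
	}
	m := a + (b-a)/2 // overflow-safe midpoint
	return new(big.Int).Mul(prodRange(a, m), prodRange(m+1, b))
}

func main() {
	fmt.Println(prodRange(1, 10)) // 10! = 3628800
}
```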

Re: [go-nuts] Any interest in nat.mulRange simplification/optimization?

2024-01-08 Thread Robert Griesemer
The overflow fix is pending: CL 554617 I filed https://github.com/golang/go/issues/65027 for a possibly faster mulRange implementation. - gri On Mon, Jan 8, 2024 at 11:47 AM Robert Griesemer wrote: > Hello John; > > Thanks for your interest in this code. > > In a (long past) implementation of

Re: [go-nuts] Any interest in nat.mulRange simplification/optimization?

2024-01-08 Thread Robert Griesemer
Hello John; Thanks for your interest in this code. In a (long past) implementation of the factorial function, I noticed that computing a * (a+1) * (a+2) * ... * (b-1) * b was much faster when computed in a recursive fashion than when computed iteratively: the reason (I believed) was that the

Re: [go-nuts] Any interest in nat.mulRange simplification/optimization?

2024-01-07 Thread John Jannotti
Actually, both implementations have bugs! The recursive implementation ends with: ``` m := (a + b) / 2 return z.mul(nat(nil).mulRange(a, m), nat(nil).mulRange(m+1, b)) ``` That's a bug whenever `(a+b)` overflows, making `m` small. FIX: `m := a + (b-a)/2` My iterative implementation went into an

Re: [go-nuts] Any interest in nat.mulRange simplification/optimization?

2024-01-07 Thread John Jannotti
I'm equally curious. FWIW, I realized the loop should perhaps be ``` mb := nat(nil).setUint64(b) // ensure mb starts big enough for b, even on 32-bit arch for m := a + 1; m <= b; m++ { mb.setUint64(m) z = z.mul(z, mb) } ``` to avoid allocating repeatedly for `m`, which yields:
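The same reuse pattern, written against the public math/big API rather than the package-internal nat type (a sketch under that assumption):

```go
package main

import (
	"fmt"
	"math/big"
)

// mulRangeIter computes a * (a+1) * ... * b iteratively, reusing a
// single temporary for the one-word factor so the loop does not
// allocate a fresh big.Int on every iteration.
func mulRangeIter(a, b uint64) *big.Int {
	z := big.NewInt(1)
	mb := new(big.Int) // reused across iterations
	for m := a; m <= b; m++ {
		z.Mul(z, mb.SetUint64(m))
	}
	return z
}

func main() {
	fmt.Println(mulRangeIter(3, 6)) // 3*4*5*6 = 360
}
```

Note that each step here multiplies an ever-growing accumulator by a single word, which is exactly the unbalanced shape that keeps Karatsuba from helping; the reuse only saves allocations.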

Re: [go-nuts] Any interest in nat.mulRange simplification/optimization?

2024-01-06 Thread Rob Pike
It seems reasonable but first I'd like to understand why the recursive method is used. I can't deduce why, but the CL that adds it, by gri, does Karatsuba multiplication, which implies something deep is going on. I'll add him to the conversation. -rob On Sun, Jan 7, 2024 at 5:46 PM John

[go-nuts] Any interest in nat.mulRange simplification/optimization?

2024-01-06 Thread John Jannotti
I enjoy bignum implementations, so I was looking through nat.go and saw that `mulRange` is implemented in a surprising, recursive way. In the non-base case, `mulRange(a, b)` returns `mulRange(a, (a+b)/2) * mulRange(1+(a+b)/2, b)` (lots of big.Int ceremony elided). That's fine, but I didn't