Re: [go-nuts] Any interest in nat.mulRange simplification/optimization?

2024-01-13 Thread Bakul Shah
Fair enough!

I replied because I found a factor of 60 a bit surprising. Though I shouldn't
have been -- my n-digits-of-pi Scheme program was consistently about 16-17 times
slower than a GMP-based one (at least up to 1E9), using the same algorithm
(Chudnovsky's), which is also highly dependent on multiplication speed. And
both (Gambit-Scheme and GMP) use faster algorithms for very large numbers.
Adding these faster algorithms to math/big is probably not justifiable.

Julia seems to use libgmp. "info gmp" reveals it implements Luschny's algorithm.

> On Jan 13, 2024, at 4:07 PM, Rob Pike  wrote:
> 
> This is getting interesting but maybe only for me, so I'll stop here.
> 
> I did some profiling and the majority of the time is spent in 
> math/big.addMulVVW. Thinking I might be doing something stupid, I compared it 
> with the Go implementation at 
> http://www.luschny.de/math/factorial/scala/FactorialScalaCsharp.htm, and 
> found that my version (although "straightforward"), is about the same amount 
> of code but without special casing and support tables, yet about 25% faster - 
> and yes, they get the same answer.
> 
> So I believe the factor of 60 comes from comparing an implementation in a 
> language and libraries designed for numerical computation against a much less 
> specialized and optimized world. Or perhaps from Julia using a different and 
> dramatically more efficient algorithm. It does seem like a big gap, but I am 
> no expert in this area.
> 
> Maybe worth investigating further but not by me.
> 
> -rob
> 
> On Sun, Jan 14, 2024 at 9:31 AM Rob Pike wrote:
>> Oh, I did say my implementation was straightforward. It's free of any clever 
>> multiplication algorithms or mathematical delights. It could easily be 
>> giving up 10x or more for that reason alone. And I haven't even profiled it 
>> yet.
>> 
>> -rob
>> 
>> 
>> On Sat, Jan 13, 2024 at 7:04 PM Bakul Shah wrote:
>>> FYI Julia (on M1 MBP) seems much faster:
>>> 
>>> julia> @which factorial(big(10^8))
>>> factorial(x::BigInt) in Base.GMP at gmp.jl:645
>>> 
>>> julia> @time begin; factorial(big(10^8)); 1; end
>>>  27.849116 seconds (1.39 M allocations: 11.963 GiB, 0.22% gc time)
>>> 
>>> Probably they use the Schönhage-Strassen multiplication algorithm for very
>>> large numbers, as the 1E8! result will have over 3/4 of a billion digits. I
>>> should try this in Gambit-Scheme (which has an excellent multiply 
>>> implementation).
>>> 
 On Jan 12, 2024, at 9:32 PM, Rob Pike wrote:
 
 Thanks for the tip. A fairly straightforward implementation of this 
 algorithm gives me about a factor of two speedup for pretty much any 
 value. I went up to 1e8!, which took about half an hour compared to nearly 
 an hour for MulRange.
 
 I'll probably stick in ivy after a little more tuning. I may even try 
 parallelization.
 
 -rob
 
 
 On Tue, Jan 9, 2024 at 4:54 PM Bakul Shah wrote:
> For that you may wish to explore Peter Luschny's "prime swing" factorial 
> algorithm and variations!
> https://oeis.org/A000142/a000142.pdf
> 
> And implementations in various languages including go: 
> https://github.com/PeterLuschny/Fast-Factorial-Functions
> 
>> On Jan 8, 2024, at 9:22 PM, Rob Pike wrote:
>> 
>> Here's an example where it's the bottleneck: ivy factorial
>> 
>> 
>> !1e7
>> 1.20242340052e+65657059
>> 
>> )cpu
>> 1m10s (1m10s user, 167.330ms sys)
>> 
>> 
>> -rob
>> 
>> 
>> On Tue, Jan 9, 2024 at 2:21 PM Bakul Shah wrote:
>>> Perhaps you were thinking of this?
>>> 
>>> At iteration number k, the value x_k contains O(k log(k)) digits, thus
>>> the computation of x_{k+1} = k*x_k has cost O(k log(k)). Finally, the total
>>> cost with this basic approach is O(2 log(2) + ... + n log(n)) = O(n^2 log(n)).
>>> A better approach is binary splitting: it consists of recursively
>>> cutting the product of m consecutive integers in half. It leads to
>>> better results when products of large integers are performed with a
>>> fast method.
>>> 
>>> http://numbers.computation.free.fr/Constants/Algorithms/splitting.html
>>> 
>>> I think you can do recursive splitting without using function recursion
>>> by allocating an N/2-element array (where b = a+N-1) and iterating over
>>> it; each time the array "shrinks" by half. A "cleverer" algorithm would
>>> allocate an array of *words* of a bignum, since you know that the upper
>>> limit on size is N*64 bits (for 64-bit numbers), so you can just reuse the
>>> same space for each outer iteration (N/2 multiplies, N/4, ...) and apply
>>> Karatsuba from the 2nd outer iteration onwards. Not sure if this is easy in Go.
>>> 
 On Jan 8, 

Re: [go-nuts] Any interest in nat.mulRange simplification/optimization?

2024-01-13 Thread Rob Pike
This is getting interesting but maybe only for me, so I'll stop here.

I did some profiling and the majority of the time is spent in
math/big.addMulVVW. Thinking I might be doing something stupid, I compared
it with the Go implementation at
http://www.luschny.de/math/factorial/scala/FactorialScalaCsharp.htm, and
found that my version (although "straightforward"), is about the same
amount of code but without special casing and support tables, yet about 25%
faster - and yes, they get the same answer.

So I believe the factor of 60 comes from comparing an implementation in a
language and libraries designed for numerical computation against a much
less specialized and optimized world. Or perhaps from Julia using a
different and dramatically more efficient algorithm. It does seem like a
big gap, but I am no expert in this area.

Maybe worth investigating further but not by me.

-rob

On Sun, Jan 14, 2024 at 9:31 AM Rob Pike  wrote:

> Oh, I did say my implementation was straightforward. It's free of any
> clever multiplication algorithms or mathematical delights. It could easily
> be giving up 10x or more for that reason alone. And I haven't even profiled
> it yet.
>
> -rob
>
>
> On Sat, Jan 13, 2024 at 7:04 PM Bakul Shah  wrote:
>
>> FYI Julia (on M1 MBP) seems much faster:
>>
>> julia> @which factorial(big(10^8))
>> factorial(x::BigInt) in Base.GMP at gmp.jl:645
>>
>> julia> @time begin; factorial(big(10^8)); 1; end
>>  27.849116 seconds (1.39 M allocations: 11.963 GiB, 0.22% gc time)
>>
>>
>> Probably they use the Schönhage-Strassen multiplication algorithm for very
>> large numbers, as the 1E8! result will have over 3/4 of a billion digits. I
>> should try this in Gambit-Scheme (which has an excellent multiply
>> implementation).
>>
>> On Jan 12, 2024, at 9:32 PM, Rob Pike  wrote:
>>
>> Thanks for the tip. A fairly straightforward implementation of this
>> algorithm gives me about a factor of two speedup for pretty much any value.
>> I went up to 1e8!, which took about half an hour compared to nearly an hour
>> for MulRange.
>>
>> I'll probably stick in ivy after a little more tuning. I may even try
>> parallelization.
>>
>> -rob
>>
>>
>> On Tue, Jan 9, 2024 at 4:54 PM Bakul Shah  wrote:
>>
>>> For that you may wish to explore Peter Luschny's "prime swing" factorial
>>> algorithm and variations!
>>> https://oeis.org/A000142/a000142.pdf
>>>
>>> And implementations in various languages including go:
>>> https://github.com/PeterLuschny/Fast-Factorial-Functions
>>>
>>> On Jan 8, 2024, at 9:22 PM, Rob Pike  wrote:
>>>
>>> Here's an example where it's the bottleneck: ivy factorial
>>>
>>>
>>> !1e7
>>> 1.20242340052e+65657059
>>>
>>> )cpu
>>> 1m10s (1m10s user, 167.330ms sys)
>>>
>>>
>>> -rob
>>>
>>>
>>> On Tue, Jan 9, 2024 at 2:21 PM Bakul Shah  wrote:
>>>
 Perhaps you were thinking of this?

 At iteration number k, the value x_k contains O(k log(k)) digits, thus
 the computation of x_{k+1} = k*x_k has cost O(k log(k)). Finally, the total
 cost with this basic approach is O(2 log(2) + ... + n log(n)) = O(n^2 log(n)).

 A better approach is *binary splitting*: it consists of recursively cutting
 the product of m consecutive integers in half. It leads to better results
 when products of large integers are performed with a fast method.

 http://numbers.computation.free.fr/Constants/Algorithms/splitting.html


 I think you can do recursive splitting without using function recursion
 by allocating an N/2-element array (where b = a+N-1) and iterating over it;
 each time the array "shrinks" by half. A "cleverer" algorithm would allocate
 an array of *words* of a bignum, since you know that the upper limit on size
 is N*64 bits (for 64-bit numbers), so you can just reuse the same space for
 each outer iteration (N/2 multiplies, N/4, ...) and apply Karatsuba from the
 2nd outer iteration onwards. Not sure if this is easy in Go.

 On Jan 8, 2024, at 11:47 AM, Robert Griesemer  wrote:

 Hello John;

 Thanks for your interest in this code.

 In a (long past) implementation of the factorial function, I noticed
 that computing a * (a+1) * (a+2) * ... (b-1) * b was much faster when
 computed in a recursive fashion than when computed iteratively: the reason
 (I believed) was that the iterative approach seemed to produce a lot more
 "internal fragmentation", that is medium-size intermediate results where
 the most significant word (or "limb" as is the term in other
 implementations) is only marginally used, resulting in more work than
 necessary if those words were fully used.

 I never fully investigated, it was enough at the time that the
 recursive approach was much faster. In retrospect, I don't quite believe my
 own theory. Also, that implementation didn't have Karatsuba multiplication,
 it just used grade-school multiplication.

 Since a, b are uint64 values (words), this could probably 

Re: [go-nuts] Any interest in nat.mulRange simplification/optimization?

2024-01-13 Thread Rob Pike
Oh, I did say my implementation was straightforward. It's free of any
clever multiplication algorithms or mathematical delights. It could easily
be giving up 10x or more for that reason alone. And I haven't even profiled
it yet.

-rob


On Sat, Jan 13, 2024 at 7:04 PM Bakul Shah  wrote:

> FYI Julia (on M1 MBP) seems much faster:
>
> julia> @which factorial(big(10^8))
> factorial(x::BigInt) in Base.GMP at gmp.jl:645
>
> julia> @time begin; factorial(big(10^8)); 1; end
>  27.849116 seconds (1.39 M allocations: 11.963 GiB, 0.22% gc time)
>
>
> Probably they use the Schönhage-Strassen multiplication algorithm for very
> large numbers, as the 1E8! result will have over 3/4 of a billion digits. I
> should try this in Gambit-Scheme (which has an excellent multiply
> implementation).
>
> On Jan 12, 2024, at 9:32 PM, Rob Pike  wrote:
>
> Thanks for the tip. A fairly straightforward implementation of this
> algorithm gives me about a factor of two speedup for pretty much any value.
> I went up to 1e8!, which took about half an hour compared to nearly an hour
> for MulRange.
>
> I'll probably stick in ivy after a little more tuning. I may even try
> parallelization.
>
> -rob
>
>
> On Tue, Jan 9, 2024 at 4:54 PM Bakul Shah  wrote:
>
>> For that you may wish to explore Peter Luschny's "prime swing" factorial
>> algorithm and variations!
>> https://oeis.org/A000142/a000142.pdf
>>
>> And implementations in various languages including go:
>> https://github.com/PeterLuschny/Fast-Factorial-Functions
>>
>> On Jan 8, 2024, at 9:22 PM, Rob Pike  wrote:
>>
>> Here's an example where it's the bottleneck: ivy factorial
>>
>>
>> !1e7
>> 1.20242340052e+65657059
>>
>> )cpu
>> 1m10s (1m10s user, 167.330ms sys)
>>
>>
>> -rob
>>
>>
>> On Tue, Jan 9, 2024 at 2:21 PM Bakul Shah  wrote:
>>
>>> Perhaps you were thinking of this?
>>>
>>> At iteration number k, the value x_k contains O(k log(k)) digits, thus
>>> the computation of x_{k+1} = k*x_k has cost O(k log(k)). Finally, the total
>>> cost with this basic approach is O(2 log(2) + ... + n log(n)) = O(n^2 log(n)).
>>>
>>> A better approach is *binary splitting*: it consists of recursively
>>> cutting the product of m consecutive integers in half. It leads to better
>>> results when products of large integers are performed with a fast method.
>>>
>>> http://numbers.computation.free.fr/Constants/Algorithms/splitting.html
>>>
>>>
>>> I think you can do recursive splitting without using function recursion
>>> by allocating an N/2-element array (where b = a+N-1) and iterating over
>>> it; each time the array "shrinks" by half. A "cleverer" algorithm would
>>> allocate an array of *words* of a bignum, since you know that the upper
>>> limit on size is N*64 bits (for 64-bit numbers), so you can just reuse the
>>> same space for each outer iteration (N/2 multiplies, N/4, ...) and apply
>>> Karatsuba from the 2nd outer iteration onwards. Not sure if this is easy in Go.
>>>
>>> On Jan 8, 2024, at 11:47 AM, Robert Griesemer  wrote:
>>>
>>> Hello John;
>>>
>>> Thanks for your interest in this code.
>>>
>>> In a (long past) implementation of the factorial function, I noticed
>>> that computing a * (a+1) * (a+2) * ... (b-1) * b was much faster when
>>> computed in a recursive fashion than when computed iteratively: the reason
>>> (I believed) was that the iterative approach seemed to produce a lot more
>>> "internal fragmentation", that is medium-size intermediate results where
>>> the most significant word (or "limb" as is the term in other
>>> implementations) is only marginally used, resulting in more work than
>>> necessary if those words were fully used.
>>>
>>> I never fully investigated, it was enough at the time that the recursive
>>> approach was much faster. In retrospect, I don't quite believe my own
>>> theory. Also, that implementation didn't have Karatsuba multiplication, it
>>> just used grade-school multiplication.
>>>
>>> Since a, b are uint64 values (words), this could probably be implemented
>>> in terms of mulAddVWW directly, with a suitable initial allocation for the
>>> result - ideally this should just need one allocation (not sure how close
>>> we can get to the right size). That would cut down the allocations
>>> massively.
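
A rough sketch of that suggestion as it might look inside math/big's nat
package. mulAddVWW computes z = x*y + r and returns the carry; estimateWords
is a hypothetical sizing helper, and each factor m is assumed to fit in a Word:

```go
// mulRangeIter computes a*(a+1)*...*b with a single up-front allocation.
func mulRangeIter(a, b uint64) nat {
	z := make(nat, 1, estimateWords(a, b)) // the one allocation (estimateWords is hypothetical)
	z[0] = Word(a)
	for m := a + 1; m <= b && m > a; m++ { // "m > a" guards uint64 wraparound
		if c := mulAddVWW(z, z, Word(m), 0); c != 0 { // z = z*m
			z = append(z, c) // stays within the pre-allocated capacity
		}
	}
	return z
}
```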
>>>
>>> In a next step, one should benchmark the implementation again.
>>>
>>> But at the very least, the overflow bug should be fixed, thanks for
>>> finding it! I will send out a CL to fix that today.
>>>
>>> Thanks,
>>> - gri
>>>
>>>
>>>
>>> On Sun, Jan 7, 2024 at 4:47 AM John Jannotti  wrote:
>>>
 Actually, both implementations have bugs!

 The recursive implementation ends with:
 ```
 m := (a + b) / 2
 return z.mul(nat(nil).mulRange(a, m), nat(nil).mulRange(m+1, b))
 ```

 That's a bug whenever `(a+b)` overflows, making `m` small.
 FIX: `m := a + (b-a)/2`

 My iterative implementation went into an infinite loop here:
 `for m := a + 1; m <= b; m++ {`
 if b is `math.MaxUint64`
 FIX: add `&& m > a` to the exit condition.
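
A self-contained sketch with both fixes applied, written against the exported
math/big API rather than the internal nat type (the small-range cutoff of 16
is arbitrary, not from the thread):

```go
package main

import (
	"fmt"
	"math/big"
)

// mulRange returns a*(a+1)*...*b with both overflow bugs fixed: the
// midpoint is a+(b-a)/2 rather than (a+b)/2, and the loop guard
// "m > a" stops m from wrapping around when b == math.MaxUint64.
func mulRange(a, b uint64) *big.Int {
	if a > b {
		return big.NewInt(1)
	}
	if b-a < 16 { // small range: multiply iteratively
		z := new(big.Int).SetUint64(a)
		for m := a + 1; m <= b && m > a; m++ {
			z.Mul(z, new(big.Int).SetUint64(m))
		}
		return z
	}
	m := a + (b-a)/2 // overflow-safe midpoint
	return new(big.Int).Mul(mulRange(a, m), mulRange(m+1, b))
}

func main() {
	fmt.Println(mulRange(1, 10)) // 3628800
}
```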

Re: [go-nuts] New type in generics

2024-01-13 Thread Daniel Theophanes
Thank you. Yeah, that seems overly complicated. I can allocate interior to 
the generic object, but I can't use pointer-receiver methods, which is fine 
in my case; I'll just adjust the interior manually.

Thank you.
On Saturday, January 13, 2024 at 1:50:17 PM UTC-6 Axel Wagner wrote:

> The way to do that is to add another level of indirection (as everything 
> in Software Engineering):
>
> type Namer[T any] interface {
> *T
> SetName(name string)
> }
> func WithName[T any, PT Namer[T]](name string) T {
> var v T
> PT(&v).SetName(name)
> return v
> }
>
> I will say, though, that it's not unlikely that you'll be happier if you 
> don't do this and instead accept a plain interface value and let the caller 
> allocate the value and pass in a pointer. But if you want to do it, this is 
> the way.
>
> On Sat, Jan 13, 2024 at 8:10 PM Daniel Theophanes  
> wrote:
>
>> I have a situation where I would like to create a type, then set a 
>> property on it within a container. To set the type, I envisioned using a 
>> method "SetName" which would need to take a pointer receiver: (goplay share 
>> is down right now so I'll post inline:
>> type Namer interface {
>> SetName(name string)
>> }
>>
>> I wish to create a new Namer type as well, ideally without using reflect. 
>> If I use [*Ob] as the type parameter, that works, but then `new(T)` 
>> returns `**Ob`, which I can dereference to get `*Ob`, but then the value is 
>> nil (Ob isn't created).
>>
>> I'm working around this, but it surprised me, the interaction of a 
>> pointer receiver interface type constraint and then I can't create the 
>> desired type.
>>
>> ```
>> package main
>>
>> import "fmt"
>>
>> func main() {
>> ar := NewAppResult()
>> fmt.Printf("AR: %#v\n", *ar)
>> ar.Observation.Get("X1")
>> }
>>
>> type Ob struct {
>> Gene  string
>> Value string
>> }
>>
>> func (o *Ob) SetName(name string) {
>> // o is nil and this will panic.
>> o.Gene = name
>> }
>>
>> type Namer interface {
>> SetName(name string)
>> }
>>
>> type OrderedLookup[T Namer] struct {
>> List   []T
>> lookup map[string]T
>> }
>>
>> func (ol *OrderedLookup[T]) Get(name string) T {
>> v, ok := ol.lookup[name]
>> if !ok {
>> var v T // T is a pointer, new(T) creates **Ob, but I can't use a generic 
>> type of [Ob] because then Namer
>> v.SetName(name)
>> ol.lookup[name] = v
>> ol.List = append(ol.List, v)
>> }
>> return v
>> }
>>
>> type AppResult struct {
>> Observation *OrderedLookup[*Ob]
>> }
>>
>> func NewAppResult() *AppResult {
>> return &AppResult{
>> Observation: &OrderedLookup[*Ob]{},
>> }
>> }
>> ```
>>
>>
>>



Re: [go-nuts] New type in generics

2024-01-13 Thread 'Axel Wagner' via golang-nuts
The way to do that is to add another level of indirection (as everything in
Software Engineering):

type Namer[T any] interface {
*T
SetName(name string)
}
func WithName[T any, PT Namer[T]](name string) T {
var v T
PT(&v).SetName(name)
return v
}
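
Hypothetical usage, assuming the Ob type from Daniel's post; PT is inferred
as *Ob by constraint type inference:

```go
ob := WithName[Ob]("X1") // PT inferred as *Ob; returns an Ob with Gene == "X1"
```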

I will say, though, that it's not unlikely that you'll be happier if you
don't do this and instead accept a plain interface value and let the caller
allocate the value and pass in a pointer. But if you want to do it, this is
the way.

On Sat, Jan 13, 2024 at 8:10 PM Daniel Theophanes 
wrote:

> I have a situation where I would like to create a type, then set a
> property on it within a container. To set the type, I envisioned using a
> method "SetName" which would need to take a pointer receiver: (goplay share
> is down right now so I'll post inline:
> type Namer interface {
> SetName(name string)
> }
>
> I wish to create a new Namer type as well, ideally without using reflect.
> If I use [*Ob] as the type parameter, that works, but then `new(T)`
> returns `**Ob`, which I can dereference to get `*Ob`, but then the value is
> nil (Ob isn't created).
>
> I'm working around this, but it surprised me, the interaction of a pointer
> receiver interface type constraint and then I can't create the desired type.
>
> ```
> package main
>
> import "fmt"
>
> func main() {
> ar := NewAppResult()
> fmt.Printf("AR: %#v\n", *ar)
> ar.Observation.Get("X1")
> }
>
> type Ob struct {
> Gene  string
> Value string
> }
>
> func (o *Ob) SetName(name string) {
> // o is nil and this will panic.
> o.Gene = name
> }
>
> type Namer interface {
> SetName(name string)
> }
>
> type OrderedLookup[T Namer] struct {
> List   []T
> lookup map[string]T
> }
>
> func (ol *OrderedLookup[T]) Get(name string) T {
> v, ok := ol.lookup[name]
> if !ok {
> var v T // T is a pointer, new(T) creates **Ob, but I can't use a generic
> type of [Ob] because then Namer
> v.SetName(name)
> ol.lookup[name] = v
> ol.List = append(ol.List, v)
> }
> return v
> }
>
> type AppResult struct {
> Observation *OrderedLookup[*Ob]
> }
>
> func NewAppResult() *AppResult {
> return &AppResult{
> Observation: &OrderedLookup[*Ob]{},
> }
> }
> ```
>
>
>



[go-nuts] New type in generics

2024-01-13 Thread Daniel Theophanes
I have a situation where I would like to create a type, then set a property 
on it within a container. To set the type, I envisioned using a method 
"SetName" which would need to take a pointer receiver: (goplay share is 
down right now so I'll post inline:
type Namer interface {
SetName(name string)
}

I wish to create a new Namer type as well, ideally without using reflect. 
If I use [*Ob] as the type parameter, that works, but then `new(T)` 
returns `**Ob`, which I can dereference to get `*Ob`, but then the value is 
nil (Ob isn't created).

I'm working around this, but it surprised me, the interaction of a pointer 
receiver interface type constraint and then I can't create the desired type.

```
package main

import "fmt"

func main() {
ar := NewAppResult()
fmt.Printf("AR: %#v\n", *ar)
ar.Observation.Get("X1")
}

type Ob struct {
Gene  string
Value string
}

func (o *Ob) SetName(name string) {
// o is nil and this will panic.
o.Gene = name
}

type Namer interface {
SetName(name string)
}

type OrderedLookup[T Namer] struct {
List   []T
lookup map[string]T
}

func (ol *OrderedLookup[T]) Get(name string) T {
v, ok := ol.lookup[name]
if !ok {
var v T // T is a pointer, new(T) creates **Ob, but I can't use a generic type 
of [Ob] because then Namer
v.SetName(name)
ol.lookup[name] = v
ol.List = append(ol.List, v)
}
return v
}

type AppResult struct {
Observation *OrderedLookup[*Ob]
}

func NewAppResult() *AppResult {
return &AppResult{
Observation: &OrderedLookup[*Ob]{},
}
}
```
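
Applying the pointer-constraint trick from Axel's reply (earlier in this
digest) to the container itself lets Get allocate; a sketch under the same
assumptions, not code from this thread:

```go
// OrderedLookup parameterized by both the element type T and its pointer
// type PT, so Get can allocate a new T and call the pointer-receiver
// SetName on it.
type OrderedLookup[T any, PT interface {
	*T
	Namer
}] struct {
	List   []PT
	lookup map[string]PT
}

func (ol *OrderedLookup[T, PT]) Get(name string) PT {
	v, ok := ol.lookup[name]
	if !ok {
		v = PT(new(T)) // allocate the T; convert *T to PT
		v.SetName(name)
		if ol.lookup == nil {
			ol.lookup = map[string]PT{}
		}
		ol.lookup[name] = v
		ol.List = append(ol.List, v)
	}
	return v
}

// usage: Observation *OrderedLookup[Ob, *Ob]
```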





[go-nuts] [NOTE] GOPL Type Theory

2024-01-13 Thread John Pritchard
Hi,

It occurs to me that GOPL type theory has a distinct benefit from its
constraint to the membership relation.  The external derivation of type
semantics has particular constraint.

Tools parsing GOPL expressions and systems are more readily capable of
reproducing type semantics.

The benefit of concise semantics to the universe external to GOPL "native"
interpretation may be represented as service and opportunity.

Meanwhile, the benefit of concise type semantics to the universe internal
to GOPL remains to be determined.


Best,

John


ps.  Interested in references to type theory conception, definition, and
review.



Re: [go-nuts] Any interest in nat.mulRange simplification/optimization?

2024-01-13 Thread Bakul Shah
FYI Julia (on M1 MBP) seems much faster:

julia> @which factorial(big(10^8))
factorial(x::BigInt) in Base.GMP at gmp.jl:645

julia> @time begin; factorial(big(10^8)); 1; end
 27.849116 seconds (1.39 M allocations: 11.963 GiB, 0.22% gc time)

Probably they use the Schönhage-Strassen multiplication algorithm for very large 
numbers, as the 1E8! result will have over 3/4 of a billion digits. I should try 
this in Gambit-Scheme (which has an excellent multiply implementation).

> On Jan 12, 2024, at 9:32 PM, Rob Pike  wrote:
> 
> Thanks for the tip. A fairly straightforward implementation of this algorithm 
> gives me about a factor of two speedup for pretty much any value. I went up 
> to 1e8!, which took about half an hour compared to nearly an hour for 
> MulRange.
> 
> I'll probably stick in ivy after a little more tuning. I may even try 
> parallelization.
> 
> -rob
> 
> 
> On Tue, Jan 9, 2024 at 4:54 PM Bakul Shah wrote:
>> For that you may wish to explore Peter Luschny's "prime swing" factorial 
>> algorithm and variations!
>> https://oeis.org/A000142/a000142.pdf
>> 
>> And implementations in various languages including go: 
>> https://github.com/PeterLuschny/Fast-Factorial-Functions
>> 
>>> On Jan 8, 2024, at 9:22 PM, Rob Pike wrote:
>>> 
>>> Here's an example where it's the bottleneck: ivy factorial
>>> 
>>> 
>>> !1e7
>>> 1.20242340052e+65657059
>>> 
>>> )cpu
>>> 1m10s (1m10s user, 167.330ms sys)
>>> 
>>> 
>>> -rob
>>> 
>>> 
>>> On Tue, Jan 9, 2024 at 2:21 PM Bakul Shah wrote:
 Perhaps you were thinking of this?
 
 At iteration number k, the value x_k contains O(k log(k)) digits, thus the 
 computation of x_{k+1} = k*x_k has cost O(k log(k)). Finally, the total cost 
 with this basic approach is O(2 log(2) + ... + n log(n)) = O(n^2 log(n)).
 A better approach is binary splitting: it consists of recursively cutting 
 the product of m consecutive integers in half. It leads to better results 
 when products of large integers are performed with a fast method.
 
 http://numbers.computation.free.fr/Constants/Algorithms/splitting.html
 
 I think you can do recursive splitting without using function recursion by 
 allocating an N/2-element array (where b = a+N-1) and iterating over it; each 
 time the array "shrinks" by half. A "cleverer" algorithm would allocate an 
 array of *words* of a bignum, since you know that the upper limit on size is 
 N*64 bits (for 64-bit numbers), so you can just reuse the same space for each 
 outer iteration (N/2 multiplies, N/4, ...) and apply Karatsuba from the 2nd 
 outer iteration onwards. Not sure if this is easy in Go.
 
> On Jan 8, 2024, at 11:47 AM, Robert Griesemer wrote:
> 
> Hello John;
> 
> Thanks for your interest in this code.
> 
> In a (long past) implementation of the factorial function, I noticed that 
> computing a * (a+1) * (a+2) * ... (b-1) * b was much faster when computed 
> in a recursive fashion than when computed iteratively: the reason (I 
> believed) was that the iterative approach seemed to produce a lot more 
> "internal fragmentation", that is medium-size intermediate results where 
> the most significant word (or "limb" as is the term in other 
> implementations) is only marginally used, resulting in more work than 
> necessary if those words were fully used.
> 
> I never fully investigated, it was enough at the time that the recursive 
> approach was much faster. In retrospect, I don't quite believe my own 
> theory. Also, that implementation didn't have Karatsuba multiplication, 
> it just used grade-school multiplication.
> 
> Since a, b are uint64 values (words), this could probably be implemented 
> in terms of mulAddVWW directly, with a suitable initial allocation for 
> the result - ideally this should just need one allocation (not sure how 
> close we can get to the right size). That would cut down the allocations 
> massively.
> 
> In a next step, one should benchmark the implementation again.
> 
> But at the very least, the overflow bug should be fixed, thanks for 
> finding it! I will send out a CL to fix that today.
> 
> Thanks,
> - gri
> 
> 
> 
> On Sun, Jan 7, 2024 at 4:47 AM John Jannotti wrote:
>> Actually, both implementations have bugs!
>> 
>> The recursive implementation ends with:
>> ```
>> m := (a + b) / 2
>> return z.mul(nat(nil).mulRange(a, m), nat(nil).mulRange(m+1, b))
>> ```
>> 
>> That's a bug whenever `(a+b)` overflows, making `m` small. 
>> FIX: `m := a + (b-a)/2`
>> 
>> My iterative implementation went into an infinite loop here:
>> `for m := a + 1; m <= b; m++ {`
>> if b is `math.MaxUint64`
>> FIX: add `&& m >