Re: 2-adic roots (Re: bdiv vs redc)

2012-07-30 Thread David Harvey

On Jul 31, 2012, at 7:42 AM, Niels Möller wrote:

 We currently have modular exponentation, powlo and regular powering
 with no reduction of any kind. I'm suggesting a pow_modbnm1. For
 euclidean square root, and for mpfr, it might also be useful with a
 pow_high, keeping only the n most significant limbs of each product, and
 returning the number of discarded low limbs. Another potential use is
 powm with moduli of special form, where the reduction can be done
 cheaper than with montgomery redc. Maybe the function could even be used
 for more complicated groups, e.g., if implementing elliptic curve
 operations on top of gmp. Or maybe it's easy enough to duplicate the
 code for each useful case, perhaps sharing some bit extraction macros in
 gmp-impl.h.

Related to this, you may also want to check out the paper Fast convolutions 
meet Montgomery by Mihailescu, I've always thought that would be an 
interesting algorithm to put in GMP, but maybe a bit tricky to implement.

david

___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: mpn_mulmod_bnm1

2014-04-03 Thread David Harvey

On 04/04/2014, at 4:28 AM, Niels Möller ni...@lysator.liu.se wrote:

 (1) the bit bounds for the coefficients get worse. For example if u =
 5 then f(x) = x^5 + x^4 + x + 1, so when you compute say a(x) b(x) mod
 f(x) (in Z[x]), every coefficient in the result is a sum of up to
 *five* terms from a(x) b(x). If say u = 63, you need an extra 6 bits.
 When you only work with x^n + 1, you never need extra bits like this.
 
 I guess you could mitigate this a bit by rounding u up to get a
 polynomial with reasonably low weight.

Yes, but doesn't this miss the whole point???

 (3) You probably need more specialised assembly code to handle the
 unusual low-level TFT primitives. (I haven't thought through this
 carefully.)
 
 My code implements TFT using calls to assembly routines doing full-size
 FFT:s. And I round the size up to a multiple of some reasonable power of
 two, so that the calls don't do very small transforms. I'm not *sure* this
 is the best way, but at least I think it's what Joris suggested when we
 discussed this at ICMS some years ago.

My understanding is that O(n log n) of the work gets passed down to full-size 
FFTs as you state above, but I'm talking here about the O(n) work, at the top 
layer, including e.g. all the cross-butterflies.

david

___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Additional memory handler features.

2015-01-04 Thread David Harvey

On 5 Jan 2015, at 10:08 am, Niels Möller ni...@lysator.liu.se wrote:

 Of course there are also some drawbacks. It makes life more complicated
 for applications, and the implementation of functions like mpn_mul_itch,
 which interact with pretty complex algorithm choice machinery, is going
 to be a bit complex too.

Another downside is the additional overhead in the case of very small operands.

david

___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: New mulmod_bknp1

2022-02-22 Thread David Harvey
On Tue, 2022-02-22 at 23:23 +0100, Marco Bodrato wrote:
> Ciao David,
> 
> Il Mar, 22 Febbraio 2022 10:55 pm, David Harvey ha scritto:
> > On Tue, 2022-02-22 at 22:39 +0100, Marco Bodrato wrote:
> > > > E.g, in this case we could try a top-level B^66 - 1 product, split in
> > > > B^33+1 and B^33-1; then the former suits your new algorithm well, but
> > > > the former can't be recursively split (at least not on a B boundary).
> 
> > > I fully agree.
> 
> > What about
> > 
> > B^33 - 1 = (B^11 - 1)(B^22 + B^11 + 1)?
> 
> Actually, B is a number.
> 
> And if B is a power of 4 (there is an even number of bits in a limb, as
> usually happens, e.g. 32, or 64) then (B^n-1) and (B^2n+B^n+1) are not
> coprime, because both are divisible by 3. And we can not use CRT.

Yes good point.

But (N-1) and (N^2+N+1)/3 are relatively prime, where N = B^n, because

(N^2+N+1)/3 - (N-1) (N+2)/3 = 1.

So at least you can use CRT to get the answer mod (N^3-1)/3. Unfortunately
this is still not good enough, because N^3-1 is divisible by 9.

david


___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: New mulmod_bknp1

2022-02-22 Thread David Harvey
On Wed, 2022-02-23 at 09:53 +1100, David Harvey wrote:
> On Tue, 2022-02-22 at 23:23 +0100, Marco Bodrato wrote:
> > Ciao David,
> > 
> > Il Mar, 22 Febbraio 2022 10:55 pm, David Harvey ha scritto:
> > > On Tue, 2022-02-22 at 22:39 +0100, Marco Bodrato wrote:
> > > > > E.g, in this case we could try a top-level B^66 - 1 product, split
> > > > > in
> > > > > B^33+1 and B^33-1; then the former suits your new algorithm well,
> > > > > but
> > > > > the former can't be recursively split (at least not on a B
> > > > > boundary).
> > 
> > > > I fully agree.
> > 
> > > What about
> > > 
> > > B^33 - 1 = (B^11 - 1)(B^22 + B^11 + 1)?
> > 
> > Actually, B is a number.
> > 
> > And if B is a power of 4 (there is an even number of bits in a limb, as
> > usually happens, e.g. 32, or 64) then (B^n-1) and (B^2n+B^n+1) are not
> > coprime, because both are divisible by 3. And we can not use CRT.
> 
> Yes good point.
> 
> But (N-1) and (N^2+N+1)/3 are relatively prime, where N = B^n, because
> 
> (N^2+N+1)/3 - (N-1) (N+2)/3 = 1.
> 
> So at least you can use CRT to get the answer mod (N^3-1)/3. Unfortunately
> this is still not good enough, because N^3-1 is divisible by 9.

While N^2+N+1 is always divisible by 3, it is never divisible by 9. So maybe
you could use the factorisation

N^3 - 1 = (N^2+N+1)/3 * (3N-3).

I think (?) the two factors on the right are always coprime.

This would require a multiplication mod 3N-3 = 3 B^n - 3. If you are just
calling a general multiplication routine then this shouldn't be too much
trouble. But I guess it becomes more complicated if you want to use the idea
recursively.

david


___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: New mulmod_bknp1

2022-02-22 Thread David Harvey
On Tue, 2022-02-22 at 22:39 +0100, Marco Bodrato wrote:

> 
> > E.g, in this case we could try a top-level B^66 - 1 product, split in
> > B^33+1 and B^33-1; then the former suits your new algorithm well, but
> > the former can't be recursively split (at least not on a B boundary). If
> 
> I fully agree.

What about

B^33 - 1 = (B^11 - 1)(B^22 + B^11 + 1)?

Also, I suspect that some of these tricks are worth doing even if the
factorisations cross the B boundaries. The extra shifting overhead is only
linear.

david


___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel