Re: 2-adic roots (Re: bdiv vs redc)
On Jul 31, 2012, at 7:42 AM, Niels Möller wrote: We currently have modular exponentation, powlo and regular powering with no reduction of any kind. I'm suggesting a pow_modbnm1. For euclidean square root, and for mpfr, it might also be useful with a pow_high, keeping only the n most significant limbs of each product, and returning the number of discarded low limbs. Another potential use is powm with moduli of special form, where the reduction can be done cheaper than with montgomery redc. Maybe the function could even be used for more complicated groups, e.g., if implementing elliptic curve operations on top of gmp. Or maybe it's easy enough to duplicate the code for each useful case, perhaps sharing some bit extraction macros in gmp-impl.h. Related to this, you may also want to check out the paper Fast convolutions meet Montgomery by Mihailescu, I've always thought that would be an interesting algorithm to put in GMP, but maybe a bit tricky to implement. david ___ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel
Re: mpn_mulmod_bnm1
On 04/04/2014, at 4:28 AM, Niels Möller ni...@lysator.liu.se wrote: (1) the bit bounds for the coefficients get worse. For example if u = 5 then f(x) = x^5 + x^4 + x + 1, so when you compute say a(x) b(x) mod f(x) (in Z[x]), every coefficient in the result is a sum of up to *five* terms from a(x) b(x). If say u = 63, you need an extra 6 bits. When you only work with x^n + 1, you never need extra bits like this. I guess you could mitigate this a bit by rounding u up to get a polynomial with reasonably low weight. Yes, but doesn't this miss the whole point??? (3) You probably need more specialised assembly code to handle the unusual low-level TFT primitives. (I haven't thought through this carefully.) My code implements TFT using calls to assembly routines doing full-size FFT:s. And I round the size up to a multiple of some reasonable power of two, so that the calls don't do very small transforms. I'm not *sure* this is the best way, but at least I think it's what Joris suggested when we discussed this at ICMS some years ago. My understanding is that O(n log n) of the work gets passed down to full-size FFTs as you state above, but I'm talking here about the O(n) work, at the top layer, including e.g. all the cross-butterflies. david ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Additional memory handler features.
On 5 Jan 2015, at 10:08 am, Niels Möller ni...@lysator.liu.se wrote: Of course there are also some drawbacks. It makes life more complicated for applications, and the implementation of functions like mpn_mul_itch, which interact with pretty complex algorithm choice machinery, is going to be a bit complex too. Another downside is the additional overhead in the case of very small operands. david ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: New mulmod_bknp1
On Tue, 2022-02-22 at 23:23 +0100, Marco Bodrato wrote: > Ciao David, > > Il Mar, 22 Febbraio 2022 10:55 pm, David Harvey ha scritto: > > On Tue, 2022-02-22 at 22:39 +0100, Marco Bodrato wrote: > > > > E.g, in this case we could try a top-level B^66 - 1 product, split in > > > > B^33+1 and B^33-1; then the former suits your new algorithm well, but > > > > the former can't be recursively split (at least not on a B boundary). > > > > I fully agree. > > > What about > > > > B^33 - 1 = (B^11 - 1)(B^22 + B^11 + 1)? > > Actually, B is a number. > > And if B is a power of 4 (there is an even number of bits in a limb, as > usually happens, e.g. 32, or 64) then (B^n-1) and (B^2n+B^n+1) are not > coprime, because both are divisible by 3. And we can not use CRT. Yes good point. But (N-1) and (N^2+N+1)/3 are relatively prime, where N = B^n, because (N^2+N+1)/3 - (N-1) (N+2)/3 = 1. So at least you can use CRT to get the answer mod (N^3-1)/3. Unfortunately this is still not good enough, because N^3-1 is divisible by 9. david ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: New mulmod_bknp1
On Wed, 2022-02-23 at 09:53 +1100, David Harvey wrote: > On Tue, 2022-02-22 at 23:23 +0100, Marco Bodrato wrote: > > Ciao David, > > > > Il Mar, 22 Febbraio 2022 10:55 pm, David Harvey ha scritto: > > > On Tue, 2022-02-22 at 22:39 +0100, Marco Bodrato wrote: > > > > > E.g, in this case we could try a top-level B^66 - 1 product, split > > > > > in > > > > > B^33+1 and B^33-1; then the former suits your new algorithm well, > > > > > but > > > > > the former can't be recursively split (at least not on a B > > > > > boundary). > > > > > > I fully agree. > > > > > What about > > > > > > B^33 - 1 = (B^11 - 1)(B^22 + B^11 + 1)? > > > > Actually, B is a number. > > > > And if B is a power of 4 (there is an even number of bits in a limb, as > > usually happens, e.g. 32, or 64) then (B^n-1) and (B^2n+B^n+1) are not > > coprime, because both are divisible by 3. And we can not use CRT. > > Yes good point. > > But (N-1) and (N^2+N+1)/3 are relatively prime, where N = B^n, because > > (N^2+N+1)/3 - (N-1) (N+2)/3 = 1. > > So at least you can use CRT to get the answer mod (N^3-1)/3. Unfortunately > this is still not good enough, because N^3-1 is divisible by 9. While N^2+N+1 is always divisible by 3, it is never divisible by 9. So maybe you could use the factorisation N^3 - 1 = (N^2+N+1)/3 * (3N-3). I think (?) the two factors on the right are always coprime. This would require a multiplication mod 3N-3 = 3 B^n - 3. If you are just calling a general multiplication routine then this shouldn't be too much trouble. But I guess it becomes more complicated if you want to use the idea recursively. david ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: New mulmod_bknp1
On Tue, 2022-02-22 at 22:39 +0100, Marco Bodrato wrote: > > > E.g, in this case we could try a top-level B^66 - 1 product, split in > > B^33+1 and B^33-1; then the former suits your new algorithm well, but > > the former can't be recursively split (at least not on a B boundary). If > > I fully agree. What about B^33 - 1 = (B^11 - 1)(B^22 + B^11 + 1)? Also, I suspect that some of these tricks are worth doing even if the factorisations cross the B boundaries. The extra shifting overhead is only linear. david ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel