Re: bdiv vs redc
ni...@lysator.liu.se (Niels Möller) writes: I've checked in my changes in a new repository, ssh://shell.gmplib.org//var/hg/gmp-proj/gmp-bdiv Patch also appended below. The code passes make check now. I just remember this old sub-project: Changing bdiv convention from U = Q D + R B^n (Q = U D^-1 mod B^n) to U + Q D = R B^n (Q = - U D^-1 mod B^n) to make it more consistent with redc. It would be nice to get this in. Do we have any 6.0.x repository yet? Changes like this shouldn't be included in any coming bug-fix release. Regards, /Niels diff -Nrc2 gmp-bdiv.b5ca16212198/ChangeLog gmp-bdiv.aa820fde2c64/ChangeLog *** gmp-bdiv.b5ca16212198/ChangeLog 2012-07-17 23:52:21.0 +0200 --- gmp-bdiv.aa820fde2c64/ChangeLog 2012-07-17 23:52:21.0 +0200 *** *** 1,2 --- 1,27 + 2012-07-17 Niels Möller ni...@lysator.liu.se + + * mpn/generic/divis.c (mpn_divisible_p): Updated the divisibility + test; bdiv now returns R = D rather than R = 0 when D divides a + non-zero U. + + * mpn/generic/binvert.c (mpn_binvert): Negate bdiv quotient. + * mpn/generic/divexact.c (mpn_divexact): Likewise. + * mpn/generic/remove.c (mpn_remove): Likewise. + * mpz/bin_uiui.c (mpz_bdiv_bin_uiui): Likewise. + + * tests/mpn/t-bdiv.c (check_one): Updated to new convention, + B^{qn} R = U + QD. + + * mpn/generic/sbpi1_bdiv_qr.c (mpn_sbpi1_bdiv_qr): Reimplemented, + for new bdiv convention. + * mpn/generic/sbpi1_bdiv_q.c (mpn_sbpi1_bdiv_q): Likewise. + * mpn/generic/dcpi1_bdiv_q.c (mpn_dcpi1_bdiv_q_n) + (mpn_dcpi1_bdiv_q): Adapted to new bdiv convention. + * mpn/generic/dcpi1_bdiv_qr.c (mpn_dcpi1_bdiv_qr_n) + (mpn_dcpi1_bdiv_qr): Likewise. + * mpn/generic/mu_bdiv_qr.c (mpn_mu_bdiv_qr): Adapted to new bdiv + convention, using a wrapper calling the old function. + * mpn/generic/mu_bdiv_q.c (mpn_mu_bdiv_q): Likewise. + 2012-06-26 Torbjorn Granlund t...@gmplib.org diff -Nrc2 gmp-bdiv.b5ca16212198/mpn/generic/binvert.c gmp-bdiv.aa820fde2c64/mpn/generic/binvert.c *** gmp-bdiv.b5ca16212198/mpn/generic/binvert.c 2012-07-17 23:52:21.0 +0200 --- gmp-bdiv.aa820fde2c64/mpn/generic/binvert.c 2012-07-17 23:52:21.0 +0200 *** *** 75,78 --- 75,80 mpn_dcpi1_bdiv_q (rp, xp, rn, up, rn, -di); + mpn_neg (rp, rp, rn); + /* Use Newton iterations to get the desired precision. */ for (; rn n; rn = newrn) diff -Nrc2 gmp-bdiv.b5ca16212198/mpn/generic/dcpi1_bdiv_q.c gmp-bdiv.aa820fde2c64/mpn/generic/dcpi1_bdiv_q.c *** gmp-bdiv.b5ca16212198/mpn/generic/dcpi1_bdiv_q.c 2012-07-17 23:52:21.0 +0200 --- gmp-bdiv.aa820fde2c64/mpn/generic/dcpi1_bdiv_q.c 2012-07-17 23:52:21.0 +0200 *** *** 36,40 } ! /* Computes Q = N / D mod B^n, destroys N. N = {np,n} --- 36,40 } ! /* Computes Q = - N / D mod B^n, destroys N. N = {np,n} *** *** 58,67 mpn_mullo_n (tp, qp, dp + hi, lo); ! mpn_sub_n (np + hi, np + hi, tp, lo); if (lo hi) { ! cy += mpn_submul_1 (np + lo, qp, lo, dp[lo]); ! np[n - 1] -= cy; } qp += lo; --- 58,67 mpn_mullo_n (tp, qp, dp + hi, lo); ! mpn_add_n (np + hi, np + hi, tp, lo); if (lo hi) { ! cy += mpn_addmul_1 (np + lo, qp, lo, dp[lo]); ! np[n - 1] += cy; } qp += lo; *** *** 72,76 } ! /* Computes Q = N / D mod B^nn, destroys N. N = {np,nn} --- 72,76 } ! /* Computes Q = - N / D mod B^nn, destroys N. N = {np,nn} *** *** 120,124 mpn_incr_u (tp + qn, cy); ! mpn_sub (np + qn, np + qn, nn - qn, tp, dn); cy = 0; } --- 120,124 mpn_incr_u (tp + qn, cy); ! mpn_add (np + qn, np + qn, nn - qn, tp, dn); cy = 0; } *** *** 130,134 while (qn dn) { ! mpn_sub_1 (np + dn, np + dn, qn - dn, cy); cy = mpn_dcpi1_bdiv_qr_n (qp, np, dp, dn, dinv, tp); qp += dn; --- 130,134 while (qn dn) { ! mpn_add_1 (np + dn, np + dn, qn - dn, cy); cy = mpn_dcpi1_bdiv_qr_n (qp, np, dp, dn, dinv, tp); qp += dn; diff -Nrc2 gmp-bdiv.b5ca16212198/mpn/generic/dcpi1_bdiv_qr.c gmp-bdiv.aa820fde2c64/mpn/generic/dcpi1_bdiv_qr.c *** gmp-bdiv.b5ca16212198/mpn/generic/dcpi1_bdiv_qr.c 2012-07-17 23:52:21.0 +0200 --- gmp-bdiv.aa820fde2c64/mpn/generic/dcpi1_bdiv_qr.c 2012-07-17 23:52:21.0 +0200 *** *** 33,42 Output: ! q = n * d^{-1} mod 2^{qn * GMP_NUMB_BITS}, ! r = (n - q * d) * 2^{-qn * GMP_NUMB_BITS} Stores q at qp. Stores the n least significant limbs of r at the high
Re: 2-adic roots (Re: bdiv vs redc)
Replying to myself yet again... For the power x^n, I currently use mpn_powlo. But the least significant half is known from the previous iteration, so wraparound would be desirable. To me it would make some sense with a pow function for (mod (2^k - 1)), i.e., using mpn_mulmod_bnm1 and mpn_sqrmod_bnm1. On second though, I don't think that would work. One would need to do the wraparound reconstruction for each square and multiply during the powering algorithm. And then it's not enough to store the low half of x^n from the previous iteration, one needs to store *all* the intermediate values from the earlier (mod B^k) powering. The subproblem here is: Compute x^n (mod B^{2k}), taking maximun advantage of a previous computation of x^n (mod B^k). I could well be missing some simpler trick. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel
Re: 2-adic roots (Re: bdiv vs redc)
ni...@lysator.liu.se (Niels Möller) writes: Correction: Each iteration gets from \ell to 2 \ell-2, and it needs 2 \ell-1 bits of precision for intermediate values. Yet another correction, after more work on the implementation. For numbers x = 1 (mod 8), there are four square roots mod 2^k. If x is one of them, the other three are x + 2^{k-1} -x -x + 2^{k-1} So the top bit doesn't matter. This implies that no extra bit is needed for temporary values: If we start with \ell bits, we can do the iteration using 2\ell-2 bits for all values. And it doesn't matter which value we get in the most significant bit after the final shift right, we get a square root (mod 2^{2\ell - 2}) in either case. And we can canonicalize the returned (mod 2^k) square root by saying that it must be = 1 (mod 4), and 2^{k-1} (i.e., most significant bit always zero). I'm attaching my current code for both square root and nth root. Appears to work fine. None of them yet use any wraparound tricks, but they do take some advantage from cancellation. In the nth-root code, I use the iteration x' -- x - x * (a^{n-1} x^n - 1) / n which converges to a^{1/n-1}, so the nth root is recovered as a * x. The number of correct bits is *exactly* doubled in each iteration, so I didn't see much point in using bit counts, instead I use a limb count for the desired precision. Perhaps this iteration could be useful in the euclidean case too, I haven't investigated that. The factor a^{n-1} is a loop invariant, so it can be computed (at the highest needed precision) before the loop. For the power x^n, I currently use mpn_powlo. But the least significant half is known from the previous iteration, so wraparound would be desirable. To me it would make some sense with a pow function for (mod (2^k - 1)), i.e., using mpn_mulmod_bnm1 and mpn_sqrmod_bnm1. Or is there any easier way to take advantage of wraparound for that computation? Maybe it would be a good idea to write a general or abstract pow function taking multiplication and squaring functions as arguments? We currently have modular exponentation, powlo and regular powering with no reduction of any kind. I'm suggesting a pow_modbnm1. For euclidean square root, and for mpfr, it might also be useful with a pow_high, keeping only the n most significant limbs of each product, and returning the number of discarded low limbs. Another potential use is powm with moduli of special form, where the reduction can be done cheaper than with montgomery redc. Maybe the function could even be used for more complicated groups, e.g., if implementing elliptic curve operations on top of gmp. Or maybe it's easy enough to duplicate the code for each useful case, perhaps sharing some bit extraction macros in gmp-impl.h. Regards, /Niels #include stdio.h #include stdlib.h #include gmp.h #include gmp-impl.h /* Computes a^e (mod B). Uses right-to-left binary algorithm, since typical use will have e small. */ static mp_limb_t powlimb (mp_limb_t a, mp_limb_t e) { mp_limb_t r = 1; mp_limb_t s = a; for (r = 1, s = a; e 0; e = 1, s *= s) if (e 1) r *= s; return r; } /* Computes the nth root of A, mod B^k. Both A and n must be odd. Uses the iteration x' -- x - x * (a^{n-1} x^n - 1) / n converging to a^{1/n - 1}. If a^{n-1} x^n = 1 (mod 2^\ell), then a^{n-1} x'^n = 1 (mod 2^{2\ell}), */ void mpn_broot (mp_ptr rp, mp_srcptr ap, mp_size_t an, mp_limb_t n) { mp_size_t sizes[GMP_LIMB_BITS * 2]; mp_ptr anm1, tp, xp, xnp, ep; mp_limb_t a0, r0, nm1, ninv; mp_size_t xn; unsigned i; TMP_DECL; ASSERT (an 0); ASSERT (ap[0] 1); ASSERT (n 1); ASSERT (n = 3); TMP_MARK; anm1 = TMP_ALLOC_LIMBS (4*an); tp = anm1 + an; nm1 = n-1; mpn_powlo (anm1, ap, nm1, 1, an, tp); /* 3 an scratch space */ a0 = ap[0]; binvert_limb (ninv, n); /* 4 bits: a^{1/n - 1} (mod 16): a mod 8 1 3 5 7 1 1 1 1 1 3 1 9 9 1 */ r0 = 1 + (((n 2) ((a0 1) ^ (a0 2))) 8); r0 = ninv * r0 * (n+1 - anm1[0] * powlimb (r0, n)); /* 8 bits */ r0 = ninv * r0 * (n+1 - anm1[0] * powlimb (r0, n)); /* 16 bits */ r0 = ninv * r0 * (n+1 - anm1[0] * powlimb (r0, n)); /* 32 bits */ #if GMP_NUMB_BITS 32 r0 = ninv * r0 * (n+1 - anm1[0] * powlimb (r0, n)); /* 64 bits */ #endif if (an == 1) { TMP_FREE; rp[0] = r0 * a0; return; } xp = TMP_ALLOC_LIMBS (3*an); xnp = xp + an; ep = xp + 2*an; /* FIXME: Possible to this on the fly with some bit fiddling. */ for (i = 0; an 1; an = (an + 1)/2) sizes[i++] = an; xp[0] = r0; xn = 1; while (i-- 0) { /* Compute x^n. What's the best way to handle the doubled precision? Could do the complete powering using wraparound. */ MPN_ZERO (xp + xn, sizes[i] - xn); mpn_powlo (xnp, xp, n, 1, sizes[i], tp); /* Multiply by a^{n-1}. Can use wraparound; low part is 000...01. */ mpn_mullo_n (ep,
Re: 2-adic roots (Re: bdiv vs redc)
On Jul 31, 2012, at 7:42 AM, Niels Möller wrote: We currently have modular exponentation, powlo and regular powering with no reduction of any kind. I'm suggesting a pow_modbnm1. For euclidean square root, and for mpfr, it might also be useful with a pow_high, keeping only the n most significant limbs of each product, and returning the number of discarded low limbs. Another potential use is powm with moduli of special form, where the reduction can be done cheaper than with montgomery redc. Maybe the function could even be used for more complicated groups, e.g., if implementing elliptic curve operations on top of gmp. Or maybe it's easy enough to duplicate the code for each useful case, perhaps sharing some bit extraction macros in gmp-impl.h. Related to this, you may also want to check out the paper Fast convolutions meet Montgomery by Mihailescu, I've always thought that would be an interesting algorithm to put in GMP, but maybe a bit tricky to implement. david ___ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel
Re: bdiv vs redc
ni...@lysator.liu.se (Niels Möller) writes: ni...@lysator.liu.se (Niels Möller) writes: Some other test cases fail, though: t-root, t-perfpow, t-divis, and t-cong. I'm not familiar with this code, I had a quick look at the root code but didn't see any obvious dependency on bdiv. Found it, it's the divisibility check in perfpow. With the old bdiv convention, we had that D divides U iff R == 0. With the new convention, the divisibility test is the more complicated cy == 0 (R == D || R == 0) where the R == 0 case can happen only if U == 0. So if it is known a priori that U != 0, then the test can be simplified to just R == D. Ah. This makes some sense. I also noted that there's some similarity between perfpow and remove, maybe we should have an mpn_remove_1, and use that from perfpow? And differences, remove.c uses mpn_bdiv_qr for divisibility tests (and keeps the quotient in case the division is exact), while perfpow calls mpn_divisible_p which in turn calls mpn_*_bdiv_qr. Yes, I am aware of these similarities (or at least, I *was* aware of it when we worked on the perfpow code). An mpn_remove_1 makes sense. We should also extract the Hensel nth root and nth square root from mpn/generic/perfpow.c, and put them in their own files. (Some typing cleanup might be asked for.) -- Torbjörn ___ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel
Re: bdiv vs redc
ni...@lysator.liu.se (Niels Möller) writes: How do you define hensel square root with remainder? Given a and n, if there exists an x such that x^2 = a (mod B^n), that seems like the reasonable definition of the square root. But what if no such x exists; where should we put the remainder in the equation? I don't think a remainder is meaningful here. -- Torbjörn ___ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel
Re: bdiv vs redc
Torbjorn Granlund t...@gmplib.org writes: I don't think a remainder is meaningful here. Ok, so the return value should be a proper root mod B^k (B = 2^{GMP_NUMB_BITS, as usual), or an indication that no root exists. When a square root exists (and the input is non-zero), then there are two square roots (if I remember correctly, prime powers behave like primes rather than general composites in this respect, for which there exists more than two square roots). Should we have some canonicalization? I haven't really looked into the perfpow code or theory, but let me think aloud... For perfpow, I guess all roots mod B^k must be considered as candidates for a non-modular root. So, say we want to find the positive nth root of a, and we know the size of the root (if it exists) must be k (or possibly k-1) limbs. Then for all modular nth roots x such that x^n = a (mod B^k) we can check if x^n = a, and if we have equality for one of the candidates, then a is a perfect power. I imagine most candidates can be excluded by computing the highest few limbs of the power. Is it easy to find all nth roots mod B^k, given one of them? If I got the number of roots correctly (and I guess we have to assume that n B^k...), then they differ by a primitve nth root of unity. Does there exist a primitive nth root of unity mod B^k for any n and k, and if so, is it easy to find? Ah, and one more thing... Since we're working mod a power of two, I imagine there may be some fundamental differences between odd and even n? Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel
Re: bdiv vs redc
ni...@lysator.liu.se (Niels Möller) writes: Some other test cases fail, though: t-root, t-perfpow, t-divis, and t-cong. I'm not familiar with this code, I had a quick look at the root code but didn't see any obvious dependency on bdiv. Found it, it's the divisibility check in perfpow. With the old bdiv convention, we had that D divides U iff R == 0. With the new convention, the divisibility test is the more complicated cy == 0 (R == D || R == 0) where the R == 0 case can happen only if U == 0. So if it is known a priori that U != 0, then the test can be simplified to just R == D. I also noted that there's some similarity between perfpow and remove, maybe we should have an mpn_remove_1, and use that from perfpow? And differences, remove.c uses mpn_bdiv_qr for divisibility tests (and keeps the quotient in case the division is exact), while perfpow calls mpn_divisible_p which in turn calls mpn_*_bdiv_qr. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel
Re: bdiv vs redc
Torbjorn Granlund t...@gmplib.org writes: What are your plans with these bdiv changes, do you intend to debug things, or are you hoping that I and Marco debug it? ;-) I don't think I have a real plan. But the most practical way forward I see is to 1. Adapt bdiv_mu to the new convention by writing a simple wrapper around the current code. 2. Iron out remaining problems in bdiv callers as well as any remaining bugs in dc and sb. 3. Possibly change the implementation of mu_bdiv. When (2) is done, we can consider pushing it to the main repo. Or do you seen any reasonable way to do the changes in smaller pieces? I'd like to work further on this, but it's not my top priority, so I'm only happy if you or Marco get to it before I do. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel
Re: bdiv vs redc
Torbjorn Granlund t...@gmplib.org writes: hi = mpn_addmul_1 (np, dp, dn, q) hi += cy cy = hi cy // can this be true? Unless something interesting can be inferred from the hypothesis previous iteration generated a carry, I'm pretty sure carry is possible here. An addmul_1 with size n gives a result (including the carry limb) of up to B^{n+1} - B, or B-1, B-1, ..., B-1, 0. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel
Re: bdiv vs redc
Torbjorn Granlund t...@gmplib.org writes: iterate nn-dn times q = np[0] * dinv hi = mpn_addmul_1 (np, dp, dn, q) hi += cy cy = hi cy // can this be true? hi += np[dn] cy2 += hi np[dn] np[dn] = hi np += 1; I wrote a possible start for mpn_sbpi1_bdiv_r using the above idea. It is attached. As currently writtem this is a pure generalisation of redc_1; the dividend and divisor have separate sizes. The remainder is of redc_1 type, with carry returned, not borrow. I suspect this will beat both previous styles of schoolbook Hensel division, in particular when written in assembly. sbpi1_bdiv_r.c Description: Binary data -- Torbjörn ___ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel