Re: bdiv vs redc

2014-04-11 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:

 I've checked in my changes in a new repository,

   ssh://shell.gmplib.org//var/hg/gmp-proj/gmp-bdiv

 Patch also appended below. The code passes make check now.

I just remember this old sub-project: Changing bdiv convention from

  U = Q D + R B^n  (Q = U D^-1 mod B^n)

to

  U + Q D = R B^n  (Q = - U D^-1 mod B^n)

to make it more consistent with redc. It would be nice to get this in.
Do we have any 6.0.x repository yet? Changes like this shouldn't be
included in any coming bug-fix release.

Regards,
/Niels

 diff -Nrc2 gmp-bdiv.b5ca16212198/ChangeLog gmp-bdiv.aa820fde2c64/ChangeLog
 *** gmp-bdiv.b5ca16212198/ChangeLog   2012-07-17 23:52:21.0 +0200
 --- gmp-bdiv.aa820fde2c64/ChangeLog   2012-07-17 23:52:21.0 +0200
 ***
 *** 1,2 
 --- 1,27 
 + 2012-07-17  Niels Möller  ni...@lysator.liu.se
 + 
 + * mpn/generic/divis.c (mpn_divisible_p): Updated the divisibility
 + test; bdiv now returns R = D rather than R = 0 when D divides a
 + non-zero U.
 + 
 + * mpn/generic/binvert.c (mpn_binvert): Negate bdiv quotient.
 + * mpn/generic/divexact.c (mpn_divexact): Likewise.
 + * mpn/generic/remove.c (mpn_remove): Likewise.
 + * mpz/bin_uiui.c (mpz_bdiv_bin_uiui): Likewise.
 + 
 + * tests/mpn/t-bdiv.c (check_one): Updated to new convention,
 + B^{qn} R = U + QD.
 + 
 + * mpn/generic/sbpi1_bdiv_qr.c (mpn_sbpi1_bdiv_qr): Reimplemented,
 + for new bdiv convention.
 + * mpn/generic/sbpi1_bdiv_q.c (mpn_sbpi1_bdiv_q): Likewise.
 + * mpn/generic/dcpi1_bdiv_q.c (mpn_dcpi1_bdiv_q_n)
 + (mpn_dcpi1_bdiv_q): Adapted to new bdiv convention.
 + * mpn/generic/dcpi1_bdiv_qr.c (mpn_dcpi1_bdiv_qr_n)
 + (mpn_dcpi1_bdiv_qr): Likewise.
 + * mpn/generic/mu_bdiv_qr.c (mpn_mu_bdiv_qr): Adapted to new bdiv
 + convention, using a wrapper calling the old function.
 + * mpn/generic/mu_bdiv_q.c (mpn_mu_bdiv_q): Likewise.
 + 
   2012-06-26  Torbjorn Granlund  t...@gmplib.org
   
 diff -Nrc2 gmp-bdiv.b5ca16212198/mpn/generic/binvert.c 
 gmp-bdiv.aa820fde2c64/mpn/generic/binvert.c
 *** gmp-bdiv.b5ca16212198/mpn/generic/binvert.c   2012-07-17 
 23:52:21.0 +0200
 --- gmp-bdiv.aa820fde2c64/mpn/generic/binvert.c   2012-07-17 
 23:52:21.0 +0200
 ***
 *** 75,78 
 --- 75,80 
   mpn_dcpi1_bdiv_q (rp, xp, rn, up, rn, -di);
   
 +   mpn_neg (rp, rp, rn);
 + 
 /* Use Newton iterations to get the desired precision.  */
 for (; rn  n; rn = newrn)
 diff -Nrc2 gmp-bdiv.b5ca16212198/mpn/generic/dcpi1_bdiv_q.c 
 gmp-bdiv.aa820fde2c64/mpn/generic/dcpi1_bdiv_q.c
 *** gmp-bdiv.b5ca16212198/mpn/generic/dcpi1_bdiv_q.c  2012-07-17 
 23:52:21.0 +0200
 --- gmp-bdiv.aa820fde2c64/mpn/generic/dcpi1_bdiv_q.c  2012-07-17 
 23:52:21.0 +0200
 ***
 *** 36,40 
   }
   
 ! /* Computes Q = N / D mod B^n, destroys N.
   
  N = {np,n}
 --- 36,40 
   }
   
 ! /* Computes Q = - N / D mod B^n, destroys N.
   
  N = {np,n}
 ***
 *** 58,67 
   
 mpn_mullo_n (tp, qp, dp + hi, lo);
 !   mpn_sub_n (np + hi, np + hi, tp, lo);
   
 if (lo  hi)
   {
 !   cy += mpn_submul_1 (np + lo, qp, lo, dp[lo]);
 !   np[n - 1] -= cy;
   }
 qp += lo;
 --- 58,67 
   
 mpn_mullo_n (tp, qp, dp + hi, lo);
 !   mpn_add_n (np + hi, np + hi, tp, lo);
   
 if (lo  hi)
   {
 !   cy += mpn_addmul_1 (np + lo, qp, lo, dp[lo]);
 !   np[n - 1] += cy;
   }
 qp += lo;
 ***
 *** 72,76 
   }
   
 ! /* Computes Q = N / D mod B^nn, destroys N.
   
  N = {np,nn}
 --- 72,76 
   }
   
 ! /* Computes Q = - N / D mod B^nn, destroys N.
   
  N = {np,nn}
 ***
 *** 120,124 
 mpn_incr_u (tp + qn, cy);
   
 !   mpn_sub (np + qn, np + qn, nn - qn, tp, dn);
 cy = 0;
   }
 --- 120,124 
 mpn_incr_u (tp + qn, cy);
   
 !   mpn_add (np + qn, np + qn, nn - qn, tp, dn);
 cy = 0;
   }
 ***
 *** 130,134 
 while (qn  dn)
   {
 !   mpn_sub_1 (np + dn, np + dn, qn - dn, cy);
 cy = mpn_dcpi1_bdiv_qr_n (qp, np, dp, dn, dinv, tp);
 qp += dn;
 --- 130,134 
 while (qn  dn)
   {
 !   mpn_add_1 (np + dn, np + dn, qn - dn, cy);
 cy = mpn_dcpi1_bdiv_qr_n (qp, np, dp, dn, dinv, tp);
 qp += dn;
 diff -Nrc2 gmp-bdiv.b5ca16212198/mpn/generic/dcpi1_bdiv_qr.c 
 gmp-bdiv.aa820fde2c64/mpn/generic/dcpi1_bdiv_qr.c
 *** gmp-bdiv.b5ca16212198/mpn/generic/dcpi1_bdiv_qr.c 2012-07-17 
 23:52:21.0 +0200
 --- gmp-bdiv.aa820fde2c64/mpn/generic/dcpi1_bdiv_qr.c 2012-07-17 
 23:52:21.0 +0200
 ***
 *** 33,42 
  Output:
   
 !   q = n * d^{-1} mod 2^{qn * GMP_NUMB_BITS},
   
 !   r = (n - q * d) * 2^{-qn * GMP_NUMB_BITS}
   
  Stores q at qp. Stores the n least significant limbs of r at the high 
 

Re: 2-adic roots (Re: bdiv vs redc)

2012-07-31 Thread Niels Möller
Replying to myself yet again...

 For the power x^n, I currently use mpn_powlo. But the least significant
 half is known from the previous iteration, so wraparound would be
 desirable. To me it would make some sense with a pow function for (mod
 (2^k - 1)), i.e., using mpn_mulmod_bnm1 and mpn_sqrmod_bnm1.

On second though, I don't think that would work. One would need to do
the wraparound reconstruction for each square and multiply during the
powering algorithm. And then it's not enough to store the low half of
x^n from the previous iteration, one needs to store *all* the
intermediate values from the earlier (mod B^k) powering.

The subproblem here is: Compute x^n (mod B^{2k}), taking maximun
advantage of a previous computation of x^n (mod B^k).

I could well be missing some simpler trick.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: 2-adic roots (Re: bdiv vs redc)

2012-07-30 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:

 Correction: Each iteration gets from \ell to 2 \ell-2, and it needs 2
 \ell-1 bits of precision for intermediate values.

Yet another correction, after more work on the implementation.

For numbers x = 1 (mod 8), there are four square roots mod 2^k. If x is
one of them, the other three are

  x + 2^{k-1}
  -x
  -x + 2^{k-1}

So the top bit doesn't matter. This implies that no extra bit is needed
for temporary values: If we start with \ell bits, we can do the
iteration using 2\ell-2 bits for all values. And it doesn't matter
which value we get in the most significant bit after the final shift
right, we get a square root (mod 2^{2\ell - 2}) in either case.

And we can canonicalize the returned (mod 2^k) square root by saying
that it must be = 1 (mod 4), and  2^{k-1} (i.e., most significant bit
always zero).

I'm attaching my current code for both square root and nth root. Appears
to work fine. None of them yet use any wraparound tricks, but they do
take some advantage from cancellation.

In the nth-root code, I use the iteration 

 x' -- x - x * (a^{n-1} x^n - 1) / n

which converges to a^{1/n-1}, so the nth root is recovered as a * x. The
number of correct bits is *exactly* doubled in each iteration, so I
didn't see much point in using bit counts, instead I use a limb count
for the desired precision. Perhaps this iteration could be useful in the
euclidean case too, I haven't investigated that.

The factor a^{n-1} is a loop invariant, so it can be computed (at the
highest needed precision) before the loop.

For the power x^n, I currently use mpn_powlo. But the least significant
half is known from the previous iteration, so wraparound would be
desirable. To me it would make some sense with a pow function for (mod
(2^k - 1)), i.e., using mpn_mulmod_bnm1 and mpn_sqrmod_bnm1. Or is there
any easier way to take advantage of wraparound for that computation?

Maybe it would be a good idea to write a general or abstract pow
function taking multiplication and squaring functions as arguments?

We currently have modular exponentation, powlo and regular powering
with no reduction of any kind. I'm suggesting a pow_modbnm1. For
euclidean square root, and for mpfr, it might also be useful with a
pow_high, keeping only the n most significant limbs of each product, and
returning the number of discarded low limbs. Another potential use is
powm with moduli of special form, where the reduction can be done
cheaper than with montgomery redc. Maybe the function could even be used
for more complicated groups, e.g., if implementing elliptic curve
operations on top of gmp. Or maybe it's easy enough to duplicate the
code for each useful case, perhaps sharing some bit extraction macros in
gmp-impl.h.

Regards,
/Niels

#include stdio.h
#include stdlib.h

#include gmp.h
#include gmp-impl.h

/* Computes a^e (mod B). Uses right-to-left binary algorithm, since
   typical use will have e small. */
static mp_limb_t
powlimb (mp_limb_t a, mp_limb_t e)
{
  mp_limb_t r = 1;
  mp_limb_t s = a;

  for (r = 1, s = a; e  0; e = 1, s *= s)
if (e  1)
  r *= s;

  return r;
}

/* Computes the nth root of A, mod B^k. Both A and n must be odd.

   Uses the iteration

 x' -- x - x * (a^{n-1} x^n - 1) / n

   converging to a^{1/n - 1}.

   If

 a^{n-1} x^n = 1 (mod 2^\ell),

   then

 a^{n-1} x'^n = 1 (mod 2^{2\ell}),
*/

void
mpn_broot (mp_ptr rp, mp_srcptr ap, mp_size_t an, mp_limb_t n)
{
  mp_size_t sizes[GMP_LIMB_BITS * 2];  
  mp_ptr anm1, tp, xp, xnp, ep;
  mp_limb_t a0, r0, nm1, ninv;
  mp_size_t xn;
  unsigned i;

  TMP_DECL;

  ASSERT (an  0);
  ASSERT (ap[0]  1);
  ASSERT (n  1);
  ASSERT (n = 3);

  TMP_MARK;
  
  anm1 = TMP_ALLOC_LIMBS (4*an);
  tp = anm1 + an;

  nm1 = n-1;
  mpn_powlo (anm1, ap, nm1, 1, an, tp); /* 3 an scratch space */

  a0 = ap[0];
  binvert_limb (ninv, n);

  /* 4 bits: a^{1/n - 1} (mod 16):

   a mod 8
   1 3 5 7   
 1 1 1 1 1
 3 1 9 9 1
  */
  r0 = 1 + (((n  2)  ((a0  1) ^ (a0  2)))  8);
  r0 = ninv * r0 * (n+1 - anm1[0] * powlimb (r0, n)); /* 8 bits */
  r0 = ninv * r0 * (n+1 - anm1[0] * powlimb (r0, n)); /* 16 bits */
  r0 = ninv * r0 * (n+1 - anm1[0] * powlimb (r0, n)); /* 32 bits */
#if GMP_NUMB_BITS  32
  r0 = ninv * r0 * (n+1 - anm1[0] * powlimb (r0, n)); /* 64 bits */
#endif

  if (an == 1)
{
  TMP_FREE;
  rp[0] = r0 * a0;
  return;
}

  xp = TMP_ALLOC_LIMBS (3*an);
  xnp = xp + an;
  ep = xp + 2*an;

  /* FIXME: Possible to this on the fly with some bit fiddling. */
  for (i = 0; an  1; an = (an + 1)/2)
sizes[i++] = an;

  xp[0] = r0;
  xn = 1;

  while (i--  0)
{
  /* Compute x^n. What's the best way to handle the doubled
 precision? Could do the complete powering using
 wraparound. */
  MPN_ZERO (xp + xn, sizes[i] - xn);
  mpn_powlo (xnp, xp, n, 1, sizes[i], tp); 

  /* Multiply by a^{n-1}. Can use wraparound; low part is
 000...01. */

  mpn_mullo_n (ep, 

Re: 2-adic roots (Re: bdiv vs redc)

2012-07-30 Thread David Harvey

On Jul 31, 2012, at 7:42 AM, Niels Möller wrote:

 We currently have modular exponentation, powlo and regular powering
 with no reduction of any kind. I'm suggesting a pow_modbnm1. For
 euclidean square root, and for mpfr, it might also be useful with a
 pow_high, keeping only the n most significant limbs of each product, and
 returning the number of discarded low limbs. Another potential use is
 powm with moduli of special form, where the reduction can be done
 cheaper than with montgomery redc. Maybe the function could even be used
 for more complicated groups, e.g., if implementing elliptic curve
 operations on top of gmp. Or maybe it's easy enough to duplicate the
 code for each useful case, perhaps sharing some bit extraction macros in
 gmp-impl.h.

Related to this, you may also want to check out the paper Fast convolutions 
meet Montgomery by Mihailescu, I've always thought that would be an 
interesting algorithm to put in GMP, but maybe a bit tricky to implement.

david

___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: bdiv vs redc

2012-07-17 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes:

  ni...@lysator.liu.se (Niels Möller) writes:
  
   Some other test cases fail, though: t-root, t-perfpow, t-divis, and
   t-cong. I'm not familiar with this code, I had a quick look at the root
   code but didn't see any obvious dependency on bdiv.
  
  Found it, it's the divisibility check in perfpow.
  
  With the old bdiv convention, we had that D divides U iff R == 0.
  With the new convention, the divisibility test is the more complicated
  
cy == 0  (R == D || R == 0)
  
  where the R == 0 case can happen only if U == 0. So if it is known a
  priori that U != 0, then the test can be simplified to just
  
R == D.
  
Ah.  This makes some sense.

  I also noted that there's some similarity between perfpow and remove,
  maybe we should have an mpn_remove_1, and use that from perfpow? And
  differences, remove.c uses mpn_bdiv_qr for divisibility tests (and keeps
  the quotient in case the division is exact), while perfpow calls
  mpn_divisible_p which in turn calls mpn_*_bdiv_qr.
  
Yes, I am aware of these similarities (or at least, I *was* aware of it
when we worked on the perfpow code).

An mpn_remove_1 makes sense.

We should also extract the Hensel nth root and nth square root from
mpn/generic/perfpow.c, and put them in their own files.  (Some typing
cleanup might be asked for.)

-- 
Torbjörn
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: bdiv vs redc

2012-07-17 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes:

  How do you define hensel square root with remainder? Given a and n, if
  there exists an x such that x^2 = a (mod B^n), that seems like the
  reasonable definition of the square root. But what if no such x exists;
  where should we put the remainder in the equation?
  
I don't think a remainder is meaningful here.

-- 
Torbjörn
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: bdiv vs redc

2012-07-17 Thread Niels Möller
Torbjorn Granlund t...@gmplib.org writes:

 I don't think a remainder is meaningful here.

Ok, so the return value should be a proper root mod B^k (B =
2^{GMP_NUMB_BITS, as usual), or an indication that no root exists.

When a square root exists (and the input is non-zero), then there are
two square roots (if I remember correctly, prime powers behave like
primes rather than general composites in this respect, for which there
exists more than two square roots). Should we have some
canonicalization?

I haven't really looked into the perfpow code or theory, but let me
think aloud...

For perfpow, I guess all roots mod B^k must be considered as candidates
for a non-modular root. So, say we want to find the positive nth root of
a, and we know the size of the root (if it exists) must be k (or
possibly k-1) limbs.

Then for all modular nth roots x such that

  x^n = a (mod B^k)

we can check if x^n = a, and if we have equality for one of the
candidates, then a is a perfect power. I imagine most candidates can be
excluded by computing the highest few limbs of the power.

Is it easy to find all nth roots mod B^k, given one of them? If I got
the number of roots correctly (and I guess we have to assume that n 
B^k...), then they differ by a primitve nth root of unity. Does there
exist a primitive nth root of unity mod B^k for any n and k, and if so,
is it easy to find?

Ah, and one more thing... Since we're working mod a power of two, I
imagine there may be some fundamental differences between odd and even
n?

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: bdiv vs redc

2012-07-16 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes:

 Some other test cases fail, though: t-root, t-perfpow, t-divis, and
 t-cong. I'm not familiar with this code, I had a quick look at the root
 code but didn't see any obvious dependency on bdiv.

Found it, it's the divisibility check in perfpow.

With the old bdiv convention, we had that D divides U iff R == 0.
With the new convention, the divisibility test is the more complicated

  cy == 0  (R == D || R == 0)

where the R == 0 case can happen only if U == 0. So if it is known a
priori that U != 0, then the test can be simplified to just

  R == D.

I also noted that there's some similarity between perfpow and remove,
maybe we should have an mpn_remove_1, and use that from perfpow? And
differences, remove.c uses mpn_bdiv_qr for divisibility tests (and keeps
the quotient in case the division is exact), while perfpow calls
mpn_divisible_p which in turn calls mpn_*_bdiv_qr.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: bdiv vs redc

2012-07-10 Thread Niels Möller
Torbjorn Granlund t...@gmplib.org writes:

 What are your plans with these bdiv changes, do you intend to debug
 things, or are you hoping that I and Marco debug it?  ;-)

I don't think I have a real plan. But the most practical way forward I
see is to

1. Adapt bdiv_mu to the new convention by writing a simple wrapper around
   the current code.

2. Iron out remaining problems in bdiv callers as well as any
   remaining bugs in dc and sb.

3. Possibly change the implementation of mu_bdiv.

When (2) is done, we can consider pushing it to the main repo. Or do you
seen any reasonable way to do the changes in smaller pieces?

I'd like to work further on this, but it's not my top priority, so I'm
only happy if you or Marco get to it before I do.

Regards,
/Niels


-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: bdiv vs redc

2012-06-29 Thread Niels Möller
Torbjorn Granlund t...@gmplib.org writes:

 hi = mpn_addmul_1 (np, dp, dn, q)
 hi += cy
 cy = hi  cy  // can this be true?

Unless something interesting can be inferred from the hypothesis
previous iteration generated a carry, I'm pretty sure carry is
possible here.

An addmul_1 with size n gives a result (including the carry limb) of up
to B^{n+1} - B,  or B-1, B-1, ..., B-1, 0.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: bdiv vs redc

2012-06-28 Thread Torbjorn Granlund
Torbjorn Granlund t...@gmplib.org writes:

iterate nn-dn times
  q = np[0] * dinv
  hi = mpn_addmul_1 (np, dp, dn, q)
  hi += cy
  cy = hi  cy  // can this be true?
  hi += np[dn]
  cy2 += hi  np[dn]
  np[dn] = hi
  np += 1;

I wrote a possible start for mpn_sbpi1_bdiv_r using the above idea.
It is attached.

As currently writtem this is a pure generalisation of redc_1; the
dividend and divisor have separate sizes.  The remainder is of redc_1
type, with carry returned, not borrow.

I suspect this will beat both previous styles of schoolbook Hensel
division, in particular when written in assembly.



sbpi1_bdiv_r.c
Description: Binary data

-- 
Torbjörn
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel