Re: Anomaly in mpn_sqrtrem and mpn_rottrem

2015-06-10 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes:

  I also had a quick look at the math, and I realized (some of you surely
  knew that already) that floor(sqrt(a)) is mostly independent of the
  lowest half of the bits of a.

Some of us indeed knew that...  And this generalises to kth roots.  It
is similar to the more familiar problem of an m-limb by n-limb division;
the low n dividend limbs affect the quotient very little...

  The lowest half of the bits are needed only for computing the remainder,
  and a final adjustment-by-one step. I think the iteration should totally
  ignore the low half bits in each step.
  
  Would it make sense with an mpn_sqrt_appr which takes a n-limb
  normalized number A as input, and computes an n-limb square root
  approximation of B^{n-1} A, with some well defined error bound, of at
  most 2? (We might also need a function for the the (n-1)-limb sqrt of
  B^{n-2} A).
  
  And a corresponding mpn_root_appr would compute the kth root of
  B^{(k-1) n - 1} A, I think.

I haven't looked enough at your propposal to have an informed opinion.

I think mpn_rootrem's approach for when the remainder is not needed
might be wise, perhaps wiser.  It computes with one extra limb, then
uses that to almost always return the correct result without any
(partial) remainder computation.

To speed sqrt (non-rem) we might want to do like root.  But both
functions should probably do that only for operands over say 10 limbs.
Below that, some _appr approach will be better.

And as written before, as k grows, we would benefit enormously from a
pow_1_highpart.  The average speedup potential is O(k).

-- 
Torbjörn
Please encrypt, key id 0xC8601622
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Anomaly in mpn_sqrtrem and mpn_rottrem

2015-06-10 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes:

 I haven't looked enough at your propposal to have an informed opinion.

Here's a sketch with some more details. I've tried to work out both
sqrt(B^{n-1} A) and sqrt(B^{n-2}). To my surprise, they seem
independent, not mutually recursive.

sqrt_nm1 (n, A):

  Input: A = {a_{n-1}, ..., a_0}, n = 2
  Output: X = {x_{n-2}, ..., x_0} \approx sqrt(B^{n-2} A)

  if (n == 2)
x_0 -- floor (sqrt ({a_1, a_0}))
return X = {x_0}

  k -- floor (n/2)
  H -- sqrt_nm1 (k+1, {a_{n-1},...,a_{n-1-k}}) // k limbs

  if (n odd)// n = 2k+1
E -- B H^2 - A
X -- B^k H - floor (B^{k-1} E / 2H)

  else  // n = 2k
E -- H^2 - A
X -- B^{k-1} H - floor (B^{k-1} E / 2H)

  return X

sqrt_n (n, A):

  Input: A = {a_{n-1}, ..., a_0}, n = 1
  Output: X = {x_{n-1}, ..., x_0} \approx sqrt(B^{n-1} A)

  if (n == 1)
x_0 -- floor (sqrt (a_0))
return X = {x_0}

  k -- floor (n/2)
  H -- sqrt_n(k+1, {a_{n-1},...,a_{n-1-k}}) // k+1 limbs

  if (n odd)// n = 2k+1
E -- H^2 - A
X -- H B^k - floor (B^k E / 2 H)

  else  // n = 2k
E -- H^2 - B A
X -- B^{k-1} H - floor (B^{k-1} E / 2H)

  return X

Note that all the computations of the error/residual E is subject to a
lot of cancellation, so one can use mullo (for small sizes) or
wraparound arithmetic using mulmod_bnm1 for larger sizes.

And the divisions really ought to use divappr, rather than exact
truncating division. I'm not sure how that interface works, but one
ought be able to pass in numerators like, e.g., B^{k-1} E, without
actually padding the numerator with zeros.

I don't know how large errors one gets, from the above algorithms, but I
would expect errors to be at most 2 or 3.

For proper error analysis, one should also think about desired sign of
the error. The newton iteration wants to converge from above, and on the
other hand the inclusion of more bits of A wants X to converge from
below. So maybe one should add some fudge factor into the rounding
somewhere force monotonous convergence? Also sign and magnitude of the
divappr errors must be taken into account.

If we end up with an error E and corresponding X update of varying sign,
that's no fundamental problem, but it will make the wraparound
arithmetic a little more complex.

Finally, I'm not sure if one can expect tricks to eliminate division to
be a big win. Except for the smallest sizes, division, and in particular
divappr, should only be a small constant slower than multiplication,
right? The division free iteration for 1/sqrt(A) needs additional
multiplications, and the trick mensioned the other day to compute an
inverse incrementally, will also introduce additional multiplications.

I'm thinking that an exact sqrt/sqrtrem for even size n will call
sqrt_nm1 with n/2+1 input limbs producing an n-limb root. And for odd n,
call sqrt_n with (n+1)/2 input limbs. In either case, we'll get at least
one bit of margin, with the worst case being odd n and a[n-1] == 1.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel