Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-08-19 Thread Torbjörn Granlund
Marco Bodrato bodr...@mail.dm.unipi.it writes:

  Before the changes I just pushed, I simply reordered the steps in the loop
  to shorten the first and the last iteration in the loop...
  
Resulting in even better performance, I presume?

   How much speed difference is there now, for k = 4 vs sqrt(sqrt())?
  
   mpn_sqrtmpn_root.4mpn_root.8   mpn_root.16
  1  #33.86659.37229.62178.01
  2 #112.47916.25789.81273.80
  4 #245.35   1350.10   .97   1117.45
  8 #419.01   1934.60   1683.93   1570.81
  16   #1015.91   2611.44   2558.12   2472.39
  32   #1666.53   3837.72   3969.28   4031.27
  64   #3305.67   6295.23   6160.25   6654.23
  
   Is the difference small enough that we could fix it by running the first
   few iterations using plain limb arithmetic?
  
  I fear it is not.

That is indeed evident from that data.  E.g., the delta between sizes 2
and 4 is more than twice greater for root than for sqrt.

-- 
Torbjörn
Please encrypt, key id 0xC8601622
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-08-18 Thread Marco Bodrato
Ciao,

On Tue, August 18, 2015 10:51 am, Torbjörn Granlund wrote:
 I miss some rootrem logs from ChangeLog.

Before the changes I just pushed, I simply reordered the steps in the loop
to shorten the first and the last iteration in the loop...

 How much speed difference is there now, for k = 4 vs sqrt(sqrt())?

 mpn_sqrtmpn_root.4mpn_root.8   mpn_root.16
1  #33.86659.37229.62178.01
2 #112.47916.25789.81273.80
4 #245.35   1350.10   .97   1117.45
8 #419.01   1934.60   1683.93   1570.81
16   #1015.91   2611.44   2558.12   2472.39
32   #1666.53   3837.72   3969.28   4031.27
64   #3305.67   6295.23   6160.25   6654.23

 Is the difference small enough that we could fix it by running the first
 few iterations using plain limb arithmetic?

I fear it is not.

-- 
http://bodrato.it/

___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-08-17 Thread Marco Bodrato
Ciao,

On Tue, July 14, 2015 11:48 am, bodr...@mail.dm.unipi.it wrote:

 estimate. The trick I suggest consists in adding some fractional digits of
 log2(n), by table lookup, then use log2(n)/k to estimate a few of the
 highest digits of n^(1/k), not just one.

With two 256-chars tables(too much?), it is possible to estimate 9 bits
(seldom the error is +2, previous code always used a single correction).

Before we started refining mpn_rootrem (changeset 16672:c86f4fc0aafe) the
timing (on shell.gmplib) was:
$tune/speed -cs 1-1024 -f3 mpn_root.3 mpn_root.32 mpn_root.332 mpn_root.3332
overhead 5.84 cycles, precision 1 units of 2.86e-10 secs, CPU freq
3500.08 MHz
   mpn_root.3   mpn_root.32  mpn_root.332 mpn_root.3332
1 2120.44424.16 84.71#84.70
3 2537.38   2255.13 86.57#86.53
9 3556.60   4661.17687.10#99.87
274779.44   6667.10   7156.09   #151.71
818768.02  13230.91  31229.99  #6717.58
243  40720.39 #27991.11  72860.77  81112.17
729 135971.53#112360.63 240474.621060633.50

The current repo gives:
   mpn_root.3   mpn_root.32  mpn_root.332 mpn_root.3332
1 1985.55290.12#27.20 27.21
3 2373.10   2015.57 27.21#27.21
9 3359.58   4401.62411.39#27.21
274626.43   6317.53   6857.23#27.21
818583.51  12816.22  30837.67  #5783.87
243  39055.19 #27236.95  71936.69  78244.04
729 134456.72#111492.59 235518.651054410.50

The code I'm experimenting with, reduces timings for small results:
   mpn_root.3   mpn_root.32  mpn_root.332 mpn_root.3332
1  634.94193.03#28.22 28.25
3 1289.25200.82#28.17 28.18
9 1865.79   1341.31384.23#28.24
273411.88   3257.38816.07#28.24
817518.93   9572.27  18564.51  #1991.75
243  37723.17  24465.42  61376.28 #20745.56
729 133654.08#108648.30 226299.89 768142.04

I'm still testing it, but I think I'll push it soon...

 Now, for small sizes, computing the 4th root with mpn_root is slower
 than calling mpn_sqrt twice.

Even with the new code, this is true on a wide range (up to 1000 limbs).
Should we write special code for k=4,8,16 that repeatedly calls mpn_sqrt?

Regards,
m

-- 
http://bodrato.it/

___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-15 Thread Torbjörn Granlund
bodr...@mail.dm.unipi.it writes:

   But I spotted % k and / k there, and those are very expensive,
   unless you table inverses of k, of course...
  
  There is already a single (there are two currently, but we can avoid one)
  division by k in the code, it is used to compute the first single-bit
  estimate. The trick I suggest consists in adding some fractional digits of
  log2(n), by table lookup, then use log2(n)/k to estimate a few of the
  highest digits of n^(1/k), not just one.
  
  I still have to analyse the maximal error, but I believe we can use the
  trick to skip some iterations.
  
I agree that your code is an improvement.

For all of GMP, I think we should consider alternatives to explicit
division (and modulus).  If the numerator is limited, we can always
compute an accurate quotient with a multiply, provided we have a
reasonable divisor inverse.

My division paranoia might not be motivated for all chips, though.
32-bit chips tend to have fast enough division hardware,, and recent
64-bit x86 CPUs have fast 64/64 division (but slow 128/64 division).
We cannot then beat the hardware with Newton.

Why bother?  Our sqrt(rem) and root(rem) will sometimes be used for just
a few limbs, and then division as part of the bookkeeping will take a
very large fraction of the time.


-- 
Torbjörn
Please encrypt, key id 0xC8601622
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-14 Thread bodrato
Ciao,

Il Mer, 8 Luglio 2015 4:20 pm, Torbjörn Granlund ha scritto:
 bodr...@mail.dm.unipi.it writes:

   The code above should give a range for the first 5 bits (the first is 1
   anyway), and it should work for any k.

 But I spotted % k and / k there, and those are very expensive,
 unless you table inverses of k, of course...

There is already a single (there are two currently, but we can avoid one)
division by k in the code, it is used to compute the first single-bit
estimate. The trick I suggest consists in adding some fractional digits of
log2(n), by table lookup, then use log2(n)/k to estimate a few of the
highest digits of n^(1/k), not just one.

I still have to analyse the maximal error, but I believe we can use the
trick to skip some iterations.

Best regards,
m

-- 
http://bodrato.it/papers/

___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-09 Thread Torbjörn Granlund
  
   Surely the most important k is 3 (assuming 2 is always handled by the
   sqrt code).
  
  Elaborate? Are you thinking of any particular algorithm using cube
  roots, or just the general observation that smaller numbers tend to be
  more common?
  
The latter; I think u^(1/3) will tend to be more often computed than
u^(1/4711).

-- 
Torbjörn
Please encrypt, key id 0xC8601622
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-08 Thread bodrato
Ciao,

Il Mar, 7 Luglio 2015 11:12 am, Torbjörn Granlund ha scritto:
 bodr...@mail.dm.unipi.it writes:

   Maybe we should improve mpn_rootrem for
   small sizes in general...

 Tabling start values is hard, but we should consider doing it for k 
 some limit, and perhaps provide just 4 bits.

We can use something based on the logarithm.
The code above should give a range for the first 5 bits (the first is 1
anyway), and it should work for any k.

void
root (mp_limb_t *r, mp_limb_t n, unsigned k)
{
  /* vector(32,i,floor((log(31+i)/log(2)-5)*256)) */
  unsigned char v[] = {0, 11, 22, 33, 43, 53, 63, 73,
82, 91, 100, 109, 117, 125, 134, 141,
149, 157, 164, 172, 179, 186, 193, 200,
206, 213, 219, 225, 232, 238, 244, 250, 255};
  unsigned c;
  mp_limb_t low, hi;

  count_leading_zeros (c, n);
  c -= GMP_NAIL_BITS;
  n = c;
  n = GMP_NUMB_BITS - 6;
  n -= 32;
  /* Now 0 = n  32 contains the higest bits of the original n, with
 the first 1 removed. */
  c = GMP_NUMB_BITS - c-1;
  /* c is the number of bits of the original number. */
  low = v[n] + (c%k) * 256;
  hi = v[n+1] + (c%k) * 256;
  low = low / k;
  hi = hi / k;
  for (c=32; low  v[--c];) ;
  r[0] = c+32;
  for (;hi  v[c]; ++c) ;
  r[1] = c+32;
}

Regards,
m

-- 
http://bodrato.it/papers/

___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-08 Thread bodrato
Ciao,

Il Mer, 8 Luglio 2015 8:31 am, Niels Möller ha scritto:
 bodr...@mail.dm.unipi.it writes:

 If H=floor(sqrt(A/B^2n)), then the residual we need to compute the lower
 half of the square root is floor(A/B^2n)-H^2 .

 I don't follow you here. Consider a simple case: A of size 2n, and we
 try to compute an approximation of sqrt(B^{2n-2} A).

 Then first compute high half as

   H \approx sqrt(B^{n-2} floor (B^{-n} A))

 And the residual is

   E = H^2 - A (appr. n limbs, due to cancellation)

E is approximatively 3n/2 limbs.

An example:
B=10
A=9876 (4 digits)
To compute the high half, we consider A'=98 (2 digits),
we compute H'=sqrt(98)=9 (1 digit).
(A'=H'^2+17, 17 = 2*H', this result is exact, the first digit will not be
changed by any following step...)

Then H=90 (2 digits)
E=A - H^2 = 9876-90^2 = 1776 (slightly more than 3 digits...)

But we do not need all of E, we can simply use
E' = (A'-H'^2)*B + second_digit_of(A) = (98-9^2)*10 +... = 177
We computed (98-9^2), the remainder of the high half...
Then 177/9/2 = 9 is the second digit.

... but maybe I did not understand the algorithm you proposed.

Best regards,
m

-- 
http://bodrato.it/

___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-08 Thread Torbjörn Granlund
bodr...@mail.dm.unipi.it writes:

  The code above should give a range for the first 5 bits (the first is 1
  anyway), and it should work for any k.
  
I have't looked at the maths, so I am not yet impressed...

But I spotted % k and / k there, and those are very expensive,
unless you table inverses of k, of course...


-- 
Torbjörn
Please encrypt, key id 0xC8601622
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-08 Thread Niels Möller
bodr...@mail.dm.unipi.it writes:

 An example:
 B=10
 A=9876 (4 digits)

I'm thinking of a slightly different arrangement.

At the top-level sqrtrem, since we have 4 digits, we have to use the top
3 to get an accurate square root approximation. So we'll call a function
to compute sqrt(987 * B^1). With three digits input and 2 digits output,
so this is the function call I've denoted by sqrt_nm1(3, 987), for lack
of a better name.

Next, sqrt_nm1 will split off the top two digits, and recursively
compute 

  H = sqrt_nm1(2, 98) 

This is handled as the base base, returning floor (sqrt(98)) = 9. So we
get H = 9.

Since we had an odd number of digits (3), the residual is

  E = 987 - B H^2 = 987 - 810 = 177.

We then return

  X = B H + floor (E / 2H) = 90 + floor(177 / 18) = 99

So this is the value returned to the top-level sqrtrem,

  sqrt_nm1(3, 987)   ---   99

Finally, the top-level may need an adjustment taking the final digit
into account. In this example, no adjustment is needed. One case where
it is needed is if the top-level input happens to be a square, and with
a non-zero least significant digit. E.g, 8836, where sqrt_nm1(3, 883)
returns 93, one off from the correct square root 94.

 ... but maybe I did not understand the algorithm you proposed.

I hope I've made it a little bit clearer. (I'm actually working on an
implementation, so we can play more with it).

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-08 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes:

  There are divisions also in the newton iteration, right? Then a single
  division in the computation of the initial value is likely a win, if it
  saves two or more iterations in the loop.
  
There are GMP division operations, but these are not likely going to hit
any hardware division.

  Precomputing an inverse of k makes sense, if there's more than one
  division by k.

Yes.

I think we should consider using special starting approximations for
important k, and then perhaps this general 5-bit code for the remaining
cases.

Surely the most important k is 3 (assuming 2 is always handled by the
sqrt code).  We could provide a 8-bit table here which just performs
masking and indexing.  Some other small k values might deserve such
tables too.

-- 
Torbjörn
Please encrypt, key id 0xC8601622
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-08 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes:

 But I spotted % k and / k there, and those are very expensive,
 unless you table inverses of k, of course...

There are divisions also in the newton iteration, right? Then a single
division in the computation of the initial value is likely a win, if it
saves two or more iterations in the loop.

Precomputing an inverse of k makes sense, if there's more than one
division by k. (I also haven't tried to understand the fine details of
the algorithm).

Regards,
/Niels


-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-08 Thread Niels Möller
bodr...@mail.dm.unipi.it writes:

 Not a gain in speed, probably, but we can avoid writing code twice.

Using some variant of sqrtrem is one alternative for the organization. 

 If H=floor(sqrt(A/B^2n)), then the residual we need to compute the lower
 half of the square root is floor(A/B^2n)-H^2 .

I don't follow you here. Consider a simple case: A of size 2n, and we
try to compute an approximation of sqrt(B^{2n-2} A).

Then first compute high half as

  H \approx sqrt(B^{n-2} floor (B^{-n} A))

And the residual is
 
  E = H^2 - A (appr. n limbs, due to cancellation)

So the pieces of A needed for the two computations have very little
overlap, I haven't done the error analysis, but I expect at most one
limb, i.e., E is n+1 limbs, possibly signed.

For other odd/even cases, we also need residuals of the form

  E = B H^2 - A

and

  E = H^2 - B A

 Well, if H=floor(sqrt(floor(A/B^2n))), then H=floor(sqrt(A)/B^n).
 I mean, if the square root of the high half of the number is correctly
 rounded, it will coincide with the high half of the root, the additional
 low libs can not change it.

Hmmm, that's related to sqrt being concave? Computations involving floor
are a bit non-obvious to me. In my notation above, an exact H would be

  H = floor (sqrt (floor (A/B^2))) = floor (sqrt(A) / B)

Say we have such an exact H, what kind of rounding and adjustments do we
need after the adjustment step

  X -- B^{n-1} H - B^{n-1} E / 2H; some rounding for the division

If it's possible to get a correct X = floor (sqrt (B^{n-2} A)) without
actually computing X^2, that makes an exact algorithm more attractive.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-07 Thread bodrato
Ciao,

Il Lun, 6 Luglio 2015 11:28 pm, Torbjörn Granlund ha scritto:
 bodr...@mail.dm.unipi.it writes:

   ... yes, I know, we really need to improve also odd sizes...

 Perhaps one could simply wrap the current function in code which appends
 a zero limbs, calls the current code, and then right-shift the root by
 half the limb bitcount?

This is a description of the patch I pushed yesterday:
https://gmplib.org/repo/gmp/rev/87ba695c8878
But I do not really like it. We alloc-copy-shift to add a dummy limb, then
we call a code that allocs-copies-shifts to (virtually) add two more dummy
limbs...

Anyway, in general the code for sqrt (no-remainder) is faster now than it
was before. With code from revision 16716 (3 weeks ago) we got
time tests/devel/try mpn_sqrt
[...]
user0m25.299s

Now we get:
time tests/devel/try mpn_sqrt
[...]
user0m23.300s

 (One would possibly need to suppress such code for operands below a
 certain size, to avoid slowdowns.)

There are some slowdown only for sizes 15,16,31,32, i.e. when the size is
near a small power of 2.

Now, for small sizes, computing the 4th root with mpn_rootrem is slower
than calling mpn_sqrtrem twice. Maybe we should improve mpn_rootrem for
small sizes in general...

Regards,
m

-- 
http://bodrato.it/papers/

___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-07 Thread Torbjörn Granlund
bodr...@mail.dm.unipi.it writes:

  https://gmplib.org/repo/gmp/rev/87ba695c8878
  But I do not really like it. We alloc-copy-shift to add a dummy limb, then
  we call a code that allocs-copies-shifts to (virtually) add two more dummy
  limbs...
  
There is still a very noticeable difference between even and odd sizes:

 mpn_sqrtmpn_root.2   mpn_sqrtrem
4090  #1701239.982159255.232270148.03
4091  #1717724.102161403.542314530.39
4092  #1677835.052156978.062263015.48
4093  #1713743.032164690.062306523.04
4094  #1689689.172192785.722302479.92
4095  #1733267.802193850.692301651.07
4096  #1702328.082193558.182254053.40
4097  #1742066.352267900.342321820.79
4098  #1699858.762186774.592269899.77
4099  #1734269.702195714.072331587.32
4100  #1712964.012195057.782293181.09

  Now, for small sizes, computing the 4th root with mpn_rootrem is slower
  than calling mpn_sqrtrem twice. Maybe we should improve mpn_rootrem for
  small sizes in general...
  
Indeed.  We start with a single-bit approximation, then initially make
very little progress in the first steps (one bit for higher-order
roots).

Tabling start values is hard, but we should consider doing it for k 
some limit, and perhaps provide just 4 bits.

Another great speedup would be to use special code for the single-limb
iterating.

-- 
Torbjörn
Please encrypt, key id 0xC8601622
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-06 Thread Torbjörn Granlund
bodr...@mail.dm.unipi.it writes:

  4095  #2112866.692407388.572453779.52
  4096  #1849300.022377781.132416991.81
  4097  #2144198.042376456.382492709.77
  4098  #1853563.192379253.132469931.56
  
  ... yes, I know, we really need to improve also odd sizes...
  
Perhaps one could simply wrap the current function in code which appends
a zero limbs, calls the current code, and then right-shift the root by
half the limb bitcount?

(One would possibly need to suppress such code for operands below a
certain size, to avoid slowdowns.)

-- 
Torbjörn
Please encrypt, key id 0xC8601622
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-02 Thread bodrato
Ciao,

Il Mer, 24 Giugno 2015 10:25 pm, Niels Möller ha scritto:
 bodr...@mail.dm.unipi.it writes:
 If we need both the sqrt of the high-half and the residual, we should
 directly use sqrtrem, don't we?

 Maybe, but it's not obvious. For simplicity, consider the case of
 smallish numbers, so we use mullo rather than mulmod_bnm1 for the
 cancellation. Then H (high half of the square root) depends only on the
 high half of A, while the residual is computed from H and the low half
 of A. So I see no obvious gain in computing the residual sooner.

Not a gain in speed, probably, but we can avoid writing code twice.
If H=floor(sqrt(A/B^2n)), then the residual we need to compute the lower
half of the square root is floor(A/B^2n)-H^2 .
I do not know which algorithm is the faster to obtain both H and the
residual, but we should implement it just once, in sqrtrem.

 Once we compute a residual, correcting the
 approximation to obtain an exact result does not cost too much,
 right?

 Exact for the current A input, yes. But one level up in the recursion,
 where additional low limbs are used, it's no longer exact. Nothing is

Well, if H=floor(sqrt(floor(A/B^2n))), then H=floor(sqrt(A)/B^n).
I mean, if the square root of the high half of the number is correctly
rounded, it will coincide with the high half of the root, the additional
low libs can not change it.

 very obvious here, but my gut feeling is that exactness isn't a very
 useful property for the intermediate results based on truncated versions
 of the top-level input. And it's best to defer the adjustment to
 top-level.

Adjustment is a little bit more complex if the previous intermediate
result was not exact, but you are probably right: a unique top-level
adjustment might be the best strategy.
Actually I opted for a simpler reuse-already-working-code strategy :-)

Best regards,
m

-- 
http://bodrato.it/papers/

___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-02 Thread bodrato
Ciao,

Il Mer, 24 Giugno 2015 10:48 am, bodr...@mail.dm.unipi.it ha scritto:
 Il Mer, 10 Giugno 2015 9:54 pm, Niels Möller ha scritto:
 And the divisions really ought to use divappr, rather than exact
 truncating division.

 You are right, I did not try divappr yet. I'll do.

Using div_q (not computing remainder) instead of divrem gave a 10% speedup.
Using divappr instead of div_q we gain a barely measurable 1%.

Moreover we don't have an mpn_divappr_q interface, and different
implementations *_divappr_q give different approximations. I hope I
correctly understood the comments in the code ... I assumed the worst case
is a possible +4 error when using mu_divappr.

We gained another 1% (probably less) using tdiv_qr instead of divrem (in
the sqrtrem branch). This drastically reduced the test coverage for
divrem:
https://gmplib.org/devel/lcov/shell/var/tmp/lcov/gmp/mpn/divrem.c.gcov.html

About divrem, why do we allocate a copy for the reminder?
rp = TMP_ALLOC_LIMBS (dn);
mpn_tdiv_qr (q2p, rp, 0L, np, nn, dp, dn);
MPN_COPY (np, rp, dn);  /* overwrite np area with remainder */
We can directly ask tdiv to overwrite np, can't we?
mpn_tdiv_qr (q2p, np, 0L, np, nn, dp, dn);

Best regards,
m

-- 
http://bodrato.it/toom-cook/

___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-02 Thread bodrato
Ciao,

Il Mer, 24 Giugno 2015 9:53 am, Torbjörn Granlund ha scritto:
 bodr...@mail.dm.unipi.it writes:
   But the code I'm experimenting with is not ready yet, it supports only
   even sizes currently.

 That's another 10% speedup.

Yes... unluckily I did not have the time to adapt it to odd sizes, but I
decided to push it, so that anyone can look into it and comment or improve
it.

The speed of revision 16730:
$ tune/speed -cp1 -s4093-4098 mpn_sqrt mpn_root.2 mpn_sqrtrem
overhead 5.90 cycles, precision 1 units of 2.86e-10 secs, CPU freq
3500.08 MHz
 mpn_sqrtmpn_root.2   mpn_sqrtrem
4093  #2120836.002286161.232450569.02
4094  #2107173.022358206.402507010.17
4095  #2126777.632392313.422458527.34
4096  #2172714.172373645.182470476.63
4097  #2176120.282358870.392542117.51
4098  #2193557.392374669.592513122.24

The speed with the new code (current repo):
$ tune/speed -cp1 -s4093-4098 mpn_sqrt mpn_root.2 mpn_sqrtrem
overhead 5.86 cycles, precision 1 units of 2.86e-10 secs, CPU freq
3500.08 MHz
 mpn_sqrtmpn_root.2   mpn_sqrtrem
4093  #2103034.462307752.772442488.63
4094  #1831188.022367552.042469309.74
4095  #2112866.692407388.572453779.52
4096  #1849300.022377781.132416991.81
4097  #2144198.042376456.382492709.77
4098  #1853563.192379253.132469931.56

... yes, I know, we really need to improve also odd sizes...

 My guess is that a division-free iteration would give another 10%, and
 then using David's mulmid in that code would improve things by...10%.

Maybe...

Best regards,
m

-- 
http://bodrato.it/

___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-06-24 Thread Niels Möller
bodr...@mail.dm.unipi.it writes:

 Note that all the computations of the error/residual E is subject to a
 lot of cancellation, so one can use mullo (for small sizes) or
 wraparound arithmetic using mulmod_bnm1 for larger sizes.

 If we need both the sqrt of the high-half and the residual, we should
 directly use sqrtrem, don't we? 

Maybe, but it's not obvious. For simplicity, consider the case of
smallish numbers, so we use mullo rather than mulmod_bnm1 for the
cancellation. Then H (high half of the square root) depends only on the
high half of A, while the residual is computed from H and the low half
of A. So I see no obvious gain in computing the residual sooner.

 Once we compute a residual, correcting the
 approximation to obtain an exact result does not cost too much,
 right?

Exact for the current A input, yes. But one level up in the recursion,
where additional low limbs are used, it's no longer exact. Nothing is
very obvious here, but my gut feeling is that exactness isn't a very
useful property for the intermediate results based on truncated versions
of the top-level input. And it's best to defer the adjustment to
top-level.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel