Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-08-19 Thread Torbjörn Granlund
Marco Bodrato bodr...@mail.dm.unipi.it writes: Before the changes I just pushed, I simply reordered the steps in the loop to shorten the first and the last iteration in the loop... Resulting in even better performance, I presume? How much speed difference is there now, for k = 4 vs

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-08-18 Thread Marco Bodrato
Ciao, On Tue, August 18, 2015 10:51 am, Torbjörn Granlund wrote: I miss some rootrem logs from ChangeLog. Before the changes I just pushed, I simply reordered the steps in the loop to shorten the first and the last iteration in the loop... How much speed difference is there now, for k = 4 vs

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-08-17 Thread Marco Bodrato
Ciao, On Tue, July 14, 2015 11:48 am, bodr...@mail.dm.unipi.it wrote: estimate. The trick I suggest consists in adding some fractional digits of log2(n), by table lookup, then use log2(n)/k to estimate a few of the highest digits of n^(1/k), not just one. With two 256-chars tables(too

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-15 Thread Torbjörn Granlund
bodr...@mail.dm.unipi.it writes: But I spotted % k and / k there, and those are very expensive, unless you table inverses of k, of course... There is already a single (there are two currently, but we can avoid one) division by k in the code, it is used to compute the first single-bit

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-14 Thread bodrato
Ciao, Il Mer, 8 Luglio 2015 4:20 pm, Torbjörn Granlund ha scritto: bodr...@mail.dm.unipi.it writes: The code above should give a range for the first 5 bits (the first is 1 anyway), and it should work for any k. But I spotted % k and / k there, and those are very expensive, unless you

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-09 Thread Torbjörn Granlund
Surely the most important k is 3 (assuming 2 is always handled by the sqrt code). Elaborate? Are you thinking of any particular algorithm using cube roots, or just the general observation that smaller numbers tend to be more common? The latter; I think u^(1/3) will tend to be

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-08 Thread bodrato
Ciao, Il Mar, 7 Luglio 2015 11:12 am, Torbjörn Granlund ha scritto: bodr...@mail.dm.unipi.it writes: Maybe we should improve mpn_rootrem for small sizes in general... Tabling start values is hard, but we should consider doing it for k some limit, and perhaps provide just 4 bits. We

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-08 Thread bodrato
Ciao, Il Mer, 8 Luglio 2015 8:31 am, Niels Möller ha scritto: bodr...@mail.dm.unipi.it writes: If H=floor(sqrt(A/B^2n)), then the residual we need to compute the lower half of the square root is floor(A/B^2n)-H^2 . I don't follow you here. Consider a simple case: A of size 2n, and we try

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-08 Thread Torbjörn Granlund
bodr...@mail.dm.unipi.it writes: The code above should give a range for the first 5 bits (the first is 1 anyway), and it should work for any k. I have't looked at the maths, so I am not yet impressed... But I spotted % k and / k there, and those are very expensive, unless you table

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-08 Thread Niels Möller
bodr...@mail.dm.unipi.it writes: An example: B=10 A=9876 (4 digits) I'm thinking of a slightly different arrangement. At the top-level sqrtrem, since we have 4 digits, we have to use the top 3 to get an accurate square root approximation. So we'll call a function to compute sqrt(987 * B^1).

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-08 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: There are divisions also in the newton iteration, right? Then a single division in the computation of the initial value is likely a win, if it saves two or more iterations in the loop. There are GMP division operations, but these are not likely

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-08 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes: But I spotted % k and / k there, and those are very expensive, unless you table inverses of k, of course... There are divisions also in the newton iteration, right? Then a single division in the computation of the initial value is likely a win, if it

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-08 Thread Niels Möller
bodr...@mail.dm.unipi.it writes: Not a gain in speed, probably, but we can avoid writing code twice. Using some variant of sqrtrem is one alternative for the organization. If H=floor(sqrt(A/B^2n)), then the residual we need to compute the lower half of the square root is floor(A/B^2n)-H^2 .

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-07 Thread bodrato
Ciao, Il Lun, 6 Luglio 2015 11:28 pm, Torbjörn Granlund ha scritto: bodr...@mail.dm.unipi.it writes: ... yes, I know, we really need to improve also odd sizes... Perhaps one could simply wrap the current function in code which appends a zero limbs, calls the current code, and then

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-07 Thread Torbjörn Granlund
bodr...@mail.dm.unipi.it writes: https://gmplib.org/repo/gmp/rev/87ba695c8878 But I do not really like it. We alloc-copy-shift to add a dummy limb, then we call a code that allocs-copies-shifts to (virtually) add two more dummy limbs... There is still a very noticeable difference

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-06 Thread Torbjörn Granlund
bodr...@mail.dm.unipi.it writes: 4095 #2112866.692407388.572453779.52 4096 #1849300.022377781.132416991.81 4097 #2144198.042376456.382492709.77 4098 #1853563.192379253.132469931.56 ... yes, I know, we really need to improve also odd

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-02 Thread bodrato
Ciao, Il Mer, 24 Giugno 2015 10:25 pm, Niels Möller ha scritto: bodr...@mail.dm.unipi.it writes: If we need both the sqrt of the high-half and the residual, we should directly use sqrtrem, don't we? Maybe, but it's not obvious. For simplicity, consider the case of smallish numbers, so we

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-02 Thread bodrato
Ciao, Il Mer, 24 Giugno 2015 10:48 am, bodr...@mail.dm.unipi.it ha scritto: Il Mer, 10 Giugno 2015 9:54 pm, Niels Möller ha scritto: And the divisions really ought to use divappr, rather than exact truncating division. You are right, I did not try divappr yet. I'll do. Using div_q (not

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-07-02 Thread bodrato
Ciao, Il Mer, 24 Giugno 2015 9:53 am, Torbjörn Granlund ha scritto: bodr...@mail.dm.unipi.it writes: But the code I'm experimenting with is not ready yet, it supports only even sizes currently. That's another 10% speedup. Yes... unluckily I did not have the time to adapt it to odd

Re: Anomaly in mpn_sqrtrem and mpn_rootrem

2015-06-24 Thread Niels Möller
bodr...@mail.dm.unipi.it writes: Note that all the computations of the error/residual E is subject to a lot of cancellation, so one can use mullo (for small sizes) or wraparound arithmetic using mulmod_bnm1 for larger sizes. If we need both the sqrt of the high-half and the residual, we