Re: Anomaly in mpn_sqrtrem and mpn_rootrem
Marco Bodrato bodr...@mail.dm.unipi.it writes: Before the changes I just pushed, I simply reordered the steps in the loop to shorten the first and the last iteration in the loop... Resulting in even better performance, I presume? How much speed difference is there now, for k = 4 vs sqrt(sqrt())? mpn_sqrtmpn_root.4mpn_root.8 mpn_root.16 1 #33.86659.37229.62178.01 2 #112.47916.25789.81273.80 4 #245.35 1350.10 .97 1117.45 8 #419.01 1934.60 1683.93 1570.81 16 #1015.91 2611.44 2558.12 2472.39 32 #1666.53 3837.72 3969.28 4031.27 64 #3305.67 6295.23 6160.25 6654.23 Is the difference small enough that we could fix it by running the first few iterations using plain limb arithmetic? I fear it is not. That is indeed evident from that data. E.g., the delta between sizes 2 and 4 is more than twice greater for root than for sqrt. -- Torbjörn Please encrypt, key id 0xC8601622 ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Anomaly in mpn_sqrtrem and mpn_rootrem
Ciao, On Tue, August 18, 2015 10:51 am, Torbjörn Granlund wrote: I miss some rootrem logs from ChangeLog. Before the changes I just pushed, I simply reordered the steps in the loop to shorten the first and the last iteration in the loop... How much speed difference is there now, for k = 4 vs sqrt(sqrt())? mpn_sqrtmpn_root.4mpn_root.8 mpn_root.16 1 #33.86659.37229.62178.01 2 #112.47916.25789.81273.80 4 #245.35 1350.10 .97 1117.45 8 #419.01 1934.60 1683.93 1570.81 16 #1015.91 2611.44 2558.12 2472.39 32 #1666.53 3837.72 3969.28 4031.27 64 #3305.67 6295.23 6160.25 6654.23 Is the difference small enough that we could fix it by running the first few iterations using plain limb arithmetic? I fear it is not. -- http://bodrato.it/ ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Anomaly in mpn_sqrtrem and mpn_rootrem
Ciao, On Tue, July 14, 2015 11:48 am, bodr...@mail.dm.unipi.it wrote: estimate. The trick I suggest consists in adding some fractional digits of log2(n), by table lookup, then use log2(n)/k to estimate a few of the highest digits of n^(1/k), not just one. With two 256-chars tables(too much?), it is possible to estimate 9 bits (seldom the error is +2, previous code always used a single correction). Before we started refining mpn_rootrem (changeset 16672:c86f4fc0aafe) the timing (on shell.gmplib) was: $tune/speed -cs 1-1024 -f3 mpn_root.3 mpn_root.32 mpn_root.332 mpn_root.3332 overhead 5.84 cycles, precision 1 units of 2.86e-10 secs, CPU freq 3500.08 MHz mpn_root.3 mpn_root.32 mpn_root.332 mpn_root.3332 1 2120.44424.16 84.71#84.70 3 2537.38 2255.13 86.57#86.53 9 3556.60 4661.17687.10#99.87 274779.44 6667.10 7156.09 #151.71 818768.02 13230.91 31229.99 #6717.58 243 40720.39 #27991.11 72860.77 81112.17 729 135971.53#112360.63 240474.621060633.50 The current repo gives: mpn_root.3 mpn_root.32 mpn_root.332 mpn_root.3332 1 1985.55290.12#27.20 27.21 3 2373.10 2015.57 27.21#27.21 9 3359.58 4401.62411.39#27.21 274626.43 6317.53 6857.23#27.21 818583.51 12816.22 30837.67 #5783.87 243 39055.19 #27236.95 71936.69 78244.04 729 134456.72#111492.59 235518.651054410.50 The code I'm experimenting with, reduces timings for small results: mpn_root.3 mpn_root.32 mpn_root.332 mpn_root.3332 1 634.94193.03#28.22 28.25 3 1289.25200.82#28.17 28.18 9 1865.79 1341.31384.23#28.24 273411.88 3257.38816.07#28.24 817518.93 9572.27 18564.51 #1991.75 243 37723.17 24465.42 61376.28 #20745.56 729 133654.08#108648.30 226299.89 768142.04 I'm still testing it, but I think I'll push it soon... Now, for small sizes, computing the 4th root with mpn_root is slower than calling mpn_sqrt twice. Even with the new code, this is true on a wide range (up to 1000 limbs). Should we write special code for k=4,8,16 that repeatedly calls mpn_sqrt? Regards, m -- http://bodrato.it/ ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Anomaly in mpn_sqrtrem and mpn_rootrem
bodr...@mail.dm.unipi.it writes: But I spotted % k and / k there, and those are very expensive, unless you table inverses of k, of course... There is already a single (there are two currently, but we can avoid one) division by k in the code, it is used to compute the first single-bit estimate. The trick I suggest consists in adding some fractional digits of log2(n), by table lookup, then use log2(n)/k to estimate a few of the highest digits of n^(1/k), not just one. I still have to analyse the maximal error, but I believe we can use the trick to skip some iterations. I agree that your code is an improvement. For all of GMP, I think we should consider alternatives to explicit division (and modulus). If the numerator is limited, we can always compute an accurate quotient with a multiply, provided we have a reasonable divisor inverse. My division paranoia might not be motivated for all chips, though. 32-bit chips tend to have fast enough division hardware,, and recent 64-bit x86 CPUs have fast 64/64 division (but slow 128/64 division). We cannot then beat the hardware with Newton. Why bother? Our sqrt(rem) and root(rem) will sometimes be used for just a few limbs, and then division as part of the bookkeeping will take a very large fraction of the time. -- Torbjörn Please encrypt, key id 0xC8601622 ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Anomaly in mpn_sqrtrem and mpn_rootrem
Ciao, Il Mer, 8 Luglio 2015 4:20 pm, Torbjörn Granlund ha scritto: bodr...@mail.dm.unipi.it writes: The code above should give a range for the first 5 bits (the first is 1 anyway), and it should work for any k. But I spotted % k and / k there, and those are very expensive, unless you table inverses of k, of course... There is already a single (there are two currently, but we can avoid one) division by k in the code, it is used to compute the first single-bit estimate. The trick I suggest consists in adding some fractional digits of log2(n), by table lookup, then use log2(n)/k to estimate a few of the highest digits of n^(1/k), not just one. I still have to analyse the maximal error, but I believe we can use the trick to skip some iterations. Best regards, m -- http://bodrato.it/papers/ ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Anomaly in mpn_sqrtrem and mpn_rootrem
Surely the most important k is 3 (assuming 2 is always handled by the sqrt code). Elaborate? Are you thinking of any particular algorithm using cube roots, or just the general observation that smaller numbers tend to be more common? The latter; I think u^(1/3) will tend to be more often computed than u^(1/4711). -- Torbjörn Please encrypt, key id 0xC8601622 ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Anomaly in mpn_sqrtrem and mpn_rootrem
Ciao, Il Mar, 7 Luglio 2015 11:12 am, Torbjörn Granlund ha scritto: bodr...@mail.dm.unipi.it writes: Maybe we should improve mpn_rootrem for small sizes in general... Tabling start values is hard, but we should consider doing it for k some limit, and perhaps provide just 4 bits. We can use something based on the logarithm. The code above should give a range for the first 5 bits (the first is 1 anyway), and it should work for any k. void root (mp_limb_t *r, mp_limb_t n, unsigned k) { /* vector(32,i,floor((log(31+i)/log(2)-5)*256)) */ unsigned char v[] = {0, 11, 22, 33, 43, 53, 63, 73, 82, 91, 100, 109, 117, 125, 134, 141, 149, 157, 164, 172, 179, 186, 193, 200, 206, 213, 219, 225, 232, 238, 244, 250, 255}; unsigned c; mp_limb_t low, hi; count_leading_zeros (c, n); c -= GMP_NAIL_BITS; n = c; n = GMP_NUMB_BITS - 6; n -= 32; /* Now 0 = n 32 contains the higest bits of the original n, with the first 1 removed. */ c = GMP_NUMB_BITS - c-1; /* c is the number of bits of the original number. */ low = v[n] + (c%k) * 256; hi = v[n+1] + (c%k) * 256; low = low / k; hi = hi / k; for (c=32; low v[--c];) ; r[0] = c+32; for (;hi v[c]; ++c) ; r[1] = c+32; } Regards, m -- http://bodrato.it/papers/ ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Anomaly in mpn_sqrtrem and mpn_rootrem
Ciao, Il Mer, 8 Luglio 2015 8:31 am, Niels Möller ha scritto: bodr...@mail.dm.unipi.it writes: If H=floor(sqrt(A/B^2n)), then the residual we need to compute the lower half of the square root is floor(A/B^2n)-H^2 . I don't follow you here. Consider a simple case: A of size 2n, and we try to compute an approximation of sqrt(B^{2n-2} A). Then first compute high half as H \approx sqrt(B^{n-2} floor (B^{-n} A)) And the residual is E = H^2 - A (appr. n limbs, due to cancellation) E is approximatively 3n/2 limbs. An example: B=10 A=9876 (4 digits) To compute the high half, we consider A'=98 (2 digits), we compute H'=sqrt(98)=9 (1 digit). (A'=H'^2+17, 17 = 2*H', this result is exact, the first digit will not be changed by any following step...) Then H=90 (2 digits) E=A - H^2 = 9876-90^2 = 1776 (slightly more than 3 digits...) But we do not need all of E, we can simply use E' = (A'-H'^2)*B + second_digit_of(A) = (98-9^2)*10 +... = 177 We computed (98-9^2), the remainder of the high half... Then 177/9/2 = 9 is the second digit. ... but maybe I did not understand the algorithm you proposed. Best regards, m -- http://bodrato.it/ ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Anomaly in mpn_sqrtrem and mpn_rootrem
bodr...@mail.dm.unipi.it writes: The code above should give a range for the first 5 bits (the first is 1 anyway), and it should work for any k. I have't looked at the maths, so I am not yet impressed... But I spotted % k and / k there, and those are very expensive, unless you table inverses of k, of course... -- Torbjörn Please encrypt, key id 0xC8601622 ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Anomaly in mpn_sqrtrem and mpn_rootrem
bodr...@mail.dm.unipi.it writes: An example: B=10 A=9876 (4 digits) I'm thinking of a slightly different arrangement. At the top-level sqrtrem, since we have 4 digits, we have to use the top 3 to get an accurate square root approximation. So we'll call a function to compute sqrt(987 * B^1). With three digits input and 2 digits output, so this is the function call I've denoted by sqrt_nm1(3, 987), for lack of a better name. Next, sqrt_nm1 will split off the top two digits, and recursively compute H = sqrt_nm1(2, 98) This is handled as the base base, returning floor (sqrt(98)) = 9. So we get H = 9. Since we had an odd number of digits (3), the residual is E = 987 - B H^2 = 987 - 810 = 177. We then return X = B H + floor (E / 2H) = 90 + floor(177 / 18) = 99 So this is the value returned to the top-level sqrtrem, sqrt_nm1(3, 987) --- 99 Finally, the top-level may need an adjustment taking the final digit into account. In this example, no adjustment is needed. One case where it is needed is if the top-level input happens to be a square, and with a non-zero least significant digit. E.g, 8836, where sqrt_nm1(3, 883) returns 93, one off from the correct square root 94. ... but maybe I did not understand the algorithm you proposed. I hope I've made it a little bit clearer. (I'm actually working on an implementation, so we can play more with it). Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Anomaly in mpn_sqrtrem and mpn_rootrem
ni...@lysator.liu.se (Niels Möller) writes: There are divisions also in the newton iteration, right? Then a single division in the computation of the initial value is likely a win, if it saves two or more iterations in the loop. There are GMP division operations, but these are not likely going to hit any hardware division. Precomputing an inverse of k makes sense, if there's more than one division by k. Yes. I think we should consider using special starting approximations for important k, and then perhaps this general 5-bit code for the remaining cases. Surely the most important k is 3 (assuming 2 is always handled by the sqrt code). We could provide a 8-bit table here which just performs masking and indexing. Some other small k values might deserve such tables too. -- Torbjörn Please encrypt, key id 0xC8601622 ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Anomaly in mpn_sqrtrem and mpn_rootrem
t...@gmplib.org (Torbjörn Granlund) writes: But I spotted % k and / k there, and those are very expensive, unless you table inverses of k, of course... There are divisions also in the newton iteration, right? Then a single division in the computation of the initial value is likely a win, if it saves two or more iterations in the loop. Precomputing an inverse of k makes sense, if there's more than one division by k. (I also haven't tried to understand the fine details of the algorithm). Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Anomaly in mpn_sqrtrem and mpn_rootrem
bodr...@mail.dm.unipi.it writes: Not a gain in speed, probably, but we can avoid writing code twice. Using some variant of sqrtrem is one alternative for the organization. If H=floor(sqrt(A/B^2n)), then the residual we need to compute the lower half of the square root is floor(A/B^2n)-H^2 . I don't follow you here. Consider a simple case: A of size 2n, and we try to compute an approximation of sqrt(B^{2n-2} A). Then first compute high half as H \approx sqrt(B^{n-2} floor (B^{-n} A)) And the residual is E = H^2 - A (appr. n limbs, due to cancellation) So the pieces of A needed for the two computations have very little overlap, I haven't done the error analysis, but I expect at most one limb, i.e., E is n+1 limbs, possibly signed. For other odd/even cases, we also need residuals of the form E = B H^2 - A and E = H^2 - B A Well, if H=floor(sqrt(floor(A/B^2n))), then H=floor(sqrt(A)/B^n). I mean, if the square root of the high half of the number is correctly rounded, it will coincide with the high half of the root, the additional low libs can not change it. Hmmm, that's related to sqrt being concave? Computations involving floor are a bit non-obvious to me. In my notation above, an exact H would be H = floor (sqrt (floor (A/B^2))) = floor (sqrt(A) / B) Say we have such an exact H, what kind of rounding and adjustments do we need after the adjustment step X -- B^{n-1} H - B^{n-1} E / 2H; some rounding for the division If it's possible to get a correct X = floor (sqrt (B^{n-2} A)) without actually computing X^2, that makes an exact algorithm more attractive. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Anomaly in mpn_sqrtrem and mpn_rootrem
Ciao, Il Lun, 6 Luglio 2015 11:28 pm, Torbjörn Granlund ha scritto: bodr...@mail.dm.unipi.it writes: ... yes, I know, we really need to improve also odd sizes... Perhaps one could simply wrap the current function in code which appends a zero limbs, calls the current code, and then right-shift the root by half the limb bitcount? This is a description of the patch I pushed yesterday: https://gmplib.org/repo/gmp/rev/87ba695c8878 But I do not really like it. We alloc-copy-shift to add a dummy limb, then we call a code that allocs-copies-shifts to (virtually) add two more dummy limbs... Anyway, in general the code for sqrt (no-remainder) is faster now than it was before. With code from revision 16716 (3 weeks ago) we got time tests/devel/try mpn_sqrt [...] user0m25.299s Now we get: time tests/devel/try mpn_sqrt [...] user0m23.300s (One would possibly need to suppress such code for operands below a certain size, to avoid slowdowns.) There are some slowdown only for sizes 15,16,31,32, i.e. when the size is near a small power of 2. Now, for small sizes, computing the 4th root with mpn_rootrem is slower than calling mpn_sqrtrem twice. Maybe we should improve mpn_rootrem for small sizes in general... Regards, m -- http://bodrato.it/papers/ ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Anomaly in mpn_sqrtrem and mpn_rootrem
bodr...@mail.dm.unipi.it writes: https://gmplib.org/repo/gmp/rev/87ba695c8878 But I do not really like it. We alloc-copy-shift to add a dummy limb, then we call a code that allocs-copies-shifts to (virtually) add two more dummy limbs... There is still a very noticeable difference between even and odd sizes: mpn_sqrtmpn_root.2 mpn_sqrtrem 4090 #1701239.982159255.232270148.03 4091 #1717724.102161403.542314530.39 4092 #1677835.052156978.062263015.48 4093 #1713743.032164690.062306523.04 4094 #1689689.172192785.722302479.92 4095 #1733267.802193850.692301651.07 4096 #1702328.082193558.182254053.40 4097 #1742066.352267900.342321820.79 4098 #1699858.762186774.592269899.77 4099 #1734269.702195714.072331587.32 4100 #1712964.012195057.782293181.09 Now, for small sizes, computing the 4th root with mpn_rootrem is slower than calling mpn_sqrtrem twice. Maybe we should improve mpn_rootrem for small sizes in general... Indeed. We start with a single-bit approximation, then initially make very little progress in the first steps (one bit for higher-order roots). Tabling start values is hard, but we should consider doing it for k some limit, and perhaps provide just 4 bits. Another great speedup would be to use special code for the single-limb iterating. -- Torbjörn Please encrypt, key id 0xC8601622 ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Anomaly in mpn_sqrtrem and mpn_rootrem
bodr...@mail.dm.unipi.it writes: 4095 #2112866.692407388.572453779.52 4096 #1849300.022377781.132416991.81 4097 #2144198.042376456.382492709.77 4098 #1853563.192379253.132469931.56 ... yes, I know, we really need to improve also odd sizes... Perhaps one could simply wrap the current function in code which appends a zero limbs, calls the current code, and then right-shift the root by half the limb bitcount? (One would possibly need to suppress such code for operands below a certain size, to avoid slowdowns.) -- Torbjörn Please encrypt, key id 0xC8601622 ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Anomaly in mpn_sqrtrem and mpn_rootrem
Ciao, Il Mer, 24 Giugno 2015 10:25 pm, Niels Möller ha scritto: bodr...@mail.dm.unipi.it writes: If we need both the sqrt of the high-half and the residual, we should directly use sqrtrem, don't we? Maybe, but it's not obvious. For simplicity, consider the case of smallish numbers, so we use mullo rather than mulmod_bnm1 for the cancellation. Then H (high half of the square root) depends only on the high half of A, while the residual is computed from H and the low half of A. So I see no obvious gain in computing the residual sooner. Not a gain in speed, probably, but we can avoid writing code twice. If H=floor(sqrt(A/B^2n)), then the residual we need to compute the lower half of the square root is floor(A/B^2n)-H^2 . I do not know which algorithm is the faster to obtain both H and the residual, but we should implement it just once, in sqrtrem. Once we compute a residual, correcting the approximation to obtain an exact result does not cost too much, right? Exact for the current A input, yes. But one level up in the recursion, where additional low limbs are used, it's no longer exact. Nothing is Well, if H=floor(sqrt(floor(A/B^2n))), then H=floor(sqrt(A)/B^n). I mean, if the square root of the high half of the number is correctly rounded, it will coincide with the high half of the root, the additional low libs can not change it. very obvious here, but my gut feeling is that exactness isn't a very useful property for the intermediate results based on truncated versions of the top-level input. And it's best to defer the adjustment to top-level. Adjustment is a little bit more complex if the previous intermediate result was not exact, but you are probably right: a unique top-level adjustment might be the best strategy. Actually I opted for a simpler reuse-already-working-code strategy :-) Best regards, m -- http://bodrato.it/papers/ ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Anomaly in mpn_sqrtrem and mpn_rootrem
Ciao, Il Mer, 24 Giugno 2015 10:48 am, bodr...@mail.dm.unipi.it ha scritto: Il Mer, 10 Giugno 2015 9:54 pm, Niels Möller ha scritto: And the divisions really ought to use divappr, rather than exact truncating division. You are right, I did not try divappr yet. I'll do. Using div_q (not computing remainder) instead of divrem gave a 10% speedup. Using divappr instead of div_q we gain a barely measurable 1%. Moreover we don't have an mpn_divappr_q interface, and different implementations *_divappr_q give different approximations. I hope I correctly understood the comments in the code ... I assumed the worst case is a possible +4 error when using mu_divappr. We gained another 1% (probably less) using tdiv_qr instead of divrem (in the sqrtrem branch). This drastically reduced the test coverage for divrem: https://gmplib.org/devel/lcov/shell/var/tmp/lcov/gmp/mpn/divrem.c.gcov.html About divrem, why do we allocate a copy for the reminder? rp = TMP_ALLOC_LIMBS (dn); mpn_tdiv_qr (q2p, rp, 0L, np, nn, dp, dn); MPN_COPY (np, rp, dn); /* overwrite np area with remainder */ We can directly ask tdiv to overwrite np, can't we? mpn_tdiv_qr (q2p, np, 0L, np, nn, dp, dn); Best regards, m -- http://bodrato.it/toom-cook/ ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Anomaly in mpn_sqrtrem and mpn_rootrem
Ciao, Il Mer, 24 Giugno 2015 9:53 am, Torbjörn Granlund ha scritto: bodr...@mail.dm.unipi.it writes: But the code I'm experimenting with is not ready yet, it supports only even sizes currently. That's another 10% speedup. Yes... unluckily I did not have the time to adapt it to odd sizes, but I decided to push it, so that anyone can look into it and comment or improve it. The speed of revision 16730: $ tune/speed -cp1 -s4093-4098 mpn_sqrt mpn_root.2 mpn_sqrtrem overhead 5.90 cycles, precision 1 units of 2.86e-10 secs, CPU freq 3500.08 MHz mpn_sqrtmpn_root.2 mpn_sqrtrem 4093 #2120836.002286161.232450569.02 4094 #2107173.022358206.402507010.17 4095 #2126777.632392313.422458527.34 4096 #2172714.172373645.182470476.63 4097 #2176120.282358870.392542117.51 4098 #2193557.392374669.592513122.24 The speed with the new code (current repo): $ tune/speed -cp1 -s4093-4098 mpn_sqrt mpn_root.2 mpn_sqrtrem overhead 5.86 cycles, precision 1 units of 2.86e-10 secs, CPU freq 3500.08 MHz mpn_sqrtmpn_root.2 mpn_sqrtrem 4093 #2103034.462307752.772442488.63 4094 #1831188.022367552.042469309.74 4095 #2112866.692407388.572453779.52 4096 #1849300.022377781.132416991.81 4097 #2144198.042376456.382492709.77 4098 #1853563.192379253.132469931.56 ... yes, I know, we really need to improve also odd sizes... My guess is that a division-free iteration would give another 10%, and then using David's mulmid in that code would improve things by...10%. Maybe... Best regards, m -- http://bodrato.it/ ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel
Re: Anomaly in mpn_sqrtrem and mpn_rootrem
bodr...@mail.dm.unipi.it writes: Note that all the computations of the error/residual E is subject to a lot of cancellation, so one can use mullo (for small sizes) or wraparound arithmetic using mulmod_bnm1 for larger sizes. If we need both the sqrt of the high-half and the residual, we should directly use sqrtrem, don't we? Maybe, but it's not obvious. For simplicity, consider the case of smallish numbers, so we use mullo rather than mulmod_bnm1 for the cancellation. Then H (high half of the square root) depends only on the high half of A, while the residual is computed from H and the low half of A. So I see no obvious gain in computing the residual sooner. Once we compute a residual, correcting the approximation to obtain an exact result does not cost too much, right? Exact for the current A input, yes. But one level up in the recursion, where additional low limbs are used, it's no longer exact. Nothing is very obvious here, but my gut feeling is that exactness isn't a very useful property for the intermediate results based on truncated versions of the top-level input. And it's best to defer the adjustment to top-level. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel