Marco Bodrato bodr...@mail.dm.unipi.it writes:
Before the changes I just pushed, I simply reordered the steps in the loop
to shorten the first and the last iteration in the loop...
Resulting in even better performance, I presume?
How much speed difference is there now, for k = 4 vs
Ciao,
On Tue, August 18, 2015 10:51 am, Torbjörn Granlund wrote:
I miss some rootrem logs from ChangeLog.
Before the changes I just pushed, I simply reordered the steps in the loop
to shorten the first and the last iteration in the loop...
How much speed difference is there now, for k = 4 vs
Ciao,
On Tue, July 14, 2015 11:48 am, bodr...@mail.dm.unipi.it wrote:
estimate. The trick I suggest consists in adding some fractional digits of
log2(n), by table lookup, then use log2(n)/k to estimate a few of the
highest digits of n^(1/k), not just one.
With two 256-chars tables(too
bodr...@mail.dm.unipi.it writes:
But I spotted % k and / k there, and those are very expensive,
unless you table inverses of k, of course...
There is already a single (there are two currently, but we can avoid one)
division by k in the code, it is used to compute the first single-bit
Ciao,
Il Mer, 8 Luglio 2015 4:20 pm, Torbjörn Granlund ha scritto:
bodr...@mail.dm.unipi.it writes:
The code above should give a range for the first 5 bits (the first is 1
anyway), and it should work for any k.
But I spotted % k and / k there, and those are very expensive,
unless you
Surely the most important k is 3 (assuming 2 is always handled by the
sqrt code).
Elaborate? Are you thinking of any particular algorithm using cube
roots, or just the general observation that smaller numbers tend to be
more common?
The latter; I think u^(1/3) will tend to be
Ciao,
Il Mar, 7 Luglio 2015 11:12 am, Torbjörn Granlund ha scritto:
bodr...@mail.dm.unipi.it writes:
Maybe we should improve mpn_rootrem for
small sizes in general...
Tabling start values is hard, but we should consider doing it for k
some limit, and perhaps provide just 4 bits.
We
Ciao,
Il Mer, 8 Luglio 2015 8:31 am, Niels Möller ha scritto:
bodr...@mail.dm.unipi.it writes:
If H=floor(sqrt(A/B^2n)), then the residual we need to compute the lower
half of the square root is floor(A/B^2n)-H^2 .
I don't follow you here. Consider a simple case: A of size 2n, and we
try
bodr...@mail.dm.unipi.it writes:
The code above should give a range for the first 5 bits (the first is 1
anyway), and it should work for any k.
I have't looked at the maths, so I am not yet impressed...
But I spotted % k and / k there, and those are very expensive,
unless you table
bodr...@mail.dm.unipi.it writes:
An example:
B=10
A=9876 (4 digits)
I'm thinking of a slightly different arrangement.
At the top-level sqrtrem, since we have 4 digits, we have to use the top
3 to get an accurate square root approximation. So we'll call a function
to compute sqrt(987 * B^1).
ni...@lysator.liu.se (Niels Möller) writes:
There are divisions also in the newton iteration, right? Then a single
division in the computation of the initial value is likely a win, if it
saves two or more iterations in the loop.
There are GMP division operations, but these are not likely
t...@gmplib.org (Torbjörn Granlund) writes:
But I spotted % k and / k there, and those are very expensive,
unless you table inverses of k, of course...
There are divisions also in the newton iteration, right? Then a single
division in the computation of the initial value is likely a win, if it
bodr...@mail.dm.unipi.it writes:
Not a gain in speed, probably, but we can avoid writing code twice.
Using some variant of sqrtrem is one alternative for the organization.
If H=floor(sqrt(A/B^2n)), then the residual we need to compute the lower
half of the square root is floor(A/B^2n)-H^2 .
Ciao,
Il Lun, 6 Luglio 2015 11:28 pm, Torbjörn Granlund ha scritto:
bodr...@mail.dm.unipi.it writes:
... yes, I know, we really need to improve also odd sizes...
Perhaps one could simply wrap the current function in code which appends
a zero limbs, calls the current code, and then
bodr...@mail.dm.unipi.it writes:
https://gmplib.org/repo/gmp/rev/87ba695c8878
But I do not really like it. We alloc-copy-shift to add a dummy limb, then
we call a code that allocs-copies-shifts to (virtually) add two more dummy
limbs...
There is still a very noticeable difference
bodr...@mail.dm.unipi.it writes:
4095 #2112866.692407388.572453779.52
4096 #1849300.022377781.132416991.81
4097 #2144198.042376456.382492709.77
4098 #1853563.192379253.132469931.56
... yes, I know, we really need to improve also odd
Ciao,
Il Mer, 24 Giugno 2015 10:25 pm, Niels Möller ha scritto:
bodr...@mail.dm.unipi.it writes:
If we need both the sqrt of the high-half and the residual, we should
directly use sqrtrem, don't we?
Maybe, but it's not obvious. For simplicity, consider the case of
smallish numbers, so we
Ciao,
Il Mer, 24 Giugno 2015 10:48 am, bodr...@mail.dm.unipi.it ha scritto:
Il Mer, 10 Giugno 2015 9:54 pm, Niels Möller ha scritto:
And the divisions really ought to use divappr, rather than exact
truncating division.
You are right, I did not try divappr yet. I'll do.
Using div_q (not
Ciao,
Il Mer, 24 Giugno 2015 9:53 am, Torbjörn Granlund ha scritto:
bodr...@mail.dm.unipi.it writes:
But the code I'm experimenting with is not ready yet, it supports only
even sizes currently.
That's another 10% speedup.
Yes... unluckily I did not have the time to adapt it to odd
bodr...@mail.dm.unipi.it writes:
Note that all the computations of the error/residual E is subject to a
lot of cancellation, so one can use mullo (for small sizes) or
wraparound arithmetic using mulmod_bnm1 for larger sizes.
If we need both the sqrt of the high-half and the residual, we
20 matches
Mail list logo