t...@gmplib.org (Torbjörn Granlund) writes:
> ni...@lysator.liu.se (Niels Möller) writes:
>
> I've made a quick try deleting it from the single-limb loop. See patch
> below. Measurements are a bit noisy, but it looks like a slowdown when I
> time it. With hgcd2 time increasing from 1220
ni...@lysator.liu.se (Niels Möller) writes:
Unit tests would be nice. I think tests/mpz/t-gcd.c does exercise
the large quotient cases.
I cobbled together this:
test-div2.c
BTW, I wonder if it makes sense with HGCD2_DIV2_METHOD == 3 similar to
t...@gmplib.org (Torbjörn Granlund) writes:
> I feel we've achieved much of the possible speedup for gcd now.
How much speedup have we achieved?
> But what more can we do before we are completely done for now?
> Let me try to list it:
>
>
> * Add entry points for gcd_11 allowing even
t...@gmplib.org (Torbjörn Granlund) writes:
> What are the specs for div2?
>
> Surely n1 > 0 and d1 > 0.
All variants need d1 > 0, method 2 also needs n1 > 0 (for
count_leading_zeros).
> Also N >= D?
Method 2 needs clz(n1) <= clz(d1). Besides that, I think they can handle
N < D, i.e., q == 0.
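To make the conditions above concrete, here is a minimal sketch of the kind of unit test being discussed. It is not GMP's div2: div2_sketch and check_div2 are hypothetical names, and the sketch assumes 32-bit "limbs" so that a two-limb value fits in a uint64_t and plain C division can serve as the reference. It requires only d1 > 0, accepts N < D (returning q == 0), and checks that the remainder satisfies R < D.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for div2, NOT the GMP code.  Limbs are 32 bits here,
   so N = (n1, n0) and D = (d1, d0) each fit in a uint64_t.
   Returns q = floor(N / D) and stores the remainder in r[0] (low limb) and
   r[1] (high limb).  Requires d1 > 0; N < D is allowed and gives q == 0. */
static uint32_t
div2_sketch (uint32_t r[2], uint32_t n1, uint32_t n0, uint32_t d1, uint32_t d0)
{
  uint64_t n = ((uint64_t) n1 << 32) | n0;
  uint64_t d = ((uint64_t) d1 << 32) | d0;
  uint64_t q = 0;
  int shift = 0;

  assert (d1 > 0);

  /* Shift d left while 2*d <= n; the condition guarantees no overflow. */
  while (d <= (n >> 1))
    {
      d <<= 1;
      shift++;
    }

  /* Plain shift-and-subtract division, one quotient bit per iteration. */
  for (;;)
    {
      q <<= 1;
      if (n >= d)
        {
          n -= d;
          q |= 1;
        }
      if (shift == 0)
        break;
      d >>= 1;
      shift--;
    }

  r[0] = (uint32_t) n;
  r[1] = (uint32_t) (n >> 32);
  /* d1 > 0 means D >= 2^32, so the quotient fits in one limb. */
  return (uint32_t) q;
}

/* Compare against the compiler's 64-bit division; returns 1 on success. */
static int
check_div2 (uint32_t n1, uint32_t n0, uint32_t d1, uint32_t d0)
{
  uint32_t r[2];
  uint64_t n = ((uint64_t) n1 << 32) | n0;
  uint64_t d = ((uint64_t) d1 << 32) | d0;
  uint32_t q = div2_sketch (r, n1, n0, d1, d0);
  uint64_t rem = ((uint64_t) r[1] << 32) | r[0];

  return q == n / d && rem == n % d && rem < d;
}
```

A real test would call check_div2 on random operands plus corner cases such as n1 == 0, N == D, and the largest possible quotient (N = 2^64 - 1, D = 2^32).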
ni...@lysator.liu.se (Niels Möller) writes:
> How much speedup have we achieved?
I don't know. I just observed the GCD_DC_THRESHOLD to change a lot, and
that is a good sign.
Looks like GCD_DC_THRESHOLD gets higher on many machines, but lower on a
few? I now realize that the way tuneup
t...@gmplib.org (Torbjörn Granlund) writes:
> Ideally, one would compile hgcd2.c in all possible variants (presumably
> through hand-crafted hgcd2-1-1.c, hgcd2-2-1.c, etc.), and then call the
> selected hgcd2's entry point through a function pointer for further
> measuring.
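The function-pointer idea can be sketched roughly as follows. This is only an illustration of the dispatch pattern, not the tuneup code: limb_t, div_variant_1, div_variant_2, time_variant and select_fastest are all hypothetical names, the two variants are trivial single-limb divisions standing in for separately compiled hgcd2 builds, and clock() is far coarser than the cycle counting a real measurement would need.

```c
#include <stddef.h>
#include <time.h>

typedef unsigned long limb_t;
typedef limb_t (*div_variant_fn) (limb_t, limb_t);

/* Variant 1 (hypothetical): use the hardware division instruction. */
static limb_t
div_variant_1 (limb_t n, limb_t d)
{
  return n / d;
}

/* Variant 2 (hypothetical): shift-and-subtract division; requires d > 0. */
static limb_t
div_variant_2 (limb_t n, limb_t d)
{
  limb_t q = 0;
  int shift = 0;

  while (d <= (n >> 1))
    {
      d <<= 1;
      shift++;
    }
  for (;;)
    {
      q <<= 1;
      if (n >= d)
        {
          n -= d;
          q |= 1;
        }
      if (shift == 0)
        break;
      d >>= 1;
      shift--;
    }
  return q;
}

/* Table of entry points, one per compiled variant. */
struct variant
{
  const char *name;
  div_variant_fn fn;
};

static const struct variant variants[] = {
  { "variant-1", div_variant_1 },
  { "variant-2", div_variant_2 },
};

#define N_VARIANTS (sizeof (variants) / sizeof (variants[0]))

/* Time reps calls through the function pointer; returns seconds. */
static double
time_variant (div_variant_fn fn, unsigned long reps)
{
  volatile limb_t sink = 0;     /* keeps the calls from being optimized away */
  unsigned long i;
  clock_t t0 = clock ();

  for (i = 0; i < reps; i++)
    sink += fn (~(limb_t) 0 - i, i | 1);        /* i | 1 keeps d > 0 */

  return (double) (clock () - t0) / CLOCKS_PER_SEC;
}

/* Return the index of the variant with the smallest measured time. */
static size_t
select_fastest (unsigned long reps)
{
  size_t best = 0, i;
  double best_time = time_variant (variants[0].fn, reps);

  for (i = 1; i < N_VARIANTS; i++)
    {
      double t = time_variant (variants[i].fn, reps);
      if (t < best_time)
        {
          best_time = t;
          best = i;
        }
    }
  return best;
}
```

After selection, the chosen variants[best].fn would be the pointer used for all further measuring, which is the part that matters for thresholds that depend on which variant is in effect.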
Hmm. So the
I feel we've achieved much of the possible speedup for gcd now.
But what more can we do before we are completely done for now?
Let me try to list it:
* Add entry points for gcd_11 allowing even operand(s).
* Add entry points for gcd_22 allowing even operand(s)?
* Make generic/gcd_1.c call
ni...@lysator.liu.se (Niels Möller) writes:
> Ooops. Now it should be in.
What are the specs for div2?
Surely n1 > 0 and d1 > 0.
Also N >= D?
Does the new div2 always compute the "accurate" quotient,
i.e., with the remainder R < D?
I'm asking as I believe strongly in unit testing of these
ni...@lysator.liu.se (Niels Möller) writes:
I've made a quick try deleting it from the single-limb loop. See patch
below. Measurements are a bit noisy, but it looks like a slowdown when I
time it. With hgcd2 time increasing from 1220 cycles to 1290 (this time
measured on broadwell), which