Re: hgcd1/2

2019-09-17 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes: > ni...@lysator.liu.se (Niels Möller) writes: > > I've made a quick try deleting it from the single-limb loop. See patch > below. Measurements are a bit noisy, but it looks like a slowdown when I > time it. With hgcd2 time increasing from 1220

Re: hgcd1/2

2019-09-17 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Unit tests would be nice. I think tests/mpz/t-gcd.c does exercise the large quotient cases. I cobbled together this: test-div2.c (attachment, binary data) BTW, I wonder if it makes sense with HGCD2_DIV2_METHOD == 3 similar to

Re: GCD project status?

2019-09-17 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes: > I feel we've achieved much of the possible speedup for gcd now. How much speedup have we achieved? > But what more can we do before we are completely done for now? > Let me try to list it: > > > * Add entry points for gcd_11 allowing even

Re: hgcd1/2

2019-09-17 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes: > What are the specs for div2? > > Surely n1 > 0 and d1 > 0. All variants need d1 > 0, method 2 also needs n1 > 0 (for count_leading_zeros). > Also N >= D? Method 2 needs clz(n1) <= clz(d1). Besides that, I think they can handle N < D, i.e., q == 0.
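
A minimal sketch of a reference oracle that a div2 unit test could compare against, under the preconditions stated above (d1 > 0, and N < D allowed, giving q == 0). This is not GMP's div2: the names dlimb, ge2, sub2 and ref_div2 are hypothetical, limbs are assumed to be 64 bits, and the plain shift-and-subtract loop is chosen for obviousness rather than speed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-limb value N = hi * 2^64 + lo (not a GMP type). */
typedef struct { uint64_t hi, lo; } dlimb;

/* Return nonzero if a >= b. */
static int ge2(dlimb a, dlimb b)
{
  return a.hi > b.hi || (a.hi == b.hi && a.lo >= b.lo);
}

/* Return a - b, assuming a >= b. */
static dlimb sub2(dlimb a, dlimb b)
{
  dlimb r;
  r.lo = a.lo - b.lo;
  r.hi = a.hi - b.hi - (a.lo < b.lo);   /* borrow from the low limb */
  return r;
}

/* Reference division: returns q = floor(N / D) and stores N - q*D in *r.
   Requires d.hi > 0, so the quotient always fits in one limb.
   N < D is allowed and yields q == 0 with *r == N. */
static uint64_t ref_div2(dlimb *r, dlimb n, dlimb d)
{
  uint64_t q = 0;
  int cnt = 0;

  assert(d.hi > 0);

  /* Double D until its top bit is set or D >= N; afterwards N < 2*D. */
  while ((d.hi >> 63) == 0 && !ge2(d, n))
    {
      d.hi = (d.hi << 1) | (d.lo >> 63);
      d.lo <<= 1;
      cnt++;
    }

  /* Produce one quotient bit per step, halving D back to its original value. */
  for (;;)
    {
      q <<= 1;
      if (ge2(n, d))
        {
          n = sub2(n, d);
          q |= 1;
        }
      if (cnt == 0)
        break;
      d.lo = (d.lo >> 1) | (d.hi << 63);
      d.hi >>= 1;
      cnt--;
    }

  *r = n;   /* remainder, always < D */
  return q;
}
```

A test loop could feed random (N, D) pairs satisfying each method's preconditions to both the oracle and the div2 variant under test, and compare the quotient and remainder.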

Re: GCD project status?

2019-09-17 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: How much speedup have we achieved? I don't know. I just observed the GCD_DC_THRESHOLD to change a lot, and that is a good sign. Looks like GCD_DC_THRESHOLD gets higher on many machines, but lower on a few? I now realize that the way tuneup

Re: GCD project status?

2019-09-17 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes: > Ideally, one would compile hgcd2.c in all possible variants (presumably > through hand-crafted hgcd2-1-1.c, hgcd2-2-1.c, etc.), and then call the > selected hgcd2's entry point through a function pointer for further > measuring. Hmm. So the

GCD project status?

2019-09-17 Thread Torbjörn Granlund
I feel we've achieved much of the possible speedup for gcd now. But what more can we do before we are completely done for now? Let me try to list it: * Add entry points for gcd_11 allowing even operand(s). * Add entry points for gcd_22 allowing even operand(s)? * Make generic/gcd_1.c call

Re: hgcd1/2

2019-09-17 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Ooops. Now it should be in. What are the specs for div2? Surely n1 > 0 and d1 > 0. Also N >= D? Does the new div2 always compute the "accurate" quotient, i.e., with the remainder R < D? I'm asking as I believe strongly in unit testing of these

Re: hgcd1/2

2019-09-17 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: I've made a quick try deleting it from the single-limb loop. See patch below. Measurements are a bit noisy, but it looks like a slowdown when I time it. With hgcd2 time increasing from 1220 cycles to 1290 (this time measured on broadwell), which