Re: GCD project status?

2019-09-27 Thread Torbjörn Granlund
We now have almost complete results for the latest hgcd2.c tweaks. We have 5 div1 variants (as chosen by HGCD2_DIV1_METHOD). There are results and brief documentation here: https://gmplib.org/devel/thres/HGCD2_DIV1_METHOD.html About half of our test machines prefer '/', i.e., whatever word

Re: GCD project status?

2019-09-24 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Hmm. Definitely worth a try. But if we need explicit loads and stores from the structs, we'll not save that many instructions. x86 can handle operate-from-memory at almost the same cost as operate-from-register. But operate-to-memory is expensive.

Re: GCD project status?

2019-09-23 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes: > Perhaps keeping the to-be-swapped variables in two structs, and instead > conditionally swap pointers to the structs? Hmm. Definitely worth a try. But if we need explicit loads and stores from the structs, we'll not save that many instructions. Each

Re: GCD project status?

2019-09-23 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: $ ./tune/speed -c -s1 -p10 mpn_hgcd2_1 mpn_hgcd2_2 mpn_hgcd2_3 mpn_hgcd2_4 mpn_hgcd2_5 mpn_hgcd2_binary overhead 6.02 cycles, precision 10 units of 8.33e-10 secs, CPU freq 1200.00 MHz mpn_hgcd2_1 mpn_hgcd2_2 mpn_hgcd2_3

Re: GCD project status?

2019-09-23 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes: > Below is one variant, that seems to pass tests. I haven't done any > benchmarks yet. Just gave it a try with tune/speed, together with all the latest div1 variants. This one on my laptop, and not looking that great: $ ./tune/speed -c -s1 -p10

Re: GCD project status?

2019-09-23 Thread Torbjörn Granlund
t...@gmplib.org (Torbjörn Granlund) writes: I went ahead and pushed a hgcd2.c with _METHOD = 4 and _METHOD 5. The former tables quotients and neeeds one multiply per div1 call and the latter tables inverses and needs 2 multiplies per div1 call. I pushed improved versions. Instead of

Re: GCD project status?

2019-09-22 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: I see two ways. Either just do an inversion table, which will cost an additional multiplication when used. Than one could handle quotients up to 7 or 8 bits bits with just 256 bytes of table. Or shift both ni and di, and get quotient with

Re: GCD project status?

2019-09-22 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes: > I had not realised you meant left-to-right. I thought you talked about > right-to-left. Interesting! Below is one variant, that seems to pass tests. I haven't done any benchmarks yet. All the SWAP_MASK calls look a bit expensive, should be 4

Re: GCD project status?

2019-09-22 Thread Torbjörn Granlund
Nice! -- Torbjörn Please encrypt, key id 0xC8601622 ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel

Re: GCD project status?

2019-09-22 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes: > Hmm. So the hgcd2-x-y.c would be there als for speed, and hgcd2.c would > have something like > > #if TUNE_PROGRAM_BUILD > int mpn_hgcd2(...) > { > return (*hgcd2_method_pointer)(...); > } See below patch. diff -r 3e04a9bbba13 gmp-impl.h ---

Re: GCD project status?

2019-09-19 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes: > With some thought, I am sure the table size could come down while the > accuracy could improve. I see two ways. Either just do an inversion table, which will cost an additional multiplication when used. Than one could handle quotients up to 7 or 8

Re: GCD project status?

2019-09-18 Thread Torbjörn Granlund
> * Speed up div1, presumably with a table-based approach. A quick-and-dirty but working variant below. The table size with NBITS = 8 is 32KiB. Far too big. It avoids branches in 83% of the cases, which is not that great considering the huge cost of mispredicted branches. With some

Re: GCD project status?

2019-09-17 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes: > Ideally, one would compile hgcd2.c in all possible variants (presumable > through hand-crafted hgcd2-1-1.c, hgcd2-2-1.c, etc., and then call the > selected hgcd2's entry point through a function pointer for further > measuring. Hmm. So the

Re: GCD project status?

2019-09-17 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: How much speedup have we achieved? I don't know. I just observed the GCD_DC_THRESHOLD to change a lot, and that is a good sign. Looks like GCD_DC_THRESHOLD gets higher on many machines, but lower on a few? I now realize that the way tuneup

Re: GCD project status?

2019-09-17 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes: > I feel we've achieved much of the possible speedup for gcd now. How much speedup have we achieved? > But what more can we do before we are completely done for now? > Let me try to list it: > > > * Add entry points for gcd_11 allowing even