Ciao,
On Sun, 25 August 2019 2:28 am, Torbjörn Granlund wrote:
> Now we have a nice set of x86_64 gcd_22. The code is not as well tuned
> as the gcd_11 code, but it runs somewhat fast.
So if I suggest reordering some instructions in the loop, you will not
be upset :-)
If we can change cmovc-s
ni...@lysator.liu.se (Niels Möller) writes:
And to make the loop work, it needs some condition to decrement N and
maintain non-zero high limbs (if both up[N-1] and vp[N-1] are zero,
comparison is no good). So that would be something like
Since N in my proposal is a constant, it is
Marco Bodrato writes:
For generic code with variable N, one may prefer code that chooses
whether a copy or a shorter shift is needed. But this means more code, and
the shift could not be an in-lined fixed-size version...
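
As an illustration of the copy-or-shift choice described above, here is a
minimal sketch in plain C. It is not the code from this thread: limb_t,
rshift_bits and make_odd are hypothetical names, limbs are assumed to be
64 bits wide, and the operand is assumed to be non-zero with n > 0.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;	/* hypothetical stand-in for mp_limb_t, 64 bits assumed */

/* Shift {p, n} right by cnt bits, with 0 < cnt < 64.  Zeros enter at the top. */
static void
rshift_bits (limb_t *p, size_t n, unsigned cnt)
{
  size_t i;
  for (i = 0; i + 1 < n; i++)
    p[i] = (p[i] >> cnt) | (p[i + 1] << (64 - cnt));
  p[n - 1] >>= cnt;
}

/* Remove all trailing zero bits from {p, n}.  Requires n > 0 and a
   non-zero value, otherwise the first loop would not terminate.  */
static void
make_odd (limb_t *p, size_t n)
{
  unsigned cnt = 0;
  limb_t low;
  size_t i;

  /* Unlikely case: the whole low limb is zero, so a limb copy is enough.  */
  while (p[0] == 0)
    {
      for (i = 0; i + 1 < n; i++)
	p[i] = p[i + 1];
      p[n - 1] = 0;
    }

  /* Likely case: fewer than 64 zero bits remain, so use a short shift.  */
  for (low = p[0]; (low & 1) == 0; low >>= 1)
    cnt++;
  if (cnt > 0)
    rshift_bits (p, n, cnt);
}
```

With N fixed at compile time the loops above could be fully unrolled, which
is the trade-off the paragraph mentions: the variable-N form needs the extra
branch and cannot use an in-lined fixed-size shift.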
I broke out the unlikely up[0] code into a separate function, a
Some cleanups and tweaks later. The gcd_33 based on this, compiled with
gcc 8.3, runs at 30 cycles per iteration. (Note, not cycles per bit!)
My best gcd_33 in assembly runs at 10 cycles per iteration.
The former uses memory based operands. The latter keeps everything in
registers.
If we
Ciao,
On 2019-08-27 16:35 t...@gmplib.org wrote:
I got something working. It runs quite well, and seems to beat the
Great!
static inline void
mpn_gcd_NN (mp_limb_t *rp, mp_limb_t *up, mp_limb_t *vp, size_t N)
I see that your idea is to obtain an N-loop-unrolled version...
if
On 2019-08-27 21:10 t...@gmplib.org wrote:
Marco Bodrato writes:
... and on some platforms mpn_rshift may not support cnt==0.
That was taken care of in my last version.
I wrote my message before, and did not realize, before sending it, that
you sent a new version :-)
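
To illustrate why a zero count needs separate handling, here is a small
sketch in plain C. It does not reproduce the version discussed above:
limb_t and rshift_or_keep are hypothetical names and 64-bit limbs are
assumed. In a limb-by-limb shift the complementary shift is by 64 - cnt
bits, which for cnt == 0 becomes a shift by the full limb width and is
undefined in C, so the wrapper returns early in that case.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;	/* hypothetical stand-in for mp_limb_t, 64 bits assumed */

/* Shift {p, n} right by cnt bits in place, accepting 0 <= cnt < 64.
   Requires n > 0.  */
static void
rshift_or_keep (limb_t *p, size_t n, unsigned cnt)
{
  size_t i;

  if (cnt == 0)
    return;			/* nothing to shift; also avoids the shift by 64 below */

  for (i = 0; i + 1 < n; i++)
    p[i] = (p[i] >> cnt) | (p[i + 1] << (64 - cnt));
  p[n - 1] >>= cnt;
}
```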
I added a
I got something working. It runs quite well, and seems to beat the
performance of mpn_gcd. Here is the code:
#include "gmp-impl.h"
#include "longlong.h"
#ifndef CMP_SWAP
#define CMP_SWAP(ap,bp,n) \
do {