ni...@lysator.liu.se (Niels Möller) writes:
Will try that. I think one could also try to delay the quotient store
one iteration, keeping Q1 in a register until the next iteration. Then
one gets rid of the
	adc	Q2, 8(QP, UN, 8)
in the loop, using only a single store per
Torbjorn Granlund t...@gmplib.org writes:
On Intel chips, op-to-mem is expensive. Even op-from-memory is often
slower than load+op. (I understand the register shortage problem.)
The following (untested) variant needs one register too many.
UP, QP, UN: Load, store, loop counter.
I looked at the logic following this:
	sbb	U2, U2		C 7 13
You negate the U2 copy in Q2. It seems that replacing the three adc
with sbb could avoid the neg.
It might also be possible to replace the early loop and stuff by cmov.
Note that the carry flag survives dec, although that causes a
Torbjorn Granlund t...@gmplib.org writes:
I looked at the logic following this:
	sbb	U2, U2		C 7 13
You negate the U2 copy in Q2. It seems that replacing the three adc
with sbb could avoid the neg.
The problem is the final use, where Q2 is added, with carry, to a
different register.
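
For anyone following the adc/sbb question without the full loop at hand, here is a small C model of the arithmetic. It is only a sketch: the helper names (sbb_self, adc64, sbb64, swap_values_match) are made up for illustration and are not taken from the actual divrem code, and it models single 64-bit instructions rather than the real register allocation. It shows that "sbb r, r" yields a 0 / all-ones mask, that adding the negated mask and subtracting the mask itself give the same value when no carry comes in, and that the two stop agreeing once a carry-in is involved, which is one way the swap can get awkward.

```c
#include <stdint.h>

/* Model of `sbb r, r`: r - r - CF = -CF, so the result is 0 or all ones. */
uint64_t sbb_self(unsigned cf)
{
    return (uint64_t)0 - (uint64_t)(cf & 1);
}

/* Model of `adc`: returns a + b + cin (mod 2^64), stores the carry out. */
uint64_t adc64(uint64_t a, uint64_t b, unsigned cin, unsigned *cout)
{
    uint64_t s = a + b;
    unsigned c = s < a;          /* carry from a + b */
    uint64_t r = s + (cin & 1);
    c |= r < s;                  /* carry from adding the carry-in */
    *cout = c;
    return r;
}

/* Model of `sbb`: returns a - b - bin (mod 2^64), stores the borrow out. */
uint64_t sbb64(uint64_t a, uint64_t b, unsigned bin, unsigned *bout)
{
    uint64_t d = a - b;
    unsigned c = a < b;          /* borrow from a - b */
    uint64_t r = d - (bin & 1);
    c |= d < (uint64_t)(bin & 1); /* borrow from subtracting the borrow-in */
    *bout = c;
    return r;
}

/*
 * Compare the two formulations for one limb:
 *   (1) mask = sbb_self(cf); q = -mask (the neg); adc a, q   with carry-in cin
 *   (2) mask = sbb_self(cf);                      sbb a, mask with borrow-in cin
 * Returns 1 if the result VALUES agree, 0 otherwise. The flag outputs are
 * not compared here; they generally differ between the two forms.
 */
int swap_values_match(uint64_t a, unsigned cf, unsigned cin)
{
    unsigned c1, c2;
    uint64_t mask = sbb_self(cf);
    uint64_t q = (uint64_t)0 - mask;            /* 0 or 1 */
    uint64_t via_adc = adc64(a, q, cin, &c1);   /* a + q + cin */
    uint64_t via_sbb = sbb64(a, mask, cin, &c2);/* a + q - cin */
    return via_adc == via_sbb;
}
```

With cin = 0 the two values agree for either mask, so a plain add of the negated mask can be traded for a sub of the mask. With cin = 1 they differ by 2, since adc adds the incoming flag while sbb subtracts it. Note also that the flag coming out is not the same: adding 1 carries only when the limb is all ones, whereas subtracting all ones borrows in every other case, so anything downstream that consumes the carry would need to be adapted too.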