Re: Sandybridge addmul_N challenge

2012-02-23 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Here's a sketch of an addmul_2 iteration using Karatsuba. I assume we have vl, vh, vd = |vl - vh| and an appropriate sign vmask in registers before the loop. Carry input in c0, c1, carry out in r2, r3. mov (up), %rax mov
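For readers who want the arithmetic behind that sketch spelled out, here is a rough C illustration of the subtractive Karatsuba identity for one pair of u limbs, u0*vh + u1*vl = u0*vl + u1*vh - (u0 - u1)*(vl - vh), which trades four multiplies for three. It is not the assembly from the post: the function names (abs_diff, mul_2x2_schoolbook, mul_2x2_karatsuba) are made up for this example, the carry-in/carry-out and the accumulation into rp are left out, and it assumes a compiler with unsigned __int128 (GCC or Clang). Only vl, vh, vd and vmask correspond to names in the message.

```c
#include <stdint.h>

typedef unsigned __int128 u128;   /* assumption: GCC/Clang extension */

/* Hypothetical helper: returns |a - b| and sets *mask to all ones if a < b. */
static uint64_t abs_diff(uint64_t a, uint64_t b, uint64_t *mask)
{
    *mask = (a < b) ? ~(uint64_t)0 : 0;
    return (a < b) ? b - a : a - b;
}

/* Reference: r[0..3] = (u0 + u1*B) * (vl + vh*B), B = 2^64, four multiplies. */
static void mul_2x2_schoolbook(uint64_t r[4], uint64_t u0, uint64_t u1,
                               uint64_t vl, uint64_t vh)
{
    u128 p00 = (u128)u0 * vl, p01 = (u128)u0 * vh;
    u128 p10 = (u128)u1 * vl, p11 = (u128)u1 * vh;
    u128 t;

    r[0] = (uint64_t)p00;
    t = (p00 >> 64) + (uint64_t)p01 + (uint64_t)p10;
    r[1] = (uint64_t)t;
    t = (t >> 64) + (p01 >> 64) + (p10 >> 64) + (uint64_t)p11;
    r[2] = (uint64_t)t;
    r[3] = (uint64_t)((t >> 64) + (p11 >> 64));
}

/* Same product with three multiplies. vd = |vl - vh| and vmask (all ones
   if vl < vh) are computed once by the caller, before any loop over u. */
static void mul_2x2_karatsuba(uint64_t r[4], uint64_t u0, uint64_t u1,
                              uint64_t vl, uint64_t vh,
                              uint64_t vd, uint64_t vmask)
{
    uint64_t umask;
    uint64_t ud = abs_diff(u0, u1, &umask);
    u128 low  = (u128)u0 * vl;
    u128 high = (u128)u1 * vh;
    u128 d    = (u128)ud * vd;        /* |(u0 - u1) * (vl - vh)| */
    u128 s    = low + high;           /* middle term, low 128 bits */
    uint64_t m2 = (s < low);          /* middle term, bit 128 */
    u128 t;

    if ((umask ^ vmask) == 0) {       /* (u0-u1)(vl-vh) >= 0: subtract it */
        m2 -= (s < d);
        s -= d;
    } else {                          /* product is negative: add |product| */
        s += d;
        m2 += (s < d);
    }

    /* Assemble low + middle*B + high*B^2. */
    r[0] = (uint64_t)low;
    t = (low >> 64) + (uint64_t)s;
    r[1] = (uint64_t)t;
    t = (t >> 64) + (s >> 64) + (uint64_t)high;
    r[2] = (uint64_t)t;
    r[3] = (uint64_t)((t >> 64) + m2 + (high >> 64));
}
```

The middle term can need 129 bits, which is why the extra word m2 is tracked separately. The branch on the sign is written as an if for readability; a register-level version would presumably apply the mask directly instead.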

Re: Sandybridge addmul_N challenge

2012-02-23 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: One can decrease it a bit by adding c0, c1 earlier (do you think recurrency can be a problem if we add c0, c1 to the first product?) and doing an in-place add to (rp) and 8(rp) at the end. I could get it down to 30 instructions with a deep

Re: Sandybridge addmul_N challenge

2012-02-23 Thread Niels Möller
Torbjorn Granlund t...@gmplib.org writes:
> In loopmixer or manually? I wouldn't draw any conclusions without mixing the code first...
With the loop mixer.
> Meaning evaluating in +1 instead of -1, I assume.
Exactly.
> Did you compute the recurrency chain?
Annotating the instructions on the