Re: [PATCH 2 of 3] Add MIPS32R2 RDHWR-based cycle counter support
Mobile Stream writes:

  T> Add MIPS32R2 RDHWR-based cycle counter support.
  T> Does that work in user mode for all *nix ports?

  don't know. basing on a quick grep: openbsd, freebsd seem to only allow
  ULR but never CC/CCRes; netbsd seems to enable everything in HWREna
  including CC, CCRes.

Wait! We're not MIPS system programmers around here! Can your timing
code be used under *nix? I guess from your reply that it cannot,
generally.

  T> Does it not also work in r3, r4, r5? And r6?

  the instruction and CP0 HWREna are defined in r3, r5, r6 too.

Then (assuming the cycle counter is usable from outside of the kernel)
I'd suggest that your suggested configuration pattern was a bit too
restrictive.

-- 
Torbjörn
Please encrypt, key id 0xC8601622
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
Re: [PATCH 2 of 3] Add MIPS32R2 RDHWR-based cycle counter support
T> Add MIPS32R2 RDHWR-based cycle counter support.
T> Does that work in user mode for all *nix ports?

don't know. basing on a quick grep: openbsd, freebsd seem to only allow
ULR but never CC/CCRes; netbsd seems to enable everything in HWREna
including CC, CCRes.

T> Does it not also work in r3, r4, r5? And r6?

the instruction and CP0 HWREna are defined in r3, r5, r6 too. there is
no r4.

T> Is there a mipsisa64 counterpart?

mips arch specs list mips32r2+ only, where they usually mention mips64
too. however, the i6400 and p6600 (64-bit r6 cores) specs say rdhwr and
the corresponding hwrena bits are supported. octeon cores have had this
32-bit counter since at least octeon plus, as well as a custom 64-bit
counter.
Re: [PATCH 2 of 3] Add MIPS32R2 RDHWR-based cycle counter support
i...@mobile-stream.com writes:

  Add MIPS32R2 RDHWR-based cycle counter support.

Does that work in user mode for all *nix ports?

Does it not also work in r3, r4, r5? And r6?

Is there a mipsisa64 counterpart?

-- 
Torbjörn
Please encrypt, key id 0xC8601622
[PATCH 3 of 3] Add MIPS32R1 MADDU-based *mul_1.asm functions
Add MIPS32R1 MADDU-based *mul_1.asm functions.

The code tries to keep the [accidental] property of the MIPS-II
counterparts: constant-time operation on 32x16 MDUs as found on e.g.
4KEc and some low-end MCUs. Even if that is unimportant, the
performance cost is invisible.

It is faster on all tried MIPS32R1/R2/R5 CPUs (see the c/l table) and
is expected to be fast with any pipelined MDU. The so-called
Area-Efficient MDU (optional on some MCUs) will run it *much* slower
(~3x for addmul_1).

While the functions look similar (especially mul_1 and addmul_1), they
are kept separate due to corner-case (N=1,2,3) tweaks for P5600, which
have no ill effect on 4KEc or 24KEc at least.

diff -r 6ab06c72027e -r 789677d6e8b2 configure.ac
--- a/configure.ac
+++ b/configure.ac
@@ -1040,6 +1040,10 @@
       mipsisa32r2*-*-*)
         SPEED_CYCLECOUNTER_OBJ=mips32r2.lo
         cyclecounter_size=1
+        path="mips32/r1 mips32"
+        ;;
+      mipsisa32*-*-*)
+        path="mips32/r1 mips32"
         ;;
     esac
     ;;
diff -r 6ab06c72027e -r 789677d6e8b2 mpn/mips32/r1/addmul_1.asm
--- /dev/null
+++ b/mpn/mips32/r1/addmul_1.asm
@@ -0,0 +1,79 @@
+include(`../config.m4')
+
+C            cycles/limb
+C 4KEc        9.68
+C 24Kc        9.52
+C 24KEc       9.55
+C P5600       7.80
+C XBurst     13.55
+
+C INPUT PARAMETERS
+C rp   $a0
+C up   $a1
+C n    $a2
+C vl   $a3
+
+ASM_START()
+       .set    noat
+PROLOGUE(mpn_addmul_1)
+       lw      $v1,0($a1)      C L0
+       ori     $at,$zero,1
+       multu   $v1,$a3         C M0
+       lw      $t0,0($a0)      C L1, 32x16 MDU stall
+       addiu   $t1,$a2,-2
+       beq     $at,$a2,1f
+        maddu  $t0,$at         C M0
+       mfhi    $v0             C M0 carry
+       lw      $t2,4($a1)      C L1
+       beqz    $t1,23f
+        lw     $v1,4($a0)      C L1
+       mflo    $t0             C M0
+       andi    $t3,$t1,1
+       sll     $a2,$t1,2
+       beqz    $t3,0f
+        addu   $a2,$a2,$a1
+       multu   $t2,$a3         C M1
+       lw      $t2,8($a1)      C L2, 32x16 MDU stall
+       maddu   $v1,$at         C M1
+       maddu   $v0,$at         C M1
+       mfhi    $v0             C M1 carry
+       lw      $v1,8($a0)      C L2
+       sw      $t0,0($a0)      C S0
+       beq     $at,$t1,23f
+        addiu  $a0,$a0,4
+       addiu   $a1,$a1,4
+       mflo    $t0             C M1
+0:     addiu   $a1,$a1,8
+       addiu   $a0,$a0,8
+       multu   $t2,$a3         C M1
+       lw      $t3,0($a1)      C L2, 32x16 MDU stall
+       maddu   $v1,$at         C M1
+       lw      $t4,0($a0)      C L2
+       maddu   $v0,$at         C M1
+       mfhi    $v0             C M1 carry
+       sw      $t0,-8($a0)     C S0
+       mflo    $t1             C M1
+       multu   $t3,$a3         C M2
+       lw      $t2,4($a1)      C L3, 32x16 MDU stall
+       maddu   $t4,$at         C M2
+       lw      $v1,4($a0)      C L3
+       maddu   $v0,$at         C M2
+       mfhi    $v0             C M2 carry
+       sw      $t1,-4($a0)     C S1
+       bne     $a1,$a2,0b
+23:     mflo   $t0             C M2
+       multu   $t2,$a3         C M3
+       nop                     C 32x16 MDU stall
+       maddu   $v1,$at         C M3
+       maddu   $v0,$at         C M3
+       mflo    $at             C M3
+       mfhi    $v0             C M3 carry
+       sw      $t0,0($a0)      C S2
+       jr      $ra
+        sw     $at,4($a0)      C S3
+1:     mflo    $at
+       mfhi    $v0
+       jr      $ra
+        sw     $at,0($a0)
+EPILOGUE(mpn_addmul_1)
+ASM_END()
diff -r 6ab06c72027e -r 789677d6e8b2 mpn/mips32/r1/mul_1.asm
--- /dev/null
+++ b/mpn/mips32/r1/mul_1.asm
@@ -0,0 +1,69 @@
+include(`../config.m4')
+
+C            cycles/limb
+C 4KEc        7.66
+C 24Kc        7.54
+C 24KEc       7.55
+C P5600       7.04
+C XBurst     10.54
+
+C INPUT PARAMETERS
+C rp   $a0
+C up   $a1
+C n    $a2
+C vl   $a3
+
+ASM_START()
+       .set    noat
+PROLOGUE(mpn_mul_1)
+       lw      $v1,0($a1)      C L0
+       ori     $at,$zero,1
+       multu   $v1,$a3         C M0
+       beq     $at,$a2,1f      C 32x16 MDU stall
+        addiu  $t1,$a2,-2
+       mfhi    $v0             C M0 carry
+       beqz    $t1,23f
+        lw     $t2,4($a1)      C L1
+       mflo    $t0             C M0
+       andi    $t3,$t1,1
+       sll     $a2,$t1,2
+       beqz    $t3,0f
+        addu   $a2,$a2,$a1
+       multu   $t2,$a3         C M1
+       lw      $t2,8($a1)      C L2, 32x16 MDU stall
+       maddu   $v0,$at         C M1
+       mfhi    $v0             C M1 carry
+       sw      $t0,0($a0)      C S0
+       beq     $at,$t1,23f
+        addiu  $a0,$a0,4
+       addiu   $a1,$a1,4
+       mflo    $t0             C M1
+0:     addiu   $a1,$a1,8
+       addiu   $a0,$a0,8
+       multu   $t2,$a3         C M1
+       lw      $t3,0($a1)      C L2, 32x16 MDU stall
+       maddu   $v0,$at         C M1
+       mfhi    $v0
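For readers less familiar with the MADDU idiom used in the patch, the following is a rough C model of what each limb step of addmul_1 appears to compute. It is only a sketch: the name ref_addmul_1 is hypothetical, a 64-bit integer stands in for the HI:LO pair, and the software pipelining and corner-case handling of the assembly are ignored. With $at holding the constant 1, each "maddu x,$at" simply adds x into HI:LO, and the sum can never overflow 64 bits because (2^32-1)^2 + 2*(2^32-1) = 2^64-1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model, not part of the patch: 32-bit limbs,
   with a 64-bit integer standing in for the HI:LO accumulator.  */
uint32_t
ref_addmul_1 (uint32_t *rp, const uint32_t *up, size_t n, uint32_t vl)
{
  uint32_t carry = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint64_t acc = (uint64_t) up[i] * vl;  /* multu up[i],vl  */
      acc += rp[i];                          /* maddu rp[i],1   */
      acc += carry;                          /* maddu carry,1   */
      rp[i] = (uint32_t) acc;                /* mflo            */
      carry = (uint32_t) (acc >> 32);        /* mfhi            */
    }
  return carry;  /* returned in $v0 by the assembly */
}
```

The mul_1 variant would be the same loop without the "acc += rp[i]" step.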
[PATCH 2 of 3] Add MIPS32R2 RDHWR-based cycle counter support
Add MIPS32R2 RDHWR-based cycle counter support.

diff -r 0ba6f9f13912 -r 6ab06c72027e configure.ac
--- a/configure.ac
+++ b/configure.ac
@@ -1035,7 +1035,11 @@
             path_64="mips64/hilo mips64"
             ;;
         esac
-
+        ;;
+
+      mipsisa32r2*-*-*)
+        SPEED_CYCLECOUNTER_OBJ=mips32r2.lo
+        cyclecounter_size=1
         ;;
     esac
     ;;
diff -r 0ba6f9f13912 -r 6ab06c72027e tune/Makefile.am
--- a/tune/Makefile.am
+++ b/tune/Makefile.am
@@ -33,7 +33,7 @@
 AM_LDFLAGS = -no-install
 
 EXTRA_DIST = alpha.asm pentium.asm sparcv9.asm hppa.asm hppa2.asm hppa2w.asm \
-  ia64.asm powerpc.asm powerpc64.asm x86_64.asm many.pl
+  ia64.asm powerpc.asm powerpc64.asm x86_64.asm mips32r2.asm many.pl
 noinst_HEADERS = speed.h
 
 # Prefer -static on the speed and tune programs, since that can avoid
diff -r 0ba6f9f13912 -r 6ab06c72027e tune/mips32r2.asm
--- /dev/null
+++ b/tune/mips32r2.asm
@@ -0,0 +1,12 @@
+include(`../config.m4')
+
+ASM_START()
+PROLOGUE(speed_cyclecounter)
+       rdhwr   $2,$2
+       slti    $3,$2,0
+       sll     $2,$2,1         C save multiply and assume CCRes is 2
+       sw      $3,4($4)
+       jr      $ra
+        sw     $2,0($4)
+EPILOGUE(speed_cyclecounter)
+ASM_END()
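As a reading aid, here is a small C model of what the speed_cyclecounter body in the patch above seems to store. It is a sketch under assumptions: the function names are hypothetical, and `cc` stands for the 32-bit value that `rdhwr $2,$2` would return (hardware register 2, CC). The second function shows the more general scaling one might use if CCRes (hardware register 3) were read instead of being assumed to be 2.

```c
#include <stdint.h>

/* Hypothetical model of the patch's register arithmetic.
   p[0] receives the low word, p[1] the high word.  */
void
model_cyclecounter (uint32_t cc, uint32_t p[2])
{
  p[1] = cc >> 31;  /* slti $3,$2,0: 1 when the top bit of CC is set */
  p[0] = cc << 1;   /* sll $2,$2,1: multiply by the assumed CCRes of 2 */
}

/* General form, assuming ccres was obtained from the CCRes register.  */
uint64_t
scaled_cycles (uint32_t cc, uint32_t ccres)
{
  return (uint64_t) cc * ccres;
}
```

With ccres equal to 2, the two functions agree: the pair (p[1], p[0]) is the 64-bit value 2*cc. Whether RDHWR is permitted from user mode still depends on the HWREna bits set by the kernel, as discussed in the replies.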
[PATCH 1 of 3] Provide c/l results on some MIPS32 CPUs
Provide c/l results on some MIPS32 CPUs.

diff -r ed6ddbb7a15b -r 0ba6f9f13912 mpn/mips32/addmul_1.asm
--- a/mpn/mips32/addmul_1.asm
+++ b/mpn/mips32/addmul_1.asm
@@ -31,6 +31,13 @@
 
 include(`../config.m4')
 
+C            cycles/limb
+C 4KEc       16.33
+C 24Kc       18.04
+C 24KEc      18.05
+C P5600       9.05
+C XBurst     17.05
+
 C INPUT PARAMETERS
 C res_ptr      $4
 C s1_ptr       $5
diff -r ed6ddbb7a15b -r 0ba6f9f13912 mpn/mips32/mul_1.asm
--- a/mpn/mips32/mul_1.asm
+++ b/mpn/mips32/mul_1.asm
@@ -31,6 +31,13 @@
 
 include(`../config.m4')
 
+C            cycles/limb
+C 4KEc       12.22
+C 24Kc       14.03
+C 24KEc      14.05
+C P5600       7.05
+C XBurst     13.05
+
 C INPUT PARAMETERS
 C res_ptr      $4
 C s1_ptr       $5
diff -r ed6ddbb7a15b -r 0ba6f9f13912 mpn/mips32/submul_1.asm
--- a/mpn/mips32/submul_1.asm
+++ b/mpn/mips32/submul_1.asm
@@ -31,6 +31,13 @@
 
 include(`../config.m4')
 
+C            cycles/limb
+C 4KEc       16.33
+C 24Kc       18.04
+C 24KEc      18.05
+C P5600       9.05
+C XBurst     17.05
+
 C INPUT PARAMETERS
 C res_ptr      $4
 C s1_ptr       $5
Re: Suggested tune/tuneup.c patch
  And please take a screenshot of the affected parameters before and
  after this change, as a sanity check.

I added a history preservation feature to the .../devel/thres/ pages.
At 23:59 each night, all pages are copied to .../devel/thres/-MM-DD.
(There is no index, one needs to type in the wanted date manually.)

-- 
Torbjörn
Please encrypt, key id 0xC8601622
Re: hgcd1/2
ni...@lysator.liu.se (Niels Möller) writes:

Sounds good. Give it a few days, and delete if it still looks slow
everywhere.

I cooked a modern alternative:

static mp_double_limb_t
div1 (mp_limb_t n0, mp_limb_t d0)
{
  mp_double_limb_t res;
  int ncnt, dcnt, cnt;
  mp_limb_t q;
  mp_limb_t mask;

  ASSERT (n0 >= d0);

  count_leading_zeros (ncnt, n0);
  count_leading_zeros (dcnt, d0);
  cnt = dcnt - ncnt;

  d0 <<= cnt;

  q = -(mp_limb_t) (n0 >= d0);
  n0 -= d0 & q;
  d0 >>= 1;
  q = -q;

  while (--cnt >= 0)
    {
      mask = -(mp_limb_t) (n0 >= d0);
      n0 -= d0 & mask;
      d0 >>= 1;
      q = (q << 1) - mask;
    }

  res.d0 = n0;
  res.d1 = q;
  return res;
}

That should use the same choices for div1 and div2.

Is it important to inline some or all of the div1/div2 code? If we move
them out to separate .c files, it gets easier to share between hgcd2
and hgcd2_jacobi, and easier to experiment with assembly div1. But it
gets a bit more complex to set things up for tuneup.

Not considering tuneup hardness, I suppose inlining makes some sense,
at least of _METHOD 1 which often ends up as a single instruction.

I've never been a big fan of inlining.

Small operands GCD in GMP is special in many ways, as it is hundreds of
times slower than multiplication. We really need to try hard to improve
it.

-- 
Torbjörn
Please encrypt, key id 0xC8601622
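To try the branch-free div1 idea outside the GMP tree, a standalone sketch along the following lines may help. It is not the GMP code itself: limb_t, double_limb_t and div1_sketch are hypothetical stand-ins for mp_limb_t, mp_double_limb_t and div1, the limb is fixed at 32 bits, and count_leading_zeros is replaced by __builtin_clz, which assumes GCC or Clang with a 32-bit unsigned int.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t limb_t;                          /* stand-in for mp_limb_t */
typedef struct { limb_t d0, d1; } double_limb_t;  /* d0 = remainder, d1 = quotient */

/* Shift-and-subtract division using masks instead of branches.
   Requires d0 != 0 and n0 >= d0.  */
static double_limb_t
div1_sketch (limb_t n0, limb_t d0)
{
  double_limb_t res;
  limb_t q, mask;
  int cnt;

  assert (d0 != 0 && n0 >= d0);

  /* Align the top bit of d0 with the top bit of n0.  */
  cnt = __builtin_clz (d0) - __builtin_clz (n0);
  d0 <<= cnt;

  /* First quotient bit: the mask is all ones when n0 >= d0.  */
  q = -(limb_t) (n0 >= d0);
  n0 -= d0 & q;
  d0 >>= 1;
  q = -q;

  /* One further quotient bit per remaining shift position.  */
  while (--cnt >= 0)
    {
      mask = -(limb_t) (n0 >= d0);
      n0 -= d0 & mask;
      d0 >>= 1;
      q = (q << 1) - mask;  /* subtracting an all-ones mask adds 1 */
    }

  res.d0 = n0;
  res.d1 = q;
  return res;
}
```

For example, div1_sketch (7, 2) yields quotient 3 in d1 and remainder 1 in d0. The loop count depends only on the difference in bit lengths of the operands, not on the quotient bits themselves.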
Re: Suggested tune/tuneup.c patch
ni...@lysator.liu.se (Niels Möller) writes:

  Below patch adds a helper function for tuning *_METHOD values,
  evaluated at some fixed size. What do you think?

It helps with some function comments, outlining what a function does,
and what its arguments mean. Please consider adding that before
committing.

And please take a screenshot of the affected parameters before and
after this change, as a sanity check.

-- 
Torbjörn
Please encrypt, key id 0xC8601622