from:"Niels Möller"

Re: [PATCH 09/20] Prepare for creating hidden aliases of all routines

2013-03-04 Thread Niels Möller

Richard Henderson r...@twiddle.net writes:

 index 1b27998..ff0dc45 100644
 --- a/gmp-h.in
 +++ b/gmp-h.in
 @@ -251,6 +251,10 @@ typedef __mpq_struct *mpq_ptr;
   __GMP_PUBLIC_DATA   - for declaring data variables
   __GMP_PUBLIC_ALIAS  - for re-declaring symbols with another name
  
 +   When using GCC and building an ELF shared library, we'll arrange for
 +   the rest of GMP to see a hidden alias.  The true public symbol will
 +   be created in the individual source files.
 +

This is supposed to be an implementation detail, right? I.e., it should
depend only on if gmp *itself* is compiled with gcc, and work the same
no matter which compiler is used for the application *using* gmp (and
which will be processing the public gmp.h header)?

I admit I don't understand the fine details here.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: [PATCH 01/20] Delete mpn/generic/sizeinbase.c

2013-03-04 Thread Niels Möller

Richard Henderson r...@twiddle.net writes:

 As far as I can tell it hasn't been used since

 2002-02-09  Kevin Ryde  ke...@swox.se

 * configure.in, mpn/Makefile.am, gmp-impl.h (mpn_sizeinbase): Remove.
 * mpn/generic/sizeinbase.c: Remove file.

 removed it from MPN_OBJECTS.  It's certainly never built.

Does anyone remember why it was deleted back then? I think it makes a
lot of sense as a public mpn function.

The file has seen some non-trivial updates by Torbjörn and Marco in
recent years.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: [PATCH 00/20] Create and use hidden aliases in libgmp.so

2013-03-04 Thread Niels Möller

Richard Henderson r...@twiddle.net writes:

 Although I'm frankly a bit confused as to why we're using
 m4 for the assembly macro-isation, as opposed to the C preprocessor.

Different taste, I guess. I don't know why GMP moved to m4, but
personally, I wouldn't use cpp for anything that isn't C code, it's too
deeply tied to the notion of a C token.

And I think the additional power of m4 is used in gmp in a couple of
places, e.g., ifelse tests applied to macro arguments, and macros
expanding to macro definitions. And I don't think the macrology would
get much prettier with cpp-style token pasting...

 In particular, when -g is passed through from gcc to gas, gas sees and
 records the # file line markers that the preprocessor leaves, and puts
 that into the auto-generated line number debugging info.  With our m4
 scheme we get no such # markers and instead gas records line number
 info assocated with a no longer extant temp file.

The non-existant temp file definately is annoying, you have to comment
out a rm command in the m4-ccas script when debugging gmp assembly
files.

As for #line markers, you can get them with m4 -s (not sure if it's a
GNUism or traditional m4). But when using non-trivial M4 macros, I find
it's often more helpful to have gdb guide you in the .s file after macro
expansion.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: [PATCH 00/20] Create and use hidden aliases in libgmp.so

2013-03-05 Thread Niels Möller

David Miller da...@davemloft.net writes:

 And it causes the debugging problem Richard mentioned too.  I really
 want to step in the orignal source file, the thing I'm going to edit
 to fix the bug, not some intermediate file.

Does it help to just add -s to the m4 invocation?

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: [PATCH 00/20] Create and use hidden aliases in libgmp.so

2013-03-05 Thread Niels Möller

Richard Henderson r...@twiddle.net writes:

 But perhaps more importantly, everyone who programs in C understands
 how the preprocessor works.

Actually, I think that's incorrect.

Everyone has some *familiarity* with the C preprocessor, which surely is
an advantage. And maybe most C programmers think they they understand
it. But in my experience, very few understand the fine details of cpp
macro expansion.

E.g., you probably wrote those GLUE and GLU1 macros without any
difficulty, but understanding how that works and why really makes you a
Great Guru to the vast majority of C programmers.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: [PATCH 00/20] Create and use hidden aliases in libgmp.so

2013-03-05 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 ni...@lysator.liu.se (Niels Möller) writes:
   
 I would expect #line to cause syntax problem for macines where # is not
 a comment charachter.  Like ARM, where #17 is the small constant
 argument 17.

GNU as on my arm doesn't complain about #line as generated by m4 -s. Not
sure if it generates any useful debug info from it, though. But I
imagine one might need some configure check to disable -s under certain
circumstances.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: [PATCH 00/20] Create and use hidden aliases in libgmp.so

2013-03-05 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 Ehum, I don't understand with which cpp quirk that indirection is
 coping...

The point of the indirection is to get macro arguments expanded *before*
substitution. Which matters only (I think) when using the # and ## cpp
operators. Example: gcc -E on this file

  #define S1(x) prefix_##x
  #define S(x) S1(x)
  #define FOO bar
  S(FOO)
  S1(FOO)

produces

  prefix_bar
  prefix_FOO

 Perhaps we should use m4 for preprocessing C instead?  :-)

That would certainly cause some additional confusion. Any suggestion for
appropriate m4 quote characters to use? ;-)

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: gmp-devel list

2013-03-06 Thread Niels Möller

Zimmermann Paul paul.zimmerm...@loria.fr writes:

 the gmp-devel list is for Technical discussions between developers. We have
 seen recently several patches posted, which I believe do no match the list
 definition. If there is no other way to transfer source code, maybe one should
 create a gmp-patch list?

We need a forum for both posting patches, and discussing the same
patches. To me, it seems easiest to have both on the same list.

I take it the *huge* patch containing the generated configure script was
a mistake.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: [PATCH 2/2] Optimize 64-bit mpn_add_N and mpn_sub_N for sparc T3 and later.

2013-03-07 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 I optimised submul_1.asm, and then edited both addmul_1 and submul_1 to
 use as similar operand order as possible.

So remaining differences are necessary? I don't remember much sparc
assembly, but it seems carry handling is done slightly differently. But
appearantly there was no need for negation-on-the-fly tricks in
submul_1, you just use subcc instead of addcc for the final subtraction
(maybe we discussed that trick in some other context?).

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: mpn_cnd_add_n

2013-03-10 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 OK with me, but either test on powerpc64 before checking in, or keep an
 eye on nightbuild regressions and fix any problem there.

I checked it now, but darn, it did break on powerpc. Register r6 was
used instead of the symbolic name n in a couple of places. With the
following additional changes (complete file in the tree ~nisse/hack/gmp
on shell) it appears to work again.

It would be good if someone who actually understands powerpc assembly
could have a look before I check it in. And maybe it's better to replace
the hardcoded r7 when used in the loop by some symbolic name? Something
like

  define(`ulimb', `r7') C Overlap with n input argument

Regards,
/Niels

--- a/mpn/powerpc64/mode64/aorscnd_n.asmSun Mar 10 10:00:12 2013 +0100
+++ b/mpn/powerpc64/mode64/aorscnd_n.asmSun Mar 10 12:30:31 2013 +0100
@@ -64,11 +64,11 @@
subfic  cnd, cnd, 0
subfe   cnd, cnd, cnd
 
-   rldicl. r0, r6, 0,62C r0 = n  3, set cr0
+   rldicl. r0, n, 0,62 C r0 = n  3, set cr0
cmpdi   cr6, r0, 2
-   addir6, r6, 3   C compute count...
-   srdir6, r6, 2   C ...for ctr
-   mtctr   r6  C copy count into ctr
+   addin, n, 3 C compute count...
+   srdin, n, 2 C ...for ctr
+   mtctr   n   C copy count into ctr
beq cr0, L(b00)
blt cr6, L(b01)
beq cr6, L(b10)
@@ -122,7 +122,7 @@
b   L(ret)
 
 L(b00):CLRCB   C clear/set cy
-L(go): ld  r6, 0(up)   C load s1 limb
+L(go): ld  r7, 0(up)   C load s1 limb
ld  r27, 0(vp)  C load s2 limb
ld  r8, 8(up)   C load s1 limb
ld  r9, 8(vp)   C load s2 limb
@@ -139,8 +139,8 @@
addiup, up, 32
addivp, vp, 32
 
-L(top):ADDSUBC r28, r27, r6
-   ld  r6, 0(up)   C load s1 limb
+L(top):ADDSUBC r28, r27, r7
+   ld  r7, 0(up)   C load s1 limb
ld  r27, 0(vp)  C load s2 limb
ADDSUBC r29, r9, r8
ld  r8, 8(up)   C load s1 limb
@@ -164,7 +164,7 @@
and r0, r0, cnd
bdnzL(top)  C decrement ctr and loop back
 
-L(end):ADDSUBC r28, r27, r6
+L(end):ADDSUBC r28, r27, r7
ADDSUBC r29, r9, r8
ADDSUBC r30, r11, r10
ADDSUBC r31, r0, r12



-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: mpn_cnd_add_n

2013-03-12 Thread Niels Möller

ni...@lysator.liu.se (Niels Möller) writes:

 Here's a patch that reorders the arguments for mpn_addcnd_n and
 mpn_subcnd_n (I think it's best to keep this change separate from the
 renaming, since the potential problems are quite different).

This seems to work now, after additional fixes for powerpc64 and x86_64
w*ndows. Here's the next patch, doing the rename of functions and files.
I generated it with hg diff --git, to get concise diffs for the renamed
files (to apply, I think one need a recent patch program, or use git
apply, where the latter apparently doesn't require any git repo).

I plan to check this in fairly soon.

Regards,
/Niels

diff --git a/ChangeLog b/ChangeLog
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,32 @@
+2013-03-12  Niels Möller  ni...@lysator.liu.se
+
+   New names mpn_cnd_add_n and mpn_cnd_sub_n.
+   * mpn/generic/cnd_add_n.c (mpn_cnd_add_n): Renamed file and
+   function, from addcnd.c:mpn_addcnd_n.
+   * mpn/generic/cnd_sub_n.c (mpn_cnd_sub_n): Renamed, from
+   subcnd.c:mpn_subcnd_n.
+   * mpn/arm/cnd_aors_n.asm: Renamed file, from aorscnd.asm, and
+   renamed functions.
+   * mpn/x86_64/cnd_aors_n.asm: Analogous renaming.
+   * mpn/powerpc64/mode64/cnd_aors_n.asm: Analogous renaming.
+   * gmp-impl.h (mpn_cnd_add_n, mpn_cnd_add_n): Updated prototypes
+   with new names.
+   * configure.ac: Updated for new names.
+   * tests/refmpn.c (refmpn_cnd_add_n): Renamed, from refmpn_addcnd_n.
+   (refmpn_cnd_sub_n): Renamed, from refmpn_subcnd_n.
+   * tests/tests.h (refmpn_cnd_add_n, refmpn_cnd_sub_n): Updated
+   prototypes with new names.
+   * tune/common.c (speed_mpn_cnd_add_n): Renamed, from
+   speed_mpn_addcnd_n, call mpn_cnd_add_n.
+   (speed_mpn_cnd_sub_n): Renamed, from speed_mpn_subcnd_n, call
+   mpn_cnd_sub_n.
+   * tune/speed.h (speed_mpn_cnd_add_n, speed_mpn_cnd_sub_n): Updated
+   prototypes with new names.
+   * tune/speed.c (routine): Updated list with new names.
+   * tests/devel/try.c: Updated for new mpn_cnd_* names.
+   * mpn/generic/sbpi1_div_sec.c: Likewise.
+   * mpn/generic/powm_sec.c: Likewise.
+
 2013-03-12  Torbjorn Granlund  t...@gmplib.org
 
* configure.ac: Add missing to extra_functions for coreibwl.
diff --git a/configure.ac b/configure.ac
--- a/configure.ac
+++ b/configure.ac
@@ -2718,7 +2718,7 @@
   add_n_sub_n addaddmul_1msb0
 
 gmp_mpn_functions=$extra_functions   \
-  add add_1 add_n sub sub_1 sub_n addcnd_n subcnd_n neg com   \
+  add add_1 add_n sub sub_1 sub_n cnd_add_n cnd_sub_n neg com \
   mul_1 addmul_1 submul_1 \
   add_err1_n add_err2_n add_err3_n sub_err1_n sub_err2_n sub_err3_n   \
   lshift rshift dive_1 diveby3 divis divrem divrem_1 divrem_2 \
@@ -2775,7 +2775,7 @@
 tmp_mulfunc=aors_err2_n ;;
   add_err3_n|sub_err3_n)
 tmp_mulfunc=aors_err3_n ;;
-  addcnd_n|subcnd_n) tmp_mulfunc=aorscnd_n   ;;
+  cnd_add_n|cnd_sub_n) tmp_mulfunc=cnd_aors_n   ;;
   addmul_1|submul_1) tmp_mulfunc=aorsmul_1 ;;
   mul_2|addmul_2)tmp_mulfunc=aormul_2 ;;
   popcount|hamdist)  tmp_mulfunc=popham;;
@@ -3268,7 +3268,6 @@
 #undef HAVE_NATIVE_mpn_add_n_sub_n
 #undef HAVE_NATIVE_mpn_add_nc
 #undef HAVE_NATIVE_mpn_addaddmul_1msb0
-#undef HAVE_NATIVE_mpn_addcnd_n
 #undef HAVE_NATIVE_mpn_addlsh1_n
 #undef HAVE_NATIVE_mpn_addlsh2_n
 #undef HAVE_NATIVE_mpn_addlsh_n
@@ -3301,6 +3300,8 @@
 #undef HAVE_NATIVE_mpn_bdiv_dbm1c
 #undef HAVE_NATIVE_mpn_bdiv_q_1
 #undef HAVE_NATIVE_mpn_pi1_bdiv_q_1
+#undef HAVE_NATIVE_mpn_cnd_add_n
+#undef HAVE_NATIVE_mpn_cnd_sub_n
 #undef HAVE_NATIVE_mpn_com
 #undef HAVE_NATIVE_mpn_copyd
 #undef HAVE_NATIVE_mpn_copyi
@@ -3357,7 +3358,6 @@
 #undef HAVE_NATIVE_mpn_sqr_diag_addlsh1
 #undef HAVE_NATIVE_mpn_sub_n
 #undef HAVE_NATIVE_mpn_sub_nc
-#undef HAVE_NATIVE_mpn_subcnd_n
 #undef HAVE_NATIVE_mpn_sublsh1_n
 #undef HAVE_NATIVE_mpn_sublsh2_n
 #undef HAVE_NATIVE_mpn_sublsh_n
diff --git a/gmp-impl.h b/gmp-impl.h
--- a/gmp-impl.h
+++ b/gmp-impl.h
@@ -1555,10 +1555,10 @@
 __GMP_DECLSPEC mp_size_t mpn_powm_sec_itch (mp_size_t, mp_size_t, mp_size_t);
 #define   mpn_tabselect __MPN(tabselect)
 __GMP_DECLSPEC void  mpn_tabselect (volatile mp_limb_t *, volatile 
mp_limb_t *, mp_size_t, mp_size_t, mp_size_t);
-#define   mpn_addcnd_n __MPN(addcnd_n)
-__GMP_DECLSPEC mp_limb_t mpn_addcnd_n (mp_limb_t, mp_ptr, mp_srcptr, 
mp_srcptr, mp_size_t);
-#define   mpn_subcnd_n __MPN(subcnd_n)
-__GMP_DECLSPEC mp_limb_t mpn_subcnd_n (mp_limb_t, mp_ptr, mp_srcptr, 
mp_srcptr, mp_size_t);
+#define   mpn_cnd_add_n __MPN(cnd_add_n)
+__GMP_DECLSPEC mp_limb_t mpn_cnd_add_n (mp_limb_t, mp_ptr, mp_srcptr, 
mp_srcptr, mp_size_t);
+#define   mpn_cnd_sub_n __MPN(cnd_sub_n)
+__GMP_DECLSPEC mp_limb_t mpn_cnd_sub_n (mp_limb_t, mp_ptr, mp_srcptr, 
mp_srcptr, mp_size_t);
 
 #define mpn_sb_div_qr_sec __MPN

Re: mpn_cnd_add_n

2013-03-13 Thread Niels Möller

ni...@lysator.liu.se (Niels Möller) writes:

 I plan to check this in fairly soon.

Checked in this renaming now.

Next, I'd like to make mpn_cnd_add_n, mpn_cnd_sub_n and mpn_zero_p
public (I guess that also implies some documentation...).

mpn_zero_p is an inline function. In gmp.h, setting up inlining looks
pretty hairy, with __GMP_EXTERN_INLINE defined in a couple of different
ways depending on compiler.

On the other hand, gmp-impl.h seems to simply use static inline,
unconditionally. If this is good enough for all supported compilers,
maybe it's good enough also in gmp.h? Potential problems:

* We may still need some __GMP_FORCE_* logic for binary compatibility
  with applications which expect the functions to be exported.

* We might break pointer equality (different compilation units might get
  different copies of non-inlined versions). I haven't thought very
  carefully about it, but I guess we already have that problem for
  compilers where __GMP_EXTERN_INLINE expands to static inline (DEC,
  SCO, and SunPro C compilers). If it is a problem. I think this can be
  worked around with some macro indirection to make sure that pointers
  will refer to a unique function in the library.

Anyway, unless someone things cleaning this up now is important, I guess
I'll try to do mpn_zero_p in the same way as other inline functions in
gmp.h.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

ARM neon pseudo op

2013-04-02 Thread Niels Möller

On my pandaboard (with a cortex-a9), I run Debian GNU/Linux, and the
assembler calls itself

  $ as --version
  GNU assembler (GNU Binutils for Debian) 2.22

It refuses to assemble the new shiny mpn/arm/neon/*.asm files. With
somewhat confusing error messages like

  tmp-lshift.s: Assembler messages:
  tmp-lshift.s:111: Error: selected processor does not support ARM mode 
`vdup.32 d6,r3'
  tmp-lshift.s:113:  gcc -std=gnu99 -c -DHAVE_CONFIG_H -I. 
-I/home/nisse/hack/gmp/mpn -I.. -D__GMP_WITHIN_GMP -I/home/nisse/hack/gmp 
-DOPERATION_rshift -marm -O2 -pedantic -fomit-frame-pointer -march=armv7-a 
-mtune=cortex-a9 -g -Wa,--noexecstack tmp-rshift.s -o rshift.o
  Error: selected processor does not support ARM mode `vdup.32 d7,r3'
  tmp-lshift.s:119: Error: selected processor does not support ARM mode 
`vshl.u64 d18,d19,d7'

I had to patch the files like

diff -r ea84e60fd6dc mpn/arm/neon/lorrshift.asm
--- a/mpn/arm/neon/lorrshift.asmTue Apr 02 07:40:32 2013 +0200
+++ b/mpn/arm/neon/lorrshift.asmTue Apr 02 11:32:05 2013 +0200
@@ -21,6 +21,8 @@ dnl  along with the GNU MP Library.  If 
 
 include(`../config.m4')
 
+   .fpuneon 
+ 
 C   cycles/limb cycles/limb cycles/limb  good 

to convince the assembler. Not sure if it's best to do this at the top
of each file, or if it should somehow be handled by ASM_START.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: ARM neon pseudo op

2013-04-02 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 I need the first few dozen lines from the configure output to have a
 guess about what might go wrong.  You might actually compare the output
 to that of panda.gmplib.org yourself.

 http://gmplib.org/devel/testmachines/build/success/panda.gmplib.org-stat:standard.txt

Hmm, it's supposed to be a flag (-mfpu=neon) on the gcc command line?
Then it's my fault... I had set CFLAGS manually to what gmp's configure
selected for me some month or two ago, plus a -g.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: ARM public key benchmark

2013-04-02 Thread Niels Möller

ni...@lysator.liu.se (Niels Möller) writes:

 I'm not yet using GMP's mpn_cnd_{add,sub}_n, that's the next thing I'd
 like to try.

That wasn't a clear win... I use addmul_1 and submul_1 as a fallback
(and I always do in-place operation, so that works). Now, cnd_sub_n
beats submul_1 (except for n == 2, which I don't use):

$ GMP_CPU_FREQUENCY=1e9 ./speed -C -s 1-10,100 mpn_submul_1.1 mpn_cnd_sub_n
clock_gettime is 1.000ns accurate
overhead 8.87 cycles, precision 1000 units of 1.00e-06 secs, CPU freq 1000.00 
MHz
mpn_submul_1.1 mpn_cnd_sub_n
1#19.8927   21.6831
2#10.9752   12.4106
3  9.5514   #8.9371
4  8.5227   #6.6696
5  7.8316   #6.7412
6  7.1571   #6.0339
7  7.2859   #5.3320
8  6.8553   #4.8715
9  6.6945   #5.0376
10 6.3129   #4.8351
1005.5065   #3.2110

But for addition, mpn_addmul_1 beats mpn_cnd_add_n for many small sizes,

$ GMP_CPU_FREQUENCY=1e9 ./speed -C -s 1-10,100 mpn_addmul_1.1 mpn_cnd_add_n
clock_gettime is 1.000ns accurate
overhead 8.94 cycles, precision 1000 units of 1.00e-06 secs, CPU freq 1000.00 
MHz
mpn_addmul_1.1 mpn_cnd_add_n
1#19.8927   21.2256
2#10.8574   11.6940
3 #8.02358.5240
4 #6.45616.5216
5 #6.03086.5071
6 #5.49375.9282
7 #5.20635.3603
8  4.8838   #4.7493
9 #4.92494.9533
10#4.53644.8244
1003.4846   #3.2842

Some questions:

1. I guess one can expect submul_1 to always be a bit slower than
   addmul_1, since submul_1 needs additional arithmetics besides the
   umaal? One could perhaps do some negations on the fly, a - b C = -
   ((-a) + b*C), maybe that would be advantageous?

2. cnd_add_n should be at least as fast as addmul_1, shouldn't it? It
   appears to be 0.25 c/l faster for larger operands, so maybe it's only
   a question of optimizing loop setup and feedin?

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: ARM public key benchmark

2013-04-03 Thread Niels Möller

ni...@lysator.liu.se (Niels Möller) writes:

 So it should be doable with the addmul_1 loop and two additional,
 non-recurrency, not instructions per limb, and then maybe some extra
 logic for the return value. One could aim for 4.25 c/l, I guess.

The below seems to give correct results. But still 5.25 c/l. Maybe
scheduling can be improved, I just put the new mvn instructions
immediately preceding umaal and str.

Regards,
/Niels

dnl  ARM mpn_submul_1.

dnl  Copyright 2012 Free Software Foundation, Inc.

dnl  This file is part of the GNU MP Library.

dnl  The GNU MP Library is free software; you can redistribute it and/or modify
dnl  it under the terms of the GNU Lesser General Public License as published
dnl  by the Free Software Foundation; either version 3 of the License, or (at
dnl  your option) any later version.

dnl  The GNU MP Library is distributed in the hope that it will be useful, but
dnl  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
dnl  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
dnl  License for more details.

dnl  You should have received a copy of the GNU Lesser General Public License
dnl  along with the GNU MP Library.  If not, see http://www.gnu.org/licenses/.

include(`../config.m4')

Ccycles/limb
C StrongARM: -
C XScale -
C Cortex-A7  ?
C Cortex-A8  ?
C Cortex-A9  5.25
C Cortex-A15 ?

C TODO
C  * Micro-optimise feed-in code.
C  * Optimise for n=1,2 by delaying register saving.
C  * Try using ldm/stm.

define(`rp',`r0')
define(`up',`r1')
define(`n', `r2')
define(`v0',`r3')

ASM_START()
PROLOGUE(mpn_submul_1)
stmfd   sp!, { r4, r5, r6, r7 }

andsr6, n, #3
mov r12, #0
beq L(fi0)
cmp r6, #2
bcc L(fi1)
beq L(fi2)

L(fi3): ldr r4, [up], #4
ldr r6, [rp, #0]
ldr r5, [up], #4
b   L(lo3)

L(fi0): ldr r5, [up], #4
ldr r7, [rp], #4
ldr r4, [up], #4
b   L(lo0)

L(fi1): ldr r4, [up], #4
ldr r6, [rp], #8
subsn, n, #1
beq L(1)
ldr r5, [up], #4
b   L(lo1)

L(fi2): ldr r5, [up], #4
ldr r7, [rp], #12
ldr r4, [up], #4
b   L(lo2)

ALIGN(16)
L(top): ldr r6, [rp, #-8]
ldr r5, [up], #4
mvn r7, r7
str r7, [rp, #-12]
L(lo1): mvn r6, r6
umaal   r6, r12, r4, v0
ldr r7, [rp, #-4]
ldr r4, [up], #4
mvn r6, r6
str r6, [rp, #-8]
L(lo0): mvn r7, r7
umaal   r7, r12, r5, v0
ldr r6, [rp, #0]
ldr r5, [up], #4
mvn r7, r7
str r7, [rp, #-4]
L(lo3): mvn r6, r6
umaal   r6, r12, r4, v0
ldr r7, [rp, #4]
ldr r4, [up], #4
mvn r6, r6
str r6, [rp], #16
L(lo2): mvn r7, r7
umaal   r7, r12, r5, v0
subsn, n, #4
bhi L(top)

ldr r6, [rp, #-8]
mvn r7, r7
str r7, [rp, #-12]
L(1):   mvn r6, r6
umaal   r6, r12, r4, v0
mvn r6, r6
str r6, [rp, #-8]
mov r0, r12
ldmfd   sp!, { r4, r5, r6, r7 }
bx  lr
EPILOGUE()

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: ARM public key benchmark

2013-04-03 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 Have you considered complementing C instead?

Not until now. Actually looks nice:

  A - b C = A + b (~C) + b - b B^n 

So this saves one not instruction, and we have to add and subtract the
scalar b from incoming and outgoing carry.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: ARM public key benchmark

2013-04-04 Thread Niels Möller

ni...@lysator.liu.se (Niels Möller) writes:

 I'll also try using fewer updates of the up pointer, that seems to save
 half a cycle, and could perhaps speed up addmul_1 too.

No speedup for addmul_1, unfortunately, but a saving for submul_1. Here
are new versions of both files (for mpn/arm/v6). I wonder if this
submul_1 complement trick is useful on some other platforms too, e.g.,
64-bit sparc?

Running at 3.25 and 3.9 c/l on A9:

$ GMP_CPU_FREQUENCY=1e9 ./speed -s1-1000 -f 1.2  -C mpn_addmul_1.17 
mpn_submul_1.17
clock_gettime is 1.000ns accurate
overhead 8.98 cycles, precision 1000 units of 1.00e-06 secs, CPU freq 1000.00 
MHz
mpn_addmul_1.17 mpn_submul_1.17
1 19.9985  #18.7576
2 10.9929  #10.8108
3 #7.98808.6664
4 #6.40996.9205
5 #5.92516.3792
6 #5.43846.1228
7 #5.23365.9634
8 #4.86485.4154
9 #4.83595.2633
10#4.54235.2216
12#4.31224.8634
14#4.18764.8111
16#4.08814.6616
19#4.00454.5861
22#3.85924.4916
26#3.71914.4362
31#3.71944.3459
37#3.64374.2051
44#3.83684.3953
52#3.50394.1120
62#3.64974.2448
74#3.60154.1965
88#3.53764.1343
105   #3.50874.0832
126   #3.49884.1222
151   #3.43973.9997
181   #3.39003.9654
217   #3.34833.9500
260   #3.29943.9175
312   #3.33133.8723
374   #3.31733.8869
448   #3.26573.9315
537   #3.27993.9140
644   #3.30773.8574
772   #3.26213.9059
926   #3.26733.8317

dnl  ARM mpn_submul_1.

dnl  Copyright 2012, 2013 Free Software Foundation, Inc.

dnl  This file is part of the GNU MP Library.

dnl  The GNU MP Library is free software; you can redistribute it and/or modify
dnl  it under the terms of the GNU Lesser General Public License as published
dnl  by the Free Software Foundation; either version 3 of the License, or (at
dnl  your option) any later version.

dnl  The GNU MP Library is distributed in the hope that it will be useful, but
dnl  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
dnl  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
dnl  License for more details.

dnl  You should have received a copy of the GNU Lesser General Public License
dnl  along with the GNU MP Library.  If not, see http://www.gnu.org/licenses/.

include(`../config.m4')

Ccycles/limb
C StrongARM: -
C XScale -
C Cortex-A7  ?
C Cortex-A8  ?
C Cortex-A9  3.9
C Cortex-A15 ?

C TODO
C  * Micro-optimise feed-in code.
C  * Optimise for n=1,2 by delaying register saving.
C  * Try using ldm/stm.

define(`rp',`r0')
define(`up',`r1')
define(`n', `r2')
define(`v0',`r3')

ASM_START()
PROLOGUE(mpn_submul_1)
stmfd   sp!, { r4, r5, r6, r7 }

andsr6, n, #3
mov r12, v0
beq L(fi0)
cmp r6, #2
bcc L(fi1)
beq L(fi2)

L(fi3): ldr r4, [up], #12
mvn r4, r4
ldr r6, [rp, #0]
ldr r5, [up, #-8]
b   L(lo3)

L(fi0): ldr r5, [up], #16
mvn r5, r5
ldr r7, [rp], #4
ldr r4, [up, #-12]
b   L(lo0)

L(fi1): ldr r4, [up], #20
mvn r4, r4
ldr r6, [rp], #8
subsn, n, #1
beq L(1)
ldr r5, [up, #-16]
b   L(lo1)

L(fi2): ldr r5, [up], #8
mvn r5, r5
ldr r7, [rp], #12
ldr r4, [up, #-4]
b   L(lo2)

ALIGN(16)
L(top): ldr r6, [rp, #-8]
ldr r5, [up], #16
str r7, [rp, #-12]
L(lo1): umaal   r6, r12, r4, v0
mvn r5, r5
ldr r7, [rp, #-4]
ldr r4, [up, #-12]
str r6, [rp, #-8]
L(lo0): umaal   r7, r12, r5, v0
mvn r4, r4
ldr r6, [rp, #0]
ldr r5, [up, #-8]
str r7, [rp, #-4]
L(lo3): umaal   r6, r12, r4, v0
mvn r5, r5
ldr r7, [rp, #4]
ldr r4, [up, #-4]
str r6, [rp], #16
L(lo2): umaal   r7, r12, r5, v0
mvn r4, r4
subsn, n, #4
bhi L(top)

ldr r6, [rp, #-8]
str r7, [rp, #-12]
L(1):   umaal   r6, r12, r4, v0
str r6, [rp, #-8]
sub r0, v0, r12
ldmfd   sp!, { r4, r5, r6, r7 }
bx  lr
EPILOGUE()
dnl  ARM mpn_addmul_1.

dnl  Copyright 2012 Free Software Foundation, Inc.

dnl  This file is part of the GNU MP Library.

dnl  The GNU MP Library is free

Re: ARM public key benchmark

2013-04-04 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 I sometimes get better A9 performance with *discrete* pointer updates,
 not one-out-of-four autoincrement pointer updates like used here.  I
 think the code you started with had that one-out-of-four trick for str,
 already?

Right, it uses a single update of rp (which is used for both loads and
stores), I just changed it to handle up updates in a similar way.

 I had on the other hand not realised David's ones complement + pre-invert
 carry trick.

Not sure I understand what you are referring to here. I haven't been
following the sparc developments very closely (and I don't remember much
of sparc assembly).

 Cool!  Looks like it is actually faster than 3.9 for some
 alignments/sizes.

It seems one iteration takes 15.5 cycles. I guess that means that even
and odd iterations are executed differently? It would be nice to get it
down to 15 cycles (3.75 c/l) (the addmul_1 iteration takes 13 cycles,
there's no good reason the four additional mvn instructions should cost
more than two cycles). But I find instruction scheduling both very hard
and tedious.

 Did you time this on some other CPU too?

No. When I get home (I don't log in to the gmp machines from the office
network), I might get time to try it on the appropriate gmp machine.

 1. Use descrete ptr updates for up and/or rp.

Maybe. Costs additional instructions, but with more freedom on where to
place the pointer increment. Loopmixer would help.

A digression: I'm running Debian GNU/Linux on my pandaboard system. The
Linux way to get access to the instruction counter seems to be via
perf_event_open. However, when I tried it, it seems no hardware-based
events exist (I do get access to the software-based ones though, so the
interface is partially working). Also, clock_get_time with
CLOCK_PROCESS_CPUTIME_ID gives very poor accuracy, so maybe the entire
high res timers-subsystem is non-working. Any clues on where to look
for solving this problem is appreciated. The obvious (to me) things seem
to be enabled in the kernel config

  $ zgrep 'PERF_EV\|HIGH_RES' /proc/config.gz
  CONFIG_HIGH_RES_TIMERS=y
  CONFIG_HAVE_PERF_EVENTS=y
  CONFIG_PERF_EVENTS=y
  CONFIG_HW_PERF_EVENTS=y

And it's no use to even think of porting the loop mixer to arm without
access to cycle-accurate timing.

 2. Move the one-out-of-four autoincrement updates to other ldr/str
insns.

Could try that.

 3. Use ldm/stm.  Often an A9 win.

If we want to schedule loads early, that seems to rule out using a
single ldm to load all values used in the loop. Right? But two ldm,
loading two limbs at a time, could work. stm seems easier. Any changes
of this type will break the current loop setup logic, I'm afraid.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: ARM public key benchmark

2013-04-04 Thread Niels Möller

Richard Henderson r...@twiddle.net writes:

 Looking around the web it seems that what most folks do is write a
 minimal kernel module that toggles the bit that allows userspace
 access to the cycle counter MSRs.

I've seen some code snippets to do that too. I have a dual core system;
one known issue when I tried searching for this some month ago was that
it set the needed register bits only on the core on which the kernel
module happened to be loaded. So that seems like a partial solution
only. And I have no idea about how to write a Linux kernel module that
makes a certain function be executed on all available cores; if I can
find out, I'll definitely try that.

 I havn't fiddled with this myself, because the A15 includes a real
 high-resolution clock, which is accessed via CLOCK_MONOTONIC.

On my system, both CLOCK_MONOTONIC and CLOCK_PROCESS_CPUTIME_ID seem to
tick in units of roughly 30 us. Which is disappointingly poor
resolution.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: ARM public key benchmark

2013-04-04 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 The newer sparc adds 64-bit carrying adds, but they still don't have
 corresponding subtraction instructions.  Se David sets carry before
 entering the loop, and ones complements the subtrahend.

Ah. I think I even suggested that trick, for mpn_sub_n. Not obviously
applicable to submul_1, though, which made me think you were referring
to something different.

 I assume that ldm loads the registers in some secific order, such as
 lowest numbered first.

I guess it's lowest numbered first (and lowest memory address).

But a loop with

  use   r7  
  ldm   up!, {r4,r5,r6,r7}
  use   r4

looks like poor scheduling betwen load of r4 and use of it, and the ldm
can't be moved earlier since it clobbers r7. But I have a pretty vague
idea about how this really works.

 Then, it could lift the screboard bit for
 availale register values while ldm executes.

I'm not sure if I understand, but if ldm is executed in the background,
with r4 loaded first, there's still tight between load of r4 and use.
From what I've learnt so far about A9 scheduling, I'd want to place the
load of r4 before the last use of r7, and then I can't use ldm.

 Using ldm with just two register might be pointless.  Also, it will for
 50% of alignments take 2 cycles.  Doing three registers is (as we've
 discussed in the past) more applealing.

Right, rewriting the loop with 3-way unrolling would be an interesting
experiment. But I don't think I'll look into that soon.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: Better tabselect

2013-04-10 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 The multitude and pattern of mask computations make side channel leakage
 worse if the mask computation is made stupidly.  I don't trust compilers
 here, since they might use a conditional move or other leaky method.

One possible variant,

  LIMB_HIGHBIT_TO_MASK ( ~((i-k) | (k-i)))

BTW, a comment on LIMB_HIGHBIT_TO_MASK, which I have intended to write
for some time. Current definition uses a condition when arithmetic shift
is unavailable,

#define LIMB_HIGHBIT_TO_MASK(n) \
  (((mp_limb_signed_t) -1  1)  0 \
   ? (mp_limb_signed_t) (n)  (GMP_LIMB_BITS - 1)  \
   : (n)  GMP_LIMB_HIGHBIT ? MP_LIMB_T_MAX : CNST_LIMB(0))

I'd suggest changing the else part to a shift + negation,

   : - ((n)  (GMP_LIMB_BITS - 1))

But I don't know if there's still any C implementation which lack
arithmetic shift, so I don't know how to test that change.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: mini-gmp mpz_addmul_ui and mpz_submul_ui

2013-04-17 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 ni...@lysator.liu.se (Niels Möller) writes:

   I'm considering adding mpz_addmul_ui and mpz_submul_ui to mini-gmp.
   
 OK...due to popular demand?

I just added mini-gmp in nettle. I don't use it for the librry itself (I
might make that an option later), but for generating ecc-related tables
at build time. This simplifies cross compilation, since the program in
question is run on the build system, so by using mini-gmp, I get away
without depending on gmp being installed in the build system, and
without any configure tests for figuring out how to link with it.

To do this, I had to replace gmp_printf by calls to mpz_out_str (which
seems unavoidable; I don't want a printf implementation in mini-gmp),
but I also had to replace a call to mpz_submul_ui, which seems easy
enough to implement.

 I have no strong opinion, but we should not forget the meaning of the
 mini moniker.

Right. I'm leaning towards the simpler implementation. Avoiding some
temporary storage is not terribly important for the sizes mini-gmp is
intended for.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: Public mpz_ptr and mpz_srcptr

2013-05-02 Thread Niels Möller

ni...@lysator.liu.se (Niels Möller) writes:

 Here's another possible use of mpz pointers:

And another one, I just stumbled upon: Passing mpz_t to varargs
functions,

  mpz_srcptr n = va_arg(args, mpz_srcptr);

(I just edited that line to use mpz_srcptr. When I wrote the previous
version 10 years ago I used to be const MP_INT *, since that seemed to
be a more supported piece of GMP API. But mpz_srcptr looks less ugly).

Is there any other supported way to do that?

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: _basecase or _sec? [

2013-05-02 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 I see the need of the following:

   function
   
   mul
   gcdext
   add, sub

mod (div_r_sec) is more important than general division. And modular
inverse is more important than general gcdext.

I've also seen some need for add_1/sub_1.

Modular inverse is a bit tricky, I have an implementation (at
http://git.lysator.liu.se/nettle/nettle/blobs/master/sec-modinv.c) which
is some 50 time slower than mpn_gcdext. As far as I'm aware, this is a
novel algorithm. I think it could be extended to return the gcd and/or
a success/fail indication without leaking.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: _basecase or _sec? [

2013-05-02 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 Hmm...can one use Newton for any a^(-1) mod b (when that exists)?  (I
 listed gcdext for the sake of inverses, and omitted gcd.)

I think Newton analogues exist only when b is a power, not in general.
And the most important case is prime b.

 I am not familiar with the Jebelean (or Möller!) criteria for gcd, but
 my intuition suggests that there exists a O(k)-bit matrix which reduces
 k bits from the operands.  And the same intuition suggests that it can
 be computed from O(k) high bits.  Am I wrong?

You need to have roughly *2k* bits of each operand to make k bits
progress. And you get into trouble at least in the case that those 2k
bits are equal (you need to subtract one operand from the other, but
which depends on the other bits).

Let me think aloud.

As a start, one should try to construct an algorithm which makes (at
least) two bits of progress per iteration, with all problematic special
cases handled in a side-channel silent manner. I looked into it briefly
before writing my current code, but I couldn't find anything easy.

And I'm afraid the table lookup even for just two bits of progress needs
5 bits from each operand, i.e., 1024 entries, which might leak
information through the cache instead.

Right-to-left algorithms will likely have easier bookkeeping than
left-to-right algorithms.

Instead of a lehmer-like algorithms which eliminates k bits of each
operand using a vector/matrix multiply, it might be better (smaller
table) to eliminate k bits from the larger operand only. But then we
need to keep track of which operand is largest. I think this simplest
appproach was the one I looked a bit into, and it didn't seem very
promising. I think one can always eliminate two bits, by setting

  m -- min(a,b)
  M -- max(a,b)
  a -- (M + k m) / 4
  b -- m
  
where k = ± 1, which ever is needed to make M + k m a multiple of 4. And
then some additional logic for the case of even numbers (if M = 0 (mod
4), just set k = 0, and if M = 2 (mod 4), set k = 2). But to go beyond
two bits, one would need something like k-ary gcd, which might introduce
false factors.

The k \in {0, 1, 2, -1} could be a conditional subtraction and addition

1.  Subtract: d -- a - [a odd] b   (if a odd)
2.  Swap: b -- a, d -- -d (if underflow above, d  0) 
3.  Add:  a -- a + 2 b (under certain conditions)
4.  Shift:a -- a / 4   Always

Current code does (1) and (2) only (and a shift by one bit).

If one wants a single addmul_1, one could choose k \in {0,1,2,3}
instead, but one still *needs* to take sign (a-b) into account; it's no
use to add the larger operand to the smaller. Analysis of the guaranteed
progress after n iterations is going to get more complicated with k  0.

For any variant, I think the iteration needs to take into account:

  Low bits of a and b (one can try to maintain b odd at all times, but
  one needs to handle the case of even a). Or high bits for a
  left-to-right algorithm, but I think that's going to be less
  convenient.

  Sign of (a - b).

The straight-forward way to determine the sign of (a - b) is to actually
compute the difference. The result of this subtraction is directly
useful for the plain binary algorithm, but a bit wasted if we try
something more sophisticated.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: _basecase or _sec? [

2013-05-03 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 I think it exists also for b can be factorised into prime powers...

Sure, any factorization can be used with CRT, but it's only powers which
allow newton-like hensel-lifts, as far as I'm aware.

And the most important case is when b is prime.

   You need to have roughly *2k* bits of each operand to make k bits
   progress. And you get into trouble at least in the case that those 2k
   bits are equal (you need to subtract one operand from the other, but
   which depends on the other bits).
   
 We don't need to insist on keeping operands positive.

Hmm. In general, one needs to replace the largest number, to make
progres. But I guess in the case of many high bits being equal, it might
not matter.

 My thought was not to have a table, but plain loop which executed a
 limb-size dependent number of iterations, and has not internal branches.

I see. Makes some sense. To get a nice progress guarantee, one such
large iteration would need to apply the matrix, and then do a small
number of additional reductions.

 I have not looked at right-to-left algorithms for gcdext, at least not
 in a very long time.  Do they resolve gcd(a,b) = ax + by or something
 like gcd(a,b)*2^t = ax + by?

I haven't looked at more sophisticated right-to-left gcd for a long time
(Stehlé and Zimmermann). But the plain binary algorithm shifts out the
zero bits as they appear. And the co-factors (to-be inverse) are divided
by 2 (mod b) in each step, by a shift and a conditional add.

It's possible, and it might be more efficient, to divide the cofactors
by 2^k (mod b) at the end, instead of one bit at a time.

 Right, answering my question above.  Cannot false factors of 2 be
 handled?

As long as the iteration only makes the update a -- a + k b, no false
factors can appear. The problem is the k-ary algorithm (Weber's
accelarated gcd), which uses an update of the form a -- u a + v b, with
|u| != 1.

And as for factors of two, I have only considered inversion modulo an
odd b.

 I suspect non-leaky code has to compute the sign as a full subtraction.

It should be possible to do a non-leaky loop examining all limbs,
without carry propagation. Just doing comparisions a[k]  b[k] and a[k]
== b[k], combining results with logic operations. No idea if that can be
faster than a plain subtraction.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: Changes to mini-gmp and 5.1.2

2013-05-17 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 Should we move any of the mini-gmp changes to 5.1.2?

I think the following would make sense to include:

2013-02-25  Niels Möller  ni...@lysator.liu.se

* mini-gmp/tests/t-double.c (testmain): Declare double variables
as volatile, to drop extended precision.

* mini-gmp/tests/testutils.c (testfree): New function. Use it
everywhere where test programs deallocate storage allocated via
the mini-gmp allocation functions, including uses of mpz_get_str
for various test failure messages.

IIRC, the second change fixes crashes when displaying messages about
failing tests.

2013-02-19 Marco Bodrato bodr...@mail.dm.unipi.it

* mini-gmp/mini-gmp.c: Move asserts to work-around a compiler bug.

* mini-gmp/tests/t-reuse.c: Fix typo causing the same negation condition
to be applied to all operands. (See 2013-02-03, Torbjorn)

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: caching of transforms used for large multiplications

2013-06-12 Thread Niels Möller

Daniel Lichtblau d...@wolfram.com writes:

 My question(s): Are these NTT intermediate results cached?

No. I have a tentative interface for this in my small-prime fft code
which haven't yet been integrated with gmp. I think Paul Zimmermann may
also have some patches along those lines for the Schönhage-Strassen fft
which is used in GMP now.

 If not, is there a way to tell GMP to keep them around for
 subsequent reuse? I suspect such caching might be a generally useful
 thing to have if it is not already present.

Interfaces for operations on invariant numbers have been discussed for a
long time, but no public interfaces have been implemented (and few
internal). Caching transforms may be useful also for smaller
multiplications (I think I saw a speedup of a few % already with
Karatsuba, when I played with that). And for division for all sizes.

 For fast GCD it could amount to a notable speed gain, I think. (I
 believe Niels remarked specifically on this but do not have the
 reference handy. Sorry.)

In GCD, there are certain large unbalanced multiplication. Splitting the
large factor and reusing the transform of the small factor is one
possible approach to optimization. Another is transforming the 2x2
matrix elements once (since they're used twice for a matrix x vector
multiplication), that's done in the small-prime fft code. And yet
another is using wraparound arithmetic. (And there are surely other
tricks as well).

Hmm, looking at the GCD code now, it makes some limited use of wraparound
arithmetic. Only for the matrix x vector multiplication for the
first of the two recursive reductions in hgcd. The gmp function to look
for is mpn_mulmod_bnm1.

I also dug up some old slides. Below are some estimates extracted from a
presentation I made at KTH, May 15, 2008. I don't remember much of the
details, but for the largest savings, you'd want to not only cache
transforms, but also do some linear operations in the transform domain.

E.g,, to multiply two 2x2 matrices, first FFT-transform the 4
elements of each matrix, *then* expand them linearly to two 7-element
vectors for Strassen multiplication. Do the 7 pointwise multiplications,
linearly transform them back to four elements, and finally, four inverse
FFTs, one for each element of the result matrix.

Regards,
/Niels

\section{Further work}
\titleslide{Further work}

\begin{frame}
  \frametitle{Matrix multiplication}
  \begin{equation*}
M_1 \cdot M_2 \quad \text{$2\times2$ matrices}
  \end{equation*}
  \bigskip

  Assume \FFT and sizes such that transforms and pointwise
  multiplication take equal time.

  \bigskip
  \begin{tabular}{l|}
 \FFT  \IFFT  Pointwise  Saving \\
\hline
Naive  16  8  8  0\% \\
Schönhage-Strassen  14  7  7  12\% \\
Invariance  8  4  8  37\% \\
S.-S. + invariance  8  4  7  40\% \\
  \end{tabular}
\end{frame}

\begin{frame}
  \frametitle{Matrix-vector multiplication}
  
  \begin{itemize}
  \item If $\alpha, \beta$ are returned: $M$ of size $n/4$, $A', B'$ of size
$n/2$.
\begin{equation*}
  M^{-1} \cdot \M{A \\ B} = 2^p \M{\alpha \\ \beta} + M^{-1} \cdot \M{A' \\ 
B'}
\end{equation*}
\begin{tabular}{l|rrl}
   \#Mults.  Prod. size \\
  \hline
  Naive  4  $3n/4$  Wins in \FFT range \\
  Block  8  $n/2$  Can use invariance \\
  S.-S.  7  $n/2$  Wins in Karatsuba range \\
\end{tabular}
\pause
  \item If only matrix is returned: $M$ of size $n/4$, $A, B$ of size
$n$.
\begin{equation*}
  \M{\alpha \\ \beta} = M^{-1} \cdot \M{A \\ B}
\end{equation*}
$\alpha, \beta$ are of size $3n/4$ (cancellation!). Compute $\bmod (2^k \pm 
1)$,
with transform size $\approx 3n/4$.
\pause
  \item 
\alert{Same transform size}, $3n/4$, no matter if reduced
numbers are available or not!
  \end{itemize}
\end{frame}

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: caching of transforms used for large multiplications

2013-06-13 Thread Niels Möller

Daniel Lichtblau d...@wolfram.com writes:

 Re GCD usage, mostly I had in mind the matrix multiplications
 (which I believe are balanced) and a few others that,

You may want to have a look at
http://gmplib.org:8000/gcd-nisse/file/tip/mpn/generic/matrix22_mul.c.
This code reuses transforms, and it does a few additions in the
transform domain. So 8 forward transforms, 8 pointwise muls, and 4
inverse transforms. One pointwise multiplication could be eliminated
using Strassen's trick, at the cost of more complicated operations in
the transform domain.

 Possibly one value would be
 a factor of two larger in size than the other (i.e. the smaller
 on the order of the square root of the bigger). This might be
 what Niels had in mind with splitting the larger factor...

Exactly. Another trick, suggested by Paul, iirc: Say you have the 2x2
matrix M, elements of size n, and a vector (x;y) of size 2n, and you
want the (unbalanced) matrix-vector product

  M (x;y)

Then split x and y as x = x_1 B^n + x_0, y = y_1 B^n + y_0, and compute
the balanced *matrix* product

  M (x_0, x_1; y_0, y_1)

using any optimization tricks available. The two resulting columns can
be combined cheaply to get the desired vector.

 It was indeed Niels' Subquadratic GCD, dated October 30, 2008-- did
 not say where the presentation was given but it could well be an
 update of the slides referred to in Niels' response.

I wonder where I have those files. The timing matches the two-month
period where I worked at KTH, sharing a room with Torbjörn and working
with integration and optimization of the subquadratic gcd code. I
made a longer presentation at the department towards the end of that
project.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: caching of transforms used for large multiplications

2013-06-15 Thread Niels Möller

Daniel Lichtblau d...@wolfram.com writes:

 I'm fairly sure readers will understand that no offense was intended.

Well, on a public and technical mailing list, jokes at the expense of
other developers are not appropriate. Except possibly when you hold a
*correct* belief that the subject of the joke will not be offended.

 That said, I'll not trouble you with further communication.

It appears you and Torbjörn are no longer on speaking terms. Which seems
unnecessary. Respectfull and productive communication is possible even
with some degree of personal dislike. It requires a bit more care in
writing than communication between best friends, of course.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: intel core i5 ivy bridge

2013-08-15 Thread Niels Möller

Hamid Shayesteh shirazch...@yahoo.com writes:

 can i install gmp library on laptop model asus k55vd(cpu: core i5 ivy
 bridge),

Most likely yes, but it also depends on which operating system you use.

 if yes, please help me and send instruction.
 i use gcc for programming.

The gmp-devel list is for discussions relating to development of the gmp
library itself, not general questions from gmp users.

Follow the installation instructions at
https://gmplib.org/manual/Installing-GMP.html#Installing-GMP. If you
encounter any problems, please send any further questions to the
gmp-discuss or the gmp-bugs list as appropriate (see
https://gmplib.org/#MAILINGLISTS).

Or get a precompiled package for your operating system, e.g., apt-get
install libgmp-dev on Debian GNU/Linux.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: Licensing issue with gnutls and gmp

2013-08-19 Thread Niels Möller

Tomas Mraz tm...@redhat.com writes:

 Would it be possible to dual license the next GMP release under the
 LGPLv3+ and  GPLv2+ licenses? I heard that it might be possible to do
 that.

That has been discussed. I think it makes a lot of sense, but it's
unlikely to happen any time soon.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: ARM public key benchmark

2013-08-23 Thread Niels Möller

ni...@lysator.liu.se (Niels Möller) writes:

 David Miller da...@davemloft.net writes:

 #include linux/smp.h

 ...

  on_each_cpu(my_func, my_func_args, 1);

 Cool. I'll have to play with that, but probably not until next week.

And some months later... I tried that now. I use the below linux kernel
module which modifies the user enable register when loaded. On all
cores.

Nevertheless, when I read the enable bit in a user space program it is
zero (and instructions trying to access the cycle counter fails with
illegal instruction).

I then added some code to the module to display the value of the bit
before and after I try to write it. If I load and immediately unload the
module, the values are as I expect:

  root@arm:/home/nisse# insmod ~nisse/incoming/enableccnt.ko  ; rmmod enableccnt
  root@arm:/home/nisse# dmesg |tail -14
  [5728897.345672] enableccnt starting
  [5728897.349487] enableccnt enableccnt_write: 1
  [5728897.354186] enableccnt before: 0
  [5728897.357910] enableccnt after: 1
  [5728897.361572] enableccnt enableccnt_write: 1
  [5728897.366271] enableccnt before: 0
  [5728897.369995] enableccnt after: 1
  [5728897.385314] enableccnt stopping
  [5728897.392120] enableccnt enableccnt_write: 0
  [5728897.399871] enableccnt before: 1
  [5728897.406616] enableccnt after: 0
  [5728897.413146] enableccnt enableccnt_write: 0
  [5728897.420806] enableccnt before: 1
  [5728897.427429] enableccnt after: 0

(Note that there are two cores). But if I delay unload just a second,
something else resets the flag in the mean time:

  root@arm:/home/nisse# insmod ~nisse/incoming/enableccnt.ko  ; sleep 1; rmmod 
enableccnt
  root@arm:/home/nisse# dmesg |tail -14
  [5729055.679229] enableccnt starting
  [5729055.685974] enableccnt enableccnt_write: 1
  [5729055.693603] enableccnt before: 0
  [5729055.700286] enableccnt after: 1
  [5729055.706848] enableccnt enableccnt_write: 1
  [5729055.714508] enableccnt before: 0
  [5729055.721221] enableccnt after: 1
  [5729056.752655] enableccnt stopping
  [5729056.759399] enableccnt enableccnt_write: 0
  [5729056.767028] enableccnt before: 0
  [5729056.773742] enableccnt after: 0
  [5729056.780364] enableccnt enableccnt_write: 0
  [5729056.788024] enableccnt before: 0
  [5729056.794738] enableccnt after: 0

Which maybe isn't too surprising, after all the kernel owns all
registers. But I greped the kernel sources, and I really find no
accesses to this status register anywhere. So I'm lost now.

If anyone on this list have some hint on where to look, that's
appreciated. The platform is still an OMAP4 Panda board. Otherwise I
guess I'll have to give up this track.

Regards,
/Niels

/* Adapted from http://bench.cr.yp.to/cpucycles/netwalker.html */
#include linux/module.h
#include linux/kernel.h
MODULE_LICENSE(Dual BSD/GPL);

#define DEVICE_NAME enableccnt

static void
enableccnt_write (void *info)
{
  int val_write = *(int *) info;
  int val_read;
  printk (KERN_INFO DEVICE_NAME  enableccnt_write: %d\n, val_write);
  asm volatile (mrc p15, 0, %0, c9, c14, 0: =r(val_read)); 
  printk (KERN_INFO DEVICE_NAME  before: %d\n, val_read);
  asm volatile (mcr p15, 0, %0, c9, c14, 0 :: r(val_write));
  asm volatile (mrc p15, 0, %0, c9, c14, 0: =r(val_read)); 
  printk (KERN_INFO DEVICE_NAME  after: %d\n, val_read);
}

static int enableccnt_init(void)
{
  int val = 1;
  printk(KERN_INFO DEVICE_NAME  starting\n);
  on_each_cpu (enableccnt_write, val, 1);
  return 0;
}

static void enableccnt_exit(void)
{
  int val = 0;
  printk(KERN_INFO DEVICE_NAME  stopping\n);
  on_each_cpu (enableccnt_write, val, 1);
}

module_init(enableccnt_init);
module_exit(enableccnt_exit);

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: automake -a

2013-09-23 Thread Niels Möller

Marc Glisse marc.gli...@inria.fr writes:

 It seems that it would be
 easiest to do what the comment in .bootstrap says, remove those files
 from the repository and use automake -a to make sure we have a
 consistent version of everything (no -f so it doesn't override our
 specific INSTALL, config.guess, etc).

Makes sense to me, as long as removing files in the repo is decided on a
case-by-case basis. E.g., I think the COPYING file should stay in the
repo, even if automake -a currently installs an identical file.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: division-free binary-to-decimal conversion

2013-10-07 Thread Niels Möller

Zimmermann Paul paul.zimmerm...@inria.fr writes:

 with Cyril Bouvier we have written a preprint describing several division-free
 algorithms to convert from binary to decimal (the mp?_get_str routines):

  http://www.loria.fr/~zimmerma/papers/get_str.pdf

I've had a first reading. Seems to give impressive speedups. And nice
with another application of mulmid.

I think I noticed one typo: On page 3, the second inequality just
after Eq. (3), should probably say 0 = y/2^n  1, not 0 = y b^k / 2^n
 1.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: mpn_divexact_1 comments

2013-10-16 Thread Niels Möller

Mark Sofroniou ma...@wolfram.com writes:

 If the quotient will be one word shorter than the dividend then set
 the top word to zero:

   if ((divisor  1) == 0)
 {
   if ((dst != src)  (src[size - 1]  divisor)) dst[size - 1] = 0;
 ...

I don't understand. Doesn't the current loop always write all quotient
limbs, including dst[size-1]? And why the condition (dst != src) ???

 If the divisor is a power of two then skip the division code and
 just shift or copy:

Low-level gmp functions, like this one, usually don't do that type of
optimizations. The caller can do that check, if powers of two are likely
enough that it matters. And I magine constant d is an important usecase.

Furthermore, this function is implemented in assembly on several
platforms, and adding a power-of-two check and a call to mpn_rshift to
all implementations of mpn_divexact_1 would get a bit messy.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

div_qr_1 interface

2013-10-16 Thread Niels Möller

 of the License, or (at your
+ option) any later version.
+ 
+ The GNU MP Library is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
+ License for more details.
+ 
+ You should have received a copy of the GNU Lesser General Public License
+ along with the GNU MP Library.  If not, see http://www.gnu.org/licenses/.  */
+ 
+ #include gmp.h
+ #include gmp-impl.h
+ #include longlong.h
+ 
+ #if GMP_NAIL_BITS  0
+ #error Nail bits not supported
+ #endif
+ 
+ mp_limb_t
+ mpn_divf_qr_1n_pi1 (mp_ptr qp, mp_size_t n, mp_limb_t u,
+   mp_limb_t d, mp_limb_t dinv)
+ {
+   ASSERT (n  0);
+   ASSERT (d  GMP_NUMB_HIGHBIT);
+   ASSERT (u  d);
+ 
+   do
+ {
+   mp_limb_t q;
+   udiv_qrnnd_preinv (q, u, u, 0, d, dinv);
+   qp[--n] = q;
+ }
+   while (n  0);
+ 
+   return u;
+ }
diff -Nrc2 gmp.4fa4b9b52bd5/mpn/generic/div_qr_1n_pi1.c 
gmp/mpn/generic/div_qr_1n_pi1.c
*** gmp.4fa4b9b52bd5/mpn/generic/div_qr_1n_pi1.c1970-01-01 
01:00:00.0 +0100
--- gmp/mpn/generic/div_qr_1n_pi1.c 2013-10-16 21:15:19.0 +0200
***
*** 0 
--- 1,58 
+ /* mpn_div_qr_1n_pi1
+ 
+Contributed to the GNU project by Niels Möller
+ 
+THIS FILE CONTAINS INTERNAL FUNCTIONS WITH MUTABLE INTERFACES.  IT IS
+ONLY SAFE TO REACH THEM THROUGH DOCUMENTED INTERFACES.  IN FACT, IT IS
+ALMOST GUARANTEED THAT THEY'LL CHANGE OR DISAPPEAR IN A FUTURE GNU MP
+RELEASE.
+ 
+ 
+ Copyright 2013 Free Software Foundation, Inc.
+ 
+ This file is part of the GNU MP Library.
+ 
+ The GNU MP Library is free software; you can redistribute it and/or modify
+ it under the terms of the GNU Lesser General Public License as published by
+ the Free Software Foundation; either version 3 of the License, or (at your
+ option) any later version.
+ 
+ The GNU MP Library is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
+ License for more details.
+ 
+ You should have received a copy of the GNU Lesser General Public License
+ along with the GNU MP Library.  If not, see http://www.gnu.org/licenses/.  */
+ 
+ #include gmp.h
+ #include gmp-impl.h
+ #include longlong.h
+ 
+ #if GMP_NAIL_BITS  0
+ #error Nail bits not supported
+ #endif
+ 
+ /* Divides (uh B^n + {up, n}) by d, storing the quotient at {qp, n}.
+Requires that uh  d. */
+ mp_limb_t
+ mpn_div_qr_1n_pi1 (mp_ptr qp, mp_srcptr up, mp_size_t n, mp_limb_t uh,
+  mp_limb_t d, mp_limb_t dinv)
+ {
+   ASSERT (n  0);
+   ASSERT (uh  d);
+   ASSERT (d  GMP_NUMB_HIGHBIT);
+   ASSERT (MPN_SAME_OR_SEPARATE_P (qp, up, n));
+ 
+   do
+ {
+   mp_limb_t q, ul;
+ 
+   ul = up[--n];
+   udiv_qrnnd_preinv (q, uh, uh, ul, d, dinv);
+   qp[n] = q;
+ }
+   while (n  0);
+ 
+   return uh;
+ }
diff -Nrc2 gmp.4fa4b9b52bd5/mpn/generic/divrem_1.c gmp/mpn/generic/divrem_1.c
*** gmp.4fa4b9b52bd5/mpn/generic/divrem_1.c 2013-10-16 21:15:19.0 
+0200
--- gmp/mpn/generic/divrem_1.c  2013-10-16 21:15:19.0 +0200
***
*** 138,154 
  invert_limb (dinv, d);
  
! for (i = un - 1; i = 0; i--)
!   {
! n0 = up[i]  GMP_NAIL_BITS;
! udiv_qrnnd_preinv (*qp, r, r, n0, d, dinv);
! r = GMP_NAIL_BITS;
! qp--;
!   }
! for (i = qxn - 1; i = 0; i--)
!   {
! udiv_qrnnd_preinv (*qp, r, r, CNST_LIMB(0), d, dinv);
! r = GMP_NAIL_BITS;
! qp--;
!   }
  return r;
}
--- 138,149 
  invert_limb (dinv, d);
  
! /* Undo the offsetting */
! qp -= (n-1);
! 
! if (un  0)
!   r = mpn_div_qr_1n_pi1 (qp + qxn, up, un, r, d, dinv);
! 
! if (qxn  0)
!   r = mpn_divf_qr_1n_pi1 (qp, qxn, r, d, dinv);
  return r;
}


-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: div_qr_1 interface

2013-10-17 Thread Niels Möller

ni...@lysator.liu.se (Niels Möller) writes:

 To get going, I've written C implementations of mpn_div_qr_1n_pi1 and
 mpn_divf_qr1n_pi1, and made divrem_1 call them.

Below, also an mpn_div_qr_1, using these primitives (and with some
inspiration from divrem_1). For return value, I use the type

  typedef struct { mp_limb_t r; mp_limb_t qh; } gmp_qr_1_t;

The order here is a micro-optimization. I expect that on most ABI:s, for
the typical code fragment

  gmp_qr_1_t res;
  ...
  res.r = mpn_foo (...);
  return res;

the return value from mpn_foo will already be in the right register, and
only qh needs to be moved from some callee-save register to the second
return value register.

Are there any problems with using a struct return values like here, in
terms of performance, portability, style...?

If you agree this makes sense, I'm considering checking in the div_qr_1*
functions fairly soon (maybe divrem_1 changes can wait until
div_qr_1*_pi1 has settled. The interface should allow for additional
precomputed values).

I also found some old x86_64 assembly code for the new div_qr_1
algorithm. With perfect scheduling, it could run at 10 cycles on opteron
(the main cpu of interest at the time). Main loop is 28 instructions,
two independent mul, and decoder limited).

Regards,
/Niels

/* mpn_div_qr_1 -- mpn by limb division.

   Contributed to the GNU project by Niels Möller and Torbjörn Granlund

Copyright 1991, 1993, 1994, 1996, 1998, 1999, 2000, 2002, 2003, 2013 Free 
Software
Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 3 of the License, or (at your
option) any later version.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
License for more details.

You should have received a copy of the GNU Lesser General Public License
along with the GNU MP Library.  If not, see http://www.gnu.org/licenses/.  */

#include gmp.h
#include gmp-impl.h
#include longlong.h

#ifndef DIV_QR_1_NORM_THRESHOLD
#define DIV_QR_1_NORM_THRESHOLD 3
#endif
#ifndef DIV_QR_1_UNNORM_THRESHOLD
#define DIV_QR_1_UNNORM_THRESHOLD 3
#endif

#if GMP_NAIL_BITS  0
#error Nail bits not supported
#endif

/* Divides {up, n} by d. Writes the n-1 low quotient limbs at {qp,
 * n-1}. Returns remainder and high quotient limb. */
gmp_qr_1_t
mpn_div_qr_1 (mp_ptr qp, mp_srcptr up, mp_size_t n,
  mp_limb_t d)
{
  gmp_qr_1_t res;

  unsigned cnt;
  mp_limb_t uh;

  ASSERT (n  0);
  ASSERT (d  0);

  if (d  GMP_NUMB_HIGHBIT)
{
  /* Normalized case */
  mp_limb_t dinv, q;

  uh = up[--n];

  q = (uh = d);
  res.qh = q;
  uh -= (-q)  d;

  if (BELOW_THRESHOLD (n, DIV_QR_1_NORM_THRESHOLD))
{
  cnt = 0;
plain:
  while (n  0)
{
  mp_limb_t ul = up[--n];
  udiv_qrnnd (qp[n], uh, uh, ul, d);
}
  res.r = uh  cnt;
  return res;
}
  invert_limb (dinv, d);
  res.r = mpn_div_qr_1n_pi1 (qp, up, n, uh, d, dinv);
  return res;
}
  else
{
  /* Unnormalized case */
  mp_limb_t dinv, ul;

  if (! UDIV_NEEDS_NORMALIZATION
   BELOW_THRESHOLD (n, DIV_QR_1_UNNORM_THRESHOLD))
{
  uh = up[--n];
  udiv_qrnnd (res.qh, uh, 0, uh, d);
  cnt = 0;
  goto plain;
}

  count_leading_zeros (cnt, d);
  d = cnt;

#if HAVE_NATIVE_div_qr_1u_pi1
  /* FIXME: Call loop doing on-the-fly normalization */
#endif

  /* Shift up front, use qp area for shifted copy. A bit messy,
 since we have only n-1 limbs available, and shift the high
 limb manually. */
  uh = up[--n];
  ul = (uh  cnt) | mpn_lshift (qp, up, n, cnt);
  uh = (GMP_LIMB_BITS - cnt);

  if (UDIV_NEEDS_NORMALIZATION
   BELOW_THRESHOLD (n, DIV_QR_1_UNNORM_THRESHOLD))
{
  udiv_qrnnd (res.qh, uh, uh, ul, d);
  up = qp;
  goto plain;
}
  invert_limb (dinv, d);

  udiv_qrnnd_preinv (res.qh, uh, uh, ul, d, dinv);
  res.r = mpn_div_qr_1n_pi1 (qp, qp, n, uh, d, dinv)  cnt;
  return res;
}
}

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: div_qr_1 interface

2013-10-17 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 I am not too enthusiastic about struct return types for critical
 functions.  I expect this to be returned via a stack slot everywhere o
 almost everywhere.

As far as I understand, the most common ABIs for x86_64 and ARM (which
is pretty close to almost everywhere...) both return structs of this form
in two registers: %rax/%rdx, and %r0/%r1.

Consider the test compilation unit

  typedef struct {
unsigned long q; unsigned long r;
  } qr_t;
  
  qr_t divrem (unsigned long u, unsigned long d)
  {
qr_t res;
res.q = u/d;
res.r = u - res.q*d;
return res;
  }

On x86_64 (and gnu/linux), gcc -c -O compiles this to

   0:   48 89 f8mov%rdi,%rax
   3:   31 d2   xor%edx,%edx
   5:   48 f7 f6div%rsi
   8:   48 89 famov%rdi,%rdx
   b:   48 0f af f0 imul   %rax,%rsi
   f:   48 29 f2sub%rsi,%rdx
  12:   c3  retq   

Both inputs and outputs are passed in registers. The return value is the
only thing stored on the stack.

 I recall to have seen some code for that.  How fast does it run
 currently on the various CPUs?

Don't know yet.

 Code comment:

 I think we cannot afford to do a separate lshift of the dividend operand
 when the divisor is just a few limbs.  We need to to shifting on-the-
 fly, however irksome that might be.  AN mpn_div_qr_1u_pi1 is called-for.

I think we'll definitely want mpn_div_qr_1u_pi1 for the most common
platforms. I was thinking, that maybe we could let it be an optional
function, with no C implementation, and resort to a separate mpn_lshift
if the function is missing.

But if needed, it's no big deal to extract a C mpn_div_qr_1u_pi1 from
divrem_1.c, with on-the-fly shifting. 

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: div_qr_1 interface

2013-10-17 Thread Niels Möller

ni...@lysator.liu.se (Niels Möller) writes:

 Consider the test compilation unit

   typedef struct {
 unsigned long q; unsigned long r;
   } qr_t;
   
   qr_t divrem (unsigned long u, unsigned long d)
   {
 qr_t res;
 res.q = u/d;
 res.r = u - res.q*d;
 return res;
   }

 On x86_64 (and gnu/linux), gcc -c -O compiles this to

0: 48 89 f8mov%rdi,%rax
3: 31 d2   xor%edx,%edx
5: 48 f7 f6div%rsi
8: 48 89 famov%rdi,%rdx
b: 48 0f af f0 imul   %rax,%rsi
f: 48 29 f2sub%rsi,%rdx
   12: c3  retq   

And I guess it is relevant to compare this to the same function,
reorganized to return the second value via a pointer:

  unsigned long divrem (unsigned long u, unsigned long d,
unsigned long *qp)
  {
unsigned long q;
q = u/d;
*qp = q;
  
return u - q * d;
  }

which is compiled to

   0:   48 89 d1mov%rdx,%rcx
   3:   48 89 f8mov%rdi,%rax
   6:   31 d2   xor%edx,%edx
   8:   48 f7 f6div%rsi
   b:   48 0f af f0 imul   %rax,%rsi
   f:   48 89 01mov%rax,(%rcx)
  12:   48 89 f8mov%rdi,%rax
  15:   48 29 f0sub%rsi,%rax
  18:   c3  retq   

If the caller is going to store the returned value directly in memory
anyway, there's little difference. And if the caller is going to operate
on the return value, and needs it in a register, I think struct return
should be epsilon more efficient.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: div_qr_1 interface

2013-10-20 Thread Niels Möller

ni...@lysator.liu.se (Niels Möller) writes:

 I'm about to push the first step, with C implementations of mpn_div_qr_1
 and mpn_div_qr_1n_pi1.

Done now, including some tuning code. It would be interesting to have

  DIV_QR_1N_PI1_METHOD
  DIV_QR_1_NORM_THRESHOLD
  DIV_QR_1_UNNORM_THRESHOLD

added to the tuning reports on the web.

I'll try to get the x86_64 assembly for mpn_div_qr_1n_pi1 in soon.
I think my plan is to do things in this order:

1. Get div_qr_1 working efficiently.

2. Get divf_qr_1 (fraction) working efficiently.

3. Let divrem_1.c use the new code.

4. Check for regressions, and delete divrem_1.asm for onw platform at a
   time.

but this will most likely take a long time to complete.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: div_qr_1 interface

2013-10-20 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 Which tail call?

In the normalized case, the checked in mpn_div_qr_1 does something like

  *qh = ...;
  ...
  return mpn_div_qr_1n_pi1(...);

Which is a nice tail call. With the struct-returning version one
gets instead

  res.qh = ...;
  ...
  res.r = mpn_div_qr_1n_pi1(...);
  return res;

which I don't think the compiler has any chance of turning into a tail
call.

 Should we unconfuse ourselves and users about what type of inverse is to
 be passed?  The pi1 moniker might be replaced.  How many one-limb
 inverse types do we currently consider?

It would make some sense to me to do like the mod_1_* functions, and
have a corresponding _cps function precomputing anything needed. The ABI
then says nothing about the details, only that a maximum of 4 (say)
precomputed limbs will be used.

For the fancy div_qr_1 algorithm, what's needed is the same old
reciprocal dinv = (B^2-1)/d - B, and B^2 - d*(B+dinv), same as for
mod_1_1. And shiftcount, in the unnormalized case.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: div_qr_1 interface

2013-10-20 Thread Niels Möller

ni...@lysator.liu.se (Niels Möller) writes:

 I'll try to get the x86_64 assembly for mpn_div_qr_1n_pi1 in soon.

Pushed first working version now, see
http://gmplib.org:8000/gmp/file/tip/mpn/x86_64/div_qr_1n_pi1.asm

On my core2 laptop:

$ ./speed -s 2-10,100,500 -C mpn_divrem_1.0x 
mpn_div_qr_1.0x
overhead 6.13 cycles, precision 1 units of 8.33e-10 secs, CPU freq 1200.00 
MHz
mpn_divrem_1.0x mpn_div_qr_1.0x
2 60.6420  #39.9427
3#40.9839   55.0469
4#43.7667   44.4534
5 44.6333  #38.9055
6 39.6259  #34.4167
7 34.0063  #32.4018
8 30.1364  #28.5745
9 29.6472  #27.4599
1029.1270  #26.7300
100   24.7920  #20.6700
500   24.4400  #19.7600

So here it's a clear win, except an ugly regression for n = 3.

On shell, the same command gives:

2#37.4379   51.1157
3#30.0256   61.0904
4#25.8058   27.0781
5#23.2717   24.2831
6#21.7520   22.4346
7#20.5219   21.
8#19.4783   20.1101
9#18.7726   19.3369
10   #18.3271   18.7228
100  #13.8063   13.8175
500  #13.2670   13.2750

So here the new code is epsilon slower for the larger sizes. Maybe the
loopmixer can help.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: div_qr_1 interface

2013-10-21 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 On Intel chips, op-to-mem is expensive.  Even op-from-memory is often
 slower than load+op.  (I understand the register shortage problem.)

The following (untested) variant needs one register too many.

  UP, QP, UN:   Load, store, loop counter.
  DINV, B2, B2md:   Loop invariant constants.
  U2, U1I, U0, Q1I, Q0: Inputs.
  U1O, Q1O: Outputs.
  Q2, %rax, %rdx:   Locals.

Also U1I - U1O recurrency chain (with opteron cycle counts)

mov U2, Q2
mov U2, Q1O
neg Q2
mov DINV, %rax
and DINV, Q1O
mul U1I
add Q0, Q1O
adc $0, Q2
mov %rax, Q0
add %rdx, Q1O
adc $0, Q2

mov B2, %rax
and B2, U2
mul U1I C 0 6
lea (U0, B2md), U1O
add U0, U2
cmovnc  U2, U1O
adc U1I, Q1O
adc Q1I, Q2
mov Q2, (QP, UN, 8)
jc  L(incr)

L(incr_done):
mov (UP, UN, 8), U0
add %rax, U0C 4
adc %rdx, U1O   C 5
sbb U2, U2  C 6

25 instructions (27 K10 decoder slots) excluding loop overhead.

But one variable must be moved out of the registers. Maybe B2md (used
once) is the best candidate. Then

lea (U0, B2md), U1O

would have to be replaced by

mov (%rsp), U1O C Can be done very early
...
add U0, U1O

We then have 26 instructions + loop overhead, or 54 instructions for 2
iterations. Or possibly DINV, if one thinks the quotient logic is less
critical.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: div_qr_1 interface

2013-10-21 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 I looked at the logic following this:

 sbb U2, U2  C 7 13

 You negate the U2 copy in Q2.  It seems that three adc by sbb
 could avoid the neg.

The problem is the final use, where Q2 is added, with carry, to a
different register. It's tempting to replace

adc Q1I, Q2

with

sbb Q2, Q1I

and negated Q2, but I'm afraid that will get the sense of the carry
wrong. Do you see any trick to get that right without negating Q2
somewhere along the way?

 I might also be possible to replace the early loop and stuff by
 cmov.

Maybe, but the simple way to do conditional addition with lea + cmov
won't to, since we also need carry out.

Does it matter if we do

mov B2, r
and mask, r

or

mov $0, r
cmovc   B2, r

?

 To optimise register usage, I sometimes annotate the code with live
 ranges for each register.  That will help with register coalescing.

There are lots of possibilities, since the computations for Q and U are
mostly independent. The data flow is something like

  load U limb
   |
  _V_
  U2, U1I, U0 - |___| - U2, U1O, U0 
   \   |__/ cy
   _V__V___V_
Q1I, Q0- |__|  - Q1O, Q0
\
 V
   store Q limb

 (T is rather shot-lived, perhaps its register could serve two usages?)

It could perhaps eliminated.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: div_qr_1 interface

2013-10-22 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 * The code is no win for AMD k10/k8 (although close to 10 c/l might well be
   possible)

I tried replacing one masking op by cmov, as you suggested. We then get
down to 11.25 c/l on K10. I put this modified version in the k10
subdirectory, since it was a significant slowdown on some other
processors.

Next thing to try is to delay the Q1 store, but that's a bit more work.
After that, I guess I should try the loop mixer.

I benchmarked the code on the k8, k10, core2, sandybridge, nehalem and
nano machines. I couldn't log in to haswell and piledriver.

/Niels


-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: div_qr_1 interface

2013-10-22 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 I turned out the code was a bit slower on k8.

 This patch changes that.  With it applied, things takes 11 c/l on both
 pipelines.  This is also a 2 c/l improvement for piledriver.

Cool. 

 I have not tested that this is correct.  If you like the patch, please
 consider putting the result in the k8 subdir.

I'll try to get that done reasonably soon.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: div_qr_1 interface

2013-10-22 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 But I lack unit testing code for the function, making hacking quite
 cumbersome.  I don't feel safe hacking *any* GMP assembly code without
 tests/devel/try.c's function and access checks.

tests/mpn/t-div.c includes tests for mpn_div_qr_1, including single
guard limbs before and after the q area. That provides for some
reasonable testing, and I was running while GMP_CHECK_RANDOMIZE=0
./t-div ; : ; done over night in the weekend.

But sure, support also in try.c would be good.

 (1) Shorten a dep chain, and avoid keeping CF live over an inc
 instruction.  The cmov doesn't really depend on sbb, since the
 latter insn never really changes carry.  (This might btw be useful
 to teach loppmixer!)

Right, using cmov (instead of masking) allows some more reordering.

 (2) Reallocate Q2 to an old register (not r8-r15) and then use the
 32-bit form of adc $0,reg.  That form is shorter.

Clever!

 (3) Offet UP to avoid the offset in the loop.

Should be straight forward. Or one could offset UN, but I think it's
nice to be able to use dec / jnz for the loop. Or is the obscure loop
decrement and branch instruction useful?

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: div_qr_1 interface

2013-10-22 Thread Niels Möller

ni...@lysator.liu.se (Niels Möller) writes:

 ni...@lysator.liu.se (Niels Möller) writes:

 But sure, support also in try.c would be good.

 Added now.

And sure enough, it detects some bugs in the new assembly code. For size
n==1, there's a missing mov. I'll add that shortly. Then there's another
problem with n==2, which needs a bit more debugging. These sizes aren't
exercised by div_qr_1.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: div_qr_1 interface

2013-10-24 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 Basically qp = up won't work, but qp = up + k for any positive k will?
 Does the C code share that property?

I think the C code has the same problem. Hmm, and it looks like in both
the C and asm code, it's only the first iteration, before the main loop,
which has this problem. Then it should be easy to fix, in the asm case,
it's one additional mov and not in the main loop.

 I think it would be good to fix that, since it is surely a common usage
 scenario.

I agree.

If/when the code is reorganized to delay the q stores (to avoid the adc
Q2,8(QP, UN, 8) instruction), I think the overlap issue will also get
fixed in the process.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: A contribution to GMP

2013-10-24 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 We might want to re-phrase it somewhat, perhaps just consider -
 seriously consider, or even New projects should use mpfr since mpf is
 slower, less complete, and not actively developed.

I guess in theory, mpf could be slightly faster than mpfr since mpf
doesn't make any effort to have precisely defined rounding. But if it
isn't faster in practice, and doesn't have any other advantages over
mpfr, it would make sense to clearly deprecate all use of mpf.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: div_qr_1 interface

2013-10-24 Thread Niels Möller

ni...@lysator.liu.se (Niels Möller) writes:

 Torbjorn Granlund t...@gmplib.org writes:

 Basically qp = up won't work, but qp = up + k for any positive k will?
 Does the C code share that property?
[...]
 I think it would be good to fix that, since it is surely a common usage
 scenario.

 I agree.

I've checked in some bugfixes. Now it seems to work according to try,
and without the no-overlap patch which it turned out I never pushed into
the repo.

 If/when the code is reorganized to delay the q stores (to avoid the adc
 Q2,8(QP, UN, 8) instruction),

Here's a first crude version, which seems to work. I could eliminate
other uses of T, and then use that register to the previous Q1 limb
around

I had to change the logic

lea (B2md, U0), T
...
and B2, U2
add U2, U0
cmovnc  U0, T
adc U1, Q1

Here, B2md holds B^2 (mod d) - d, and U2 is initially a mask. To use U2
as temporary (it dies at the add), I had to change it to

and B2, U2
add U2, U0
lea (B2md, U0), U2
cmovnc  U0, U2
adc U1, Q1

which gives the same result, provided that B2md is adjusted to just hold
-d. This sequence has less friendly dependencies. Maybe it would be
better to stick some of the loop-invariant constants in memory instead.

It's getting late, so I'll not do any benchmarking right now.

Regards,
/Niels

dnl  x86-64 mpn_div_qr_1n_pi1
dnl  -- Divide an mpn number by a normalized single-limb number,
dnl using a single-limb inverse.

dnl  Contributed to the GNU project by Niels Möller

dnl  Copyright 2013 Free Software Foundation, Inc.

dnl  This file is part of the GNU MP Library.

dnl  The GNU MP Library is free software; you can redistribute it and/or modify
dnl  it under the terms of the GNU Lesser General Public License as published
dnl  by the Free Software Foundation; either version 3 of the License, or (at
dnl  your option) any later version.

dnl  The GNU MP Library is distributed in the hope that it will be useful, but
dnl  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
dnl  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
dnl  License for more details.

dnl  You should have received a copy of the GNU Lesser General Public License
dnl  along with the GNU MP Library.  If not, see http://www.gnu.org/licenses/.

include(`../config.m4')


C   c/l
C AMD K8,K9 13
C AMD K10   13
C AMD bull  16.5
C AMD pile  15
C AMD steam  ?
C AMD bobcat16
C AMD jaguar ?
C Intel P4  47  poor
C Intel core19.25
C Intel NHM 18
C Intel SBR 15  poor
C Intel IBR 13
C Intel HWL 11.7
C Intel BWL  ?
C Intel atom52  very poor
C VIA nano  19


C INPUT Parameters
define(`QP', `%rdi')
define(`UP', `%rsi')
define(`UN_INPUT', `%rdx')
define(`U1', `%rcx')C Also in %rax
define(`D', `%r8')
define(`DINV', `%r9')

C Invariants
define(`B2', `%rbp')
define(`B2md', `%rbx')

C Variables
define(`UN', `%r8') C Overlaps D input
define(`T', `%r10')
define(`U0', `%r11')
define(`U2', `%r12')
define(`Q0', `%r13')
define(`Q1', `%r14')
define(`Q2', `%r15')

ABI_SUPPORT(STD64)

ASM_START()
TEXT
ALIGN(16)
PROLOGUE(mpn_div_qr_1n_pi1)
FUNC_ENTRY(6)
IFDOS(` mov 56(%rsp), %r8   ')
IFDOS(` mov 64(%rsp), %r9   ')
dec UN_INPUT
jnz L(first)

C Just a single 2/1 division.
C T, U0 are allocated in scratch registers
lea 1(U1), T
mov U1, %rax
mul DINV
mov (UP), U0
add U0, %rax
adc T, %rdx
mov %rdx, T
imulD, %rdx
sub %rdx, U0
cmp U0, %rax
lea (U0, D), %rax
cmovnc  U0, %rax
sbb $0, T
cmp D, %rax
jc  L(single_div_done)
sub D, %rax
add $1, T
L(single_div_done):
mov T, (QP)
FUNC_EXIT
ret
L(first):
C FIXME: Could delay some of these until we enter the loop.
push%r15
push%r14
push%r13
push%r12
push%rbx
push%rbp

mov D, B2
imulDINV, B2
neg B2
C   mov B2, B2md
C   sub D, B2md
mov D, B2md
neg B2md

C D not needed until final reduction
pushD
mov UN_INPUT, UNC Clobbers D

mov DINV, %rax
mul U1
mov %rax, Q0
add U1, %rdx
mov %rdx, Q1

mov B2, %rax
mul U1
mov -8(UP, UN, 8), U0
mov (UP, UN, 8), U1
C mov   Q1, (QP, UN, 8)
add %rax, U0
adc %rdx, U1
sbb U2, U2
dec UN
mov U1, %rax
jz  L(final)

ALIGN(16)

C Loop is 28 instructions, 30 decoder slots

Re: div_qr_1 interface

2013-10-25 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 ni...@lysator.liu.se (Niels Möller) writes:

   The interesting thing is that the next higher function, mpn_div_qr_1,
   should return the high quotient limb separately.

 I am not sure I agree.  Please explain.

A long time ago, we choose an interface for sbpi1_div_qr which does
*not* store the most significant limb; instead it returns it. I think it
was the intention that a new top-level mpn_div_qr should follow that
convention, and not store the top limb. I don't remember if that was
purely for esthetic reasons, or if it also simplified the divideconquer
division code.

Having some of the special division case functions, like mpn_div_qr_1,
storing the top quotient limb leads to unnecessary complications when
writing the general mpn_div_qr; it forces mpn_div_qr for n/1 to do the
top limb itself and call mpn_div_qr_1 with (n-1)/1, which seems like a
case of bad impedance mismatch.

But then, requirements for div_qr_1 don't necessarily translate to
requirements for the primitive division loops we're discussing now.

For the recent mpn_div_qr_1 and mpn_div_qr_1n_pi1, it turned out to work
fine to have mpn_div_qr_1 for n/1 do the top quotient limb up front, and
let mpn_div_qr_1n_pi1 generate n-1 quotient limbs (but n-1 *full* limbs,
since we pass in a high u limb, reduced to uh  d).

It's unclear to me if the best design for the unnormalized case is to
follow mpn_div_qr_1n_pi1 or not. It get be clearer after writing a
decent mpn_div_qr_1u_pi1.

Regards,
/Niels




-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: Amd64 relocation R_X86_64_32S in a static lib

2013-11-05 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 Now we have (at least) two OpenBSD ABIs for AMD64, pre 5.3 and now 5.3,
 5.4.  To make sense of things, I would not be surprised to see
 R_X86_64_64 banned from 5.5 and on, creating a 3rd OpenBSD AMD64 ABI.

I don't understand the fine details of which reloc types make sense in
pic code, but if I understand Philip correctly, the main problem is not
a ABI change, but changed compiler default. And then you get link errors
when linking together pic objects created by the compiler, with non-pic
objects created from the gmp assembly files.

I think you'd get similar problems as if a user configures gmp with,
e.g., --disable-shared CFLAGS=-fpic. Do you agree with this analysis?

To figure out if an invocation $(CC) $(CFLAGS) generates pic code or
not, one needs a configure test like

  AC_TRY_COMPILE(
  ...
  #if __PIC__
  #error
  bla bla
  #endif
  ...
  )

Using the result of that test when deciding whether or not assembly
files should use pic mode should give the correct behavior, both with
compilers doing pic by default, and users enabling it explicitly.

/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: Amd64 relocation R_X86_64_32S in a static lib

2013-11-05 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 If one tries to assemble and link the former code on a 5.3/5.4 system,
 one gets again the error relocation R_X86_64_32S can not be used when
 making a shared object; recompile with -fPIC.

What other objects were involved in the link? I guess something like
crt.o was compiled as non-pic on pre-5.3 systems, but as pic-code on
5.3? I'd like to know if the problem is that the linker input files are
an somehow incompatible mix of pic and non-pic objects, or if the
problem is is that by default the linker insists on creating a position
independent executable (PIE).

 I'd call this an ABI change since code valid under the AMD64 ELF ABI as
 well as on older OpenBSD releases is no longer supported.

If you move an *executable* from pre-5.3 to 5.3, it might still work
fine, if non-position-independent-executables are still supported, and
compatible crt.o and the like are included in the executable.

It's very clear that the ABI-spec defines which executables and shared
libraries are valid, but I imagine one could have a long
flamewa^H^H^H^H^H^Hdebate on whether or not the ABI really specifies
which object files and static libraries are valid when used with the
default compiler settings. But in any case, it's going to be
inconvenient for users if static libraries can't be copied over to a
later OS release.

 (There are pros and cons with feature tests, though.  I consider the
 free software OSes out there to be primary targets for GMP, and I'd like
 to avoid creating GMP objects that cannot be moved between FooBSD x.y
 and FooBSD x.y+k.

I think it would be a good test to compile gmp on pre-5.3, and use that
library when compiling and linking some simple program on openbsd-5.3.
Trying both with a static and a shared gmp library. And maybe trying
also with --disable-assembly.

If any of that fails, that's clearly demonstrates some type of ABI
incompatibility.

I'd also like to see the motivation for the discontinued support for

movl$17, %edi
call*jumptab(,%rax,8)

Which seems to be a very sensible thing to use in pic code. Offsets
between text and data segment are usually known at link time. Does
openbsd use some flavor of address space randomization which adds a
random load-time offset between .text and .data?

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: Amd64 relocation R_X86_64_32S in a static lib

2013-11-05 Thread Niels Möller

Philip Guenther guent...@gmail.com writes:

 That's bad.  But then why not allow R_X86_64_32S too?

 Because there's no guarantee that the target is within 2GB, as the 
 reference may resolve to the symbol in a different shared object which is 
 too far away.

Can you use ELF visibility to provide the needed guarantee that it will
never resolve to a symbol in a different shared object?

I think references *within* a shared object is an important and common
case, worthy of some optimization.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Re: [PATCH] Support powerpc64le-linux platform

2013-12-04 Thread Niels Möller

Ulrich Weigand uweig...@de.ibm.com writes:

 - Update configfsf.guess/sub to current upstream versions.

We should definitely do that, independent of the other fixes. Please
remind if we don't get to that within a week or so (I can't do it right
away).

 - Change mpn/powerpc64/elf.m4 to generate appropriate assembler
   code for the new ELFv2 ABI.  The relevant change here is that
   we no longer require function descriptors; instead, functions
   that use the TOC register need to provide a pair of global
   and local entry points.  For more details, see the GCC patch
   messages here:
   http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01144.html
   http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01141.html

Looks sane to me (without understanding the fine details). Maybe
Torbjörn has some comments. I'm not really familiar neither with how ABI
variants are handled in GMP, nor with powerpc assembly.

 Would this be OK for GMP mainline?

I think so.

 (I haven't submitted a GMP patch before; if there's anything else I
 should be doing to facilitate this process, please let me know!)

We ask for FSF copyright assignments for all substantial changes. Yours
is maybe borderline, but I think we'd still like to have some paperwork.
We can arrange it off-list.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: porting gmp to pnacl

2013-12-20 Thread Niels Möller

Kovács Zoltán zol...@geogebra.org writes:

 3. When I try to compile a simple demo manually, this is what I get:

 kovzol@kovzol:~/tmp/nacl_sdk/naclports/src/out/repository/gmp-5.1.3/demos$
 make isprime
 /home/kovzol/tmp/nacl_sdk/pepper_31/toolchain/linux_x86_newlib/bin/x86_64-nacl-gcc
 -std=gnu99 -O2 -pedanticisprime.c   -o isprime
 /tmp/ccBeMP1x.o: In function `main':
 isprime.c:(.text+0x11c): undefined reference to `__gmpz_init'

At least here the problem is obvious: You don't link with libgmp at all.
I have no idea why the Makefile generates a command like that. If you
think that is a probem with GMP configure and Makefiles, submit a bug
report to gmp-b...@gmplib.org, carefully following the instructions in
the manual.

For help with building gmp, gmp-devel is not the right forum. Perhaps
there's someone on gmp-discuss who can help you, otherwise I think
you'll have better luck in a forum more specific to your platform and
toolchain.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: Side-channel silent modular inverse

2013-12-27 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 I suppose I already suggested that one computes a^{-1} mod b
 as a^{b-1} mod b, using a plain old modexp.

For prime b (which is an important special case).

 I realise that this will be asymptotically slower, in this setting
 O(n^3) vs O(n^2), but it ought have a much lower constant factor.

I think powm actually was slower when I tried, for the sizes of a few
limbs which were relevant for ecdsa, but I'm not sure. Some benchmarking
is needed.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: Side-channel silent modular inverse

2013-12-27 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 * mpn_sec_add_1
 
 I'd say use the obvious algorithm: Create vector of n-1 zeros and then
 the input limb arg at index 0, invoke mpn_add_n.

That's good enough, I guess, at the cost of some extra scratch space.

 * mpn_cnd_neg

 Create zero vector, invoke mpn_sub_n.

That doesn't make it conditional. And I see no obvious way to do
conditional negation on top of mpn_cnd_sub_n.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: Including limits.h in gmp-impl.h

2013-12-27 Thread Niels Möller

Marc Glisse marc.gli...@inria.fr writes:

 Then I am considering pushing the attached patch soon.

Do we really need a configure test? Which C compilers lack limits.h?

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: Side-channel silent modular inverse

2013-12-27 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 Compute T = 2 x A using mpn_add_n or mpn_lshift.
 Use mpn_cnd_sub_n with A, T as arguments.

Should work (except if T is computed mod B^n, one doesn't get the
correct carry out, but that isn't needed here). But it's a bit awkward,
and this is a performacne critical function; some 30% of the time to
create a side-channel silent ecdsa signature is spent doing the modular
inversion.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: Side-channel silent modular inverse

2013-12-27 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 I had neglected the significance of modular inversion for elliptic curve
 arithmetic.

In my implementation, it's needed in two places.

* For ecdsa signatures, the random nonce k is inverted mod q, the ecc
  group order.

* When converting coordinates back from jacobian representation to
  affine representation. Then the z coordinate is inverted mod p.

 My suggestion was just for a reasonably efficient fall-back.

Fair enough.

/nisse


-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: TODO for 5.2

2013-12-29 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 I notice you make this non-public.  Is it premature to make it part of
 the public interface?

I just put the declarations together with the other mpn_sec_* functions.
I think it makes sense to make mpn_sec_div_*, mpn_sec_minvert and
mpn_sec_powm public together. Does mpn_sec_powm need more work (besides
the rename) before made public?

Also about naming, I noticed the inconsistency between mpz_invert and
mpn_invert, which do very different things. If/when we add an mpz
function corresponding to mpn_sec_minvert, I guess that would be named
mpz_sec_invert?

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: [PATCH 01/20] Delete mpn/generic/sizeinbase.c

2014-01-01 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 Checked old mail, but it is only mentioned 7 months earlier, when Kevin
 aded it.  It does not make sense as an internal function, the
 MPN_SIZEINBASE macro plays its role.

 I suppose we could preserve it, and make it public.

Below a patch to that effect. I think it's good to make this function
public. Needs documentation. *Maybe* it doesn't need its own tests, since
it's the same as mpz_sizeinbase but with different argument type.

Regards,
/Niels

diff -r ffe8c8da8c90 configure.ac
--- a/configure.ac  Wed Jan 01 16:45:58 2014 +0100
+++ b/configure.ac  Wed Jan 01 23:07:15 2014 +0100
@@ -2800,7 +2800,8 @@
   mul mul_fft mul_n sqr mul_basecase sqr_basecase nussbaumer_mul  \
   mulmid_basecase toom42_mulmid mulmid_n mulmid
   \
   random random2 pow_1\
-  rootrem sqrtrem get_str set_str scan0 scan1 popcount hamdist cmp\
+  rootrem sqrtrem sizeinbase get_str set_str  \
+  scan0 scan1 popcount hamdist cmp\
   perfsqr perfpow \
   gcd_1 gcd gcdext_1 gcdext gcd_subdiv_step   \
   gcdext_lehmer
   \
diff -r ffe8c8da8c90 gmp-h.in
--- a/gmp-h.in  Wed Jan 01 16:45:58 2014 +0100
+++ b/gmp-h.in  Wed Jan 01 23:07:15 2014 +0100
@@ -1575,6 +1575,9 @@
 #define mpn_set_str __MPN(set_str)
 __GMP_DECLSPEC mp_size_t mpn_set_str (mp_ptr, const unsigned char *, size_t, 
int);
 
+#define mpn_sizeinbase __MPN(sizeinbase)
+__GMP_DECLSPEC size_t mpn_sizeinbase (mp_srcptr, mp_size_t, int);
+
 #define mpn_sqrtrem __MPN(sqrtrem)
 __GMP_DECLSPEC mp_size_t mpn_sqrtrem (mp_ptr, mp_ptr, mp_srcptr, mp_size_t);
 

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: Including limits.h in gmp-impl.h

2014-01-02 Thread Niels Möller

Marc Glisse marc.gli...@inria.fr writes:

 So, do you agree with the change?

Yes. I don't understand why the C spec doesn't define USHORT_MAX the way
it does, but we should nevertheless define our constant in the same way
as C's USHORT_MAX.

 (care to add a suitable comment?)

I think the comment could be improved. But if you intend to remove the
definition soon anyway, it doesn't matter.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: [PATCH 01/20] Delete mpn/generic/sizeinbase.c

2014-01-02 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 This makes sense to me.  Please apply (and please write documentation!).

Done.

Not sure about the right interface. Now, the docs says that it is
required that n  0 and that the input must be normalized (i.e., top
limb non-zero), even though the current implementation also allows n =
0.

I'm thinking that maybe it makes sense to generalize the function to
allow unnormalized input. If so, one also has to specify the return
value in case the input is zero. I'm leaning towards returning 0 in this
case, but maybe consistency with the mpz function (which returns 1) is
more important.

That would be a compatible extension of the currently documented
interface, so it doesn't have to be decided before release.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: TODO for 5.2 v3

2014-01-07 Thread Niels Möller

[0] = b;
+  MPN_ZERO (scratch + 1, n-1);
+  return mpn_sub_n (rp, ap, scratch, n);
+}
diff -r 84343784aa3d mpn/x86_64/cnd_neg.asm
--- /dev/null   Thu Jan 01 00:00:00 1970 +
+++ b/mpn/x86_64/cnd_neg.asmTue Jan 07 15:13:37 2014 +0100
@@ -0,0 +1,66 @@
+dnl  AMD64 mpn_cnd_neg
+
+dnl  Copyright 2014 Free Software Foundation, Inc.
+
+dnl  This file is part of the GNU MP Library.
+
+dnl  The GNU MP Library is free software; you can redistribute it and/or modify
+dnl  it under the terms of the GNU Lesser General Public License as published
+dnl  by the Free Software Foundation; either version 3 of the License, or (at
+dnl  your option) any later version.
+
+dnl  The GNU MP Library is distributed in the hope that it will be useful, but
+dnl  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+dnl  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
+dnl  License for more details.
+
+dnl  You should have received a copy of the GNU Lesser General Public License
+dnl  along with the GNU MP Library.  If not, see https://www.gnu.org/licenses/.
+
+include(`../config.m4')
+
+C INPUT PARAMETERS
+define(`cnd',  `%rdi') dnl rcx
+define(`rp',   `%rsi') dnl rdx
+define(`up',   `%rdx') dnl r8
+define(`n',`%rcx') dnl r9
+C scratch parameter is ignored
+
+ABI_SUPPORT(DOS64)
+ABI_SUPPORT(STD64)
+
+ASM_START()
+   TEXT
+   ALIGN(16)
+PROLOGUE(mpn_cnd_neg)
+   FUNC_ENTRY(4)
+
+   lea (up,n,8), up
+   lea (rp,n,8), rp
+
+   neg n
+
+   neg cnd
+   sbb cnd, cndC make cnd mask, also copy to cy
+
+L(loop):
+   mov (up, n, 8), %r8
+   sbb R32(%rax), R32(%rax)C Save carry
+   xor cnd, %r8C Clears carry, very annoying.
+   add R32(%rax), R32(%rax)C Restore carry
+   adc $0, %r8 
+   mov %r8, (rp, n, 8)
+   inc n
+   jne L(loop)
+   C Generate carry out, if cnd and x != 0
+   inc R32(%rax)
+   and R32(cnd), R32(%rax)
+   FUNC_EXIT()
+   ret
+EPILOGUE()
+
+
+PROLOGUE(mpn_cnd_neg_itch)
+   xor R32(%rax), R32(%rax)
+   ret
+EPILOGUE()
diff -r 84343784aa3d mpn/x86_64/cnd_swap.asm
--- /dev/null   Thu Jan 01 00:00:00 1970 +
+++ b/mpn/x86_64/cnd_swap.asm   Tue Jan 07 15:13:37 2014 +0100
@@ -0,0 +1,61 @@
+dnl  AMD64 mpn_cnd_swap
+
+dnl  Copyright 2014 Free Software Foundation, Inc.
+
+dnl  This file is part of the GNU MP Library.
+
+dnl  The GNU MP Library is free software; you can redistribute it and/or modify
+dnl  it under the terms of the GNU Lesser General Public License as published
+dnl  by the Free Software Foundation; either version 3 of the License, or (at
+dnl  your option) any later version.
+
+dnl  The GNU MP Library is distributed in the hope that it will be useful, but
+dnl  WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+dnl  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
+dnl  License for more details.
+
+dnl  You should have received a copy of the GNU Lesser General Public License
+dnl  along with the GNU MP Library.  If not, see https://www.gnu.org/licenses/.
+
+include(`../config.m4')
+
+C INPUT PARAMETERS
+define(`cnd',  `%rdi') dnl rcx
+define(`up',   `%rsi') dnl rdx
+define(`vp',   `%rdx') dnl r8
+define(`n',`%rcx') dnl r9
+C scratch parameter is ignored
+
+ABI_SUPPORT(DOS64)
+ABI_SUPPORT(STD64)
+
+ASM_START()
+   TEXT
+   ALIGN(16)
+PROLOGUE(mpn_cnd_swap)
+   FUNC_ENTRY(4)
+
+   neg cnd
+   sbb cnd, cndC make cnd mask
+
+   lea (up,n,8), up
+   lea (vp,n,8), vp
+
+   neg n
+
+L(loop):
+   mov (up, n, 8), %r8
+   mov (vp, n, 8), %r9
+   mov %r8, %r10
+   xor %r9, %r8
+   and cnd, %r8
+   xor %r8, %r10
+   xor %r8, %r9
+   mov %r10, (up, n, 8)
+   mov %r9, (vp, n, 8)
+   inc n
+   jne L(loop)
+
+   FUNC_EXIT()
+   ret
+EPILOGUE()

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: TODO for 5.2 v3

2014-01-13 Thread Niels Möller

bodr...@mail.dm.unipi.it writes:

 Ciao,

 Il Mar, 7 Gennaio 2014 4:58 pm, Niels Möller ha scritto:
 Here's a first patch adding a couple of other functions. Benchmarking
 and testing is missing (except that the sec_minvert tests still pass).

 Interesting...

Another thing I was about to ask, but forgot, is use of volatile.

I added it to the mpn_cnd_swap and mpn_sec_eq_ui prototypes, in an
attempt to tell the compiler to not be too clever. But I'm not entirely
sure where it is useful.

 diff -r 84343784aa3d mpn/x86_64/cnd_neg.asm
 --- /dev/nullThu Jan 01 00:00:00 1970 +
 +++ b/mpn/x86_64/cnd_neg.asm Tue Jan 07 15:13:37 2014 +0100

 What about _itch functions when the .asm source is used?

They're supposed to exist, and return zero. E.g.,

+PROLOGUE(mpn_cnd_neg_itch)
+   xor R32(%rax), R32(%rax)
+   ret
+EPILOGUE()

 +C scratch parameter is ignored

 _itch should return 0...

 And some obvious patch like the following would take advantage of this.

Nice, but won't make any difference until we have mpn_sec_add_1
assembly, right?

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: mini-gmp

2014-01-18 Thread Niels Möller

Zimmermann Paul paul.zimmerm...@inria.fr writes:

 Here are the division functions I had to re-implement on top of mini-gmp
 (I know mpn_divrem is obsolete, but its interface is better suited for MPFR):

Do you need fraction limbs? From a quick look, it seems like you
implemented that for mpn_divrem_1, but not for mpn_divrem and
mpn_tdiv_qr?

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: Cleaning out varargs

2014-01-19 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 Since we cleaned out KR stuff a few years back, we could require
 stdarg.h without causing additional portability problems, right?

Right.

There's also some $U left in Makefile.am, which as far as I understand
is a remnant of ansi2knr support.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: Cleanups

2014-01-19 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 We have a non-public mpn_invert
 function family, which actually computes a fraction.  I suggest that we
 rename them to something like mpn_finvert* (at least before they are
 made public).

In our papers, I think we ended up using the term reciprocal. Maybe
some abbreviation of that would make sense. Also applies to invert_limb.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: mpz_limbs interface

2014-01-21 Thread Niels Möller

Zimmermann Paul paul.zimmerm...@inria.fr writes:

 we have looked at the mpz_limbs interface, 

Thanks for the review.

 it seems it won't be sufficient for
 our needs in MPFR. Consider for example the following function, which
 generates nbits random bits into mp[]:

I see. In this particular case, I think the right gmp interface change
is to add mpn_urandomb and mpn_rrandomb (similar to current mpn_random
and mpn_random2, but with a randstate argument). If I understand this
correctly, the main obstacle is that random number internals use mpz
functions, which is an undesirable dependency for mpn functions.

Any other problematic cases?

   MPFR_ASSERTN (nbits = 1);
   /* To be sure to avoid the potential allocation of mpz_urandomb */
   ALLOC(z) = SIZ(z) = MPFR_PREC2LIMBS (nbits);
   PTR(z)   = mp;
[...]
   mpz_urandomb (z, rstate, nbits);

I think this is brittle, and I don't think this style should be
officially supported in the public gmp interface. mpz_urandomb is not
included in mini-gmp, but if you, e.g., try the same with other
functions, it will most likely break with mini-gmp which assigns many of
the output parameters using mpz_swap.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: mpz_limbs interface

2014-01-21 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 ni...@lysator.liu.se (Niels Möller) writes:

   I see. In this particular case, I think the right gmp interface change
   is to add mpn_urandomb and mpn_rrandomb (similar to current mpn_random
   and mpn_random2, but with a randstate argument). If I understand this
   correctly, the main obstacle is that random number internals use mpz
   functions, which is an undesirable dependency for mpn functions.
   
 We should indeed do this.

After 5.2?

 For testing the result, I'd write a wrapper C program that run old and
 new code in separate processes, and compare that the results are
 identical for the same (now and then regenerated) seeds.

If the generation is intended to be machine-independent, it would also
make sense to also add tests generating some random numbers from a few
fixed seeds, and compare to expected (constant) numbers.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: mpz_limbs interface

2014-01-21 Thread Niels Möller

Marc Glisse marc.gli...@inria.fr writes:

 We already have function mpz_array_init which encourages thinking of
 the buffer as belonging to a variable,

There's a ChangeLog entry from the day before yesterday:

* doc/gmp.texi: Undocument mpz_array_init.

I take it Torbjörn really wants to deprecate this function.

Are you using it in mpfr?

 Gratuitous extra alloc+dealloc pairs can have a horrible performance
 effect.

One can still use mpz_realloc2, which can be viewed as an allocation
hint to the gmp implementation (so in mini-gmp, it's harmless but also
mostly useless, due to the frequent use of mpz_swap).

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: New mpn random generators

2014-01-21 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 ni...@lysator.liu.se (Niels Möller) writes:

   After 5.2?
   
 Probably, but you're welcome to fix it today also.  :-)

 Or would you suggest that we postpone the release?

No, I donät think the release whould wait for new randomness functions.
As you say, the design is not entirely obvious, and maybe that has to
take some time.

Actually, there are two parts: (i) Rewrite current code to not use mpz
internally, and (ii) define a new public mpn interface. First part has
no interface design issues.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: mpz_limbs interface

2014-01-21 Thread Niels Möller

Marc Glisse marc.gli...@inria.fr writes:

 I was talking of the case where I already have a preexisting buffer I
 want to use as _mp_d, I don't want to allocate a new one with
 mpz_realloc2. That seems to be a few steps further than the new limb
 interface goes, and I can understand not wanting to support that.

No, the new limb interface can only do it the other way round. If you
want to use a buffer as mpz_t output, and access it with low-level
fucntions too, you have give the reponsibility for allocating it to
_mpz_realloc, and write the code to tolerate that any use of the buffer
as an mpz_t output might do an unexpected reallocation.

When does this situation arise? In the given example, the root cause was
a missing mpn-level function in gmp.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

sec_invert performance

2014-02-04 Thread Niels Möller

I'm confused by the pretty bad performance of sec_invert. Try e.g.,

  ./tune/speed -s 1-200 -P invert mpn_gcdext mpn_sec_invert mpz_powm_sec

Say we have inputs of n bits. Then my understadning of these algorithms
is:

* mpn_gcdext, for sizes of interest, is Lehmer's algorithm, O(n^2), with
  a reasonably small constant since it reduces roughly one limb per
  iteration.

* mpn_sec_powm does n squarings + redc (and then some n/window_size
  multiplies and table lookups). And the squarings and redc are O(n^2),
  so total running time should be O(n^3).

* mpn_sec_invert has a loop of 2n iterations. Each iterations makes 9
  calls to various O(n) functions, like mpn_cnd_add_n, mpn_rshift, etc.
  So it ought to be O(n^2), and a constant factor slower than
  mpn_gcdext.

But when I run speed, *both* ratios mpn_invert / mpn_gcdext and
mpz_sec_powm / mpn_gcdext keep growing with n. So mpn_invert is not a
constant factor slower than mpn_gcdext, as I would have expected, and
it's significantly slower than mpn_sec_powm for largish sizes like 100
limbs. So it looks like it's O(n^3), not O(n^2).

Any idea what's going on? 

And then, there's also a sharp step at n = 113 or 114 (maybe that's an
artifact of the testmachine, but it's reproducible here).

$ ./speed -s 100-120  -c -r mpn_gcdext mpn_sec_invert mpz_powm_sec 
overhead 5.42 cycles, precision 1 units of 6.25e-10 secs, CPU freq 1600.00 
MHz
   mpn_gcdext mpn_sec_invert  mpz_powm_sec
100#286753.00   97.9985   68.7822
101#290981.00   96.3812   69.3512
102#297710.00   94.5395   68.9482
103#301772.00   94.9747   69.6444
104#308297.00   94.8706   69.0619
105#310131.00   96.1709   70.7960
106#312842.00   96.6907   70.8039
107#321814.00   96.2753   70.4176
108#313583.00  100.2901   73.2001
109#321223.00  100.3469   72.9247
110#328283.00   99.1086   72.5408
111#333812.00   99.4312   72.9408
112#335619.00  100.7185   73.4388
113#343734.00   46.7804   73.1944
114#352574.00   46.8327   72.9897
115#356299.00   46.9179   73.3826
116#366523.00   46.0239   72.2455
117#371422.00   46.5541   72.7283
118#367404.00   50.3468   74.7026
119#374872.00   48.6011   74.7945
120#376131.00   48.3491   75.4658

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: sec_invert performance

2014-02-04 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 I quick guess is that the exponent is fixed for powm, not a function of
 the input size.

Ah, its hardcoded to always use 6 limbs. If I change it to use the same
size as b and m, results are a bit different:

$ ./speed -s 1-64 -c -r mpn_gcdext mpn_sec_invert mpz_powm_sec
overhead 5.43 cycles, precision 1 units of 6.25e-10 secs, CPU freq 1600.00 
MHz
   mpn_gcdext mpn_sec_invert  mpz_powm_sec
1#1393.539.25632.8281
2#3700.838.45302.8537
3#5628.829.22964.7198
4#7387.089.87056.5449
5#9270.67   12.22719.4494
6   #11102.89   13.2985   11.9077
7   #12923.54   14.2668   14.8305
8   #14985.50   14.9848   17.2423
9   #16826.96   17.1405   21.4808
10  #18998.00   17.9614   24.7252
11  #20390.00   19.6006   29.7402
12  #23244.21   19.9373   32.0728
13  #24235.58   22.4859   39.6380
14  #26027.70   23.5316   44.4109
15  #29157.89   23.7497   47.7260
16  #30368.62   26.6497   53.3113
17  #32534.43   28.4042   60.1591
18  #33580.67   29.1482   69.0387
19  #34424.60   31.2023   78.6099
20  #36470.00   32.4313   84.8543
21  #36405.25   35.8625   99.2162
22  #41335.25   33.8338   97.9815
23  #39861.67   38.2781  116.4425
24  #41421.67   39.4835  124.2736
25  #44087.67   41.0079  138.0154
26  #48386.33   39.9403  139.1569
27  #40868.00   50.5410  184.1573

So if you know that the modulo is prime (or has some other type of known
factorization), using powm actually beats sec_invert for sizes up to 6
limbs.

I still puzzled why the ratio between sec_invert and gcdext doesn't
approach some constant (I'd expect the ratio to start growing again for
sizes above the threshold for subquadratic gcdext).

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: sec_invert performance

2014-02-04 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 All linear-time functions really take Bn+C time, and quadratic-time
 functions take An²+Bn+C, of course.  If the sec_invert function has
 small B and/or C while gcdext has larger B and/or C, then the observed
 pattern seem even more reasonable.  Do you agree?

Not sure. I had expected the quadratic terms to dominate (and hence the
ratio almost converging to the ratio between the n^2 coefficients), well
before we cross the GCDEXT_DC_THRESHOLD which is on the order of 500
limbs.

But I haven't had any deep thoughts about it. (I guess one could also
try to estimate all of A, B and C from the measurements. Putting such
numerics into speed is something I'd like to do sometime).

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: mpz_limbs interface

2014-02-06 Thread Niels Möller

ni...@lysator.liu.se (Niels Möller) writes:

 For mpn_set_d, I think it would make some sense to have it return a
 base-2 exponent, and write the mantissa to a few limbs. Number of limbs
 would be a constant, part of the ABI, similar to LIMBS_PER_DOUBLE but
 renamed for external use.

   mp_bitcnt_t 
   mpn_set_d (mp_limb_t rp[LIMBS_PER_BOUBLE], d);

Below is a patch to do this (and return value is long, not mp_bitcnt_t,
since it needs to be signed).

What do you think?

Regards,
/Niels

diff -Nprc gmp.2462ec1bc65c/gmp-h.in gmp/gmp-h.in
*** gmp.2462ec1bc65c/gmp-h.in   2014-02-06 09:29:33.665957246 +0100
--- gmp/gmp-h.in2014-02-06 09:29:33.669957220 +0100
*** __GMP_DECLSPEC mp_bitcnt_t mpn_scan1 (mp
*** 1584,1589 
--- 1584,1593 
  #define mpn_set_str __MPN(set_str)
  __GMP_DECLSPEC mp_size_t mpn_set_str (mp_ptr, const unsigned char *, size_t, 
int);
  
+ #define MPN_SET_D_SIZE ((53 + GMP_NUMB_BITS - 1) / GMP_NUMB_BITS)
+ #define mpn_set_d __MPN(set_d)
+ __GMP_DECLSPEC long mpn_set_d (mp_ptr, double);
+ 
  #define mpn_sizeinbase __MPN(sizeinbase)
  __GMP_DECLSPEC size_t mpn_sizeinbase (mp_srcptr, mp_size_t, int);
  
diff -Nprc gmp.2462ec1bc65c/mpn/generic/set_d.c gmp/mpn/generic/set_d.c
*** gmp.2462ec1bc65c/mpn/generic/set_d.c1970-01-01 01:00:00.0 
+0100
--- gmp/mpn/generic/set_d.c 2014-02-06 09:29:33.669957220 +0100
***
*** 0 
--- 1,160 
+ /* mpn_set_d convert from double to array of mp_limb_t.
+ 
+ Copyright 1996, 1999-2002, 2006, 2012, 2014 Free Software Foundation, Inc.
+ 
+ This file is part of the GNU MP Library.
+ 
+ The GNU MP Library is free software; you can redistribute it and/or modify
+ it under the terms of either:
+ 
+   * the GNU Lesser General Public License as published by the Free
+ Software Foundation; either version 3 of the License, or (at your
+ option) any later version.
+ 
+ or
+ 
+   * the GNU General Public License as published by the Free Software
+ Foundation; either version 2 of the License, or (at your option) any
+ later version.
+ 
+ or both in parallel, as here.
+ 
+ The GNU MP Library is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ for more details.
+ 
+ You should have received copies of the GNU General Public License and the
+ GNU Lesser General Public License along with the GNU MP Library.  If not,
+ see https://www.gnu.org/licenses/.  */
+ 
+ #include gmp.h
+ #include gmp-impl.h
+ 
+ #ifdef XDEBUG
+ #undef _GMP_IEEE_FLOATS
+ #endif
+ 
+ #ifndef _GMP_IEEE_FLOATS
+ #define _GMP_IEEE_FLOATS 0
+ #endif
+ 
+ /* Stores the mantissa as MPN_SET_D_SIZE limbs at rp, and returns the
+exponent. Defined so that after e = mpn_set_d (rp, double d),
+ 
+  d = {rp, MPN_SET_D_SIZE} 2^e
+ 
+Always returns with rp[MPN_SET_D_SIZE - 1]  0, but besides this
+requirement, the precise normalization is unspecified.
+ */
+ long
+ mpn_set_d (mp_ptr rp, double d)
+ {
+   long exp;
+   ASSERT (d  0); /* Exclude negatives */
+   ASSERT (d == d);/* Exclude NaN */
+   ASSERT (d != 0.5*d);/* Exclude infinities */
+ 
+   MPN_ZERO (rp, MPN_SET_D_SIZE);
+ 
+ #if _GMP_IEEE_FLOATS
+   {
+ #if defined (__alpha)  __GNUC__ == 2  __GNUC_MINOR__ == 8
+ /* Work around alpha-specific bug in GCC 2.8.x.  */
+ volatile
+ #endif
+ union ieee_double_extract x;
+ mp_size_t i;
+ uint_least32_t manl, manh;
+ mp_bitcnt_t shift;
+ x.d = d;
+ exp = x.s.exp;
+ manl = x.s.manl;
+ manh = x.s.manh;
+ 
+ if (exp  0)
+   /* Set implicit one bit */
+   manh |= 1UL  20;
+ else
+   /* Denormalized case, not optimized */
+   while ( (manh  (1UL  22)) == 0)
+   {
+ manh = (manh  1) | (manl  31);
+ manl = (manl  1)  0x;
+ exp--;
+   }
+ 
+ /* Assign mantissa */
+ #if GMP_NUMB_BITS = 32
+ rp[0] = manl;
+ #else
+ for (i = 0; manl  0; manl = GMP_NUMB_BITS, i++)
+   rp[i] = manl  GMP_NUMB_MASK;
+ #endif
+ 
+ shift = 32 % GMP_NUMB_BITS;
+ i = 32 / GMP_NUMB_BITS;
+ 
+ rp[i++] |= ( (mp_limb_t) manh  shift)  GMP_NUMB_MASK;
+ if (GMP_NUMB_BITS - shift  20)
+   {
+   manh = (GMP_NUMB_BITS - shift);
+ #if GMP_NUMB_BITS = 32
+   rp[i] = manh;
+ #else
+   for (; manh  0; manh = GMP_NUMB_BITS, i++)
+ rp[i] = manh  GMP_NUMB_MASK;
+ #endif
+   }
+ 
+ exp -= (1023 + 52);
+   }
+ #else /* Non-ieee-floats */
+   {
+ mp_size_t i;
+ mp_limb_t x;
+ 
+ exp = 0;
+ /* Normalize to range 0.5 = d  1.0. Arbitrarily, start with 32-bit 
steps. */
+ if (d  1.0)
+   do
+   {
+ d *= 4294967296.0; /* 2^32 */
+ exp -= 32;
+   }
+   while (d  1.0);
+ else if (d = 4294967296.0)
+   do
+   {
+ d *= 1.0 / 4294967296.0;
+ exp += 32;
+   }
+   while (d

Re: mpz_limbs interface

2014-02-06 Thread Niels Möller

Marc Glisse marc.gli...@inria.fr writes:

 Why not return int, since int is what we use for _mp_size?

int would be sufficient for all known float formats I guess. I choose
long because that was the signed type closest to mp_bit_cnt_t.

 Is 53 really safe for non-IEEE double? Maybe something based on
 DBL_MANT_DIG, assuming that FLT_RADIX==2?

The existing code was limited to 53 bits (e.g., in the definition of
LIMBS_PER_DOUBLE), I didn't try to improve on that part.

 I don't think we are still supporting gcc-2.8...

That volatile hack? I just copied it from __gmp_extract_double. Probably
it's no longer needed.

 From a performance POV, it may not be optimal to split the
 sign/infinity/nan/zero test from the rest, but I agree it makes the
 interface simpler.

I'm not sure what's the right thing for mpn_set_d to do with such
inputs. It seemed simplest to disallow them, and then we need no special
error return, and we can produce a normalized output in all cases.

It's a bit strange to disallow zero, though. One alternative might be to
allow zero, and specify that rp[MPN_SET_D_SIZE-1] == 0 iff the input is
zero.

 +   ASSERT (d != 0.5*d); /* Exclude infinities */

 That excludes more than infinities, it might also exclude
 FLT_TRUE_MIN, no?

I would have expected that FLT_TRUE_MIN * 0.5 == 0.0. And then it's not
excluded by that assert. But I'm not familiar with those fine floating
point details.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: mpz_limbs interface

2014-02-06 Thread Niels Möller

Vincent Lefevre vinc...@vinc17.net writes:

 In rounding mode toward +inf (FE_UPWARD), FLT_TRUE_MIN * 0.5 gives
 FLT_TRUE_MIN. 

I see.

 You may need:
   ASSERT (d - d == 0);

That should exclude both infinities and NaN:s, right?

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: mpz_limbs interface

2014-02-06 Thread Niels Möller

Vincent Lefevre vinc...@vinc17.net writes:

 Yes. If you want to differentiate between NaN and infinities:
 d != d is true only for NaN.

I'm looking at the definition of DOUBLE_NAN_INF_ACTION in gmp-impl.h.
Maybe it could be simplified to a single, unconditional, definition

#define DOUBLE_NAN_INF_ACTION(x, a_nan, a_inf)  \
  do {  \
if (UNLIKELY ((x) - (x) != 0.0))\
  { \
if ((x) != (x)) { a_nan; }  \
else { a_inf; } \
  } \
  } while (0)

Does any of the current three definitions have any advantage over the
above portable one? I would hope that the above is

  * no slower then the _GMP_IEEE_FLOATS definition (which extracts the
exponent via a union).

  * equivalent to the NOP definition for VAX; then x - x != 0.0 should be
always false, and the compiler ought to know.

  * faster than the generic version, which implies a sequence of three
(false) tests in the common case.

I'm tempted to 

1. Replace the DOUBLE_NAN_INF_ACTION macro.

2. Check in the mpn_set_d changes (possibly with smaller tweaks).

3. Make mpn_get_d public, and document both functions.

Ok? Or should this wait until after 5.2?

Ah, and about mpn_get_d. Maybe it would make sense to drop the sign
argument, and and use the sign of the size argument, or limit to
non-negative numbers only (it's trivial for the caller to negate the
output when desired).

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: mpz_limbs interface

2014-02-06 Thread Niels Möller

Marc Glisse marc.gli...@inria.fr writes:

 On Thu, 6 Feb 2014, Niels Möller wrote:

 I'm looking at the definition of DOUBLE_NAN_INF_ACTION in gmp-impl.h.
 Maybe it could be simplified to a single, unconditional, definition

 Note that there exist standard functions like isfinite.

But so far, we don't use any libm functions in gmp. 

  * no slower then the _GMP_IEEE_FLOATS definition (which extracts the
exponent via a union).

 Are you sure about that?

I'm not sure about anything... I'd expect that moving floating point
values to integer registers (or to memory) is often slow.

 If x is a denormal, x-x may take a very long
 time to compute.

I don't think the case of denormals is very important for performance.
And there's a bit-by-bit normalization loop for that case later on. How
long is very long? 10 cycles? 100 cycles?

 Ok? Or should this wait until after 5.2?

 I would have been in favor of avoiding new features less than a couple
 months before the release, but since there are already plenty of *sec*
 changes going on...

Personally, I feel a bit easier about adding feature close to release,
than about rewriting code for existing features.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: mpz_limbs interface

2014-02-07 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 But then the shifting done in __gmp_extract_double will be needed in
 every caller. Or am I mistaken?

I think you are right, for all current callers but one. The exception is
mpq_set_d, which needs to remove trailing zero bits for normalization,
using a different shift count.

Other possible exceptions: 

* The various cmp_d functions should in the most likely case not need
  shifting of all of the mantissa bits. They could be arranged to first
  compare whichever top bits are the easiest to shift into the right
  position (in some cases, that would include only the most significant
  one bit of the mantissa), followed by a full comparison to all
  mantissa bits, followed by more and more unlikely comparisons for
  lower limbs being zero.

  (For this trick to work, I suspect mpn_set_d can't be allowed any
  normalization freedom; the most significant one bit must be placed in
  a fix position).

* mpfr, which as far as I understand doesn't do limbification in the
  same way as mpf.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: mpz_limbs interface

2014-02-07 Thread Niels Möller

bodr...@mail.dm.unipi.it writes:

 Il Gio, 6 Febbraio 2014 9:56 am, Niels ha scritto:
 What do you think?

   void
   mpz_set_d (mpz_ptr r, double d)
   {
 int negative;
 !   mp_limb_t tp[MPN_SET_D_SIZE];
 !   mpz_t t = MPZ_ROINIT_N (tp, MPN_SET_D_SIZE);

 Uhm, in tests/mpz/t-limbs.c we test MPZ_ROINIT_N only
 #if __STDC_VERSION__ = 199901

I wasn't sure about whether or not that type of initializer, with one
member initialized with a pointer to a local variable, is allowed in
c89. But I wrote a small test program which compiled with no complaints
from gcc -std=c89. My interpretation of that was that a pointer to a
local variable is constant enough to use in an initializer.

But of course, if that doesn't work portably, MPZ_ROINIT_N mustn't be
used here.

   exp = mpn_set_d (mpz_limbs_write (r, MPN_SET_D_SIZE), d);

Not sure if it really matters, but my intention was to try to avoid
doing realloc twice. With the above, you may realloc once growing the
allocation to MPN_SET_D_SIZE (which could be 2), and then realloc again
in mpn_mul_2exp.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: mpz_limbs interface

2014-02-08 Thread Niels Möller

bodr...@mail.dm.unipi.it writes:

 So, maybe we can discuss about adding a new function to the _limbs interface:

 mp_ptr
 mpz_init_limbs_write (mpz_t x, mp_size_t n)
 {
   ALLOC (x) = n;
   PTR (x) = (mp_ptr) (*__gmp_allocate_func) (n*GMP_LIMB_BYTES);
   SIZ (x) = 0;

   return PTR (x);
 }

What is this intended for? Looks a bit like like mpz_init2.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

mpn_sec_div tests

2014-02-08 Thread Niels Möller

I tried to improve the mpn_sec_div_* tests, by adding the following to
the t-div test:

--- a/tests/mpn/t-div.c Thu Feb 06 08:14:38 2014 +0100
+++ b/tests/mpn/t-div.c Sat Feb 08 14:57:23 2014 +0100
@@ -308,8 +308,15 @@ main (int argc, char **argv)
MPN_COPY (qp, junkp, nn - dn + 1);
  qp[nn - dn] = mpn_sec_div_qr (qp, rp, nn, dup, dn, scratch);
  ASSERT_ALWAYS (ran == scratch[itch]);
- check_one (qp, rp, np, nn, dup, dn, mpn_sec_div_qr, 0);
-
+ check_one (qp, rp, np, nn, dup, dn, mpn_sec_div_qr (unnorm), 0);
+#if 1
+ MPN_COPY (rp, np, nn);
+ if (nn = dn)
+   MPN_COPY (qp, junkp, nn - dn + 1);
+ qp[nn - dn] = mpn_sec_div_qr (qp, rp, nn, dnp, dn, scratch);
+ ASSERT_ALWAYS (ran == scratch[itch]);
+ check_one (qp, rp, np, nn, dnp, dn, mpn_sec_div_qr (norm), 0);
+#endif
  /* Test mpn_sec_div_r */
  itch = mpn_sec_div_r_itch (nn, dn);
  if (itch + 1  alloc)

But I then get the following failure, from the mpn_sec_div_r test just
after this change,

***
mpn_sec_div_r failed test -300: q too small
N=19e6b016cd8e4b6a  
 ...   

D=0001
Q= 33cd602d9b1c96d5 98653fa4c9c6d254
cf3580b66c725b56 ... c56d5d8653fa4c9c 752544f3580b66c7 15b576194fe93271
d49513cd602d9b1c
R=
N-Q*D= 19e6b016cd8e4b6a cc329fd264e3692a
679ac05b36392dab ... 3a92a279ac05b363 8adabb0ca7f49938 ea4a89e6b016cd8e
2b6aec329fd264e3
nn = 3136, dn = 1, qn = 3136

I'm a bit puzzled. I can't spot any error in the test code (which is a
copy of the test just above, but using dnp (normalized divisor) rather
than dup. So maybe the mpn_sec_div_qr call overwrites something in this
case?

I'll try to debug this fairly soon.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: mpn_sec_div tests

2014-02-08 Thread Niels Möller

ni...@lysator.liu.se (Niels Möller) writes:

 I'm a bit puzzled. I can't spot any error in the test code (which is a
 copy of the test just above, but using dnp (normalized divisor) rather
 than dup.

Found it now. Comment a few lines down:

 pass qp[] from the previous function, mpn_sec_div_qr.  */

So reordering the tests should work.

Regards, 
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

mpn_sec_powm

2014-02-09 Thread Niels Möller

After some discussion with Torbjörn, I intend to change mpn_sec_powm to
take the exponent size argument in bits, rather than limbs (because the
current code may leak high bit of the exponent, which can cause serious
problems for some applications, e.g., dsa signatures). But first, I'd
like to fix a more minor issue.

The window size k should always be smaller than exponent size, k = eb.
I.e, if you have an exponent of eb bits, the largest window size should
correspond to computing a table of all 2^{eb} powers, and then select
one of them. This works fine, except for corner cases in the tuning of
adding a corresponding assert gives an assertion failure from tuneup.

Say the POWM_SEC_TABLE table (created by tuneup) starts with 1, then the
win_size function uses a table like

  x[] = { 0, 1, 17, ..., ~(mp_bitcnt_t) 0}

The window size k (k = 1) is chosen so that x[k-1]  eb = x[k].
So we get

  eb = 1  ==  k = 1
  eb = 2  ==  k = 2
  eb = 3  ==  k = 2

On the other hand, if the tuned table starts with 2, we get

  x[] = { 0, 2, 17, ..., ~(mp_bitcnt_t) 0}

  eb = 1  ==  k = 1
  eb = 2  ==  k = 1
  eb = 3  ==  k = 2

So it differs only for eb = 2: are going to use a window size of either
1 or 2 bits in this case.

But if the tuneup program generates a first entry of 1, or a larger
value, is based on comparison of exponent size of 1 bit with the two
window sizes k == 1 and k == 2. And eb = 1 and k = 2 violates the k =
eb condition (and any corresponding assert), and generally makes no
sense; it means we compute a table of 4 entries, and then use the
exponent bit to select one of the two first entries.

So there's some kind of off-by-one error here. I think tuning should
start with nbits = 2 (that's tuneup's name for the exponent size, and
the first size where it makes sense to try a window k  1), and time k =
1 and k = 2. And if k = 2 was faster, output nbits - 1 == 1, not nbits
like the current code.

It might also be interesting to note that there's only a single
gmp-mparam.h file where POWM_SEC_TABLE starts with 1: x86_64/bobcat.
There are a larger number of gmp-mparam.h files where it starts with 2,
which would shrink to 1 if the code is fixed to emit nbits - 1.

All this might not be terribly important, but there is a conditional for
eb  windowsize (before the loop, i.e., for the initial value of eb), not
exercised by the testsuite, but needed because of this tuneup
peculiarity (see
https://gmplib.org/devel/lcov/shell/tmp/lcov/gmp/mpn/sec_powm.c.gcov.html).
And I'd like to eliminate that test.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: mpz_roinit_n documentation

2014-02-09 Thread Niels Möller

bodr...@mail.dm.unipi.it writes:

 Should we add to the documentation of

 mpz_roinit_n (mpz_t x, const mp_limb_t *xp, mp_size_t xs)

 that the array pointed by xp must have at least a readable limb even if
 xs==0?

If that's not obvious from other documentation of mpz (which I doubt it
is), I think it deserves documentation. Can you add that? Also applies
to the MPZ_ROINIT_N macro.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: mpn_sec_powm

2014-02-10 Thread Niels Möller

ni...@lysator.liu.se (Niels Möller) writes:

 After some discussion with Torbjörn, I intend to change mpn_sec_powm to
 take the exponent size argument in bits, rather than limbs (because the
 current code may leak high bit of the exponent, which can cause serious
 problems for some applications, e.g., dsa signatures).

Any comments on the below patch?

Regards,
/Niels

diff -Nrc2 gmp.133eee634d4a/doc/gmp.texi gmp/doc/gmp.texi
*** gmp.133eee634d4a/doc/gmp.texi   Mon Feb 10 22:12:32 2014
--- gmp/doc/gmp.texiMon Feb 10 22:12:32 2014
***
*** 5744,5761 
  
  
! @deftypefun void mpn_sec_powm (mp_limb_t *@var{rp}, const mp_limb_t 
*@var{bp}, mp_size_t @var{bn}, const mp_limb_t *@var{ep}, mp_size_t @var{en},  
const mp_limb_t *@var{mp}, mp_size_t @var{n}, mp_limb_t *@var{tp})
! @deftypefunx mp_size_t mpn_sec_powm_itch (mp_size_t @var{bn}, mp_size_t 
@var{en}, size_t @var{n})
  Set @var{R} to @m{B^E \bmod @var{M}, (@var{B} raised to @var{E}) modulo
  @var{M}}, where @var{R} = @{@var{rp},@var{n}@}, @var{M} = 
@{@var{mp},@var{n}@},
! and @var{E} = @{@var{ep},@var{en}@}.
  
! It is required that @math{@var{B}  0}, that @math{@var{E}  0} specifically
! with @m{@var{ep}[@var{en}-1] @neq 0, @var{ep}[@var{en}-1] != 0}, and that
! @math{@var{M}  0} is odd.
  
  No overlapping between @var{R} and the input operands is allowed.
  
  This function requires scratch space of @code{mpn_sec_powm_itch(@var{bn},
! @var{en}, @var{n})} limbs to be passed in the @var{tp} parameter.  The scratch
  space requirements are guaranteed to increase monotonously in the operand
  sizes.
--- 5744,5759 
  
  
! @deftypefun void mpn_sec_powm (mp_limb_t *@var{rp}, const mp_limb_t 
*@var{bp}, mp_size_t @var{bn}, const mp_limb_t *@var{ep}, mp_bitcnt_t 
@var{ebits},  const mp_limb_t *@var{mp}, mp_size_t @var{n}, mp_limb_t *@var{tp})
! @deftypefunx mp_size_t mpn_sec_powm_itch (mp_size_t @var{bn}, mp_bitcnt_t 
@var{ebits}, size_t @var{n})
  Set @var{R} to @m{B^E \bmod @var{M}, (@var{B} raised to @var{E}) modulo
  @var{M}}, where @var{R} = @{@var{rp},@var{n}@}, @var{M} = 
@{@var{mp},@var{n}@},
! and @var{E} consists of the least @var{ebits} in the area pointed to by 
@var{ep}.
  
! It is required that @math{@var{B}  0}, and that @math{@var{M}  0} is odd.
  
  No overlapping between @var{R} and the input operands is allowed.
  
  This function requires scratch space of @code{mpn_sec_powm_itch(@var{bn},
! @var{ebits}, @var{n})} limbs to be passed in the @var{tp} parameter.  The 
scratch
  space requirements are guaranteed to increase monotonously in the operand
  sizes.
diff -Nrc2 gmp.133eee634d4a/gmp-h.in gmp/gmp-h.in
*** gmp.133eee634d4a/gmp-h.in   Mon Feb 10 22:12:32 2014
--- gmp/gmp-h.inMon Feb 10 22:12:32 2014
***
*** 1660,1666 
  
  #define mpn_sec_powm __MPN(sec_powm)
! __GMP_DECLSPEC void mpn_sec_powm (mp_ptr, mp_srcptr, mp_size_t, mp_srcptr, 
mp_size_t, mp_srcptr, mp_size_t, mp_ptr);
  #define mpn_sec_powm_itch __MPN(sec_powm_itch)
! __GMP_DECLSPEC mp_size_t mpn_sec_powm_itch (mp_size_t, mp_size_t, mp_size_t) 
__GMP_ATTRIBUTE_PURE;
  
  #define mpn_sec_tabselect __MPN(sec_tabselect)
--- 1660,1666 
  
  #define mpn_sec_powm __MPN(sec_powm)
! __GMP_DECLSPEC void mpn_sec_powm (mp_ptr, mp_srcptr, mp_size_t, mp_srcptr, 
mp_bitcnt_t, mp_srcptr, mp_size_t, mp_ptr);
  #define mpn_sec_powm_itch __MPN(sec_powm_itch)
! __GMP_DECLSPEC mp_size_t mpn_sec_powm_itch (mp_size_t, mp_bitcnt_t, 
mp_size_t) __GMP_ATTRIBUTE_PURE;
  
  #define mpn_sec_tabselect __MPN(sec_tabselect)
diff -Nrc2 gmp.133eee634d4a/mpn/generic/sec_powm.c gmp/mpn/generic/sec_powm.c
*** gmp.133eee634d4a/mpn/generic/sec_powm.c Mon Feb 10 22:12:32 2014
--- gmp/mpn/generic/sec_powm.c  Mon Feb 10 22:12:32 2014
***
*** 257,265 
  void
  mpn_sec_powm (mp_ptr rp, mp_srcptr bp, mp_size_t bn,
! mp_srcptr ep, mp_size_t en,
  mp_srcptr mp, mp_size_t n, mp_ptr tp)
  {
mp_limb_t ip[2], *mip;
-   mp_bitcnt_t ebi;
int windowsize, this_windowsize;
mp_limb_t expbits;
--- 257,264 
  void
  mpn_sec_powm (mp_ptr rp, mp_srcptr bp, mp_size_t bn,
! mp_srcptr ep, mp_bitcnt_t ebi,
  mp_srcptr mp, mp_size_t n, mp_ptr tp)
  {
mp_limb_t ip[2], *mip;
int windowsize, this_windowsize;
mp_limb_t expbits;
***
*** 268,272 
int cnd;
  
!   ASSERT (en  0  ep[en - 1] != 0);
ASSERT (n = 1  ((mp[0]  1) != 0));
/* The code works for bn = 0, but the defined scratch space is 2 limbs
--- 267,271 
int cnd;
  
!   ASSERT (ebi  0);
ASSERT (n = 1  ((mp[0]  1) != 0));
/* The code works for bn = 0, but the defined scratch space is 2 limbs
***
*** 274,279 
ASSERT (bn = 1);
  
-   MPN_SIZEINBASE_2EXP(ebi, ep, en, 1);
- 
windowsize = win_size (ebi);
  
--- 273,276 
***
*** 416,420 
  
  mp_size_t
! mpn_sec_powm_itch (mp_size_t bn, mp_size_t en, mp_size_t n)
  {
int windowsize;
--- 413,417

Re: mpn_sec_powm

2014-02-11 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 The exponent size argument docs implies that bits in the leading limb
 beyond the bit count argument are to be ignored.  I am not sure that is
 a wise promise.

Given the current implementation, it's natural. But we could document
that it is required that any left over bits in the top limb must be
zero. Would that be better?

 Please use something else than ebits, since that sounds like the
 arguments contains bits with individual meaning.  IIRC enb would
 follow conventions used elsewhere in the manual.

Ok, then I should switch to enb both in the docs and the code.

/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: mpn_sec_powm

2014-02-11 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 Please use something else than ebits, since that sounds like the
 arguments contains bits with individual meaning.  IIRC enb would
 follow conventions used elsewhere in the manual.

Naming is hard ;-) The docs for mpn_sec_invert use nbcnt (I guess that's
your choice). Docs for mpf_urandomb use nbits. Other mp_bitcnt_t
arguments in the manual give little guidance, names like n, op,
starting_bit, bit_index, prec, bit, m2exp.

So enb here is as good as any.

/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: mpn_sec_powm

2014-02-11 Thread Niels Möller

ni...@lysator.liu.se (Niels Möller) writes:

 Maybe one problem is that to make a fair comparison between window
 sizes k and k+1, the exponent bit size should be divisible by both k and
 k+1.

I'm testing the below patch. Output from four consecutive runs:

#define POWM_SEC_TABLE  1,29,203,839,1409
#define POWM_SEC_TABLE  1,11,203,399,899
#define POWM_SEC_TABLE  1,5,203,579,1289
#define POWM_SEC_TABLE  1,17,203,839,1409

At least the middle value is consistent ;-) It means that we should
increase the window size from 3 to 4 bits when enb  203. I.e.,
benchmarking was done at 204 = 17*12 bits. 

Increasing the window size here saves 17 modular multiplications (on
four limbs) and table lookups (out of 68), at the cost of 8 additional
multiplies in the precomputation, and each of the remaining 51 table
lookups getting twice as expensive. Which sounds more or less sane to
me.

/Niels


--- a/tune/tuneup.c Tue Feb 11 22:26:02 2014 +0100
+++ b/tune/tuneup.c Tue Feb 11 22:27:40 2014 +0100
@@ -1868,7 +1868,7 @@ tune_powm_sec (void)
   mp_size_t n;
   int k, i;
   mp_size_t itch;
-  mp_bitcnt_t nbits, nbits_next, possible_nbits_cutoff;
+  mp_bitcnt_t nbits, possible_nbits_cutoff;
   const int n_max = 3000 / GMP_NUMB_BITS;
   const int n_measurements = 5;
   mp_ptr rp, bp, ep, mp, tp;
@@ -1914,7 +1914,10 @@ tune_powm_sec (void)
   /* For nbits == 1, we should always use k == 1, so no need to tune
  that. Starting with nbits == 2 also ensure that nbits always is
  larger than the windowsize k+1. */
-  for (nbits = 2; nbits = n_max * GMP_NUMB_BITS; )
+  for (nbits = 2;
+   nbits = n_max * GMP_NUMB_BITS  k  10;
+   /* Increment, and round upwards to a multiple of k(k+1) */ 
+   nbits = k*(k+1)*(1+nbits/(k*(k+1
 {
   n = (nbits - 1) / GMP_NUMB_BITS + 1;
 
@@ -1972,9 +1975,6 @@ tune_powm_sec (void)
}
   else
possible_nbits_cutoff = 0;
-
-  nbits_next = nbits * 65 / 64;
-  nbits = nbits_next + (nbits_next == nbits);
 }
   printf (\n);
   TMP_FREE;

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: sec_karatsuba

2014-02-11 Thread Niels Möller

bodr...@mail.dm.unipi.it writes:

 I tried to write a _sec_ version of Karatsuba multiplication. It is not
 intended for 5.2, of course, but it is ready and I like to share it. It is
 attached.

Cool.

 It obviously is slower than the non_sec toom22, but not terribly:

 180.00635   #0.90791.0035
 190.00677   #0.97221.0510
 200.00761   #0.88290.9771
 210.00833   #0.92910.9859

So threshold around 20 limbs, or 1280 bits. So 4096-bit rsa (with
2048-bit secret primes) is one potential application.

 I used evaluation in +1 instead of -1, to avoid branches due to the
 sign,

For the -1 point, one could use mpn_sub + mpn_cnd_neg to compute
|a-b|. And then also conditonally negate and sign-extend the
corresponding product. I don't know what's best, it's a couple of
negations vs an extra limb in the product.

 By the way, if you have the time to look at the code and comment, I'll
 appreciate.

I haven't read it very carefully, so only one minor point, 

   ASSERT (an = bn);
 
   s = an  1;
   n = an - s;
   t = bn - n;
 
   ASSERT (mpn_sec_mul_itch (an, bn) = 4*n + 4);
   if (UNLIKELY (s + t = n))
 {
   mpn_mul_basecase (rp, ap, an, bp, bn);
   return;
 }

I think I'd prefer to do the check for the too unbalanced case earlier,
and not rely on t being signed.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: Bug found in nightbuilds

2014-02-15 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 I suspect this is a new bug, and that the error is in the test code.  It
 seems to happen for many seeds, around one in 25, for s390.  If the bug
 were old it'd had happened before.

Hmm. Look like it's the returning of the high limb via the separate *qp
which is broken? And there's no (non-inline) assembly involved, its
generic/mpn_div_qr_1.c.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Re: Bug found in nightbuilds

2014-02-16 Thread Niels Möller

Torbjorn Granlund t...@gmplib.org writes:

 Also, keeping DIV_QR_1_UNNORM_THRESHOLD = 1 surely does the job, but it
 puts another test it the critical path.  Perhaps that's unavoidable, I
 haven't checked.

I'm not entirely sure I follow you, but I guess the alternative is
something like

  if (nn  0)
if (BELOW_THRESHOLD (...))
  {
do { ... nn-- ...} while (nn  0);
  }
else
  {
invert_limb(...);
return div_qr_1n_pi1 (...)  cnt;
  }
  return uh;

Then the BELOW_THRESHOLD test can be always false when the threshold is
0. A clever enough compiler could optimize it away also if the threshold
is 1, but I have no idea if gcc does that.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

< 1 2 3 4 5 6 7 8 >

101 - 200 of 723 matches

Mail list logo