Re: bug in longlong.h for aarch64 sub_ddmmss

2020-06-17 Thread Torbjörn Granlund
t...@gmplib.org (Torbjörn Granlund) writes:

  A crude test generator:

A slightly better variant.  It expects an argument, either "add" or
"sub", allowing it to test both add_ss and sub_ddmmss.

This version should work fine on 32-bit systems without modification.

This code is not well-defined (as Vincent might point out) as it
converts between signed and unsigned quantities in the range where the
signed number is negative.  But it does the job I intended it to do.

I will integrate this in some form in the GMP test facility.



gen-longlong.c
Description: Binary data


-- 
Torbjörn
Please encrypt, key id 0xC8601622
___
gmp-bugs mailing list
gmp-bugs@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-bugs


Re: bug in longlong.h for aarch64 sub_ddmmss

2020-06-17 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes:

  Do you agree?

I agree.

  The excluded case,

sub_ddmmss(ah, al, bh, /*compile time constant*/0) 

  could clearly be optimized, in a different way, but I'd guess it's rare
  enough in real code to not be worth the effort?

Clearly, longlong.h was too complicated to get right already, so let's
not add an even deeper if-else tree.  (Look at the 64-bit PPC add_ss
to see what I'm talking about.)

Here is a better patch for 6.2.0.  It actually improves generated code.

* For Arm32/64, we use ADDS in sub_ddmmss whenever the rhs immediate
  value is negative.  We let gcc decide which operands fit and which
  need to go through a temp register.  (This improves code as ADDS
  allows constants 0,1,2,..,4095 and also 0,4096,8192..,4096*4095.  That
  latter interval was not handled before.)

* We improve add_ss analogously.

* For PPC, we explicitly exclude 0.  Also, we fixed a repeated typo
  causing invalid ADDIC forms to be use.



diff
Description: Binary data


-- 
Torbjörn
Please encrypt, key id 0xC8601622
___
gmp-bugs mailing list
gmp-bugs@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-bugs


Re: bug in longlong.h for aarch64 sub_ddmmss

2020-06-17 Thread Paul Zimmermann
   Dear Torbjörn,

> A crude test generator: [...]

great! On gcc115 it fails with the longlong.h shipped with GMP 6.2.0:

zimmerma@gcc115:/tmp/gmp-6.2.0$ /opt/cfarm/gcc-latest/bin/gcc test.c
zimmerma@gcc115:/tmp/gmp-6.2.0$ ./a.out > test1.c
zimmerma@gcc115:/tmp/gmp-6.2.0$ /opt/cfarm/gcc-latest/bin/gcc test1.c -lgmp
zimmerma@gcc115:/tmp/gmp-6.2.0$ ./a.out 
error for f2(0x1,0x0): want (0x0,0x1) got (0x,0x1)
...
error for f4034(0x7fff,0x0): want (0x0,0x7fff) got 
(0x,0x7fff)

With the patch you sent it works:

zimmerma@gcc115:/tmp/gmp-6.2.0$ ./a.out
zimmerma@gcc115:/tmp/gmp-6.2.0$ 

Paul
___
gmp-bugs mailing list
gmp-bugs@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-bugs


Re: bug in longlong.h for aarch64 sub_ddmmss

2020-06-17 Thread Torbjörn Granlund
Vincent Lefevre  writes:

  Note: in the tests, it is important to test the macro on constants
  in order to test the "if" case.

A crude test generator:

#include 
#include 
#include "gmp-impl.h"

void
one (size_t ind, mp_limb_t m0, mp_limb_t s0)
{
  printf ("void f%zu(mp_limb_t* r1p, mp_limb_t* r0p) {\n", ind);
  printf ("mp_limb_t r1, r0;\n");
  printf ("sub_ddmmss (r1, r0, 0, %ld, 0, %ld);\n", (long) m0, (long) s0);
  printf ("*r1p = r1;  *r0p = r0;\n");
  printf ("}\n");
}

mp_limb_t ops[1000];
mp_limb_t res1[1000];
mp_limb_t res0[1000];

int
main ()
{
  size_t n_operands = 0;
  size_t n_functions = 0;

  for (int i = 0; i < 16; i++)
{
  ops[n_operands++] = 1 << i;
  ops[n_operands++] = -(1 << i);
  ops[n_operands++] = (1 << i) - 1;
  ops[n_operands++] = -(1 << i) - 1;
}

  printf ("#include \n");
  printf ("#include \n");
  printf ("#include \"gmp-impl.h\"\n");
  printf ("#include \"longlong.h\"\n");

  /* Generate functions and print them.  */
  for (int i = 0; i < n_operands; i++)
{
  for (int j = 0; j < n_operands; j++)
{
  one (n_functions++, ops[i], ops[j]);
}
}

#if 1
  /* Print out ops[] definition.  */
  printf ("int ops[%zu] = {\n", n_operands);
  for (int i = 0; i < n_operands; i++)
{
  printf ("%ld,", (long) ops[i]);
  if ((i + 1) % 4 == 0)
puts ("");
}
  printf ("};\n");
#endif

  /* Print out function pointer table.  */
  printf ("typedef void (*func_t) (mp_limb_t*, mp_limb_t*);\n");
  printf ("func_t funcs[%zu] = {\n", n_functions);
  for (size_t i = 0; i < n_functions; i++)
{
  printf ("f%zu,", i);
  if ((i + 1) % 16 == 0)
puts ("");
}
  printf ("};\n");

  /* Print out table of reference results.  */
  printf ("mp_limb_t ref[%zu][2] = {\n", n_functions);
  for (int i = 0; i < n_operands; i++)
{
  for (int j = 0; j < n_operands; j++)
{
  printf ("{0x%llx,0x%llx},\n", (unsigned long long) ( ops[i] - 
ops[j]), (unsigned long long) (-(mp_limb_t) (ops[i] < ops[j])));
}
}
  printf ("};\n");


  printf ("int main ()\n{\n");
  printf ("  mp_limb_t r1, r0;\n");
  printf ("  int err = 0;\n");
  printf ("  size_t ind = 0;\n");
  printf ("  for (size_t i = 0; i < %zu; i++)\n", n_functions);
  printf ("{\n");
  printf ("  int ii = i / %zu, jj = i %% %zu;\n", n_operands, n_operands);
  printf ("  funcs[i](&r1, &r0);\n");
  printf ("  if (r0 != ref[ind][0] || r1 != ref[ind][1]) {\n");
  printf (" gmp_printf (\"error for f%%zu(0x%%Mx,0x%%Mx): want 
(0x%%Mx,0x%%Mx) got (0x%%Mx,0x%%Mx)\\n\", i, ops[ii], ops[jj], ref[ind][1], 
ref[ind][0], r1, r0);\n");
  printf (" err++;\n");
  printf ("   }\n");
  printf ("  ind++;\n");
  printf ("}\n");


  printf ("  return err != 0;\n");
  printf ("}\n");
  return 0;
}


-- 
Torbjörn
Please encrypt, key id 0xC8601622
___
gmp-bugs mailing list
gmp-bugs@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-bugs


Re: bug in longlong.h for aarch64 sub_ddmmss

2020-06-17 Thread Paul Zimmermann
   Dear Niels,

> Sorry for being unclear, sub_ddmmss has six arguments, and the case I
> wanted to single out was
> 
>sub_ddmmss(rh, rl, ah, al, bh, /*compile time constant*/0)
>  
> > in MPFR we have 15 places where we call sub_ddmmss, among which 8 have bl=0:
> 
> Those seem to have al == 0 (in above notation), which is a different
> case.

sorry I mixed al and bl. After looking at the MPFR code that hit the bug:

if (MPFR_UNLIKELY(_r < MPFR_LIMB_HIGHBIT))  \
  _r = MPFR_LIMB_HIGHBIT;   \
umul_ppmm (_h, _l, _r, _r); \
sub_ddmmss (_h, _l, _u, 0, _h, _l); \

I guess the compiler did optimize separately the _r < MPFR_LIMB_HIGHBIT
case, in which case it did deduce _r = MPFR_LIMB_HIGHBIT, thus _l=0
after umul_ppmm, and the special code for bl=0 was used in longlong.h.

Paul
___
gmp-bugs mailing list
gmp-bugs@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-bugs


Re: bug in longlong.h for aarch64 sub_ddmmss

2020-06-17 Thread Niels Möller
Paul Zimmermann  writes:

>Dear Niels,
>
>> The excluded case,
>> 
>>   sub_ddmmss(ah, al, bh, /*compile time constant*/0) 
>> 
>> could clearly be optimized, in a different way, but I'd guess it's rare
>> enough in real code to not be worth the effort?

Sorry for being unclear, sub_ddmmss has six arguments, and the case I
wanted to single out was

   sub_ddmmss(rh, rl, ah, al, bh, /*compile time constant*/0)
 
> in MPFR we have 15 places where we call sub_ddmmss, among which 8 have bl=0:

Those seem to have al == 0 (in above notation), which is a different
case.

  sub_ddmmss (rh, rl, ah, 0, bh, bl)

should be the same as

  rh = ah - bh - (bl > 0);
  rl = -bl;

So we have a borrow to propagate except of also bl == 0, and hence some
runtime carry logic is needed.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
___
gmp-bugs mailing list
gmp-bugs@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-bugs


Re: bug in longlong.h for aarch64 sub_ddmmss

2020-06-17 Thread Paul Zimmermann
   Dear Niels,

> The excluded case,
> 
>   sub_ddmmss(ah, al, bh, /*compile time constant*/0) 
> 
> could clearly be optimized, in a different way, but I'd guess it's rare
> enough in real code to not be worth the effort?

in MPFR we have 15 places where we call sub_ddmmss, among which 8 have bl=0:

$ grep sub_ddmmss *.[ch] | grep -v "mpfr-longlong" | grep -v "mpfr-gmp"
div.c:  sub_ddmmss (h, l, u0, 0, h, l);
div.c:sub_ddmmss (h, l, u0, 0, h, l);
div.c:sub_ddmmss (r3, r2, r3, r2, v1, v0);
div.c:  sub_ddmmss (s1, s0, s1, s0, v1, v0);
invsqrt_limb.h:sub_ddmmss (_h, _l, _u, 0, _h, _l); \
invsqrt_limb.h:sub_ddmmss (_h, _l, _h, _l, _r >> 60, _r << 4); \
invsqrt_limb.h:sub_ddmmss (_h, _l, _h, _l, 0, 64); \
invsqrt_limb.h:sub_ddmmss (_h, _l, _h, _l, _r >> 61, _r << 3); \
invsqrt_limb.h:sub_ddmmss (_h, _l, _h, _l, 0, 16); \
invsqrt_limb.h:sub_ddmmss (_h, _l, _h, _l, _r >> 62, _r << 2); \
invsqrt_limb.h:sub_ddmmss (_h, _l, _h, _l, 0, 4);  \
invsqrt_limb.h:sub_ddmmss (_h, _l, _h, _l, _r >> 63, (_r << 1) + 1);   \
rec_sqrt_limb.h:sub_ddmmss (_h, _l, 0x4000UL, 0, _h, _l);   
\
sqrt.c:  sub_ddmmss (rb, sb, u0, 0, rb, sb);
sqrt.c:  sub_ddmmss (rb, sb, u0, low, rb, sb);

Best regards,
Paul
___
gmp-bugs mailing list
gmp-bugs@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-bugs


Re: bug in longlong.h for aarch64 sub_ddmmss

2020-06-17 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes:

> Using the ARM "subs rd,rm,imm12" instruction, we compute
>
> {cout, rd} = rm + ~imm + 1
>
> while the "adds rd,rm,imm12" instruction, we compute
>
> {cout, rd} = rm + imm
>
> .  which is quite different.  The former will for example always set
> cout when rm = imm = 0 as in Vincent's example.  The latter will never
> set carry when imm = 0 or rm = 0;

Right, it's a bit subtle. The case we're trying to handle specially is

  {ah, al} - {bh, bl}

with bl = B - x, x small.

I would expect that the existing code could be fixed if we exclude bl =
0 (since we'd then get get x = B, which qualifies as "x small" only
modulo B, but not as a plain mathematical integer).

  if (__builtin_constant_p (bl) && bl != 0 && -(UDItype)(bl) < 0x1000)

Then, if bl = B - x, we get (modulo B^2):

  {ah, al} - {bh, bl} = (ah - bh) B  + al + x - B
  = (ah + ~bh + 1) B + al + x - B
  = (ah + ~bh) B + al + x 

which should be computed correctly with the sequence adds, sbc, using
carry out from al + x.

Do you agree?

The excluded case,

  sub_ddmmss(ah, al, bh, /*compile time constant*/0) 

could clearly be optimized, in a different way, but I'd guess it's rare
enough in real code to not be worth the effort?

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
___
gmp-bugs mailing list
gmp-bugs@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-bugs


Re: bug in longlong.h for aarch64 sub_ddmmss

2020-06-17 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes:

  And, which I guess is more relevant in the sub_ddmmss context, it also
  means that there's little need for separate instructions for adding and
  subtracting constants.

The error we struck here for ARM and (one of the errors) for PPC was
that we added 0 instead of subtracting zero, the latter meaning adding
~0 + 1.

Using the ARM "subs rd,rm,imm12" instruction, we compute

{cout, rd} = rm + ~imm + 1

while the "adds rd,rm,imm12" instruction, we compute

{cout, rd} = rm + imm

.  which is quite different.  The former will for example always set
cout when rm = imm = 0 as in Vincent's example.  The latter will never
set carry when imm = 0 or rm = 0;


-- 
Torbjörn
Please encrypt, key id 0xC8601622
___
gmp-bugs mailing list
gmp-bugs@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-bugs


Re: bug in longlong.h for aarch64 sub_ddmmss

2020-06-17 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes:

>   >   {cout, r} = a + ~b + cin
>
>   This is a - b - borrow, where the borrow is the complement of the
>   carry bit.
>
> Niels' definition is important as it captures the similarity with
> addition.  It is indeed how the instructions are described in the vendor
> manuals.

And it's closer to the hardware, since this makes it clear that there's
only one stage of carry propagation (~b is a totally parallel bit-by-bit
inversion, while logic to constuct -b is slower and more complex), done
identically as for addition. Likely using something like
https://en.wikipedia.org/wiki/Kogge%E2%80%93Stone_adder.

Then, the ISA could of course complement the carry flag at input and
output, regardless of the rest of the hardware, like it's done on x86.
But I've found the ARM convention more convenient, e.g., when writing
wraparound code for special-form ECC-related primes.

And, which I guess is more relevant in the sub_ddmmss context, it also
means that there's little need for separate instructions for adding and
subtracting constants.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
___
gmp-bugs mailing list
gmp-bugs@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-bugs


Re: bug in longlong.h for aarch64 sub_ddmmss

2020-06-17 Thread Torbjörn Granlund
Vincent Lefevre  writes:

  On 2020-06-17 11:04:26 +0200, Niels Möller wrote:
  > t...@gmplib.org (Torbjörn Granlund) writes:
  > 
  > > The two - signs ought to be ~, I think.  Let me think a buit more about 
that.
  > 
  > If I remember ARM conventions correctly, subtract with carry is defined
  > as
  > 
  >   {cout, r} = a + ~b + cin

  This is a - b - borrow, where the borrow is the complement of the
  carry bit.

Niels' definition is important as it captures the similarity with
addition.  It is indeed how the instructions are described in the vendor
manuals.

-- 
Torbjörn
Please encrypt, key id 0xC8601622
___
gmp-bugs mailing list
gmp-bugs@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-bugs


Re: bug in longlong.h for aarch64 sub_ddmmss

2020-06-17 Thread Vincent Lefevre
On 2020-06-17 11:04:26 +0200, Niels Möller wrote:
> t...@gmplib.org (Torbjörn Granlund) writes:
> 
> > The two - signs ought to be ~, I think.  Let me think a buit more about 
> > that.
> 
> If I remember ARM conventions correctly, subtract with carry is defined
> as
> 
>   {cout, r} = a + ~b + cin

This is a - b - borrow, where the borrow is the complement of the
carry bit.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
___
gmp-bugs mailing list
gmp-bugs@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-bugs


Re: bug in longlong.h for aarch64 sub_ddmmss

2020-06-17 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes:

  t...@gmplib.org (Torbjörn Granlund) writes:

  > The two - signs ought to be ~, I think.  Let me think a buit more about 
that.

  If I remember ARM conventions correctly, subtract with carry is defined
  as

{cout, r} = a + ~b + cin

Here we traded non-borrow subtract and non-carry addition on both ARM
and PPC.

Subtract is defined on both these architectures as

{cout, r} = a + ~b + 1

while addition is refined similarly as

{cout, r} = a + b

.

-- 
Torbjörn
Please encrypt, key id 0xC8601622
___
gmp-bugs mailing list
gmp-bugs@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-bugs


Re: bug in longlong.h for aarch64 sub_ddmmss

2020-06-17 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes:

> The two - signs ought to be ~, I think.  Let me think a buit more about that.

If I remember ARM conventions correctly, subtract with carry is defined
as

  {cout, r} = a + ~b + cin

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
___
gmp-bugs mailing list
gmp-bugs@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-bugs