Re: 68000 issue in longlong.h

2021-03-08 Thread Torbjörn Granlund
Thanks for your analysis, Niels!

I now agree that sticking an & in there is the right thing to do.

(I'm adding a test config for plain 68000 too; I thought we had one
already but that does not seem to be the case.)


-- 
Torbjörn
Please encrypt, key id 0xC8601622
___
gmp-bugs mailing list
gmp-bugs@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-bugs


Re: 68000 issue in longlong.h

2021-03-05 Thread Niels Möller
"se...@t-online.de"  writes:

> There you see the original code on the left and the generated assembly in the 
> middle.
> You can add the & to
> "=d" (__umul_tmp1)
> in the left window and you see immediately the change in the comiled output.

Ok, let me see if I understand the problem. The inline asm for umul_ppmm
is as follows:

#define umul_ppmm(xh, xl, a, b) \
  do { USItype __umul_tmp1, __umul_tmp2;\
__asm__ ("| Inlined umul_ppmm\n"\
"   move%.l %5,%3\n"\
"   move%.l %2,%0\n"\
"   move%.w %3,%1\n"\
"   swap%3\n"   \
"   swap%0\n"   \
"   mulu%.w %2,%1\n"\
"   mulu%.w %3,%0\n"\
"   mulu%.w %2,%3\n"\
"   swap%2\n"   \
"   mulu%.w %5,%2\n"\
"   add%.l  %3,%2\n"\
"   jcc 1f\n"   \
"   add%.l  %#0x1,%0\n" \
"1: move%.l %2,%3\n"\
"   clr%.w  %2\n"   \
"   swap%2\n"   \
"   swap%3\n"   \
"   clr%.w  %3\n"   \
"   add%.l  %3,%1\n"\
"   addx%.l %2,%0\n"\
"   | End inlined umul_ppmm"\
  : "=" (xh), "=" (xl), \
"=d" (__umul_tmp1), "=" (__umul_tmp2) \
  : "%2" ((USItype)(a)), "d" ((USItype)(b)));   \
  } while (0)

There are two instructions mentioning the input %5,

  move%.l %5,%3   (first instruction)

  mulu%.w %5,%2   (close to the middle)

The %2 register is both an input and output, and referenced in a few
places in between. My m68k assembly knowledge is very rusty, most of
them may be harmless reads, but the "swap %2" instruction just before
the use of %5 looks like it relies on %2 and %5 being distinct
registers? In the linked-to compiler output (listed as gcc-6.5.0b, using
-O1), the template is instantiated as

| Inlined umul_ppmm
move.l  d5,d2
move.l  d5,d0
move.w  d2,d1
swapd2
swapd0
mulu.w  d5,d1
mulu.w  d2,d0
mulu.w  d5,d2
swapd5
mulu.w  d5,d5
add.l   d2,d5
jcc 1f
add.l   #0x1,d0
clr.w   d5
swapd5
swapd2
clr.w   d2
add.l   d2,d1
addx.l  d5,d0
| End inlined umul_ppmm

so it's quite clear that both %2 and %5 are mapped to the same register,
d5. Adding the suggested &, marking %2 as an "early clobber" output,
changes the instantiation to

| Inlined umul_ppmm
move.l  d5,d2
move.l  d6,d0
move.w  d2,d1
swapd2
swapd0
mulu.w  d6,d1
mulu.w  d2,d0
mulu.w  d6,d2
swapd6
mulu.w  d5,d6
add.l   d2,d6
jcc 1f
add.l   #0x1,d0
clr.w   d6
swapd6
swapd2
clr.w   d2
add.l   d2,d1
addx.l  d6,d0
| End inlined umul_ppmm

I.e., %2 gets its own register, d6. If my analysis is right, the
critical difference is

swapd5
mulu.w  d5,d5

vs

swapd6
mulu.w  d5,d6

I can't say for sure if using the same register for %2 and %5 is
expected behavior of gcc. But to me it seems reasonable of gcc to try to
share a register when an inline asm template is instantiated with
identical expressions for two of the inputs (as in the squaring
umul_ppmm(xh, xl, u, u). And the additional & seems to be the documented
way to tell it to not do that for this asm template.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
___
gmp-bugs mailing list
gmp-bugs@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-bugs


Re: 68000 issue in longlong.h

2021-03-03 Thread Torbjörn Granlund
"se...@t-online.de"  writes:

  Hello,
  I am compiling and testing libgmp for AmigaOS. Yes, I am crazy.
   
  I use
  Thanks to bebbo's gcc https://github.com/bebbo/amiga-gcc 
   I was quite successful. I had some 
  issues in the tests hovever.
  For instance t-conv, t-sqrt and t-sqrt_ui failed. The bugs disappeared when 
  I compiled without optimization and appeared when I used -O2, so I 
  suspected a compiler bug and reported that to the Amiga gcc6 maintainer. He 
  looked after the problem and found an issue inside libgmp's code.
   
  The bug is inside gmp's header longlong.h:1220, where asm for 68000 is 
  defined:
   
  wrong code:
  ...
  " | End inlined umul_ppmm" \
  : "=" (xh), "=" (xl), \
  "=d" (__umul_tmp1), "=" (__umul_tmp2) \
  : "%2" ((USItype)(a)), "d" ((USItype)(b))); \
   
  correction:
  ...
  " | End inlined umul_ppmm" \
  : "=" (xh), "=" (xl), \
  "=" (__umul_tmp1), "=" (__umul_tmp2) \<---  an & must be added 
  after the =
  : "%2" ((USItype)(a)), "d" ((USItype)(b))); \
   
The current code looks correct to me.  Please explain why you think it
is not correct.


-- 
Torbjörn
Please encrypt, key id 0xC8601622
___
gmp-bugs mailing list
gmp-bugs@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-bugs