Re: [PATCH, rs6000] Fix ELFv2 homogeneous float aggregate ABI bug

2014-07-10 Thread Richard Biener
On Wed, Jul 9, 2014 at 6:02 PM, Ulrich Weigand uweig...@de.ibm.com wrote:
 Hello,

 the implementation of homogenous float aggregates for the ELFv2 ABI has
 unfortunately shown to have a bug in a corner case.

 The problem is that because such aggregates are packed in the argument
 save area, but each (4-byte) float occupies one of just 13 registers
 on its own, we may run out of registers while we're still within the
 first 64 bytes of the argument save area.

 Usually, any argument that doesn't fit into register should go in
 memory.  But that rule doesn't apply within the first 64 bytes, where
 such arguments need to go into GPRs first.  This is important since
 the ABI guarantees that the first 64 bytes of the save area are free,
 e.g. to store GPRs into.  If an argument is actually passed within
 those first 64 bytes, some code (e.g. libffi assembler stubs) may
 clobber its contents.

 Now, the existing rs6000_function_arg code will handle this case
 correctly if the extra floats come in a *new* argument.  For example,
 in the following test case

 struct float2 { float x[2]; };
 struct float6 { float x[6]; };
 struct float8 { float x[8]; };

 float func (struct float8 a, struct float6 b, struct float2 c);

 both parts of c are correctly expected in r10.

 However, the code handles incorrectly the case where a *single*
 aggregate argument is split between FPRs and extra floats.
 For example, b.x[5] is expected in memory, although it ought
 to reside in r9.

 The appended patch fixes the implementation to comply with the ABI.

 This is an ABI change for the affected corner cases of the ELFv2
 ABI.  However, those cases should be extremely rare; the full
 compat.exe and struct-layout-1.exp ABI compatibility test suite
 passed, with the exception of two tests specifically intended
 to test multiple homogeneous float aggregates.

 Tested on powerpc64le-linux.

 OK for mainline?
 [ The patch should then also go into the 4.8 and 4.9 branches for
 consistency. ]

Can you add -Wpsabi warnings for all the issues you found?  That way
a world re-build will at least show if anything important is affected.

Thanks,
Richard.

 Bye,
 Ulrich


 ChangeLog:

 * config/rs6000/rs6000.c (rs6000_function_arg): If a float argument
 does not fit fully into floating-point registers, and there is still
 space in the register parameter area, use GPRs to pass those parts
 of the argument.
 (rs6000_arg_partial_bytes): Likewise.

 Index: gcc/config/rs6000/rs6000.c
 ===
 --- gcc/config/rs6000/rs6000.c  (revision 212100)
 +++ gcc/config/rs6000/rs6000.c  (working copy)
 @@ -10227,6 +10227,7 @@
   rtx r, off;
   int i, k = 0;
   unsigned long n_fpreg = (GET_MODE_SIZE (elt_mode) + 7)  3;
 + int fpr_words;

   /* Do we also need to pass this argument in the parameter
  save area?  */
 @@ -10255,6 +10256,37 @@
   rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, r, off);
 }

 + /* If there were not enough FPRs to hold the argument, the rest
 +usually goes into memory.  However, if the current position
 +is still within the register parameter area, a portion may
 +actually have to go into GPRs.
 +
 +Note that it may happen that the portion of the argument
 +passed in the first half of the first GPR was already
 +passed in the last FPR as well.
 +
 +For unnamed arguments, we already set up GPRs to cover the
 +whole argument in rs6000_psave_function_arg, so there is
 +nothing further to do at this point.  */
 + fpr_words = (i * GET_MODE_SIZE (elt_mode)) / (TARGET_32BIT ? 4 : 8);
 + if (i  n_elts  align_words + fpr_words  GP_ARG_NUM_REG
 +  cum-nargs_prototype  0)
 +{
 + enum machine_mode rmode = TARGET_32BIT ? SImode : DImode;
 + int n_words = rs6000_arg_size (mode, type);
 +
 + align_words += fpr_words;
 + n_words -= fpr_words;
 +
 + do
 +   {
 + r = gen_rtx_REG (rmode, GP_ARG_MIN_REG + align_words);
 + off = GEN_INT (fpr_words++ * GET_MODE_SIZE (rmode));
 + rvec[k++] = gen_rtx_EXPR_LIST (VOIDmode, r, off);
 +   }
 + while (++align_words  GP_ARG_NUM_REG  --n_words != 0);
 +   }
 +
   return rs6000_finish_function_arg (mode, rvec, k);
 }
else if (align_words  GP_ARG_NUM_REG)
 @@ -10330,8 +10362,23 @@
/* Otherwise, we pass in FPRs only.  Check for partial copies.  */
passed_in_gprs = false;
if (cum-fregno + n_elts * n_fpreg  FP_ARG_MAX_REG + 1)
 -   ret = ((FP_ARG_MAX_REG + 1 - cum-fregno)
 -  * MIN (8, GET_MODE_SIZE (elt_mode)));
 +   {
 + /* Compute number of bytes / words passed in FPRs.  If 

Re: [PATCH, rs6000] Fix ELFv2 homogeneous float aggregate ABI bug

2014-07-10 Thread Jakub Jelinek
On Wed, Jul 09, 2014 at 02:19:36PM -0400, David Edelsohn wrote:
  This is an ABI change for the affected corner cases of the ELFv2
  ABI.  However, those cases should be extremely rare; the full
  compat.exe and struct-layout-1.exp ABI compatibility test suite
  passed, with the exception of two tests specifically intended
  to test multiple homogeneous float aggregates.
 
  Tested on powerpc64le-linux.
 
  OK for mainline?
  [ The patch should then also go into the 4.8 and 4.9 branches for
  consistency. ]
 
 Okay everywhere.

IMHO it should only go to the trunk, same as
https://gcc.gnu.org/ml/gcc-patches/2014-07/msg00540.html (see also related
PRs, has been discussed on IRC too).

If it is a corner case, the better, fewer people will be affected when
transitioning from 4.9.x to 5.0 (or 4.10?).  But 4.9.0 is for a few months
in the wild already, I don't think ABI changes are something people would
like to see between 4.9.0 and 4.9.1 or any other patchlevel revision,
most people aren't prepared to rebuild the world just because they have
upgraded to a newer compiler snapshot or new patchlevel version.

Jakub


Re: [PATCH, rs6000] Fix ELFv2 homogeneous float aggregate ABI bug

2014-07-10 Thread Ulrich Weigand
Jakub Jelinek wrote:
 On Wed, Jul 09, 2014 at 02:19:36PM -0400, David Edelsohn wrote:
   This is an ABI change for the affected corner cases of the ELFv2
   ABI.  However, those cases should be extremely rare; the full
   compat.exe and struct-layout-1.exp ABI compatibility test suite
   passed, with the exception of two tests specifically intended
   to test multiple homogeneous float aggregates.
  
   Tested on powerpc64le-linux.
  
   OK for mainline?
   [ The patch should then also go into the 4.8 and 4.9 branches for
   consistency. ]
  
  Okay everywhere.
 
 IMHO it should only go to the trunk, same as
 https://gcc.gnu.org/ml/gcc-patches/2014-07/msg00540.html (see also related
 PRs, has been discussed on IRC too).
 
 If it is a corner case, the better, fewer people will be affected when
 transitioning from 4.9.x to 5.0 (or 4.10?).  But 4.9.0 is for a few months
 in the wild already, I don't think ABI changes are something people would
 like to see between 4.9.0 and 4.9.1 or any other patchlevel revision,
 most people aren't prepared to rebuild the world just because they have
 upgraded to a newer compiler snapshot or new patchlevel version.

Richard Biener wrote:
 Can you add -Wpsabi warnings for all the issues you found?  That way
 a world re-build will at least show if anything important is affected.

OK, to summarize the off-line discussions on these concerns, what I'm
now planning to do is the following:

- Add -Wpsabi warnings for all cases changed to the mainline patch
- Backport *only* the warnings, but not the actual change in behavior
  to the 4.8 and 4.9 branches.

This should match existing precent of how to handle ABI changes.

Doing a world-rebuild with the warnings enabled would certainly be
extremely useful feedback ...

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH, rs6000] Fix ELFv2 homogeneous float aggregate ABI bug

2014-07-09 Thread David Edelsohn
On Wed, Jul 9, 2014 at 12:02 PM, Ulrich Weigand uweig...@de.ibm.com wrote:
 Hello,

 the implementation of homogenous float aggregates for the ELFv2 ABI has
 unfortunately shown to have a bug in a corner case.

 The problem is that because such aggregates are packed in the argument
 save area, but each (4-byte) float occupies one of just 13 registers
 on its own, we may run out of registers while we're still within the
 first 64 bytes of the argument save area.

 Usually, any argument that doesn't fit into register should go in
 memory.  But that rule doesn't apply within the first 64 bytes, where
 such arguments need to go into GPRs first.  This is important since
 the ABI guarantees that the first 64 bytes of the save area are free,
 e.g. to store GPRs into.  If an argument is actually passed within
 those first 64 bytes, some code (e.g. libffi assembler stubs) may
 clobber its contents.

 Now, the existing rs6000_function_arg code will handle this case
 correctly if the extra floats come in a *new* argument.  For example,
 in the following test case

 struct float2 { float x[2]; };
 struct float6 { float x[6]; };
 struct float8 { float x[8]; };

 float func (struct float8 a, struct float6 b, struct float2 c);

 both parts of c are correctly expected in r10.

 However, the code handles incorrectly the case where a *single*
 aggregate argument is split between FPRs and extra floats.
 For example, b.x[5] is expected in memory, although it ought
 to reside in r9.

 The appended patch fixes the implementation to comply with the ABI.

 This is an ABI change for the affected corner cases of the ELFv2
 ABI.  However, those cases should be extremely rare; the full
 compat.exe and struct-layout-1.exp ABI compatibility test suite
 passed, with the exception of two tests specifically intended
 to test multiple homogeneous float aggregates.

 Tested on powerpc64le-linux.

 OK for mainline?
 [ The patch should then also go into the 4.8 and 4.9 branches for
 consistency. ]

Okay everywhere.

Thanks, David