Re: [PATCH, rs6000] Fix PR79261 (vec_xxpermdi is not endian-sensitive)

2017-02-17 Thread Segher Boessenkool
Hi Bill,

On Thu, Feb 16, 2017 at 02:16:02PM -0600, Bill Schmidt wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79261 records that the interface
> vec_xxpermdi isn't implemented in a bi-endian fashion; instead, it produces
> results appropriate for big-endian vector element numbering even when run on
> a little endian machine.  This is not part of the "official vector API" from
> the ELFv2 ABI document, but should still have appropriate bi-endian behavior.

Maybe this needs adding (or updating) some documentation?

> +;; Special version of xxpermdi that retains big-endian semantics.
> +(define_expand "vsx_xxpermdi__be"
> +  [(match_operand:VSX_L 0 "vsx_register_operand" "")
> +   (match_operand:VSX_L 1 "vsx_register_operand" "")
> +   (match_operand:VSX_L 2 "vsx_register_operand" "")
> +   (match_operand:QI 3 "u5bit_cint_operand" "")]

Please remove the "".

Okay with that and perhaps some doc changes.  Thanks,


Segher


[PATCH, rs6000] Fix PR79261 (vec_xxpermdi is not endian-sensitive)

2017-02-16 Thread Bill Schmidt
Hi,

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79261 records that the interface
vec_xxpermdi isn't implemented in a bi-endian fashion; instead, it produces
results appropriate for big-endian vector element numbering even when run on
a little endian machine.  This is not part of the "official vector API" from
the ELFv2 ABI document, but should still have appropriate bi-endian behavior.

The problem appears to have arisen because some of GCC's internal usages of 
the xxpermdi instruction wanted to use them in big-endian form, which is 
natural to the original hardware instruction.  The define_expand used for this
(vsx_xxpermdi_) is also the one that is used for the built-in forms.
Since these two uses require different semantics, I've created a new
vsx_xxpermdi__be define_expand to be used for the internal uses, and
re-directed those usages there.  That version retains the original semantics,
while the semantics for the built-ins are changed to be endian-sensitive.

I've added a runtime test to verify that the vec_xxpermdi intrinsics produce
correct results for the various vector modes in both big- and little-endian
environments.

Bootstrapped and tested on powerpc64[le]-unknown-linux-gnu with no regressions
in either environment.  Is this ok for trunk, and after a suitable period, for
backport to GCC 5 and 6?

Thanks,
Bill


[gcc]

2017-02-16  Bill Schmidt  

PR target/79261
* config/rs6000/rs6000.c (rs6000_expand_ternop_builtin): Add
support for CODE_FOR_vsx_xxpermdi_v2d[fi]_be.
* config/rs6000/rs6000.md (reload_gpr_from_vsx): Call
generator for vsx_xxpermdi__be.
* config/rs6000/vsx.md (vsx_xxpermdi_): Remove logic to
force big-endian semantics.
(vsx_xxpermdi__be): New define_expand with same
implementation as previous version of vsx_xxpermdi_.

[gcc/testsuite]

2017-02-16  Bill Schmidt  

PR target/79261
* gcc.target/powerpc/vec-xxpermdi.c: New file.


Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 245490)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -15436,6 +15436,8 @@ rs6000_expand_ternop_builtin (enum insn_code icode
 }
   else if (icode == CODE_FOR_vsx_xxpermdi_v2df
|| icode == CODE_FOR_vsx_xxpermdi_v2di
+   || icode == CODE_FOR_vsx_xxpermdi_v2df_be
+   || icode == CODE_FOR_vsx_xxpermdi_v2di_be
|| icode == CODE_FOR_vsx_xxsldwi_v16qi
|| icode == CODE_FOR_vsx_xxsldwi_v8hi
|| icode == CODE_FOR_vsx_xxsldwi_v4si
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 245490)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -8419,7 +8419,7 @@
   rtx gpr_lo_reg = gen_lowpart (DFmode, dest);
 
   emit_insn (gen_p8_mfvsrd_3_ (gpr_hi_reg, src));
-  emit_insn (gen_vsx_xxpermdi_ (tmp, src, src, GEN_INT (3)));
+  emit_insn (gen_vsx_xxpermdi__be (tmp, src, src, GEN_INT (3)));
   emit_insn (gen_p8_mfvsrd_3_ (gpr_lo_reg, tmp));
   DONE;
 }
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 245490)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -2461,6 +2461,38 @@
  op1 = gen_lowpart (V2DImode, op1);
}
 }
+  emit_insn (gen (target, op0, op1, perm0, perm1));
+  DONE;
+})
+
+;; Special version of xxpermdi that retains big-endian semantics.
+(define_expand "vsx_xxpermdi__be"
+  [(match_operand:VSX_L 0 "vsx_register_operand" "")
+   (match_operand:VSX_L 1 "vsx_register_operand" "")
+   (match_operand:VSX_L 2 "vsx_register_operand" "")
+   (match_operand:QI 3 "u5bit_cint_operand" "")]
+  "VECTOR_MEM_VSX_P (mode)"
+{
+  rtx target = operands[0];
+  rtx op0 = operands[1];
+  rtx op1 = operands[2];
+  int mask = INTVAL (operands[3]);
+  rtx perm0 = GEN_INT ((mask >> 1) & 1);
+  rtx perm1 = GEN_INT ((mask & 1) + 2);
+  rtx (*gen) (rtx, rtx, rtx, rtx, rtx);
+
+  if (mode == V2DFmode)
+gen = gen_vsx_xxpermdi2_v2df_1;
+  else
+{
+  gen = gen_vsx_xxpermdi2_v2di_1;
+  if (mode != V2DImode)
+   {
+ target = gen_lowpart (V2DImode, target);
+ op0 = gen_lowpart (V2DImode, op0);
+ op1 = gen_lowpart (V2DImode, op1);
+   }
+}
   /* In little endian mode, vsx_xxpermdi2__1 will perform a
  transformation we don't want; it is necessary for
  rs6000_expand_vec_perm_const_1 but not for this use.  So we
@@ -3870,7 +3902,7 @@
 {
   rtx op1 = operands[1];
   rtx v4si_tmp = gen_reg_rtx (V4SImode);
-  emit_insn (gen_vsx_xxpermdi_v4si (v4si_tmp, op1, op1, const1_rtx));
+  emit_insn (gen_vsx_xxpermdi_v4si_be (v4si_tmp, op1, op1, const1_rtx));
   operands[1] = v4si_tmp;
   operands[3] = GEN_INT (12 - INTVAL (operands[3]));
 }
Index: