Re: [PATCH] Simplify a VEC_SELECT fed by its own inverse

2014-06-02 Thread Andreas Schwab
Bill Schmidt wschm...@linux.vnet.ibm.com writes:

 Index: gcc/testsuite/gcc.target/powerpc/vsxcopy.c
 ===
 --- gcc/testsuite/gcc.target/powerpc/vsxcopy.c(revision 0)
 +++ gcc/testsuite/gcc.target/powerpc/vsxcopy.c(working copy)
 @@ -0,0 +1,15 @@
 +/* { dg-do compile { target { powerpc64*-*-* } } } */
 +/* { dg-require-effective-target powerpc_vsx_ok } */
 +/* { dg-options "-O1" } */
 +/* { dg-final { scan-assembler "lxvd2x" } } */
 +/* { dg-final { scan-assembler "stxvd2x" } } */
 +/* { dg-final { scan-assembler-not "xxpermdi" } } */

You need -mvsx if you want to see VSX insns.  Tested on
powerpc64-suse-linux and installed as obvious.

Andreas.

* gcc.target/powerpc/vsxcopy.c (dg-options): Add -mvsx.

diff --git a/gcc/testsuite/gcc.target/powerpc/vsxcopy.c 
b/gcc/testsuite/gcc.target/powerpc/vsxcopy.c
index fc1f0bd..fbe3c67 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsxcopy.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsxcopy.c
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { powerpc64*-*-* } } } */
 /* { dg-require-effective-target powerpc_vsx_ok } */
-/* { dg-options "-O1" } */
+/* { dg-options "-O1 -mvsx" } */
 /* { dg-final { scan-assembler "lxvd2x" } } */
 /* { dg-final { scan-assembler "stxvd2x" } } */
 /* { dg-final { scan-assembler-not "xxpermdi" } } */
-- 
2.0.0

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Re: [PATCH] Simplify a VEC_SELECT fed by its own inverse

2014-04-22 Thread Marc Glisse

On Mon, 21 Apr 2014, Richard Henderson wrote:


On 04/21/2014 01:19 PM, Bill Schmidt wrote:

 +  if (GET_CODE (trueop0) == VEC_SELECT
 +      && GET_MODE (XEXP (trueop0, 0)) == mode)
+   {
+ rtx op0_subop1 = XEXP (trueop0, 1);
+ gcc_assert (GET_CODE (op0_subop1) == PARALLEL);
+ gcc_assert (XVECLEN (trueop1, 0) == GET_MODE_NUNITS (mode));
+
+ /* Apply the outer ordering vector to the inner one.  (The inner
+ordering vector is expressly permitted to be of a different
+length than the outer one.)  If the result is { 0, 1, ..., n-1 }
+then the two VEC_SELECTs cancel.  */
 + for (int i = 0; i < XVECLEN (trueop1, 0); ++i)
+   {
+ rtx x = XVECEXP (trueop1, 0, i);
+ gcc_assert (CONST_INT_P (x));
+ rtx y = XVECEXP (op0_subop1, 0, INTVAL (x));
+ gcc_assert (CONST_INT_P (y));


In two places you're asserting that you've got a constant permutation.  Surely
there should be a non-assertion check and graceful exit for either select to be
a variable permutation.


Note that in the case where trueop0 is a CONST_VECTOR, we already check 
each element of trueop1:


  gcc_assert (CONST_INT_P (x));

In the case where the result is a scalar, we also have:

  gcc_assert (CONST_INT_P (XVECEXP (trueop1, 0, 0)));

so we will have other issues if something ever creates a variable 
vec_select. Not that a graceful exit will hurt of course.


--
Marc Glisse


Re: [PATCH] Simplify a VEC_SELECT fed by its own inverse

2014-04-22 Thread Richard Sandiford
Marc Glisse marc.gli...@inria.fr writes:
 On Mon, 21 Apr 2014, Richard Henderson wrote:

 On 04/21/2014 01:19 PM, Bill Schmidt wrote:
 +  if (GET_CODE (trueop0) == VEC_SELECT
 +      && GET_MODE (XEXP (trueop0, 0)) == mode)
 +   {
 + rtx op0_subop1 = XEXP (trueop0, 1);
 + gcc_assert (GET_CODE (op0_subop1) == PARALLEL);
 + gcc_assert (XVECLEN (trueop1, 0) == GET_MODE_NUNITS (mode));
 +
 + /* Apply the outer ordering vector to the inner one.  (The inner
 +ordering vector is expressly permitted to be of a different
 +length than the outer one.)  If the result is { 0, 1, ..., n-1 }
 +then the two VEC_SELECTs cancel.  */
 + for (int i = 0; i < XVECLEN (trueop1, 0); ++i)
 +   {
 + rtx x = XVECEXP (trueop1, 0, i);
 + gcc_assert (CONST_INT_P (x));
 + rtx y = XVECEXP (op0_subop1, 0, INTVAL (x));
 + gcc_assert (CONST_INT_P (y));

 In two places you're asserting that you've got a constant permutation.
 Surely there should be a non-assertion check and graceful exit for
 either select to be a variable permutation.

 Note that in the case where trueop0 is a CONST_VECTOR, we already check 
 each element of trueop1:

gcc_assert (CONST_INT_P (x));

 In the case where the result is a scalar, we also have:

gcc_assert (CONST_INT_P (XVECEXP (trueop1, 0, 0)));

 so we will have other issues if something ever creates a variable 
 vec_select. Not that a graceful exit will hurt of course.

I realise this isn't the point, but maybe we should go easy on this kind
of gcc_assert.  Using INTVAL is itself an assertion that you have a
CONST_INT.  Adding gcc_asserts on top (and so forcing the assert even
in release compilers) kind-of subverts the --enable-checking option.

Thanks,
Richard


Re: [PATCH] Simplify a VEC_SELECT fed by its own inverse

2014-04-22 Thread Bill Schmidt
Hi,

Below is the revised patch addressing Richard's concerns about the
assertions.  Bootstrapped and tested on
powerpc64[,le]-unknown-linux-gnu.  Ok for trunk?

Thanks,
Bill


[gcc]

2014-04-22  Bill Schmidt  wschm...@linux.vnet.ibm.com

* simplify-rtx.c (simplify_binary_operation_1): Optimize case of
nested VEC_SELECTs that are inverses of each other.

[gcc/testsuite]

2014-04-22  Bill Schmidt  wschm...@linux.vnet.ibm.com

* gcc.target/powerpc/vsxcopy.c: New test.


Index: gcc/simplify-rtx.c
===
--- gcc/simplify-rtx.c  (revision 209516)
+++ gcc/simplify-rtx.c  (working copy)
@@ -3673,6 +3673,31 @@ simplify_binary_operation_1 (enum rtx_code code, e
}
}
 
+  /* If we have two nested selects that are inverses of each
+other, replace them with the source operand.  */
+  if (GET_CODE (trueop0) == VEC_SELECT
+      && GET_MODE (XEXP (trueop0, 0)) == mode)
+   {
+ rtx op0_subop1 = XEXP (trueop0, 1);
+ gcc_assert (GET_CODE (op0_subop1) == PARALLEL);
+ gcc_assert (XVECLEN (trueop1, 0) == GET_MODE_NUNITS (mode));
+
+ /* Apply the outer ordering vector to the inner one.  (The inner
+ordering vector is expressly permitted to be of a different
+length than the outer one.)  If the result is { 0, 1, ..., n-1 }
+then the two VEC_SELECTs cancel.  */
+ for (int i = 0; i < XVECLEN (trueop1, 0); ++i)
+   {
+ rtx x = XVECEXP (trueop1, 0, i);
+ if (!CONST_INT_P (x))
+   return 0;
+ rtx y = XVECEXP (op0_subop1, 0, INTVAL (x));
+ if (!CONST_INT_P (y) || i != INTVAL (y))
+   return 0;
+   }
+ return XEXP (trueop0, 0);
+   }
+
   return 0;
 case VEC_CONCAT:
   {
Index: gcc/testsuite/gcc.target/powerpc/vsxcopy.c
===
--- gcc/testsuite/gcc.target/powerpc/vsxcopy.c  (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsxcopy.c  (working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { powerpc64*-*-* } } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O1" } */
+/* { dg-final { scan-assembler "lxvd2x" } } */
+/* { dg-final { scan-assembler "stxvd2x" } } */
+/* { dg-final { scan-assembler-not "xxpermdi" } } */
+
+typedef float vecf __attribute__ ((vector_size (16)));
+extern vecf j, k;
+
+void fun (void)
+{
+  j = k;
+}
+




[PATCH] Simplify a VEC_SELECT fed by its own inverse

2014-04-21 Thread Bill Schmidt
Hi,

This patch adds a small RTL simplification for the case where the first
operand to a VEC_SELECT is another VEC_SELECT with an inverse selection
function.  E.g.,

(vec_select:V4SF (vec_select:V4SF (OP:V4SF) (parallel [2 3 0 1]))
 (parallel [2 3 0 1]))

may be simplified to (OP:V4SF).  This comes up in practice on
powerpc64le-linux-gnu because of the characteristics of certain loads
and stores in the Power7 and later ISAs.  When running in little endian
modes, one of these loads must be followed by a permute to reorder the
vector elements, and one of these stores must be preceded by a permute
to once again reorder them.  Thus a simple copy may end up with two
permutes whose effect cancels each other out.  The patch ensures those
redundancies are detected and removed.

Note that it would be possible to do a more general transformation here,
in which any vec_select feeding another could be replaced by a
vec_select performing the composite function of the other two.  I have
not done this because I am unaware of this situation arising in
practice.  If it's desirable, I can extend the patch in this direction.

Bootstrapped and tested on powerpc64[,le]-unknown-linux-gnu with no
regressions.  Is this ok for trunk?

Thanks,
Bill


[gcc]

2014-04-21  Bill Schmidt  wschm...@linux.vnet.ibm.com

* simplify-rtx.c (simplify_binary_operation_1): Optimize case of
nested VEC_SELECTs that are inverses of each other.

[gcc/testsuite]

2014-04-21  Bill Schmidt  wschm...@linux.vnet.ibm.com

* gcc.target/powerpc/vsxcopy.c: New test.


Index: gcc/simplify-rtx.c
===
--- gcc/simplify-rtx.c  (revision 209516)
+++ gcc/simplify-rtx.c  (working copy)
@@ -3673,6 +3673,34 @@ simplify_binary_operation_1 (enum rtx_code code, e
}
}
 
+  /* If we have two nested selects that are inverses of each
+other, replace them with the source operand.  */
+  if (GET_CODE (trueop0) == VEC_SELECT)
+   {
+ enum machine_mode reg_mode = GET_MODE (XEXP (trueop0, 0));
+ rtx op0_subop1 = XEXP (trueop0, 1);
+ gcc_assert (VECTOR_MODE_P (reg_mode));
+ gcc_assert (GET_MODE_INNER (mode) == GET_MODE_INNER (reg_mode));
+ gcc_assert (GET_CODE (op0_subop1) == PARALLEL);
+
+ if (XVECLEN (trueop1, 0) == XVECLEN (op0_subop1, 0))
+   {
+ /* Apply the second ordering vector to the first.
+If the result is { 0, 1, ..., n-1 } then the
+two VEC_SELECTs cancel.  */
 + for (int i = 0; i < XVECLEN (trueop1, 0); ++i)
+   {
+ rtx x = XVECEXP (trueop1, 0, i);
+ gcc_assert (CONST_INT_P (x));
+ rtx y = XVECEXP (op0_subop1, 0, INTVAL (x));
+ gcc_assert (CONST_INT_P (y));
+ if (i != INTVAL (y))
+   return 0;
+   }
+ return XEXP (trueop0, 0);
+   }
+   }
+
   return 0;
 case VEC_CONCAT:
   {
Index: gcc/testsuite/gcc.target/powerpc/vsxcopy.c
===
--- gcc/testsuite/gcc.target/powerpc/vsxcopy.c  (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsxcopy.c  (working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { powerpc64*-*-* } } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O1" } */
+/* { dg-final { scan-assembler "lxvd2x" } } */
+/* { dg-final { scan-assembler "stxvd2x" } } */
+/* { dg-final { scan-assembler-not "xxpermdi" } } */
+
+typedef float vecf __attribute__ ((vector_size (16)));
+extern vecf j, k;
+
+void fun (void)
+{
+  j = k;
+}
+




Re: [PATCH] Simplify a VEC_SELECT fed by its own inverse

2014-04-21 Thread Marc Glisse

On Mon, 21 Apr 2014, Bill Schmidt wrote:


Note that it would be possible to do a more general transformation here,
in which any vec_select feeding another could be replaced by a
vec_select performing the composite function of the other two.  I have
not done this because I am unaware of this situation arising in
practice.  If it's desirable, I can extend the patch in this direction.


It does arise, but I think it isn't done because not all permutations are 
(optimally) supported by all targets.



Index: gcc/simplify-rtx.c
===
--- gcc/simplify-rtx.c  (revision 209516)
+++ gcc/simplify-rtx.c  (working copy)
@@ -3673,6 +3673,34 @@ simplify_binary_operation_1 (enum rtx_code code, e
}
}

+  /* If we have two nested selects that are inverses of each
+other, replace them with the source operand.  */
+  if (GET_CODE (trueop0) == VEC_SELECT)
+   {
+ enum machine_mode reg_mode = GET_MODE (XEXP (trueop0, 0));
+ rtx op0_subop1 = XEXP (trueop0, 1);
+ gcc_assert (VECTOR_MODE_P (reg_mode));
+ gcc_assert (GET_MODE_INNER (mode) == GET_MODE_INNER (reg_mode));
+ gcc_assert (GET_CODE (op0_subop1) == PARALLEL);
+
+ if (XVECLEN (trueop1, 0) == XVECLEN (op0_subop1, 0))
+   {
+ /* Apply the second ordering vector to the first.
+If the result is { 0, 1, ..., n-1 } then the
+two VEC_SELECTs cancel.  */
 + for (int i = 0; i < XVECLEN (trueop1, 0); ++i)
+   {
+ rtx x = XVECEXP (trueop1, 0, i);
+ gcc_assert (CONST_INT_P (x));
+ rtx y = XVECEXP (op0_subop1, 0, INTVAL (x));
+ gcc_assert (CONST_INT_P (y));
+ if (i != INTVAL (y))
+   return 0;
+   }
+ return XEXP (trueop0, 0);
+   }
+   }


I may have missed it, but don't you want to check that what you are 
returning has the right mode/length (or generate the obvious vec_select 
otherwise)? I don't know if any platform has such constructions (probably 
not), but in principle you could start from a vector of size 4, extract 
{1,0} from it, extract {1,0} from that, and you don't want to return the 
initial vector as is. On the other hand, I don't think you really care 
whether trueop1 is smaller than op0_subop1. Starting from a vector of size 
2, extracting {1,0,1,0} then {3,0} gives the initial vector just fine.


--
Marc Glisse


Re: [PATCH] Simplify a VEC_SELECT fed by its own inverse

2014-04-21 Thread Bill Schmidt
Hi Marc,

Good points!  I will rework the patch with your suggestions in mind.

Thanks!
Bill

On Mon, 2014-04-21 at 18:51 +0200, Marc Glisse wrote:
 On Mon, 21 Apr 2014, Bill Schmidt wrote:
 
  Note that it would be possible to do a more general transformation here,
  in which any vec_select feeding another could be replaced by a
  vec_select performing the composite function of the other two.  I have
  not done this because I am unaware of this situation arising in
  practice.  If it's desirable, I can extend the patch in this direction.
 
 It does arise, but I think it isn't done because not all permutations are 
 (optimally) supported by all targets.
 
  Index: gcc/simplify-rtx.c
  ===
  --- gcc/simplify-rtx.c  (revision 209516)
  +++ gcc/simplify-rtx.c  (working copy)
  @@ -3673,6 +3673,34 @@ simplify_binary_operation_1 (enum rtx_code code, e
  }
  }
 
  +  /* If we have two nested selects that are inverses of each
  +other, replace them with the source operand.  */
  +  if (GET_CODE (trueop0) == VEC_SELECT)
  +   {
  + enum machine_mode reg_mode = GET_MODE (XEXP (trueop0, 0));
  + rtx op0_subop1 = XEXP (trueop0, 1);
  + gcc_assert (VECTOR_MODE_P (reg_mode));
  + gcc_assert (GET_MODE_INNER (mode) == GET_MODE_INNER (reg_mode));
  + gcc_assert (GET_CODE (op0_subop1) == PARALLEL);
  +
  + if (XVECLEN (trueop1, 0) == XVECLEN (op0_subop1, 0))
  +   {
  + /* Apply the second ordering vector to the first.
  +If the result is { 0, 1, ..., n-1 } then the
  +two VEC_SELECTs cancel.  */
  + for (int i = 0; i < XVECLEN (trueop1, 0); ++i)
  +   {
  + rtx x = XVECEXP (trueop1, 0, i);
  + gcc_assert (CONST_INT_P (x));
  + rtx y = XVECEXP (op0_subop1, 0, INTVAL (x));
  + gcc_assert (CONST_INT_P (y));
  + if (i != INTVAL (y))
  +   return 0;
  +   }
  + return XEXP (trueop0, 0);
  +   }
  +   }
 
 I may have missed it, but don't you want to check that what you are 
 returning has the right mode/length (or generate the obvious vec_select 
 otherwise)? I don't know if any platform has such constructions (probably 
 not), but in principle you could start from a vector of size 4, extract 
 {1,0} from it, extract {1,0} from that, and you don't want to return the 
 initial vector as is. On the other hand, I don't think you really care 
 whether trueop1 is smaller than op0_subop1. Starting from a vector of size 
 2, extracting {1,0,1,0} then {3,0} gives the initial vector just fine.
 



Re: [PATCH] Simplify a VEC_SELECT fed by its own inverse

2014-04-21 Thread Bill Schmidt
Hi,

Here's a revised patch in response to Marc's comments.  Again,
bootstrapped and tested on powerpc64[,le]-unknown-linux-gnu.  Is this ok
for trunk?

Thanks,
Bill


[gcc]

2014-04-21  Bill Schmidt  wschm...@linux.vnet.ibm.com

* simplify-rtx.c (simplify_binary_operation_1): Optimize case of
nested VEC_SELECTs that are inverses of each other.

[gcc/testsuite]

2014-04-21  Bill Schmidt  wschm...@linux.vnet.ibm.com

* gcc.target/powerpc/vsxcopy.c: New test.


Index: gcc/simplify-rtx.c
===
--- gcc/simplify-rtx.c  (revision 209516)
+++ gcc/simplify-rtx.c  (working copy)
@@ -3673,6 +3673,31 @@ simplify_binary_operation_1 (enum rtx_code code, e
}
}
 
+  /* If we have two nested selects that are inverses of each
+other, replace them with the source operand.  */
+  if (GET_CODE (trueop0) == VEC_SELECT
+      && GET_MODE (XEXP (trueop0, 0)) == mode)
+   {
+ rtx op0_subop1 = XEXP (trueop0, 1);
+ gcc_assert (GET_CODE (op0_subop1) == PARALLEL);
+ gcc_assert (XVECLEN (trueop1, 0) == GET_MODE_NUNITS (mode));
+
+ /* Apply the outer ordering vector to the inner one.  (The inner
+ordering vector is expressly permitted to be of a different
+length than the outer one.)  If the result is { 0, 1, ..., n-1 }
+then the two VEC_SELECTs cancel.  */
+ for (int i = 0; i < XVECLEN (trueop1, 0); ++i)
+   {
+ rtx x = XVECEXP (trueop1, 0, i);
+ gcc_assert (CONST_INT_P (x));
+ rtx y = XVECEXP (op0_subop1, 0, INTVAL (x));
+ gcc_assert (CONST_INT_P (y));
+ if (i != INTVAL (y))
+   return 0;
+   }
+ return XEXP (trueop0, 0);
+   }
+
   return 0;
 case VEC_CONCAT:
   {
Index: gcc/testsuite/gcc.target/powerpc/vsxcopy.c
===
--- gcc/testsuite/gcc.target/powerpc/vsxcopy.c  (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsxcopy.c  (working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { powerpc64*-*-* } } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O1" } */
+/* { dg-final { scan-assembler "lxvd2x" } } */
+/* { dg-final { scan-assembler "stxvd2x" } } */
+/* { dg-final { scan-assembler-not "xxpermdi" } } */
+
+typedef float vecf __attribute__ ((vector_size (16)));
+extern vecf j, k;
+
+void fun (void)
+{
+  j = k;
+}
+




Re: [PATCH] Simplify a VEC_SELECT fed by its own inverse

2014-04-21 Thread Richard Henderson
On 04/21/2014 01:19 PM, Bill Schmidt wrote:
 +  if (GET_CODE (trueop0) == VEC_SELECT
 +      && GET_MODE (XEXP (trueop0, 0)) == mode)
 + {
 +   rtx op0_subop1 = XEXP (trueop0, 1);
 +   gcc_assert (GET_CODE (op0_subop1) == PARALLEL);
 +   gcc_assert (XVECLEN (trueop1, 0) == GET_MODE_NUNITS (mode));
 +
 +   /* Apply the outer ordering vector to the inner one.  (The inner
 +  ordering vector is expressly permitted to be of a different
 +  length than the outer one.)  If the result is { 0, 1, ..., n-1 }
 +  then the two VEC_SELECTs cancel.  */
 +   for (int i = 0; i < XVECLEN (trueop1, 0); ++i)
 + {
 +   rtx x = XVECEXP (trueop1, 0, i);
 +   gcc_assert (CONST_INT_P (x));
 +   rtx y = XVECEXP (op0_subop1, 0, INTVAL (x));
 +   gcc_assert (CONST_INT_P (y));

In two places you're asserting that you've got a constant permutation.  Surely
there should be a non-assertion check and graceful exit for either select to be
a variable permutation.


r~


Re: [PATCH] Simplify a VEC_SELECT fed by its own inverse

2014-04-21 Thread Bill Schmidt
On Mon, 2014-04-21 at 13:48 -0700, Richard Henderson wrote:
 On 04/21/2014 01:19 PM, Bill Schmidt wrote:
  +  if (GET_CODE (trueop0) == VEC_SELECT
  +      && GET_MODE (XEXP (trueop0, 0)) == mode)
  +   {
  + rtx op0_subop1 = XEXP (trueop0, 1);
  + gcc_assert (GET_CODE (op0_subop1) == PARALLEL);
  + gcc_assert (XVECLEN (trueop1, 0) == GET_MODE_NUNITS (mode));
  +
  + /* Apply the outer ordering vector to the inner one.  (The inner
  +ordering vector is expressly permitted to be of a different
  +length than the outer one.)  If the result is { 0, 1, ..., n-1 }
  +then the two VEC_SELECTs cancel.  */
  + for (int i = 0; i < XVECLEN (trueop1, 0); ++i)
  +   {
  + rtx x = XVECEXP (trueop1, 0, i);
  + gcc_assert (CONST_INT_P (x));
  + rtx y = XVECEXP (op0_subop1, 0, INTVAL (x));
  + gcc_assert (CONST_INT_P (y));
 
 In two places you're asserting that you've got a constant permutation.  Surely
 there should be a non-assertion check and graceful exit for either select to
 be a variable permutation.

Ah.  I was not aware this was even a possibility.  From rtl.def:

/* Describes an operation that selects parts of a vector.   
   Operands 0 is the source vector, operand 1 is a PARALLEL that contains   
   a CONST_INT for each of the subparts of the result vector, giving the
   number of the source subpart that should be stored into it.  */  
DEF_RTL_EXPR(VEC_SELECT, vec_select, ee, RTX_BIN_ARITH) 

If variable permutations are possible with VEC_SELECT, then I suppose
this commentary should be updated.  The GCC internals document contains
similar text in section 13.12.

I'll rebuild the patch per your comments tomorrow.  Thanks for letting
me know about this!

Regards,
Bill

 
 
 r~