Re: [PATCH] Fix vector rotate regression (PR tree-optimization/57233)

2014-06-26 Thread Jakub Jelinek
On Thu, Jun 26, 2014 at 07:43:55AM +0200, Marc Glisse wrote:
 +  if (compute_type == TREE_TYPE (type)
 +   !VECTOR_INTEGER_TYPE_P (TREE_TYPE (rhs2)))
 +{
 +  optab oplv, opl, oprv, opr, opo;
 +  oplv = optab_for_tree_code (LSHIFT_EXPR, type, optab_vector);
 +  oprv = optab_for_tree_code (RSHIFT_EXPR, type, optab_vector);
 +  opo = optab_for_tree_code (BIT_IOR_EXPR, type, optab_default);
 +  opl = optab_for_tree_code (LSHIFT_EXPR, type, optab_scalar);
 +  opr = optab_for_tree_code (RSHIFT_EXPR, type, optab_scalar);
 
 How well are we separating lshiftrt from ashiftrt? Are ROTATE_EXPR always on
 an unsigned type so it is fine? Or do we expand one in terms of the other if
 it isn't available so it doesn't matter?

You're right, for RSHIFT_EXPR I should use
oprv = vlshr_optab;
opr = lshr_optab;
because expmed.c always uses lshiftrt.
 
 +  if (optab_handler (oplv, TYPE_MODE (type)) != CODE_FOR_nothing
 +   optab_handler (opl, TYPE_MODE (type)) == CODE_FOR_nothing
 +   optab_handler (oprv, TYPE_MODE (type)) != CODE_FOR_nothing
 +   optab_handler (opr, TYPE_MODE (type)) == CODE_FOR_nothing)
 +{
 +  opl = oplv;
 +  opr = oprv;
 +}
 
 Maybe arrange the conditions in order (oplv!=  oprv!=  (opl== ||
 opr==)), or separate the replacement of opl and of opv into 2 separate ifs?

The reason I wrote them in this order was so that it fits on the lines ;)
Guess two separate ifs would be fine.

 Somehow, it feels like those checks should be somewhere in get_compute_type
 so we test both scalar and vector versions for each size, or we could call
 get_compute_type for both and pick the best.

It would be easier to just call get_compute_type in each case and pick the
wider, but the preexisting case didn't bother, so I haven't bothered either.
Passing two optabs optionally to get_compute_type would be IMHO too ugly.
 
 +  compute_type = get_compute_type (LSHIFT_EXPR, opl, type);
 +  if (compute_type == TREE_TYPE (type)
 +  || compute_type != get_compute_type (RSHIFT_EXPR, opr, type)
 +  || compute_type != get_compute_type (BIT_IOR_EXPR, opo, type))
 +compute_type = TREE_TYPE (type);
 
 Since we have determined compute_type from ashift (let's assume that's the
 one least likely to exist), I would just check that optab is ok with using
 this mode for the other 2 ops. Here, if we have shifts in 128 bits and ior
 in both 128 and 256 bits, we will fail (I thought that might be the case in
 AVX, but apparently not). Plus it is faster ;-)

Makes sense.

 Does rotate hit PR 56873? (I noticed the -mno-xop and no -mxop test)

There are exec testcases, -mxop AFAIK has rotates, so I think it is not
worth testing.  And there is a single asm match compile testcase, where
the -mno-xop ensures that when somebody -mxop in RUNTESTFLAGS or CPU
defaults to -mxop we don't get a matching failure, because in that case it
would emit a vector rotate insn.

Jakub


Re: [PATCH] Fix vector rotate regression (PR tree-optimization/57233)

2014-06-26 Thread Jakub Jelinek
On Thu, Jun 26, 2014 at 08:10:15AM +0200, Jakub Jelinek wrote:
  +compute_type = get_compute_type (LSHIFT_EXPR, opl, type);
  +if (compute_type == TREE_TYPE (type)
  +|| compute_type != get_compute_type (RSHIFT_EXPR, opr, type)
  +|| compute_type != get_compute_type (BIT_IOR_EXPR, opo, type))
  +  compute_type = TREE_TYPE (type);
  
  Since we have determined compute_type from ashift (let's assume that's the
  one least likely to exist), I would just check that optab is ok with using
  this mode for the other 2 ops. Here, if we have shifts in 128 bits and ior
  in both 128 and 256 bits, we will fail (I thought that might be the case in
  AVX, but apparently not). Plus it is faster ;-)
 
 Makes sense.

...

So like this?  I've also changed get_compute_type so that it will DTRT
even for -mavx and V4DImode vectors, so e.g. f5/f6/f8 routines in
avx-pr57233.c improve.  Also, even for shifts by scalar, if e.g. target
doesn't have shifts by scalar at all, and only has narrower vector by vector
shifts, it should handle this case too.

2014-06-25  Jakub Jelinek  ja...@redhat.com

PR tree-optimization/57233
PR tree-optimization/61299
* tree-vect-generic.c (get_compute_type, count_type_subparts): New
functions.
(expand_vector_operations_1): Use them.  If {L,R}ROTATE_EXPR
would be lowered to scalar shifts, check if corresponding
shifts and vector BIT_IOR_EXPR are supported and don't lower
or lower just to narrower vector type in that case.
* expmed.c (expand_shift_1): Fix up handling of vector
shifts and rotates.

* gcc.dg/pr57233.c: New test.
* gcc.target/i386/pr57233.c: New test.
* gcc.target/i386/sse2-pr57233.c: New test.
* gcc.target/i386/avx-pr57233.c: New test.
* gcc.target/i386/avx2-pr57233.c: New test.
* gcc.target/i386/avx512f-pr57233.c: New test.
* gcc.target/i386/xop-pr57233.c: New test.

--- gcc/tree-vect-generic.c.jj  2014-05-11 22:21:28.0 +0200
+++ gcc/tree-vect-generic.c 2014-06-26 10:18:32.815895488 +0200
@@ -1334,15 +1334,67 @@ lower_vec_perm (gimple_stmt_iterator *gs
   update_stmt (gsi_stmt (*gsi));
 }
 
+/* Return type in which CODE operation with optab OP can be
+   computed.  */
+
+static tree
+get_compute_type (enum tree_code code, optab op, tree type)
+{
+  /* For very wide vectors, try using a smaller vector mode.  */
+  tree compute_type = type;
+  if (op
+   (!VECTOR_MODE_P (TYPE_MODE (type))
+ || optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing))
+{
+  tree vector_compute_type
+   = type_for_widest_vector_mode (TREE_TYPE (type), op);
+  if (vector_compute_type != NULL_TREE
+  (TYPE_VECTOR_SUBPARTS (vector_compute_type)
+  TYPE_VECTOR_SUBPARTS (compute_type))
+  (optab_handler (op, TYPE_MODE (vector_compute_type))
+ != CODE_FOR_nothing))
+   compute_type = vector_compute_type;
+}
+
+  /* If we are breaking a BLKmode vector into smaller pieces,
+ type_for_widest_vector_mode has already looked into the optab,
+ so skip these checks.  */
+  if (compute_type == type)
+{
+  enum machine_mode compute_mode = TYPE_MODE (compute_type);
+  if (VECTOR_MODE_P (compute_mode))
+   {
+ if (op  optab_handler (op, compute_mode) != CODE_FOR_nothing)
+   return compute_type;
+ if (code == MULT_HIGHPART_EXPR
+  can_mult_highpart_p (compute_mode,
+ TYPE_UNSIGNED (compute_type)))
+   return compute_type;
+   }
+  /* There is no operation in hardware, so fall back to scalars.  */
+  compute_type = TREE_TYPE (type);
+}
+
+  return compute_type;
+}
+
+/* Helper function of expand_vector_operations_1.  Return number of
+   vector elements for vector types or 1 for other types.  */
+
+static inline int
+count_type_subparts (tree type)
+{
+  return VECTOR_TYPE_P (type) ? TYPE_VECTOR_SUBPARTS (type) : 1;
+}
+
 /* Process one statement.  If we identify a vector operation, expand it.  */
 
 static void
 expand_vector_operations_1 (gimple_stmt_iterator *gsi)
 {
   gimple stmt = gsi_stmt (*gsi);
-  tree lhs, rhs1, rhs2 = NULL, type, compute_type;
+  tree lhs, rhs1, rhs2 = NULL, type, compute_type = NULL_TREE;
   enum tree_code code;
-  enum machine_mode compute_mode;
   optab op = unknown_optab;
   enum gimple_rhs_class rhs_class;
   tree new_rhs;
@@ -1455,11 +1507,83 @@ expand_vector_operations_1 (gimple_stmt_
{
   op = optab_for_tree_code (code, type, optab_scalar);
 
+ compute_type = get_compute_type (code, op, type);
+ if (compute_type == type)
+   return;
  /* The rtl expander will expand vector/scalar as vector/vector
-if necessary.  Don't bother converting the stmt here.  */
- if (optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing
-  

Re: [PATCH] Fix vector rotate regression (PR tree-optimization/57233)

2014-06-26 Thread Marc Glisse

On Thu, 26 Jun 2014, Jakub Jelinek wrote:


So like this?  I've also changed get_compute_type so that it will DTRT
even for -mavx and V4DImode vectors, so e.g. f5/f6/f8 routines in
avx-pr57233.c improve.  Also, even for shifts by scalar, if e.g. target
doesn't have shifts by scalar at all, and only has narrower vector by vector
shifts, it should handle this case too.


All that? Cool!


@@ -1455,11 +1507,83 @@ expand_vector_operations_1 (gimple_stmt_
{
  op = optab_for_tree_code (code, type, optab_scalar);

+ compute_type = get_compute_type (code, op, type);
+ if (compute_type == type)
+   return;
  /* The rtl expander will expand vector/scalar as vector/vector
-if necessary.  Don't bother converting the stmt here.  */
- if (optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing
-  optab_handler (opv, TYPE_MODE (type)) != CODE_FOR_nothing)
+if necessary.  Pick one with wider vector type.  */
+ tree compute_vtype = get_compute_type (code, opv, type);
+ if (count_type_subparts (compute_vtype)
+  count_type_subparts (compute_type))
+   {
+ compute_type = compute_vtype;
+ op = opv;
+   }
+   }
+
+  if (code == LROTATE_EXPR || code == RROTATE_EXPR)
+   {
+ if (compute_type == NULL_TREE)
+   compute_type = get_compute_type (code, op, type);
+ if (compute_type == type)
return;
+ /* Before splitting vector rotates into scalar rotates,
+see if we can't use vector shifts and BIT_IOR_EXPR
+instead.  For vector by vector rotates we'd also
+need to check BIT_AND_EXPR and NEGATE_EXPR, punt there
+for now, fold doesn't seem to create such rotates anyway.  */
+ if (compute_type == TREE_TYPE (type)
+  !VECTOR_INTEGER_TYPE_P (TREE_TYPE (rhs2)))
+   {
+ optab oplv, opl, oprv, opr, opo;
+ oplv = optab_for_tree_code (LSHIFT_EXPR, type, optab_vector);
+ /* Right shift always has to be logical, no matter what
+signedness type has.  */
+ oprv = vlshr_optab;
+ opo = optab_for_tree_code (BIT_IOR_EXPR, type, optab_default);
+ opl = optab_for_tree_code (LSHIFT_EXPR, type, optab_scalar);
+ oprv = lshr_optab;
+ opr = optab_for_tree_code (RSHIFT_EXPR, type, optab_scalar);


Looks like there are some typos in there, you are assigning to oprv twice.

--
Marc Glisse


Re: [PATCH] Fix vector rotate regression (PR tree-optimization/57233)

2014-06-26 Thread Jakub Jelinek
On Thu, Jun 26, 2014 at 01:16:41PM +0200, Marc Glisse wrote:
 +  if (compute_type == TREE_TYPE (type)
 +   !VECTOR_INTEGER_TYPE_P (TREE_TYPE (rhs2)))
 +{
 +  optab oplv, opl, oprv, opr, opo;
 +  oplv = optab_for_tree_code (LSHIFT_EXPR, type, optab_vector);
 +  /* Right shift always has to be logical, no matter what
 + signedness type has.  */
 +  oprv = vlshr_optab;
 +  opo = optab_for_tree_code (BIT_IOR_EXPR, type, optab_default);
 +  opl = optab_for_tree_code (LSHIFT_EXPR, type, optab_scalar);
 +  oprv = lshr_optab;
 +  opr = optab_for_tree_code (RSHIFT_EXPR, type, optab_scalar);
 
 Looks like there are some typos in there, you are assigning to oprv twice.

Oops, fixed thusly.

2014-06-25  Jakub Jelinek  ja...@redhat.com

PR tree-optimization/57233
PR tree-optimization/61299
* tree-vect-generic.c (get_compute_type, count_type_subparts): New
functions.
(expand_vector_operations_1): Use them.  If {L,R}ROTATE_EXPR
would be lowered to scalar shifts, check if corresponding
shifts and vector BIT_IOR_EXPR are supported and don't lower
or lower just to narrower vector type in that case.
* expmed.c (expand_shift_1): Fix up handling of vector
shifts and rotates.

* gcc.dg/pr57233.c: New test.
* gcc.target/i386/pr57233.c: New test.
* gcc.target/i386/sse2-pr57233.c: New test.
* gcc.target/i386/avx-pr57233.c: New test.
* gcc.target/i386/avx2-pr57233.c: New test.
* gcc.target/i386/avx512f-pr57233.c: New test.
* gcc.target/i386/xop-pr57233.c: New test.

--- gcc/tree-vect-generic.c.jj  2014-06-26 11:00:00.477268305 +0200
+++ gcc/tree-vect-generic.c 2014-06-26 13:33:33.024069715 +0200
@@ -1334,15 +1334,67 @@ lower_vec_perm (gimple_stmt_iterator *gs
   update_stmt (gsi_stmt (*gsi));
 }
 
+/* Return type in which CODE operation with optab OP can be
+   computed.  */
+
+static tree
+get_compute_type (enum tree_code code, optab op, tree type)
+{
+  /* For very wide vectors, try using a smaller vector mode.  */
+  tree compute_type = type;
+  if (op
+   (!VECTOR_MODE_P (TYPE_MODE (type))
+ || optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing))
+{
+  tree vector_compute_type
+   = type_for_widest_vector_mode (TREE_TYPE (type), op);
+  if (vector_compute_type != NULL_TREE
+  (TYPE_VECTOR_SUBPARTS (vector_compute_type)
+  TYPE_VECTOR_SUBPARTS (compute_type))
+  (optab_handler (op, TYPE_MODE (vector_compute_type))
+ != CODE_FOR_nothing))
+   compute_type = vector_compute_type;
+}
+
+  /* If we are breaking a BLKmode vector into smaller pieces,
+ type_for_widest_vector_mode has already looked into the optab,
+ so skip these checks.  */
+  if (compute_type == type)
+{
+  enum machine_mode compute_mode = TYPE_MODE (compute_type);
+  if (VECTOR_MODE_P (compute_mode))
+   {
+ if (op  optab_handler (op, compute_mode) != CODE_FOR_nothing)
+   return compute_type;
+ if (code == MULT_HIGHPART_EXPR
+  can_mult_highpart_p (compute_mode,
+ TYPE_UNSIGNED (compute_type)))
+   return compute_type;
+   }
+  /* There is no operation in hardware, so fall back to scalars.  */
+  compute_type = TREE_TYPE (type);
+}
+
+  return compute_type;
+}
+
+/* Helper function of expand_vector_operations_1.  Return number of
+   vector elements for vector types or 1 for other types.  */
+
+static inline int
+count_type_subparts (tree type)
+{
+  return VECTOR_TYPE_P (type) ? TYPE_VECTOR_SUBPARTS (type) : 1;
+}
+
 /* Process one statement.  If we identify a vector operation, expand it.  */
 
 static void
 expand_vector_operations_1 (gimple_stmt_iterator *gsi)
 {
   gimple stmt = gsi_stmt (*gsi);
-  tree lhs, rhs1, rhs2 = NULL, type, compute_type;
+  tree lhs, rhs1, rhs2 = NULL, type, compute_type = NULL_TREE;
   enum tree_code code;
-  enum machine_mode compute_mode;
   optab op = unknown_optab;
   enum gimple_rhs_class rhs_class;
   tree new_rhs;
@@ -1455,11 +1507,76 @@ expand_vector_operations_1 (gimple_stmt_
{
   op = optab_for_tree_code (code, type, optab_scalar);
 
+ compute_type = get_compute_type (code, op, type);
+ if (compute_type == type)
+   return;
  /* The rtl expander will expand vector/scalar as vector/vector
-if necessary.  Don't bother converting the stmt here.  */
- if (optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing
-  optab_handler (opv, TYPE_MODE (type)) != CODE_FOR_nothing)
+if necessary.  Pick one with wider vector type.  */
+ tree compute_vtype = get_compute_type (code, opv, type);
+ if (count_type_subparts (compute_vtype)
+  count_type_subparts (compute_type))
+   {
+ 

Re: [PATCH] Fix vector rotate regression (PR tree-optimization/57233)

2014-06-26 Thread Richard Biener
On Thu, 26 Jun 2014, Jakub Jelinek wrote:

 On Thu, Jun 26, 2014 at 01:16:41PM +0200, Marc Glisse wrote:
  +if (compute_type == TREE_TYPE (type)
  + !VECTOR_INTEGER_TYPE_P (TREE_TYPE (rhs2)))
  +  {
  +optab oplv, opl, oprv, opr, opo;
  +oplv = optab_for_tree_code (LSHIFT_EXPR, type, optab_vector);
  +/* Right shift always has to be logical, no matter what
  +   signedness type has.  */
  +oprv = vlshr_optab;
  +opo = optab_for_tree_code (BIT_IOR_EXPR, type, optab_default);
  +opl = optab_for_tree_code (LSHIFT_EXPR, type, optab_scalar);
  +oprv = lshr_optab;
  +opr = optab_for_tree_code (RSHIFT_EXPR, type, optab_scalar);
  
  Looks like there are some typos in there, you are assigning to oprv twice.
 
 Oops, fixed thusly.

Ok.

Thanks,
Richard.

 2014-06-25  Jakub Jelinek  ja...@redhat.com
 
   PR tree-optimization/57233
   PR tree-optimization/61299
   * tree-vect-generic.c (get_compute_type, count_type_subparts): New
   functions.
   (expand_vector_operations_1): Use them.  If {L,R}ROTATE_EXPR
   would be lowered to scalar shifts, check if corresponding
   shifts and vector BIT_IOR_EXPR are supported and don't lower
   or lower just to narrower vector type in that case.
   * expmed.c (expand_shift_1): Fix up handling of vector
   shifts and rotates.
 
   * gcc.dg/pr57233.c: New test.
   * gcc.target/i386/pr57233.c: New test.
   * gcc.target/i386/sse2-pr57233.c: New test.
   * gcc.target/i386/avx-pr57233.c: New test.
   * gcc.target/i386/avx2-pr57233.c: New test.
   * gcc.target/i386/avx512f-pr57233.c: New test.
   * gcc.target/i386/xop-pr57233.c: New test.
 
 --- gcc/tree-vect-generic.c.jj2014-06-26 11:00:00.477268305 +0200
 +++ gcc/tree-vect-generic.c   2014-06-26 13:33:33.024069715 +0200
 @@ -1334,15 +1334,67 @@ lower_vec_perm (gimple_stmt_iterator *gs
update_stmt (gsi_stmt (*gsi));
  }
  
 +/* Return type in which CODE operation with optab OP can be
 +   computed.  */
 +
 +static tree
 +get_compute_type (enum tree_code code, optab op, tree type)
 +{
 +  /* For very wide vectors, try using a smaller vector mode.  */
 +  tree compute_type = type;
 +  if (op
 +   (!VECTOR_MODE_P (TYPE_MODE (type))
 +   || optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing))
 +{
 +  tree vector_compute_type
 + = type_for_widest_vector_mode (TREE_TYPE (type), op);
 +  if (vector_compute_type != NULL_TREE
 +(TYPE_VECTOR_SUBPARTS (vector_compute_type)
 +TYPE_VECTOR_SUBPARTS (compute_type))
 +(optab_handler (op, TYPE_MODE (vector_compute_type))
 +   != CODE_FOR_nothing))
 + compute_type = vector_compute_type;
 +}
 +
 +  /* If we are breaking a BLKmode vector into smaller pieces,
 + type_for_widest_vector_mode has already looked into the optab,
 + so skip these checks.  */
 +  if (compute_type == type)
 +{
 +  enum machine_mode compute_mode = TYPE_MODE (compute_type);
 +  if (VECTOR_MODE_P (compute_mode))
 + {
 +   if (op  optab_handler (op, compute_mode) != CODE_FOR_nothing)
 + return compute_type;
 +   if (code == MULT_HIGHPART_EXPR
 +can_mult_highpart_p (compute_mode,
 +   TYPE_UNSIGNED (compute_type)))
 + return compute_type;
 + }
 +  /* There is no operation in hardware, so fall back to scalars.  */
 +  compute_type = TREE_TYPE (type);
 +}
 +
 +  return compute_type;
 +}
 +
 +/* Helper function of expand_vector_operations_1.  Return number of
 +   vector elements for vector types or 1 for other types.  */
 +
 +static inline int
 +count_type_subparts (tree type)
 +{
 +  return VECTOR_TYPE_P (type) ? TYPE_VECTOR_SUBPARTS (type) : 1;
 +}
 +
  /* Process one statement.  If we identify a vector operation, expand it.  */
  
  static void
  expand_vector_operations_1 (gimple_stmt_iterator *gsi)
  {
gimple stmt = gsi_stmt (*gsi);
 -  tree lhs, rhs1, rhs2 = NULL, type, compute_type;
 +  tree lhs, rhs1, rhs2 = NULL, type, compute_type = NULL_TREE;
enum tree_code code;
 -  enum machine_mode compute_mode;
optab op = unknown_optab;
enum gimple_rhs_class rhs_class;
tree new_rhs;
 @@ -1455,11 +1507,76 @@ expand_vector_operations_1 (gimple_stmt_
   {
op = optab_for_tree_code (code, type, optab_scalar);
  
 +   compute_type = get_compute_type (code, op, type);
 +   if (compute_type == type)
 + return;
 /* The rtl expander will expand vector/scalar as vector/vector
 -  if necessary.  Don't bother converting the stmt here.  */
 -   if (optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing
 -optab_handler (opv, TYPE_MODE (type)) != CODE_FOR_nothing)
 +  if necessary.  Pick one with wider vector type.  */
 +   tree compute_vtype = get_compute_type (code, opv, type);
 +   if (count_type_subparts