Re: [PATCH] Fix result for conditional reductions matching at index 0 (PR tree-optimization/80631)

2017-12-11 Thread Richard Biener
On Mon, 11 Dec 2017, Jakub Jelinek wrote:

> On Mon, Dec 11, 2017 at 06:00:11PM +0100, Kilian Verhetsel wrote:
> > Jakub Jelinek  writes:
> > > Of course it can be done efficiently, what we care most is that the body 
> > > of
> > > the vectorized loop is efficient.
> > 
> > That's fair, I was looking at the x86 assembly being generated when a single
> > vectorized iteration was enough (because that is the context in which I
> > first encountered this bug):
> > 
> > int f(unsigned int *x, unsigned int k) {
> >   unsigned int result = 8;
> >   for (unsigned int i = 0; i < 8; i++) {
> > if (x[i] == k) result = i;
> >   }
> >   return result;
> > }
> > 
> > where the vpand instruction this generates would have to be replaced
> > with a variable blend if the default value weren't 0 — although I had
> > not realized even SSE4.1 on x86 includes such an instruction, making
> > this point less relevant.
> 
> So, here is my version of the patch, independent from your change.
> As I said, your patch is still highly valueable if it will be another
> STMT_VINFO_VEC_REDUCTION_TYPE kind to be used for the cases like the
> above testcase, where base is equal to TYPE_MIN_VALUE, or future improvement
> of base being variable, but TYPE_OVERFLOW_UNDEFINED iterator, where all we
> need is that the maximum number of iterations is smaller than the maximum
> of the type we use for the reduction phi.
> 
> The patch handles also negative steps, though for now only on signed type
> (for unsigned it can't be really negative, but perhaps we could treat
> unsigned values with the msb set as if they were negative and consider
> overflows in that direction).
> 
> Bootstrapped/regtested on x86_64-linux, i686-linux, powerpc64le-linux,
> bootstrapped on powerpc64-linux, regtest there ongoing.  Ok for trunk?
> 
> The patch prefers to emit what we were emitting if possible (i.e. zero
> value for the COND_EXPR never hit) - building a zero vector is usually
> cheaper than any other; if that is not possible, checks if initial_def
> can be used for that value - then we can avoid the
> res == induc_val ? initial_def : res;
> conditional move; if even that is not possible, attempts to use any other
> value.  If no value can be found, it for now uses COND_REDUCTION, which is
> more expensive, but correct.

Ok.

Thanks,
Richard.

> 2017-12-11  Jakub Jelinek  
> 
>   PR tree-optimization/80631
>   * tree-vect-loop.c (get_initial_def_for_reduction): Fix comment typo.
>   (vect_create_epilog_for_reduction): Add INDUC_VAL and INDUC_CODE
>   arguments, for INTEGER_INDUC_COND_REDUCTION use INDUC_VAL instead of
>   hardcoding zero as the value if COND_EXPR is never true.  For
>   INTEGER_INDUC_COND_REDUCTION don't emit the final COND_EXPR if
>   INDUC_VAL is equal to INITIAL_DEF, and use INDUC_CODE instead of
>   hardcoding MAX_EXPR as the reduction operation.
>   (is_nonwrapping_integer_induction): Allow negative step.
>   (vectorizable_reduction): Compute INDUC_VAL and INDUC_CODE for
>   vect_create_epilog_for_reduction, if no value is suitable, don't
>   use INTEGER_INDUC_COND_REDUCTION for now.  Formatting fixes.
> 
>   * gcc.dg/vect/pr80631-1.c: New test.
>   * gcc.dg/vect/pr80631-2.c: New test.
>   * gcc.dg/vect/pr65947-13.c: Expect integer induc cond reduction
>   vectorization.
> 
> --- gcc/tree-vect-loop.c.jj   2017-12-11 14:57:38.0 +0100
> +++ gcc/tree-vect-loop.c  2017-12-11 16:59:06.930720928 +0100
> @@ -4034,7 +4034,7 @@ get_initial_def_for_reduction (gimple *s
>  case MULT_EXPR:
>  case BIT_AND_EXPR:
>{
> -/* ADJUSMENT_DEF is NULL when called from
> +/* ADJUSTMENT_DEF is NULL when called from
> vect_create_epilog_for_reduction to vectorize double reduction.  
> */
>  if (adjustment_def)
> *adjustment_def = init_val;
> @@ -4283,6 +4283,11 @@ get_initial_defs_for_reduction (slp_tree
> DOUBLE_REDUC is TRUE if double reduction phi nodes should be handled.
> SLP_NODE is an SLP node containing a group of reduction statements. The 
>   first one in this group is STMT.
> +   INDUC_VAL is for INTEGER_INDUC_COND_REDUCTION the value to use for the 
> case
> + when the COND_EXPR is never true in the loop.  For MAX_EXPR, it needs to
> + be smaller than any value of the IV in the loop, for MIN_EXPR larger 
> than
> + any value of the IV in the loop.
> +   INDUC_CODE is the code for epilog reduction if 
> INTEGER_INDUC_COND_REDUCTION.
>  
> This function:
> 1. Creates the reduction def-use cycles: sets the arguments for 
> @@ -4330,7 +4335,8 @@ vect_create_epilog_for_reduction (vec vec reduction_phis,
>bool double_reduc, 
> slp_tree slp_node,
> -   slp_instance slp_node_instance)
> + 

Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-12-11 Thread Jakub Jelinek
On Mon, Dec 11, 2017 at 06:00:11PM +0100, Kilian Verhetsel wrote:
> Jakub Jelinek  writes:
> > Of course it can be done efficiently, what we care most is that the body of
> > the vectorized loop is efficient.
> 
> That's fair, I was looking at the x86 assembly being generated when a single
> vectorized iteration was enough (because that is the context in which I
> first encountered this bug):
> 
> int f(unsigned int *x, unsigned int k) {
>   unsigned int result = 8;
>   for (unsigned int i = 0; i < 8; i++) {
> if (x[i] == k) result = i;
>   }
>   return result;
> }
> 
> where the vpand instruction this generates would have to be replaced
> with a variable blend if the default value weren't 0 — although I had
> not realized even SSE4.1 on x86 includes such an instruction, making
> this point less relevant.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80631#c6 where I've
attached so far untested prototype.  If it is added before your patch
makes it in, your patch would start by introducing another kind
(say SIMPLE_INTEGER_INDUC_COND_REDUCTION) and would use that for the
spots that are handled by the PR80631 patch as INTEGER_INDUC_COND_REDUCTION
right now and your code for the rest.  E.g. the above testcase with my
patch, because i is unsigned and base is the minimum of the type is emitted as
COND_REDUCTION, which is what your patch would improve.

> > Another thing is, as your patch is quite large, we need a copyright
> > assignment for the changes before we can accept it, see
> > https://gcc.gnu.org/contribute.html for details.
> >
> > If you are already covered by an assignment of some company, please tell
> > us which one it is, otherwise contact us and we'll get you the needed
> > forms.
> 
> I am not covered by any copyright assignment yet. Do I need to send you
> any additional information?

I'll send it offlist.

Jakub


Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-12-11 Thread Kilian Verhetsel

Jakub Jelinek  writes:
> Of course it can be done efficiently, what we care most is that the body of
> the vectorized loop is efficient.

That's fair, I was looking at the x86 assembly being generated when a single
vectorized iteration was enough (because that is the context in which I
first encountered this bug):

int f(unsigned int *x, unsigned int k) {
  unsigned int result = 8;
  for (unsigned int i = 0; i < 8; i++) {
if (x[i] == k) result = i;
  }
  return result;
}

where the vpand instruction this generates would have to be replaced
with a variable blend if the default value weren't 0 — although I had
not realized even SSE4.1 on x86 includes such an instruction, making
this point less relevant.

> Thanks, it applies cleanly now
> > +  else if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
> > +   || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> > +   == INTEGER_INDUC_COND_REDUCTION))
> > +  && reduc_fn == IFN_LAST)»
>
> contains a character at the end of line that makes it not to compile.

My bad, I must have added this when I opened the patch file itself to
inspect it...

> Another thing is, as your patch is quite large, we need a copyright
> assignment for the changes before we can accept it, see
> https://gcc.gnu.org/contribute.html for details.
>
> If you are already covered by an assignment of some company, please tell
> us which one it is, otherwise contact us and we'll get you the needed
> forms.

I am not covered by any copyright assignment yet. Do I need to send you
any additional information?


Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-12-11 Thread Jakub Jelinek
On Mon, Dec 11, 2017 at 02:11:34PM +0100, Jakub Jelinek wrote:
> Thanks, it applies cleanly now
> > +  else if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
> > +   || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> > +   == INTEGER_INDUC_COND_REDUCTION))
> > +  && reduc_fn == IFN_LAST)»
> 
> contains a character at the end of line that makes it not to compile.

Another thing is, as your patch is quite large, we need a copyright
assignment for the changes before we can accept it, see
https://gcc.gnu.org/contribute.html for details.

If you are already covered by an assignment of some company, please tell
us which one it is, otherwise contact us and we'll get you the needed
forms.

Jakub


Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-12-11 Thread Jakub Jelinek
On Mon, Dec 11, 2017 at 11:56:55AM +0100, Kilian Verhetsel wrote:
> Jakub Jelinek  writes:
> > As it doesn't apply, I can't easily check what the patch generates
> > on the PR80631 testcases vs. my thoughts on that; though, if it emits
> > something more complicated for the simple cases, perhaps we could improve
> > those (not handle it like COND_REDUCTION, but instead as former
> > INTEGER_INDUC_COND_REDUCTION and just use a different constant instead of 0
> > if 0 isn't usable for the condition never matched value.
> 
> While you could use values different from 0, I'm not sure this can be
> done as efficiently.  0 is convenient because a single bitwise-and
> between the index vector and the condition will set lanes that do not
> contain a match to 0.

Of course it can be done efficiently, what we care most is that the body of
the vectorized loop is efficient.  Whether we choose -1, 0 or 124 as the
COND_EXPR not ever meant value matters only before that loop (when we need
to load that into a register holding vector of all those constants) and
then a scalar comparison on the REDUC_* result.  Load of -1 vector on some 
targets
is as expensive as load of 0, for arbitrary value worst case it is one
memory load compared to a specialized zero register (or set all bits)
instruction.  On the other side, by not using any offsetted iteration var,
one can reuse the vector register that holds the IV, which can be used in
some loops too and thus decrease register pressure.
And while comparison against 0 is sometimes one scalar insn
cheaper than comparison against other value, if the insn producing it
already sets the flags, I doubt it is the case here, so it is exactly the
same cost.  Not to mention that in your patch you are instead subtracting
one in the scalar code.

> Jakub Jelinek  writes:
> > First of all, I fail to see why we don't handle negative step,
> > that can be done with REDUC_MIN instead of REDUC_MAX.
> 
> That would not work without first using values different from 0 to
> indicate the absence of matches (except in cases where all indices are
> negative), which I assume is why the test was there in the first place.
> 
> Below is the patch with fixed formatting and changes to make it apply
> cleanly to r255537 (as far as I can tell this was simply caused by some
> variables and constants having been renamed).

Thanks, it applies cleanly now
> +  else if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
> + || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> + == INTEGER_INDUC_COND_REDUCTION))
> +&& reduc_fn == IFN_LAST)»

contains a character at the end of line that makes it not to compile.

Trying to understand your patch, here is the difference with your patch
between additional:
--- tree-vect-loop.c2017-12-11 13:39:35.619122907 +0100
+++ tree-vect-loop.c2017-12-11 13:35:27.0 +0100
@@ -6021,8 +6021,8 @@ vectorizable_reduction (gimple *stmt, gi
dump_printf_loc (MSG_NOTE, vect_location,
 "condition expression based on "
 "integer induction.\n");
- STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
-   = INTEGER_INDUC_COND_REDUCTION;
+/*   STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+   = INTEGER_INDUC_COND_REDUCTION; */
}
so that COND_REDUCTION is used, and the case with
INTEGER_INDUC_COND_REDUCTION with your patch on:
int v[256] = { 77, 1, 79, 3, 4, 5, 6, 7 };

__attribute__((noipa)) void
foo ()
{
  int k, r = -1;
  for (k = 0; k < 256; k++)
if (v[k] == 77)
  r = k;
  if (r != 0)
__builtin_abort ();
}

   vect_cst__21 = { 8, 8, 8, 8, 8, 8, 8, 8 };
   vect_cst__28 = { 77, 77, 77, 77, 77, 77, 77, 77 };
+  vect_cst__30 = { -1, -1, -1, -1, -1, -1, -1, -1 };
 
[local count: 139586436]:
   # k_12 = PHI 
   # r_13 = PHI 
   # ivtmp_11 = PHI 
   # vect_vec_iv_.0_22 = PHI 
-  # vect_r_3.1_24 = PHI 
+  # vect_r_3.1_24 = PHI 
   # vectp_v.2_25 = PHI 
-  # ivtmp_30 = PHI 
-  # _32 = PHI <_33(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
-  # ivtmp_43 = PHI 
+  # ivtmp_31 = PHI 
+  # _33 = PHI <_34(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
+  # ivtmp_41 = PHI 
   vect_vec_iv_.0_23 = vect_vec_iv_.0_22 + vect_cst__21;
   vect__1.4_27 = MEM[(int *)vectp_v.2_25];
   _1 = v[k_12];
   vect_r_3.6_29 = VEC_COND_EXPR ;
   r_3 = _1 == 77 ? k_12 : r_13;
   k_8 = k_12 + 1;
   ivtmp_2 = ivtmp_11 - 1;
   vectp_v.2_26 = vectp_v.2_25 + 32;
-  _33 = VEC_COND_EXPR ;
-  ivtmp_31 = ivtmp_30 + { 8, 8, 8, 8, 8, 8, 8, 8 };
-  ivtmp_44 = ivtmp_43 + 1;
-  if (ivtmp_44 < 32)
+  _34 = VEC_COND_EXPR ;
+  ivtmp_32 

Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-12-11 Thread Kilian Verhetsel

Hello,

Jakub Jelinek  writes:
> As it doesn't apply, I can't easily check what the patch generates
> on the PR80631 testcases vs. my thoughts on that; though, if it emits
> something more complicated for the simple cases, perhaps we could improve
> those (not handle it like COND_REDUCTION, but instead as former
> INTEGER_INDUC_COND_REDUCTION and just use a different constant instead of 0
> if 0 isn't usable for the condition never matched value.

While you could use values different from 0, I'm not sure this can be
done as efficiently.  0 is convenient because a single bitwise-and
between the index vector and the condition will set lanes that do not
contain a match to 0.

Jakub Jelinek  writes:
> First of all, I fail to see why we don't handle negative step,
> that can be done with REDUC_MIN instead of REDUC_MAX.

That would not work without first using values different from 0 to
indicate the absence of matches (except in cases where all indices are
negative), which I assume is why the test was there in the first place.

Below is the patch with fixed formatting and changes to make it apply
cleanly to r255537 (as far as I can tell this was simply caused by some
variables and constants having been renamed).

2017-11-21  Kilian Verhetsel  

gcc/ChangeLog:
PR testsuite/81179
* tree-vect-loop.c (vect_create_epilog_for_reduction): Fix the
returned value for INTEGER_INDUC_COND_REDUCTION whose last match
occurred at index 0.
(vectorizable_reduction): For
INTEGER_INDUC_COND_REDUCTION, pass the PHI statement that sets
the induction variable to the code generating the epilogue and
check that no overflow will occur.

gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c: New test for
overflows when compiling conditional reductions.
* gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c: Likewise.

Index: gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c
===
--- gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c	(working copy)
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_condition } */
+
+#include "tree-vect.h"
+#include 
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N UINT_MAX
+
+/* Condition reduction with maximum possible loop size.  Will fail to vectorize
+   because values in the index vector will overflow.  */
+unsigned int
+condition_reduction (unsigned int *a, unsigned int min_v)
+{
+  unsigned int last = -72;
+
+  for (unsigned int i = 0; i < N; i++)
+if (a[i] < min_v)
+  last = i;
+
+  return last;
+}
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "condition expression based on integer induction." "vect" } } */
+/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c
===
--- gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c	(working copy)
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_condition } */
+
+#include "tree-vect.h"
+#include 
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N (UINT_MAX - 1)
+
+/* Condition reduction with maximum possible loop size, minus one.  This should
+   still be vectorized correctly.  */
+unsigned int
+condition_reduction (unsigned int *a, unsigned int min_v)
+{
+  unsigned int last = -72;
+
+  for (unsigned int i = 0; i < N; i++)
+if (a[i] < min_v)
+  last = i;
+
+  return last;
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "condition expression based on integer induction." "vect" } } */
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c	(revision 255537)
+++ gcc/tree-vect-loop.c	(working copy)
@@ -4325,7 +4325,7 @@ get_initial_defs_for_reduction (slp_tree slp_node,
 
 static void
 vect_create_epilog_for_reduction (vec vect_defs, gimple *stmt,
-  gimple *reduc_def_stmt,
+  gimple *reduc_def_stmt, gimple *induct_stmt,
   int ncopies, internal_fn reduc_fn,
   vec reduction_phis,
   bool double_reduc, 
@@ -4486,7 +4486,9 @@ vect_create_epilog_for_reduction (vec vect_d
  The first match will be a 1 to allow 0 to be used for non-matching
  indexes.  If there are no matches at all then the vector will be all
  zeroes.  */
-  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+  || 

Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-12-08 Thread Jakub Jelinek
Hi!

What is going on with this patch, will anybody commit it?

This is also http://gcc.gnu.org/PR80631 apparently.

The patch doesn't apply cleanly to current trunk due to the
reduc_code -> reduc_fn changes.

As it doesn't apply, I can't easily check what the patch generates
on the PR80631 testcases vs. my thoughts on that; though, if it emits
something more complicated for the simple cases, perhaps we could improve
those (not handle it like COND_REDUCTION, but instead as former
INTEGER_INDUC_COND_REDUCTION and just use a different constant instead of 0
if 0 isn't usable for the condition never matched value.  We could
check the value from the loop preheader edge of the induc_stmt phi
and if it is constant and for REDUC_MAX (does that imply always positive
step?) larger than minimum value of the type, we can use the minimum of
the type as the value instead of zero.  If REDUC_MIN and the initial constant
is smaller than the maximum value of the type, we could use the maximum.
Otherwise punt to what you're doing.  Or instead of minimum or maximum check
the value of initial_def, if it is constant, if that value is usable, we
could even avoid the final COND_EXPR.

Another thing is that is_nonwrapping_integer_induction has:
  /* Make sure the loop is integer based.  */
  if (TREE_CODE (base) != INTEGER_CST
  || TREE_CODE (step) != INTEGER_CST)
return false;

  /* Check that the induction increments.  */
  if (tree_int_cst_sgn (step) == -1)
return false;

  if (TYPE_OVERFLOW_UNDEFINED (lhs_type))
return true;

First of all, I fail to see why we don't handle negative step,
that can be done with REDUC_MIN instead of REDUC_MAX.

And, I also fail to see why we need to require INTEGER_CST base if
TYPE_OVERFLOW_UNDEFINED (lhs_type), in that case we know it will not wrap,
so all we care about is if step * ni won't cover all iterations and we can
use some value (type minimum or maximum) to denote the condition never
true case.

On Wed, Nov 22, 2017 at 05:57:08PM +0100, Kilian Verhetsel wrote:
> 2017-11-21  Kilian Verhetsel 

Missing space before <.

> gcc/ChangeLog:
>   PR testsuite/81179
>   * tree-vect-loop.c
> (vect_create_epilog_for_reduction): Fix the returned value for

8 spaces instead of a tab, note the (vect_create_epilog_for_reduction):
should just go after space right after the filename.

>   INTEGER_INDUC_COND_REDUCTION whose last match occurred at
>   index 0.
>   (vectorizable_reduction): For INTEGER_INDUC_COND_REDUCTION,
>   pass the PHI statement that sets the induction variable to the
>   code generating the epilogue and check that no overflow will
>   occur.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c: New test for
> overflows when compiling conditional reductions.
> * gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c: Likewise.

Again, spaces instead of tab twice.

> -  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
> +  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
> +  || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> +  == INTEGER_INDUC_COND_REDUCTION)

Formatting, == should be below STMT_VINFO on the previous line, for emacs
users perhaps even better surrounded by ()s, like:


  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
  || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
  == INTEGER_INDUC_COND_REDUCTION))

(happens many times in the patch).

Jakub


Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-11-23 Thread Richard Biener
On Thu, Nov 23, 2017 at 10:51 AM, Alan Hayward  wrote:
>
>> On 22 Nov 2017, at 16:57, Kilian Verhetsel  
>> wrote:
>>
>>
>> Thank you both for your comments.
>>
>> I have added the check to ensure the index vector won't cause an
>> overflow. I also added tests to the testsuite in order to check that the
>> loop is vectorized for UINT_MAX - 1 iterations but not UINT_MAX
>> iterations. I was not able to write code that triggers
>> INTEGER_INDUC_COND_REDUCTION when using char or other smaller types
>> (changing the types of last, min_v and a to something else causes
>> COND_REDUCTION to be used instead), so these tests are only compiled and
>> not executed.
>
> I had similar problems when playing around with -14.c, but didn’t have chance 
> to
> investigate why. Possibly worth raising a bug to mentioning there is a missed
> optimisation. It’d be nice to figure out why.
>
>>
>> I also moved an instruction that generates a vector of zeroes (used for
>> COND_REDUCTION) in the branch of code run only for COND_REDUCTION, this
>> should remove the unused vector that Alan noticed.
>
> Patch is ok for me.

Fine with me as well.

Thanks for taking care of this bug.

Richard.

>
> Alan.
>


Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-11-23 Thread Alan Hayward

> On 22 Nov 2017, at 16:57, Kilian Verhetsel  
> wrote:
> 
> 
> Thank you both for your comments.
> 
> I have added the check to ensure the index vector won't cause an
> overflow. I also added tests to the testsuite in order to check that the
> loop is vectorized for UINT_MAX - 1 iterations but not UINT_MAX
> iterations. I was not able to write code that triggers
> INTEGER_INDUC_COND_REDUCTION when using char or other smaller types
> (changing the types of last, min_v and a to something else causes
> COND_REDUCTION to be used instead), so these tests are only compiled and
> not executed.

I had similar problems when playing around with -14.c, but didn’t have chance to
investigate why. Possibly worth raising a bug to mentioning there is a missed
optimisation. It’d be nice to figure out why.

> 
> I also moved an instruction that generates a vector of zeroes (used for
> COND_REDUCTION) in the branch of code run only for COND_REDUCTION, this
> should remove the unused vector that Alan noticed.

Patch is ok for me.


Alan.



Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-11-22 Thread Kilian Verhetsel

Thank you both for your comments.

I have added the check to ensure the index vector won't cause an
overflow. I also added tests to the testsuite in order to check that the
loop is vectorized for UINT_MAX - 1 iterations but not UINT_MAX
iterations. I was not able to write code that triggers
INTEGER_INDUC_COND_REDUCTION when using char or other smaller types
(changing the types of last, min_v and a to something else causes
COND_REDUCTION to be used instead), so these tests are only compiled and
not executed.

I also moved an instruction that generates a vector of zeroes (used for
COND_REDUCTION) in the branch of code run only for COND_REDUCTION, this
should remove the unused vector that Alan noticed.

2017-11-21  Kilian Verhetsel 

gcc/ChangeLog:
PR testsuite/81179
* tree-vect-loop.c
(vect_create_epilog_for_reduction): Fix the returned value for
INTEGER_INDUC_COND_REDUCTION whose last match occurred at
index 0.
(vectorizable_reduction): For INTEGER_INDUC_COND_REDUCTION,
pass the PHI statement that sets the induction variable to the
code generating the epilogue and check that no overflow will
occur.

gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c: New test for
overflows when compiling conditional reductions.
* gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c: Likewise.

Index: gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c
===
--- gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c	(working copy)
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_condition } */
+
+#include "tree-vect.h"
+#include 
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N UINT_MAX
+
+/* Condition reduction with maximum possible loop size.  Will fail to vectorize
+   because values in the index vector will overflow.  */
+unsigned int
+condition_reduction (unsigned int *a, unsigned int min_v)
+{
+  unsigned int last = -72;
+
+  for (unsigned int i = 0; i < N; i++)
+if (a[i] < min_v)
+  last = i;
+
+  return last;
+}
+
+/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "condition expression based on integer induction." "vect" } } */
+/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c
===
--- gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c	(working copy)
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_condition } */
+
+#include "tree-vect.h"
+#include 
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N (UINT_MAX - 1)
+
+/* Condition reduction with maximum possible loop size, minus one.  This should
+   still be vectorized correctly.  */
+unsigned int
+condition_reduction (unsigned int *a, unsigned int min_v)
+{
+  unsigned int last = -72;
+
+  for (unsigned int i = 0; i < N; i++)
+if (a[i] < min_v)
+  last = i;
+
+  return last;
+}
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "condition expression based on integer induction." "vect" } } */
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c	(revision 254996)
+++ gcc/tree-vect-loop.c	(working copy)
@@ -4316,7 +4316,7 @@ get_initial_defs_for_reduction (slp_tree slp_node,
 
 static void
 vect_create_epilog_for_reduction (vec vect_defs, gimple *stmt,
-  gimple *reduc_def_stmt,
+  gimple *reduc_def_stmt, gimple *induct_stmt,
   int ncopies, enum tree_code reduc_code,
   vec reduction_phis,
   bool double_reduc, 
@@ -4477,7 +4477,9 @@ vect_create_epilog_for_reduction (vec vect_d
  The first match will be a 1 to allow 0 to be used for non-matching
  indexes.  If there are no matches at all then the vector will be all
  zeroes.  */
-  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
+  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+  || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+  == INTEGER_INDUC_COND_REDUCTION)
 {
   tree indx_before_incr, indx_after_incr;
   int nunits_out = TYPE_VECTOR_SUBPARTS (vectype);
@@ -4754,7 +4756,9 @@ vect_create_epilog_for_reduction (vec vect_d
   else
 new_phi_result = PHI_RESULT (new_phis[0]);
 
-  if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+  if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
+   || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+   == INTEGER_INDUC_COND_REDUCTION)
 

Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-11-22 Thread Richard Biener
On Wed, Nov 22, 2017 at 12:01 PM, Alan Hayward  wrote:
>
>> On 22 Nov 2017, at 09:14, Richard Biener  wrote:
>>
>> On Tue, Nov 21, 2017 at 5:43 PM, Kilian Verhetsel
>>  wrote:
>>>
 This is PR81179 I think, please mention that in the changelog.
>>>
>>> Correct, my bad for missing that.
>>>
 This unconditionally pessimizes code even if there is no valid index
 zero, right?
>>>
>>> Almost, since for a loop such as:
>>>
>>>  #define OFFSET 1
>>>  unsigned int find(const unsigned int *a, unsigned int v) {
>>>unsigned int result = 120;
>>>for (unsigned int i = OFFSET; i < 32+OFFSET; i++) {
>>>  if (a[i-OFFSET] == v) result = i;
>>>}
>>>return result;
>>>  }
>>>
>>> the index i will match the contents of the index vector used here ---
>>> but this does indeed pessimize the code generated for, say, OFFSET
>>> = 2. It is probably more sensible to use the existing code path in those
>>> situations.
>>>
 The issue with the COND_REDUCITION index vector is overflow IIRC.
>>>
>>> Does that mean such overflows can already manifest themselves for
>>> regular COND_REDUCTION? I had assumed sufficient checks were already in
>>> place because of the presence of the is_nonwrapping_integer_induction
>>> test.
>>
>> But only if we need the index vector?  The patch looked like you're changing
>> how other modes are handled (in my look I didn't make myself familiar with
>> the various modes again...).  Anyway, Alan hopefully remembers what he
>> coded so I'll defer to him here.
>>
>> If Alan is happy with the patch consider it approved.
>>
>
> Richard’s right with his question.
>
> The optimisation needs to fail if the number of interactions of the loop + 1 
> doesn’t
> fit into the data type used for the result.
>
> I took the test pr65947-14.c
> First I set N to 0x-1. This compiled and vectorised. That’s ok.
> Now if I set N to 0x it still vectorises, but this should fail.
>
> Compare to pr65947-14.c where we set  last = a[I]; inside the if.
> When set N to 0x-1, it compiled and vectorised. That’s ok.
> When set N to 0x it fails to vectorise with the message
> "loop size is greater than data size”.
>
> Looks like you might just need to add the one check.
>
> Also see pr65947-9.c which uses the slightly more useful char indexes.

Some extra test variants might be indeed useful to really cover all corner
cases.  The PR had some "trivial" patch for the issue by me that I considered
a too big hammer.

I wonder if other compilers pull another trick to deal with the situation.

Richard.

>
> Alan.
>
>
>


Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-11-22 Thread Alan Hayward

> On 22 Nov 2017, at 09:14, Richard Biener  wrote:
> 
> On Tue, Nov 21, 2017 at 5:43 PM, Kilian Verhetsel
>  wrote:
>> 
>>> This is PR81179 I think, please mention that in the changelog.
>> 
>> Correct, my bad for missing that.
>> 
>>> This unconditionally pessimizes code even if there is no valid index
>>> zero, right?
>> 
>> Almost, since for a loop such as:
>> 
>>  #define OFFSET 1
>>  unsigned int find(const unsigned int *a, unsigned int v) {
>>unsigned int result = 120;
>>for (unsigned int i = OFFSET; i < 32+OFFSET; i++) {
>>  if (a[i-OFFSET] == v) result = i;
>>}
>>return result;
>>  }
>> 
>> the index i will match the contents of the index vector used here ---
>> but this does indeed pessimize the code generated for, say, OFFSET
>> = 2. It is probably more sensible to use the existing code path in those
>> situations.
>> 
>>> The issue with the COND_REDUCITION index vector is overflow IIRC.
>> 
>> Does that mean such overflows can already manifest themselves for
>> regular COND_REDUCTION? I had assumed sufficient checks were already in
>> place because of the presence of the is_nonwrapping_integer_induction
>> test.
> 
> But only if we need the index vector?  The patch looked like you're changing
> how other modes are handled (in my look I didn't make myself familiar with
> the various modes again...).  Anyway, Alan hopefully remembers what he
> coded so I'll defer to him here.
> 
> If Alan is happy with the patch consider it approved.
> 

Richard’s right with his question.

The optimisation needs to fail if the number of interactions of the loop + 1 
doesn’t
fit into the data type used for the result.

I took the test pr65947-14.c
First I set N to 0x-1. This compiled and vectorised. That’s ok.
Now if I set N to 0x it still vectorises, but this should fail.

Compare to pr65947-14.c where we set  last = a[I]; inside the if.
When set N to 0x-1, it compiled and vectorised. That’s ok.
When set N to 0x it fails to vectorise with the message
"loop size is greater than data size”.

Looks like you might just need to add the one check.

Also see pr65947-9.c which uses the slightly more useful char indexes.


Alan.





Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-11-22 Thread Richard Biener
On Tue, Nov 21, 2017 at 5:43 PM, Kilian Verhetsel
 wrote:
>
>> This is PR81179 I think, please mention that in the changelog.
>
> Correct, my bad for missing that.
>
>> This unconditionally pessimizes code even if there is no valid index
>> zero, right?
>
> Almost, since for a loop such as:
>
>   #define OFFSET 1
>   unsigned int find(const unsigned int *a, unsigned int v) {
> unsigned int result = 120;
> for (unsigned int i = OFFSET; i < 32+OFFSET; i++) {
>   if (a[i-OFFSET] == v) result = i;
> }
> return result;
>   }
>
> the index i will match the contents of the index vector used here ---
> but this does indeed pessimize the code generated for, say, OFFSET
> = 2. It is probably more sensible to use the existing code path in those
> situations.
>
>> The issue with the COND_REDUCITION index vector is overflow IIRC.
>
> Does that mean such overflows can already manifest themselves for
> regular COND_REDUCTION? I had assumed sufficient checks were already in
> place because of the presence of the is_nonwrapping_integer_induction
> test.

But only if we need the index vector?  The patch looked like you're changing
how other modes are handled (in my look I didn't make myself familiar with
the various modes again...).  Anyway, Alan hopefully remembers what he
coded so I'll defer to him here.

If Alan is happy with the patch consider it approved.

Thanks,
Richard.


Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-11-21 Thread Alan Hayward

> On 21 Nov 2017, at 16:43, Kilian Verhetsel  
> wrote:
> 
> 
>> This is PR81179 I think, please mention that in the changelog.
> 
> Correct, my bad for missing that.
> 
>> This unconditionally pessimizes code even if there is no valid index
>> zero, right?
> 
> Almost, since for a loop such as:
> 
>  #define OFFSET 1
>  unsigned int find(const unsigned int *a, unsigned int v) {
>unsigned int result = 120;
>for (unsigned int i = OFFSET; i < 32+OFFSET; i++) {
>  if (a[i-OFFSET] == v) result = i;
>}
>return result;
>  }
> 
> the index i will match the contents of the index vector used here ---
> but this does indeed pessimize the code generated for, say, OFFSET
> = 2. It is probably more sensible to use the existing code path in those
> situations.
> 

Looking at the final asm on aarch64 for -14.c, the code has only grown
By a single instruction in the epilogue. Which is good, given the vector
pass dump for this patch is quite a bit longer.

In the vector dump, there is a vector of 0’s
_57 = { 0, 0, 0, 0 };
created which is then never used (and later gets optimised away).
Be nice if that could avoid getting created.
I’ve not had chance to scrutinise the patch yet to see where that is created.


>> The issue with the COND_REDUCITION index vector is overflow IIRC.
> 
> Does that mean such overflows can already manifest themselves for
> regular COND_REDUCTION? I had assumed sufficient checks were already in
> place because of the presence of the is_nonwrapping_integer_induction
> test.

As an aside, it’s possibly worth mentioning that this issue will go away for
aarch64-sve when Richard Sandiford’s SVE patch goes in, as that’ll support
CLASTB reduction.



Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-11-21 Thread Kilian Verhetsel

> This is PR81179 I think, please mention that in the changelog.

Correct, my bad for missing that.

> This unconditionally pessimizes code even if there is no valid index
> zero, right?

Almost, since for a loop such as:

  #define OFFSET 1
  unsigned int find(const unsigned int *a, unsigned int v) {
unsigned int result = 120;
for (unsigned int i = OFFSET; i < 32+OFFSET; i++) {
  if (a[i-OFFSET] == v) result = i;
}
return result;
  }

the index i will match the contents of the index vector used here ---
but this does indeed pessimize the code generated for, say, OFFSET
= 2. It is probably more sensible to use the existing code path in those
situations.

> The issue with the COND_REDUCITION index vector is overflow IIRC.

Does that mean such overflows can already manifest themselves for
regular COND_REDUCTION? I had assumed sufficient checks were already in
place because of the presence of the is_nonwrapping_integer_induction
test.


Re: [PATCH] Fix result for conditional reductions matching at index 0

2017-11-21 Thread Richard Biener
On Tue, Nov 21, 2017 at 12:35 PM, Kilian Verhetsel
 wrote:
>
> Hi,
>
> When translating conditional reductions based on integer induction, the
> compiler uses the value zero to indicate the absence of any matches: if
> the index of the last match is still zero at the end of the loop, the
> default value is returned. The problem with this approach is that this
> default value is returned not only when there were no matches at all,
> but also when the last match occurred at index 0. This causes the test
> gcc.dg/vect/pr65947-14.c to fail.
>
> This patch corrects this by reusing the vector of indices used for
> COND_REDUCTION, which starts at 1. If the 1-based index of the last
> match is non-zero, 1 is subtracted from it, otherwise the initial value
> is returned.
>
> I tested this patch on x86_64-pc-linux-gnu (both with SSE and AVX2,
> causing both paths through the reduc_code != ERROR_MARK branch being
> taken).

This is PR81179 I think, please mention that in the changelog.

This unconditionally pessimizes code even if there is no valid index
zero, right?

The issue with the COND_REDUCITION index vector is overflow IIRC.

Alan, can you please comment on the patch?

Thanks,
Richard.

> 2017-11-21  Kilian Verhetsel 
>
> * tree-vect-loop.c
> (vect_create_epilog_for_reduction): Fix the returned value for
> INTEGER_INDUC_COND_REDUCTION whose last match occurred at
> index 0.
> (vectorizable_reduction): For INTEGER_INDUC_COND_REDUCTION,
> pass the PHI statement that sets the induction variable to the
> code generating the epilogue.
>