Re: [PATCH] Fix result for conditional reductions matching at index 0 (PR tree-optimization/80631)
On Mon, 11 Dec 2017, Jakub Jelinek wrote: > On Mon, Dec 11, 2017 at 06:00:11PM +0100, Kilian Verhetsel wrote: > > Jakub Jelinekwrites: > > > Of course it can be done efficiently, what we care most is that the body > > > of > > > the vectorized loop is efficient. > > > > That's fair, I was looking at the x86 assembly being generated when a single > > vectorized iteration was enough (because that is the context in which I > > first encountered this bug): > > > > int f(unsigned int *x, unsigned int k) { > > unsigned int result = 8; > > for (unsigned int i = 0; i < 8; i++) { > > if (x[i] == k) result = i; > > } > > return result; > > } > > > > where the vpand instruction this generates would have to be replaced > > with a variable blend if the default value weren't 0 — although I had > > not realized even SSE4.1 on x86 includes such an instruction, making > > this point less relevant. > > So, here is my version of the patch, independent from your change. > As I said, your patch is still highly valueable if it will be another > STMT_VINFO_VEC_REDUCTION_TYPE kind to be used for the cases like the > above testcase, where base is equal to TYPE_MIN_VALUE, or future improvement > of base being variable, but TYPE_OVERFLOW_UNDEFINED iterator, where all we > need is that the maximum number of iterations is smaller than the maximum > of the type we use for the reduction phi. > > The patch handles also negative steps, though for now only on signed type > (for unsigned it can't be really negative, but perhaps we could treat > unsigned values with the msb set as if they were negative and consider > overflows in that direction). > > Bootstrapped/regtested on x86_64-linux, i686-linux, powerpc64le-linux, > bootstrapped on powerpc64-linux, regtest there ongoing. Ok for trunk? > > The patch prefers to emit what we were emitting if possible (i.e. zero > value for the COND_EXPR never hit) - building a zero vector is usually > cheaper than any other; if that is not possible, checks if initial_def > can be used for that value - then we can avoid the > res == induc_val ? initial_def : res; > conditional move; if even that is not possible, attempts to use any other > value. If no value can be found, it for now uses COND_REDUCTION, which is > more expensive, but correct. Ok. Thanks, Richard. > 2017-12-11 Jakub Jelinek > > PR tree-optimization/80631 > * tree-vect-loop.c (get_initial_def_for_reduction): Fix comment typo. > (vect_create_epilog_for_reduction): Add INDUC_VAL and INDUC_CODE > arguments, for INTEGER_INDUC_COND_REDUCTION use INDUC_VAL instead of > hardcoding zero as the value if COND_EXPR is never true. For > INTEGER_INDUC_COND_REDUCTION don't emit the final COND_EXPR if > INDUC_VAL is equal to INITIAL_DEF, and use INDUC_CODE instead of > hardcoding MAX_EXPR as the reduction operation. > (is_nonwrapping_integer_induction): Allow negative step. > (vectorizable_reduction): Compute INDUC_VAL and INDUC_CODE for > vect_create_epilog_for_reduction, if no value is suitable, don't > use INTEGER_INDUC_COND_REDUCTION for now. Formatting fixes. > > * gcc.dg/vect/pr80631-1.c: New test. > * gcc.dg/vect/pr80631-2.c: New test. > * gcc.dg/vect/pr65947-13.c: Expect integer induc cond reduction > vectorization. > > --- gcc/tree-vect-loop.c.jj 2017-12-11 14:57:38.0 +0100 > +++ gcc/tree-vect-loop.c 2017-12-11 16:59:06.930720928 +0100 > @@ -4034,7 +4034,7 @@ get_initial_def_for_reduction (gimple *s > case MULT_EXPR: > case BIT_AND_EXPR: >{ > -/* ADJUSMENT_DEF is NULL when called from > +/* ADJUSTMENT_DEF is NULL when called from > vect_create_epilog_for_reduction to vectorize double reduction. > */ > if (adjustment_def) > *adjustment_def = init_val; > @@ -4283,6 +4283,11 @@ get_initial_defs_for_reduction (slp_tree > DOUBLE_REDUC is TRUE if double reduction phi nodes should be handled. > SLP_NODE is an SLP node containing a group of reduction statements. The > first one in this group is STMT. > + INDUC_VAL is for INTEGER_INDUC_COND_REDUCTION the value to use for the > case > + when the COND_EXPR is never true in the loop. For MAX_EXPR, it needs to > + be smaller than any value of the IV in the loop, for MIN_EXPR larger > than > + any value of the IV in the loop. > + INDUC_CODE is the code for epilog reduction if > INTEGER_INDUC_COND_REDUCTION. > > This function: > 1. Creates the reduction def-use cycles: sets the arguments for > @@ -4330,7 +4335,8 @@ vect_create_epilog_for_reduction (vec vec reduction_phis, >bool double_reduc, > slp_tree slp_node, > - slp_instance slp_node_instance) > +
Re: [PATCH] Fix result for conditional reductions matching at index 0
On Mon, Dec 11, 2017 at 06:00:11PM +0100, Kilian Verhetsel wrote: > Jakub Jelinekwrites: > > Of course it can be done efficiently, what we care most is that the body of > > the vectorized loop is efficient. > > That's fair, I was looking at the x86 assembly being generated when a single > vectorized iteration was enough (because that is the context in which I > first encountered this bug): > > int f(unsigned int *x, unsigned int k) { > unsigned int result = 8; > for (unsigned int i = 0; i < 8; i++) { > if (x[i] == k) result = i; > } > return result; > } > > where the vpand instruction this generates would have to be replaced > with a variable blend if the default value weren't 0 — although I had > not realized even SSE4.1 on x86 includes such an instruction, making > this point less relevant. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80631#c6 where I've attached so far untested prototype. If it is added before your patch makes it in, your patch would start by introducing another kind (say SIMPLE_INTEGER_INDUC_COND_REDUCTION) and would use that for the spots that are handled by the PR80631 patch as INTEGER_INDUC_COND_REDUCTION right now and your code for the rest. E.g. the above testcase with my patch, because i is unsigned and base is the minimum of the type is emitted as COND_REDUCTION, which is what your patch would improve. > > Another thing is, as your patch is quite large, we need a copyright > > assignment for the changes before we can accept it, see > > https://gcc.gnu.org/contribute.html for details. > > > > If you are already covered by an assignment of some company, please tell > > us which one it is, otherwise contact us and we'll get you the needed > > forms. > > I am not covered by any copyright assignment yet. Do I need to send you > any additional information? I'll send it offlist. Jakub
Re: [PATCH] Fix result for conditional reductions matching at index 0
Jakub Jelinekwrites: > Of course it can be done efficiently, what we care most is that the body of > the vectorized loop is efficient. That's fair, I was looking at the x86 assembly being generated when a single vectorized iteration was enough (because that is the context in which I first encountered this bug): int f(unsigned int *x, unsigned int k) { unsigned int result = 8; for (unsigned int i = 0; i < 8; i++) { if (x[i] == k) result = i; } return result; } where the vpand instruction this generates would have to be replaced with a variable blend if the default value weren't 0 — although I had not realized even SSE4.1 on x86 includes such an instruction, making this point less relevant. > Thanks, it applies cleanly now > > + else if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION > > + || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) > > + == INTEGER_INDUC_COND_REDUCTION)) > > + && reduc_fn == IFN_LAST)» > > contains a character at the end of line that makes it not to compile. My bad, I must have added this when I opened the patch file itself to inspect it... > Another thing is, as your patch is quite large, we need a copyright > assignment for the changes before we can accept it, see > https://gcc.gnu.org/contribute.html for details. > > If you are already covered by an assignment of some company, please tell > us which one it is, otherwise contact us and we'll get you the needed > forms. I am not covered by any copyright assignment yet. Do I need to send you any additional information?
Re: [PATCH] Fix result for conditional reductions matching at index 0
On Mon, Dec 11, 2017 at 02:11:34PM +0100, Jakub Jelinek wrote: > Thanks, it applies cleanly now > > + else if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION > > + || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) > > + == INTEGER_INDUC_COND_REDUCTION)) > > + && reduc_fn == IFN_LAST)» > > contains a character at the end of line that makes it not to compile. Another thing is, as your patch is quite large, we need a copyright assignment for the changes before we can accept it, see https://gcc.gnu.org/contribute.html for details. If you are already covered by an assignment of some company, please tell us which one it is, otherwise contact us and we'll get you the needed forms. Jakub
Re: [PATCH] Fix result for conditional reductions matching at index 0
On Mon, Dec 11, 2017 at 11:56:55AM +0100, Kilian Verhetsel wrote: > Jakub Jelinekwrites: > > As it doesn't apply, I can't easily check what the patch generates > > on the PR80631 testcases vs. my thoughts on that; though, if it emits > > something more complicated for the simple cases, perhaps we could improve > > those (not handle it like COND_REDUCTION, but instead as former > > INTEGER_INDUC_COND_REDUCTION and just use a different constant instead of 0 > > if 0 isn't usable for the condition never matched value. > > While you could use values different from 0, I'm not sure this can be > done as efficiently. 0 is convenient because a single bitwise-and > between the index vector and the condition will set lanes that do not > contain a match to 0. Of course it can be done efficiently, what we care most is that the body of the vectorized loop is efficient. Whether we choose -1, 0 or 124 as the COND_EXPR not ever meant value matters only before that loop (when we need to load that into a register holding vector of all those constants) and then a scalar comparison on the REDUC_* result. Load of -1 vector on some targets is as expensive as load of 0, for arbitrary value worst case it is one memory load compared to a specialized zero register (or set all bits) instruction. On the other side, by not using any offsetted iteration var, one can reuse the vector register that holds the IV, which can be used in some loops too and thus decrease register pressure. And while comparison against 0 is sometimes one scalar insn cheaper than comparison against other value, if the insn producing it already sets the flags, I doubt it is the case here, so it is exactly the same cost. Not to mention that in your patch you are instead subtracting one in the scalar code. > Jakub Jelinek writes: > > First of all, I fail to see why we don't handle negative step, > > that can be done with REDUC_MIN instead of REDUC_MAX. > > That would not work without first using values different from 0 to > indicate the absence of matches (except in cases where all indices are > negative), which I assume is why the test was there in the first place. > > Below is the patch with fixed formatting and changes to make it apply > cleanly to r255537 (as far as I can tell this was simply caused by some > variables and constants having been renamed). Thanks, it applies cleanly now > + else if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION > + || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) > + == INTEGER_INDUC_COND_REDUCTION)) > +&& reduc_fn == IFN_LAST)» contains a character at the end of line that makes it not to compile. Trying to understand your patch, here is the difference with your patch between additional: --- tree-vect-loop.c2017-12-11 13:39:35.619122907 +0100 +++ tree-vect-loop.c2017-12-11 13:35:27.0 +0100 @@ -6021,8 +6021,8 @@ vectorizable_reduction (gimple *stmt, gi dump_printf_loc (MSG_NOTE, vect_location, "condition expression based on " "integer induction.\n"); - STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) - = INTEGER_INDUC_COND_REDUCTION; +/* STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) + = INTEGER_INDUC_COND_REDUCTION; */ } so that COND_REDUCTION is used, and the case with INTEGER_INDUC_COND_REDUCTION with your patch on: int v[256] = { 77, 1, 79, 3, 4, 5, 6, 7 }; __attribute__((noipa)) void foo () { int k, r = -1; for (k = 0; k < 256; k++) if (v[k] == 77) r = k; if (r != 0) __builtin_abort (); } vect_cst__21 = { 8, 8, 8, 8, 8, 8, 8, 8 }; vect_cst__28 = { 77, 77, 77, 77, 77, 77, 77, 77 }; + vect_cst__30 = { -1, -1, -1, -1, -1, -1, -1, -1 }; [local count: 139586436]: # k_12 = PHI # r_13 = PHI # ivtmp_11 = PHI # vect_vec_iv_.0_22 = PHI - # vect_r_3.1_24 = PHI + # vect_r_3.1_24 = PHI # vectp_v.2_25 = PHI - # ivtmp_30 = PHI - # _32 = PHI <_33(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)> - # ivtmp_43 = PHI + # ivtmp_31 = PHI + # _33 = PHI <_34(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)> + # ivtmp_41 = PHI vect_vec_iv_.0_23 = vect_vec_iv_.0_22 + vect_cst__21; vect__1.4_27 = MEM[(int *)vectp_v.2_25]; _1 = v[k_12]; vect_r_3.6_29 = VEC_COND_EXPR ; r_3 = _1 == 77 ? k_12 : r_13; k_8 = k_12 + 1; ivtmp_2 = ivtmp_11 - 1; vectp_v.2_26 = vectp_v.2_25 + 32; - _33 = VEC_COND_EXPR ; - ivtmp_31 = ivtmp_30 + { 8, 8, 8, 8, 8, 8, 8, 8 }; - ivtmp_44 = ivtmp_43 + 1; - if (ivtmp_44 < 32) + _34 = VEC_COND_EXPR ; + ivtmp_32
Re: [PATCH] Fix result for conditional reductions matching at index 0
Hello, Jakub Jelinekwrites: > As it doesn't apply, I can't easily check what the patch generates > on the PR80631 testcases vs. my thoughts on that; though, if it emits > something more complicated for the simple cases, perhaps we could improve > those (not handle it like COND_REDUCTION, but instead as former > INTEGER_INDUC_COND_REDUCTION and just use a different constant instead of 0 > if 0 isn't usable for the condition never matched value. While you could use values different from 0, I'm not sure this can be done as efficiently. 0 is convenient because a single bitwise-and between the index vector and the condition will set lanes that do not contain a match to 0. Jakub Jelinek writes: > First of all, I fail to see why we don't handle negative step, > that can be done with REDUC_MIN instead of REDUC_MAX. That would not work without first using values different from 0 to indicate the absence of matches (except in cases where all indices are negative), which I assume is why the test was there in the first place. Below is the patch with fixed formatting and changes to make it apply cleanly to r255537 (as far as I can tell this was simply caused by some variables and constants having been renamed). 2017-11-21 Kilian Verhetsel gcc/ChangeLog: PR testsuite/81179 * tree-vect-loop.c (vect_create_epilog_for_reduction): Fix the returned value for INTEGER_INDUC_COND_REDUCTION whose last match occurred at index 0. (vectorizable_reduction): For INTEGER_INDUC_COND_REDUCTION, pass the PHI statement that sets the induction variable to the code generating the epilogue and check that no overflow will occur. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c: New test for overflows when compiling conditional reductions. * gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c: Likewise. Index: gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c === --- gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c (nonexistent) +++ gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c (working copy) @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_condition } */ + +#include "tree-vect.h" +#include + +extern void abort (void) __attribute__ ((noreturn)); + +#define N UINT_MAX + +/* Condition reduction with maximum possible loop size. Will fail to vectorize + because values in the index vector will overflow. */ +unsigned int +condition_reduction (unsigned int *a, unsigned int min_v) +{ + unsigned int last = -72; + + for (unsigned int i = 0; i < N; i++) +if (a[i] < min_v) + last = i; + + return last; +} + +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ +/* { dg-final { scan-tree-dump "condition expression based on integer induction." "vect" } } */ +/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c === --- gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c (nonexistent) +++ gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c (working copy) @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_condition } */ + +#include "tree-vect.h" +#include + +extern void abort (void) __attribute__ ((noreturn)); + +#define N (UINT_MAX - 1) + +/* Condition reduction with maximum possible loop size, minus one. This should + still be vectorized correctly. */ +unsigned int +condition_reduction (unsigned int *a, unsigned int min_v) +{ + unsigned int last = -72; + + for (unsigned int i = 0; i < N; i++) +if (a[i] < min_v) + last = i; + + return last; +} + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ +/* { dg-final { scan-tree-dump "condition expression based on integer induction." "vect" } } */ Index: gcc/tree-vect-loop.c === --- gcc/tree-vect-loop.c (revision 255537) +++ gcc/tree-vect-loop.c (working copy) @@ -4325,7 +4325,7 @@ get_initial_defs_for_reduction (slp_tree slp_node, static void vect_create_epilog_for_reduction (vec vect_defs, gimple *stmt, - gimple *reduc_def_stmt, + gimple *reduc_def_stmt, gimple *induct_stmt, int ncopies, internal_fn reduc_fn, vec reduction_phis, bool double_reduc, @@ -4486,7 +4486,9 @@ vect_create_epilog_for_reduction (vec vect_d The first match will be a 1 to allow 0 to be used for non-matching indexes. If there are no matches at all then the vector will be all zeroes. */ - if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION) + if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION + ||
Re: [PATCH] Fix result for conditional reductions matching at index 0
Hi! What is going on with this patch, will anybody commit it? This is also http://gcc.gnu.org/PR80631 apparently. The patch doesn't apply cleanly to current trunk due to the reduc_code -> reduc_fn changes. As it doesn't apply, I can't easily check what the patch generates on the PR80631 testcases vs. my thoughts on that; though, if it emits something more complicated for the simple cases, perhaps we could improve those (not handle it like COND_REDUCTION, but instead as former INTEGER_INDUC_COND_REDUCTION and just use a different constant instead of 0 if 0 isn't usable for the condition never matched value. We could check the value from the loop preheader edge of the induc_stmt phi and if it is constant and for REDUC_MAX (does that imply always positive step?) larger than minimum value of the type, we can use the minimum of the type as the value instead of zero. If REDUC_MIN and the initial constant is smaller than the maximum value of the type, we could use the maximum. Otherwise punt to what you're doing. Or instead of minimum or maximum check the value of initial_def, if it is constant, if that value is usable, we could even avoid the final COND_EXPR. Another thing is that is_nonwrapping_integer_induction has: /* Make sure the loop is integer based. */ if (TREE_CODE (base) != INTEGER_CST || TREE_CODE (step) != INTEGER_CST) return false; /* Check that the induction increments. */ if (tree_int_cst_sgn (step) == -1) return false; if (TYPE_OVERFLOW_UNDEFINED (lhs_type)) return true; First of all, I fail to see why we don't handle negative step, that can be done with REDUC_MIN instead of REDUC_MAX. And, I also fail to see why we need to require INTEGER_CST base if TYPE_OVERFLOW_UNDEFINED (lhs_type), in that case we know it will not wrap, so all we care about is if step * ni won't cover all iterations and we can use some value (type minimum or maximum) to denote the condition never true case. On Wed, Nov 22, 2017 at 05:57:08PM +0100, Kilian Verhetsel wrote: > 2017-11-21 Kilian VerhetselMissing space before <. > gcc/ChangeLog: > PR testsuite/81179 > * tree-vect-loop.c > (vect_create_epilog_for_reduction): Fix the returned value for 8 spaces instead of a tab, note the (vect_create_epilog_for_reduction): should just go after space right after the filename. > INTEGER_INDUC_COND_REDUCTION whose last match occurred at > index 0. > (vectorizable_reduction): For INTEGER_INDUC_COND_REDUCTION, > pass the PHI statement that sets the induction variable to the > code generating the epilogue and check that no overflow will > occur. > > gcc/testsuite/ChangeLog: > * gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c: New test for > overflows when compiling conditional reductions. > * gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c: Likewise. Again, spaces instead of tab twice. > - if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION) > + if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION > + || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) > + == INTEGER_INDUC_COND_REDUCTION) Formatting, == should be below STMT_VINFO on the previous line, for emacs users perhaps even better surrounded by ()s, like: if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == INTEGER_INDUC_COND_REDUCTION)) (happens many times in the patch). Jakub
Re: [PATCH] Fix result for conditional reductions matching at index 0
On Thu, Nov 23, 2017 at 10:51 AM, Alan Haywardwrote: > >> On 22 Nov 2017, at 16:57, Kilian Verhetsel >> wrote: >> >> >> Thank you both for your comments. >> >> I have added the check to ensure the index vector won't cause an >> overflow. I also added tests to the testsuite in order to check that the >> loop is vectorized for UINT_MAX - 1 iterations but not UINT_MAX >> iterations. I was not able to write code that triggers >> INTEGER_INDUC_COND_REDUCTION when using char or other smaller types >> (changing the types of last, min_v and a to something else causes >> COND_REDUCTION to be used instead), so these tests are only compiled and >> not executed. > > I had similar problems when playing around with -14.c, but didn’t have chance > to > investigate why. Possibly worth raising a bug to mentioning there is a missed > optimisation. It’d be nice to figure out why. > >> >> I also moved an instruction that generates a vector of zeroes (used for >> COND_REDUCTION) in the branch of code run only for COND_REDUCTION, this >> should remove the unused vector that Alan noticed. > > Patch is ok for me. Fine with me as well. Thanks for taking care of this bug. Richard. > > Alan. >
Re: [PATCH] Fix result for conditional reductions matching at index 0
> On 22 Nov 2017, at 16:57, Kilian Verhetsel> wrote: > > > Thank you both for your comments. > > I have added the check to ensure the index vector won't cause an > overflow. I also added tests to the testsuite in order to check that the > loop is vectorized for UINT_MAX - 1 iterations but not UINT_MAX > iterations. I was not able to write code that triggers > INTEGER_INDUC_COND_REDUCTION when using char or other smaller types > (changing the types of last, min_v and a to something else causes > COND_REDUCTION to be used instead), so these tests are only compiled and > not executed. I had similar problems when playing around with -14.c, but didn’t have chance to investigate why. Possibly worth raising a bug to mentioning there is a missed optimisation. It’d be nice to figure out why. > > I also moved an instruction that generates a vector of zeroes (used for > COND_REDUCTION) in the branch of code run only for COND_REDUCTION, this > should remove the unused vector that Alan noticed. Patch is ok for me. Alan.
Re: [PATCH] Fix result for conditional reductions matching at index 0
Thank you both for your comments. I have added the check to ensure the index vector won't cause an overflow. I also added tests to the testsuite in order to check that the loop is vectorized for UINT_MAX - 1 iterations but not UINT_MAX iterations. I was not able to write code that triggers INTEGER_INDUC_COND_REDUCTION when using char or other smaller types (changing the types of last, min_v and a to something else causes COND_REDUCTION to be used instead), so these tests are only compiled and not executed. I also moved an instruction that generates a vector of zeroes (used for COND_REDUCTION) in the branch of code run only for COND_REDUCTION, this should remove the unused vector that Alan noticed. 2017-11-21 Kilian Verhetselgcc/ChangeLog: PR testsuite/81179 * tree-vect-loop.c (vect_create_epilog_for_reduction): Fix the returned value for INTEGER_INDUC_COND_REDUCTION whose last match occurred at index 0. (vectorizable_reduction): For INTEGER_INDUC_COND_REDUCTION, pass the PHI statement that sets the induction variable to the code generating the epilogue and check that no overflow will occur. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c: New test for overflows when compiling conditional reductions. * gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c: Likewise. Index: gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c === --- gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c (nonexistent) +++ gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-1.c (working copy) @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_condition } */ + +#include "tree-vect.h" +#include + +extern void abort (void) __attribute__ ((noreturn)); + +#define N UINT_MAX + +/* Condition reduction with maximum possible loop size. Will fail to vectorize + because values in the index vector will overflow. */ +unsigned int +condition_reduction (unsigned int *a, unsigned int min_v) +{ + unsigned int last = -72; + + for (unsigned int i = 0; i < N; i++) +if (a[i] < min_v) + last = i; + + return last; +} + +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ +/* { dg-final { scan-tree-dump "condition expression based on integer induction." "vect" } } */ +/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c === --- gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c (nonexistent) +++ gcc/testsuite/gcc.dg/vect/vect-iv-cond-reduc-overflow-2.c (working copy) @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_condition } */ + +#include "tree-vect.h" +#include + +extern void abort (void) __attribute__ ((noreturn)); + +#define N (UINT_MAX - 1) + +/* Condition reduction with maximum possible loop size, minus one. This should + still be vectorized correctly. */ +unsigned int +condition_reduction (unsigned int *a, unsigned int min_v) +{ + unsigned int last = -72; + + for (unsigned int i = 0; i < N; i++) +if (a[i] < min_v) + last = i; + + return last; +} + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ +/* { dg-final { scan-tree-dump "condition expression based on integer induction." "vect" } } */ Index: gcc/tree-vect-loop.c === --- gcc/tree-vect-loop.c (revision 254996) +++ gcc/tree-vect-loop.c (working copy) @@ -4316,7 +4316,7 @@ get_initial_defs_for_reduction (slp_tree slp_node, static void vect_create_epilog_for_reduction (vec vect_defs, gimple *stmt, - gimple *reduc_def_stmt, + gimple *reduc_def_stmt, gimple *induct_stmt, int ncopies, enum tree_code reduc_code, vec reduction_phis, bool double_reduc, @@ -4477,7 +4477,9 @@ vect_create_epilog_for_reduction (vec vect_d The first match will be a 1 to allow 0 to be used for non-matching indexes. If there are no matches at all then the vector will be all zeroes. */ - if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION) + if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION + || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) + == INTEGER_INDUC_COND_REDUCTION) { tree indx_before_incr, indx_after_incr; int nunits_out = TYPE_VECTOR_SUBPARTS (vectype); @@ -4754,7 +4756,9 @@ vect_create_epilog_for_reduction (vec vect_d else new_phi_result = PHI_RESULT (new_phis[0]); - if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION + if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION + || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) + == INTEGER_INDUC_COND_REDUCTION)
Re: [PATCH] Fix result for conditional reductions matching at index 0
On Wed, Nov 22, 2017 at 12:01 PM, Alan Haywardwrote: > >> On 22 Nov 2017, at 09:14, Richard Biener wrote: >> >> On Tue, Nov 21, 2017 at 5:43 PM, Kilian Verhetsel >> wrote: >>> This is PR81179 I think, please mention that in the changelog. >>> >>> Correct, my bad for missing that. >>> This unconditionally pessimizes code even if there is no valid index zero, right? >>> >>> Almost, since for a loop such as: >>> >>> #define OFFSET 1 >>> unsigned int find(const unsigned int *a, unsigned int v) { >>>unsigned int result = 120; >>>for (unsigned int i = OFFSET; i < 32+OFFSET; i++) { >>> if (a[i-OFFSET] == v) result = i; >>>} >>>return result; >>> } >>> >>> the index i will match the contents of the index vector used here --- >>> but this does indeed pessimize the code generated for, say, OFFSET >>> = 2. It is probably more sensible to use the existing code path in those >>> situations. >>> The issue with the COND_REDUCITION index vector is overflow IIRC. >>> >>> Does that mean such overflows can already manifest themselves for >>> regular COND_REDUCTION? I had assumed sufficient checks were already in >>> place because of the presence of the is_nonwrapping_integer_induction >>> test. >> >> But only if we need the index vector? The patch looked like you're changing >> how other modes are handled (in my look I didn't make myself familiar with >> the various modes again...). Anyway, Alan hopefully remembers what he >> coded so I'll defer to him here. >> >> If Alan is happy with the patch consider it approved. >> > > Richard’s right with his question. > > The optimisation needs to fail if the number of interactions of the loop + 1 > doesn’t > fit into the data type used for the result. > > I took the test pr65947-14.c > First I set N to 0x-1. This compiled and vectorised. That’s ok. > Now if I set N to 0x it still vectorises, but this should fail. > > Compare to pr65947-14.c where we set last = a[I]; inside the if. > When set N to 0x-1, it compiled and vectorised. That’s ok. > When set N to 0x it fails to vectorise with the message > "loop size is greater than data size”. > > Looks like you might just need to add the one check. > > Also see pr65947-9.c which uses the slightly more useful char indexes. Some extra test variants might be indeed useful to really cover all corner cases. The PR had some "trivial" patch for the issue by me that I considered a too big hammer. I wonder if other compilers pull another trick to deal with the situation. Richard. > > Alan. > > >
Re: [PATCH] Fix result for conditional reductions matching at index 0
> On 22 Nov 2017, at 09:14, Richard Bienerwrote: > > On Tue, Nov 21, 2017 at 5:43 PM, Kilian Verhetsel > wrote: >> >>> This is PR81179 I think, please mention that in the changelog. >> >> Correct, my bad for missing that. >> >>> This unconditionally pessimizes code even if there is no valid index >>> zero, right? >> >> Almost, since for a loop such as: >> >> #define OFFSET 1 >> unsigned int find(const unsigned int *a, unsigned int v) { >>unsigned int result = 120; >>for (unsigned int i = OFFSET; i < 32+OFFSET; i++) { >> if (a[i-OFFSET] == v) result = i; >>} >>return result; >> } >> >> the index i will match the contents of the index vector used here --- >> but this does indeed pessimize the code generated for, say, OFFSET >> = 2. It is probably more sensible to use the existing code path in those >> situations. >> >>> The issue with the COND_REDUCITION index vector is overflow IIRC. >> >> Does that mean such overflows can already manifest themselves for >> regular COND_REDUCTION? I had assumed sufficient checks were already in >> place because of the presence of the is_nonwrapping_integer_induction >> test. > > But only if we need the index vector? The patch looked like you're changing > how other modes are handled (in my look I didn't make myself familiar with > the various modes again...). Anyway, Alan hopefully remembers what he > coded so I'll defer to him here. > > If Alan is happy with the patch consider it approved. > Richard’s right with his question. The optimisation needs to fail if the number of interactions of the loop + 1 doesn’t fit into the data type used for the result. I took the test pr65947-14.c First I set N to 0x-1. This compiled and vectorised. That’s ok. Now if I set N to 0x it still vectorises, but this should fail. Compare to pr65947-14.c where we set last = a[I]; inside the if. When set N to 0x-1, it compiled and vectorised. That’s ok. When set N to 0x it fails to vectorise with the message "loop size is greater than data size”. Looks like you might just need to add the one check. Also see pr65947-9.c which uses the slightly more useful char indexes. Alan.
Re: [PATCH] Fix result for conditional reductions matching at index 0
On Tue, Nov 21, 2017 at 5:43 PM, Kilian Verhetselwrote: > >> This is PR81179 I think, please mention that in the changelog. > > Correct, my bad for missing that. > >> This unconditionally pessimizes code even if there is no valid index >> zero, right? > > Almost, since for a loop such as: > > #define OFFSET 1 > unsigned int find(const unsigned int *a, unsigned int v) { > unsigned int result = 120; > for (unsigned int i = OFFSET; i < 32+OFFSET; i++) { > if (a[i-OFFSET] == v) result = i; > } > return result; > } > > the index i will match the contents of the index vector used here --- > but this does indeed pessimize the code generated for, say, OFFSET > = 2. It is probably more sensible to use the existing code path in those > situations. > >> The issue with the COND_REDUCITION index vector is overflow IIRC. > > Does that mean such overflows can already manifest themselves for > regular COND_REDUCTION? I had assumed sufficient checks were already in > place because of the presence of the is_nonwrapping_integer_induction > test. But only if we need the index vector? The patch looked like you're changing how other modes are handled (in my look I didn't make myself familiar with the various modes again...). Anyway, Alan hopefully remembers what he coded so I'll defer to him here. If Alan is happy with the patch consider it approved. Thanks, Richard.
Re: [PATCH] Fix result for conditional reductions matching at index 0
> On 21 Nov 2017, at 16:43, Kilian Verhetsel> wrote: > > >> This is PR81179 I think, please mention that in the changelog. > > Correct, my bad for missing that. > >> This unconditionally pessimizes code even if there is no valid index >> zero, right? > > Almost, since for a loop such as: > > #define OFFSET 1 > unsigned int find(const unsigned int *a, unsigned int v) { >unsigned int result = 120; >for (unsigned int i = OFFSET; i < 32+OFFSET; i++) { > if (a[i-OFFSET] == v) result = i; >} >return result; > } > > the index i will match the contents of the index vector used here --- > but this does indeed pessimize the code generated for, say, OFFSET > = 2. It is probably more sensible to use the existing code path in those > situations. > Looking at the final asm on aarch64 for -14.c, the code has only grown By a single instruction in the epilogue. Which is good, given the vector pass dump for this patch is quite a bit longer. In the vector dump, there is a vector of 0’s _57 = { 0, 0, 0, 0 }; created which is then never used (and later gets optimised away). Be nice if that could avoid getting created. I’ve not had chance to scrutinise the patch yet to see where that is created. >> The issue with the COND_REDUCITION index vector is overflow IIRC. > > Does that mean such overflows can already manifest themselves for > regular COND_REDUCTION? I had assumed sufficient checks were already in > place because of the presence of the is_nonwrapping_integer_induction > test. As an aside, it’s possibly worth mentioning that this issue will go away for aarch64-sve when Richard Sandiford’s SVE patch goes in, as that’ll support CLASTB reduction.
Re: [PATCH] Fix result for conditional reductions matching at index 0
> This is PR81179 I think, please mention that in the changelog. Correct, my bad for missing that. > This unconditionally pessimizes code even if there is no valid index > zero, right? Almost, since for a loop such as: #define OFFSET 1 unsigned int find(const unsigned int *a, unsigned int v) { unsigned int result = 120; for (unsigned int i = OFFSET; i < 32+OFFSET; i++) { if (a[i-OFFSET] == v) result = i; } return result; } the index i will match the contents of the index vector used here --- but this does indeed pessimize the code generated for, say, OFFSET = 2. It is probably more sensible to use the existing code path in those situations. > The issue with the COND_REDUCITION index vector is overflow IIRC. Does that mean such overflows can already manifest themselves for regular COND_REDUCTION? I had assumed sufficient checks were already in place because of the presence of the is_nonwrapping_integer_induction test.
Re: [PATCH] Fix result for conditional reductions matching at index 0
On Tue, Nov 21, 2017 at 12:35 PM, Kilian Verhetselwrote: > > Hi, > > When translating conditional reductions based on integer induction, the > compiler uses the value zero to indicate the absence of any matches: if > the index of the last match is still zero at the end of the loop, the > default value is returned. The problem with this approach is that this > default value is returned not only when there were no matches at all, > but also when the last match occurred at index 0. This causes the test > gcc.dg/vect/pr65947-14.c to fail. > > This patch corrects this by reusing the vector of indices used for > COND_REDUCTION, which starts at 1. If the 1-based index of the last > match is non-zero, 1 is subtracted from it, otherwise the initial value > is returned. > > I tested this patch on x86_64-pc-linux-gnu (both with SSE and AVX2, > causing both paths through the reduc_code != ERROR_MARK branch being > taken). This is PR81179 I think, please mention that in the changelog. This unconditionally pessimizes code even if there is no valid index zero, right? The issue with the COND_REDUCITION index vector is overflow IIRC. Alan, can you please comment on the patch? Thanks, Richard. > 2017-11-21 Kilian Verhetsel > > * tree-vect-loop.c > (vect_create_epilog_for_reduction): Fix the returned value for > INTEGER_INDUC_COND_REDUCTION whose last match occurred at > index 0. > (vectorizable_reduction): For INTEGER_INDUC_COND_REDUCTION, > pass the PHI statement that sets the induction variable to the > code generating the epilogue. >