[PATCH] Fix PR92324 more
There's another case missed, SLP reductions still use the wrong sign for the epilogue operations. I also noticed that we generate somewhat ugly code for the epilogue since we fall back to scalar extracts but do not consider to reduce a larger vector to a smaller one first (and neither multiple vectors, but that's for another fix). Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2019-11-15 Richard Biener PR tree-optimization/92324 * tree-vect-loop.c (vect_create_epilog_for_reduction): Fix singedness of SLP reduction epilouge operations. Also reduce the vector width for SLP reductions before doing elementwise operations if possible. * gcc.dg/vect/pr92324-4.c: New testcase. Index: gcc/tree-vect-loop.c === --- gcc/tree-vect-loop.c(revision 278281) +++ gcc/tree-vect-loop.c(working copy) @@ -4930,6 +4941,8 @@ vect_create_epilog_for_reduction (stmt_v bool reduce_with_shift; tree vec_temp; + gcc_assert (slp_reduc || new_phis.length () == 1); + /* See if the target wants to do the final (shift) reduction in a vector mode of smaller size and first reduce upper/lower halves against each other. */ @@ -4937,6 +4950,21 @@ vect_create_epilog_for_reduction (stmt_v tree stype = TREE_TYPE (vectype); unsigned nunits = TYPE_VECTOR_SUBPARTS (vectype).to_constant (); unsigned nunits1 = nunits; + if ((mode1 = targetm.vectorize.split_reduction (mode)) != mode + && new_phis.length () == 1) + { + nunits1 = GET_MODE_NUNITS (mode1).to_constant (); + /* For SLP reductions we have to make sure lanes match up, but +since we're doing individual element final reduction reducing +vector width here is even more important. +??? We can also separate lanes with permutes, for the common +case of power-of-two group-size odd/even extracts would work. */ + if (slp_reduc && nunits != nunits1) + { + nunits1 = least_common_multiple (nunits1, group_size); + gcc_assert (exact_log2 (nunits1) != -1 && nunits1 <= nunits); + } + } if (!slp_reduc && (mode1 = targetm.vectorize.split_reduction (mode)) != mode) nunits1 = GET_MODE_NUNITS (mode1).to_constant (); @@ -4958,7 +4986,6 @@ vect_create_epilog_for_reduction (stmt_v new_temp = new_phi_result; while (nunits > nunits1) { - gcc_assert (!slp_reduc); nunits /= 2; vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype), stype, nunits); @@ -5113,6 +5140,8 @@ vect_create_epilog_for_reduction (stmt_v int vec_size_in_bits = tree_to_uhwi (TYPE_SIZE (vectype1)); int element_bitsize = tree_to_uhwi (bitsize); + tree compute_type = TREE_TYPE (vectype); + gimple_seq stmts = NULL; FOR_EACH_VEC_ELT (new_phis, i, new_phi) { int bit_offset; @@ -5120,12 +5149,8 @@ vect_create_epilog_for_reduction (stmt_v vec_temp = PHI_RESULT (new_phi); else vec_temp = gimple_assign_lhs (new_phi); - tree rhs = build3 (BIT_FIELD_REF, scalar_type, vec_temp, bitsize, -bitsize_zero_node); - epilog_stmt = gimple_build_assign (new_scalar_dest, rhs); - new_temp = make_ssa_name (new_scalar_dest, epilog_stmt); - gimple_assign_set_lhs (epilog_stmt, new_temp); - gsi_insert_before (_gsi, epilog_stmt, GSI_SAME_STMT); + new_temp = gimple_build (, BIT_FIELD_REF, compute_type, + vec_temp, bitsize, bitsize_zero_node); /* In SLP we don't need to apply reduction operation, so we just collect s' values in SCALAR_RESULTS. */ @@ -5137,14 +5162,9 @@ vect_create_epilog_for_reduction (stmt_v bit_offset += element_bitsize) { tree bitpos = bitsize_int (bit_offset); - tree rhs = build3 (BIT_FIELD_REF, scalar_type, vec_temp, - bitsize, bitpos); - - epilog_stmt = gimple_build_assign (new_scalar_dest, rhs); - new_name = make_ssa_name (new_scalar_dest, epilog_stmt); - gimple_assign_set_lhs (epilog_stmt, new_name); - gsi_insert_before (_gsi, epilog_stmt, GSI_SAME_STMT); - + new_name = gimple_build (, BIT_FIELD_REF, + compute_type, vec_temp, + bitsize, bitpos); if (slp_reduc) { /* In SLP we don't need to apply reduction
Re: [PATCH] Fix PR92324
On Mon, 11 Nov 2019 at 14:36, Richard Biener wrote: > > On Mon, 11 Nov 2019, Christophe Lyon wrote: > > > On Fri, 8 Nov 2019 at 09:57, Richard Biener wrote: > > > > > > > > > I've been sitting on this for a few days since I'm not 100% happy > > > with how the code looks like. There's possibly still holes in it > > > (chains with mixed signed/unsigned adds for example might pick > > > up signed adds in the epilogue), but the wrong-code cases should > > > work fine now. I'm probably going to followup with some > > > mass renaming of variable/parameter names to make it more clear > > > which stmt / type we are actually looking at ... > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. > > > > > > Richard. > > > > > > 2019-11-08 Richard Biener > > > > > > PR tree-optimization/ > > > * tree-vect-loop.c (vect_create_epilog_for_reduction): Use > > > STMT_VINFO_REDUC_VECTYPE for all computations, inserting > > > sign-conversions as necessary. > > > (vectorizable_reduction): Reject conversions in the chain > > > that are not sign-conversions, base analysis on a non-converting > > > stmt and its operation sign. Set STMT_VINFO_REDUC_VECTYPE. > > > * tree-vect-stmts.c (vect_stmt_relevant_p): Don't dump anything > > > for debug stmts. > > > * tree-vectorizer.h (_stmt_vec_info::reduc_vectype): New. > > > (STMT_VINFO_REDUC_VECTYPE): Likewise. > > > > > > * gcc.dg/vect/pr92205.c: XFAIL. > > > * gcc.dg/vect/pr92324-1.c: New testcase. > > > * gcc.dg/vect/pr92324-2.c: Likewise. > > > > > > > Hi Richard, > > > > This new testcase (pr92324-2.c) fails on arm/aarch64: > > FAIL: gcc.dg/vect/pr92324-2.c -flto -ffat-lto-objects execution test > > FAIL: gcc.dg/vect/pr92324-2.c execution test > > Can you open a bugreport? I suspect the arm/aarch64 vector > min/max patterns are bogus or we go a different code-path > in the vectorizer here (IIRC x86 has reduc_[us]max_* patterns > while aarch64 has none of that). > Sure, I've filed PR92473. > Thanks, > Richard. > > > Christophe > > > > > > > Index: gcc/tree-vect-loop.c > > > === > > > --- gcc/tree-vect-loop.c(revision 277922) > > > +++ gcc/tree-vect-loop.c(working copy) > > > @@ -4231,7 +4231,6 @@ vect_create_epilog_for_reduction (stmt_v > > >gimple *new_phi = NULL, *phi; > > >stmt_vec_info phi_info; > > >gimple_stmt_iterator exit_gsi; > > > - tree vec_dest; > > >tree new_temp = NULL_TREE, new_name, new_scalar_dest; > > >gimple *epilog_stmt = NULL; > > >gimple *exit_phi; > > > @@ -4264,7 +4263,7 @@ vect_create_epilog_for_reduction (stmt_v > > > } > > >gcc_assert (!nested_in_vect_loop || double_reduc); > > > > > > - vectype = STMT_VINFO_VECTYPE (stmt_info); > > > + vectype = STMT_VINFO_REDUC_VECTYPE (reduc_info); > > >gcc_assert (vectype); > > >mode = TYPE_MODE (vectype); > > > > > > @@ -4505,48 +4504,43 @@ vect_create_epilog_for_reduction (stmt_v > > > one vector. */ > > >if (REDUC_GROUP_FIRST_ELEMENT (stmt_info) || direct_slp_reduc) > > > { > > > + gimple_seq stmts = NULL; > > >tree first_vect = PHI_RESULT (new_phis[0]); > > > - gassign *new_vec_stmt = NULL; > > > - vec_dest = vect_create_destination_var (scalar_dest, vectype); > > > + first_vect = gimple_convert (, vectype, first_vect); > > >for (k = 1; k < new_phis.length (); k++) > > > { > > > gimple *next_phi = new_phis[k]; > > >tree second_vect = PHI_RESULT (next_phi); > > > - tree tem = make_ssa_name (vec_dest, new_vec_stmt); > > > - new_vec_stmt = gimple_build_assign (tem, code, > > > - first_vect, second_vect); > > > - gsi_insert_before (_gsi, new_vec_stmt, GSI_SAME_STMT); > > > - first_vect = tem; > > > + second_vect = gimple_convert (, vectype, second_vect); > > > + first_vect = gimple_build (, code, vectype, > > > +first_vect, second_vect); > > > } > > > + gsi_insert_seq_before (_gsi, stmts, GSI_SAME_STMT); > > > > > >new_phi_result = first_vect; > > > - if (new_vec_stmt) > > > -{ > > > - new_phis.truncate (0); > > > - new_phis.safe_push (new_vec_stmt); > > > -} > > > + new_phis.truncate (0); > > > + new_phis.safe_push (SSA_NAME_DEF_STMT (first_vect)); > > > } > > >/* Likewise if we couldn't use a single defuse cycle. */ > > >else if (ncopies > 1) > > > { > > >gcc_assert (new_phis.length () == 1); > > > + gimple_seq stmts = NULL; > > >tree first_vect = PHI_RESULT (new_phis[0]); > > > - gassign *new_vec_stmt = NULL; > > > - vec_dest = vect_create_destination_var (scalar_dest, vectype); > > > + first_vect =
Re: [PATCH] Fix PR92324
On Mon, 11 Nov 2019, Christophe Lyon wrote: > On Fri, 8 Nov 2019 at 09:57, Richard Biener wrote: > > > > > > I've been sitting on this for a few days since I'm not 100% happy > > with how the code looks like. There's possibly still holes in it > > (chains with mixed signed/unsigned adds for example might pick > > up signed adds in the epilogue), but the wrong-code cases should > > work fine now. I'm probably going to followup with some > > mass renaming of variable/parameter names to make it more clear > > which stmt / type we are actually looking at ... > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. > > > > Richard. > > > > 2019-11-08 Richard Biener > > > > PR tree-optimization/ > > * tree-vect-loop.c (vect_create_epilog_for_reduction): Use > > STMT_VINFO_REDUC_VECTYPE for all computations, inserting > > sign-conversions as necessary. > > (vectorizable_reduction): Reject conversions in the chain > > that are not sign-conversions, base analysis on a non-converting > > stmt and its operation sign. Set STMT_VINFO_REDUC_VECTYPE. > > * tree-vect-stmts.c (vect_stmt_relevant_p): Don't dump anything > > for debug stmts. > > * tree-vectorizer.h (_stmt_vec_info::reduc_vectype): New. > > (STMT_VINFO_REDUC_VECTYPE): Likewise. > > > > * gcc.dg/vect/pr92205.c: XFAIL. > > * gcc.dg/vect/pr92324-1.c: New testcase. > > * gcc.dg/vect/pr92324-2.c: Likewise. > > > > Hi Richard, > > This new testcase (pr92324-2.c) fails on arm/aarch64: > FAIL: gcc.dg/vect/pr92324-2.c -flto -ffat-lto-objects execution test > FAIL: gcc.dg/vect/pr92324-2.c execution test Can you open a bugreport? I suspect the arm/aarch64 vector min/max patterns are bogus or we go a different code-path in the vectorizer here (IIRC x86 has reduc_[us]max_* patterns while aarch64 has none of that). Thanks, Richard. > Christophe > > > > Index: gcc/tree-vect-loop.c > > === > > --- gcc/tree-vect-loop.c(revision 277922) > > +++ gcc/tree-vect-loop.c(working copy) > > @@ -4231,7 +4231,6 @@ vect_create_epilog_for_reduction (stmt_v > >gimple *new_phi = NULL, *phi; > >stmt_vec_info phi_info; > >gimple_stmt_iterator exit_gsi; > > - tree vec_dest; > >tree new_temp = NULL_TREE, new_name, new_scalar_dest; > >gimple *epilog_stmt = NULL; > >gimple *exit_phi; > > @@ -4264,7 +4263,7 @@ vect_create_epilog_for_reduction (stmt_v > > } > >gcc_assert (!nested_in_vect_loop || double_reduc); > > > > - vectype = STMT_VINFO_VECTYPE (stmt_info); > > + vectype = STMT_VINFO_REDUC_VECTYPE (reduc_info); > >gcc_assert (vectype); > >mode = TYPE_MODE (vectype); > > > > @@ -4505,48 +4504,43 @@ vect_create_epilog_for_reduction (stmt_v > > one vector. */ > >if (REDUC_GROUP_FIRST_ELEMENT (stmt_info) || direct_slp_reduc) > > { > > + gimple_seq stmts = NULL; > >tree first_vect = PHI_RESULT (new_phis[0]); > > - gassign *new_vec_stmt = NULL; > > - vec_dest = vect_create_destination_var (scalar_dest, vectype); > > + first_vect = gimple_convert (, vectype, first_vect); > >for (k = 1; k < new_phis.length (); k++) > > { > > gimple *next_phi = new_phis[k]; > >tree second_vect = PHI_RESULT (next_phi); > > - tree tem = make_ssa_name (vec_dest, new_vec_stmt); > > - new_vec_stmt = gimple_build_assign (tem, code, > > - first_vect, second_vect); > > - gsi_insert_before (_gsi, new_vec_stmt, GSI_SAME_STMT); > > - first_vect = tem; > > + second_vect = gimple_convert (, vectype, second_vect); > > + first_vect = gimple_build (, code, vectype, > > +first_vect, second_vect); > > } > > + gsi_insert_seq_before (_gsi, stmts, GSI_SAME_STMT); > > > >new_phi_result = first_vect; > > - if (new_vec_stmt) > > -{ > > - new_phis.truncate (0); > > - new_phis.safe_push (new_vec_stmt); > > -} > > + new_phis.truncate (0); > > + new_phis.safe_push (SSA_NAME_DEF_STMT (first_vect)); > > } > >/* Likewise if we couldn't use a single defuse cycle. */ > >else if (ncopies > 1) > > { > >gcc_assert (new_phis.length () == 1); > > + gimple_seq stmts = NULL; > >tree first_vect = PHI_RESULT (new_phis[0]); > > - gassign *new_vec_stmt = NULL; > > - vec_dest = vect_create_destination_var (scalar_dest, vectype); > > + first_vect = gimple_convert (, vectype, first_vect); > >stmt_vec_info next_phi_info = loop_vinfo->lookup_stmt (new_phis[0]); > >for (int k = 1; k < ncopies; ++k) > > { > > next_phi_info = STMT_VINFO_RELATED_STMT (next_phi_info); > > tree second_vect = PHI_RESULT (next_phi_info->stmt); >
Re: [PATCH] Fix PR92324
On Fri, 8 Nov 2019 at 09:57, Richard Biener wrote: > > > I've been sitting on this for a few days since I'm not 100% happy > with how the code looks like. There's possibly still holes in it > (chains with mixed signed/unsigned adds for example might pick > up signed adds in the epilogue), but the wrong-code cases should > work fine now. I'm probably going to followup with some > mass renaming of variable/parameter names to make it more clear > which stmt / type we are actually looking at ... > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. > > Richard. > > 2019-11-08 Richard Biener > > PR tree-optimization/ > * tree-vect-loop.c (vect_create_epilog_for_reduction): Use > STMT_VINFO_REDUC_VECTYPE for all computations, inserting > sign-conversions as necessary. > (vectorizable_reduction): Reject conversions in the chain > that are not sign-conversions, base analysis on a non-converting > stmt and its operation sign. Set STMT_VINFO_REDUC_VECTYPE. > * tree-vect-stmts.c (vect_stmt_relevant_p): Don't dump anything > for debug stmts. > * tree-vectorizer.h (_stmt_vec_info::reduc_vectype): New. > (STMT_VINFO_REDUC_VECTYPE): Likewise. > > * gcc.dg/vect/pr92205.c: XFAIL. > * gcc.dg/vect/pr92324-1.c: New testcase. > * gcc.dg/vect/pr92324-2.c: Likewise. > Hi Richard, This new testcase (pr92324-2.c) fails on arm/aarch64: FAIL: gcc.dg/vect/pr92324-2.c -flto -ffat-lto-objects execution test FAIL: gcc.dg/vect/pr92324-2.c execution test Christophe > Index: gcc/tree-vect-loop.c > === > --- gcc/tree-vect-loop.c(revision 277922) > +++ gcc/tree-vect-loop.c(working copy) > @@ -4231,7 +4231,6 @@ vect_create_epilog_for_reduction (stmt_v >gimple *new_phi = NULL, *phi; >stmt_vec_info phi_info; >gimple_stmt_iterator exit_gsi; > - tree vec_dest; >tree new_temp = NULL_TREE, new_name, new_scalar_dest; >gimple *epilog_stmt = NULL; >gimple *exit_phi; > @@ -4264,7 +4263,7 @@ vect_create_epilog_for_reduction (stmt_v > } >gcc_assert (!nested_in_vect_loop || double_reduc); > > - vectype = STMT_VINFO_VECTYPE (stmt_info); > + vectype = STMT_VINFO_REDUC_VECTYPE (reduc_info); >gcc_assert (vectype); >mode = TYPE_MODE (vectype); > > @@ -4505,48 +4504,43 @@ vect_create_epilog_for_reduction (stmt_v > one vector. */ >if (REDUC_GROUP_FIRST_ELEMENT (stmt_info) || direct_slp_reduc) > { > + gimple_seq stmts = NULL; >tree first_vect = PHI_RESULT (new_phis[0]); > - gassign *new_vec_stmt = NULL; > - vec_dest = vect_create_destination_var (scalar_dest, vectype); > + first_vect = gimple_convert (, vectype, first_vect); >for (k = 1; k < new_phis.length (); k++) > { > gimple *next_phi = new_phis[k]; >tree second_vect = PHI_RESULT (next_phi); > - tree tem = make_ssa_name (vec_dest, new_vec_stmt); > - new_vec_stmt = gimple_build_assign (tem, code, > - first_vect, second_vect); > - gsi_insert_before (_gsi, new_vec_stmt, GSI_SAME_STMT); > - first_vect = tem; > + second_vect = gimple_convert (, vectype, second_vect); > + first_vect = gimple_build (, code, vectype, > +first_vect, second_vect); > } > + gsi_insert_seq_before (_gsi, stmts, GSI_SAME_STMT); > >new_phi_result = first_vect; > - if (new_vec_stmt) > -{ > - new_phis.truncate (0); > - new_phis.safe_push (new_vec_stmt); > -} > + new_phis.truncate (0); > + new_phis.safe_push (SSA_NAME_DEF_STMT (first_vect)); > } >/* Likewise if we couldn't use a single defuse cycle. */ >else if (ncopies > 1) > { >gcc_assert (new_phis.length () == 1); > + gimple_seq stmts = NULL; >tree first_vect = PHI_RESULT (new_phis[0]); > - gassign *new_vec_stmt = NULL; > - vec_dest = vect_create_destination_var (scalar_dest, vectype); > + first_vect = gimple_convert (, vectype, first_vect); >stmt_vec_info next_phi_info = loop_vinfo->lookup_stmt (new_phis[0]); >for (int k = 1; k < ncopies; ++k) > { > next_phi_info = STMT_VINFO_RELATED_STMT (next_phi_info); > tree second_vect = PHI_RESULT (next_phi_info->stmt); > - tree tem = make_ssa_name (vec_dest, new_vec_stmt); > - new_vec_stmt = gimple_build_assign (tem, code, > - first_vect, second_vect); > - gsi_insert_before (_gsi, new_vec_stmt, GSI_SAME_STMT); > - first_vect = tem; > + second_vect = gimple_convert (, vectype, second_vect); > + first_vect = gimple_build (, code, vectype, > +first_vect, second_vect); > } >
Re: [PATCH] Fix PR92324
Richard Biener writes: > On Fri, 8 Nov 2019, Richard Sandiford wrote: > >> Richard Biener writes: >> > I've been sitting on this for a few days since I'm not 100% happy >> > with how the code looks like. There's possibly still holes in it >> > (chains with mixed signed/unsigned adds for example might pick >> > up signed adds in the epilogue), but the wrong-code cases should >> > work fine now. I'm probably going to followup with some >> > mass renaming of variable/parameter names to make it more clear >> > which stmt / type we are actually looking at ... >> > >> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. >> >> Does this look like the right way of updating neutral_op_for_slp_reduction? >> It now needs to know whether the caller is using STMT_VINFO_REDUC_VECTYPE >> (for an epilogue value) or STMT_VINFO_REDUC_VECTYPE_IN (for a PHI argument). >> >> Fixes various gcc.target/aarch64/sve/slp_* tests, will give it a >> full test on aarch64-linux-gnu. > > Yeah, it looks sensible. In vect_create_epilog_for_reduction > please move the call down to the only use in the > > else if (direct_slp_reduc) > { > > block. Thanks, here's what I installed after testing on aarch64-linux-gnu and x86_64-linux-gnu. Richard 2019-11-08 Richard Sandiford gcc/ * tree-vect-loop.c (neutral_op_for_slp_reduction): Take the vector type as an argument rather than reading it from the stmt_vec_info. (vect_create_epilog_for_reduction): Update accordingly. (vectorizable_reduction): Likewise. (vect_transform_cycle_phi): Likewise. Index: gcc/tree-vect-loop.c === --- gcc/tree-vect-loop.c2019-11-08 11:58:22.331095690 + +++ gcc/tree-vect-loop.c2019-11-08 16:06:11.389032120 + @@ -2590,17 +2590,17 @@ reduction_fn_for_scalar_code (enum tree_ /* If there is a neutral value X such that SLP reduction NODE would not be affected by the introduction of additional X elements, return that X, - otherwise return null. CODE is the code of the reduction. REDUC_CHAIN - is true if the SLP statements perform a single reduction, false if each - statement performs an independent reduction. */ + otherwise return null. CODE is the code of the reduction and VECTOR_TYPE + is the vector type that would hold element X. REDUC_CHAIN is true if + the SLP statements perform a single reduction, false if each statement + performs an independent reduction. */ static tree -neutral_op_for_slp_reduction (slp_tree slp_node, tree_code code, - bool reduc_chain) +neutral_op_for_slp_reduction (slp_tree slp_node, tree vector_type, + tree_code code, bool reduc_chain) { vec stmts = SLP_TREE_SCALAR_STMTS (slp_node); stmt_vec_info stmt_vinfo = stmts[0]; - tree vector_type = STMT_VINFO_VECTYPE (stmt_vinfo); tree scalar_type = TREE_TYPE (vector_type); class loop *loop = gimple_bb (stmt_vinfo->stmt)->loop_father; gcc_assert (loop); @@ -4220,11 +4220,6 @@ vect_create_epilog_for_reduction (stmt_v = as_a (STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info))->stmt); enum tree_code code = STMT_VINFO_REDUC_CODE (reduc_info); internal_fn reduc_fn = STMT_VINFO_REDUC_FN (reduc_info); - tree neutral_op = NULL_TREE; - if (slp_node) -neutral_op - = neutral_op_for_slp_reduction (slp_node_instance->reduc_phis, code, - REDUC_GROUP_FIRST_ELEMENT (stmt_info)); stmt_vec_info prev_phi_info; tree vectype; machine_mode mode; @@ -4822,6 +4817,14 @@ vect_create_epilog_for_reduction (stmt_v scalar value if we have one, otherwise the initial scalar value is itself a neutral value. */ tree vector_identity = NULL_TREE; + tree neutral_op = NULL_TREE; + if (slp_node) + { + stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (stmt_info); + neutral_op + = neutral_op_for_slp_reduction (slp_node_instance->reduc_phis, + vectype, code, first != NULL); + } if (neutral_op) vector_identity = gimple_build_vector_from_val (, vectype, neutral_op); @@ -6214,7 +6217,7 @@ vectorizable_reduction (stmt_vec_info st tree neutral_op = NULL_TREE; if (slp_node) neutral_op = neutral_op_for_slp_reduction - (slp_node_instance->reduc_phis, orig_code, + (slp_node_instance->reduc_phis, vectype_out, orig_code, REDUC_GROUP_FIRST_ELEMENT (stmt_info) != NULL); if (double_reduc && reduction_type == FOLD_LEFT_REDUCTION) @@ -6797,7 +6800,7 @@ vect_transform_cycle_phi (stmt_vec_info gcc_assert (slp_node == slp_node_instance->reduc_phis); stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (reduc_stmt_info); tree neutral_op - = neutral_op_for_slp_reduction (slp_node, + =
Re: [PATCH] Fix PR92324
On Fri, 8 Nov 2019, Richard Sandiford wrote: > Richard Biener writes: > > I've been sitting on this for a few days since I'm not 100% happy > > with how the code looks like. There's possibly still holes in it > > (chains with mixed signed/unsigned adds for example might pick > > up signed adds in the epilogue), but the wrong-code cases should > > work fine now. I'm probably going to followup with some > > mass renaming of variable/parameter names to make it more clear > > which stmt / type we are actually looking at ... > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. > > Does this look like the right way of updating neutral_op_for_slp_reduction? > It now needs to know whether the caller is using STMT_VINFO_REDUC_VECTYPE > (for an epilogue value) or STMT_VINFO_REDUC_VECTYPE_IN (for a PHI argument). > > Fixes various gcc.target/aarch64/sve/slp_* tests, will give it a > full test on aarch64-linux-gnu. Yeah, it looks sensible. In vect_create_epilog_for_reduction please move the call down to the only use in the else if (direct_slp_reduc) { block. Thanks, Richard. > Thanks, > Richard > > > 2019-11-08 Richard Sandiford > > gcc/ > * tree-vect-loop.c (neutral_op_for_slp_reduction): Take the > vector type as an argument rather than reading it from the > stmt_vec_info. > (vect_create_epilog_for_reduction): Update accordingly, > passing the STMT_VINFO_REDUC_VECTYPE. > (vectorizable_reduction): Likewise. > (vect_transform_cycle_phi): Likewise, but passing the > STMT_VINFO_REDUC_VECTYPE_IN. > > Index: gcc/tree-vect-loop.c > === > --- gcc/tree-vect-loop.c 2019-11-08 09:06:29.654896085 + > +++ gcc/tree-vect-loop.c 2019-11-08 10:41:54.498861004 + > @@ -2586,17 +2586,17 @@ reduction_fn_for_scalar_code (enum tree_ > > /* If there is a neutral value X such that SLP reduction NODE would not > be affected by the introduction of additional X elements, return that X, > - otherwise return null. CODE is the code of the reduction. REDUC_CHAIN > - is true if the SLP statements perform a single reduction, false if each > - statement performs an independent reduction. */ > + otherwise return null. CODE is the code of the reduction and VECTOR_TYPE > + is the vector type that would hold element X. REDUC_CHAIN is true if > + the SLP statements perform a single reduction, false if each statement > + performs an independent reduction. */ > > static tree > -neutral_op_for_slp_reduction (slp_tree slp_node, tree_code code, > - bool reduc_chain) > +neutral_op_for_slp_reduction (slp_tree slp_node, tree vector_type, > + tree_code code, bool reduc_chain) > { >vec stmts = SLP_TREE_SCALAR_STMTS (slp_node); >stmt_vec_info stmt_vinfo = stmts[0]; > - tree vector_type = STMT_VINFO_VECTYPE (stmt_vinfo); >tree scalar_type = TREE_TYPE (vector_type); >class loop *loop = gimple_bb (stmt_vinfo->stmt)->loop_father; >gcc_assert (loop); > @@ -4216,11 +4216,6 @@ vect_create_epilog_for_reduction (stmt_v > = as_a (STMT_VINFO_REDUC_DEF (vect_orig_stmt > (stmt_info))->stmt); >enum tree_code code = STMT_VINFO_REDUC_CODE (reduc_info); >internal_fn reduc_fn = STMT_VINFO_REDUC_FN (reduc_info); > - tree neutral_op = NULL_TREE; > - if (slp_node) > -neutral_op > - = neutral_op_for_slp_reduction (slp_node_instance->reduc_phis, code, > - REDUC_GROUP_FIRST_ELEMENT (stmt_info)); >stmt_vec_info prev_phi_info; >tree vectype; >machine_mode mode; > @@ -4267,11 +4262,15 @@ vect_create_epilog_for_reduction (stmt_v >gcc_assert (vectype); >mode = TYPE_MODE (vectype); > > + tree neutral_op = NULL_TREE; >tree initial_def = NULL; >tree induc_val = NULL_TREE; >tree adjustment_def = NULL; >if (slp_node) > -; > +neutral_op > + = neutral_op_for_slp_reduction (slp_node_instance->reduc_phis, > + vectype, code, > + REDUC_GROUP_FIRST_ELEMENT (stmt_info)); >else > { >/* Get at the scalar def before the loop, that defines the initial > value > @@ -6210,7 +6209,7 @@ vectorizable_reduction (stmt_vec_info st >tree neutral_op = NULL_TREE; >if (slp_node) > neutral_op = neutral_op_for_slp_reduction > - (slp_node_instance->reduc_phis, orig_code, > + (slp_node_instance->reduc_phis, vectype_out, orig_code, > REDUC_GROUP_FIRST_ELEMENT (stmt_info) != NULL); > >if (double_reduc && reduction_type == FOLD_LEFT_REDUCTION) > @@ -6793,7 +6792,7 @@ vect_transform_cycle_phi (stmt_vec_info >gcc_assert (slp_node == slp_node_instance->reduc_phis); >stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (reduc_stmt_info); >tree neutral_op > - = neutral_op_for_slp_reduction (slp_node,
Re: [PATCH] Fix PR92324
Richard Biener writes: > I've been sitting on this for a few days since I'm not 100% happy > with how the code looks like. There's possibly still holes in it > (chains with mixed signed/unsigned adds for example might pick > up signed adds in the epilogue), but the wrong-code cases should > work fine now. I'm probably going to followup with some > mass renaming of variable/parameter names to make it more clear > which stmt / type we are actually looking at ... > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Does this look like the right way of updating neutral_op_for_slp_reduction? It now needs to know whether the caller is using STMT_VINFO_REDUC_VECTYPE (for an epilogue value) or STMT_VINFO_REDUC_VECTYPE_IN (for a PHI argument). Fixes various gcc.target/aarch64/sve/slp_* tests, will give it a full test on aarch64-linux-gnu. Thanks, Richard 2019-11-08 Richard Sandiford gcc/ * tree-vect-loop.c (neutral_op_for_slp_reduction): Take the vector type as an argument rather than reading it from the stmt_vec_info. (vect_create_epilog_for_reduction): Update accordingly, passing the STMT_VINFO_REDUC_VECTYPE. (vectorizable_reduction): Likewise. (vect_transform_cycle_phi): Likewise, but passing the STMT_VINFO_REDUC_VECTYPE_IN. Index: gcc/tree-vect-loop.c === --- gcc/tree-vect-loop.c2019-11-08 09:06:29.654896085 + +++ gcc/tree-vect-loop.c2019-11-08 10:41:54.498861004 + @@ -2586,17 +2586,17 @@ reduction_fn_for_scalar_code (enum tree_ /* If there is a neutral value X such that SLP reduction NODE would not be affected by the introduction of additional X elements, return that X, - otherwise return null. CODE is the code of the reduction. REDUC_CHAIN - is true if the SLP statements perform a single reduction, false if each - statement performs an independent reduction. */ + otherwise return null. CODE is the code of the reduction and VECTOR_TYPE + is the vector type that would hold element X. REDUC_CHAIN is true if + the SLP statements perform a single reduction, false if each statement + performs an independent reduction. */ static tree -neutral_op_for_slp_reduction (slp_tree slp_node, tree_code code, - bool reduc_chain) +neutral_op_for_slp_reduction (slp_tree slp_node, tree vector_type, + tree_code code, bool reduc_chain) { vec stmts = SLP_TREE_SCALAR_STMTS (slp_node); stmt_vec_info stmt_vinfo = stmts[0]; - tree vector_type = STMT_VINFO_VECTYPE (stmt_vinfo); tree scalar_type = TREE_TYPE (vector_type); class loop *loop = gimple_bb (stmt_vinfo->stmt)->loop_father; gcc_assert (loop); @@ -4216,11 +4216,6 @@ vect_create_epilog_for_reduction (stmt_v = as_a (STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info))->stmt); enum tree_code code = STMT_VINFO_REDUC_CODE (reduc_info); internal_fn reduc_fn = STMT_VINFO_REDUC_FN (reduc_info); - tree neutral_op = NULL_TREE; - if (slp_node) -neutral_op - = neutral_op_for_slp_reduction (slp_node_instance->reduc_phis, code, - REDUC_GROUP_FIRST_ELEMENT (stmt_info)); stmt_vec_info prev_phi_info; tree vectype; machine_mode mode; @@ -4267,11 +4262,15 @@ vect_create_epilog_for_reduction (stmt_v gcc_assert (vectype); mode = TYPE_MODE (vectype); + tree neutral_op = NULL_TREE; tree initial_def = NULL; tree induc_val = NULL_TREE; tree adjustment_def = NULL; if (slp_node) -; +neutral_op + = neutral_op_for_slp_reduction (slp_node_instance->reduc_phis, + vectype, code, + REDUC_GROUP_FIRST_ELEMENT (stmt_info)); else { /* Get at the scalar def before the loop, that defines the initial value @@ -6210,7 +6209,7 @@ vectorizable_reduction (stmt_vec_info st tree neutral_op = NULL_TREE; if (slp_node) neutral_op = neutral_op_for_slp_reduction - (slp_node_instance->reduc_phis, orig_code, + (slp_node_instance->reduc_phis, vectype_out, orig_code, REDUC_GROUP_FIRST_ELEMENT (stmt_info) != NULL); if (double_reduc && reduction_type == FOLD_LEFT_REDUCTION) @@ -6793,7 +6792,7 @@ vect_transform_cycle_phi (stmt_vec_info gcc_assert (slp_node == slp_node_instance->reduc_phis); stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (reduc_stmt_info); tree neutral_op - = neutral_op_for_slp_reduction (slp_node, + = neutral_op_for_slp_reduction (slp_node, vectype_in, STMT_VINFO_REDUC_CODE (reduc_info), first != NULL); get_initial_defs_for_reduction (slp_node_instance->reduc_phis,
[PATCH] Fix PR92324
I've been sitting on this for a few days since I'm not 100% happy with how the code looks like. There's possibly still holes in it (chains with mixed signed/unsigned adds for example might pick up signed adds in the epilogue), but the wrong-code cases should work fine now. I'm probably going to followup with some mass renaming of variable/parameter names to make it more clear which stmt / type we are actually looking at ... Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2019-11-08 Richard Biener PR tree-optimization/ * tree-vect-loop.c (vect_create_epilog_for_reduction): Use STMT_VINFO_REDUC_VECTYPE for all computations, inserting sign-conversions as necessary. (vectorizable_reduction): Reject conversions in the chain that are not sign-conversions, base analysis on a non-converting stmt and its operation sign. Set STMT_VINFO_REDUC_VECTYPE. * tree-vect-stmts.c (vect_stmt_relevant_p): Don't dump anything for debug stmts. * tree-vectorizer.h (_stmt_vec_info::reduc_vectype): New. (STMT_VINFO_REDUC_VECTYPE): Likewise. * gcc.dg/vect/pr92205.c: XFAIL. * gcc.dg/vect/pr92324-1.c: New testcase. * gcc.dg/vect/pr92324-2.c: Likewise. Index: gcc/tree-vect-loop.c === --- gcc/tree-vect-loop.c(revision 277922) +++ gcc/tree-vect-loop.c(working copy) @@ -4231,7 +4231,6 @@ vect_create_epilog_for_reduction (stmt_v gimple *new_phi = NULL, *phi; stmt_vec_info phi_info; gimple_stmt_iterator exit_gsi; - tree vec_dest; tree new_temp = NULL_TREE, new_name, new_scalar_dest; gimple *epilog_stmt = NULL; gimple *exit_phi; @@ -4264,7 +4263,7 @@ vect_create_epilog_for_reduction (stmt_v } gcc_assert (!nested_in_vect_loop || double_reduc); - vectype = STMT_VINFO_VECTYPE (stmt_info); + vectype = STMT_VINFO_REDUC_VECTYPE (reduc_info); gcc_assert (vectype); mode = TYPE_MODE (vectype); @@ -4505,48 +4504,43 @@ vect_create_epilog_for_reduction (stmt_v one vector. */ if (REDUC_GROUP_FIRST_ELEMENT (stmt_info) || direct_slp_reduc) { + gimple_seq stmts = NULL; tree first_vect = PHI_RESULT (new_phis[0]); - gassign *new_vec_stmt = NULL; - vec_dest = vect_create_destination_var (scalar_dest, vectype); + first_vect = gimple_convert (, vectype, first_vect); for (k = 1; k < new_phis.length (); k++) { gimple *next_phi = new_phis[k]; tree second_vect = PHI_RESULT (next_phi); - tree tem = make_ssa_name (vec_dest, new_vec_stmt); - new_vec_stmt = gimple_build_assign (tem, code, - first_vect, second_vect); - gsi_insert_before (_gsi, new_vec_stmt, GSI_SAME_STMT); - first_vect = tem; + second_vect = gimple_convert (, vectype, second_vect); + first_vect = gimple_build (, code, vectype, +first_vect, second_vect); } + gsi_insert_seq_before (_gsi, stmts, GSI_SAME_STMT); new_phi_result = first_vect; - if (new_vec_stmt) -{ - new_phis.truncate (0); - new_phis.safe_push (new_vec_stmt); -} + new_phis.truncate (0); + new_phis.safe_push (SSA_NAME_DEF_STMT (first_vect)); } /* Likewise if we couldn't use a single defuse cycle. */ else if (ncopies > 1) { gcc_assert (new_phis.length () == 1); + gimple_seq stmts = NULL; tree first_vect = PHI_RESULT (new_phis[0]); - gassign *new_vec_stmt = NULL; - vec_dest = vect_create_destination_var (scalar_dest, vectype); + first_vect = gimple_convert (, vectype, first_vect); stmt_vec_info next_phi_info = loop_vinfo->lookup_stmt (new_phis[0]); for (int k = 1; k < ncopies; ++k) { next_phi_info = STMT_VINFO_RELATED_STMT (next_phi_info); tree second_vect = PHI_RESULT (next_phi_info->stmt); - tree tem = make_ssa_name (vec_dest, new_vec_stmt); - new_vec_stmt = gimple_build_assign (tem, code, - first_vect, second_vect); - gsi_insert_before (_gsi, new_vec_stmt, GSI_SAME_STMT); - first_vect = tem; + second_vect = gimple_convert (, vectype, second_vect); + first_vect = gimple_build (, code, vectype, +first_vect, second_vect); } + gsi_insert_seq_before (_gsi, stmts, GSI_SAME_STMT); new_phi_result = first_vect; new_phis.truncate (0); - new_phis.safe_push (new_vec_stmt); + new_phis.safe_push (SSA_NAME_DEF_STMT (first_vect)); } else new_phi_result = PHI_RESULT (new_phis[0]); @@ -4877,13 +4871,14 @@ vect_create_epilog_for_reduction (stmt_v in a vector mode of smaller size and first reduce upper/lower halves against each