[PATCH] Fix PR92324 more

2019-11-15 Thread Richard Biener


There's another case missed, SLP reductions still use the wrong
sign for the epilogue operations.  I also noticed that we generate
somewhat ugly code for the epilogue since we fall back to scalar
extracts but do not consider to reduce a larger vector to a smaller
one first (and neither multiple vectors, but that's for another fix).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2019-11-15  Richard Biener  

PR tree-optimization/92324
* tree-vect-loop.c (vect_create_epilog_for_reduction): Fix
singedness of SLP reduction epilouge operations.  Also reduce
the vector width for SLP reductions before doing elementwise
operations if possible.

* gcc.dg/vect/pr92324-4.c: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 278281)
+++ gcc/tree-vect-loop.c(working copy)
@@ -4930,6 +4941,8 @@ vect_create_epilog_for_reduction (stmt_v
   bool reduce_with_shift;
   tree vec_temp;
 
+  gcc_assert (slp_reduc || new_phis.length () == 1);
+
   /* See if the target wants to do the final (shift) reduction
 in a vector mode of smaller size and first reduce upper/lower
 halves against each other.  */
@@ -4937,6 +4950,21 @@ vect_create_epilog_for_reduction (stmt_v
   tree stype = TREE_TYPE (vectype);
   unsigned nunits = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
   unsigned nunits1 = nunits;
+  if ((mode1 = targetm.vectorize.split_reduction (mode)) != mode
+ && new_phis.length () == 1)
+   {
+ nunits1 = GET_MODE_NUNITS (mode1).to_constant ();
+ /* For SLP reductions we have to make sure lanes match up, but
+since we're doing individual element final reduction reducing
+vector width here is even more important.
+???  We can also separate lanes with permutes, for the common
+case of power-of-two group-size odd/even extracts would work.  */
+ if (slp_reduc && nunits != nunits1)
+   {
+ nunits1 = least_common_multiple (nunits1, group_size);
+ gcc_assert (exact_log2 (nunits1) != -1 && nunits1 <= nunits);
+   }
+   }
   if (!slp_reduc
  && (mode1 = targetm.vectorize.split_reduction (mode)) != mode)
nunits1 = GET_MODE_NUNITS (mode1).to_constant ();
@@ -4958,7 +4986,6 @@ vect_create_epilog_for_reduction (stmt_v
   new_temp = new_phi_result;
   while (nunits > nunits1)
{
- gcc_assert (!slp_reduc);
  nunits /= 2;
  vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
  stype, nunits);
@@ -5113,6 +5140,8 @@ vect_create_epilog_for_reduction (stmt_v
 
  int vec_size_in_bits = tree_to_uhwi (TYPE_SIZE (vectype1));
  int element_bitsize = tree_to_uhwi (bitsize);
+ tree compute_type = TREE_TYPE (vectype);
+ gimple_seq stmts = NULL;
   FOR_EACH_VEC_ELT (new_phis, i, new_phi)
 {
   int bit_offset;
@@ -5120,12 +5149,8 @@ vect_create_epilog_for_reduction (stmt_v
 vec_temp = PHI_RESULT (new_phi);
   else
 vec_temp = gimple_assign_lhs (new_phi);
-  tree rhs = build3 (BIT_FIELD_REF, scalar_type, vec_temp, bitsize,
-bitsize_zero_node);
-  epilog_stmt = gimple_build_assign (new_scalar_dest, rhs);
-  new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
-  gimple_assign_set_lhs (epilog_stmt, new_temp);
-  gsi_insert_before (_gsi, epilog_stmt, GSI_SAME_STMT);
+ new_temp = gimple_build (, BIT_FIELD_REF, compute_type,
+  vec_temp, bitsize, bitsize_zero_node);
 
   /* In SLP we don't need to apply reduction operation, so we just
  collect s' values in SCALAR_RESULTS.  */
@@ -5137,14 +5162,9 @@ vect_create_epilog_for_reduction (stmt_v
bit_offset += element_bitsize)
 {
   tree bitpos = bitsize_int (bit_offset);
-  tree rhs = build3 (BIT_FIELD_REF, scalar_type, vec_temp,
- bitsize, bitpos);
-
-  epilog_stmt = gimple_build_assign (new_scalar_dest, rhs);
-  new_name = make_ssa_name (new_scalar_dest, epilog_stmt);
-  gimple_assign_set_lhs (epilog_stmt, new_name);
-  gsi_insert_before (_gsi, epilog_stmt, GSI_SAME_STMT);
-
+ new_name = gimple_build (, BIT_FIELD_REF,
+  compute_type, vec_temp,
+  bitsize, bitpos);
   if (slp_reduc)
 {
   /* In SLP we don't need to apply reduction 

Re: [PATCH] Fix PR92324

2019-11-12 Thread Christophe Lyon
On Mon, 11 Nov 2019 at 14:36, Richard Biener  wrote:
>
> On Mon, 11 Nov 2019, Christophe Lyon wrote:
>
> > On Fri, 8 Nov 2019 at 09:57, Richard Biener  wrote:
> > >
> > >
> > > I've been sitting on this for a few days since I'm not 100% happy
> > > with how the code looks like.  There's possibly still holes in it
> > > (chains with mixed signed/unsigned adds for example might pick
> > > up signed adds in the epilogue), but the wrong-code cases should
> > > work fine now.  I'm probably going to followup with some
> > > mass renaming of variable/parameter names to make it more clear
> > > which stmt / type we are actually looking at ...
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> > >
> > > Richard.
> > >
> > > 2019-11-08  Richard Biener  
> > >
> > > PR tree-optimization/
> > > * tree-vect-loop.c (vect_create_epilog_for_reduction): Use
> > > STMT_VINFO_REDUC_VECTYPE for all computations, inserting
> > > sign-conversions as necessary.
> > > (vectorizable_reduction): Reject conversions in the chain
> > > that are not sign-conversions, base analysis on a non-converting
> > > stmt and its operation sign.  Set STMT_VINFO_REDUC_VECTYPE.
> > > * tree-vect-stmts.c (vect_stmt_relevant_p): Don't dump anything
> > > for debug stmts.
> > > * tree-vectorizer.h (_stmt_vec_info::reduc_vectype): New.
> > > (STMT_VINFO_REDUC_VECTYPE): Likewise.
> > >
> > > * gcc.dg/vect/pr92205.c: XFAIL.
> > > * gcc.dg/vect/pr92324-1.c: New testcase.
> > > * gcc.dg/vect/pr92324-2.c: Likewise.
> > >
> >
> > Hi Richard,
> >
> > This new testcase (pr92324-2.c) fails on arm/aarch64:
> > FAIL: gcc.dg/vect/pr92324-2.c -flto -ffat-lto-objects execution test
> > FAIL: gcc.dg/vect/pr92324-2.c execution test
>
> Can you open a bugreport?  I suspect the arm/aarch64 vector
> min/max patterns are bogus or we go a different code-path
> in the vectorizer here (IIRC x86 has reduc_[us]max_* patterns
> while aarch64 has none of that).
>

Sure, I've filed PR92473.


> Thanks,
> Richard.
>
> > Christophe
> >
> >
> > > Index: gcc/tree-vect-loop.c
> > > ===
> > > --- gcc/tree-vect-loop.c(revision 277922)
> > > +++ gcc/tree-vect-loop.c(working copy)
> > > @@ -4231,7 +4231,6 @@ vect_create_epilog_for_reduction (stmt_v
> > >gimple *new_phi = NULL, *phi;
> > >stmt_vec_info phi_info;
> > >gimple_stmt_iterator exit_gsi;
> > > -  tree vec_dest;
> > >tree new_temp = NULL_TREE, new_name, new_scalar_dest;
> > >gimple *epilog_stmt = NULL;
> > >gimple *exit_phi;
> > > @@ -4264,7 +4263,7 @@ vect_create_epilog_for_reduction (stmt_v
> > >  }
> > >gcc_assert (!nested_in_vect_loop || double_reduc);
> > >
> > > -  vectype = STMT_VINFO_VECTYPE (stmt_info);
> > > +  vectype = STMT_VINFO_REDUC_VECTYPE (reduc_info);
> > >gcc_assert (vectype);
> > >mode = TYPE_MODE (vectype);
> > >
> > > @@ -4505,48 +4504,43 @@ vect_create_epilog_for_reduction (stmt_v
> > >   one vector.  */
> > >if (REDUC_GROUP_FIRST_ELEMENT (stmt_info) || direct_slp_reduc)
> > >  {
> > > +  gimple_seq stmts = NULL;
> > >tree first_vect = PHI_RESULT (new_phis[0]);
> > > -  gassign *new_vec_stmt = NULL;
> > > -  vec_dest = vect_create_destination_var (scalar_dest, vectype);
> > > +  first_vect = gimple_convert (, vectype, first_vect);
> > >for (k = 1; k < new_phis.length (); k++)
> > >  {
> > >   gimple *next_phi = new_phis[k];
> > >tree second_vect = PHI_RESULT (next_phi);
> > > -  tree tem = make_ssa_name (vec_dest, new_vec_stmt);
> > > -  new_vec_stmt = gimple_build_assign (tem, code,
> > > - first_vect, second_vect);
> > > -  gsi_insert_before (_gsi, new_vec_stmt, GSI_SAME_STMT);
> > > - first_vect = tem;
> > > + second_vect = gimple_convert (, vectype, second_vect);
> > > +  first_vect = gimple_build (, code, vectype,
> > > +first_vect, second_vect);
> > >  }
> > > +  gsi_insert_seq_before (_gsi, stmts, GSI_SAME_STMT);
> > >
> > >new_phi_result = first_vect;
> > > -  if (new_vec_stmt)
> > > -{
> > > -  new_phis.truncate (0);
> > > -  new_phis.safe_push (new_vec_stmt);
> > > -}
> > > +  new_phis.truncate (0);
> > > +  new_phis.safe_push (SSA_NAME_DEF_STMT (first_vect));
> > >  }
> > >/* Likewise if we couldn't use a single defuse cycle.  */
> > >else if (ncopies > 1)
> > >  {
> > >gcc_assert (new_phis.length () == 1);
> > > +  gimple_seq stmts = NULL;
> > >tree first_vect = PHI_RESULT (new_phis[0]);
> > > -  gassign *new_vec_stmt = NULL;
> > > -  vec_dest = vect_create_destination_var (scalar_dest, vectype);
> > > +  first_vect = 

Re: [PATCH] Fix PR92324

2019-11-11 Thread Richard Biener
On Mon, 11 Nov 2019, Christophe Lyon wrote:

> On Fri, 8 Nov 2019 at 09:57, Richard Biener  wrote:
> >
> >
> > I've been sitting on this for a few days since I'm not 100% happy
> > with how the code looks like.  There's possibly still holes in it
> > (chains with mixed signed/unsigned adds for example might pick
> > up signed adds in the epilogue), but the wrong-code cases should
> > work fine now.  I'm probably going to followup with some
> > mass renaming of variable/parameter names to make it more clear
> > which stmt / type we are actually looking at ...
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> >
> > Richard.
> >
> > 2019-11-08  Richard Biener  
> >
> > PR tree-optimization/
> > * tree-vect-loop.c (vect_create_epilog_for_reduction): Use
> > STMT_VINFO_REDUC_VECTYPE for all computations, inserting
> > sign-conversions as necessary.
> > (vectorizable_reduction): Reject conversions in the chain
> > that are not sign-conversions, base analysis on a non-converting
> > stmt and its operation sign.  Set STMT_VINFO_REDUC_VECTYPE.
> > * tree-vect-stmts.c (vect_stmt_relevant_p): Don't dump anything
> > for debug stmts.
> > * tree-vectorizer.h (_stmt_vec_info::reduc_vectype): New.
> > (STMT_VINFO_REDUC_VECTYPE): Likewise.
> >
> > * gcc.dg/vect/pr92205.c: XFAIL.
> > * gcc.dg/vect/pr92324-1.c: New testcase.
> > * gcc.dg/vect/pr92324-2.c: Likewise.
> >
> 
> Hi Richard,
> 
> This new testcase (pr92324-2.c) fails on arm/aarch64:
> FAIL: gcc.dg/vect/pr92324-2.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/pr92324-2.c execution test

Can you open a bugreport?  I suspect the arm/aarch64 vector
min/max patterns are bogus or we go a different code-path
in the vectorizer here (IIRC x86 has reduc_[us]max_* patterns
while aarch64 has none of that).

Thanks,
Richard.

> Christophe
> 
> 
> > Index: gcc/tree-vect-loop.c
> > ===
> > --- gcc/tree-vect-loop.c(revision 277922)
> > +++ gcc/tree-vect-loop.c(working copy)
> > @@ -4231,7 +4231,6 @@ vect_create_epilog_for_reduction (stmt_v
> >gimple *new_phi = NULL, *phi;
> >stmt_vec_info phi_info;
> >gimple_stmt_iterator exit_gsi;
> > -  tree vec_dest;
> >tree new_temp = NULL_TREE, new_name, new_scalar_dest;
> >gimple *epilog_stmt = NULL;
> >gimple *exit_phi;
> > @@ -4264,7 +4263,7 @@ vect_create_epilog_for_reduction (stmt_v
> >  }
> >gcc_assert (!nested_in_vect_loop || double_reduc);
> >
> > -  vectype = STMT_VINFO_VECTYPE (stmt_info);
> > +  vectype = STMT_VINFO_REDUC_VECTYPE (reduc_info);
> >gcc_assert (vectype);
> >mode = TYPE_MODE (vectype);
> >
> > @@ -4505,48 +4504,43 @@ vect_create_epilog_for_reduction (stmt_v
> >   one vector.  */
> >if (REDUC_GROUP_FIRST_ELEMENT (stmt_info) || direct_slp_reduc)
> >  {
> > +  gimple_seq stmts = NULL;
> >tree first_vect = PHI_RESULT (new_phis[0]);
> > -  gassign *new_vec_stmt = NULL;
> > -  vec_dest = vect_create_destination_var (scalar_dest, vectype);
> > +  first_vect = gimple_convert (, vectype, first_vect);
> >for (k = 1; k < new_phis.length (); k++)
> >  {
> >   gimple *next_phi = new_phis[k];
> >tree second_vect = PHI_RESULT (next_phi);
> > -  tree tem = make_ssa_name (vec_dest, new_vec_stmt);
> > -  new_vec_stmt = gimple_build_assign (tem, code,
> > - first_vect, second_vect);
> > -  gsi_insert_before (_gsi, new_vec_stmt, GSI_SAME_STMT);
> > - first_vect = tem;
> > + second_vect = gimple_convert (, vectype, second_vect);
> > +  first_vect = gimple_build (, code, vectype,
> > +first_vect, second_vect);
> >  }
> > +  gsi_insert_seq_before (_gsi, stmts, GSI_SAME_STMT);
> >
> >new_phi_result = first_vect;
> > -  if (new_vec_stmt)
> > -{
> > -  new_phis.truncate (0);
> > -  new_phis.safe_push (new_vec_stmt);
> > -}
> > +  new_phis.truncate (0);
> > +  new_phis.safe_push (SSA_NAME_DEF_STMT (first_vect));
> >  }
> >/* Likewise if we couldn't use a single defuse cycle.  */
> >else if (ncopies > 1)
> >  {
> >gcc_assert (new_phis.length () == 1);
> > +  gimple_seq stmts = NULL;
> >tree first_vect = PHI_RESULT (new_phis[0]);
> > -  gassign *new_vec_stmt = NULL;
> > -  vec_dest = vect_create_destination_var (scalar_dest, vectype);
> > +  first_vect = gimple_convert (, vectype, first_vect);
> >stmt_vec_info next_phi_info = loop_vinfo->lookup_stmt (new_phis[0]);
> >for (int k = 1; k < ncopies; ++k)
> > {
> >   next_phi_info = STMT_VINFO_RELATED_STMT (next_phi_info);
> >   tree second_vect = PHI_RESULT (next_phi_info->stmt);
> 

Re: [PATCH] Fix PR92324

2019-11-11 Thread Christophe Lyon
On Fri, 8 Nov 2019 at 09:57, Richard Biener  wrote:
>
>
> I've been sitting on this for a few days since I'm not 100% happy
> with how the code looks like.  There's possibly still holes in it
> (chains with mixed signed/unsigned adds for example might pick
> up signed adds in the epilogue), but the wrong-code cases should
> work fine now.  I'm probably going to followup with some
> mass renaming of variable/parameter names to make it more clear
> which stmt / type we are actually looking at ...
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>
> Richard.
>
> 2019-11-08  Richard Biener  
>
> PR tree-optimization/
> * tree-vect-loop.c (vect_create_epilog_for_reduction): Use
> STMT_VINFO_REDUC_VECTYPE for all computations, inserting
> sign-conversions as necessary.
> (vectorizable_reduction): Reject conversions in the chain
> that are not sign-conversions, base analysis on a non-converting
> stmt and its operation sign.  Set STMT_VINFO_REDUC_VECTYPE.
> * tree-vect-stmts.c (vect_stmt_relevant_p): Don't dump anything
> for debug stmts.
> * tree-vectorizer.h (_stmt_vec_info::reduc_vectype): New.
> (STMT_VINFO_REDUC_VECTYPE): Likewise.
>
> * gcc.dg/vect/pr92205.c: XFAIL.
> * gcc.dg/vect/pr92324-1.c: New testcase.
> * gcc.dg/vect/pr92324-2.c: Likewise.
>

Hi Richard,

This new testcase (pr92324-2.c) fails on arm/aarch64:
FAIL: gcc.dg/vect/pr92324-2.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr92324-2.c execution test

Christophe


> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c(revision 277922)
> +++ gcc/tree-vect-loop.c(working copy)
> @@ -4231,7 +4231,6 @@ vect_create_epilog_for_reduction (stmt_v
>gimple *new_phi = NULL, *phi;
>stmt_vec_info phi_info;
>gimple_stmt_iterator exit_gsi;
> -  tree vec_dest;
>tree new_temp = NULL_TREE, new_name, new_scalar_dest;
>gimple *epilog_stmt = NULL;
>gimple *exit_phi;
> @@ -4264,7 +4263,7 @@ vect_create_epilog_for_reduction (stmt_v
>  }
>gcc_assert (!nested_in_vect_loop || double_reduc);
>
> -  vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  vectype = STMT_VINFO_REDUC_VECTYPE (reduc_info);
>gcc_assert (vectype);
>mode = TYPE_MODE (vectype);
>
> @@ -4505,48 +4504,43 @@ vect_create_epilog_for_reduction (stmt_v
>   one vector.  */
>if (REDUC_GROUP_FIRST_ELEMENT (stmt_info) || direct_slp_reduc)
>  {
> +  gimple_seq stmts = NULL;
>tree first_vect = PHI_RESULT (new_phis[0]);
> -  gassign *new_vec_stmt = NULL;
> -  vec_dest = vect_create_destination_var (scalar_dest, vectype);
> +  first_vect = gimple_convert (, vectype, first_vect);
>for (k = 1; k < new_phis.length (); k++)
>  {
>   gimple *next_phi = new_phis[k];
>tree second_vect = PHI_RESULT (next_phi);
> -  tree tem = make_ssa_name (vec_dest, new_vec_stmt);
> -  new_vec_stmt = gimple_build_assign (tem, code,
> - first_vect, second_vect);
> -  gsi_insert_before (_gsi, new_vec_stmt, GSI_SAME_STMT);
> - first_vect = tem;
> + second_vect = gimple_convert (, vectype, second_vect);
> +  first_vect = gimple_build (, code, vectype,
> +first_vect, second_vect);
>  }
> +  gsi_insert_seq_before (_gsi, stmts, GSI_SAME_STMT);
>
>new_phi_result = first_vect;
> -  if (new_vec_stmt)
> -{
> -  new_phis.truncate (0);
> -  new_phis.safe_push (new_vec_stmt);
> -}
> +  new_phis.truncate (0);
> +  new_phis.safe_push (SSA_NAME_DEF_STMT (first_vect));
>  }
>/* Likewise if we couldn't use a single defuse cycle.  */
>else if (ncopies > 1)
>  {
>gcc_assert (new_phis.length () == 1);
> +  gimple_seq stmts = NULL;
>tree first_vect = PHI_RESULT (new_phis[0]);
> -  gassign *new_vec_stmt = NULL;
> -  vec_dest = vect_create_destination_var (scalar_dest, vectype);
> +  first_vect = gimple_convert (, vectype, first_vect);
>stmt_vec_info next_phi_info = loop_vinfo->lookup_stmt (new_phis[0]);
>for (int k = 1; k < ncopies; ++k)
> {
>   next_phi_info = STMT_VINFO_RELATED_STMT (next_phi_info);
>   tree second_vect = PHI_RESULT (next_phi_info->stmt);
> -  tree tem = make_ssa_name (vec_dest, new_vec_stmt);
> -  new_vec_stmt = gimple_build_assign (tem, code,
> - first_vect, second_vect);
> -  gsi_insert_before (_gsi, new_vec_stmt, GSI_SAME_STMT);
> - first_vect = tem;
> + second_vect = gimple_convert (, vectype, second_vect);
> + first_vect = gimple_build (, code, vectype,
> +first_vect, second_vect);
> }
> 

Re: [PATCH] Fix PR92324

2019-11-08 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, 8 Nov 2019, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > I've been sitting on this for a few days since I'm not 100% happy
>> > with how the code looks like.  There's possibly still holes in it
>> > (chains with mixed signed/unsigned adds for example might pick
>> > up signed adds in the epilogue), but the wrong-code cases should
>> > work fine now.  I'm probably going to followup with some
>> > mass renaming of variable/parameter names to make it more clear
>> > which stmt / type we are actually looking at ...
>> >
>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>> 
>> Does this look like the right way of updating neutral_op_for_slp_reduction?
>> It now needs to know whether the caller is using STMT_VINFO_REDUC_VECTYPE
>> (for an epilogue value) or STMT_VINFO_REDUC_VECTYPE_IN (for a PHI argument).
>> 
>> Fixes various gcc.target/aarch64/sve/slp_* tests, will give it a
>> full test on aarch64-linux-gnu.
>
> Yeah, it looks sensible.  In vect_create_epilog_for_reduction
> please move the call down to the only use in the
>
>   else if (direct_slp_reduc)
> {
>
> block.

Thanks, here's what I installed after testing on aarch64-linux-gnu
and x86_64-linux-gnu.

Richard

2019-11-08  Richard Sandiford  

gcc/
* tree-vect-loop.c (neutral_op_for_slp_reduction): Take the
vector type as an argument rather than reading it from the
stmt_vec_info.
(vect_create_epilog_for_reduction): Update accordingly.
(vectorizable_reduction): Likewise.
(vect_transform_cycle_phi): Likewise.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2019-11-08 11:58:22.331095690 +
+++ gcc/tree-vect-loop.c2019-11-08 16:06:11.389032120 +
@@ -2590,17 +2590,17 @@ reduction_fn_for_scalar_code (enum tree_
 
 /* If there is a neutral value X such that SLP reduction NODE would not
be affected by the introduction of additional X elements, return that X,
-   otherwise return null.  CODE is the code of the reduction.  REDUC_CHAIN
-   is true if the SLP statements perform a single reduction, false if each
-   statement performs an independent reduction.  */
+   otherwise return null.  CODE is the code of the reduction and VECTOR_TYPE
+   is the vector type that would hold element X.  REDUC_CHAIN is true if
+   the SLP statements perform a single reduction, false if each statement
+   performs an independent reduction.  */
 
 static tree
-neutral_op_for_slp_reduction (slp_tree slp_node, tree_code code,
- bool reduc_chain)
+neutral_op_for_slp_reduction (slp_tree slp_node, tree vector_type,
+ tree_code code, bool reduc_chain)
 {
   vec stmts = SLP_TREE_SCALAR_STMTS (slp_node);
   stmt_vec_info stmt_vinfo = stmts[0];
-  tree vector_type = STMT_VINFO_VECTYPE (stmt_vinfo);
   tree scalar_type = TREE_TYPE (vector_type);
   class loop *loop = gimple_bb (stmt_vinfo->stmt)->loop_father;
   gcc_assert (loop);
@@ -4220,11 +4220,6 @@ vect_create_epilog_for_reduction (stmt_v
 = as_a  (STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info))->stmt);
   enum tree_code code = STMT_VINFO_REDUC_CODE (reduc_info);
   internal_fn reduc_fn = STMT_VINFO_REDUC_FN (reduc_info);
-  tree neutral_op = NULL_TREE;
-  if (slp_node)
-neutral_op
-  = neutral_op_for_slp_reduction (slp_node_instance->reduc_phis, code,
- REDUC_GROUP_FIRST_ELEMENT (stmt_info));
   stmt_vec_info prev_phi_info;
   tree vectype;
   machine_mode mode;
@@ -4822,6 +4817,14 @@ vect_create_epilog_for_reduction (stmt_v
 scalar value if we have one, otherwise the initial scalar value
 is itself a neutral value.  */
   tree vector_identity = NULL_TREE;
+  tree neutral_op = NULL_TREE;
+  if (slp_node)
+   {
+ stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (stmt_info);
+ neutral_op
+   = neutral_op_for_slp_reduction (slp_node_instance->reduc_phis,
+   vectype, code, first != NULL);
+   }
   if (neutral_op)
vector_identity = gimple_build_vector_from_val (, vectype,
neutral_op);
@@ -6214,7 +6217,7 @@ vectorizable_reduction (stmt_vec_info st
   tree neutral_op = NULL_TREE;
   if (slp_node)
 neutral_op = neutral_op_for_slp_reduction
-  (slp_node_instance->reduc_phis, orig_code,
+  (slp_node_instance->reduc_phis, vectype_out, orig_code,
REDUC_GROUP_FIRST_ELEMENT (stmt_info) != NULL);
 
   if (double_reduc && reduction_type == FOLD_LEFT_REDUCTION)
@@ -6797,7 +6800,7 @@ vect_transform_cycle_phi (stmt_vec_info
   gcc_assert (slp_node == slp_node_instance->reduc_phis);
   stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (reduc_stmt_info);
   tree neutral_op
-   = neutral_op_for_slp_reduction (slp_node,
+   = 

Re: [PATCH] Fix PR92324

2019-11-08 Thread Richard Biener
On Fri, 8 Nov 2019, Richard Sandiford wrote:

> Richard Biener  writes:
> > I've been sitting on this for a few days since I'm not 100% happy
> > with how the code looks like.  There's possibly still holes in it
> > (chains with mixed signed/unsigned adds for example might pick
> > up signed adds in the epilogue), but the wrong-code cases should
> > work fine now.  I'm probably going to followup with some
> > mass renaming of variable/parameter names to make it more clear
> > which stmt / type we are actually looking at ...
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> 
> Does this look like the right way of updating neutral_op_for_slp_reduction?
> It now needs to know whether the caller is using STMT_VINFO_REDUC_VECTYPE
> (for an epilogue value) or STMT_VINFO_REDUC_VECTYPE_IN (for a PHI argument).
> 
> Fixes various gcc.target/aarch64/sve/slp_* tests, will give it a
> full test on aarch64-linux-gnu.

Yeah, it looks sensible.  In vect_create_epilog_for_reduction
please move the call down to the only use in the

  else if (direct_slp_reduc)
{

block.

Thanks,
Richard.

> Thanks,
> Richard
> 
> 
> 2019-11-08  Richard Sandiford  
> 
> gcc/
>   * tree-vect-loop.c (neutral_op_for_slp_reduction): Take the
>   vector type as an argument rather than reading it from the
>   stmt_vec_info.
>   (vect_create_epilog_for_reduction): Update accordingly,
>   passing the STMT_VINFO_REDUC_VECTYPE.
>   (vectorizable_reduction): Likewise.
>   (vect_transform_cycle_phi): Likewise, but passing the
>   STMT_VINFO_REDUC_VECTYPE_IN.
> 
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c  2019-11-08 09:06:29.654896085 +
> +++ gcc/tree-vect-loop.c  2019-11-08 10:41:54.498861004 +
> @@ -2586,17 +2586,17 @@ reduction_fn_for_scalar_code (enum tree_
>  
>  /* If there is a neutral value X such that SLP reduction NODE would not
> be affected by the introduction of additional X elements, return that X,
> -   otherwise return null.  CODE is the code of the reduction.  REDUC_CHAIN
> -   is true if the SLP statements perform a single reduction, false if each
> -   statement performs an independent reduction.  */
> +   otherwise return null.  CODE is the code of the reduction and VECTOR_TYPE
> +   is the vector type that would hold element X.  REDUC_CHAIN is true if
> +   the SLP statements perform a single reduction, false if each statement
> +   performs an independent reduction.  */
>  
>  static tree
> -neutral_op_for_slp_reduction (slp_tree slp_node, tree_code code,
> -   bool reduc_chain)
> +neutral_op_for_slp_reduction (slp_tree slp_node, tree vector_type,
> +   tree_code code, bool reduc_chain)
>  {
>vec stmts = SLP_TREE_SCALAR_STMTS (slp_node);
>stmt_vec_info stmt_vinfo = stmts[0];
> -  tree vector_type = STMT_VINFO_VECTYPE (stmt_vinfo);
>tree scalar_type = TREE_TYPE (vector_type);
>class loop *loop = gimple_bb (stmt_vinfo->stmt)->loop_father;
>gcc_assert (loop);
> @@ -4216,11 +4216,6 @@ vect_create_epilog_for_reduction (stmt_v
>  = as_a  (STMT_VINFO_REDUC_DEF (vect_orig_stmt 
> (stmt_info))->stmt);
>enum tree_code code = STMT_VINFO_REDUC_CODE (reduc_info);
>internal_fn reduc_fn = STMT_VINFO_REDUC_FN (reduc_info);
> -  tree neutral_op = NULL_TREE;
> -  if (slp_node)
> -neutral_op
> -  = neutral_op_for_slp_reduction (slp_node_instance->reduc_phis, code,
> -   REDUC_GROUP_FIRST_ELEMENT (stmt_info));
>stmt_vec_info prev_phi_info;
>tree vectype;
>machine_mode mode;
> @@ -4267,11 +4262,15 @@ vect_create_epilog_for_reduction (stmt_v
>gcc_assert (vectype);
>mode = TYPE_MODE (vectype);
>  
> +  tree neutral_op = NULL_TREE;
>tree initial_def = NULL;
>tree induc_val = NULL_TREE;
>tree adjustment_def = NULL;
>if (slp_node)
> -;
> +neutral_op
> +  = neutral_op_for_slp_reduction (slp_node_instance->reduc_phis,
> +   vectype, code,
> +   REDUC_GROUP_FIRST_ELEMENT (stmt_info));
>else
>  {
>/* Get at the scalar def before the loop, that defines the initial 
> value
> @@ -6210,7 +6209,7 @@ vectorizable_reduction (stmt_vec_info st
>tree neutral_op = NULL_TREE;
>if (slp_node)
>  neutral_op = neutral_op_for_slp_reduction
> -  (slp_node_instance->reduc_phis, orig_code,
> +  (slp_node_instance->reduc_phis, vectype_out, orig_code,
> REDUC_GROUP_FIRST_ELEMENT (stmt_info) != NULL);
>  
>if (double_reduc && reduction_type == FOLD_LEFT_REDUCTION)
> @@ -6793,7 +6792,7 @@ vect_transform_cycle_phi (stmt_vec_info
>gcc_assert (slp_node == slp_node_instance->reduc_phis);
>stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (reduc_stmt_info);
>tree neutral_op
> - = neutral_op_for_slp_reduction (slp_node,

Re: [PATCH] Fix PR92324

2019-11-08 Thread Richard Sandiford
Richard Biener  writes:
> I've been sitting on this for a few days since I'm not 100% happy
> with how the code looks like.  There's possibly still holes in it
> (chains with mixed signed/unsigned adds for example might pick
> up signed adds in the epilogue), but the wrong-code cases should
> work fine now.  I'm probably going to followup with some
> mass renaming of variable/parameter names to make it more clear
> which stmt / type we are actually looking at ...
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Does this look like the right way of updating neutral_op_for_slp_reduction?
It now needs to know whether the caller is using STMT_VINFO_REDUC_VECTYPE
(for an epilogue value) or STMT_VINFO_REDUC_VECTYPE_IN (for a PHI argument).

Fixes various gcc.target/aarch64/sve/slp_* tests, will give it a
full test on aarch64-linux-gnu.

Thanks,
Richard


2019-11-08  Richard Sandiford  

gcc/
* tree-vect-loop.c (neutral_op_for_slp_reduction): Take the
vector type as an argument rather than reading it from the
stmt_vec_info.
(vect_create_epilog_for_reduction): Update accordingly,
passing the STMT_VINFO_REDUC_VECTYPE.
(vectorizable_reduction): Likewise.
(vect_transform_cycle_phi): Likewise, but passing the
STMT_VINFO_REDUC_VECTYPE_IN.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2019-11-08 09:06:29.654896085 +
+++ gcc/tree-vect-loop.c2019-11-08 10:41:54.498861004 +
@@ -2586,17 +2586,17 @@ reduction_fn_for_scalar_code (enum tree_
 
 /* If there is a neutral value X such that SLP reduction NODE would not
be affected by the introduction of additional X elements, return that X,
-   otherwise return null.  CODE is the code of the reduction.  REDUC_CHAIN
-   is true if the SLP statements perform a single reduction, false if each
-   statement performs an independent reduction.  */
+   otherwise return null.  CODE is the code of the reduction and VECTOR_TYPE
+   is the vector type that would hold element X.  REDUC_CHAIN is true if
+   the SLP statements perform a single reduction, false if each statement
+   performs an independent reduction.  */
 
 static tree
-neutral_op_for_slp_reduction (slp_tree slp_node, tree_code code,
- bool reduc_chain)
+neutral_op_for_slp_reduction (slp_tree slp_node, tree vector_type,
+ tree_code code, bool reduc_chain)
 {
   vec stmts = SLP_TREE_SCALAR_STMTS (slp_node);
   stmt_vec_info stmt_vinfo = stmts[0];
-  tree vector_type = STMT_VINFO_VECTYPE (stmt_vinfo);
   tree scalar_type = TREE_TYPE (vector_type);
   class loop *loop = gimple_bb (stmt_vinfo->stmt)->loop_father;
   gcc_assert (loop);
@@ -4216,11 +4216,6 @@ vect_create_epilog_for_reduction (stmt_v
 = as_a  (STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info))->stmt);
   enum tree_code code = STMT_VINFO_REDUC_CODE (reduc_info);
   internal_fn reduc_fn = STMT_VINFO_REDUC_FN (reduc_info);
-  tree neutral_op = NULL_TREE;
-  if (slp_node)
-neutral_op
-  = neutral_op_for_slp_reduction (slp_node_instance->reduc_phis, code,
- REDUC_GROUP_FIRST_ELEMENT (stmt_info));
   stmt_vec_info prev_phi_info;
   tree vectype;
   machine_mode mode;
@@ -4267,11 +4262,15 @@ vect_create_epilog_for_reduction (stmt_v
   gcc_assert (vectype);
   mode = TYPE_MODE (vectype);
 
+  tree neutral_op = NULL_TREE;
   tree initial_def = NULL;
   tree induc_val = NULL_TREE;
   tree adjustment_def = NULL;
   if (slp_node)
-;
+neutral_op
+  = neutral_op_for_slp_reduction (slp_node_instance->reduc_phis,
+ vectype, code,
+ REDUC_GROUP_FIRST_ELEMENT (stmt_info));
   else
 {
   /* Get at the scalar def before the loop, that defines the initial value
@@ -6210,7 +6209,7 @@ vectorizable_reduction (stmt_vec_info st
   tree neutral_op = NULL_TREE;
   if (slp_node)
 neutral_op = neutral_op_for_slp_reduction
-  (slp_node_instance->reduc_phis, orig_code,
+  (slp_node_instance->reduc_phis, vectype_out, orig_code,
REDUC_GROUP_FIRST_ELEMENT (stmt_info) != NULL);
 
   if (double_reduc && reduction_type == FOLD_LEFT_REDUCTION)
@@ -6793,7 +6792,7 @@ vect_transform_cycle_phi (stmt_vec_info
   gcc_assert (slp_node == slp_node_instance->reduc_phis);
   stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (reduc_stmt_info);
   tree neutral_op
-   = neutral_op_for_slp_reduction (slp_node,
+   = neutral_op_for_slp_reduction (slp_node, vectype_in,
STMT_VINFO_REDUC_CODE (reduc_info),
first != NULL);
   get_initial_defs_for_reduction (slp_node_instance->reduc_phis,


[PATCH] Fix PR92324

2019-11-08 Thread Richard Biener


I've been sitting on this for a few days since I'm not 100% happy
with how the code looks like.  There's possibly still holes in it
(chains with mixed signed/unsigned adds for example might pick
up signed adds in the epilogue), but the wrong-code cases should
work fine now.  I'm probably going to followup with some
mass renaming of variable/parameter names to make it more clear
which stmt / type we are actually looking at ...

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2019-11-08  Richard Biener  

PR tree-optimization/
* tree-vect-loop.c (vect_create_epilog_for_reduction): Use
STMT_VINFO_REDUC_VECTYPE for all computations, inserting
sign-conversions as necessary.
(vectorizable_reduction): Reject conversions in the chain
that are not sign-conversions, base analysis on a non-converting
stmt and its operation sign.  Set STMT_VINFO_REDUC_VECTYPE.
* tree-vect-stmts.c (vect_stmt_relevant_p): Don't dump anything
for debug stmts.
* tree-vectorizer.h (_stmt_vec_info::reduc_vectype): New.
(STMT_VINFO_REDUC_VECTYPE): Likewise.

* gcc.dg/vect/pr92205.c: XFAIL.
* gcc.dg/vect/pr92324-1.c: New testcase.
* gcc.dg/vect/pr92324-2.c: Likewise.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 277922)
+++ gcc/tree-vect-loop.c(working copy)
@@ -4231,7 +4231,6 @@ vect_create_epilog_for_reduction (stmt_v
   gimple *new_phi = NULL, *phi;
   stmt_vec_info phi_info;
   gimple_stmt_iterator exit_gsi;
-  tree vec_dest;
   tree new_temp = NULL_TREE, new_name, new_scalar_dest;
   gimple *epilog_stmt = NULL;
   gimple *exit_phi;
@@ -4264,7 +4263,7 @@ vect_create_epilog_for_reduction (stmt_v
 }
   gcc_assert (!nested_in_vect_loop || double_reduc);
 
-  vectype = STMT_VINFO_VECTYPE (stmt_info);
+  vectype = STMT_VINFO_REDUC_VECTYPE (reduc_info);
   gcc_assert (vectype);
   mode = TYPE_MODE (vectype);
 
@@ -4505,48 +4504,43 @@ vect_create_epilog_for_reduction (stmt_v
  one vector.  */
   if (REDUC_GROUP_FIRST_ELEMENT (stmt_info) || direct_slp_reduc)
 {
+  gimple_seq stmts = NULL;
   tree first_vect = PHI_RESULT (new_phis[0]);
-  gassign *new_vec_stmt = NULL;
-  vec_dest = vect_create_destination_var (scalar_dest, vectype);
+  first_vect = gimple_convert (, vectype, first_vect);
   for (k = 1; k < new_phis.length (); k++)
 {
  gimple *next_phi = new_phis[k];
   tree second_vect = PHI_RESULT (next_phi);
-  tree tem = make_ssa_name (vec_dest, new_vec_stmt);
-  new_vec_stmt = gimple_build_assign (tem, code,
- first_vect, second_vect);
-  gsi_insert_before (_gsi, new_vec_stmt, GSI_SAME_STMT);
- first_vect = tem;
+ second_vect = gimple_convert (, vectype, second_vect);
+  first_vect = gimple_build (, code, vectype,
+first_vect, second_vect);
 }
+  gsi_insert_seq_before (_gsi, stmts, GSI_SAME_STMT);
 
   new_phi_result = first_vect;
-  if (new_vec_stmt)
-{
-  new_phis.truncate (0);
-  new_phis.safe_push (new_vec_stmt);
-}
+  new_phis.truncate (0);
+  new_phis.safe_push (SSA_NAME_DEF_STMT (first_vect));
 }
   /* Likewise if we couldn't use a single defuse cycle.  */
   else if (ncopies > 1)
 {
   gcc_assert (new_phis.length () == 1);
+  gimple_seq stmts = NULL;
   tree first_vect = PHI_RESULT (new_phis[0]);
-  gassign *new_vec_stmt = NULL;
-  vec_dest = vect_create_destination_var (scalar_dest, vectype);
+  first_vect = gimple_convert (, vectype, first_vect);
   stmt_vec_info next_phi_info = loop_vinfo->lookup_stmt (new_phis[0]);
   for (int k = 1; k < ncopies; ++k)
{
  next_phi_info = STMT_VINFO_RELATED_STMT (next_phi_info);
  tree second_vect = PHI_RESULT (next_phi_info->stmt);
-  tree tem = make_ssa_name (vec_dest, new_vec_stmt);
-  new_vec_stmt = gimple_build_assign (tem, code,
- first_vect, second_vect);
-  gsi_insert_before (_gsi, new_vec_stmt, GSI_SAME_STMT);
- first_vect = tem;
+ second_vect = gimple_convert (, vectype, second_vect);
+ first_vect = gimple_build (, code, vectype,
+first_vect, second_vect);
}
+  gsi_insert_seq_before (_gsi, stmts, GSI_SAME_STMT);
   new_phi_result = first_vect;
   new_phis.truncate (0);
-  new_phis.safe_push (new_vec_stmt);
+  new_phis.safe_push (SSA_NAME_DEF_STMT (first_vect));
 }
   else
 new_phi_result = PHI_RESULT (new_phis[0]);
@@ -4877,13 +4871,14 @@ vect_create_epilog_for_reduction (stmt_v
 in a vector mode of smaller size and first reduce upper/lower
 halves against each