Re: [PATCH] tree-optimization/95839 - teach SLP vectorization about vector inputs

2020-07-01 Thread Richard Biener
On Fri, 26 Jun 2020, Richard Biener wrote:

> (sorry for the duplicate, forgot to copy the list)
> 
> This teaches SLP analysis about vector-typed externals that are
> fed into the SLP operations via lane-extracting BIT_FIELD_REFs.
> It shows that there's currently no good representation for
> vector code on the SLP side, so I went halfway and always represent
> uses of such vector externals with an SLP permutation node
> that has a single external SLP child, which has the non-standard
> representation of no scalar defs but only a vector def.  That
> works best for shielding the rest of the vectorizer from it.
> 
> I'm not sure it's actually worth the trouble and what real-world
> cases benefit from this.  In theory vectorized unrolled code
> interfacing with scalar code might be one case but there
> we necessarily go through memory and there's no intermediate
> pass transforming that to registers [to make BB vectorization
> cheaper].
>
> It's also not even close to ready for re-vectorizing vectorized
> code with a larger VF.
>
> Any opinions?

I have now installed this.

Richard.

> Bootstrapped / tested on x86_64-unknown-linux-gnu.
>
> Thanks,
> Richard.
> 
> 2020-06-26  Richard Biener  
> 
>   PR tree-optimization/95839
>   * tree-vect-slp.c (vect_slp_tree_uniform_p): Pre-existing
>   vectors are not uniform.
>   (vect_build_slp_tree_1): Handle BIT_FIELD_REFs of
>   vector registers.
>   (vect_build_slp_tree_2): For groups of lane extracts
>   from a vector register generate a permute node
>   with a special child representing the pre-existing vector.
>   (vect_prologue_cost_for_slp): Pre-existing vectors cost nothing.
>   (vect_slp_analyze_node_operations): Use SLP_TREE_LANES.
>   (vectorizable_slp_permutation): Do not generate or cost identity
>   permutes.
>   (vect_schedule_slp_instance): Handle pre-existing vectors
>   that are function arguments.
> 
>   * gcc.dg/vect/bb-slp-pr95839-2.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-2.c |  20 
>  gcc/tree-vect-slp.c  | 119 ---
>  2 files changed, 124 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-2.c
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-2.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-2.c
> new file mode 100644
> index 000..49e75d8c95c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-2.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-additional-options "-w -Wno-psabi" } */
> +
> +typedef double __attribute__((vector_size(16))) v2df;
> +
> +v2df f(v2df a, v2df b)
> +{
> +  return (v2df){a[0] + b[0], a[1] + b[1]};
> +}
> +
> +v2df g(v2df a, v2df b)
> +{
> +  return (v2df){a[0] + b[1], a[1] + b[0]};
> +}
> +
> +/* Verify we manage to vectorize this with using the original vectors
> +   and do not end up with any vector CTORs.  */
> +/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp2" } } */
> +/* { dg-final { scan-tree-dump-not "vect_cst" "slp2" } } */
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index b223956e3af..83ec382ee0d 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -247,6 +247,10 @@ vect_slp_tree_uniform_p (slp_tree node)
>gcc_assert (SLP_TREE_DEF_TYPE (node) == vect_constant_def
> || SLP_TREE_DEF_TYPE (node) == vect_external_def);
>  
> +  /* Pre-existing vectors.  */
> +  if (SLP_TREE_SCALAR_OPS (node).is_empty ())
> +return false;
> +
>unsigned i;
>tree op, first = NULL_TREE;
>FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (node), i, op)
> @@ -838,7 +842,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
>else
>   {
> rhs_code = gimple_assign_rhs_code (stmt);
> -   load_p = TREE_CODE_CLASS (rhs_code) == tcc_reference;
> +   load_p = gimple_vuse (stmt);
>   }
>  
>/* Check the operation.  */
> @@ -899,6 +903,22 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
>need_same_oprnds = true;
>first_op1 = gimple_assign_rhs2 (stmt);
>  }
> +   else if (!load_p
> +&& rhs_code == BIT_FIELD_REF)
> + {
> +   tree vec = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0);
> +   if (TREE_CODE (vec) != SSA_NAME
> +   || !types_compatible_p (vectype, TREE_TYPE (vec)))
> + {
> +   if (dump_enabled_p ())
> + dump_printf_loc 

[PATCH] tree-optimization/95839 - teach SLP vectorization about vector inputs

2020-06-26 Thread Richard Biener
(sorry for the duplicate, forgot to copy the list)

This teaches SLP analysis about vector-typed externals that are
fed into the SLP operations via lane-extracting BIT_FIELD_REFs.
It shows that there's currently no good representation for
vector code on the SLP side, so I went halfway and always represent
uses of such vector externals with an SLP permutation node
that has a single external SLP child, which has the non-standard
representation of no scalar defs but only a vector def.  That
works best for shielding the rest of the vectorizer from it.
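
For illustration, the swapped-lane shape can be written by hand in both
forms.  This is only a source-level sketch, assuming the GNU C generic
vector extensions; the function names are made up here, and the second
function merely models the "permute of a pre-existing vector feeding one
vector add" shape, not what the vectorizer literally emits.

```c
typedef double __attribute__((vector_size(16))) v2df;

/* Scalar form: every lane is extracted from the vector arguments
   (a lane-extracting BIT_FIELD_REF in GIMPLE) and the result is
   rebuilt lane by lane.  */
v2df add_lanes_swapped (v2df a, v2df b)
{
  return (v2df){a[0] + b[1], a[1] + b[0]};
}

/* Hand-written vector form: lanes of b selected in the order {1, 0},
   followed by a single whole-vector add on the original vector a.  */
v2df add_vector_swapped (v2df a, v2df b)
{
  v2df b_swapped = {b[1], b[0]};
  return a + b_swapped;
}
```

Both functions compute the same value for any inputs.  When the lanes are
used in their original order the selection is the identity and the
operation reduces to a plain a + b.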

I'm not sure it's actually worth the trouble and what real-world
cases benefit from this.  In theory vectorized unrolled code
interfacing with scalar code might be one case but there
we necessarily go through memory and there's no intermediate
pass transforming that to registers [to make BB vectorization
cheaper].   

It's also not even close to ready for re-vectorizing vectorized 
code with a larger VF.

Any opinions?   

Bootstrapped / tested on x86_64-unknown-linux-gnu.

Thanks, 
Richard. 

2020-06-26  Richard Biener  

PR tree-optimization/95839
* tree-vect-slp.c (vect_slp_tree_uniform_p): Pre-existing
vectors are not uniform.
(vect_build_slp_tree_1): Handle BIT_FIELD_REFs of
vector registers.
(vect_build_slp_tree_2): For groups of lane extracts
from a vector register generate a permute node
with a special child representing the pre-existing vector.
(vect_prologue_cost_for_slp): Pre-existing vectors cost nothing.
(vect_slp_analyze_node_operations): Use SLP_TREE_LANES.
(vectorizable_slp_permutation): Do not generate or cost identity
permutes.
(vect_schedule_slp_instance): Handle pre-existing vectors
that are function arguments.

* gcc.dg/vect/bb-slp-pr95839-2.c: New testcase.
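
As a rough model of the "identity permutes" entry above: a permutation
that picks lane i of its single input for output lane i changes nothing,
so no statement needs to be generated or costed for it.  The helper below
is a hypothetical stand-alone sketch in C, assuming the permutation is
given as a plain array of source lane indices; it is not the code in
tree-vect-slp.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if PERM selects lane i for output
   lane i for all NLANES lanes, i.e. the permute is a no-op.  */
bool is_identity_permute (const unsigned *perm, size_t nlanes)
{
  for (size_t i = 0; i < nlanes; ++i)
    if (perm[i] != i)
      return false;
  return true;
}
```

In this model the lane selection {0, 1} is an identity and can be
skipped, while {1, 0} is a real permute.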
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-2.c |  20 
 gcc/tree-vect-slp.c  | 119 ---
 2 files changed, 124 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-2.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-2.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-2.c
new file mode 100644
index 000..49e75d8c95c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-additional-options "-w -Wno-psabi" } */
+
+typedef double __attribute__((vector_size(16))) v2df;
+
+v2df f(v2df a, v2df b)
+{
+  return (v2df){a[0] + b[0], a[1] + b[1]};
+}
+
+v2df g(v2df a, v2df b)
+{
+  return (v2df){a[0] + b[1], a[1] + b[0]};
+}
+
+/* Verify we manage to vectorize this with using the original vectors
+   and do not end up with any vector CTORs.  */
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp2" } } */
+/* { dg-final { scan-tree-dump-not "vect_cst" "slp2" } } */
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index b223956e3af..83ec382ee0d 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -247,6 +247,10 @@ vect_slp_tree_uniform_p (slp_tree node)
   gcc_assert (SLP_TREE_DEF_TYPE (node) == vect_constant_def
  || SLP_TREE_DEF_TYPE (node) == vect_external_def);
 
+  /* Pre-existing vectors.  */
+  if (SLP_TREE_SCALAR_OPS (node).is_empty ())
+return false;
+
   unsigned i;
   tree op, first = NULL_TREE;
   FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (node), i, op)
@@ -838,7 +842,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
   else
{
  rhs_code = gimple_assign_rhs_code (stmt);
- load_p = TREE_CODE_CLASS (rhs_code) == tcc_reference;
+ load_p = gimple_vuse (stmt);
}
 
   /* Check the operation.  */
@@ -899,6 +903,22 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
   need_same_oprnds = true;
   first_op1 = gimple_assign_rhs2 (stmt);
 }
+ else if (!load_p
+  && rhs_code == BIT_FIELD_REF)
+   {
+ tree vec = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0);
+ if (TREE_CODE (vec) != SSA_NAME
+ || !types_compatible_p (vectype, TREE_TYPE (vec)))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"Build SLP failed: "
+"BIT_FIELD_REF not supported\n");
+ /* Fatal