On Fri, 26 Jun 2020, Richard Biener wrote:
> (sorry for the duplicate, forgot to copy the list)
>
> This teaches SLP analysis about vector typed externals that are
> fed into the SLP operations via lane extracting BIT_FIELD_REFs.
> It shows that there's currently no good representation for
> vector code on the SLP side so I went halfway and always represent
> uses of such vector externals with an SLP permutation node
> that has a single external SLP child, which has a non-standard
> representation of no scalar defs but only a vector def. That
> works best for shielding the rest of the vectorizer from it.
>
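As a rough illustration of the description above (a sketch only, not part of the patch): at the source level, indexing a vector-typed value such as a[0] is the kind of lane extract the description refers to. The function names below are hypothetical; the v2df typedef mirrors the one in the new testcase. The second function is a hand-written version of the shape one would hope the vectorizer reaches: permute one pre-existing vector, then do a single vector add.

```c
#include <assert.h>

typedef double __attribute__((vector_size(16))) v2df;

/* Scalar form: every a[i] / b[i] reads one lane of an existing vector.
   This matches the shape of g() in the testcase.  */
v2df add_swapped_lanes (v2df a, v2df b)
{
  return (v2df){a[0] + b[1], a[1] + b[0]};
}

/* Hand-written vector form (illustrative): swap the lanes of b,
   then add the two vectors with one vector operation.  */
v2df add_swapped_vector (v2df a, v2df b)
{
  v2df b_swapped = {b[1], b[0]};
  return a + b_swapped;
}

/* Self-check: both forms agree lane by lane on a sample input.  */
void check_equivalence (void)
{
  v2df a = {1.0, 2.0};
  v2df b = {10.0, 20.0};
  v2df r1 = add_swapped_lanes (a, b);
  v2df r2 = add_swapped_vector (a, b);
  assert (r1[0] == r2[0] && r1[1] == r2[1]);
}
```

Whether the compiler actually turns the first form into something like the second depends on the target and options; the sketch only shows that the two are equivalent in value.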
> I'm not sure it's actually worth the trouble and what real-world
> cases benefit from this. In theory vectorized unrolled code
> interfacing with scalar code might be one case but there
> we necessarily go through memory and there's no intermediate
> pass transforming that to registers [to make BB vectorization
> cheaper].
>
> It's also not even close to ready for re-vectorizing vectorized
> code with a larger VF.
>
> Any opinions?
I have now installed this.
Richard.
> Bootstrapped / tested on x86_64-unknown-linux-gnu.
>
>
> Thanks,
> Richard.
>
> 2020-06-26 Richard Biener
>
> PR tree-optimization/95839
> * tree-vect-slp.c (vect_slp_tree_uniform_p): Pre-existing
> vectors are not uniform.
> (vect_build_slp_tree_1): Handle BIT_FIELD_REFs of
> vector registers.
> (vect_build_slp_tree_2): For groups of lane extracts
> from a vector register generate a permute node
> with a special child representing the pre-existing vector.
> (vect_prologue_cost_for_slp): Pre-existing vectors cost nothing.
> (vect_slp_analyze_node_operations): Use SLP_TREE_LANES.
> (vectorizable_slp_permutation): Do not generate or cost identity
> permutes.
> (vect_schedule_slp_instance): Handle pre-existing vectors
> that are function arguments.
>
> * gcc.dg/vect/bb-slp-pr95839-2.c: New testcase.
> ---
> gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-2.c | 20
> gcc/tree-vect-slp.c | 119 ---
> 2 files changed, 124 insertions(+), 15 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-2.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-2.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-2.c
> new file mode 100644
> index 000..49e75d8c95c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-2.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-additional-options "-w -Wno-psabi" } */
> +
> +typedef double __attribute__((vector_size(16))) v2df;
> +
> +v2df f(v2df a, v2df b)
> +{
> + return (v2df){a[0] + b[0], a[1] + b[1]};
> +}
> +
> +v2df g(v2df a, v2df b)
> +{
> + return (v2df){a[0] + b[1], a[1] + b[0]};
> +}
> +
> +/* Verify we manage to vectorize this with using the original vectors
> + and do not end up with any vector CTORs. */
> +/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp2" } } */
> +/* { dg-final { scan-tree-dump-not "vect_cst" "slp2" } } */
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index b223956e3af..83ec382ee0d 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -247,6 +247,10 @@ vect_slp_tree_uniform_p (slp_tree node)
>    gcc_assert (SLP_TREE_DEF_TYPE (node) == vect_constant_def
>                || SLP_TREE_DEF_TYPE (node) == vect_external_def);
>
> +  /* Pre-existing vectors.  */
> +  if (SLP_TREE_SCALAR_OPS (node).is_empty ())
> +    return false;
> +
>    unsigned i;
>    tree op, first = NULL_TREE;
>    FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (node), i, op)
> @@ -838,7 +842,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
>        else
>          {
>            rhs_code = gimple_assign_rhs_code (stmt);
> -          load_p = TREE_CODE_CLASS (rhs_code) == tcc_reference;
> +          load_p = gimple_vuse (stmt);
>          }
>
>        /* Check the operation.  */
> @@ -899,6 +903,22 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
>                need_same_oprnds = true;
>                first_op1 = gimple_assign_rhs2 (stmt);
>              }
> +          else if (!load_p
> +                   && rhs_code == BIT_FIELD_REF)
> +            {
> +              tree vec = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0);
> +              if (TREE_CODE (vec) != SSA_NAME
> +                  || !types_compatible_p (vectype, TREE_TYPE (vec)))
> +                {
> +                  if (dump_enabled_p ())
> + dump_printf_loc