[Bug middle-end/109153] missed vector constructor optimizations

2023-03-16 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109153

Tamar Christina  changed:

           What    |Removed                        |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org  |tnfchris at gcc dot gnu.org
             Status|NEW                            |ASSIGNED

--- Comment #4 from Tamar Christina  ---
Assigning to myself to prevent duplicate work.

[Bug middle-end/109153] missed vector constructor optimizations

2023-03-16 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109153

--- Comment #3 from Tamar Christina  ---
(In reply to Richard Biener from comment #2)
> On the GIMPLE side we should canonicalize here I think, at which point
> inserts into a splatted vector become more profitable depends?
> 
>   _4 = VEC_PERM_EXPR ;
>   _5 = VEC_PERM_EXPR ;
>   _6 = {_4, _5};
> 
> we have simplify_vector_constructor in tree-ssa-forwprop.cc.
>

Ah great! 

> For the other BIT_INSERT_EXPR case I'd go to match.pd, but adding a function
> to forwprop is also possible.
> 
> If we want to expand { 4, 4, _1, 4, 4, ..} with splat + insert we should
> IMHO do that at RTL expansion time where we already try splat (I think).
> Not sure how to apply costing there though.  There's also the possibility
> to expand { a, a, b, b, a, b, a, ... } with two splat + blend.  For
> vec_init RTL expansion the target has full control, so it can decide for
> itself (if we do not want to do anything in generic code).

OK, so the suggestion is to canonicalize in GIMPLE to the simplest vector
constructor form and deal with the rest in vec_init?  That makes sense.  I
initially thought GIMPLE was the easier place, since modifying constructors is
simpler in GIMPLE than in RTL.

But it looks like we already do all the "costing"-based pattern checks in
aarch64_expand_vector_init, so, as you said, simplifying the vector
constructors should just make it work.

So I will go with that and extend aarch64_expand_vector_init if needed.  Thanks!
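
As a rough illustration of the kind of pattern check being discussed, the
sketch below classifies a constructor by how many distinct element values it
contains.  It is not the code in aarch64_expand_vector_init: the names
init_strategy and classify_ctor are hypothetical, plain ints stand in for the
constructor operands, and no real cost model is applied.

```c
#include <stddef.h>

/* Hypothetical strategies, named after the expansions discussed in the thread.  */
enum init_strategy
{
  INIT_SPLAT,            /* { a, a, a, a }  */
  INIT_SPLAT_INSERT,     /* { a, a, b, a }: splat plus one lane insert  */
  INIT_TWO_SPLAT_BLEND,  /* { a, a, b, b, a, b, ... }: two splats plus blend  */
  INIT_GENERIC           /* anything else  */
};

/* Assumption: the elements are modelled as ints, so equal values mean
   "same operand".  Returns the cheapest-looking shape by element count only.  */
static enum init_strategy
classify_ctor (const int *elts, size_t n)
{
  if (n == 0)
    return INIT_GENERIC;

  int a = elts[0], b = 0;
  size_t na = 0, nb = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (elts[i] == a)
        na++;
      else if (nb == 0)
        {
          b = elts[i];   /* first occurrence of a second value  */
          nb = 1;
        }
      else if (elts[i] == b)
        nb++;
      else
        return INIT_GENERIC;   /* three or more distinct values  */
    }

  if (nb == 0)
    return INIT_SPLAT;
  if (na == 1 || nb == 1)
    return INIT_SPLAT_INSERT;
  return INIT_TWO_SPLAT_BLEND;
}
```

A real target hook would weigh instruction costs rather than just counting
values; the point here is only that a canonical constructor makes such shapes
easy to recognize.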

[Bug middle-end/109153] missed vector constructor optimizations

2023-03-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109153

--- Comment #2 from Richard Biener  ---
On the GIMPLE side we should canonicalize here I think, at which point
inserts into a splatted vector become more profitable depends?

  _4 = VEC_PERM_EXPR ;
  _5 = VEC_PERM_EXPR ;
  _6 = {_4, _5};

we have simplify_vector_constructor in tree-ssa-forwprop.cc.

For the other BIT_INSERT_EXPR case I'd go to match.pd, but adding a function
to forwprop is also possible.

If we want to expand { 4, 4, _1, 4, 4, ..} with splat + insert we should
IMHO do that at RTL expansion time where we already try splat (I think).
Not sure how to apply costing there though.  There's also the possibility
to expand { a, a, b, b, a, b, a, ... } with two splat + blend.  For
vec_init RTL expansion the target has full control, so it can decide for
itself (if we do not want to do anything in generic code).
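
For readers unfamiliar with the two expansions mentioned above, here is a
minimal source-level sketch using GNU C vector extensions.  It only shows what
"splat + insert" and "two splat + blend" compute; the type v4si and the helper
names are assumptions for illustration, and the instructions actually emitted
depend on the target.

```c
#include <stdint.h>

/* GNU C vector extension: four 32-bit lanes in a 16-byte vector.  */
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* { c, c, x, c }-style constructor: splat c, then overwrite one lane.  */
static v4si
splat_insert (int32_t c, int32_t x, int lane)
{
  v4si v = { c, c, c, c };   /* splat  */
  v[lane] = x;               /* single-lane insert  */
  return v;
}

/* Constructor mixing two scalars: a lane takes a where the mask lane is
   -1 (all bits set) and b where it is 0.  */
static v4si
two_splat_blend (int32_t a, int32_t b, v4si mask)
{
  v4si va = { a, a, a, a };
  v4si vb = { b, b, b, b };
  return (va & mask) | (vb & ~mask);   /* bitwise blend  */
}

/* Lane-wise equality helper for checking results.  */
static int
same (v4si p, v4si q)
{
  for (int i = 0; i < 4; i++)
    if (p[i] != q[i])
      return 0;
  return 1;
}
```

With these helpers, splat_insert (4, 7, 2) yields the same value as the
constructor { 4, 4, 7, 4 }, and two_splat_blend (1, 2, mask) with
mask = { -1, -1, 0, 0 } yields { 1, 1, 2, 2 }.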

[Bug middle-end/109153] missed vector constructor optimizations

2023-03-16 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109153

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ktkachov at gcc dot gnu.org
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-03-16
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed. Does the midend have a way of judging whether a constructor is
cheaper?