[Bug middle-end/109153] missed vector constructor optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109153

Tamar Christina changed:

           What    |Removed                        |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org  |tnfchris at gcc dot gnu.org
             Status|NEW                            |ASSIGNED

--- Comment #4 from Tamar Christina ---
Assigning to myself to prevent duplicate work.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109153

--- Comment #3 from Tamar Christina ---
(In reply to Richard Biener from comment #2)
> On the GIMPLE side we should canonicalize here I think, at which point
> inserts into a splatted vector become more profitable depends?
>
>   _4 = VEC_PERM_EXPR ;
>   _5 = VEC_PERM_EXPR ;
>   _6 = {_4, _5};
>
> we have simplify_vector_constructor in tree-ssa-forwprop.cc.
>

Ah great!

> For the other BIT_INSERT_EXPR case I'd go to match.pd, but adding a function
> to forwprop is also possible.
>
> If we want to expand { 4, 4, _1, 4, 4, ..} with splat + insert we should
> IMHO do that at RTL expansion time where we already try splat (I think).
> Not sure how to apply costing there though. There's also the possibility
> to expand { a, a, b, b, a, b, a, ... } with two splat + blend. For
> vec_init RTL expansion the target has full control, so it can decide for
> itself (if we do not want to do anything in generic code).

Ok, so the suggestion is to canonicalize to the simplest vector constructor
form in GIMPLE and deal with it in vec_init?

This makes sense. I initially thought GIMPLE was easier, since modifying
constructors is simpler in GIMPLE than in RTL. But it looks like we already do
all the "costing"-based pattern checks in aarch64_expand_vector_init, so as you
said, simplifying the vector constructors should just make it work.

So I will go with that and extend aarch64_expand_vector_init if needed. Thanks!
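The proposal leaves the profitability decision to the target's vec_init expander once the constructor is canonical. The C sketch below shows roughly what a pattern check of that kind might look like. It classifies a constructor by its lane values and picks one of the strategies named in the quoted comment: splat, splat + insert, or two splats + blend. The enum, the function, and the max_inserts threshold are hypothetical. This is not the logic of aarch64_expand_vector_init, and a real expander would weigh target instruction costs rather than simple counts.

```c
#include <stddef.h>

/* Hypothetical strategy names, loosely following the comment thread. */
enum init_strategy
{
  INIT_SPLAT,            /* all lanes equal: one broadcast */
  INIT_SPLAT_INSERT,     /* broadcast the common value, insert the rest */
  INIT_TWO_SPLAT_BLEND,  /* exactly two values: two broadcasts + blend */
  INIT_LANEWISE          /* fall back to building lane by lane */
};

/* Hypothetical heuristic, not GCC code.  elts[i] is an opaque id for the
   value in lane i (equal ids mean the same SSA name or constant); n > 0.
   max_inserts is an assumed tuning knob, not a GCC parameter.  */
enum init_strategy
choose_init_strategy (const int *elts, size_t n, size_t max_inserts)
{
  size_t distinct = 0;    /* number of distinct lane values */
  size_t best_count = 0;  /* occurrences of the most common value */

  for (size_t i = 0; i < n; i++)
    {
      /* Only count a value at its first occurrence. */
      int seen = 0;
      for (size_t j = 0; j < i; j++)
        if (elts[j] == elts[i])
          {
            seen = 1;
            break;
          }
      if (seen)
        continue;

      distinct++;
      size_t count = 0;
      for (size_t j = i; j < n; j++)
        if (elts[j] == elts[i])
          count++;
      if (count > best_count)
        best_count = count;
    }

  if (distinct == 1)
    return INIT_SPLAT;
  /* Lanes that differ from the most common value each need one insert. */
  if (n - best_count <= max_inserts)
    return INIT_SPLAT_INSERT;
  if (distinct == 2)
    return INIT_TWO_SPLAT_BLEND;
  return INIT_LANEWISE;
}
```

For { 4, 4, _1, 4, 4, 4, 4, 4 } with max_inserts = 2, this picks splat + insert. For { a, a, b, b, a, b, a, a } three lanes differ from the majority value, so it falls through to two splats + blend. The sketch checks splat + insert before the blend, but that ordering is an arbitrary choice.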
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109153

--- Comment #2 from Richard Biener ---
On the GIMPLE side we should canonicalize here I think, at which point
inserts into a splatted vector become more profitable depends?

  _4 = VEC_PERM_EXPR ;
  _5 = VEC_PERM_EXPR ;
  _6 = {_4, _5};

we have simplify_vector_constructor in tree-ssa-forwprop.cc.

For the other BIT_INSERT_EXPR case I'd go to match.pd, but adding a function
to forwprop is also possible.

If we want to expand { 4, 4, _1, 4, 4, ..} with splat + insert we should
IMHO do that at RTL expansion time where we already try splat (I think).
Not sure how to apply costing there though. There's also the possibility
to expand { a, a, b, b, a, b, a, ... } with two splat + blend. For
vec_init RTL expansion the target has full control, so it can decide for
itself (if we do not want to do anything in generic code).
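To make the two expansion strategies in the comment concrete, the C sketch below uses GCC's generic vector extensions: vector_size, lane subscripting, and the two-operand __builtin_shuffle. The 4 x int vector type and the function names are assumptions chosen for illustration. This is not how GCC's vec_init expansion is written, and what each function compiles to depends on the target and on the very optimizations this bug is about.

```c
/* Illustrative only: GCC generic vector extensions, 4 x 32-bit lanes.
   Function names are hypothetical, not GCC internals. */
typedef int v4si __attribute__ ((vector_size (16)));

/* Baseline: a plain constructor shaped like { 4, 4, _1, 4, ... }. */
v4si
init_lanewise (int x)
{
  v4si v = { 4, 4, x, 4 };
  return v;
}

/* Splat + insert: build the uniform vector of the repeated constant,
   then overwrite the single lane that differs. */
v4si
init_splat_insert (int x)
{
  v4si v = { 4, 4, 4, 4 };  /* uniform vector (a splat) */
  v[2] = x;                 /* single-lane insert */
  return v;
}

/* Two splats + blend: { a, a, b, b } (the first four lanes of the
   pattern in the comment) taken lane by lane from two uniform vectors. */
v4si
init_two_splat_blend (int a, int b)
{
  v4si sa = { a, a, a, a };
  v4si sb = { b, b, b, b };
  /* With two input vectors, mask indices 0..3 select lanes of sa and
     4..7 select lanes of sb. */
  v4si mask = { 0, 1, 6, 7 };
  return __builtin_shuffle (sa, sb, mask);
}

/* Lane-wise equality, so the strategies can be checked against each other. */
int
v4si_equal (v4si p, v4si q)
{
  for (int i = 0; i < 4; i++)
    if (p[i] != q[i])
      return 0;
  return 1;
}
```

At the source level the three forms produce the same values, and the compiler is free to canonicalize them to the same GIMPLE. The point is only the sequence of operations each form names, which is what a vec_init expander would choose between.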
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109153

ktkachov at gcc dot gnu.org changed:

           What    |Removed                        |Added
----------------------------------------------------------------------------
                 CC|                               |ktkachov at gcc dot gnu.org
     Ever confirmed|0                              |1
   Last reconfirmed|                               |2023-03-16
             Status|UNCONFIRMED                    |NEW

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed. Does the midend have a way of judging whether a constructor is cheaper?