https://gcc.gnu.org/bugzilla/show_bug.cgi?id=74585
--- Comment #14 from Bill Schmidt ---
(In reply to Richard Biener from comment #13)
>
> You mean stores like the following?
>
> (insn 13 12 14 2 (set (mem/c:V4SI (plus:DI (reg/f:DI 150 virtual-stack-vars)
> (const_int 112
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=74585
--- Comment #13 from Richard Biener ---
(In reply to Bill Schmidt from comment #11)
> With the original test case, -mcpu=power8 is problematic because of the use
> of the "swapping stores," whose RHS is a vec_select rather than a register or
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=74585
--- Comment #12 from Bill Schmidt ---
The rest of the ugly code (once you ignore the loads/stores) comes from
horrible register allocation choices. Need to understand why we're not making
use of the high floating-point registers; too much copying
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=74585
--- Comment #11 from Bill Schmidt ---
With the original test case, -mcpu=power8 is problematic because of the use of
the "swapping stores," whose RHS is a vec_select rather than a register or
subreg. This prevents us from saving the RHS of the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=74585
--- Comment #10 from Bill Schmidt ---
The dse pass is responsible for removing all the unnecessary stack activity. I
think that we are probably confusing it because the stores are full vector
stores, but the loads are vector element loads of
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=74585
--- Comment #9 from Bill Schmidt ---
We do optimize things well for the following:
typedef struct
{
  __vector double vx0;
  __vector double vx1;
  __vector double vx2;
  __vector double vx3;
} vdoublex8_t;
vdoublex8_t
test_vecd8_rotate_left
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=74585
--- Comment #8 from Bill Schmidt ---
FYI, adding -mcpu=power9 to the options makes it much easier to read the RTL,
as it gets rid of the extra vector swaps needed for POWER8.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=74585
Bill Schmidt changed:
What|Removed |Added
Component|tree-optimization |rtl-optimization
Summary|SRA