https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91934
Richard Biener changed:
What|Removed |Added
Depends on||87105, 87746, 87800
--- Comment #8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91934
--- Comment #7 from Richard Biener ---
So the difference between good and bad is data-ref access analysis, which
figures single-element interleaving in GCC 8 and nicer interleaving in GCC 9,
where I rewrote parts of that analysis:
t.c:15:9: note:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91934
Jakub Jelinek changed:
What|Removed |Added
CC||jakub at gcc dot gnu.org
--- Comment #6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91934
Richard Biener changed:
What|Removed |Added
Status|WAITING |NEW
Known to work|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91934
--- Comment #4 from Dmitrii Tochanskii ---
I'm not a good specialist in AVX, so I just see something like loop unrolling or
maybe very long data preparation. For example:
=
vmovups ymm3, YMMWORD PTR [r8+r9]
vmovups ymm5,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91934
--- Comment #3 from Dmitrii Tochanskii ---
Yep, -fno-loop-unroll-and-jam helps me! Interesting.
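For anyone who wants to try the same comparison, here is a minimal sketch. It is not the reporter's test case: the function matvec_acc, the file name nest.c and the -mavx2 choice are assumptions made only for illustration. It shows a simple loop nest of the kind unroll-and-jam targets, and the two compile lines differ only in the flag mentioned above.

```c
/* nest.c: hypothetical loop nest, NOT the test case from this bug. */
#include <assert.h>
#include <stddef.h>

/* Accumulate a row-major n x n matrix-vector product into y:
 *   y[i] += sum over j of a[i*n + j] * x[j]
 */
void matvec_acc(size_t n, const float *a, const float *x, float *y)
{
    assert(a != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            y[i] += a[i * n + j] * x[j];
}

/* Compare the generated assembly with and without the pass
 * (file names and -mavx2 are assumptions; adjust to your target):
 *
 *   gcc -O3 -mavx2 -S -masm=intel nest.c -o with.s
 *   gcc -O3 -mavx2 -fno-loop-unroll-and-jam -S -masm=intel nest.c -o without.s
 *   diff with.s without.s
 */
```

Whether the two outputs differ for this particular nest depends on the GCC version and target. The point is only the workflow of toggling one flag and diffing the assembly.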
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91934
Richard Biener changed:
What|Removed |Added
Keywords||needs-bisection
--- Comment #2 from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91934
Richard Biener changed:
What|Removed |Added
Keywords||missed-optimization