[Bug target/89057] [8/9 Regression] AArch64 ld3 st4 less optimized

2021-01-12 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89057

rsandifo at gcc dot gnu.org  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[8/9/10 Regression] AArch64 |[8/9 Regression] AArch64
                   |ld3 st4 less optimized      |ld3 st4 less optimized

--- Comment #12 from rsandifo at gcc dot gnu.org ---
Fixed for GCC 10 by r10-9255.

[Bug target/89057] [8/9 Regression] AArch64 ld3 st4 less optimized

2019-02-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89057

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|8.3                         |8.4

--- Comment #5 from Jakub Jelinek  ---
GCC 8.3 has been released.

[Bug target/89057] [8/9 Regression] AArch64 ld3 st4 less optimized

2019-01-30 Thread linux at carewolf dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89057

--- Comment #4 from Allan Jensen  ---
While that change might have made things worse, the real problem is probably
that the registers for those instructions are loaded and stored using
intrinsics, so proper register allocation and combining can't be performed.

For ARMv7, for instance, the same code can be optimized to have no moves,
just a single vswp instruction between the ld3 and st4. MSVC and clang can do
that, but GCC cannot.
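
For illustration only, the kind of loop under discussion might look like the
sketch below. It is not the test case attached to this bug: the function name
rgb_to_bgra and the RGB-to-BGRA layout are assumptions chosen so that the only
work between the structure load and the structure store is exchanging two
registers. The scalar tail also serves as a fallback when NEON is unavailable.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not from the bug report): expand packed RGB pixels
   to BGRA with an opaque alpha channel. */
void rgb_to_bgra(uint8_t *dst, const uint8_t *src, size_t pixels)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Eight pixels per iteration: de-interleaving 3-way load,
       re-interleaving 4-way store. */
    for (; i + 8 <= pixels; i += 8) {
        uint8x8x3_t rgb = vld3_u8(src + 3 * i);
        uint8x8x4_t bgra;
        bgra.val[0] = rgb.val[2];      /* B */
        bgra.val[1] = rgb.val[1];      /* G stays in place */
        bgra.val[2] = rgb.val[0];      /* R */
        bgra.val[3] = vdup_n_u8(0xff); /* A, loop invariant */
        vst4_u8(dst + 4 * i, bgra);
    }
#endif
    /* Scalar tail; handles every pixel when NEON is not available. */
    for (; i < pixels; i++) {
        dst[4 * i + 0] = src[3 * i + 2];
        dst[4 * i + 1] = src[3 * i + 1];
        dst[4 * i + 2] = src[3 * i + 0];
        dst[4 * i + 3] = 0xff;
    }
}
```

Compiling something like this at -O2 for an AArch64 target and counting the
mov instructions between the ld3 and the st4 is one way to compare compilers
or GCC releases; the exact output will depend on the version and flags used.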

[Bug target/89057] [8/9 Regression] AArch64 ld3 st4 less optimized

2019-01-30 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89057

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |collison at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
Started with r249702.
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01974.html

[Bug target/89057] [8/9 Regression] AArch64 ld3 st4 less optimized

2019-01-28 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89057

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
   Target Milestone|---                         |8.3