[Bug target/108724] [11 Regression] Poor codegen when summing two arrays without AVX or SSE

2023-05-29 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108724

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|11.4                        |11.5

--- Comment #11 from Jakub Jelinek ---
GCC 11.4 is being released, retargeting bugs to GCC 11.5.

[Bug target/108724] [11 Regression] Poor codegen when summing two arrays without AVX or SSE

2023-05-23 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108724

--- Comment #10 from Richard Biener ---
On trunk we're back to vectorizing, but as intended with DImode, which saves
us half of the loads and stores, and we think that saving covers for the extra
arithmetic required (by quite some margin).

movabsq $9223372034707292159, %rcx
movq    (%rdx), %rax
movq    (%rsi), %rsi
movq    %rcx, %rdx
andq    %rax, %rdx
andq    %rsi, %rcx
xorq    %rsi, %rax
addq    %rcx, %rdx
movabsq $-9223372034707292160, %rcx
andq    %rcx, %rax
xorq    %rdx, %rax
movq    %rax, (%rdi)

vs

movl    (%rdx), %eax
addl    (%rsi), %eax
movl    %eax, (%rdi)
movl    4(%rdx), %eax
addl    4(%rsi), %eax
movl    %eax, 4(%rdi)
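
For readers following along, the DImode sequence above appears to be the usual
"SIMD within a register" add of two 32-bit lanes. The C below is only a sketch
of that reading, not code taken from GCC: the names swar_add_v2si, add2_swar,
add2_scalar and check_add2 are made up for illustration. The mask
0x7fffffff7fffffff is 9223372034707292159, and its complement
0x8000000080000000 is -9223372034707292160 when read as a signed 64-bit value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 0x7fffffff7fffffff == 9223372034707292159: low 31 bits of each lane. */
#define LOW31_MASK UINT64_C(0x7fffffff7fffffff)

/* Add two pairs of 32-bit lanes packed into 64-bit integers (hypothetical
   helper mirroring the and/and/xor/add/and/xor sequence). */
static inline uint64_t swar_add_v2si(uint64_t a, uint64_t b)
{
    /* Each lane sums at most 2 * (2^31 - 1) < 2^32, so the carry out of
       bit 30 lands in bit 31 of the same lane and never crosses lanes. */
    uint64_t low = (a & LOW31_MASK) + (b & LOW31_MASK);
    /* Carry-less sum of the top bit of each lane. */
    uint64_t top = (a ^ b) & ~LOW31_MASK;
    /* Combine: bit 31 of each lane = a31 ^ b31 ^ carry-in. */
    return low ^ top;
}

/* dst[0..1] = x[0..1] + y[0..1] with one 64-bit load per input. */
static inline void add2_swar(uint32_t *dst, const uint32_t *x,
                             const uint32_t *y)
{
    uint64_t a, b, r;
    memcpy(&a, x, sizeof a);
    memcpy(&b, y, sizeof b);
    r = swar_add_v2si(a, b);
    memcpy(dst, &r, sizeof r);
}

/* The plain scalar version, corresponding to the movl/addl/movl pairs. */
static inline void add2_scalar(uint32_t *dst, const uint32_t *x,
                               const uint32_t *y)
{
    dst[0] = x[0] + y[0];
    dst[1] = x[1] + y[1];
}

/* Assert that both versions agree on one input pair. */
static inline void check_add2(const uint32_t *x, const uint32_t *y)
{
    uint32_t d[2], s[2];
    add2_swar(d, x, y);
    add2_scalar(s, x, y);
    assert(d[0] == s[0] && d[1] == s[1]);
}
```

Calling check_add2 on inputs that overflow a lane (for example 0xffffffff + 1)
exercises the case the masking exists for: the lane wraps without disturbing
its neighbour.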

[Bug target/108724] [11 Regression] Poor codegen when summing two arrays without AVX or SSE

2023-05-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108724

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |target
             Target|                            |x86_64-*-* i?86-*-*

--- Comment #9 from Richard Biener ---
And the remaining issue with GCC 11 would be that we fail to account for the
GPR -> XMM move.  Or the remaining issue for _all_ branches is that we fail
to realize that emulated "vector" CTORs are even more expensive since we lack
a good way to materialize the CTOR in a GPR (generic RTL expansion fails to
consider using shift + and for example).

Not sure what a good expansion of a V2SImode, V4HImode or V8QImode
CTOR to a GPR DImode reg would look like.
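
To make the question concrete, here is one possible shape of such a CTOR
written out in C with shifts and ORs (the zero-extension of each element plays
the role of the mask). This is only an illustration under assumptions: the
helper names pack_v2si, pack_v4hi and lane_v2si are hypothetical, element 0 is
assumed to live in the low bits as on little-endian x86, and nothing here
claims to be what RTL expansion does or should emit.

```c
#include <assert.h>
#include <stdint.h>

/* Build a two-element 32-bit "vector" {e0, e1} in one 64-bit integer.
   Assumption: element 0 occupies the low 32 bits. */
static inline uint64_t pack_v2si(uint32_t e0, uint32_t e1)
{
    /* Zero-extend each element, shift it into its lane, then OR. */
    return ((uint64_t)e1 << 32) | (uint64_t)e0;
}

/* Same idea for four 16-bit elements. */
static inline uint64_t pack_v4hi(uint16_t e0, uint16_t e1,
                                 uint16_t e2, uint16_t e3)
{
    return ((uint64_t)e3 << 48) | ((uint64_t)e2 << 32)
         | ((uint64_t)e1 << 16) | (uint64_t)e0;
}

/* Read lane i (0 or 1) back out of a packed two-element value. */
static inline uint32_t lane_v2si(uint64_t v, unsigned i)
{
    assert(i < 2);
    return (uint32_t)(v >> (32 * i));
}
```

The cost concern in the comment is visible even in this sketch: each extra
element adds a shift and an OR, so the narrower the element type, the more
scalar operations the "vector" construction takes.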