[Bug target/52572] suboptimal assignment to avx element
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52572

Andrew Pinski changed:

           What              |Removed      |Added
 -----------------------------------------------------------------
   Severity                  |normal       |enhancement
   Last reconfirmed           |             |2021-12-25
   Target                     |             |x86_64-linux-gnu
   Status                     |UNCONFIRMED  |NEW
   Ever confirmed             |0            |1

--- Comment #4 from Andrew Pinski ---
LLVM produces:

        vxorps          %xmm1, %xmm1, %xmm1
        vblendps        $3, %ymm1, %ymm0, %ymm0   # ymm0 = ymm1[0,1],ymm0[2,3,4,5,6,7]

and

        vxorps          %xmm0, %xmm0, %xmm0
        vblendps        $252, (%rdi), %ymm0, %ymm0   # ymm0 = ymm0[0,1],mem[2,3,4,5,6,7]

Which I suspect is better.
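For illustration, here is an intrinsics-level sketch of the single-blend sequence LLVM emits (this is an assumption about the intent of the testcase, not code quoted from the PR): view the __m256d as floats, blend the two low floats (one double) with a zeroed register, and view the result as doubles again.

    #include <immintrin.h>

    /* Sketch only: zero element 0 of a __m256d with one 256-bit blend.
       Mask 0x3 selects float lanes 0 and 1 from the zero operand,
       matching the "vblendps $3" sequence above.  */
    static __m256d zero_elem0_blendps(__m256d x)
    {
        __m256 xs = _mm256_castpd_ps(x);
        __m256 z  = _mm256_setzero_ps();
        return _mm256_castps_pd(_mm256_blend_ps(xs, z, 0x3));
    }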
Jakub Jelinek jakub at gcc dot gnu.org changed:

           What              |Removed      |Added
 -----------------------------------------------------------------
   CC                        |             |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek jakub at gcc dot gnu.org 2012-03-13 07:54:14 UTC ---
Have you actually tried that?  Mixing VEX-encoded insns with legacy-encoded
SSE* insns is very costly; for good performance there needs to be a vzeroupper
in between (but then you lose the upper bits).  See e.g. 2.8 in the AVX
Programming Reference.
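As a hedged illustration of the transition cost described here (the helper name and callback are hypothetical, not from the PR): when 256-bit AVX code is about to hand off to code that may use legacy-encoded SSE instructions, the _mm256_zeroupper() intrinsic emits the vzeroupper that avoids the penalty, at the cost of clearing the upper 128 bits of the ymm registers, exactly the trade-off the comment points out.

    #include <immintrin.h>

    /* Illustrative only: insert vzeroupper before calling into code
       that may contain legacy-encoded SSE instructions.  */
    void call_legacy_sse_code(void (*legacy_fn)(void))
    {
        _mm256_zeroupper();   /* compiles to vzeroupper; upper ymm halves are lost */
        legacy_fn();
    }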
--- Comment #2 from Marc Glisse marc.glisse at normalesup dot org 2012-03-13 08:16:58 UTC ---
(In reply to comment #1)
> Have you actually tried that?

Ah, no, sorry, I only have occasional access to such a machine to benchmark
the code.  From a -Os perspective it is still shorter (but indeed that matters
less to me than -O3 performance).

> Mixing VEX-encoded insns with legacy-encoded SSE* insns is very costly; for
> good performance there needs to be a vzeroupper in between (but then you
> lose the upper bits).  See e.g. 2.8 in the AVX Programming Reference.

Thanks, I'd missed that.  The vblendpd solution should still apply (from the
initial 'v' it sounds safe), no?
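A sketch of what that all-VEX vblendpd variant could look like at the intrinsics level (assumed intent; the function name is made up): a single 256-bit blend against a zeroed register, so no legacy-encoded SSE instruction is mixed in and no vzeroupper is needed.

    #include <immintrin.h>

    /* Sketch only: zero element 0 of a __m256d with one vblendpd.
       Bit 0 of the immediate selects element 0 from the zero operand.  */
    static __m256d zero_elem0_blendpd(__m256d x)
    {
        return _mm256_blend_pd(x, _mm256_setzero_pd(), 0x1);
    }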
--- Comment #3 from Marc Glisse marc.glisse at normalesup dot org 2012-03-13 17:57:58 UTC ---
Or for this variant:

    __m256d f(__m256d *y){
        __m256d x = *y;
        x[0] = 0;   // or x[3]
        return x;
    }

it looks like vmaskmovpd could replace:

        vmovapd         (%rdi), %ymm0
        vmovapd         %xmm0, %xmm1
        vmovlpd         .LC0(%rip), %xmm1, %xmm1
        vinsertf128     $0x0, %xmm1, %ymm0, %ymm0

(I tried a version with __builtin_shuffle but it wouldn't generate vmaskmovpd
either.)

(Sorry for the naive suggestions, there are too many possibilities to optimize
them all...)
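A sketch of the vmaskmovpd idea for this load variant, assuming element 0 should end up zero (the function name and pointer type are illustrative, not from the PR): load only elements 1..3 of the source and let the masked-off lane be written as zero, so the whole function collapses to a single masked load.

    #include <immintrin.h>

    /* Sketch only: lanes whose mask high bit is clear are zeroed
       instead of loaded, so lane 0 of the result is 0.0.  */
    __m256d f_maskload(const double *y)
    {
        const __m256i mask = _mm256_set_epi64x(-1, -1, -1, 0);
        return _mm256_maskload_pd(y, mask);
    }

The same intrinsic with the mask flipped around (0 in lane 3) would cover the x[3] = 0 case mentioned in the comment.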