https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113871
Bug ID: 113871
Summary: psrlq is not used for PERM<a,{0},1,2,3,4>
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---
Target: x86_64

Take:
```
#define vect64 __attribute__((vector_size(8)))

void f(vect64 unsigned short *a)
{
  *a = __builtin_shufflevector(*a, (vect64 unsigned short){0}, 1, 2, 3, 4);
}
```

This should just produce:
```
        movq    (%rdi), %xmm0
        psrlq   $16, %xmm0
        movq    %xmm0, (%rdi)
        retq
```

But instead we get:
```
        movzwl  6(%rdi), %eax
        movzwl  4(%rdi), %edx
        salq    $16, %rax
        orq     %rdx, %rax
        movzwl  2(%rdi), %edx
        salq    $16, %rax
        orq     %rdx, %rax
        movq    %rax, (%rdi)
        ret
```

With AVX enabled the output is slightly better:
```
f:
.LFB0:
        .cfi_startproc
        vmovq   (%rdi), %xmm0
        vpxor   %xmm1, %xmm1, %xmm1
        vpshufb .LC1(%rip), %xmm1, %xmm1
        vpshufb .LC0(%rip), %xmm0, %xmm0
        vpor    %xmm1, %xmm0, %xmm0
        vmovq   %xmm0, (%rdi)
        ret
```

Note that LLVM is able to catch this for x86_64 (for aarch64, GCC is able to use `ushr d31, d31, 16` while LLVM does not).

I suspect the vec_shr_<mode> pattern is missing for this mode; once it is added, this should just work.
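
For reference, here is a minimal sketch of why a single 64-bit logical right shift is a valid lowering for this permutation. It is not taken from GCC. The helper names `shuffle_ref`, `shift_ref` and `check_equivalence` are hypothetical, and the code assumes a little-endian target such as x86_64, where element 0 occupies the low 16 bits of the 64-bit value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: element-wise meaning of the shuffle in the report.
   Indices 1,2,3 select a[1..3]; index 4 selects element 0 of the second
   operand, which is the all-zero vector. */
static void shuffle_ref(uint16_t out[4], const uint16_t a[4])
{
    out[0] = a[1];
    out[1] = a[2];
    out[2] = a[3];
    out[3] = 0;
}

/* Hypothetical helper: the same operation on the 64-bit container, i.e. a
   logical right shift by one 16-bit element (what psrlq $16 does to the low
   quadword). Assumes little-endian element order. */
static void shift_ref(uint16_t out[4], const uint16_t a[4])
{
    uint64_t bits;
    memcpy(&bits, a, sizeof bits);
    bits >>= 16;
    memcpy(out, &bits, sizeof bits);
}

/* Hypothetical helper: asserts that both formulations agree for one input. */
static void check_equivalence(const uint16_t a[4])
{
    uint16_t by_shuffle[4], by_shift[4];
    shuffle_ref(by_shuffle, a);
    shift_ref(by_shift, a);
    assert(memcmp(by_shuffle, by_shift, sizeof by_shuffle) == 0);
}
```

Calling `check_equivalence` on any four-element input should pass on a little-endian machine. On a big-endian target the element order within the 64-bit value differs, so the equivalent shift would go in the other direction.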