https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108064
Bug ID: 108064
Summary: [13 Regression] apache-arrow-cpp-9.0.0 is vectored incorrectly: arithmetic shift instead of logical
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: slyfox at gcc dot gnu.org
Target Milestone: ---

Initially observed the failure as an array test failure in apache-arrow-cpp-9.0.0:

    [ FAILED ] TestSwapEndianArrayData.RandomData

There, an array of int16_t gets its endianness swapped element by element.

Minimized example:

    // $ cat a.cc
    typedef short int i16;

    static inline i16 ByteSwap16(i16 value) {
      constexpr auto m = static_cast<i16>(0xff);
      return static_cast<i16>(((value >> 8) & m) | ((value & m) << 8));
    }

    __attribute__((noipa))
    void swab16(i16 * d, const i16* s) {
      for (unsigned long i = 0; i < 4; i++) {
        d[i] = ByteSwap16(s[i]);
      }
    }

    __attribute__((noipa))
    int main(void) {
      /* need to align inputs to make sure the vectorized part of the loop
         gets executed. */
      alignas(16) i16 a[4] = {0xff, 0, 0, 0};
      alignas(16) i16 b[4];
      alignas(16) i16 c[4];

      swab16(b, a);
      swab16(c, b);

      /* Contents of 'a' should be equivalent to 'c'.
         But the gcc bug generates invalid vectorized shifts. */
      if (a[0] != c[0])
        __builtin_trap();
    }

The weekly gcc-13 snapshot (and the master branch) generate invalid code for it:

    $ ./gcc-git/bin/g++ -O3 a.cc -o a && ./a
    Illegal instruction (core dumped)
    $ ./gcc-git/bin/g++ -O0 a.cc -o a && ./a

AFAIU swab16() gets miscompiled:

    Dump of assembler code for function _Z6swab16PsPKs:
    ...
       movq   (%rsi),%xmm0
       movdqa %xmm0,%xmm1
       psllw  $0x8,%xmm0
       psraw  $0x8,%xmm1 ; <<<- should be psrlw!
       por    %xmm1,%xmm0
       movq   %xmm0,(%rdi)

Here 'gcc' loads 64 bits at a time and swaps even and odd bytes:
- 'psllw' moves odd bytes (zero-filling, ok)
- 'psraw' moves even bytes (sign-extending, bug)

As a result, 'por' can overwrite the even byte position with copies of the sign bit.

    $ ./gcc-git/bin/g++ -v |& unnix
    Using built-in specs.
    COLLECT_GCC=/<<NIX>>/gcc-13.0.0/bin/g++
    COLLECT_LTO_WRAPPER=/<<NIX>>/gcc-13.0.0/libexec/gcc/x86_64-unknown-linux-gnu/13.0.0/lto-wrapper
    Target: x86_64-unknown-linux-gnu
    Configured with:
    Thread model: posix
    Supported LTO compression algorithms: zlib
    gcc version 13.0.0 20221211 (experimental) (GCC)
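
To make the effect of the wrong shift concrete, here is a minimal scalar sketch of what one 16-bit lane goes through in the vectorized code above. It is only an illustrative model, not the code gcc emits: the helper names (lane_sll8, lane_srl8, lane_sra8, swap_expected, swap_miscompiled) are made up for this example, and the arithmetic shift is spelled out on unsigned values so the result does not depend on how the compiler shifts negative numbers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-lane models of the three shifts in the dump above.

// psllw $8: shift left, zero-filling the low byte.
constexpr std::uint16_t lane_sll8(std::uint16_t v) {
  return static_cast<std::uint16_t>(v << 8);
}

// psrlw $8: logical shift right, zero-filling the high byte.
constexpr std::uint16_t lane_srl8(std::uint16_t v) {
  return static_cast<std::uint16_t>(v >> 8);
}

// psraw $8: arithmetic shift right, the high byte is filled with the sign bit.
constexpr std::uint16_t lane_sra8(std::uint16_t v) {
  return static_cast<std::uint16_t>((v >> 8) | ((v & 0x8000u) ? 0xff00u : 0u));
}

// What ByteSwap16() is supposed to compute for one lane.
constexpr std::uint16_t swap_expected(std::uint16_t v) {
  return static_cast<std::uint16_t>(lane_srl8(v) | lane_sll8(v));
}

// What the psllw/psraw/por sequence computes for one lane.
constexpr std::uint16_t swap_miscompiled(std::uint16_t v) {
  return static_cast<std::uint16_t>(lane_sra8(v) | lane_sll8(v));
}

// Replays the two swab16() calls from the reproducer on a[0] == 0xff.
inline void check_lanes() {
  // First swap: the sign bit of 0x00ff is clear, so both variants agree.
  assert(swap_expected(0x00ff) == 0xff00);
  assert(swap_miscompiled(0x00ff) == 0xff00);
  // Second swap: 0xff00 has the sign bit set, so psraw smears it.
  assert(swap_expected(0xff00) == 0x00ff);
  assert(swap_miscompiled(0xff00) == 0xffff);
}
```

Under this model the first swab16() call is still correct, which matches the reproducer needing two calls: only the second call sees a lane with the sign bit set, and the result 0xffff (-1 as i16) no longer compares equal to a[0].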