https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91838
            Bug ID: 91838
           Summary: incorrect use of shr and shrx to shift by 64, missed
                    optimization of vector shift
           Product: gcc
           Version: 9.2.0
            Status: UNCONFIRMED
          Keywords: missed-optimization, wrong-code
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kretz at kde dot org
  Target Milestone: ---
            Target: x86_64-*-*

Test case:

using T = unsigned char; // or ushort, or uint
using V [[gnu::vector_size(8)]] = T;
V f(V x) { return x >> 8 * sizeof(T); }

GCC 10 compiles to either xor or shift (which should better be xor, as well).
GCC 9.2 compiles to:

  vmovq rax, xmm0
  mov ecx, 64
  shr rax, cl
  sal rax, (64 - 8*sizeof(T))
  vmovq xmm0, rax

The `shr rax, cl`, where cl == 64, is a nop, because shr (and shrx, which is
used when BMI2 is enabled) masks the count with 0x3f. Consequently the last
element of the input vector is unchanged in the output.

In any case, the use of shr/shrx with shift counts >= 64 (or >= 32 in case of
the 32-bit variant) should not occur.
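
The count masking described above can be illustrated with a small scalar model. This is only a sketch: `emulated_shr64` and `saturating_shr64` are hypothetical helper names introduced here, not GCC internals or intrinsics. The first models the 0x3f masking that `shr r64, cl` applies to its count; the second models what the generated code appears to assume, namely that a count of 64 clears the register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: models how `shr r64, cl` treats its count.
// The count is masked to its low 6 bits before the shift is performed.
constexpr std::uint64_t emulated_shr64(std::uint64_t value, unsigned count) {
    return value >> (count & 0x3f);
}

// Hypothetical helper: the behaviour the generated code seems to rely on,
// where any count of 64 or more yields zero.
constexpr std::uint64_t saturating_shr64(std::uint64_t value, unsigned count) {
    return count >= 64 ? 0 : value >> count;
}

// Runtime check of the difference between the two models.
inline void check_masking() {
    const std::uint64_t x = 0x0123456789abcdefULL;
    assert(emulated_shr64(x, 64) == x);        // count 64 masks to 0: a nop
    assert(saturating_shr64(x, 64) == 0);      // what was presumably intended
    assert(emulated_shr64(x, 65) == (x >> 1)); // count 65 masks to 1
}
```

For counts below 64 both helpers agree; they differ only once the count reaches the register width, which is exactly the case the test case triggers.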