https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91838

            Bug ID: 91838
           Summary: incorrect use of shr and shrx to shift by 64, missed
                    optimization of vector shift
           Product: gcc
           Version: 9.2.0
            Status: UNCONFIRMED
          Keywords: missed-optimization, wrong-code
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kretz at kde dot org
  Target Milestone: ---
            Target: x86_64-*-*

Test case:

using T = unsigned char; // or ushort, or uint
using V [[gnu::vector_size(8)]] = T;
V f(V x) { return x >> 8 * sizeof(T); }

GCC 10 compiles this to either an xor or a shift (the shift case should
preferably be an xor as well).

GCC 9.2 compiles to:
  vmovq rax, xmm0
  mov ecx, 64
  shr rax, cl
  sal rax, (64 - 8*sizeof(T))
  vmovq xmm0, rax

The `shr rax, cl` with cl == 64 is a nop, because shr (and shrx, which is
used when BMI2 is enabled) masks the count with 0x3f. Consequently the last
element of the input vector is unchanged in the output.
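
As a rough illustration of the count masking, the following is a minimal
sketch in portable C++ that models the instruction sequence from the GCC 9.2
listing above on a plain 64-bit integer. The helper names (shr64_masked,
gcc92_sequence) are hypothetical and only model the hardware behaviour; they
are not part of GCC or of any library.

```cpp
#include <cstdint>

// Hypothetical model of `shr r64, cl`: the hardware only looks at the low
// 6 bits of the count, so a count of 64 behaves like a count of 0.
constexpr std::uint64_t shr64_masked(std::uint64_t value, unsigned count) {
    return value >> (count & 0x3f);
}

// Hypothetical model of the listing above: shr by 64 (a nop after masking),
// followed by sal by (64 - element width in bits).
// elem_bits is 8, 16 or 32, so the left-shift count stays below 64.
constexpr std::uint64_t gcc92_sequence(std::uint64_t x, unsigned elem_bits) {
    return shr64_masked(x, 64) << (64 - elem_bits);
}

// A count of 64 leaves the value untouched instead of clearing it.
static_assert(shr64_masked(0x0102030405060708ULL, 64) == 0x0102030405060708ULL,
              "count 64 is masked to 0");

// With 8-bit elements the modelled sequence returns a non-zero value.
static_assert(gcc92_sequence(0x0102030405060708ULL, 8) == 0x0800000000000000ULL,
              "sequence does not clear the register");
```

In this model the register is not cleared for any of the three element
widths, because the first shift never takes effect.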

In any case, shr/shrx with a shift count >= 64 (or >= 32 in case of the
32-bit variant) should never be emitted.
