https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82498

            Bug ID: 82498
           Summary: Missed optimization for x86 rotate instruction
           Product: gcc
           Version: 7.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lloyd at randombit dot net
  Target Milestone: ---

GCC doesn't seem to realize that x86 masks the rotate count in the rol/ror
instructions (to its low 5 bits for a 32-bit operand). GCC 7.2.0 on x86-64
compiles this function, which attempts to do a 32-bit rotate without invoking
undefined behavior:

#include <stdint.h>

uint32_t rotate_left(uint32_t input, uint8_t rot)
   {
   if(rot == 0)
      return input;
   rot %= 8 * sizeof(uint32_t);
   return static_cast<uint32_t>((input << rot) |
                                (input >> (8*sizeof(uint32_t)-rot)));
   }


into:

        movl    %esi, %ecx      # rot, rot
        movl    %edi, %eax      # input, tmp97
        andl    $31, %ecx       #, rot
        roll    %cl, %eax       # rot, tmp97
        testb   %sil, %sil      # rot
        cmove   %edi, %eax      # tmp97,, input, <retval>

The `andl` is unnecessary because the machine masks the rotation amount itself.
The testb/cmove pair can also be omitted, since a rotate by zero leaves the
input unchanged. Overall this resulted in a ~15% slowdown in some code using
many variable rotations (the CAST-128 cipher as used in an OpenPGP library).
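
For comparison, here is a minimal sketch of the branch-free form of the same
rotate, in which both shift counts are masked so that no shift by 32 can occur.
The name rotate_left_masked is hypothetical and not part of the original code,
and whether GCC 7.2.0 reduces this form to a bare rol has not been checked
here.

```cpp
#include <cstdint>

// Hypothetical helper for illustration; not taken from the report.
// Both shift counts are reduced modulo 32, so neither shift is ever
// by the full width of the type and no zero check is needed.
inline uint32_t rotate_left_masked(uint32_t input, uint8_t rot)
{
    const uint32_t mask = 8 * sizeof(uint32_t) - 1;   // 31 for a 32-bit value
    const uint32_t r = rot & mask;                    // rotate count in [0, 31]
    return (input << r) | (input >> ((32 - r) & mask));
}
```

When the count is 0 (or any multiple of 32), both shifts are by 0 and the
result is input | input, which is input.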

Some related (but not quite the same, and supposedly fixed) issues: 57157 59100
