https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110591

            Bug ID: 110591
           Summary: [i386] (Maybe) Missed optimisation: _cmpccxadd sets
                    flags
           Product: gcc
           Version: 13.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: thiago at kde dot org
  Target Milestone: ---

In:
#include <immintrin.h>

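/* Atomically add 1 to *ptr if *ptr == v (the Z condition); the intrinsic
   returns the previous value of *ptr, so the result equals v exactly when
   the add was performed. */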
bool increment_if(int *ptr, int v)
{
    return _cmpccxadd_epi32(ptr, v, 1, _CMPCCX_Z) == v;
}

GCC generates (and current Clang does the same):

increment_if(int*, int):
        movl    $1, %edx
        movl    %esi, %eax
        cmpzxadd        %edx, %eax, (%rdi)
        cmpl    %eax, %esi
        sete    %al
        ret

The CMPccXADD instructions set EFLAGS according to the comparison of their
memory operand with the middle (register) operand, and that register receives
the current value of the memory location whether or not the comparison
succeeded. The CMP instruction on the next line is therefore superfluous: it
sets the flags to exactly the values they already hold. That means this
particular example could be written:

        movl    $1, %edx
        cmpzxadd        %edx, %esi, (%rdi)
        sete    %al
        ret
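
For reference, here is a rough, non-atomic C model of the single-instruction
semantics described above, specialised to the Z condition used here. It is
only a sketch of the behaviour (the name is invented), not how the intrinsic
or the instruction is implemented:

static int cmpzxadd_model(int *mem, int cmpval, int addend)
{
    int old = *mem;        /* the register operand always receives this */
    /* EFLAGS are set from comparing old with cmpval, so ZF == (old == cmpval) */
    if (old == cmpval)     /* Z condition holds: perform the add */
        *mem = old + addend;
    return old;            /* returned whether or not the add happened */
}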

Dropping the redundant MOV and CMP saves two retire slots and one uop. The
same rewrite is possible every time the result of the intrinsic is compared
with the same value that was passed as the intrinsic's second parameter.
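
For instance, a hypothetical variant (not from the report, built only from
the same intrinsic and condition code) that should qualify in the same way,
ending in SETNE instead of SETE:

#include <immintrin.h>

/* Decrement *ptr if it still equals v; report whether it had already
   changed. The trailing CMP that would currently be emitted is again
   redundant, since ZF already reflects *ptr == v. */
bool decrement_unless_changed(int *ptr, int v)
{
    return _cmpccxadd_epi32(ptr, v, -1, _CMPCCX_Z) != v;
}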

However, in a real workload this function is likely to be inlined, in which
case the extra MOV may not be present at all and the CMP is likely to be
followed by a Jcc instead of a SETcc. In that case the CMP+Jcc pair would be
macro-fused, so there would be no 1-uop gain. Moreover, this atomic operation
will likely take multiple cycles, and the conditional code after it probably
can't be speculated very well either.
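
A hedged sketch of that branching shape, where do_work and retry_later are
invented placeholders and the exact code naturally depends on the surrounding
context after inlining:

#include <immintrin.h>

void do_work(int *ptr);      /* hypothetical helpers, for illustration only */
void retry_later(int *ptr);

void claim_slot(int *ptr, int expected)
{
    /* The result feeds a branch, so the redundant CMP would pair with a Jcc
       (macro-fused) rather than a SETcc; removing it would mainly save code
       size rather than a uop. */
    if (_cmpccxadd_epi32(ptr, expected, 1, _CMPCCX_Z) == expected)
        do_work(ptr);
    else
        retry_later(ptr);
}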

I'll leave it up to you to decide whether it's worth pursuing this.
