https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110591
Bug ID: 110591
Summary: [i386] (Maybe) Missed optimisation: _cmpccxadd sets flags
Product: gcc
Version: 13.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: thiago at kde dot org
Target Milestone: ---

For:

#include <immintrin.h>
bool increment_if(int *ptr, int v)
{
    return _cmpccxadd_epi32(ptr, v, 1, _CMPCCX_Z) == v;
}

GCC generates (and current Clang does the same):

increment_if(int*, int):
        movl    $1, %edx
        movl    %esi, %eax
        cmpzxadd        %edx, %eax, (%rdi)
        cmpl    %eax, %esi
        sete    %al
        ret

The CMPccXADD instructions set EFLAGS according to the comparison of their memory operand with the middle operand, and that middle operand receives the previous value of the memory location whether or not the comparison succeeded. The CMP instruction on the next line is therefore superfluous: it sets the flags to exactly the values they already hold. This particular example could instead be written:

        movl    $1, %edx
        cmpzxadd        %edx, %esi, (%rdi)
        sete    %al
        ret

saving two retire slots and one uop. This optimisation applies whenever the result of the intrinsic is compared against the same value that was passed as the intrinsic's second parameter.

However, in a real workload this function is likely to be inlined, in which case the extra MOV may not be present at all, and the CMP is likely to be followed by a Jcc rather than a SETcc. In the latter case the CMP+Jcc pair would be macro-fused, so there would be no one-uop gain. Moreover, this atomic operation is likely to take multiple cycles, and the conditional code after it probably cannot be speculated very well either. I'll leave it up to you to decide whether it's worth pursuing this.
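For reference, the Z-condition variant used in the example behaves like a compare-and-swap that adds 1 on success. A portable sketch of those semantics (the name increment_if_portable is mine, not part of the intrinsic's API) using the GCC/Clang __atomic builtins, for readers without CMPCCXADD hardware:

```c
#include <stdbool.h>

/* Portable emulation sketch: increment *ptr by 1 if *ptr == v, and report
   whether the increment happened. This mirrors what increment_if() above
   computes with _cmpccxadd_epi32(ptr, v, 1, _CMPCCX_Z) == v, though the
   intrinsic does it in a single instruction rather than a CAS. */
static bool increment_if_portable(int *ptr, int v)
{
    int expected = v;
    /* On success, *ptr becomes v + 1 atomically; on failure, `expected`
       receives the current value of *ptr, which we discard here. */
    return __atomic_compare_exchange_n(ptr, &expected, v + 1,
                                       false, __ATOMIC_SEQ_CST,
                                       __ATOMIC_SEQ_CST);
}
```
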