https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67317

            Bug ID: 67317
           Summary: [x86] Silly code generation for
                    _addcarry_u32/_addcarry_u64
           Product: gcc
           Version: 5.2.0
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: inline-asm
          Assignee: unassigned at gcc dot gnu.org
          Reporter: myriachan at gmail dot com
  Target Milestone: ---

The x86 intrinsics _addcarry_u32 and _addcarry_u64 generate silly code.  For
example, take the following function, which chains two 64-bit additions through
the carry flag (the XOR is only there to make the output clearer):

        #include <x86intrin.h>

        typedef unsigned long long u64;

        u64 testcarry(u64 a, u64 b, u64 c, u64 d)
        {
                u64 result0, result1;
                _addcarry_u64(_addcarry_u64(0, a, c, &result0), b, d, &result1);
                return result0 ^ result1;
        }

This is the code generated with -O1, -O2 and -O3:

        xor     r8d, r8d
        add     r8b, -1
        adc     rdx, rdi
        setc    r8b
        mov     rax, rdx
        add     r8b, -1
        adc     rcx, rsi
        xor     rax, rcx
        ret

The first silliness is that _addcarry_u64 does not take advantage of a
compile-time constant 0 as the carry-in parameter.  Instead of materializing a
carry and using "adc", it should just use "add".

The second silliness is the use of r8b to hold the carry flag between the two
additions, followed by "add r8b, -1" to put it back into the carry flag, rather
than simply leaving the flag live from the first addition to the second.

Instead, the code should be something like this:

        add     rdx, rdi
        mov     rax, rdx
        adc     rcx, rsi
        xor     rax, rcx
        ret
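
For reference, a workaround sketch of mine (the names are my own) that keeps
both additions in a single asm statement, so the carry flag flows directly from
the "add" into the "adc" without being spilled to a register:

        typedef unsigned long long u64;

        /* One asm statement per carry chain, so the flag never has to be
           saved and restored.  AT&T syntax. */
        static inline u64 testcarry_asm(u64 a, u64 b, u64 c, u64 d)
        {
                u64 result0 = a, result1 = b;
                asm("add %[c], %[r0]\n\t"
                    "adc %[d], %[r1]"
                    : [r0] "+r" (result0), [r1] "+r" (result1)
                    : [c] "r" (c), [d] "r" (d)
                    : "cc");
                return result0 ^ result1;
        }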

Naturally, for something this simple I'd just use unsigned __int128, but this
came up in large-number (multi-precision) math.
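
To illustrate that context, here is a sketch of mine (not from the report) of a
256-bit addition built from a chain of _addcarry_u64 calls, which one would
hope to compile down to one "add" followed by three "adc"s:

        #include <x86intrin.h>

        typedef unsigned long long u64;

        /* Add two 4-limb (256-bit) integers, chaining the carry through
           the intrinsic. */
        static void add256(u64 dst[4], const u64 x[4], const u64 y[4])
        {
                unsigned char carry = 0;
                for (int i = 0; i < 4; i++)
                        carry = _addcarry_u64(carry, x[i], y[i], &dst[i]);
        }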
