https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982
Bug ID: 113982
Summary: Poor codegen for 64-bit add with carry widening functions
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: janschultke at googlemail dot com
Target Milestone: ---

I was trying to get optimal codegen for a 64-bit addition with a carry, but
it's tough to do with GCC:

> struct add_result {
>     unsigned long long sum;
>     bool carry;
> };
>
> add_result add_wide_1(unsigned long long x, unsigned long long y) {
>     auto r = (unsigned __int128) x + y;
>     return add_result{static_cast<unsigned long long>(r), bool(r >> 64)};
> }
>
> add_result add_wide_2(unsigned long long x, unsigned long long y) {
>     unsigned long long r;
>     bool carry = __builtin_add_overflow(x, y, &r);
>     return add_result{r, carry};
> }

## Expected output (clang -march=x86-64-v4 -O3)

add_wide_1(unsigned long long, unsigned long long):
        mov     rax, rdi
        add     rax, rsi
        setb    dl
        ret
add_wide_2(unsigned long long, unsigned long long):
        mov     rax, rdi
        add     rax, rsi
        setb    dl
        ret

## Actual output (GCC -march=x86-64-v4 -O3)

(https://godbolt.org/z/qGc9WeEvK)

add_wide_1(unsigned long long, unsigned long long):
        mov     rcx, rdi
        lea     rax, [rdi+rsi]
        xor     edx, edx
        xor     edi, edi
        add     rsi, rcx
        adc     rdi, 0
        mov     dl, dil
        and     dl, 1
        ret
add_wide_2(unsigned long long, unsigned long long):
        add     rdi, rsi
        mov     edx, 0
        mov     rax, rdi
        setc    dl
        ret

The output for the 128-bit version looks quite poor. GCC does not appear to
recognize that only the carry bit (bit 64) of the widened result is used, so
it performs the full 128-bit addition instead of just materializing the
carry flag. The add_wide_2 output also isn't optimal: why does it emit
"mov edx, 0" instead of "xor edx, edx"?