[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 Thomas Koenig changed: What|Removed |Added CC||tkoenig at gcc dot gnu.org Last reconfirmed|2017-01-23 00:00:00 |2021-5-28 --- Comment #10 from Thomas Koenig --- Just had a look at trunk. It currently produces adc: leaq800(%rsi), %rcx xorl%edx, %edx .L2: movq(%rdi), %rax addb$-1, %dl adcq(%rsi), %rax setc%dl addq$8, %rsi movq%rax, (%rdi) addq$8, %rdi cmpq%rcx, %rsi jne .L2 ret Clang does adc:# @adc movl$4, %eax xorl%ecx, %ecx .LBB0_1:# =>This Inner Loop Header: Depth=1 movq-32(%rsi,%rax,8), %rdx addb$-1, %cl adcq%rdx, -32(%rdi,%rax,8) movq-24(%rsi,%rax,8), %rcx adcq%rcx, -24(%rdi,%rax,8) movq-16(%rsi,%rax,8), %rcx adcq%rcx, -16(%rdi,%rax,8) movq-8(%rsi,%rax,8), %rcx adcq%rcx, -8(%rdi,%rax,8) movq(%rsi,%rax,8), %rcx adcq%rcx, (%rdi,%rax,8) setb%cl addq$5, %rax cmpq$104, %rax jne .LBB0_1 retq so it actually unrolls the loop and does the ideal sequence of add with carry.
[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #9 from Michael_S --- Despite what I wrote above, I did took a look at the trunk on godbolt with same old code from a year ago. Because it was so easy. And indeed a trunk looks ALOT better. But until it's released I wouldn't know if it's actually up to speed of MSVC and clang.
[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #8 from Michael_S --- (In reply to Jakub Jelinek from comment #7) > (In reply to Michael_S from comment #5) > > I agree with regard to "other targets", first of all, aarch64, but x86_64 > > variant of gcc already provides requested functionality in for of > > _subborrow_u64 () and _addcarry_u64() intrinsic functions. > > The problem here is not lack of functionality, but very poor implementation > > (mentioned many times on bugzilla with minimal effect). > > In that regard gcc is more than decade behind MSVC and ~4 years behind > > clang/llvm. Surprisingly, icc is also quite bad. > > Are you sure you have tested gcc trunk? > There have been fixes for this a month ago as part of PR97387 fixes. I am sure I did not :-) And I most likely am not going to test trunk, sorry. I'd wait for release. Essentially I am posting here not because I deeply care about the topic right now (I did care a years or so ago, but lost interest since then, see https://www.realworldtech.com/forum/?threadid=188061=188061), but because I stepped on it occasionally while waiting for response to 97832. So, sorry for intervention.
[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #7 from Jakub Jelinek --- (In reply to Michael_S from comment #5) > I agree with regard to "other targets", first of all, aarch64, but x86_64 > variant of gcc already provides requested functionality in for of > _subborrow_u64 () and _addcarry_u64() intrinsic functions. > The problem here is not lack of functionality, but very poor implementation > (mentioned many times on bugzilla with minimal effect). > In that regard gcc is more than decade behind MSVC and ~4 years behind > clang/llvm. Surprisingly, icc is also quite bad. Are you sure you have tested gcc trunk? There have been fixes for this a month ago as part of PR97387 fixes.
[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #6 from Michael_S --- (In reply to Marc Glisse from comment #1) > We could start with the simpler: > > void f(unsigned*__restrict__ r,unsigned*__restrict__ s,unsigned a,unsigned > b,unsigned c, unsigned d){ > *r=a+b; > *s=c+d+(*r } > That works for dual-precision addition, but not for triple or more. > After combine, we have: > > (insn 34 12 20 2 (set (reg:SI 93 [ _15+4 ]) > (ltu:SI (reg:CCC 17 flags) > (const_int 0 [0]))) 608 {*setcc_si_1_movzbl} > (expr_list:REG_DEAD (reg:CCC 17 flags) > (nil))) > > (insn 21 20 22 2 (parallel [ > (set (reg:SI 102) > (plus:SI (reg:SI 37 r8 [ c ]) > (reg:SI 38 r9 [ d ]))) > (clobber (reg:CC 17 flags)) > ]) "a.c":3 213 {*addsi_1} > (expr_list:REG_DEAD (reg:SI 38 r9 [ d ]) > (expr_list:REG_UNUSED (reg:CC 17 flags) > (expr_list:REG_DEAD (reg:SI 37 r8 [ c ]) > (nil) > > (insn 25 24 26 2 (parallel [ > (set (reg:SI 105) > (plus:SI (reg:SI 102) > (reg:SI 93 [ _15+4 ]))) > (clobber (reg:CC 17 flags)) > ]) "a.c":3 213 {*addsi_1} > (expr_list:REG_DEAD (reg:SI 93 [ _15+4 ]) > (expr_list:REG_UNUSED (reg:CC 17 flags) > (expr_list:REG_DEAD (reg:SI 102) > (nil) > > The combine dump says "Trying 21, 34 -> 25:" but the next line is blank and > it moves on to trying something else. > > If I use parentheses *s=c+(d+(*r > Trying 23, 24 -> 25: > Successfully matched this instruction: > (parallel [ > (set (reg:SI 105) > (plus:SI (plus:SI (ltu:SI (reg:CCC 17 flags) > (const_int 0 [0])) > (reg:SI 37 r8 [ c ])) > (reg:SI 38 r9 [ d ]))) > (clobber (reg:CC 17 flags)) > ]) > Instruction not appropriate for target. > > I don't know where that target restriction is coming from, but at least we > seem to be getting somewhere. > > If I remove c and keep *s=d+(*r > Failed to match this instruction: > (parallel [ > (set (reg:SI 103) > (plus:SI (ltu:SI (reg:CCC 17 flags) > (const_int 0 [0])) > (reg:SI 38 r9 [ d ]))) > (clobber (reg:CC 17 flags)) > ]) > Failed to match this instruction: > (set (reg:SI 103) > (plus:SI (ltu:SI (reg:CCC 17 flags) > (const_int 0 [0])) > (reg:SI 38 r9 [ d ]))) > > we would probably need a special pattern for this case, virtually adding 0.
[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 Michael_S changed: What|Removed |Added CC||already5chosen at yahoo dot com --- Comment #5 from Michael_S --- (In reply to Vincent Lefèvre from comment #0) > There should be a way to support full add-with-carry and > subtract-with-borrow by generating adc / sbb instructions on x86_64 (and > similar instructions on other targets). > > GCC could add builtins, such as __builtin_addc* and __builtin_subc* (two > arguments, carry in, carry out, and the result), similar to Clang: > http://clang.llvm.org/docs/LanguageExtensions.html#multiprecision-arithmetic- > builtins > as suggested in PR 60206 comment 3. > > Detection of special constructs in standard C/... code would be useful too. > Here are some examples from > https://gcc.gnu.org/ml/gcc-help/2017-01/msg00067.html for subtraction: > > typedef unsigned long T; > > void sub1 (T *p, T u0, T u1, T u2, T v0, T v1, T v2) > { > T t1; > int b0, b1; > > p[0] = u0 - v0; > b0 = u0 < v0; > t1 = u1 - v1; > b1 = u1 < v1; > p[1] = t1 - b0; > b1 |= p[1] > t1; > p[2] = u2 - v2 - b1; > } > > void sub2 (T *p, T u0, T u1, T u2, T v0, T v1, T v2) > { > int b0, b1; > > p[0] = u0 - v0; > b0 = u0 < v0; > p[1] = u1 - v1 - b0; > b1 = u1 < v1 || (u1 == v1 && b0 != 0); > p[2] = u2 - v2 - b1; > } > > In the second example, the b1 line could also be replaced by: > > b1 = u1 < v1 + b0 || v1 + b0 < v1; > > For the subtractions, optimal code would contain 1 sub and 2 sbb's. I agree with regard to "other targets", first of all, aarch64, but x86_64 variant of gcc already provides requested functionality in for of _subborrow_u64 () and _addcarry_u64() intrinsic functions. The problem here is not lack of functionality, but very poor implementation (mentioned many times on bugzilla with minimal effect). In that regard gcc is more than decade behind MSVC and ~4 years behind clang/llvm. Surprisingly, icc is also quite bad.
[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 Bug 79173 depends on bug 80491, which changed state. Bug 80491 Summary: [7 Regression] Compiler regression for long-add case. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80491 What|Removed |Added Status|RESOLVED|REOPENED Resolution|FIXED |---
[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 Bug 79173 depends on bug 80491, which changed state. Bug 80491 Summary: [7 Regression] Compiler regression for long-add case. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80491 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #4 from Vincent Lefèvre --- Also, make sure that the optimization is still done when a variable is a constant or replaced by a constant (with Clang, the optimization is no longer done in such a case).
[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 Richard Biener changed: What|Removed |Added Target||x86_64-*-* Status|UNCONFIRMED |NEW Last reconfirmed||2017-01-23 Version|unknown |7.0 Ever confirmed|0 |1 Severity|normal |enhancement --- Comment #3 from Richard Biener --- Confirmed, the bug should possibly be split (Clang compatible builtins, target specific detection of patterns, middle-end detection of patterns). Note a suitable middle-end representation would not have the address argument but likely return two values (via the usual _Complex int trick).
[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #2 from Marc Glisse --- (In reply to Marc Glisse from comment #1) > Trying 23, 24 -> 25: > Successfully matched this instruction: > (parallel [ > (set (reg:SI 105) > (plus:SI (plus:SI (ltu:SI (reg:CCC 17 flags) > (const_int 0 [0])) > (reg:SI 37 r8 [ c ])) > (reg:SI 38 r9 [ d ]))) > (clobber (reg:CC 17 flags)) > ]) > Instruction not appropriate for target. I didn't notice immediately, but apparently ix86_legitimate_combined_insn was happy to let combine propagate the hard registers r8 and r9 into the simple additions *addsi_1 which has (match_operand:SWI48 2 "x86_64_general_operand" "rme,re,0,le"), but then it won't accept them in the addition with carry. My guess would be that letting the hard registers into the additions is premature and should be delayed until after combine. If I manually reject those in gdb, we do produce addl+adcl.
[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #1 from Marc Glisse --- We could start with the simpler: void f(unsigned*__restrict__ r,unsigned*__restrict__ s,unsigned a,unsigned b,unsigned c, unsigned d){ *r=a+b; *s=c+d+(*r 25:" but the next line is blank and it moves on to trying something else. If I use parentheses *s=c+(d+(*r 25: Successfully matched this instruction: (parallel [ (set (reg:SI 105) (plus:SI (plus:SI (ltu:SI (reg:CCC 17 flags) (const_int 0 [0])) (reg:SI 37 r8 [ c ])) (reg:SI 38 r9 [ d ]))) (clobber (reg:CC 17 flags)) ]) Instruction not appropriate for target. I don't know where that target restriction is coming from, but at least we seem to be getting somewhere. If I remove c and keep *s=d+(*r