[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2021-05-28 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

Thomas Koenig  changed:

   What|Removed |Added

 CC||tkoenig at gcc dot gnu.org
   Last reconfirmed|2017-01-23 00:00:00 |2021-5-28

--- Comment #10 from Thomas Koenig  ---
Just had a look at trunk.

It currently produces

adc:
        leaq    800(%rsi), %rcx
        xorl    %edx, %edx
.L2:
        movq    (%rdi), %rax
        addb    $-1, %dl
        adcq    (%rsi), %rax
        setc    %dl
        addq    $8, %rsi
        movq    %rax, (%rdi)
        addq    $8, %rdi
        cmpq    %rcx, %rsi
        jne     .L2
        ret

Clang does

adc:                                    # @adc
        movl    $4, %eax
        xorl    %ecx, %ecx
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        movq    -32(%rsi,%rax,8), %rdx
        addb    $-1, %cl
        adcq    %rdx, -32(%rdi,%rax,8)
        movq    -24(%rsi,%rax,8), %rcx
        adcq    %rcx, -24(%rdi,%rax,8)
        movq    -16(%rsi,%rax,8), %rcx
        adcq    %rcx, -16(%rdi,%rax,8)
        movq    -8(%rsi,%rax,8), %rcx
        adcq    %rcx, -8(%rdi,%rax,8)
        movq    (%rsi,%rax,8), %rcx
        adcq    %rcx, (%rdi,%rax,8)
        setb    %cl
        addq    $5, %rax
        cmpq    $104, %rax
        jne     .LBB0_1
        retq

so it actually unrolls the loop and uses the ideal add-with-carry sequence.
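
The C source for these listings is not included above; a plausible
reconstruction (assuming 100 64-bit limbs and the _addcarry_u64 intrinsic;
the function name and types are guesses) that matches the loop bounds in
both listings is:

#include <x86intrin.h>

/* Hypothetical reconstruction of the test loop: dst[i] += src[i] over 100
   64-bit limbs, with the carry chained through _addcarry_u64.  This matches
   the 800-byte bound in the gcc listing and the 100-element count in the
   clang listing.  */
void adc (unsigned long long *dst, const unsigned long long *src)
{
  unsigned char c = 0;
  for (int i = 0; i < 100; i++)
    c = _addcarry_u64 (c, dst[i], src[i], &dst[i]);
}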

[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2020-11-15 Thread already5chosen at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #9 from Michael_S  ---
Despite what I wrote above, I did take a look at trunk on godbolt with the same
old code from a year ago, because it was so easy. And indeed trunk looks a lot
better.
But until it's released I won't know whether it's actually up to the speed of
MSVC and clang.

[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2020-11-15 Thread already5chosen at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #8 from Michael_S  ---
(In reply to Jakub Jelinek from comment #7)
> (In reply to Michael_S from comment #5)
> > I agree with regard to "other targets", first of all aarch64, but the x86_64
> > variant of gcc already provides the requested functionality in the form of
> > the _subborrow_u64() and _addcarry_u64() intrinsic functions.
> > The problem here is not lack of functionality, but a very poor implementation
> > (mentioned many times on bugzilla with minimal effect).
> > In that regard gcc is more than a decade behind MSVC and ~4 years behind
> > clang/llvm. Surprisingly, icc is also quite bad.
> 
> Are you sure you have tested gcc trunk?
> There have been fixes for this a month ago as part of PR97387 fixes.

I am sure I did not :-)
And I most likely am not going to test trunk, sorry; I'd rather wait for a release.
Essentially, I am posting here not because I deeply care about the topic right
now (I did care a year or so ago, but have lost interest since then; see
https://www.realworldtech.com/forum/?threadid=188061=188061), but
because I stumbled on it by chance while waiting for a response to PR 97832.

So, sorry for the intervention.

[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2020-11-15 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek  ---
(In reply to Michael_S from comment #5)
> I agree with regard to "other targets", first of all aarch64, but the x86_64
> variant of gcc already provides the requested functionality in the form of
> the _subborrow_u64() and _addcarry_u64() intrinsic functions.
> The problem here is not lack of functionality, but a very poor implementation
> (mentioned many times on bugzilla with minimal effect).
> In that regard gcc is more than a decade behind MSVC and ~4 years behind
> clang/llvm. Surprisingly, icc is also quite bad.

Are you sure you have tested gcc trunk?
There have been fixes for this a month ago as part of PR97387 fixes.

[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2020-11-15 Thread already5chosen at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #6 from Michael_S  ---
(In reply to Marc Glisse from comment #1)
> We could start with the simpler:
> 
> void f(unsigned*__restrict__ r,unsigned*__restrict__ s,unsigned a,unsigned
> b,unsigned c, unsigned d){
>   *r=a+b;
>   *s=c+d+(*r<a);
> }
> 

That works for dual-precision addition, but not for triple or more.
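
To make the point concrete (this example is not from the thread; the names
are made up): once a third limb is involved, the carry out of the middle
limb depends on both the limb addition and the incoming carry, so a single
(*r < a) test is not enough and each limb really wants an add-with-carry.

typedef unsigned long T;

/* Illustration only: triple-precision add in plain C.  The middle limb
   needs two overflow checks, which is what ideally collapses into one
   adc instruction.  */
void add3 (T *r, const T *a, const T *b)
{
  T t, c, c1;

  r[0] = a[0] + b[0];
  c = r[0] < a[0];          /* carry out of limb 0 */

  t = a[1] + b[1];
  c1 = t < a[1];            /* carry from the limb addition itself */
  r[1] = t + c;
  c = c1 | (r[1] < t);      /* ... or from adding the incoming carry */

  r[2] = a[2] + b[2] + c;
}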

> After combine, we have:
> 
> (insn 34 12 20 2 (set (reg:SI 93 [ _15+4 ])
> (ltu:SI (reg:CCC 17 flags)
> (const_int 0 [0]))) 608 {*setcc_si_1_movzbl}
>  (expr_list:REG_DEAD (reg:CCC 17 flags)
> (nil)))
> 
> (insn 21 20 22 2 (parallel [
> (set (reg:SI 102)
> (plus:SI (reg:SI 37 r8 [ c ])
> (reg:SI 38 r9 [ d ])))
> (clobber (reg:CC 17 flags))
> ]) "a.c":3 213 {*addsi_1}
>  (expr_list:REG_DEAD (reg:SI 38 r9 [ d ])
> (expr_list:REG_UNUSED (reg:CC 17 flags)
> (expr_list:REG_DEAD (reg:SI 37 r8 [ c ])
> (nil)))))
> 
> (insn 25 24 26 2 (parallel [
> (set (reg:SI 105)
> (plus:SI (reg:SI 102)
> (reg:SI 93 [ _15+4 ])))
> (clobber (reg:CC 17 flags))
> ]) "a.c":3 213 {*addsi_1}
>  (expr_list:REG_DEAD (reg:SI 93 [ _15+4 ])
> (expr_list:REG_UNUSED (reg:CC 17 flags)
> (expr_list:REG_DEAD (reg:SI 102)
> (nil)))))
> 
> The combine dump says "Trying 21, 34 -> 25:" but the next line is blank and
> it moves on to trying something else.
> 
> If I use parentheses *s=c+(d+(*r<a)), I get
> 
> Trying 23, 24 -> 25:
> Successfully matched this instruction:
> (parallel [
> (set (reg:SI 105)
> (plus:SI (plus:SI (ltu:SI (reg:CCC 17 flags)
> (const_int 0 [0]))
> (reg:SI 37 r8 [ c ]))
> (reg:SI 38 r9 [ d ])))
> (clobber (reg:CC 17 flags))
> ])
> Instruction not appropriate for target.
> 
> I don't know where that target restriction is coming from, but at least we
> seem to be getting somewhere.
> 
> If I remove c and keep *s=d+(*r<a), I get
> 
> Failed to match this instruction:
> (parallel [
> (set (reg:SI 103)
> (plus:SI (ltu:SI (reg:CCC 17 flags)
> (const_int 0 [0]))
> (reg:SI 38 r9 [ d ])))
> (clobber (reg:CC 17 flags))
> ])
> Failed to match this instruction:
> (set (reg:SI 103)
> (plus:SI (ltu:SI (reg:CCC 17 flags)
> (const_int 0 [0]))
> (reg:SI 38 r9 [ d ])))
> 
> we would probably need a special pattern for this case, virtually adding 0.
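
The reduced case at the end of the quote, as a standalone function
(illustration only, not part of the original report): here the carry from
the first addition is the only thing added to d, i.e. the "virtually
adding 0" situation, which should ideally be consumed by a single
adc-with-zero rather than a setcc plus add.

void g (unsigned *__restrict__ r, unsigned *__restrict__ s,
        unsigned a, unsigned b, unsigned d)
{
  *r = a + b;
  *s = d + (*r < a);
}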

[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2020-11-15 Thread already5chosen at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

Michael_S  changed:

   What|Removed |Added

 CC||already5chosen at yahoo dot com

--- Comment #5 from Michael_S  ---
(In reply to Vincent Lefèvre from comment #0)
> There should be a way to support full add-with-carry and
> subtract-with-borrow by generating adc / sbb instructions on x86_64 (and
> similar instructions on other targets).
> 
> GCC could add builtins, such as __builtin_addc* and __builtin_subc* (two
> arguments, carry in, carry out, and the result), similar to Clang:
> http://clang.llvm.org/docs/LanguageExtensions.html#multiprecision-arithmetic-
> builtins
> as suggested in PR 60206 comment 3.
> 
> Detection of special constructs in standard C/... code would be useful too.
> Here are some examples from
> https://gcc.gnu.org/ml/gcc-help/2017-01/msg00067.html for subtraction:
> 
> typedef unsigned long T;
> 
> void sub1 (T *p, T u0, T u1, T u2, T v0, T v1, T v2)
> {
>   T t1;
>   int b0, b1;
> 
>   p[0] = u0 - v0;
>   b0 = u0 < v0;
>   t1 = u1 - v1;
>   b1 = u1 < v1;
>   p[1] = t1 - b0;
>   b1 |= p[1] > t1;
>   p[2] = u2 - v2 - b1;
> }
> 
> void sub2 (T *p, T u0, T u1, T u2, T v0, T v1, T v2)
> {
>   int b0, b1;
> 
>   p[0] = u0 - v0;
>   b0 = u0 < v0;
>   p[1] = u1 - v1 - b0;
>   b1 = u1 < v1 || (u1 == v1 && b0 != 0);
>   p[2] = u2 - v2 - b1;
> }
> 
> In the second example, the b1 line could also be replaced by:
> 
>   b1 = u1 < v1 + b0 || v1 + b0 < v1;
> 
> For the subtractions, optimal code would contain 1 sub and 2 sbb's.

I agree with regard to "other targets", first of all aarch64, but the x86_64
variant of gcc already provides the requested functionality in the form of
the _subborrow_u64() and _addcarry_u64() intrinsic functions.
The problem here is not lack of functionality, but a very poor implementation
(mentioned many times on bugzilla with minimal effect).
In that regard gcc is more than a decade behind MSVC and ~4 years behind
clang/llvm. Surprisingly, icc is also quite bad.
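
For reference, the triple-limb subtraction from comment #0 written with
those existing intrinsics (illustration only; the function name is made up,
and the operand order is assumed to follow the current GCC/Intel
documentation, i.e. *out = a - b - borrow_in):

#include <x86intrin.h>

typedef unsigned long long T;

/* Illustration only: subtract-with-borrow over three limbs via the existing
   x86_64 intrinsics.  Ideally this compiles to one sub followed by two sbb
   instructions.  */
void sub3 (T *p, T u0, T u1, T u2, T v0, T v1, T v2)
{
  unsigned char b;

  b = _subborrow_u64 (0, u0, v0, &p[0]);    /* p[0] = u0 - v0          */
  b = _subborrow_u64 (b, u1, v1, &p[1]);    /* p[1] = u1 - v1 - borrow */
  (void) _subborrow_u64 (b, u2, v2, &p[2]);
}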

[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2019-11-14 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173
Bug 79173 depends on bug 80491, which changed state.

Bug 80491 Summary: [7 Regression] Compiler regression for long-add case.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80491

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2019-11-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173
Bug 79173 depends on bug 80491, which changed state.

Bug 80491 Summary: [7 Regression] Compiler regression for long-add case.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80491

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2017-01-25 Thread vincent-gcc at vinc17 dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #4 from Vincent Lefèvre  ---
Also, make sure that the optimization is still done when a variable is a
constant or replaced by a constant (with Clang, the optimization is no longer
done in such a case).
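
A concrete instance of what this asks for (hypothetical test, derived from
sub2 in comment #0 by substituting the constant 1 for v0):

typedef unsigned long T;

/* Hypothetical variant of sub2 with v0 replaced by a constant: the
   sub/sbb/sbb selection should survive constant propagation as well.  */
void sub2_const (T *p, T u0, T u1, T u2, T v1, T v2)
{
  int b0, b1;

  p[0] = u0 - 1;
  b0 = u0 < 1;
  p[1] = u1 - v1 - b0;
  b1 = u1 < v1 || (u1 == v1 && b0 != 0);
  p[2] = u2 - v2 - b1;
}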

[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2017-01-23 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

Richard Biener  changed:

   What|Removed |Added

 Target||x86_64-*-*
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-01-23
Version|unknown |7.0
 Ever confirmed|0   |1
   Severity|normal  |enhancement

--- Comment #3 from Richard Biener  ---
Confirmed, the bug should possibly be split (Clang-compatible builtins,
target-specific detection of patterns, middle-end detection of patterns).  Note
a suitable middle-end representation would not have the address argument but
would likely return two values (via the usual _Complex int trick).
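
GCC already uses that convention for its overflow builtins, so the intended
shape can be sketched at the source level today (a sketch only, using the
existing __builtin_add_overflow; the function name is made up):

/* Sketch only: chaining __builtin_add_overflow to model add-with-carry as
   "value plus carry flag", roughly the two-value form a middle-end addc
   representation would take (no address argument).  */
unsigned long
addc_sketch (unsigned long a, unsigned long b,
             unsigned long carry_in, unsigned long *carry_out)
{
  unsigned long s, r;
  int c1 = __builtin_add_overflow (a, b, &s);
  int c2 = __builtin_add_overflow (s, carry_in, &r);

  *carry_out = c1 | c2;   /* at most one of the two additions can carry */
  return r;
}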

[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2017-01-21 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #2 from Marc Glisse  ---
(In reply to Marc Glisse from comment #1)
> Trying 23, 24 -> 25:
> Successfully matched this instruction:
> (parallel [
> (set (reg:SI 105)
> (plus:SI (plus:SI (ltu:SI (reg:CCC 17 flags)
> (const_int 0 [0]))
> (reg:SI 37 r8 [ c ]))
> (reg:SI 38 r9 [ d ])))
> (clobber (reg:CC 17 flags))
> ])
> Instruction not appropriate for target.

I didn't notice immediately, but apparently ix86_legitimate_combined_insn was
happy to let combine propagate the hard registers r8 and r9 into the simple
additions (*addsi_1, which has (match_operand:SWI48 2 "x86_64_general_operand"
"rme,re,0,le")), but then it won't accept them in the add-with-carry. My guess
would be that letting the hard registers into the additions is premature and
should be delayed until after combine. If I manually reject those in gdb, we do
produce addl+adcl.

[Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2017-01-21 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #1 from Marc Glisse  ---
We could start with the simpler:

void f(unsigned*__restrict__ r,unsigned*__restrict__ s,unsigned a,unsigned
b,unsigned c, unsigned d){
  *r=a+b;
  *s=c+d+(*r<a);
}

After combine, we have:

(insn 34 12 20 2 (set (reg:SI 93 [ _15+4 ])
(ltu:SI (reg:CCC 17 flags)
(const_int 0 [0]))) 608 {*setcc_si_1_movzbl}
 (expr_list:REG_DEAD (reg:CCC 17 flags)
(nil)))

(insn 21 20 22 2 (parallel [
(set (reg:SI 102)
(plus:SI (reg:SI 37 r8 [ c ])
(reg:SI 38 r9 [ d ])))
(clobber (reg:CC 17 flags))
]) "a.c":3 213 {*addsi_1}
 (expr_list:REG_DEAD (reg:SI 38 r9 [ d ])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_DEAD (reg:SI 37 r8 [ c ])
(nil)))))

(insn 25 24 26 2 (parallel [
(set (reg:SI 105)
(plus:SI (reg:SI 102)
(reg:SI 93 [ _15+4 ])))
(clobber (reg:CC 17 flags))
]) "a.c":3 213 {*addsi_1}
 (expr_list:REG_DEAD (reg:SI 93 [ _15+4 ])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_DEAD (reg:SI 102)
(nil)))))

The combine dump says "Trying 21, 34 -> 25:" but the next line is blank and it
moves on to trying something else.

If I use parentheses *s=c+(d+(*r<a)), I get

Trying 23, 24 -> 25:
Successfully matched this instruction:
(parallel [
(set (reg:SI 105)
(plus:SI (plus:SI (ltu:SI (reg:CCC 17 flags)
(const_int 0 [0]))
(reg:SI 37 r8 [ c ]))
(reg:SI 38 r9 [ d ])))
(clobber (reg:CC 17 flags))
])
Instruction not appropriate for target.

I don't know where that target restriction is coming from, but at least we seem
to be getting somewhere.

If I remove c and keep *s=d+(*r<a), I get

Failed to match this instruction:
(parallel [
(set (reg:SI 103)
(plus:SI (ltu:SI (reg:CCC 17 flags)
(const_int 0 [0]))
(reg:SI 38 r9 [ d ])))
(clobber (reg:CC 17 flags))
])
Failed to match this instruction:
(set (reg:SI 103)
(plus:SI (ltu:SI (reg:CCC 17 flags)
(const_int 0 [0]))
(reg:SI 38 r9 [ d ])))

we would probably need a special pattern for this case, virtually adding 0.