[Bug target/95737] PPC: Unnecessary extsw after negative less than

2022-07-28 Thread guihaoc at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737

HaoChen Gui  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #11 from HaoChen Gui  ---
Fixed.

[Bug target/95737] PPC: Unnecessary extsw after negative less than

2022-05-17 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737

--- Comment #10 from CVS Commits  ---
The master branch has been updated by HaoChen Gui :

https://gcc.gnu.org/g:a174dc1a7f2bf0a71475ff633b130a60c0c3ff4a

commit r13-582-ga174dc1a7f2bf0a71475ff633b130a60c0c3ff4a
Author: Haochen Gui 
Date:   Wed May 11 09:19:52 2022 +0800

This patch adds a combine pattern for "CA minus one". The SImode "CA minus
one" can be converted to DImode as CA only has two values (0 or 1).

gcc/
PR target/95737
* config/rs6000/rs6000.md (*subfsi3_carry_in_xx_64): New.

gcc/testsuite/
PR target/95737
* gcc.target/powerpc/pr95737.c: New.

[Bug target/95737] PPC: Unnecessary extsw after negative less than

2022-01-14 Thread guihaoc at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737

--- Comment #9 from HaoChen Gui  ---
Add a pattern to convert the plus mode to DI. 

+(define_insn_and_split "*my_split"
+  [(set (match_operand:DI 0 "gpc_reg_operand")
+   (sign_extend:DI (plus:SI (match_operand:SI 1 "ca_operand")
+(const_int -1]
+  ""
+  "#"
+  ""
+  [(parallel [(set (match_dup 0)
+  (plus:DI (match_dup 2)
+   (const_int -1)))
+ (clobber (match_dup 2))])]
+{
+  operands[2] = copy_rtx (operands[1]);
+  PUT_MODE (operands[2], DImode);
+})

With the patch, the "extsw" could be optimized out. I compared the performance
between P8 code (with the patch) and P9 code. The performance of P9 is better. 
ISA says that computation with CA causes additional latency. It should be true.
The only concern is P9 code uses more register.

[Bug target/95737] PPC: Unnecessary extsw after negative less than

2022-01-13 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737

--- Comment #8 from Segher Boessenkool  ---
Somewhat more constructive...  The problem here is that neg isn't pushed
"through" isel insns.  This in general means you need to negate both inputs
to the isel of course, but there are cases where that is advantageous, like
here when both of those inputs are constants (or actually registers, but
set to some constant).

This is one reason the new set[n]bc[r] insns are so useful to us: it is all
one insn, also on RTL level, so it naturally works out in such cases.  It
also is slightly faster anyway of course, fewer insns and all that, even
if for dataflow it is all the same.

[Bug target/95737] PPC: Unnecessary extsw after negative less than

2022-01-13 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737

--- Comment #7 from Segher Boessenkool  ---
> Seems cmp+isel on P9 is sub-optimal.

For this particular test, perhaps.  But it is better overall, at least some
years ago.  It was benchmarked (with spec), on p9.

[Bug target/95737] PPC: Unnecessary extsw after negative less than

2022-01-05 Thread guihaoc at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737

HaoChen Gui  changed:

   What|Removed |Added

 CC||guihaoc at gcc dot gnu.org

--- Comment #6 from HaoChen Gui  ---
//source code
unsigned long long negativeLessThan(unsigned long long a, unsigned long long b)
{
   return -(a < b);
}

//P8 with -O2
subfc 4,4,3
subfe 3,3,3
extsw 3,3


//P9 with -O2
li 10,0
li 9,1
cmpld 0,3,4
isel 3,9,10,0
neg 3,3

Seems cmp+isel on P9 is sub-optimal.

[Bug target/95737] PPC: Unnecessary extsw after negative less than

2020-06-21 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737

Segher Boessenkool  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |segher at gcc dot 
gnu.org
 CC||segher at gcc dot gnu.org
 Target|powerpc-*-*-*   |powerpc64*-*-*
 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2020-06-21

--- Comment #5 from Segher Boessenkool  ---
In gimple this already is

negativeLessThan (long long unsigned int a, long long unsigned int b)
{
  _Bool _1;
  int _2;
  int _3;
  long long unsigned int _6;

  _1 = a_4(D) < b_5(D);
  _2 = (int) _1;
  _3 = -_2;
  _6 = (long long unsigned int) _3;
  return _6;
}

Then, it is expanded as a sign_extend:DI of a subreg:SI, and nowhere
does it see this isn't necessary (it isn't because that SI cannot be
negative).

The RTL code isn't optimised very well before combine, and that does

Trying 11 -> 12:
   11: {r128:SI=ca:SI-0x1;clobber ca:SI;}
  REG_UNUSED ca:SI
   12: r123:DI=sign_extend(r128:SI)
  REG_DEAD r128:SI
Failed to match this instruction:
(set (reg:DI 123)
(sign_extend:DI (plus:SI (reg:SI 98 ca [+4 ])
(const_int -1 [0x]

(note everything is made SImode in insn 11 before, it absorbed the
subreg).  Combine cannot keep track of known zero bits of hard regs
well, so it fails to see that XER[CA] is only ever 0 or 1 here (it
always is, but it doesn't know that either).

I'll try to add an extra pattern for this extend, that will do the
trick I think.

[Bug target/95737] PPC: Unnecessary extsw after negative less than

2020-06-21 Thread wschmidt at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737

--- Comment #4 from wschmidt at linux dot ibm.com ---
On 6/19/20 12:43 PM, jens.seifert at de dot ibm.com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737
>
> Jens Seifert  changed:
>
> What|Removed |Added
> 
>   Status|RESOLVED|UNCONFIRMED
>   Resolution|DUPLICATE   |---
>
> --- Comment #3 from Jens Seifert  ---
> This is different as the extsw also happens if the result gets used e.g.
> followed by a andc, which is my case. I obviously oversimplified the sample. 
> It
> has nothing to do with function result and ABI requirements. gcc assume that
> the result of -(a < b) implemented by subfc, subfe is signed 32-bit. But the
> result is already 64-bit.
>
> unsigned long long branchlesconditional(unsigned long long a, unsigned long
> long b, unsigned long long c)
> {
> unsigned long long mask = -(a < b);
> return c &~ mask;
> }
>
> results in
>
> _Z20branchlesconditionalyyy:
> .LFB1:
>  .cfi_startproc
>  subfc 4,4,3
>  subfe 3,3,3
>  not 3,3
>  extsw 3,3
>  and 3,3,5
>  blr
>
> expected
> subfc
> subfe
> andc
>
Thanks for verifying, Jens!

[Bug target/95737] PPC: Unnecessary extsw after negative less than

2020-06-19 Thread jens.seifert at de dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737

Jens Seifert  changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|DUPLICATE   |---

--- Comment #3 from Jens Seifert  ---
This is different as the extsw also happens if the result gets used e.g.
followed by a andc, which is my case. I obviously oversimplified the sample. It
has nothing to do with function result and ABI requirements. gcc assume that
the result of -(a < b) implemented by subfc, subfe is signed 32-bit. But the
result is already 64-bit.

unsigned long long branchlesconditional(unsigned long long a, unsigned long
long b, unsigned long long c)
{
   unsigned long long mask = -(a < b);
   return c &~ mask;
}

results in

_Z20branchlesconditionalyyy:
.LFB1:
.cfi_startproc
subfc 4,4,3
subfe 3,3,3
not 3,3
extsw 3,3
and 3,3,5
blr

expected
subfc
subfe
andc

[Bug target/95737] PPC: Unnecessary extsw after negative less than

2020-06-19 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737

Bill Schmidt  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Bill Schmidt  ---
If you can show this is different from 65010 (not a return value issue), please
reopen.

*** This bug has been marked as a duplicate of bug 65010 ***

[Bug target/95737] PPC: Unnecessary extsw after negative less than

2020-06-19 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737

--- Comment #1 from Bill Schmidt  ---
Please test this out of context of a return statement.  The problem with
unnecessary extends of return values is widely known and not specific to this
particular case.