[Bug target/95737] PPC: Unnecessary extsw after negative less than
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737 HaoChen Gui changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #11 from HaoChen Gui --- Fixed.
[Bug target/95737] PPC: Unnecessary extsw after negative less than
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737 --- Comment #10 from CVS Commits --- The master branch has been updated by HaoChen Gui : https://gcc.gnu.org/g:a174dc1a7f2bf0a71475ff633b130a60c0c3ff4a commit r13-582-ga174dc1a7f2bf0a71475ff633b130a60c0c3ff4a Author: Haochen Gui Date: Wed May 11 09:19:52 2022 +0800 This patch adds a combine pattern for "CA minus one". The SImode "CA minus one" can be converted to DImode as CA only has two values (0 or 1). gcc/ PR target/95737 * config/rs6000/rs6000.md (*subfsi3_carry_in_xx_64): New. gcc/testsuite/ PR target/95737 * gcc.target/powerpc/pr95737.c: New.
[Bug target/95737] PPC: Unnecessary extsw after negative less than
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737 --- Comment #9 from HaoChen Gui --- Add a pattern to convert the plus mode to DI. +(define_insn_and_split "*my_split" + [(set (match_operand:DI 0 "gpc_reg_operand") + (sign_extend:DI (plus:SI (match_operand:SI 1 "ca_operand") +(const_int -1] + "" + "#" + "" + [(parallel [(set (match_dup 0) + (plus:DI (match_dup 2) + (const_int -1))) + (clobber (match_dup 2))])] +{ + operands[2] = copy_rtx (operands[1]); + PUT_MODE (operands[2], DImode); +}) With the patch, the "extsw" could be optimized out. I compared the performance between P8 code (with the patch) and P9 code. The performance of P9 is better. ISA says that computation with CA causes additional latency. It should be true. The only concern is P9 code uses more register.
[Bug target/95737] PPC: Unnecessary extsw after negative less than
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737 --- Comment #8 from Segher Boessenkool --- Somewhat more constructive... The problem here is that neg isn't pushed "through" isel insns. This in general means you need to negate both inputs to the isel of course, but there are cases where that is advantageous, like here when both of those inputs are constants (or actually registers, but set to some constant). This is one reason the new set[n]bc[r] insns are so useful to us: it is all one insn, also on RTL level, so it naturally works out in such cases. It also is slightly faster anyway of course, fewer insns and all that, even if for dataflow it is all the same.
[Bug target/95737] PPC: Unnecessary extsw after negative less than
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737 --- Comment #7 from Segher Boessenkool --- > Seems cmp+isel on P9 is sub-optimal. For this particular test, perhaps. But it is better overall, at least some years ago. It was benchmarked (with spec), on p9.
[Bug target/95737] PPC: Unnecessary extsw after negative less than
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737 HaoChen Gui changed: What|Removed |Added CC||guihaoc at gcc dot gnu.org --- Comment #6 from HaoChen Gui --- //source code unsigned long long negativeLessThan(unsigned long long a, unsigned long long b) { return -(a < b); } //P8 with -O2 subfc 4,4,3 subfe 3,3,3 extsw 3,3 //P9 with -O2 li 10,0 li 9,1 cmpld 0,3,4 isel 3,9,10,0 neg 3,3 Seems cmp+isel on P9 is sub-optimal.
[Bug target/95737] PPC: Unnecessary extsw after negative less than
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737 Segher Boessenkool changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |segher at gcc dot gnu.org CC||segher at gcc dot gnu.org Target|powerpc-*-*-* |powerpc64*-*-* Ever confirmed|0 |1 Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2020-06-21 --- Comment #5 from Segher Boessenkool --- In gimple this already is negativeLessThan (long long unsigned int a, long long unsigned int b) { _Bool _1; int _2; int _3; long long unsigned int _6; _1 = a_4(D) < b_5(D); _2 = (int) _1; _3 = -_2; _6 = (long long unsigned int) _3; return _6; } Then, it is expanded as a sign_extend:DI of a subreg:SI, and nowhere does it see this isn't necessary (it isn't because that SI cannot be negative). The RTL code isn't optimised very well before combine, and that does Trying 11 -> 12: 11: {r128:SI=ca:SI-0x1;clobber ca:SI;} REG_UNUSED ca:SI 12: r123:DI=sign_extend(r128:SI) REG_DEAD r128:SI Failed to match this instruction: (set (reg:DI 123) (sign_extend:DI (plus:SI (reg:SI 98 ca [+4 ]) (const_int -1 [0x] (note everything is made SImode in insn 11 before, it absorbed the subreg). Combine cannot keep track of known zero bits of hard regs well, so it fails to see that XER[CA] is only ever 0 or 1 here (it always is, but it doesn't know that either). I'll try to add an extra pattern for this extend, that will do the trick I think.
[Bug target/95737] PPC: Unnecessary extsw after negative less than
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737 --- Comment #4 from wschmidt at linux dot ibm.com --- On 6/19/20 12:43 PM, jens.seifert at de dot ibm.com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737 > > Jens Seifert changed: > > What|Removed |Added > > Status|RESOLVED|UNCONFIRMED > Resolution|DUPLICATE |--- > > --- Comment #3 from Jens Seifert --- > This is different as the extsw also happens if the result gets used e.g. > followed by a andc, which is my case. I obviously oversimplified the sample. > It > has nothing to do with function result and ABI requirements. gcc assume that > the result of -(a < b) implemented by subfc, subfe is signed 32-bit. But the > result is already 64-bit. > > unsigned long long branchlesconditional(unsigned long long a, unsigned long > long b, unsigned long long c) > { > unsigned long long mask = -(a < b); > return c &~ mask; > } > > results in > > _Z20branchlesconditionalyyy: > .LFB1: > .cfi_startproc > subfc 4,4,3 > subfe 3,3,3 > not 3,3 > extsw 3,3 > and 3,3,5 > blr > > expected > subfc > subfe > andc > Thanks for verifying, Jens!
[Bug target/95737] PPC: Unnecessary extsw after negative less than
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737 Jens Seifert changed: What|Removed |Added Status|RESOLVED|UNCONFIRMED Resolution|DUPLICATE |--- --- Comment #3 from Jens Seifert --- This is different as the extsw also happens if the result gets used e.g. followed by a andc, which is my case. I obviously oversimplified the sample. It has nothing to do with function result and ABI requirements. gcc assume that the result of -(a < b) implemented by subfc, subfe is signed 32-bit. But the result is already 64-bit. unsigned long long branchlesconditional(unsigned long long a, unsigned long long b, unsigned long long c) { unsigned long long mask = -(a < b); return c &~ mask; } results in _Z20branchlesconditionalyyy: .LFB1: .cfi_startproc subfc 4,4,3 subfe 3,3,3 not 3,3 extsw 3,3 and 3,3,5 blr expected subfc subfe andc
[Bug target/95737] PPC: Unnecessary extsw after negative less than
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737 Bill Schmidt changed: What|Removed |Added Resolution|--- |DUPLICATE Status|UNCONFIRMED |RESOLVED --- Comment #2 from Bill Schmidt --- If you can show this is different from 65010 (not a return value issue), please reopen. *** This bug has been marked as a duplicate of bug 65010 ***
[Bug target/95737] PPC: Unnecessary extsw after negative less than
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95737 --- Comment #1 from Bill Schmidt --- Please test this out of context of a return statement. The problem with unnecessary extends of return values is widely known and not specific to this particular case.