[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 Jim Wilson changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED Assignee|unassigned at gcc dot gnu.org |wilson at gcc dot gnu.org --- Comment #27 from Jim Wilson --- Fixed on mainline and gcc-9 branch.
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 --- Comment #26 from Kito Cheng --- Author: kito Date: Thu Sep 19 06:38:23 2019 New Revision: 275929 URL: https://gcc.gnu.org/viewcvs?rev=275929&root=gcc&view=rev Log: RISC-V: Fix bad insn splits with paradoxical subregs. Shifting by more than the size of a SUBREG_REG doesn't work, so we either need to disable splits if an input is paradoxical, or else we need to generate a clean temporary for intermediate results. Jakub wrote the first version of this patch, so gets primary credit for it. gcc/ PR target/91635 * config/riscv/riscv.md (zero_extendsidi2, zero_extendhi2, extend2): Don't split if paradoxical_subreg_p (operands[0]). (*lshrsi3_zero_extend_3+1, *lshrsi3_zero_extend_3+2): Add clobber and use as intermediate value. gcc/testsuite/ PR target/91635 * gcc.c-torture/execute/pr91635.c: New test. * gcc.target/riscv/shift-shift-4.c: New test. * gcc.target/riscv/shift-shift-5.c: New test. Added: branches/gcc-9-branch/gcc/testsuite/gcc.c-torture/execute/pr91635.c branches/gcc-9-branch/gcc/testsuite/gcc.target/riscv/shift-shift-4.c branches/gcc-9-branch/gcc/testsuite/gcc.target/riscv/shift-shift-5.c Modified: branches/gcc-9-branch/gcc/ChangeLog branches/gcc-9-branch/gcc/config/riscv/riscv.md branches/gcc-9-branch/gcc/testsuite/ChangeLog
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 --- Comment #25 from Jim Wilson --- Author: wilson Date: Thu Sep 5 20:32:55 2019 New Revision: 275444 URL: https://gcc.gnu.org/viewcvs?rev=275444&root=gcc&view=rev Log: RISC-V: Fix bad insn splits with paradoxical subregs. Shifting by more than the size of a SUBREG_REG doesn't work, so we either need to disable splits if an input is paradoxical, or else we need to generate a clean temporary for intermediate results. Jakub wrote the first version of this patch, so gets primary credit for it. gcc/ PR target/91635 * config/riscv/riscv.md (zero_extendsidi2, zero_extendhi2, extend2): Don't split if paradoxical_subreg_p (operands[0]). (*lshrsi3_zero_extend_3+1, *lshrsi3_zero_extend_3+2): Add clobber and use as intermediate value. gcc/testsuite/ PR target/91635 * gcc.c-torture/execute/pr91635.c: New test. * gcc.target/riscv/shift-shift-4.c: New test. * gcc.target/riscv/shift-shift-5.c: New test. Added: trunk/gcc/testsuite/gcc.c-torture/execute/pr91635.c trunk/gcc/testsuite/gcc.target/riscv/shift-shift-4.c trunk/gcc/testsuite/gcc.target/riscv/shift-shift-5.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/riscv/riscv.md trunk/gcc/testsuite/ChangeLog
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 --- Comment #24 from Segher Boessenkool --- (clobber of a match_operand I mean, sigh).
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 --- Comment #23 from Segher Boessenkool --- If a splitter wants a new register during combine, it should do a match_scratch for that. This is documented. You do not normally get new registers during combine. combine cannot really handle those. combine however goes to a lot of effort moving existing REG_DEAD notes.
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 --- Comment #22 from Jim Wilson --- > First of all, I think you need to use :DI instead of :GPR for the last > define_split, as it shouldn't be iterated with. Yes, sloppy copy paste. I will fix before committing. > Second, doesn't this mean that the splitters will be matched then only > during combine and never afterwards? The define_splits were specifically written to match patterns created by combine that aren't valid instructions. So I think it is OK if they can only be matched during combine. My empirical results are good looking at libc and libstdc++ code size, I haven't seen any cases where I get worse code size by modifying them to use clobbers. > The can_create_pseudo_p () -> gen_reg_rtx (mode) way of adding intermediate > temporaries is used heavily in many other backends, just look at say i386, > rs6000 or aarch64 backends for examples (I've looked only at those 3 and > found many spots in each), so if missing REG_DEAD notes affect code quality, > perhaps it would be more useful to try to change the combiner to add those > notes (say before splitting remember highest pseudo number, after the > splitter see if we've added some new pseudos and by scanning the resulting > sequence see if we shouldn't add REG_DEAD notes for those somewhere, because > they shouldn't be used anywhere outside of the sequence. Yes, long term, I think it would be a good idea to improve combine to add REG_DEAD notes for temporary regs created by splitters. I don't plan to work on that as I have too much other stuff to do. I'd be willing to create a bug report if you want me to.
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 Jakub Jelinek changed: What|Removed |Added CC||segher at gcc dot gnu.org --- Comment #21 from Jakub Jelinek --- (In reply to Jim Wilson from comment #20) > Created attachment 46830 [details] > proposed patch to fix paradoxical reg in splitter problem > > I get better code size with this alternative patch. I added two more > testcases for the issues I found while evaluating this patch. There is no > change to the rv32/newlib libc/libstdc++ library code sizes. The rv64/linux > libc.so is now 94 bytes smaller. The libstdc++.so is 6 bytes larger but > text is 10 bytes smaller and rodata is 16 bytes larger. I'm not sure what > happened there, maybe alignment padding, but I can live with it. The text > section looks like the same code, but slightly better register allocation, > which allows a few more compressed instructions to be used. This has passed > rv32/newlib and rv64/linux cross toolchain build and check. The new > testcase from Jakub still works, along with my two new testcases. > > I tried adding instrumentation to the patch to abort if I ended up with a > paradoxical reg in the splitters, and it never triggered, so it looks like > combine is doing the right thing when allocating a reg for the clobber. First of all, I think you need to use :DI instead of :GPR for the last define_split, as it shouldn't be iterated with. Second, doesn't this mean that the splitters will be matched then only during combine and never afterwards? The can_create_pseudo_p () -> gen_reg_rtx (mode) way of adding intermediate temporaries is used heavily in many other backends, just look at say i386, rs6000 or aarch64 backends for examples (I've looked only at those 3 and found many spots in each), so if missing REG_DEAD notes affect code quality, perhaps it would be more useful to try to change the combiner to add those notes (say before splitting remember highest pseudo number, after the splitter see if we've added some new pseudos and by scanning the resulting sequence see if we shouldn't add REG_DEAD notes for those somewhere, because they shouldn't be used anywhere outside of the sequence.
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 --- Comment #20 from Jim Wilson --- Created attachment 46830 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46830&action=edit proposed patch to fix paradoxical reg in splitter problem I get better code size with this alternative patch. I added two more testcases for the issues I found while evaluating this patch. There is no change to the rv32/newlib libc/libstdc++ library code sizes. The rv64/linux libc.so is now 94 bytes smaller. The libstdc++.so is 6 bytes larger but text is 10 bytes smaller and rodata is 16 bytes larger. I'm not sure what happened there, maybe alignment padding, but I can live with it. The text section looks like the same code, but slightly better register allocation, which allows a few more compressed instructions to be used. This has passed rv32/newlib and rv64/linux cross toolchain build and check. The new testcase from Jakub still works, along with my two new testcases. I tried adding instrumentation to the patch to abort if I ended up with a paradoxical reg in the splitters, and it never triggered, so it looks like combine is doing the right thing when allocating a reg for the clobber.
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 --- Comment #19 from Jim Wilson --- I did cross toolchain builds and checks for rv32i/newlib and rv64gc/glibc. There were no regressions. Since the splitters exist to reduce code size, I looked at the libc and libstdc++ sizes with and without the patch. For the rv32i build, libstdc++ is the same size, and libc is 4 bytes smaller. I extracted a testcase. int isnanf (float x) { union u { float f; int i; } u; u.f = x; int i = u.i; i &= 0x7fff; return i > 0x7f80; } Without the patch with -O2 -S we get sllia0,a0,1 li a5,2139095040 srlia0,a0,1 sgt a0,a0,a5 ret with the patch we get li a5,-16777216 sllia0,a0,1 sgtua0,a0,a5 ret The issue here is that the extra temporary used for the splitter that generates the slli/srli gives combine a little more freedom, and it manages to find an optimization that was missed before. For the rv64gc build, libc is 38 bytes smaller and libstdc++ is 34 bytes larger. The libc issue appears to be the same as above. The libstdc++ issue is more interesting. I extracted a testcase. unsigned long sub (long l) { union u { struct s { int a : 19; unsigned int b : 13; int x; } s; long l; } u; u.l = l; return u.s.b; } Without the patch, with -O2 -S, we generate srliw a0,a0,19 ret and with the patch we get srlia0,a0,19 sllia0,a0,51 srlia0,a0,51 ret The issue here is that there is no REG_DEAD note for the temporary that we generated. So without the patch combine generates Successfully matched this instruction: (set (reg:DI 78) (zero_extract:DI (reg:DI 82) (const_int 13 [0xd]) (const_int 19 [0x13]))) and with the patch combine generates Failed to match this instruction: (parallel [ (set (reg:DI 78) (zero_extract:DI (reg:DI 82) (const_int 13 [0xd]) (const_int 19 [0x13]))) (set (reg:DI 83) (and:DI (ashift:DI (reg:DI 82) (const_int 32 [0x20])) (const_int -2251799813685248 [0xfff8]))) ]) where reg 83 is the temporary generated by the splitter which has no REG_DEAD note. Since the temporary appears to be generally useful, and lack of a REG_DEAD note causes problems, I think it would be better to modify the two define_split patterns to have a clobber instead of calling gen_reg_rtx. I do get the REG_DEAD note in this case, and the correct single instruction output for the second testcase. The three define_insn_and_split patterns can remain the same as they aren't causing any trouble. Though I suppose there is a question here of whether combine will ever accidentally substitute in a paradoxical reg for a clobber. I would hope that it doesn't. I can check for that. Since I have a new patch, and new testcases, I need to redo the builds and checks from scratch.
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 --- Comment #18 from Jim Wilson --- I was willing to write a patch, I just needed a day to catch up as I'm hopelessly overloaded with work. But since Jakub wrote one, it looks right to me, and I think we should use that. I'm not sure exactly what testing Kito did, and he has now left on vacation, so I will just do my own testing. I think the reason why RISC-V needs paradoxical_reg and other architectures don't is that most architectures have sense enough to define zero and sign extend instructions. RISC-V will finally get that when the draft B (bit manipulation) extension is formally approved.
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 --- Comment #17 from Kito Cheng --- Hi Jakub: Thanks your patch, I've run with gcc testsuite and no new fails, and I am ruining more gcc testsuite regression with different arch/abi combination for that. I am amazing that seems like RISC-V is first back-end use paradoxical_subreg_p :P Hi Jim: Basically my local patch is almost same as Jakub, so I think you can use Jakub's directly if you think that's OK.
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 --- Comment #16 from Jakub Jelinek --- Created attachment 46809 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46809&action=edit gcc10-pr91635.patch If you want full patch from me, here it is, but as I said I can't easily test it (I don't even have cross-glibc or qemu around to test if the testcase FAILs or PASSes). I agree the last splitter is changed just for consistency and it is unlikely it would hit and for the post-reload splitter it will be hard to come up with testcases. I've tried to put all 4 testcases from Zdenek into one.
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 --- Comment #15 from Jakub Jelinek --- Then gen_rtx_REG (mode, REGNO (SUBREG_REG (x))) case I meant only in case some splitter would be required for proper operation rather than just being an optimization. If they are all an optimization, the !paradoxical_subreg_p guards are probably ok.
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 --- Comment #14 from Zdenek Sojka --- Thank you all for looking into this! I can't provide testcases for other issues, if they exist at all, since I am hitting the first one too often. If you fix only the first issue, there is a chance the other "broken" patterns will be hit in the future; if such a testcase exists at all.
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 --- Comment #13 from Jim Wilson --- I see 5 broken patterns which matches the list already given. The four testcases are all triggering on the same splitter, which is the first define_split, lshrsi3_zero_extend_3+1. The second define_split, lshrsi3_zero_extend_3+2, requires a mask which has at least 32 zeros in the low bits. I don't think that this can occur with a paradoxical reg, because the only valid input bits are all undefined bits in the upper half of the paradoxical reg. We may as well fix it anyways though to be consistent, but this means writing a testcase appears to be impossible. The other three are post reload splitters, which will be hard to write a testcase for. So it looks like we only have one useful testcase here, but that is better than nothing. Kito and I can test RISC-V patches. Using SUBREG_REG would require generating a new reg rtx with the wider mode. May not be worth the trouble considering how uncommon that case will be. For the second define_split, the existing testcases that trigger it are unsigned long sub1 (unsigned long i) { return (i >> 32) << 32; } unsigned long sub2 (unsigned long i) { return (i >> 63) << 63; } Like I mentioned above, I don't see how that can be rewritten to use a paradoxical reg.
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 --- Comment #12 from Kito Cheng --- Hi Zdenek: I can reproduce for all new 3 testcases, it seems like cause by same issue, thanks you provide more testcase for that!
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 --- Comment #11 from Zdenek Sojka --- Created attachment 46803 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46803&action=edit another testcase
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 --- Comment #10 from Zdenek Sojka --- Created attachment 46802 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46802&action=edit another testcase
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 --- Comment #9 from Zdenek Sojka --- Created attachment 46801 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46801&action=edit another testcase
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 --- Comment #8 from Jakub Jelinek --- Certainly both register_operand predicate as well as the usual register constraints don't rule out subregs even after reload. can_create_pseudo_p () is false after reload of course. In theory, what you could do after reload if the operand is a paradoxical subreg would be just use SUBREG_REG in that case instead of the subreg itself.
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 --- Comment #7 from Kito Cheng --- Hi Jakub: > that doesn't mean paradoxical subregs can't appear there, just it will be > less likely. Does it mean paradoxical subregs will appear during intermediate result even after reload, so for such split pattern with smaller mode than word_mode we should check it's paradoxical subreg or not? Thanks
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 --- Comment #6 from Jakub Jelinek --- Other problematic splitters are zero_extendsidi2, zero_extendhi2, extend2, dunno if I haven't missed something else. While they are all post-reload splitters, that doesn't mean paradoxical subregs can't appear there, just it will be less likely.
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 --- Comment #5 from Kito Cheng --- Hi Zdenek: Could you share more testcase? I've a patch is based on Jakub's one. Thanks :)
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 --- Comment #4 from Zdenek Sojka --- (In reply to Jakub Jelinek from comment #3) > Looks like a backend bug to me. ... > fixes this, but I don't have a way to test riscv, plus the backend has > several similar problems elsewhere. I can provide several other testcases where code using shifts is miscompiled.
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 Jakub Jelinek changed: What|Removed |Added CC||andrew at sifive dot com, ||kito at gcc dot gnu.org, ||palmer at gcc dot gnu.org, ||wilson at gcc dot gnu.org --- Comment #3 from Jakub Jelinek --- Looks like a backend bug to me. Trying 12, 13 -> 14: 12: r85:SI=r72:DI#0<<0x1 REG_DEAD r72:DI 13: r76:DI=zero_extend(r85:SI#0) REG_DEAD r85:SI 14: r87:SI=r76:DI#0 0>>0x1 REG_DEAD r76:DI Failed to match this instruction: (set (subreg:DI (reg:SI 87) 0) (and:DI (reg:DI 72 [ _1 ]) (const_int 32767 [0x7fff]))) Splitting with gen_split_35 Successfully matched this instruction: (set (subreg:DI (reg:SI 87) 0) (ashift:DI (reg:DI 72 [ _1 ]) (const_int 49 [0x31]))) Successfully matched this instruction: (set (subreg:DI (reg:SI 87) 0) (lshiftrt:DI (subreg:DI (reg:SI 87) 0) (const_int 49 [0x31]))) That splitting through gen_split_35 is incorrect in reusing the paradoxical subreg destination for the intermediate result, because in a paradoxical subreg the high bits are undefined, while the splitter relies on them being defined. --- gcc/config/riscv/riscv.md.jj2019-07-08 23:52:53.0 +0200 +++ gcc/config/riscv/riscv.md 2019-09-02 17:18:26.909424288 +0200 @@ -1770,15 +1770,17 @@ [(set (match_operand:GPR 0 "register_operand") (and:GPR (match_operand:GPR 1 "register_operand") (match_operand:GPR 2 "p2m1_shift_operand")))] - "" - [(set (match_dup 0) + "can_create_pseudo_p () || !paradoxical_subreg_p (operands[0])" + [(set (match_dup 3) (ashift:GPR (match_dup 1) (match_dup 2))) (set (match_dup 0) - (lshiftrt:GPR (match_dup 0) (match_dup 2)))] + (lshiftrt:GPR (match_dup 3) (match_dup 2)))] { /* Op2 is a VOIDmode constant, so get the mode size from op1. */ operands[2] = GEN_INT (GET_MODE_BITSIZE (GET_MODE (operands[1])) - exact_log2 (INTVAL (operands[2]) + 1)); + operands[3] += can_create_pseudo_p () ? gen_reg_rtx (mode) : operands[0]; }) ;; Handle AND with 0xF...F0...0 where there are 32 to 63 zeros. This can be @@ -1787,13 +1789,15 @@ [(set (match_operand:DI 0 "register_operand") (and:DI (match_operand:DI 1 "register_operand") (match_operand:DI 2 "high_mask_shift_operand")))] - "TARGET_64BIT" - [(set (match_dup 0) + "TARGET_64BIT + && (can_create_pseudo_p () || !paradoxical_subreg_p (operands[0]))" + [(set (match_dup 3) (lshiftrt:DI (match_dup 1) (match_dup 2))) (set (match_dup 0) - (ashift:DI (match_dup 0) (match_dup 2)))] + (ashift:DI (match_dup 3) (match_dup 2)))] { operands[2] = GEN_INT (ctz_hwi (INTVAL (operands[2]))); + operands[3] = can_create_pseudo_p () ? gen_reg_rtx (DImode) : operands[0]; }) ;; fixes this, but I don't have a way to test riscv, plus the backend has several similar problems elsewhere.
[Bug target/91635] wrong code at -O2 with __builtin_add_overflow() and shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91635 Jakub Jelinek changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2019-09-02 Ever confirmed|0 |1