[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 --- Comment #29 from CVS Commits --- The releases/gcc-13 branch has been updated by Palmer Dabbelt : https://gcc.gnu.org/g:a252c73b1f2b920d8a2ff2e8ca59989e06652fbb commit r13-7220-ga252c73b1f2b920d8a2ff2e8ca59989e06652fbb Author: Palmer Dabbelt Date: Mon Apr 17 11:20:42 2023 -0700 RISC-V: Clean up the pr106602.c testcase The test case that was added is rv64i-specific, as there's better ways to generate this code on rv32i (where the long/int cast is a NOP) and on rv64i_zba (where we have word shifts). This renames the original test case and adds two more for those targets. gcc/testsuite/ChangeLog: PR target/106602 * gcc.target/riscv/pr106602.c: Moved to... * gcc.target/riscv/pr106602-rv64i.c: ...here. * gcc.target/riscv/pr106602-rv32i.c: New test. * gcc.target/riscv/pr106602-rv64i_zba.c: New test. (cherry picked from commit 8c010f6fe5ebe80d2e054b31e04ae0e9f12ae368)
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 --- Comment #28 from CVS Commits --- The trunk branch has been updated by Palmer Dabbelt : https://gcc.gnu.org/g:8c010f6fe5ebe80d2e054b31e04ae0e9f12ae368 commit r14-5-g8c010f6fe5ebe80d2e054b31e04ae0e9f12ae368 Author: Palmer Dabbelt Date: Tue Apr 11 11:04:56 2023 -0700 RISC-V: Clean up the pr106602.c testcase The test case that was added is rv64i-specific, as there's better ways to generate this code on rv32i (where the long/int cast is a NOP) and on rv64i_zba (where we have word shifts). This renames the original test case and adds two more for those targets. gcc/testsuite/ChangeLog: PR target/106602 * gcc.target/riscv/pr106602.c: Moved to... * gcc.target/riscv/pr106602-rv64i.c: ...here. * gcc.target/riscv/pr106602-rv32i.c: New test. * gcc.target/riscv/pr106602-rv64i_zba.c: New test.
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 Jeffrey A. Law changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #27 from Jeffrey A. Law --- Fixed on the trunk.
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 --- Comment #26 from CVS Commits --- The master branch has been updated by Jeff Law : https://gcc.gnu.org/g:2e886eef7f2b5aadb00171af868f0895b647c3a4 commit r13-4907-g2e886eef7f2b5aadb00171af868f0895b647c3a4 Author: Raphael Moreira Zinsly Date: Tue Dec 27 18:29:25 2022 -0500 RISC-V: Produce better code with complex constants [PR95632] [PR106602] gcc/Changelog: PR target/95632 PR target/106602 * config/riscv/riscv.md: New pattern to simulate complex const_int loads. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr95632.c: New test. * gcc.target/riscv/pr106602.c: New test.
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 --- Comment #25 from Jeffrey A. Law --- To outline what we were thinking. Yes, it's possible that 4->3 combinations aren't supported. I'd have to sit down with the combine sources to be sure. So the alternate approach we came up with was to mimick ZBS's ability to load up single bit constants with a define_insn_and_split, obviously splitting it into a constant load + shift unconditionally if it's still around post-combine. That's going to lead into a bigger question about constant loads. Right now we tend to break them down during expansion. The advantage of that is the components become CSE candidates. The downside is it will tend to inhibit instruction combination as we've seen in this case. I don't have a good answer on the best approach.
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 --- Comment #24 from Jeffrey A. Law --- Just a note. Raphael and I are going to poke at this from a different direction.
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 --- Comment #23 from Vineet Gupta --- (In reply to Jeffrey A. Law from comment #20) > Yea, I think so (3 shifts). Two for masking, one to put the bits in the > right position. Then we just have to figure out how to combine the initial > shift with the 3 for the masking and ultimately result with just two :-) Does combine handle 3 split outputs ? If I hack my define_split to only have 2, I can see it split + matching the 2 insn (final codegen is obviously bogus) Trying 7, 8, 9 -> 10: 7: r78:DI=0x1 8: r79:DI=r78:DI<<0x26 REG_DEAD r78:DI REG_EQUAL 0x40 9: r77:DI=r79:DI-0x40 REG_DEAD r79:DI REG_EQUAL 0x3fffc0 10: r75:DI=r76:DI:DI REG_DEAD r77:DI REG_DEAD r76:DI Failed to match this instruction: (set (reg:DI 75) (and:DI (reg:DI 76) (const_int 274877906880 [0x3fffc0]))) Splitting with gen_split_37 (riscv.md:2089) Successfully matched this instruction: (set (reg:DI 77) (lshiftrt:DI (reg:DI 76) <--- (const_int 6 [0x6]))) Successfully matched this instruction: (set (reg:DI 75) (ashift:DI (reg:DI 77) <--- (const_int 32 [0x20])))
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 --- Comment #22 from Vineet Gupta --- (In reply to Vineet Gupta from comment #21) > However > try_combine > recog_for_combine > recog_for_combine_1 > recog( ) > > is failing and we get "Failed to recognize..." False alarm: In a different successful split case, recog() does fail initially too. I see split_insns correctly returning the first shift-right. (insn 22 0 23 (set (reg:DI 77) (lshiftrt:DI (reg:DI 76) (const_int 6 [0x6]))) -1 (nil)) debugging
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 --- Comment #21 from Vineet Gupta --- I've been experimenting with (define_predicate "consecutive_bits_operand" (match_code "const_int") { unsigned HOST_WIDE_INT val = UINTVAL (op); if (exact_log2 ((val >> ctz_hwi (val)) + 1) < 0) return false; return true; }) (define_split [(set (match_operand:DI 0 "register_operand") (and:DI (match_operand:DI 1 "register_operand") (match_operand:DI 2 "consecutive_bits_operand"))) (clobber (match_operand:DI 3 "register_operand"))] "TARGET_64BIT" [(set (match_dup 3) (lshiftrt:DI (match_dup 1) (match_dup 4))) (set (match_dup 3) (ashift:DI (match_dup 3) (match_dup 5))) (set (match_dup 0) (lshiftrt:DI (match_dup 3) (match_dup 6)))] { unsigned HOST_WIDE_INT mask = UINTVAL (operands[2]); int r = ctz_hwi (mask); int l = clz_hwi (mask); operands[4] = GEN_INT (r); operands[5] = GEN_INT (r+l); operands[6] = GEN_INT (l); }) However try_combine recog_for_combine recog_for_combine_1 recog( ) is failing and we get "Failed to recognize..." (gdb) call debug_rtx(x1) (set (reg:DI 75) (and:DI (reg:DI 76) (const_int 274877906880 [0x3fffc0]))) I can't step through recog() cpp code since gen* insert #line for md file and gdb steps through unrelated md code. FWIW cc1 is being built with -g3 -O0
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 --- Comment #20 from Jeffrey A. Law --- Yea, I think so (3 shifts). Two for masking, one to put the bits in the right position. Then we just have to figure out how to combine the initial shift with the 3 for the masking and ultimately result with just two :-)
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 --- Comment #19 from Vineet Gupta --- (In reply to Jeffrey A. Law from comment #13) > Trying 7, 8, 9 -> 10: > 7: r140:DI=0x1 > 8: r141:DI=r140:DI<<0x26 > REG_DEAD r140:DI > REG_EQUAL 0x40 > 9: r139:DI=r141:DI-0x40 > REG_DEAD r141:DI > REG_EQUAL 0x3fffc0 >10: r137:DI=r138:DI:DI > REG_DEAD r139:DI > REG_DEAD r138:DI > Failed to match this instruction: > (set (reg:DI 137) > (and:DI (reg:DI 138) > (const_int 274877906880 [0x3fffc0]))) > > > That's what we're looking for. I think I had a wrong switch somewhere. > Match that with a define_split and you should be good to go. I experimented with a define_split and initially saw it generate sllia5,a0,32 srlia0,a5,6 but realized for both leading and trailing zeroes (either of arbitrary length) we'll need 3 shifts ?
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 --- Comment #18 from Vineet Gupta --- (In reply to Jeffrey A. Law from comment #17) > Vineet: I don't know your priorities and such, but I've got a junior > engineer starting in a bit under two weeks and this would actually be a good > issue for them to tackle as part of learning about GCC. So if you want to > set it aside, I can have someone poking at it fairly soon. Seem slike I've spent way too much time on this than warranted so might as well complete it :-)
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 --- Comment #17 from Jeffrey A. Law --- Vineet: I don't know your priorities and such, but I've got a junior engineer starting in a bit under two weeks and this would actually be a good issue for them to tackle as part of learning about GCC. So if you want to set it aside, I can have someone poking at it fairly soon.
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 --- Comment #16 from Andrew Pinski --- (In reply to Vineet Gupta from comment #14) > (In reply to Jeffrey A. Law from comment #13) > > Trying 7, 8, 9 -> 10: > > 7: r140:DI=0x1 > > 8: r141:DI=r140:DI<<0x26 > > REG_DEAD r140:DI > > REG_EQUAL 0x40 > > 9: r139:DI=r141:DI-0x40 > > REG_DEAD r141:DI > > REG_EQUAL 0x3fffc0 > >10: r137:DI=r138:DI:DI > > REG_DEAD r139:DI > > REG_DEAD r138:DI > > Failed to match this instruction: > > (set (reg:DI 137) > > (and:DI (reg:DI 138) > > (const_int 274877906880 [0x3fffc0]))) > > > > > > That's what we're looking for. I think I had a wrong switch somewhere. > > Match that with a define_split and you should be good to go. > > I think you are missing the original left shift 6. > So insn 6 from my dumps is important if we are to match > zero_extendsidi2_shifted which matches and+shift (iff 3>>2==0x). > > But it feels like you agree on using REG_EQUAL note (if avail) to created > merged pattern in try_combine() as opposed to my original thinking of only > doing it if regular pattern match failed. So what Jeff is saying is you just need a define_split which matches that final set. Something like: (define_split [(set (match_operand:DI 0 "register_operand" "") (and:SI (match_operand:SI 1 "register_operand" "") (match_operand:SI 2 "shifted_mask_operand" "")))] "" [(set (match_operand:DI 0
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 --- Comment #15 from Jeffrey A. Law --- THe hope is the shift 6 combines with the first shift you emit for (set (reg:DI 137) (and:DI (reg:DI 138) (const_int 274877906880 [0x3fffc0]))) Conceptually this is similar to creating bridge patterns for the combiner.
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 --- Comment #14 from Vineet Gupta --- (In reply to Jeffrey A. Law from comment #13) > Trying 7, 8, 9 -> 10: > 7: r140:DI=0x1 > 8: r141:DI=r140:DI<<0x26 > REG_DEAD r140:DI > REG_EQUAL 0x40 > 9: r139:DI=r141:DI-0x40 > REG_DEAD r141:DI > REG_EQUAL 0x3fffc0 >10: r137:DI=r138:DI:DI > REG_DEAD r139:DI > REG_DEAD r138:DI > Failed to match this instruction: > (set (reg:DI 137) > (and:DI (reg:DI 138) > (const_int 274877906880 [0x3fffc0]))) > > > That's what we're looking for. I think I had a wrong switch somewhere. > Match that with a define_split and you should be good to go. I think you are missing the original left shift 6. So insn 6 from my dumps is important if we are to match zero_extendsidi2_shifted which matches and+shift (iff 3>>2==0x). But it feels like you agree on using REG_EQUAL note (if avail) to created merged pattern in try_combine() as opposed to my original thinking of only doing it if regular pattern match failed.
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 --- Comment #13 from Jeffrey A. Law --- Trying 7, 8, 9 -> 10: 7: r140:DI=0x1 8: r141:DI=r140:DI<<0x26 REG_DEAD r140:DI REG_EQUAL 0x40 9: r139:DI=r141:DI-0x40 REG_DEAD r141:DI REG_EQUAL 0x3fffc0 10: r137:DI=r138:DI:DI REG_DEAD r139:DI REG_DEAD r138:DI Failed to match this instruction: (set (reg:DI 137) (and:DI (reg:DI 138) (const_int 274877906880 [0x3fffc0]))) That's what we're looking for. I think I had a wrong switch somewhere. Match that with a define_split and you should be good to go.
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 --- Comment #12 from Jeffrey A. Law --- But insns 6, 7 and 8 aren't important here. We have a REG_EQUAL on insn 9 which indicates that (reg:DI 77) has the value 0xffc0. So I would have expected combine to substitute that into the use of (reg:DI 77) in insn 10 which would give you a shot a recognizing the result is just a pair of shifts. But the REG_EQUAL doesn't seem to be used in that case. My insn #s are different, but here's the relevant part of the dump: Trying 9 -> 15: 9: r139:DI=r141:DI-0x40 REG_DEAD r141:DI REG_EQUAL 0x3fffc0 15: a0:DI=r138:DI:DI REG_DEAD r138:DI REG_DEAD r139:DI Failed to match this instruction: (set (reg/i:DI 10 a0) (and:DI (plus:DI (reg:DI 141) (const_int -64 [0xffc0])) (reg:DI 138))) BUt we know the (and (plus ...))) is just 0x3fffc0, what's not clear is why we don't use that information and try to recognize (set (reg:DI 10) (and (reg:DI 138) (const_int 0x3fffc0)) Which is just a pair of shifts.
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 --- Comment #11 from Vineet Gupta --- This definitely seems worth pursuing: For bitmanip, following four insn seq matches Trying 7, 8, 6 -> 9: 7: r78:DI=0x40 8: r77:DI=r78:DI-0x40 REG_EQUAL 0x3fffc0 6: r76:DI=r79:DI<<0x6 9: r75:DI=r76:DI:DI Successfully matched this instruction: (set (reg:DI 75) (and:DI (ashift:DI (reg:DI 79) For non-bitmanip, a similar 3 insn seq doesn't Trying 9, 6 -> 10: 9: r77:DI=r79:DI-0x40 REG_EQUAL 0x3fffc0 6: r76:DI=r80:DI<<0x6 10: r75:DI=r76:DI:DI Failed to match this instruction: (set (reg:DI 75) (and:DI (plus:DI (reg:DI 79) (const_int -64 [0xffc0])) (ashift:DI (reg:DI 80) (const_int 6 [0x6] If we re-ran last failing case with REQ_EQUAL note it would match.
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 --- Comment #10 from Vineet Gupta --- At expand time, RV backend was NOT creating REQ_EQUAL note for 1 << 38 (insn 8 7 9 2 (set (reg:DI 79) (ashift:DI (reg:DI 78) (const_int 38 [0x26]))) "../../../shifter.c":10:40 -1 However cse1 adds it anyways, so that was a false alarm. (insn 8 7 9 2 (set (reg:DI 79) (ashift:DI (reg:DI 78) (const_int 38 [0x26]))) "../../../../shifter.c":10:40 159 {ashldi3} (expr_list:REG_EQUAL (const_int 274877906944 [0x40]) So when entering combine, we have following (insn 6 3 7 2 (set (reg:DI 76) (ashift:DI (reg/v:DI 74 [ a ]) (const_int 6 [0x6]))) (insn 7 6 8 2 (set (reg:DI 78) (const_int 1 [0x1])) (insn 8 7 9 2 (set (reg:DI 79) (ashift:DI (reg:DI 78) (const_int 38 [0x26]))) (expr_list:REG_EQUAL (const_int 274877906944 [0x40]) (insn 9 8 10 2 (set (reg:DI 77) (plus:DI (reg:DI 79) (const_int -64 [0xffc0]))) (expr_list:REG_EQUAL (const_int 274877906880 [0x3fffc0]) (insn 10 9 15 2 (set (reg:DI 75) (and:DI (reg:DI 76) (reg:DI 77))) Obviously the regular code flow can't merge/match 5. The trailing note processing in combine replaces the pattern with available set REG_EQUAL but it only handles two instructions at a time. This issue needs it to handle three: insn 9, 10, 6 Perhaps we need to do the REQ_EQUAL note handling not seperately in the end, but as part of each of 2 insn combine / 3 insn combine / 4 insn combine (after the regular try_combine() fails for each of them ?
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 --- Comment #9 from Vineet Gupta --- (In reply to Jeffrey A. Law from comment #7) > There's some code in combine that's supposed to take advantage of REG_EQUAL > notes which is supposed to help with this kind of scenario. Digging into > that might help. Awesome, thx for the pointer Jeff. Indeed combine has a note processing fallback in the end. RV backend is got generating a note for the const split case (1 << 38). Let me see if I can fix that.
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 palmer at gcc dot gnu.org changed: What|Removed |Added CC||palmer at gcc dot gnu.org --- Comment #8 from palmer at gcc dot gnu.org --- (In reply to Jeffrey A. Law from comment #7) > There's some code in combine that's supposed to take advantage of REG_EQUAL > notes which is supposed to help with this kind of scenario. Digging into > that might help. IMO that's the right way to go here. I think anything we do in the RISC-V backend would likely just push around the problem: splitting early seems like generally the right thing to do, but we'll eventually trip up combine just by virtue of making instruction sequences longer. IIUC something like REG_EQUAL would allow us to keep both flavors around so something can sort it out later. That said, I've never really reached this deep into the middle end so it's all a bit over my head and I decided it'd be saner to just close the file and say nothing ;)
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 Jeffrey A. Law changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2022-11-01
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 Jeffrey A. Law changed: What|Removed |Added CC||law at gcc dot gnu.org --- Comment #7 from Jeffrey A. Law --- There's some code in combine that's supposed to take advantage of REG_EQUAL notes which is supposed to help with this kind of scenario. Digging into that might help.
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 --- Comment #6 from Vineet Gupta --- (In reply to Andrew Pinski from comment #4) > (In reply to Vineet Gupta from comment #3) > > Interestingly, if one builds for -march=rv64gc_zbs # single bit extension > > > > then the optimal code seq for bitmanip is generated, while no zbs > > instructions are used. > > > > foo2: > > sllia5,a0,32 > > srlia0,a5,26 > > ret > > Most likely because > lia5,1 > sllia5,a5,38 > Could be done using one instruction. > > And then combine could do its thing. But with multiple instructions, it > becomes harder to do. > I have not looked into rtl dumps but this is just knowing what combine can > do and such. Looks like our updates collided, indeed that 1 << 38 is the issue here.
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 --- Comment #5 from Vineet Gupta --- Gimple for the test is _1 = a_2(D) << 6; _3 = _1 & 274877906880; // 0x3f__ffc0 And 0x3f__ffc0 = 0x40__ - 0x40 For !TARGET_ZBA there's a combiner pattern to match the seq of instruction generated normally: define_insn_and_split "zero_extendsidi2_shifted" Trying 7, 8, 6 -> 9: 7: r78:DI=0x40 8: r77:DI=r78:DI-0x40 REG_DEAD r78:DI REG_EQUAL 0x3fffc0 6: r76:DI=r79:DI<<0x6 REG_DEAD r79:DI 9: r75:DI=r76:DI:DI Successfully matched this instruction: (set (reg:DI 75) (and:DI (ashift:DI (reg:DI 79) (const_int 6 [0x6])) (const_int 274877906880 [0x3fffc0]))) However for !bitmanip, RTL expansion splittable_const_int_operand() breaks up 0x40__ into 1 << 38 (insn 7 6 8 2 (set (reg:DI 78) (const_int 1 [0x1])) (insn 8 7 9 2 (set (reg:DI 79) (ashift:DI (reg:DI 78) (const_int 38 [0x26]))) So we end up with 5 tot insn, which combine can't And splittable_const_int_operand() has following check if (TARGET_64BIT && TARGET_ZBS && SINGLE_BIT_MASK_OPERAND (INTVAL (op))) return false; which explains why 1. zba alone doesn't generate slli.uw 2. zbs generates optimal slli+srli although these have nothing to do with zbs
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 --- Comment #4 from Andrew Pinski --- (In reply to Vineet Gupta from comment #3) > Interestingly, if one builds for -march=rv64gc_zbs # single bit extension > > then the optimal code seq for bitmanip is generated, while no zbs > instructions are used. > > foo2: > sllia5,a0,32 > srlia0,a5,26 > ret Most likely because li a5,1 sllia5,a5,38 Could be done using one instruction. And then combine could do its thing. But with multiple instructions, it becomes harder to do. I have not looked into rtl dumps but this is just knowing what combine can do and such.
[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602 Vineet Gupta changed: What|Removed |Added CC||jeffreyalaw at gmail dot com, ||vineetg at rivosinc dot com Summary|riscv: suboptimal codegen |riscv: suboptimal codegen |for shift left, right, left |for ||zero_extendsidi2_shifted ||w/o bitmanip --- Comment #3 from Vineet Gupta --- Interestingly, if one builds for -march=rv64gc_zbs # single bit extension then the optimal code seq for bitmanip is generated, while no zbs instructions are used. foo2: sllia5,a0,32 srlia0,a5,26 ret