[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #39 from Levy --- Checked all pass from 250r.shorten_memrefs to 270r.ce2 In 269r.combine I saw the following combination merged the replaced address: --- modifying insn i327: r92:DI=r96:DI+0x300 REG_DEAD r96:DI deferring rescan insn with uid = 27. (!)allowing combination of insns 27 and 6 original costs 4 + 16 = 20 replacement costs 4 + 16 = 20 modifying insn i227: r92:DI=r96:DI+0x300 deferring rescan insn with uid = 27. modifying insn i3 6: r82:DI=sign_extend([r96:DI+0x320]) REG_DEAD r96:DI deferring rescan insn with uid = 6. (!)allowing combination of insns 27 and 8 original costs 4 + 16 = 20 replacement costs 4 + 16 = 20 modifying insn i227: r92:DI=r96:DI+0x300 deferring rescan insn with uid = 27. modifying insn i3 8: r84:DI=sign_extend([r96:DI+0x324]) REG_DEAD r96:DI deferring rescan insn with uid = 8. (!)allowing combination of insns 27 and 12 original costs 4 + 16 = 20 replacement costs 4 + 16 = 20 modifying insn i227: r92:DI=r96:DI+0x300 deferring rescan insn with uid = 27. modifying insn i312: r87:DI=sign_extend([r96:DI+0x328]) REG_DEAD r96:DI deferring rescan insn with uid = 12. (!)allowing combination of insns 27 and 16 original costs 4 + 16 = 20 replacement cost 16 deferring deletion of insn with uid = 27. modifying insn i316: r90:DI=sign_extend([r96:DI+0x32c]) REG_DEAD r96:DI deferring rescan insn with uid = 16. allowing combination of insns 18 and 19 --- Where in 268r.ud_dce both insns 27 are same (except for memory address): (insn 27 26 28 2 (set (reg:DI 10 a0) (lo_sum:DI (reg/f:DI 85) (symbol_ref/f:DI ("*.LC0") [flags 0x82] ))) "array_test.c":21:5 133 {*lowdi} (expr_list:REG_DEAD (reg/f:DI 85) (expr_list:REG_EQUAL (symbol_ref/f:DI ("*.LC0") [flags 0x82] ) (nil --- In 270r.combine (patched): (note 27 3 6 2 NOTE_INSN_DELETED) and following insns 768 + 32/36/40/44 were put back as: (insn 6 27 8 2 (set (reg:DI 82 [ MEM[(int *)array_5(D) + 800B] ]) (sign_extend:DI (mem:SI (plus:DI (reg:DI 96) (const_int 800 [0x320])) [1 MEM[(int *)array_5(D) + 800B]+0 S4 A32]))) "array_test.c":7:5 90 {extendsidi2} (insn 8 6 10 2 (set (reg:DI 84 [ MEM[(int *)array_5(D) + 804B] ]) (sign_extend:DI (mem:SI (plus:DI (reg:DI 96) (const_int 804 [0x324])) [1 MEM[(int *)array_5(D) + 804B]+0 S4 A32]))) "array_test.c":7:5 90 {extendsidi2} (insn 12 10 14 2 (set (reg:DI 87 [ MEM[(int *)array_5(D) + 808B] ]) (sign_extend:DI (mem:SI (plus:DI (reg:DI 96) (const_int 808 [0x328])) [1 MEM[(int *)array_5(D) + 808B]+0 S4 A32]))) "array_test.c":8:5 90 {extendsidi2} (insn 16 14 18 2 (set (reg:DI 90 [ MEM[(int *)array_5(D) + 812B] ]) (sign_extend:DI (mem:SI (plus:DI (reg:DI 96) (const_int 812 [0x32c])) [1 MEM[(int *)array_5(D) + 812B]+0 S4 A32]))) "array_test.c":9:5 90 {extendsidi2} (expr_list:REG_DEAD (reg:DI 96) (nil))) Maybe combine.c needs some modification?
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #38 from Levy --- Created attachment 49575 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49575=edit riscv-shorten-memrefs_V1.patch Did little bit change in get_si_mem_base_reg() and transform() Now for the same c input array_test.c int load1r (int *array) { int a = 0; a += array[200]; a += array[201]; a += array[202]; a += array[203]; return a; } int main () { int arr[300]= {0}; arr[200] = 15; arr[201] = -33; arr[202] = 7; arr[203] = -999; int b = load1r(arr); printf("Result: %d\n",b); return 0; } in loadlr, when put a debug_rtx(pat) after: (unpatched)XEXP (pat, i) = replace_equiv_address (mem, addr); or (patched)XEXP (XEXP (pat, I), 0) = replace_equiv_address(XEXP (mem, 0), addr); unpatched gcc will produce follwing insns: - (set (reg:SI 81 [ MEM[(int *)array_5(D) + 800B] ]) (mem:SI (plus:DI (reg:DI 88) (const_int 32 [0x20])) [1 MEM[(int *)array_5(D) + 800B]+0 S4 A32])) (set (reg:SI 82 [ MEM[(int *)array_5(D) + 804B] ]) (mem:SI (plus:DI (reg:DI 89) (const_int 36 [0x24])) [1 MEM[(int *)array_5(D) + 804B]+0 S4 A32])) (set (reg:SI 84 [ MEM[(int *)array_5(D) + 808B] ]) (mem:SI (plus:DI (reg:DI 90) (const_int 40 [0x28])) [1 MEM[(int *)array_5(D) + 808B]+0 S4 A32])) (set (reg:SI 86 [ MEM[(int *)array_5(D) + 812B] ]) (mem:SI (plus:DI (reg:DI 91) (const_int 44 [0x2c])) [1 MEM[(int *)array_5(D) + 812B]+0 S4 A32])) - patched gcc will produce follwing insns: - (set (reg:DI 82 [ MEM[(int *)array_5(D) + 800B] ]) (sign_extend:DI (mem:SI (plus:DI (reg:DI 92) (const_int 32 [0x20])) [1 MEM[(int *)array_5(D) + 800B]+0 S4 A32]))) (set (reg:DI 84 [ MEM[(int *)array_5(D) + 804B] ]) (sign_extend:DI (mem:SI (plus:DI (reg:DI 93) (const_int 36 [0x24])) [1 MEM[(int *)array_5(D) + 804B]+0 S4 A32]))) (set (reg:DI 87 [ MEM[(int *)array_5(D) + 808B] ]) (sign_extend:DI (mem:SI (plus:DI (reg:DI 94) (const_int 40 [0x28])) [1 MEM[(int *)array_5(D) + 808B]+0 S4 A32]))) (set (reg:DI 90 [ MEM[(int *)array_5(D) + 812B] ]) (sign_extend:DI (mem:SI (plus:DI (reg:DI 95) (const_int 44 [0x2c])) [1 MEM[(int *)array_5(D) + 812B]+0 S4 A32]))) - which the patched one looks ok for me, but the final assembly code still shows no change (both under -Os) Not quite sure where the problem is, I'll have a look near array_test.c.250r.shorten_memrefs tomorrow. Please ignore the coding style as it's just a debug patch
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #37 from Kito Cheng --- Maybe we could add a parameter to indicate the type of memory access, plain_mem, zext_mem or sext_mem for pass_shorten_memrefs::get_si_mem_base_reg. e.g. for (int i = 0; i < 2; i++) { rtx mem = XEXP (pat, i); rtx addr; mem_access_type_t type; if (get_si_mem_base_reg (mem, , )) { HOST_WIDE_INT regno = REGNO (XEXP (addr, 0)); /* Do not transform store zero as these cannot be compressed. */ if (i == 0) { if (XEXP (pat, 1) == CONST0_RTX (GET_MODE (XEXP (pat, 1 continue; } if (m->get_or_insert (regno) > 3) { addr = targetm.legitimize_address (addr, addr, GET_MODE (mem)); switch (type) { case plain_mem: XEXP (pat, i) = replace_equiv_address (mem, addr); break; case zext_mem: ... break; case sext_mem: ... break; } df_insn_rescan (insn); } }
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #36 from Levy --- It seems get_si_mem_base_reg() were called repeatly FOR_BB_INSNS from both pass_shorten_memrefs::analyze and pass_shorten_memrefs::transform Correct me if I'm wrong: It seems we need some data structure (a linked list should work) to store the zero/sign extend when we strip it off like: if (GET_CODE (mem) == ZERO_EXTEND || GET_CODE (mem) == SIGN_EXTEND) mem = XEXP (mem, 0); in each insns. Then in pass_shorten_memrefs::transform(), each time get_si_mem_base_reg() is called, we check whether if we need to put it back.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #35 from Jim Wilson --- By combine issue, are you referring to the regression I mentioned in comment 3 and filed as bug 97747? We can handle that as a separate issue. It should be uncommon. I expect to get much more benefit from this patch than the downside due to that combine issue. As for the shorten-memrefs problem, I didn't notice that one in my testing. It does need to be fixed. Taking a look now, it looks pretty simple to fix. The code currently looks for MEM, it needs to handle (SIGN_EXTEND (MEM)) and ((ZERO_EXTEND (MEM)). See the get_si_mem_base_reg function. You need to skip over the sign_extend or zero_extend when looking fot the mem at the first call. Then at the second call you need to be careful to put the sign_extend or zero_extend back when creating the new RTL. Maybe just another arg to get_si_mem_base so it can record the parent rtx code of the mem. Or maybe do this outside get_si_mem_base to skip over a sign/zero extend at the first call, and then do the same at the second call but record what rtx we skipped over so that we can put it back. We can either handle this here or as another patch. But since you have some time while waiting for paperwork maybe you can try writing a fix.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #34 from Levy --- (In reply to Jim Wilson from comment #33) > I did say that I'm willing to fix code style issues. All major software > projects I have worked with have coding style conventions. It makes it > easier to maintain a large software base when everything is using the same > coding style. We do have info to help you follow the GNU coding style. See > https://gcc.gnu.org/wiki/FormattingCodeForGCC > which has emacs and vim settings to get the right code style. No problem and thank you, Jim, I'll try to catch up the coding style. what about the combine issue and shorten-memrefs? Shall we fix it here or someplace else?
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #33 from Jim Wilson --- I did say that I'm willing to fix code style issues. All major software projects I have worked with have coding style conventions. It makes it easier to maintain a large software base when everything is using the same coding style. We do have info to help you follow the GNU coding style. See https://gcc.gnu.org/wiki/FormattingCodeForGCC which has emacs and vim settings to get the right code style.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 Levy changed: What|Removed |Added Attachment #49542|0 |1 is obsolete|| --- Comment #32 from Levy --- Created attachment 49543 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49543=edit QI/HI/SImode auto Zero/Sign-extend Added one missing space at the end of the comment. Added one tab before each brace. Replace all tabs with space (come on)
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 Levy changed: What|Removed |Added Attachment #49536|0 |1 is obsolete|| --- Comment #31 from Levy --- Created attachment 49542 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49542=edit QI/HI/SImode auto Zero/Sign-extend Much appreciated Jim. The new patch should solve the format issue by adding the same tabs on each line. In the meanwhile I'll try to patch the pass_shorten_memrefs::analyze() in riscv-shorten-memrefs.c Any idea on the combiner issue?
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #30 from Jim Wilson --- Looking at your v2 patch, the first verison fails because you are passing mismatched modes to emit_move_insn. The version with gen_lowpart solves that problem, but fails because of infinite recursion. The v4 patch looks good. There are some coding style issues, but I can always fix them for you. There should be a space before open paren. The first && should line up with the last two. The braces should be indented two more spaces. The comment needs to end with a period and two spaces. In the comment, instead of "Separate ... to ..." I'd say "Expand ... to ..." or maybe "Emit ... as ...". Now we need the copyright assignment paperwork.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 Levy changed: What|Removed |Added Attachment #49534|0 |1 is obsolete|| --- Comment #29 from Levy --- Created attachment 49536 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49536=edit QI/HI/SImode auto Zero/Sign-extend Finally, make gen_extend_insn() seems to work with auto-extension based on Jim and Kito's idea. Just 10 lines of code at the beginning of riscv_legitimize_move() in riscv-gcc/gcc/config/riscv.c if (GET_MODE_CLASS (mode) == MODE_INT && GET_MODE_SIZE (mode) < UNITS_PER_WORD && can_create_pseudo_p() && MEM_P (src)) { rtx temp_reg = gen_reg_rtx (word_mode); int zero_sign_extend = (LOAD_EXTEND_OP (mode) == ZERO_EXTEND); emit_insn(gen_extend_insn(temp_reg, src, word_mode, mode, zero_sign_extend)); riscv_emit_move(dest, gen_lowpart(mode, temp_reg)); return true; } Haven't make report-gcc, but already passed 2-stage make. I'll have a test later.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 Levy changed: What|Removed |Added Attachment #49533|0 |1 is obsolete|| --- Comment #28 from Levy --- Created attachment 49534 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49534=edit V4 patch for QI/HI/SI/DI zero/sign-extend for RV32/64/128 Suggest by Kito, The V4 patch moved the gen_extend_insn_auto() to riscv.c and was included in riscv-protos.h since it handles riscv backend only.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #27 from Levy --- (In reply to Kito Cheng from comment #25) > Seem like you have add code to gcc/optabs.h and gcc/optabs.c, however those > functions are RISC-V specific, so I would suggest you put in riscv.c and > riscv-protos.h. No problem, I'll make a new patch.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 Levy changed: What|Removed |Added Attachment #49532|0 |1 is obsolete|| --- Comment #26 from Levy --- Created attachment 49533 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49533=edit QI/HI/SI/DI zero/sign-extend for RV32/64/128 BUG fix for unimplemented code
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #25 from Kito Cheng --- Seem like you have add code to gcc/optabs.h and gcc/optabs.c, however those functions are RISC-V specific, so I would suggest you put in riscv.c and riscv-protos.h.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 Levy changed: What|Removed |Added Attachment #49524|0 |1 is obsolete|| --- Comment #24 from Levy --- Created attachment 49532 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49532=edit QI/HI/SI/DI zero/sign-extend for RV32/64/128 Rewrote the third proposed patch.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 Levy changed: What|Removed |Added Attachment #49470|0 |1 is obsolete|| Attachment #49500|0 |1 is obsolete|| --- Comment #23 from Levy --- Created attachment 49524 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49524=edit Third test patch While I'm waiting for a solution, I've reused my second patch and made a new patch. Third test patch adds one extra function gen_extend_insn_auto() in optabs.c/h then just called by riscv_legitimize_move() Now it can emit sign/unsigned-extend insn automatically. Still haven't solved the shorten-memrefs issue. So it will still raise 2 error when make report-gcc So the -Os option (size optimization) may not behave as expected.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #22 from Levy --- Under condition if (GET_MODE_CLASS (mode) == MODE_INT && GET_MODE_SIZE (mode) < UNITS_PER_WORD && can_create_pseudo_p() && MEM_P (src)) with var: rtx temp_reg; int extend = (LOAD_EXTEND_OP (mode) == ZERO_EXTEND); I've tried the combination of: gen_extend_insn (temp_reg, force_reg (mode, src), word_mode, mode, extend); gen_extend_insn (temp_reg, force_reg (word_mode, src), word_mode, word_mode, extend); gen_extend_insn (temp_reg, src, word_mode, mode, extend); with: riscv_emit_move(dest, gen_lowpart (mode, temp_reg)); riscv_emit_move(dest, force_reg(mode, temp_reg)); then return true All raised segfault during make gcc. For example: if (GET_MODE_CLASS (mode) == MODE_INT && GET_MODE_SIZE (mode) < UNITS_PER_WORD && can_create_pseudo_p() && MEM_P (src)) { rtx temp_reg; int extend = (LOAD_EXTEND_OP (mode) == ZERO_EXTEND); gen_extend_insn (temp_reg, force_reg (mode, src), word_mode, mode, extend); riscv_emit_move(dest, force_reg(mode, temp_reg)); return true; } At beginning of riscv_legitimize_move()
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #21 from Jim Wilson --- I submitted my testcase as 97747 so it will get more attention.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #20 from Jim Wilson --- Maybe convert_to_mode is recursively calling convert_to_mode until you run out of stack space. You can use --save-temps to generate a .i preprocessed file form your input testcasxe, then run cc1 under gdb. You need to give the default arch/abi options or you will get an error (gdb) run -march=rv64gc -mabi=lp64d -O2 tmp.i when look at the backtrace when you hit the segfault. There are other ways to get the zero extend we need here. We could just generate the RTL for the insn directly instead of calling the gen_zero_extendXiYi2 functions because we know that they exist and what form they are in. This is inconvenient though. We could try querying the optabs table to get the right insn code. There is a gen_extend_insn function that looks like it would work and is more direct than convert_to_mode. gen_extend_insn does require that the input is in a form that the convert will accept, so it must be forced to a register if it isn't already a register. Another solution would be to use one of the rtx simplier functions, e.g. you can do simplify_gen_unary (ZERO_EXTEND, word_mode, src, mode); but the simplify functions are really for simplifying complex RTL to simpler RTL, so I think not the right approach here. I think gen_extend_insn is the best approach if we can't use convert_to_mode.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #19 from Levy --- Also tested code without int extend, just zero-extend with unsignedp set to 1, Still seg fault. if (GET_MODE_CLASS (mode) == MODE_INT && GET_MODE_SIZE (mode) < UNITS_PER_WORD && can_create_pseudo_p() && MEM_P (src)) { rtx temp_reg = force_reg (word_mode, convert_to_mode (word_mode, src, 1)); riscv_emit_move(dest, temp_reg); return true; }
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #18 from Levy --- if (GET_MODE_CLASS (mode) == MODE_INT && GET_MODE_SIZE (mode) < UNITS_PER_WORD && can_create_pseudo_p() && MEM_P (src)) { int extend = (LOAD_EXTEND_OP (mode) == ZERO_EXTEND); rtx temp_reg = force_reg (word_mode, convert_to_mode (word_mode, src, extend)); riscv_emit_move(dest, temp_reg); return true; } tried to insert code at the beginning of riscv_legitimize_move() but seems convert_to_mode() raised seg fault druing make
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #17 from Jim Wilson --- Yes, LOAD_EXTEND_OP is a good suggestion. We can maybe do something like int extend = (LOAD_EXTEND_OP (mode) == ZERO_EXTEND); temp_reg = force_reg (word_mode, convert_to_mode (word_mode, src, extend));
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #16 from Kito Cheng --- > Or maybe we make the choice of zero-extend or sign-extend depend on the mode, > and use zero-extend for smaller than SImode and sign-extend for SImode and > larger. Maybe depend on LOAD_EXTEND_OP?
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #15 from Jim Wilson --- I tried testing your first patch. I just built unpatched/patch rv32-elf/rv64-linux toolchains and used nm/objdump to look at libraries like libc, libstdc++, libm. I managed to find a testcase from glibc that gives worse code with the patch. struct { unsigned int a : 1; unsigned int b : 1; unsigned int c : 1; unsigned int d : 1; unsigned int pad1 : 28; } s; void sub (void) { s.a = 1; s.c = 1; } Without the patch we get sub: lui a5,%hi(s) addia5,a5,%lo(s) lbu a4,1(a5) ori a4,a4,5 sb a4,1(a5) ret and with the patch we get sub: lui a4,%hi(s) lbu a5,%lo(s)(a4) andia5,a5,-6 ori a5,a5,5 sb a5,%lo(s)(a4) ret Note the extra and instruction. This seems to be a combine problem. With the patched compiler, I see in the -fdump-rtl-combine-all dump file Trying 9 -> 11: 9: r79:DI=r78:DI&0xfffa REG_DEAD r78:DI 11: r81:DI=r79:DI|0x5 REG_DEAD r79:DI Failed to match this instruction: (set (reg:DI 81) (ior:DI (and:DI (reg:DI 78 [ MEM [(struct *)] ]) (const_int 250 [0xfa])) (const_int 5 [0x5]))) Combine knows that reg 78 only has 8 nonzero bits, so it simplified the AND -6 to AND 250. If combine had left that constant alone, or if maybe combine propagated that info about nonzero bits through to r81, then it should have been able to optimize out the AND operation. This does work when the load does not zero extend by default. The ARM port shows the exact same problem. I see sub: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. movwr2, #:lower16:.LANCHOR0 movtr2, #:upper16:.LANCHOR0 ldrbr3, [r2]@ zero_extendqisi2 bic r3, r3, #5 orr r3, r3, #5 strbr3, [r2] bx lr and the bic (bit clear) is obviously unnecessary. This probably should be submitted as a separate bug if we don't want to fix it here.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #14 from Jim Wilson --- Actually, for the SImode to DImode conversion, zero extending is unlikely to be correct in that case. The rv64 'w' instructions require the input to be sign-extended from 32-bits to 64-bits, so a sign-extend is probably required. There will be a similar consideration for rv128 when we get to that. So maybe we should only handle QImode and HImode here. Or maybe we make the choice of zero-extend or sign-extend depend on the mode, and use zero-extend for smaller than SImode and sign-extend for SImode and larger. For qimode, char is unsigned by default, so zero extension is likely the right choice. For himode, it isn't clear which is best, but the arm port does a zero extend. Also, the Huawei code size proposal says that zero exnteded byte and short loads are more important than sign extended byte and short load, so that is more evidence that zero extend is more useful in those cases.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #13 from Jim Wilson --- The attachments show the entire riscv.c file being deleted and then readded with your change. This is due to a line ending problem. The original file has the unix style linefeed and the new file has the windows style carriage return and linefeed. Git has a setting called core.autocrlf that can help with this. This will require using git diff instead of just diff though to generate patche, but should give better diffs. Or alternatively, maybe to can force the original file to have msdos line endings before you make the diff, e.g. maybe just loading the original file into your editor and saving it without making changes will fix the line endings. You have in the second patch (mode == QImode || mode == SImode || mode == HImode) which is wrong but harmless for rv32 since we can't extend SImode, and is also wrong for the eventual rv128 support. You can fix this by using something like (GET_MODE_CLASS (mode) == MODE_INT && GET_MODE_SIZE (mode) < UNITS_PER_WORD There is one place in riscv_legitimize_move that already uses this. The code inside the if statement is a lot more verbose then necessary. You can use some helper functions to simplify it. First you can use word_mode which is DImode for rv64 and SImode for rv32 (and will be OImode for rv128). Then you can call a helper to do the mode conversion. So for instance something like temp_reg = force_reg (word_mode, convert_to_mode (word_mode, src, 1)); should work. That should emit an insn to do the zero extend and put it in a reg. Now you no longer need to check src mode or TARGET_64BIT as the code is the same in all cases. So it should just be about 3 or 4 lines of code for the body of the if statement. You have a check for REG_P (dest). This is stricter than you need, since it only works for REG and doesn't accept SUBREG. We should handle both. Also, it isn't hard to also handle non-reg or non-subreg dests. You just need to force the source to a reg, and you already did that when you generated the zero_extend operation. So this code looks like it should work for any dest and you should be able to drop the REG_P (dest) check.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #12 from Levy --- (In reply to Kito Cheng from comment #11) > > Two failed cases: shorten-memrefs-5.c & shorten-memrefs-6.c > > Seems like shorten_memrefs pass didn't handle zero_extend and sign_extend > with memory. > > You can take a look into > riscv-shorten-memrefs.c:pass_shorten_memrefs::analyze and add logic to > handle zero_extend and sign_extend. > > > > With one instruction less, the patched one seems right and even faster to > > me. However we still need a test on sign extend and check performance > > shorten_memrefs is optimize for size, the idea is transform several load > instructions with large immediate to a load immediate instruction and load > instructions with small immediate, to use C-extensions instruction as > possible. > > so the instruction count seems increased, but the code size is smaller. Thank you cheng, I'll have a look.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 Kito Cheng changed: What|Removed |Added CC||kito at gcc dot gnu.org --- Comment #11 from Kito Cheng --- > Two failed cases: shorten-memrefs-5.c & shorten-memrefs-6.c Seems like shorten_memrefs pass didn't handle zero_extend and sign_extend with memory. You can take a look into riscv-shorten-memrefs.c:pass_shorten_memrefs::analyze and add logic to handle zero_extend and sign_extend. > With one instruction less, the patched one seems right and even faster to me. > However we still need a test on sign extend and check performance shorten_memrefs is optimize for size, the idea is transform several load instructions with large immediate to a load immediate instruction and load instructions with small immediate, to use C-extensions instruction as possible. so the instruction count seems increased, but the code size is smaller.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #10 from Levy --- Created attachment 49500 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49500=edit Optimzation Patch for QI/HImode(32bit) and QI/HI/SImode(64bit) Proposing second patch for QI/HImode(32bit) and QI/HI/SImode(64bit) Both Zero-Extend & Subreg Tested with make report-gcc Two failed cases: shorten-memrefs-5.c & shorten-memrefs-6.c Both were failed due to dejaGNU rule: /* { dg-final { scan-assembler "load1r:\n\taddi" } } */ But if we look at the assembly code, for same input in both file: int load1r (int *array) { int a = 0; a += array[200]; a += array[201]; a += array[202]; a += array[203]; return a; } Current gcc risc-v port will produce: load1r: addia5,a0,768 lw a4,36(a5) lw a0,32(a5) addwa0,a0,a4 lw a4,40(a5) addwa4,a4,a0 lw a0,44(a5) addwa0,a0,a4 ret Patched new port will produce: load1r: lwu a4,800(a0) lwu a5,804(a0) addwa5,a5,a4 lwu a4,808(a0) lwu a0,812(a0) addwa5,a5,a4 addwa0,a5,a0 ret With one instruction less, the patched one seems right and even faster to me. However we still need a test on sign extend and check performance Test case and performance evaluation will be provided later (hopefully)
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #9 from Levy --- Thanks Jim. See u on Monday.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #8 from Levy --- Created attachment 49470 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49470=edit optimization fix for BUG 97417 proposing a temp patch here in case someone needs it, then I'll submit a full patch with test case later. Following code was added to the riscv_legitimize_move () in the riscv-gcc/gcc/config/riscv/riscv.c if(mode == QImode && MEM_P (src) && REG_P (dest) && can_create_pseudo_p()) { rtx temp_reg; if (TARGET_64BIT) { temp_reg = gen_reg_rtx (DImode); emit_insn(gen_zero_extendqidi2(temp_reg, src)); } else { temp_reg = gen_reg_rtx (SImode); emit_insn(gen_zero_extendqisi2(temp_reg, src)); } riscv_emit_move(dest, gen_lowpart(QImode,temp_reg)); return true; } Tested with make report-gcc, haven't done the regression/performance test yet: = Summary of gcc testsuite = | # of unexpected case / # of unique unexpected case | gcc | g++ | gfortran | rv64imafdc/ lp64d/ medlow |0 / 0 |0 / 0 | - |
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #7 from Jim Wilson --- That patch is basically correct. I would suggest using gen_lowpart instead of gen_rtx_SUBREG as a minor cleanup. It will do the same thing, and is shorter and easier to read. There is one problem here that you can't generate new pseudo registers during register allocation, or after register allocation is complete. So you need to disable this optimization in this case. You can do that by adding a check for can_create_pseudo_p (). This is already used explicitly in one place in riscv_legitimize_move and implicitly in some subfunctions, and is used in the arm.md movqi pattern. The next part is testing the patch. We need some correctness testing. You can just run the gcc testsuite for that. And we need some code size/performance testing. I'd suggest compiling some code with and without the patch and check function sizes and look for anything that got bigger with the patch, and check to see if it is a problem. I like to use the toolchain libraries like libc.a and libstdc++.a since they are being built anways, but using a nice benchmark is OK also as long as it is big enough to stress the compiler. If the patch passes testing, then we can look at expanding the scope to handle more modes, and also handle MEM dest in addition to REG dest. Yes, we can discuss this Monday.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 Levy changed: What|Removed |Added CC||admin at levyhsu dot com --- Comment #6 from Levy --- Hi Jim Levy from StarFive. Adding following code to the head of riscv_legitimize_move() according to your reply seems to solve the problem: if(mode == QImode && MEM_P (src) && REG_P (dest)) { rtx temp_reg; if (TARGET_64BIT) { temp_reg = gen_reg_rtx (DImode); emit_insn(gen_zero_extendqidi2(temp_reg, src)); } else { temp_reg = gen_reg_rtx (SImode); emit_insn(gen_zero_extendqisi2(temp_reg, src)); } riscv_emit_move(dest, gen_rtx_SUBREG(QImode,temp_reg,0)); return true; } same foo.c will produce: foo: lui a5,%hi(active) lbu a5,%lo(active)(a5) li a0,42 bne a5,zero,.L6 ret .L6: li a0,-42 ret .size foo, .-foo .ident "GCC: (GNU) 10.2.0" Not sure if I'm doing it right, especially for 64bit DImode because I've only been with gcc for a month. Just wonder if you have time after Monday's compiler meeting so we may discuss movsi, movhi and MEM to MEM copy.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #5 from Jim Wilson --- Yes, the volatile is the problem. We need to disable some optimizations like the combiner to avoid breaking the semantics of volatile. However, if you try looking at other ports, like arm, you can see that they don't have this problem because they generate different RTL at the start and hence do not need the combiner to generate the sign-extended load. So the proposal here is that we modify the RISC-V gcc port to also emit the sign-extended load at RTL generation time, which solves this problem. And then we need to do some testing to make sure that this actually generates good code in every case, as we don't want to accidentally introduce a code size or performance regression while fixing this volatile optimization problem. If you are curious about the combiner issue, see the init_recog_no_volatile call in combine.c. If you comment that out, the andi will be optimized away. But we can't remove that call, because that would break programs using volatile.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #4 from jiawei --- I had did some tests with this problem and find: foo.c #include extern volatile bool active; int foo(void) { if (!active) { return 42; } else { return -42; } } code generated in foo.s foo: lui a5,%hi(active) lbu a5,%lo(active)(a5) li a0,42 andia5,a5,0xff beq a5,zero,.L2 li a0,-42 When we remove the keyword `volatile` foo_without_volatile.c #include extern bool active; int foo(void) { if (!active) { return 42; } else { return -42; } } code generated in foo_without_volatile.s foo: lui a5,%hi(active) lbu a5,%lo(active)(a5) li a0,42 beq a5,zero,.L2 li a0,-42 and then we change the type from `bool` to `int` foo_int.c #include extern volatile int active; int foo(void) { if (!active) { return 42; } else { return -42; } } code generated in foo_int.s foo: lui a5,%hi(active) lw a5,%lo(active)(a5) li a0,42 sext.w a5,a5 beq a5,zero,.L2 li a0,-42 the `sext.w` instruction replace the `andi` We also remove the keyword `volatile` in foo_int_without_volatile.c #include extern int active; int foo(void) { if (!active) { return 42; } else { return -42; } } code generated in foo_int_without_volatile.s also look like optimized foo: lui a5,%hi(active) lw a5,%lo(active)(a5) li a0,42 beq a5,zero,.L2 li a0,-42 Maybe this problem is due to the keyword `volatile`.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 --- Comment #3 from Jim Wilson --- The basic idea here is that the movqi pattern in riscv.md currently emits RTL for a load that looks like this (set (reg:QI target) (mem:QI (address))) As an experiment, we want to try changing that to something like this (set (reg:DI temp) (zero_extend:DI (mem:DI (address (set (reg:QI target) (subreg:QI (reg:DI temp) 0)) The hope is that the optimizer will combine the subreg with following operations resulting in smaller faster code at the end. And this should also solve the volatile load optimization problem. So we need a patch, and then we need experiments to see if the patch actually produces better code on real programs. It should be fairly easy to write the patch even if you don't have any gcc experience. The testing part of this is probably more work than the patch writing. The movqi pattern calls riscv_legitmize_move in riscv.c, so that would have to be modified to look for qimode loads from memory, allocate a temporary register, do a zero_extending load into the temp reg, and then a subreg copy into the target register. You will probably also need to handle cases where both the target and source are memory locations, in which case this already gets split into two instructions, a load followed by a store. You can look at the movqi pattern in arm.md file to see an example of how to do this, where it calls gen_zero_extendqisi2. Though for RISC-V, we would want gen_zero_extendqidi2 for rv64 and gen_zero_extendqisi2 for rv32. If the movqi change works, then we would want similar changes for movhi and maybe also movsi for rv64. It might also be worth checking whether zero-extend or sign-extend is the better choice. We zero extend char by default, so that should be best. For rv64, the hardware sign extends simode to dimode by default, so sign-extend is probably best for that. For himode I'm not sure, I think we prefer sign-extend by default, but that isn't necessarily the best choice for loads. This would have to be tested. You can see rtl dumps by using -fdump-rtl-all. The combiner is the pass that should be optimizing away the unnecessary zero-extend. You can see details of what the combiner pass is doing by using -fdump-rtl-combine-all.
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 jiawei changed: What|Removed |Added CC||jiawei at iscas dot ac.cn --- Comment #2 from jiawei --- (In reply to Jim Wilson from comment #1) > Comparing with the ARM port, I see that in the ARM port, the movqi pattern > emits > (insn 8 7 9 2 (set (reg:SI 117) > (zero_extend:SI (mem/v/c:QI (reg/f:SI 115) [1 active+0 S1 A8]))) > "tmp.c\ > ":7:7 -1 > (nil)) > (insn 9 8 10 2 (set (reg:QI 116) > (subreg:QI (reg:SI 117) 0)) "tmp.c":7:7 -1 > (nil)) > and then later it combines the subreg operation with the following > zero_extend and cancels them out. > > Whereas in the RISC-V port, the movqi pattern emits > (insn 9 7 10 2 (set (reg:QI 76) > (mem/v/c:QI (lo_sum:DI (reg:DI 74) > (symbol_ref:DI ("active") [flags 0xc4] 0x7f9f0310312\ > 0 active>)) [1 active+0 S1 A8])) "tmp.c":7:7 -1 > (nil)) > and then combine refuses to combine the following zero-extend with this insn > as the memory operation is volatile. > > So it seems we need to rewrite the RISC-V port to make movqi and movhi zero > extend to si/di mode and then subreg. That probably will require cascading > changes to avoid code size and performance regressions. > > Looks like a tractable small to medium size project, but will need to wait > for a volunteer to work on it. Hi Jim, My name is Jiawei Chen. I am from the PLCT Lab. I had recurrented this bug, and want to try to help fixing this bug. What should I modify,is there any suggestions?
[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org Status|UNCONFIRMED |NEW Last reconfirmed||2020-10-15 Ever confirmed|0 |1 --- Comment #1 from Jim Wilson --- Comparing with the ARM port, I see that in the ARM port, the movqi pattern emits (insn 8 7 9 2 (set (reg:SI 117) (zero_extend:SI (mem/v/c:QI (reg/f:SI 115) [1 active+0 S1 A8]))) "tmp.c\ ":7:7 -1 (nil)) (insn 9 8 10 2 (set (reg:QI 116) (subreg:QI (reg:SI 117) 0)) "tmp.c":7:7 -1 (nil)) and then later it combines the subreg operation with the following zero_extend and cancels them out. Whereas in the RISC-V port, the movqi pattern emits (insn 9 7 10 2 (set (reg:QI 76) (mem/v/c:QI (lo_sum:DI (reg:DI 74) (symbol_ref:DI ("active") [flags 0xc4] )) [1 active+0 S1 A8])) "tmp.c":7:7 -1 (nil)) and then combine refuses to combine the following zero-extend with this insn as the memory operation is volatile. So it seems we need to rewrite the RISC-V port to make movqi and movhi zero extend to si/di mode and then subreg. That probably will require cascading changes to avoid code size and performance regressions. Looks like a tractable small to medium size project, but will need to wait for a volunteer to work on it.