[Bug target/98981] gcc-10.2 for RISC-V has extraneous register moves
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98981 --- Comment #5 from Jim Wilson --- Neither of the two patches I mentioned in comment 1 can fix the problem by themselves, as we still have a mix of SImode and DImode operations. I looked at REE. It doesn't work because there is more than one reaching def. But even if it did work, I don't think it would completely solve the problem because it runs after register allocation and hence won't be able to remove move instructions. To get the best result, we need the register allocator to take two registers with different modes with overlapping live ranges, and realize that they can be allocated to the same hard reg because the overlapping uses are non-conflicting. I haven't tried looking at the register allocator, but it doesn't seem like a good way to try to solve the problem. We have an inconvenient mix of SImdoe and DImode because we don't have SImode compare and branch instructions. That requires sign extending 32-bit values to 64-bit to compare them, which then results in the sign extend and register allocation optimization issues. it is unlikely that 32-bit compare and branch instructions will be added to the ISA though. One useful thing I noticed is that the program is doing a max operation, and the B extension adds a max instruction. Having one instruction instead of a series of instructions including a branch to compute max makes the optimization issues easier, and gcc does give the right result in this case. Using a compiler with B support I get lw a4,0(a5) lw a2,0(a3) addia5,a5,4 addia3,a3,4 addwa4,a4,a2 max a0,a4,a0 bne a5,a1,.L2 which is good code with the extra moves and sign-extends removed. So I have a workaround of sorts, but only if you have the B extension. The -mtune=sifive-7-series can support conditional move via macro fusion, I was hopeful that this would work as well as max, but unfortunately the sign-extend that was removed in the max case does't get removed in the conditional move case. Also, the conditional move is 2-address, and the register allocator ends up needing a reload, which gives us the unwanted mv again. So the code in this case is the same as without the option. I didn't check to see if this is fixable.
[Bug target/98981] gcc-10.2 for RISC-V has extraneous register moves
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98981 --- Comment #4 from Jim Wilson --- With this testcase extern void sub2 (void); void sub (int *i, int *j) { int k = *i + 1; *j = k; if (k == 0) sub2 (); } Compiling without the riscv_rtx_cost patch, I get lw a5,0(a0) addiw a4,a5,1 sw a4,0(a1) beq a4,zero,.L4 compiling with the riscv_rtx_cost patch, I get lw a5,0(a0) addiw a4,a5,1 sw a4,0(a1) addiw a5,a5,1 beq a5,zero,.L4 The problem here is that we have RTL (insn 9 7 10 2 (set (reg:SI 76) (plus:SI (subreg:SI (reg:DI 78 [ *i_4(D) ]) 0) (const_int 1 [0x1]))) "tmp.c":6:7 3 {addsi3} (expr_list:REG_DEAD (reg:DI 78 [ *i_4(D) ]) (nil))) (insn 10 9 11 2 (set (reg/v:DI 73 [ k ]) (sign_extend:DI (reg:SI 76))) "tmp.c":6:7 90 {extendsidi2} (nil)) with the SImode 76 used for the sw and the DImode 73 used for the beq. With the riscv_rtx_cost change, which makes the sign_extend add cheap, combine folds the add into the sign-extend to get (insn 9 7 10 2 (set (reg:SI 76) (plus:SI (subreg:SI (reg:DI 78 [ *i_4(D) ]) 0) (const_int 1 [0x1]))) "tmp.c":6:7 3 {addsi3} (nil)) (insn 10 9 11 2 (set (reg/v:DI 73 [ k ]) (sign_extend:DI (plus:SI (subreg:SI (reg:DI 78 [ *i_4(D) ]) 0) (const_int 1 [0x1] "tmp.c":6:7 5 {*addsi3_extended} (expr_list:REG_DEAD (reg:DI 78 [ *i_4(D) ]) (nil))) and now we have two adds which is wrong. Without the patch, combine does nothing, then ree manages to merge the sign_extend into the add and subseg the sw, so we end up with only the one addw and no explicit sign extend. My take on this is that we never should have generated the SImode add in the first place, because we don't actually have that instruction. If we would have generated the sign-extended add at rtl generation time, and subreged the result, then we would have gotten the right result. In theory. So I think the riscv_rtx_cost change is useful, but only if combined with the rtl generation change I proposed in comment 1.
[Bug target/98981] gcc-10.2 for RISC-V has extraneous register moves
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98981 --- Comment #3 from Jim Wilson --- I suppose cost model problems could explain why combine didn't do the optimization. I didn't have a chance to look at that. I still think there is a fundmental problem with how we represent SImode operations, but again cost model problems could explain why my experiments to fix that didn't work as expected. I probably didn't look at that when I was experimenting with riscv.md changes. Your patch does look useful, but setting cost to 1 for MULT is wrong, and would be just as wrong for DIV. That is OK for PLUS, MINUS, and NEG though. I think a better option is to set *total = 0 and return false. That gives no extra cost to the sign extend, and recurs to get the proper cost for the operation underneath. That would work for MUL and DIV. I found code in the rs6000 port that does this.
[Bug target/98981] gcc-10.2 for RISC-V has extraneous register moves
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98981 Kito Cheng changed: What|Removed |Added CC||kito at gcc dot gnu.org --- Comment #2 from Kito Cheng --- Here is a quick patch for fix part of this issue, it seems like because our cost model is inprecise, but I guess I need run benchmark to make sure the performance and code size didn't get any regression. find_max_i32: lui a4,%hi(.LANCHOR0) addia4,a4,%lo(.LANCHOR0) addia3,a4,1024 addia6,a4,400 li a0,0 .L3: lw a5,0(a4) lw a2,0(a3) addia4,a4,4 addia3,a3,4 addwa1,a5,a2 addwa5,a5,a2 bge a1,a0,.L2 mv a5,a0 .L2: sext.w a0,a5 bne a4,a6,.L3 ret Patch: diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c index d489717b2a5..b8c9f7200ce 100644 --- a/gcc/config/riscv/riscv.c +++ b/gcc/config/riscv/riscv.c @@ -1879,6 +1879,15 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN } /* Fall through. */ case SIGN_EXTEND: + if (TARGET_64BIT && !REG_P (XEXP (x, 0))) + { + int code = GET_CODE (XEXP (x, 0)); + if (code == PLUS || code == MINUS || code == NEG || code == MULT) + { + *total = COSTS_N_INSNS (1); + return true; + } + } *total = riscv_extend_cost (XEXP (x, 0), GET_CODE (x) == ZERO_EXTEND); return false;
[Bug target/98981] gcc-10.2 for RISC-V has extraneous register moves
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98981 Jim Wilson changed: What|Removed |Added CC||wilson at gcc dot gnu.org --- Comment #1 from Jim Wilson --- The extra move instruction is a side effect of how the riscv64 toolchain handles 32-bit arithmetic. We lie to the compiler and tell it that we have instructions that produce 32-bit results. In fact, we only have instructions that produce 64-bit sign-extended 32-bit results. The lie means that the RTL has some insns with SImode output and some instructions with DImode outputs, and sometimes we end up with nop moves to convert between the modes. In this case, it is peephole2 after regalloc that notices a SImode add followed by a sign-extend, and converts it to a sign-extending 32-bit add followed by a move, but can't eliminate the move because we already did register allocation. This same problem is also why we get the unnecessary sext after the label, as peephole can't fix that. This problem has been on my todo list for a few years, and I have ideas of how to fix it, but I have no idea when I will have time to try to fix it. I did document it for the RISC-V International Code Speed Optimization task group. https://github.com/riscv/riscv-code-speed-optimization/blob/main/projects/gcc-optimizations.adoc This one is the first one in the list.