[Bug target/105733] riscv: Poor codegen for large stack frames

2024-05-28 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733

Vineet Gupta  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Vineet Gupta  ---
Fixed with aforementioned commit for gcc-15.

[Bug target/105733] riscv: Poor codegen for large stack frames

2024-05-21 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733

--- Comment #5 from GCC Commits  ---
The master branch has been updated by Vineet Gupta :

https://gcc.gnu.org/g:f9cfc192ed0127edb7e79818917dd2859fce4d44

commit r15-757-gf9cfc192ed0127edb7e79818917dd2859fce4d44
Author: Vineet Gupta 
Date:   Mon May 13 11:46:03 2024 -0700

RISC-V: avoid LUI based const mat in prologue/epilogue expansion
[PR/105733]

If the constant used for stack offset can be expressed as sum of two S12
values, the constant need not be materialized (in a reg) and instead the
two S12 bits can be added to instructions involved with frame pointer.
This avoids burning a register and more importantly can often get down
to be 2 insn vs. 3.

The prev patches to generally avoid LUI based const materialization didn't
fix this PR and need this directed fix in funcion prologue/epilogue
expansion.

This fix doesn't move the neddle for SPEC, at all, but it is still a
win considering gcc generates one insn fewer than llvm for the test ;-)

   gcc-13.1 release   |  gcc 230823 |   |
  |g6619b3d4c15c|   This patch  | 
clang/llvm
   
-
li  t0,-4096 | lit0,-4096  | addi  sp,sp,-2048 | addi
sp,sp,-2048
addit0,t0,2016   | addi  t0,t0,2032| add   sp,sp,-16   | addi
sp,sp,-32
li  a4,4096  | add   sp,sp,t0  | add   a5,sp,a0| add 
a1,sp,16
add sp,sp,t0 | addi  a5,sp,-2032   | sbzero,0(a5)  | add 
a0,a0,a1
li  a5,-4096 | add   a0,a5,a0  | addi  sp,sp,2032  | sb  
zero,0(a0)
addia4,a4,-2032  | lit0, 4096  | addi  sp,sp,32| addi
sp,sp,2032
add a4,a4,a5 | sbzero,2032(a0) | ret   | addi
sp,sp,48
addia5,sp,16 | addi  t0,t0,-2032   |   | ret
add a5,a4,a5 | add   sp,sp,t0  |
add a0,a5,a0 | ret |
li  t0,4096  |
sd  a5,8(sp) |
sb  zero,2032(a0)|
addit0,t0,-2016  |
add sp,sp,t0 |
ret  |

gcc/ChangeLog:
PR target/105733
* config/riscv/riscv.h: New macros for with aligned offsets.
* config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New
function to split a sum of two s12 values into constituents.
(riscv_expand_prologue): Handle offset being sum of two S12.
(riscv_expand_epilogue): Ditto.
* config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr105733.c: New Test.
* gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not
expect LUI 4096.
* gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto.

Tested-by: Edwin Lu  # pre-commit-CI #1568
Signed-off-by: Vineet Gupta 

[Bug target/105733] riscv: Poor codegen for large stack frames

2023-12-15 Thread vineetg at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733

Vineet Gupta  changed:

   What|Removed |Added

 CC||vineetg at gcc dot gnu.org

--- Comment #4 from Vineet Gupta  ---
There has been good improvements in gcc codegen specially with commit below.

commit 6619b3d4c15cd754798b1048c67f3806bbcc2e6d
Author: Jivan Hakobyan 
Date:   Wed Aug 23 14:10:30 2023 -0600

Improve quality of code from LRA register elimination

This is primarily Jivan's work, I'm mostly responsible for the write-up and
coordinating with Vlad on a few questions.

On targets with limitations on immediates usable in arithmetic
instructions,
LRA's register elimination phase can construct fairly poor code.

 Tip W/o commit 6619b3d4c| With 6619b3d4c   
 |
foo: | foo:
li  t0,-4096 |  li  t0,-4096
addit0,t0,2032   |  addit0,t0,2032
li  a5,0 |
li  a4,0 |
add sp,sp,t0 |  add sp,sp,t0
add a4,a4,a5 |
add a5,a4,sp |  add a5,a5,a0
add a5,a5,a0 |
li  t0,4096  |  li  t0,4096
sb  zero,0(a5)   |  sb  zero,0(a5)
addit0,t0,-2032  |  addit0,t0,-2032
add sp,sp,t0 |  add sp,sp,t0
jr  ra   |  jr  ra

We still have the weird LUI 4096 based constant construction. I have a patch to
avoid 4096 for certain ranges  [-4096,-2049] or [2048,4094] (cribbed from
llvm).
e.g. 2064 = 2047 + 17 and we could potentially "spread" the 2 parts over 2 adds
to SP, avoiding the LUI. However if a const costs more than 1 insn, gcc wants
to force it in a register rather than split the add operation into 2 adds with
the split constants.

expand_binop
  expand_binop_directly
   avoid_expensive_constant

/* X is to be used in mode MODE as operand OPN to BINOPTAB.  If we're
   optimizing, and if the operand is a constant that costs more than
   1 instruction, force the constant into a register and return that
   register.  Return X otherwise.  UNSIGNEDP says whether X is unsigned.  */

[Bug target/105733] riscv: Poor codegen for large stack frames

2022-07-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug target/105733] riscv: Poor codegen for large stack frames

2022-07-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733

Andrew Pinski  changed:

   What|Removed |Added

 CC||vineet.gupta at linux dot dev

--- Comment #3 from Andrew Pinski  ---
*** Bug 106439 has been marked as a duplicate of this bug. ***

[Bug target/105733] riscv: Poor codegen for large stack frames

2022-06-06 Thread wilson at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #2 from Jim Wilson  ---
This is a known problem that I have commented on before.  For instance here
https://github.com/riscv-collab/riscv-gcc/issues/193

Copying my reply to here since this is a better place for the bug report:

This is a known problem. GCC first emits code using a frame pointer, and then
tries to eliminate the frame pointer during register allocation. When the frame
size is larger than the immediate field of an add, we need temporaries to
compute frame offsets. Compiler optimizations like common subexpression
elimination (cse) try to optimize the calculation of the temporaries. Then when
we try to eliminate the frame pointer, we run into trouble because the cse opt
changes interfere with the frame pointer elimination, and we end up with an
ugly inefficient mess.

This problem isn't RISC-V specific, but since RISC-V has 12-bit immediates and
most everyone else has 16-bit immediates, we hit the problem sooner, and thus
makes it much more visible for RISC-V. The example above is putting 12KB on the
stack, which is larger than 12-bits of range (4KB), but well within 16-bits of
range (64KB).

I have a prototype of a patch to fix it by allowing >12-bit offsets for frame
pointer references, and then fixing this when eliminating the frame pointer,
but this can cause other problems, and needs more work to be usable. I have no
idea when someone might try to finish the patch.

End of inclusion from github.

To improve my earlier answer, we have signed 12-bit immediates, so the trouble
actually starts at 2048 bytes, and Jessica's example is larger than that. 
Reduce BUFSIZ to 2044 and you get a reasonable result.

foo:
li  a5,4096
addia5,a5,-2048
addisp,sp,-2048
add a5,a5,a0
add a0,a5,sp
li  t0,4096
sb  zero,-2048(a0)
addit0,t0,-2048
add sp,sp,t0
jr  ra

There is still a bit of oddness here.  We are adding 2048 to a0 and then using
an address -2048(a0).  I think that is more cse trouble.  2048 requires two
instructions to load into a register which is likely confusing something
somewhere.

[Bug target/105733] riscv: Poor codegen for large stack frames

2022-05-25 Thread kito at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733

Kito Cheng  changed:

   What|Removed |Added

 CC||kito at gcc dot gnu.org
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2022-05-26

--- Comment #1 from Kito Cheng  ---
Confirmed, Thanks for reporting this!