[Bug rtl-optimization/70478] [LRA] S/390: Performance regression - superfluous stack frame
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

--- Comment #11 from Vladimir Makarov ---
Author: vmakarov
Date: Tue Apr 11 19:39:59 2017
New Revision: 246854

URL: https://gcc.gnu.org/viewcvs?rev=246854&root=gcc&view=rev
Log:
2017-04-11  Vladimir Makarov

    PR rtl-optimization/70478
    * lra-constraints.c (process_alt_operands): Check memory for
    disfavoring memory insn operand.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-constraints.c
[Bug rtl-optimization/70478] [LRA] S/390: Performance regression - superfluous stack frame
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

--- Comment #10 from Vladimir Makarov ---
Author: vmakarov
Date: Mon Apr 10 14:58:33 2017
New Revision: 246808

URL: https://gcc.gnu.org/viewcvs?rev=246808&root=gcc&view=rev
Log:
2017-04-10  Vladimir Makarov

    PR rtl-optimization/70478
    * lra-constraints.c (curr_small_class_check): New.
    (update_and_check_small_class_inputs): New.
    (process_alt_operands): Update curr_small_class_check.  Disfavor
    alternative insn memory operands.  Check available regs for
    small class operands.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-constraints.c
[Bug rtl-optimization/70478] [LRA] S/390: Performance regression - superfluous stack frame
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

--- Comment #9 from Vladimir Makarov ---
Author: vmakarov
Date: Sat Apr 8 19:18:42 2017
New Revision: 246789

URL: https://gcc.gnu.org/viewcvs?rev=246789&root=gcc&view=rev
Log:
2017-04-08  Vladimir Makarov

    PR rtl-optimization/70478
    * lra-constraints.c: Reverse the last patch.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-constraints.c
[Bug rtl-optimization/70478] [LRA] S/390: Performance regression - superfluous stack frame
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

--- Comment #8 from Vladimir Makarov ---
Author: vmakarov
Date: Fri Apr 7 16:01:50 2017
New Revision: 246764

URL: https://gcc.gnu.org/viewcvs?rev=246764&root=gcc&view=rev
Log:
2017-04-07  Vladimir Makarov

    PR rtl-optimization/70478
    * lra-constraints.c (process_alt_operands): Disfavor alternative
    insn memory operands.

2017-04-07  Vladimir Makarov

    PR rtl-optimization/70478
    * gcc.target/s390/pr70478.c: New.

Added:
    trunk/gcc/testsuite/gcc.target/s390/pr70478.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-constraints.c
    trunk/gcc/testsuite/ChangeLog
[Bug rtl-optimization/70478] [LRA] S/390: Performance regression - superfluous stack frame
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

--- Comment #7 from Vladimir Makarov ---
(In reply to Andreas Krebbel from comment #6)
> The only solution we found caused other regressions.

I'll try to change the sensitive LRA code to solve it.  It will need testing
on a few targets.  So, if everything is OK, the patch will probably be ready
at the end of this week or next week.
[Bug rtl-optimization/70478] [LRA] S/390: Performance regression - superfluous stack frame
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

Andreas Krebbel changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #6 from Andreas Krebbel ---
The only solution we found caused other regressions.
[Bug rtl-optimization/70478] [LRA] S/390: Performance regression - superfluous stack frame
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

--- Comment #5 from Andreas Krebbel ---
(In reply to Dominik Vogt from comment #4)
> Is there any new information on this issue?

Adding the '?' constraint modifier was an overall loss, so I did not pursue
this any further.
[Bug rtl-optimization/70478] [LRA] S/390: Performance regression - superfluous stack frame
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

Dominik Vogt changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vogt at linux dot vnet.ibm.com

--- Comment #4 from Dominik Vogt ---
Is there any new information on this issue?
[Bug rtl-optimization/70478] [LRA] S/390: Performance regression - superfluous stack frame
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

--- Comment #3 from Andreas Krebbel ---
(In reply to Vladimir Makarov from comment #2)

Thanks for having a look.  I'll experiment a bit with adding a '?' constraint
modifier to see what impact it has on benchmarks.  In fact it would match
reality a bit better anyway, since the mem-mem instructions have some
restrictions that others don't have.
[Bug rtl-optimization/70478] [LRA] S/390: Performance regression - superfluous stack frame
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

--- Comment #2 from Vladimir Makarov ---
The difference I see is that LRA chooses alternative "Q,0,Q" and reload
chooses "d,0,R".

For "Q,0,Q" LRA reports:

            2 Spill pseudo into memory: reject+=3
          alt=11,overall=9,losers=1,rld_nregs=0

For "d,0,R" it reports:

            0 Non-pseudo reload: reject+=2
            0 Non input pseudo reload: reject++
            1 Dying matched operand reload: reject++
          alt=8,overall=10,losers=1 -- refuse

So it is 9 vs 10.

It would be the same number of insns if we already had a stack frame.  Most
non-toy functions will have a stack frame, so the problem is not that bad in
a real-world scenario.

I'll look at what I can do to fix this.  But I should say that this is very
sensitive code in LRA.  Fiddling with the heuristics might affect many
programs and targets and might result in new PRs.
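
The "9 vs 10" arithmetic in comment #2 can be reproduced with a small sketch. It assumes that the overall cost of an alternative is the sum of its reject increments plus a fixed weight per loser operand. The weight of 6 is an assumption, picked because it matches both totals in the log; the helper name `alt_overall` is hypothetical and is not a function in lra-constraints.c.

```c
#include <stddef.h>

/* Assumed weight of one "loser" operand.  The value 6 is an assumption
   that happens to reproduce both totals quoted in comment #2.  */
enum { LOSER_COST_FACTOR = 6 };

/* Hypothetical helper: add up the reject increments reported for one
   alternative, then add the weighted number of losers.  */
static int
alt_overall (const int *rejects, size_t n, int losers)
{
  int reject = 0;

  for (size_t i = 0; i < n; i++)
    reject += rejects[i];

  return losers * LOSER_COST_FACTOR + reject;
}
```

With the increments from the log, "Q,0,Q" is `{3}` with one loser and "d,0,R" is `{2, 1, 1}` with one loser, so a single extra reject point is enough to tip the choice toward the alternative that spills to memory.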
[Bug rtl-optimization/70478] [LRA] S/390: Performance regression - superfluous stack frame
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ra
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-03-31
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener ---
It seems to somehow "optimize" with regard to the and op being unsigned char
and thus all upper bits being zero and only the lower ones need to be computed?

Before RA

(insn 7 4 8 2 (set (reg:SI 66 [ *b_4(D)+-3 ])
        (zero_extend:SI (mem:QI (reg:DI 3 %r3 [ b ]) [0 *b_4(D)+0 S1 A8]))) t.c:3 1209 {*zero_extendqisi2_extimm}
     (expr_list:REG_DEAD (reg:DI 3 %r3 [ b ])
        (nil)))
(insn 8 7 11 2 (parallel [
            (set (mem:SI (reg:DI 2 %r2 [ a ]) [1 *a_2(D)+0 S4 A32])
                (and:SI (mem:SI (reg:DI 2 %r2 [ a ]) [1 *a_2(D)+0 S4 A32])
                    (reg:SI 66 [ *b_4(D)+-3 ])))
            (clobber (reg:CC 33 %cc))
        ]) t.c:3 1473 {*andsi3_zarch}
     (expr_list:REG_DEAD (reg:SI 66 [ *b_4(D)+-3 ])
        (expr_list:REG_DEAD (reg:DI 2 %r2 [ a ])
            (expr_list:REG_UNUSED (reg:CC 33 %cc)
                (nil)))))

after

(insn 7 4 8 2 (set (reg:SI 66 [ *b_4(D)+-3 ])
        (zero_extend:SI (mem:QI (reg:DI 3 %r3 [ b ]) [0 *b_4(D)+0 S1 A8]))) t.c:3 1209 {*zero_extendqisi2_extimm}
     (expr_list:REG_DEAD (reg:DI 3 %r3 [ b ])
        (nil)))
(insn 8 7 11 2 (parallel [
            (set (mem:SI (reg:DI 2 %r2 [ a ]) [1 *a_2(D)+0 S4 A32])
                (and:SI (mem:SI (reg:DI 2 %r2 [ a ]) [1 *a_2(D)+0 S4 A32])
                    (reg:SI 66 [ *b_4(D)+-3 ])))
            (clobber (reg:CC 33 %cc))
        ]) t.c:3 1473 {*andsi3_zarch}
     (expr_list:REG_DEAD (reg:SI 66 [ *b_4(D)+-3 ])
        (expr_list:REG_DEAD (reg:DI 2 %r2 [ a ])
            (expr_list:REG_UNUSED (reg:CC 33 %cc)
                (nil)))))

but then LRA chooses to spill while reload is happy with the above.
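
For orientation, the RTL above is consistent with a C function of roughly the following shape. This is a reconstruction from the dump only (a 32-bit memory operand addressed through `a` in %r2, a zero-extended byte loaded through `b` in %r3, and an `and` stored back to `*a`); the function name `foo` and the exact source of t.c or of gcc.target/s390/pr70478.c are assumptions, not quotations.

```c
/* Hypothetical reconstruction of the kind of function that yields the
   RTL above: load a byte, zero-extend it, and AND it into a 32-bit
   word in memory.  The name and exact form are assumptions.  */
void
foo (unsigned int *a, unsigned char *b)
{
  *a &= *b;
}
```

Because the byte is zero-extended, only the low eight bits of `*a` can remain set after the operation, which is the property comment #1 speculates the allocator is exploiting.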