[Bug rtl-optimization/70478] [LRA] S/390: Performance regression - superfluous stack frame

2017-04-11 Thread vmakarov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

--- Comment #11 from Vladimir Makarov  ---
Author: vmakarov
Date: Tue Apr 11 19:39:59 2017
New Revision: 246854

URL: https://gcc.gnu.org/viewcvs?rev=246854&root=gcc&view=rev
Log:
2017-04-11  Vladimir Makarov  

PR rtl-optimization/70478
* lra-constraints.c (process_alt_operands): Check memory for
disfavoring memory insn operand.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/lra-constraints.c

2017-04-10 Thread vmakarov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

--- Comment #10 from Vladimir Makarov  ---
Author: vmakarov
Date: Mon Apr 10 14:58:33 2017
New Revision: 246808

URL: https://gcc.gnu.org/viewcvs?rev=246808&root=gcc&view=rev
Log:
2017-04-10  Vladimir Makarov  

PR rtl-optimization/70478
* lra-constraints.c (curr_small_class_check): New.
(update_and_check_small_class_inputs): New.
(process_alt_operands): Update curr_small_class_check.  Disfavor
alternative insn memory operands.  Check available regs for small
class operands.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/lra-constraints.c

2017-04-08 Thread vmakarov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

--- Comment #9 from Vladimir Makarov  ---
Author: vmakarov
Date: Sat Apr  8 19:18:42 2017
New Revision: 246789

URL: https://gcc.gnu.org/viewcvs?rev=246789&root=gcc&view=rev
Log:
2017-04-08  Vladimir Makarov  

PR rtl-optimization/70478
* lra-constraints.c: Reverse the last patch.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/lra-constraints.c

2017-04-07 Thread vmakarov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

--- Comment #8 from Vladimir Makarov  ---
Author: vmakarov
Date: Fri Apr  7 16:01:50 2017
New Revision: 246764

URL: https://gcc.gnu.org/viewcvs?rev=246764&root=gcc&view=rev
Log:
2017-04-07  Vladimir Makarov  

PR rtl-optimization/70478
* lra-constraints.c (process_alt_operands): Disfavor alternative
insn memory operands.

2017-04-07  Vladimir Makarov  

PR rtl-optimization/70478
* gcc.target/s390/pr70478.c: New.


Added:
trunk/gcc/testsuite/gcc.target/s390/pr70478.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/lra-constraints.c
trunk/gcc/testsuite/ChangeLog

2017-04-05 Thread vmakarov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

--- Comment #7 from Vladimir Makarov  ---
(In reply to Andreas Krebbel from comment #6)
> The only solution we found caused other regressions.

I'll try to change the sensitive LRA code to solve it.  The change will need to
be tested on a few targets.  So, if everything is OK, the patch will probably be
ready at the end of this week or next week.

2017-03-30 Thread krebbel at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

Andreas Krebbel  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #6 from Andreas Krebbel  ---
The only solution we found caused other regressions.

2017-02-03 Thread krebbel at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

--- Comment #5 from Andreas Krebbel  ---
(In reply to Dominik Vogt from comment #4)
> Is there any new information on this issue?

Adding the ? constraint modifier was an overall loss. So I did not pursue this
any further.

2017-02-03 Thread vogt at linux dot vnet.ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

Dominik Vogt  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vogt at linux dot vnet.ibm.com

--- Comment #4 from Dominik Vogt  ---
Is there any new information on this issue?

2016-04-01 Thread krebbel at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

--- Comment #3 from Andreas Krebbel  ---
(In reply to Vladimir Makarov from comment #2)
Thanks for having a look.  I'll experiment a bit with adding a '?' constraint
modifier to see what impact it has on benchmarks. In fact it would match the
reality a bit better anyway since the mem-mem instructions have some
restrictions others don't have.

2016-03-31 Thread vmakarov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

--- Comment #2 from Vladimir Makarov  ---
The difference I see is that LRA chooses alternative "Q,0,Q" and reload chooses
"d,0,R".

For "Q,0,Q" LRA reports:

  2 Spill pseudo into memory: reject+=3
  alt=11,overall=9,losers=1,rld_nregs=0

For "d,0,R" it reports:

  0 Non-pseudo reload: reject+=2
  0 Non input pseudo reload: reject++
  1 Dying matched operand reload: reject++
  alt=8,overall=10,losers=1 -- refuse

So it is 9 vs 10.  It would be the same # of insns if we already had a stack
frame.  Most non-toy functions will have a stack frame.  So the problem is not
that bad for a real world scenario.

I'll look at what I can do to fix this.  But I should say that this is very
sensitive LRA code.  Fiddling with the heuristics might affect many programs
and targets and might result in new PRs.
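
The 9 vs 10 figures quoted in comment #2 can be reproduced with a small calculation. The following is a minimal sketch, not the actual lra-constraints.c code: it assumes the overall cost of an alternative is the number of losers times a fixed factor plus the accumulated reject value, and that the factor is 6, which is the value the two dump lines above imply. The names `alt_cost`, `overall_cost`, `alt_q0q` and `alt_d0r` are made up for illustration.

```c
/* Sketch of how an alternative's overall cost could be derived from the
   dump values above.  LOSER_COST_FACTOR is an assumption: 6 is the value
   that makes both "overall=" figures in the dump come out right.  */
#define LOSER_COST_FACTOR 6

/* Hypothetical helper type holding the two numbers the dump reports.  */
struct alt_cost
{
  int losers;  /* operands that need a reload */
  int reject;  /* sum of all "reject+=" / "reject++" penalties */
};

/* Lower is better; the alternative with the smaller value wins.  */
static int
overall_cost (struct alt_cost a)
{
  return a.losers * LOSER_COST_FACTOR + a.reject;
}

/* "Q,0,Q": one loser, "Spill pseudo into memory: reject+=3".  */
static const struct alt_cost alt_q0q = { 1, 3 };

/* "d,0,R": one loser, reject+=2, then two "reject++" entries.  */
static const struct alt_cost alt_d0r = { 1, 2 + 1 + 1 };
```

Under these assumptions `overall_cost (alt_q0q)` is smaller than `overall_cost (alt_d0r)` by exactly one, so a single extra reject point on the memory alternative is enough to flip the choice. That is why the heuristic is described as sensitive.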

2016-03-31 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70478

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ra
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-03-31
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener  ---
It seems to somehow "optimize" based on one operand of the AND being unsigned
char, so that all upper bits are zero and only the lower ones need to be
computed?

Before RA

(insn 7 4 8 2 (set (reg:SI 66 [ *b_4(D)+-3 ])
        (zero_extend:SI (mem:QI (reg:DI 3 %r3 [ b ]) [0 *b_4(D)+0 S1 A8])))
        t.c:3 1209 {*zero_extendqisi2_extimm}
     (expr_list:REG_DEAD (reg:DI 3 %r3 [ b ])
        (nil)))
(insn 8 7 11 2 (parallel [
            (set (mem:SI (reg:DI 2 %r2 [ a ]) [1 *a_2(D)+0 S4 A32])
                (and:SI (mem:SI (reg:DI 2 %r2 [ a ]) [1 *a_2(D)+0 S4 A32])
                    (reg:SI 66 [ *b_4(D)+-3 ])))
            (clobber (reg:CC 33 %cc))
        ]) t.c:3 1473 {*andsi3_zarch}
     (expr_list:REG_DEAD (reg:SI 66 [ *b_4(D)+-3 ])
        (expr_list:REG_DEAD (reg:DI 2 %r2 [ a ])
            (expr_list:REG_UNUSED (reg:CC 33 %cc)
                (nil)))))

after

(insn 7 4 8 2 (set (reg:SI 66 [ *b_4(D)+-3 ])
        (zero_extend:SI (mem:QI (reg:DI 3 %r3 [ b ]) [0 *b_4(D)+0 S1 A8])))
        t.c:3 1209 {*zero_extendqisi2_extimm}
     (expr_list:REG_DEAD (reg:DI 3 %r3 [ b ])
        (nil)))
(insn 8 7 11 2 (parallel [
            (set (mem:SI (reg:DI 2 %r2 [ a ]) [1 *a_2(D)+0 S4 A32])
                (and:SI (mem:SI (reg:DI 2 %r2 [ a ]) [1 *a_2(D)+0 S4 A32])
                    (reg:SI 66 [ *b_4(D)+-3 ])))
            (clobber (reg:CC 33 %cc))
        ]) t.c:3 1473 {*andsi3_zarch}
     (expr_list:REG_DEAD (reg:SI 66 [ *b_4(D)+-3 ])
        (expr_list:REG_DEAD (reg:DI 2 %r2 [ a ])
            (expr_list:REG_UNUSED (reg:CC 33 %cc)
                (nil)))))

but then LRA chooses to spill while reload is happy with the above.
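
For readers without the original attachment, the RTL above (a zero-extended byte load from `b`, ANDed into a 32-bit memory word at `a`) would correspond to source along the following lines. This is only a guess reconstructed from the dump; it is not the test case attached to the bug and not the contents of gcc.target/s390/pr70478.c, and the function name is made up.

```c
/* Hypothetical reduction inferred from the RTL dump; the real test case
   may differ.  The byte at *b is zero-extended to 32 bits and ANDed into
   the word at *a, a memory read-modify-write that *andsi3_zarch can do
   either with a register operand or as a memory-to-memory operation.  */
void
and_byte_into_word (unsigned int *a, const unsigned char *b)
{
  *a &= *b;
}
```

In a leaf function this small, any spill forces a stack frame that would otherwise not exist, which is the "superfluous stack frame" in the bug title.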