[Bug rtl-optimization/82683] Combine: GCC generates bad code with -tune=thunderx2t99

2017-11-09 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82683

Segher Boessenkool  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #21 from Segher Boessenkool  ---
Fixed on all open branches.

[Bug rtl-optimization/82683] Combine: GCC generates bad code with -tune=thunderx2t99

2017-11-09 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82683

--- Comment #20 from Segher Boessenkool  ---
Author: segher
Date: Thu Nov  9 10:23:30 2017
New Revision: 254565

URL: https://gcc.gnu.org/viewcvs?rev=254565=gcc=rev
Log:
Backport from mainline
2017-11-01  Segher Boessenkool  

PR rtl-optimization/64682
PR rtl-optimization/69567
PR rtl-optimization/69737
PR rtl-optimization/82683
* combine.c (distribute_notes) : If the new I2 sets the same
register mentioned in the note, drop the note, unless it came from I3,
in which case it should go to I3 again.

Modified:
branches/gcc-6-branch/gcc/ChangeLog
branches/gcc-6-branch/gcc/combine.c

[Bug rtl-optimization/82683] Combine: GCC generates bad code with -tune=thunderx2t99

2017-11-09 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82683

--- Comment #19 from Segher Boessenkool  ---
Author: segher
Date: Thu Nov  9 10:21:06 2017
New Revision: 254564

URL: https://gcc.gnu.org/viewcvs?rev=254564=gcc=rev
Log:
Backport from mainline
2017-11-01  Segher Boessenkool  

PR rtl-optimization/64682
PR rtl-optimization/69567
PR rtl-optimization/69737
PR rtl-optimization/82683
* combine.c (distribute_notes) : If the new I2 sets the same
register mentioned in the note, drop the note, unless it came from I3,
in which case it should go to I3 again.

Modified:
branches/gcc-7-branch/gcc/ChangeLog
branches/gcc-7-branch/gcc/combine.c

[Bug rtl-optimization/82683] Combine: GCC generates bad code with -tune=thunderx2t99

2017-11-01 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82683

--- Comment #18 from Segher Boessenkool  ---
Excellent, thanks for testing!  I'll backport it next week.

[Bug rtl-optimization/82683] Combine: GCC generates bad code with -tune=thunderx2t99

2017-11-01 Thread sje at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82683

--- Comment #17 from Steve Ellcey  ---
Yes, this fixed my SPEC problem.

[Bug rtl-optimization/82683] Combine: GCC generates bad code with -tune=thunderx2t99

2017-11-01 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82683

--- Comment #16 from Segher Boessenkool  ---
Should be fixed on trunk now.  Steve, could you see if it fixes the
SPEC problem for you?

[Bug rtl-optimization/82683] Combine: GCC generates bad code with -tune=thunderx2t99

2017-11-01 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82683

--- Comment #15 from Segher Boessenkool  ---
Author: segher
Date: Wed Nov  1 16:40:42 2017
New Revision: 254315

URL: https://gcc.gnu.org/viewcvs?rev=254315=gcc=rev
Log:
combine: Fix bug in giving up placing REG_DEAD notes (PR82683)

When we have a REG_DEAD note for a reg that is set in the new I2, we
drop the note on the floor (we cannot find whether to place it on I2
or on I3).  But the code I added to do this has a bug and does not
always actually drop it.  This patch fixes it.

But that on its own is too pessimistic, it turns out, and we generate
worse code.  One case where we do know where to place the note is if
it came from I3 (it should go to I3 again).  Doing this fixes all of
the regressions.


PR rtl-optimization/64682
PR rtl-optimization/69567
PR rtl-optimization/69737
PR rtl-optimization/82683
* combine.c (distribute_notes) : If the new I2 sets the same
register mentioned in the note, drop the note, unless it came from I3,
in which case it should go to I3 again.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/combine.c

[Bug rtl-optimization/82683] Combine: GCC generates bad code with -tune=thunderx2t99

2017-11-01 Thread sje at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82683

--- Comment #14 from Steve Ellcey  ---
(In reply to Segher Boessenkool from comment #13)
> I have a simpler patch.  It is testing...

Can you attach your patch to this defect so I can test it as well.

[Bug rtl-optimization/82683] Combine: GCC generates bad code with -tune=thunderx2t99

2017-10-27 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82683

Segher Boessenkool  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |segher at gcc dot 
gnu.org

--- Comment #13 from Segher Boessenkool  ---
I have a simpler patch.  It is testing...

[Bug rtl-optimization/82683] Combine: GCC generates bad code with -tune=thunderx2t99

2017-10-27 Thread sje at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82683

--- Comment #12 from Steve Ellcey  ---
Created attachment 42491
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42491=edit
Patch that fixes the test case

Here is a possible patch.  It fixes the test case and I am doing a bootstrap
and make check on it.

[Bug rtl-optimization/82683] Combine: GCC generates bad code with -tune=thunderx2t99

2017-10-26 Thread sje at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82683

--- Comment #11 from Steve Ellcey  ---
I think I see where this is going wrong but I don't know what to do about it.
In try_combine, line 3288 we have i2 and i3 of:

(insn 18 16 19 3 (set (reg:DI 91)
(ashift:DI (reg:DI 83 [ _26 ])
(const_int 2 [0x2]))) "x.i":31 674 {*aarch64_ashl_sisd_or_int_di3}
 (expr_list:REG_DEAD (reg:DI 83 [ _26 ])
(nil)))
(insn 19 18 20 3 (set (reg/f:DI 78 [ _7 ])
(plus:DI (reg/f:DI 76 [ _4 ])
(reg:DI 91))) "x.i":31 94 {*adddi3_aarch64}
 (expr_list:REG_DEAD (reg:DI 91)
(expr_list:REG_DEAD (reg/f:DI 76 [ _4 ])
(nil

After that if statement we have i2 and i3 looking like:

(insn 18 16 19 3 (set (reg:DI 91)
(ashift:DI (reg:DI 83 [ _26 ])
(const_int 2 [0x2]))) "x.i":31 674 {*aarch64_ashl_sisd_or_int_di3}
 (expr_list:REG_DEAD (reg:DI 83 [ _26 ])
(nil)))
(insn 19 18 20 3 (set (reg/f:DI 78 [ _7 ])
(plus:DI (ashift:DI (reg:DI 83 [ _26 ])
(const_int 2 [0x2]))
(reg/f:DI 76 [ _4 ]))) "x.i":31 94 {*adddi3_aarch64}
 (expr_list:REG_DEAD (reg:DI 91)
(expr_list:REG_DEAD (reg/f:DI 76 [ _4 ])
(nil


There is the bogus REG_DEAD of 83 on insn 18.  I don't know where or how this
should be fixed up though.

[Bug rtl-optimization/82683] Combine: GCC generates bad code with -tune=thunderx2t99

2017-10-26 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82683

--- Comment #10 from Segher Boessenkool  ---
reg-notes.def says

/* The value in REG dies in this insn (i.e., it is not needed past
   this insn).  If REG is set in this insn, the REG_DEAD note may,
   but need not, be omitted.  */
REG_NOTE (DEAD)

so the notes should have gone to insn 20 here, not insn 21 (which
is a new r83).  Or deleted altogether.

[Bug rtl-optimization/82683] Combine: GCC generates bad code with -tune=thunderx2t99

2017-10-24 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82683

--- Comment #9 from Segher Boessenkool  ---
The "failed to match" messages are hugely important (in fact, I want it
to print more: _why_ did combination fail, in all the cases where it is
not because of recog).

The "deferring deletion" messages are not from combine (from df, instead).
Yes they are annoying, much more so in other passes.

[Bug rtl-optimization/82683] Combine: GCC generates bad code with -tune=thunderx2t99

2017-10-24 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82683

--- Comment #8 from Wilco  ---
(In reply to Segher Boessenkool from comment #7)
> Yes, it requires to look back a bit (the info always is in this dump
> file though!)
> 
> The alternative would be to dump even more info, grow the log files
> by a factor two or so.

Well if the size of the dump is an issue, I guess all the failed to match cases
could be omitted and moved to a different option. Also you could keep the dump
a similar size size by emitting less defer/delete notes. Eg. this would make
understanding the output easier:

Combining insns 19, 20, 21 (old cost 8 + 20 + 4 = 32):
19: r78:DI=r83:DI<<0x2+r76:DI REG_DEAD r83:DI  REG_DEAD r76:DI
20: ...
21: ...
into (new cost 28 + 4 = 32):
19: (defer deletion)
20: r83:DI=sign_extend([r83:DI*0x4+r76:DI]) 
21: r82:SI=r83:DI#0 REG_DEAD r83:DI (defer rescan)

[Bug rtl-optimization/82683] Combine: GCC generates bad code with -tune=thunderx2t99

2017-10-24 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82683

--- Comment #7 from Segher Boessenkool  ---
Yes, it requires to look back a bit (the info always is in this dump
file though!)

The alternative would be to dump even more info, grow the log files
by a factor two or so.

[Bug rtl-optimization/82683] Combine: GCC generates bad code with -tune=thunderx2t99

2017-10-24 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82683

--- Comment #6 from Wilco  ---
(In reply to Segher Boessenkool from comment #5)
> Oh, it does show the intermediate results:
> 
> Trying 18 -> 19:
> Successfully matched this instruction:
> (set (reg/f:DI 78 [ _7 ])
> (plus:DI (ashift:DI (reg:DI 83 [ _26 ])
> (const_int 2 [0x2]))
> (reg/f:DI 76 [ _4 ])))
> allowing combination of insns 18 and 19
> original costs 4 + 4 = 8
> replacement cost 8
> deferring deletion of insn with uid = 18.
> modifying insn i319: r78:DI=r83:DI<<0x2+r76:DI
>   REG_DEAD r83:DI
>   REG_DEAD r76:DI
> deferring rescan insn with uid = 19.
> 
> (How do you dump things?  You forgot a -all?)

Yes, but this is all it shows for the bit that goes wrong:

allowing combination of insns 19, 20 and 21
original costs 8 + 20 + 4 = 32
replacement costs 28 + 4 = 32
deferring rescan insn with uid = 21.
deferring deletion of insn with uid = 19.
modifying insn i220: r83:DI=sign_extend([r83:DI*0x4+r76:DI])
  REG_DEAD r76:DI
deferring rescan insn with uid = 20.
modifying insn i321: r82:SI=r83:DI#0
  REG_DEAD r83:DI
  REG_DEAD r83:DI
  REG_DEAD r83:DI
deferring rescan insn with uid = 21.

I guess it would be good if it shows the instructions it's trying to combine in
this part, because it's impossible to follow when the instructions involved
have been changed already...

[Bug rtl-optimization/82683] Combine: GCC generates bad code with -tune=thunderx2t99

2017-10-24 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82683

--- Comment #5 from Segher Boessenkool  ---
Oh, it does show the intermediate results:

Trying 18 -> 19:
Successfully matched this instruction:
(set (reg/f:DI 78 [ _7 ])
(plus:DI (ashift:DI (reg:DI 83 [ _26 ])
(const_int 2 [0x2]))
(reg/f:DI 76 [ _4 ])))
allowing combination of insns 18 and 19
original costs 4 + 4 = 8
replacement cost 8
deferring deletion of insn with uid = 18.
modifying insn i319: r78:DI=r83:DI<<0x2+r76:DI
  REG_DEAD r83:DI
  REG_DEAD r76:DI
deferring rescan insn with uid = 19.

(How do you dump things?  You forgot a -all?)

[Bug rtl-optimization/82683] Combine: GCC generates bad code with -tune=thunderx2t99

2017-10-24 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82683

--- Comment #4 from Wilco  ---
(In reply to Segher Boessenkool from comment #3)
> Ah.  So we start with
> 
> insn_cost 4 for18: r91:DI=r83:DI<<0x2
>   REG_DEAD r83:DI
> insn_cost 4 for19: r78:DI=r76:DI+r91:DI
>   REG_DEAD r91:DI
>   REG_DEAD r76:DI
> insn_cost 20 for20: r82:SI=[r78:DI]
>   REG_DEAD r78:DI
> insn_cost 4 for21: r83:DI=sign_extend(r82:SI)
> 
> and first 18,19 are combined, and then 19,20,21.  Both 19 and 21 set
> r83, it is dead after 19, but that note is put on 21.

Yes, it doesn't show intermediate stages but I guess before merging 18+19+20
with 21 we have:

r82:SI=MEM([r83:DI*0x4+r76:DI]) REG_DEAD r83
r83:DI=sign_extend r82:SI

This is correct. However when it merges these, it swaps r82 for r83 on the
load, and now the REG_DEAD in the load needs to be removed as it is no longer
valid:

r83:DI=sign_extend([r83:DI*0x4+r76:DI]) ***REG_DEAD r83***

[Bug rtl-optimization/82683] Combine: GCC generates bad code with -tune=thunderx2t99

2017-10-24 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82683

--- Comment #3 from Segher Boessenkool  ---
Ah.  So we start with

insn_cost 4 for18: r91:DI=r83:DI<<0x2
  REG_DEAD r83:DI
insn_cost 4 for19: r78:DI=r76:DI+r91:DI
  REG_DEAD r91:DI
  REG_DEAD r76:DI
insn_cost 20 for20: r82:SI=[r78:DI]
  REG_DEAD r78:DI
insn_cost 4 for21: r83:DI=sign_extend(r82:SI)

and first 18,19 are combined, and then 19,20,21.  Both 19 and 21 set
r83, it is dead after 19, but that note is put on 21.

[Bug rtl-optimization/82683] Combine: GCC generates bad code with -tune=thunderx2t99

2017-10-24 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82683

--- Comment #2 from Segher Boessenkool  ---
At the start of combine you have

insn_cost 4 for18: r91:DI=r83:DI<<0x2
  REG_DEAD r83:DI

Is that death note not correct?

[Bug rtl-optimization/82683] Combine: GCC generates bad code with -tune=thunderx2t99

2017-10-24 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82683

Wilco  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-10-24
 CC||segher at gcc dot gnu.org,
   ||wilco at gcc dot gnu.org
  Component|target  |rtl-optimization
Summary|GCC generates bad code with |Combine: GCC generates bad
   |-tune=thunderx2t99  |code with
   ||-tune=thunderx2t99
 Ever confirmed|0   |1

--- Comment #1 from Wilco  ---
Combine seems to do the correct thing at first, creating a ldrsw but it then
invents a spurious REG_DEAD for r83:

Successfully matched this instruction:
(set (reg:DI 83 [ _26 ])
(sign_extend:DI (mem:SI (plus:DI (mult:DI (reg:DI 83 [ _26 ])
(const_int 4 [0x4]))
(reg/f:DI 76 [ _4 ])) [8 *_7+0 S4 A32])))
Successfully matched this instruction:
(set (reg/v:SI 82 [ pD.3023 ])
(subreg:SI (reg:DI 83 [ _26 ]) 0))
allowing combination of insns 19, 20 and 21
original costs 8 + 20 + 4 = 32
replacement costs 28 + 4 = 32
deferring rescan insn with uid = 21.
deferring deletion of insn with uid = 19.
modifying insn i220: r83:DI=sign_extend([r83:DI*0x4+r76:DI])
  REG_DEAD r76:DI
deferring rescan insn with uid = 20.
modifying insn i321: r82:SI=r83:DI#0  ** all good so far, both r82/r83 are
live
  REG_DEAD r83:DI   *** OOPS
  REG_DEAD r83:DI
  REG_DEAD r83:DI
deferring rescan insn with uid = 21.

The REG_DEAD r83 incorrect as both r82 and r83 are still live. I think it may
be due to this earlier combine:

Trying 18 -> 19:
Successfully matched this instruction:
(set (reg/f:DI 78 [ _7 ])
(plus:DI (ashift:DI (reg:DI 83 [ _26 ])
(const_int 2 [0x2]))
(reg/f:DI 76 [ _4 ])))
allowing combination of insns 18 and 19
original costs 4 + 4 = 8
replacement cost 8
deferring deletion of insn with uid = 18.
modifying insn i319: r78:DI=r83:DI<<0x2+r76:DI
  REG_DEAD r83:DI
  REG_DEAD r76:DI
deferring rescan insn with uid = 19.

So I guess the bug is that when combining multiple instructions, dead notes
aren't removed if later instructions assign the the same register.