[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.

2013-02-12 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151



--- Comment #12 from Jakub Jelinek jakub at gcc dot gnu.org 2013-02-12 
10:37:42 UTC ---

Author: jakub

Date: Tue Feb 12 10:37:38 2013

New Revision: 195972



URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=195972

Log:

PR rtl-optimization/56151

* optabs.c (add_equal_note): Don't return 0 if target is a MEM,

equal to op0 or op1, and last_insn pattern is CODE operation

with MEM dest and one of the operands matches that MEM.



* gcc.target/i386/pr56151.c: New test.



Added:

trunk/gcc/testsuite/gcc.target/i386/pr56151.c

Modified:

trunk/gcc/ChangeLog

trunk/gcc/optabs.c

trunk/gcc/testsuite/ChangeLog


[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.

2013-02-12 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151



Jakub Jelinek jakub at gcc dot gnu.org changed:



   What|Removed |Added



 Status|NEW |RESOLVED

 Resolution||FIXED



--- Comment #13 from Jakub Jelinek jakub at gcc dot gnu.org 2013-02-12 
10:39:17 UTC ---

Fixed.


[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.

2013-02-11 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151



--- Comment #11 from Jakub Jelinek jakub at gcc dot gnu.org 2013-02-11 
15:42:12 UTC ---

Created attachment 29419

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29419

gcc48-pr56151.patch



Untested patch, done in add_equal_note, so that the code doesn't have to be

duplicated between expand_binop_directly and expand_unop_directly.


[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.

2013-02-08 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151



Richard Biener rguenth at gcc dot gnu.org changed:



   What|Removed |Added



   Priority|P3  |P1


[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.

2013-02-08 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151



--- Comment #10 from Jakub Jelinek jakub at gcc dot gnu.org 2013-02-08 
15:29:12 UTC ---

(In reply to comment #4)

 Patch to help explain the problem:

 

 Index: optabs.c

 ===

 --- optabs.c(revision 195687)

 +++ optabs.c(working copy)

 @@ -1452,8 +1452,13 @@ expand_binop_directly (enum machine_mode

  {

/* If PAT is composed of more than one insn, try to add an appropriate

  REG_EQUAL note to it.  If we can't because TEMP conflicts with an

 -operand, call expand_binop again, this time without a target.  */

 +operand, call expand_binop again, this time without a target.

 +

 +However, if target is a MEM, just accept the lossage of not having

 +a REG_EQUAL note.  This avoids splitting up insns of the form

 +MEM=MEM op X, a form we may not be able to reconstruct later.  */

if (INSN_P (pat)  NEXT_INSN (pat) != NULL_RTX

 +  ! (target  MEM_P (target))

! add_equal_note (pat, ops[0].value, optab_to_code (binoptab),

ops[1].value, ops[2].value))

 {



Shouldn't the  ! (target  MEM_P (target))

come after add_equal_note call, not before it?  I.e. shouldn't we try to add

the note if possible?


[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.

2013-02-04 Thread ebotcazou at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151



Eric Botcazou ebotcazou at gcc dot gnu.org changed:



   What|Removed |Added



 CC||ebotcazou at gcc dot

   ||gnu.org



--- Comment #5 from Eric Botcazou ebotcazou at gcc dot gnu.org 2013-02-04 
10:34:25 UTC ---

 Patch to help explain the problem:

 

 Index: optabs.c

 ===

 --- optabs.c(revision 195687)

 +++ optabs.c(working copy)

 @@ -1452,8 +1452,13 @@ expand_binop_directly (enum machine_mode

  {

/* If PAT is composed of more than one insn, try to add an appropriate

  REG_EQUAL note to it.  If we can't because TEMP conflicts with an

 -operand, call expand_binop again, this time without a target.  */

 +operand, call expand_binop again, this time without a target.

 +

 +However, if target is a MEM, just accept the lossage of not having

 +a REG_EQUAL note.  This avoids splitting up insns of the form

 +MEM=MEM op X, a form we may not be able to reconstruct later.  */

if (INSN_P (pat)  NEXT_INSN (pat) != NULL_RTX

 +  ! (target  MEM_P (target))

! add_equal_note (pat, ops[0].value, optab_to_code (binoptab),

ops[1].value, ops[2].value))

 {



I agree that we should try to do something like that.  Generating inferior code

to be able to add a REG_EQUAL that doesn't help in the end isn't very clever.

There is also the same pattern in expand_unop_direct.



Is that the maximal test that can be added to catch this case or could it be

made more precise?


[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.

2013-02-04 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151



Jakub Jelinek jakub at gcc dot gnu.org changed:



   What|Removed |Added



 CC||jakub at gcc dot gnu.org,

   ||uros at gcc dot gnu.org



--- Comment #6 from Jakub Jelinek jakub at gcc dot gnu.org 2013-02-04 
12:03:52 UTC ---

Well, we already have peephole2s for MEM op= ARG (the TARGET_READ_MODIFY_WRITE

guarded peephole2s), the reason why it doesn't trigger in this case is that ARG

in this case ends up being a MEM, and

MEM op= MEM2 isn't a valid insn:



(insn 95 94 96 11 (set (reg:SI 3 bx [orig:135 D.1801 ] [135])

(mem:SI (reg/f:SI 4 si [orig:86 D.1803 ] [86]) [2 *_41+0 S4 A32]))

pr56151.c:32 89 {*movsi_internal}

 (nil))

(insn 96 95 97 11 (parallel [

(set (reg:SI 3 bx [orig:135 D.1801 ] [135])

(ior:SI (reg:SI 3 bx [orig:135 D.1801 ] [135])

(mem:SI (plus:SI (mult:SI (reg:SI 6 bp [orig:132 D.1801 ]

[132])

(const_int 4 [0x4]))

(symbol_ref:SI (setmask)  var_decl

0x7f587927a390 setmask)) [2 setmask S4 A32])))

(clobber (reg:CC 17 flags))

]) pr56151.c:32 421 {*iorsi_1}

 (expr_list:REG_DEAD (reg:SI 6 bp [orig:132 D.1801 ] [132])

(nil)))

(insn 97 96 98 11 (set (mem:SI (reg/f:SI 4 si [orig:86 D.1803 ] [86]) [2 *_41+0

S4 A32])

(reg:SI 3 bx [orig:135 D.1801 ] [135])) pr56151.c:32 89

{*movsi_internal}

 (expr_list:REG_DEAD (reg:SI 3 bx [orig:135 D.1801 ] [135])

(expr_list:REG_DEAD (reg/f:SI 4 si [orig:86 D.1803 ] [86])

(nil



I guess for temp = MEM; temp op= MEM2; MEM = temp; for temp dead afterwards

(this case), we could as well just add another set of the (up to 3?)

peephole2's, that would transform that into

temp = MEM2;

MEM op= temp;

instead.


[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.

2013-02-04 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151



--- Comment #7 from Jakub Jelinek jakub at gcc dot gnu.org 2013-02-04 
17:13:52 UTC ---

Created attachment 29350

  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29350

gcc48-pr56151.patch



Untested patch for the peephole mentioned in previous comment.


[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.

2013-02-04 Thread steven at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151



--- Comment #8 from Steven Bosscher steven at gcc dot gnu.org 2013-02-04 
17:59:22 UTC ---

(In reply to comment #7)

 Created attachment 29350 [details]

 gcc48-pr56151.patch

 

 Untested patch for the peephole mentioned in previous comment.



I don't think a new set of peepholes is an appropriate solution for

this problem, because the same issue can show up on any CISC-like

target. Before my patch, the code was good after 'expand' by optabs,

perhaps the code your peephole tries to create, can be emitted from

there instead. And perhaps, longer term, combine should be changed

to try the more profitable combination before the one it performs now.


[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.

2013-02-04 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151



--- Comment #9 from Jakub Jelinek jakub at gcc dot gnu.org 2013-02-04 
18:20:09 UTC ---

(In reply to comment #8)

 (In reply to comment #7)

  Created attachment 29350 [details]

  gcc48-pr56151.patch

  

  Untested patch for the peephole mentioned in previous comment.

 

 I don't think a new set of peepholes is an appropriate solution for

 this problem, because the same issue can show up on any CISC-like

 target. Before my patch, the code was good after 'expand' by optabs,

 perhaps the code your peephole tries to create, can be emitted from

 there instead. And perhaps, longer term, combine should be changed

 to try the more profitable combination before the one it performs now.



My patch wasn't meant as a replacement of the optabs.c change, but as a

complement.  As the sequence into which it attempts to peephole is shorter and

supposedly faster on some CPUs at least, if such an insn sequence appears in

the IL anywhere during RTL optimizations, it can optimize it.


[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.

2013-02-02 Thread steven at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151



Steven Bosscher steven at gcc dot gnu.org changed:



   What|Removed |Added



 Status|UNCONFIRMED |NEW

   Last reconfirmed||2013-02-03

 CC|steven at gcc dot gnu.org   |

 Ever Confirmed|0   |1



--- Comment #3 from Steven Bosscher steven at gcc dot gnu.org 2013-02-03 
00:45:37 UTC ---

Confirmed.



My patch only avoids situations that ought not to happen in the first

place: self-referencing REG_EQUAL notes.  If this somehow prevents an

interesting code transformation I don't consider it my responsibility

to find out what that might be. Therefore I'm not taking this bug.



I did look at the difference in generated code. The regression is due

to the optabs.c change, reverting it results in the pre-r194054 code

again. The difference in code from the 'expand' pass is:



86: NOTE_INSN_BASIC_BLOCK 13

87: {r130:SI=r60:SI0x5;clobber flags:CC;}

88: {r131:SI=r130:SI0x2;clobber flags:CC;}

89: {r86:SI=r103:SI+r131:SI;clobber flags:CC;}

90: {r132:SI=r60:SI0x1f;clobber flags:CC;}

91: r133:SI=`setmask'

-   92: r134:SI=[r132:SI*0x4+r133:SI]

-   93: {[r86:SI]=[r86:SI]|r134:SI;clobber flags:CC;}

+   94: r136:SI=[r132:SI*0x4+r133:SI]

+   95: r137:SI=[r86:SI]

+   96: {r135:SI=r137:SI|r136:SI;clobber flags:CC;}

+   97: [r86:SI]=r135:SI

   REG_EQUAL [r86:SI]|[r132:SI*0x4+r133:SI]

// next basic block



The '+' code is the post-patch code, '-' is pre-patch (the continuous

numbering is due to the '-' code being rejected post-past, so those two

insn UIDs are lost).



Post-patch the extra moves of insns 95 and 97 are never cleaned up. The

extra move is introduced to make room for the REG_EQUAL note that is 

self-referencing in the pre-patch situation (SET of r86 with REG_EQUAL

note using r86). So the pre-patch RTL was indeed invalid.



Combine fails to restore the code to its pre-patch state because it

combines:



   94: r136:SI=[r132:SI*0x4+`setmask']

   95: r137:SI=[r86:SI]

   96: {r135:SI=r137:SI|r136:SI;clobber flags:CC;}



to:



   95: r137:SI=[r86:SI]

   96: {r135:SI=r137:SI|[r132:SI*0x4+`setmask'];clobber flags:CC;}



and the 'setmask' reference make merging the SET of r135 and the store

impossible:



Trying 96 - 97:

Failed to match this instruction:

(parallel [

(set (mem:SI (reg/f:SI 86 [ D.1803 ]) [2 *_41+0 S4 A32])

(ior:SI (reg:SI 137 [ *_41 ])

(mem:SI (plus:SI (mult:SI (reg:SI 132 [ D.1801 ])

(const_int 4 [0x4]))

(symbol_ref:SI (setmask) var_decl 0x763fb390

setmask)) [2 setmaskD.1732 S4 A32])))

(clobber (reg:CC 17 flags))

])

Failed to match this instruction:

(set (mem:SI (reg/f:SI 86 [ D.1803 ]) [2 *_41+0 S4 A32])

(ior:SI (reg:SI 137 [ *_41 ])

(mem:SI (plus:SI (mult:SI (reg:SI 132 [ D.1801 ])

(const_int 4 [0x4]))

(symbol_ref:SI (setmask) var_decl 0x763fb390 setmask))

[2 setmaskD.1732 S4 A32])))


[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.

2013-02-02 Thread steven at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151



Steven Bosscher steven at gcc dot gnu.org changed:



   What|Removed |Added



 CC||steven at gcc dot gnu.org



--- Comment #4 from Steven Bosscher steven at gcc dot gnu.org 2013-02-03 
01:05:34 UTC ---

Patch to help explain the problem:



Index: optabs.c

===

--- optabs.c(revision 195687)

+++ optabs.c(working copy)

@@ -1452,8 +1452,13 @@ expand_binop_directly (enum machine_mode

 {

   /* If PAT is composed of more than one insn, try to add an appropriate

 REG_EQUAL note to it.  If we can't because TEMP conflicts with an

-operand, call expand_binop again, this time without a target.  */

+operand, call expand_binop again, this time without a target.

+

+However, if target is a MEM, just accept the lossage of not having

+a REG_EQUAL note.  This avoids splitting up insns of the form

+MEM=MEM op X, a form we may not be able to reconstruct later.  */

   if (INSN_P (pat)  NEXT_INSN (pat) != NULL_RTX

+  ! (target  MEM_P (target))

   ! add_equal_note (pat, ops[0].value, optab_to_code (binoptab),

   ops[1].value, ops[2].value))

{


[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.

2013-01-31 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151



Richard Biener rguenth at gcc dot gnu.org changed:



   What|Removed |Added



   Keywords||missed-optimization

 Target||i?86-*-*

   Target Milestone|--- |4.8.0

Summary|Performance degradation |[4.8 Regression]

   |after r194054 on x86 Atom.  |Performance degradation

   ||after r194054 on x86 Atom.