[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151 --- Comment #12 from Jakub Jelinek jakub at gcc dot gnu.org 2013-02-12 10:37:42 UTC --- Author: jakub Date: Tue Feb 12 10:37:38 2013 New Revision: 195972 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=195972 Log: PR rtl-optimization/56151 * optabs.c (add_equal_note): Don't return 0 if target is a MEM, equal to op0 or op1, and last_insn pattern is CODE operation with MEM dest and one of the operands matches that MEM. * gcc.target/i386/pr56151.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/pr56151.c Modified: trunk/gcc/ChangeLog trunk/gcc/optabs.c trunk/gcc/testsuite/ChangeLog
[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Status|NEW |RESOLVED Resolution||FIXED --- Comment #13 from Jakub Jelinek jakub at gcc dot gnu.org 2013-02-12 10:39:17 UTC --- Fixed.
[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151 --- Comment #11 from Jakub Jelinek jakub at gcc dot gnu.org 2013-02-11 15:42:12 UTC --- Created attachment 29419 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29419 gcc48-pr56151.patch Untested patch, done in add_equal_note, so that the code doesn't have to be duplicated between expand_binop_directly and expand_unop_directly.
[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Priority|P3 |P1
[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151 --- Comment #10 from Jakub Jelinek jakub at gcc dot gnu.org 2013-02-08 15:29:12 UTC --- (In reply to comment #4) Patch to help explain the problem: Index: optabs.c === --- optabs.c(revision 195687) +++ optabs.c(working copy) @@ -1452,8 +1452,13 @@ expand_binop_directly (enum machine_mode { /* If PAT is composed of more than one insn, try to add an appropriate REG_EQUAL note to it. If we can't because TEMP conflicts with an -operand, call expand_binop again, this time without a target. */ +operand, call expand_binop again, this time without a target. + +However, if target is a MEM, just accept the lossage of not having +a REG_EQUAL note. This avoids splitting up insns of the form +MEM=MEM op X, a form we may not be able to reconstruct later. */ if (INSN_P (pat) NEXT_INSN (pat) != NULL_RTX + ! (target MEM_P (target)) ! add_equal_note (pat, ops[0].value, optab_to_code (binoptab), ops[1].value, ops[2].value)) { Shouldn't the ! (target MEM_P (target)) come after add_equal_note call, not before it? I.e. shouldn't we try to add the note if possible?
[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151 Eric Botcazou ebotcazou at gcc dot gnu.org changed: What|Removed |Added CC||ebotcazou at gcc dot ||gnu.org --- Comment #5 from Eric Botcazou ebotcazou at gcc dot gnu.org 2013-02-04 10:34:25 UTC --- Patch to help explain the problem: Index: optabs.c === --- optabs.c(revision 195687) +++ optabs.c(working copy) @@ -1452,8 +1452,13 @@ expand_binop_directly (enum machine_mode { /* If PAT is composed of more than one insn, try to add an appropriate REG_EQUAL note to it. If we can't because TEMP conflicts with an -operand, call expand_binop again, this time without a target. */ +operand, call expand_binop again, this time without a target. + +However, if target is a MEM, just accept the lossage of not having +a REG_EQUAL note. This avoids splitting up insns of the form +MEM=MEM op X, a form we may not be able to reconstruct later. */ if (INSN_P (pat) NEXT_INSN (pat) != NULL_RTX + ! (target MEM_P (target)) ! add_equal_note (pat, ops[0].value, optab_to_code (binoptab), ops[1].value, ops[2].value)) { I agree that we should try to do something like that. Generating inferior code to be able to add a REG_EQUAL that doesn't help in the end isn't very clever. There is also the same pattern in expand_unop_direct. Is that the maximal test that can be added to catch this case or could it be made more precise?
[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||jakub at gcc dot gnu.org, ||uros at gcc dot gnu.org --- Comment #6 from Jakub Jelinek jakub at gcc dot gnu.org 2013-02-04 12:03:52 UTC --- Well, we already have peephole2s for MEM op= ARG (the TARGET_READ_MODIFY_WRITE guarded peephole2s), the reason why it doesn't trigger in this case is that ARG in this case ends up being a MEM, and MEM op= MEM2 isn't a valid insn: (insn 95 94 96 11 (set (reg:SI 3 bx [orig:135 D.1801 ] [135]) (mem:SI (reg/f:SI 4 si [orig:86 D.1803 ] [86]) [2 *_41+0 S4 A32])) pr56151.c:32 89 {*movsi_internal} (nil)) (insn 96 95 97 11 (parallel [ (set (reg:SI 3 bx [orig:135 D.1801 ] [135]) (ior:SI (reg:SI 3 bx [orig:135 D.1801 ] [135]) (mem:SI (plus:SI (mult:SI (reg:SI 6 bp [orig:132 D.1801 ] [132]) (const_int 4 [0x4])) (symbol_ref:SI (setmask) var_decl 0x7f587927a390 setmask)) [2 setmask S4 A32]))) (clobber (reg:CC 17 flags)) ]) pr56151.c:32 421 {*iorsi_1} (expr_list:REG_DEAD (reg:SI 6 bp [orig:132 D.1801 ] [132]) (nil))) (insn 97 96 98 11 (set (mem:SI (reg/f:SI 4 si [orig:86 D.1803 ] [86]) [2 *_41+0 S4 A32]) (reg:SI 3 bx [orig:135 D.1801 ] [135])) pr56151.c:32 89 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 3 bx [orig:135 D.1801 ] [135]) (expr_list:REG_DEAD (reg/f:SI 4 si [orig:86 D.1803 ] [86]) (nil I guess for temp = MEM; temp op= MEM2; MEM = temp; for temp dead afterwards (this case), we could as well just add another set of the (up to 3?) peephole2's, that would transform that into temp = MEM2; MEM op= temp; instead.
[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151 --- Comment #7 from Jakub Jelinek jakub at gcc dot gnu.org 2013-02-04 17:13:52 UTC --- Created attachment 29350 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=29350 gcc48-pr56151.patch Untested patch for the peephole mentioned in previous comment.
[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151 --- Comment #8 from Steven Bosscher steven at gcc dot gnu.org 2013-02-04 17:59:22 UTC --- (In reply to comment #7) Created attachment 29350 [details] gcc48-pr56151.patch Untested patch for the peephole mentioned in previous comment. I don't think a new set of peepholes is an appropriate solution for this problem, because the same issue can show up on any CISC-like target. Before my patch, the code was good after 'expand' by optabs, perhaps the code your peephole tries to create, can be emitted from there instead. And perhaps, longer term, combine should be changed to try the more profitable combination before the one it performs now.
[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151 --- Comment #9 from Jakub Jelinek jakub at gcc dot gnu.org 2013-02-04 18:20:09 UTC --- (In reply to comment #8) (In reply to comment #7) Created attachment 29350 [details] gcc48-pr56151.patch Untested patch for the peephole mentioned in previous comment. I don't think a new set of peepholes is an appropriate solution for this problem, because the same issue can show up on any CISC-like target. Before my patch, the code was good after 'expand' by optabs, perhaps the code your peephole tries to create, can be emitted from there instead. And perhaps, longer term, combine should be changed to try the more profitable combination before the one it performs now. My patch wasn't meant as a replacement of the optabs.c change, but as a complement. As the sequence into which it attempts to peephole is shorter and supposedly faster on some CPUs at least, if such an insn sequence appears in the IL anywhere during RTL optimizations, it can optimize it.
[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151 Steven Bosscher steven at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2013-02-03 CC|steven at gcc dot gnu.org | Ever Confirmed|0 |1 --- Comment #3 from Steven Bosscher steven at gcc dot gnu.org 2013-02-03 00:45:37 UTC --- Confirmed. My patch only avoids situations that ought not to happen in the first place: self-referencing REG_EQUAL notes. If this somehow prevents an interesting code transformation I don't consider it my responsibility to find out what that might be. Therefore I'm not taking this bug. I did look at the difference in generated code. The regression is due to the optabs.c change, reverting it results in the pre-r194054 code again. The difference in code from the 'expand' pass is: 86: NOTE_INSN_BASIC_BLOCK 13 87: {r130:SI=r60:SI0x5;clobber flags:CC;} 88: {r131:SI=r130:SI0x2;clobber flags:CC;} 89: {r86:SI=r103:SI+r131:SI;clobber flags:CC;} 90: {r132:SI=r60:SI0x1f;clobber flags:CC;} 91: r133:SI=`setmask' - 92: r134:SI=[r132:SI*0x4+r133:SI] - 93: {[r86:SI]=[r86:SI]|r134:SI;clobber flags:CC;} + 94: r136:SI=[r132:SI*0x4+r133:SI] + 95: r137:SI=[r86:SI] + 96: {r135:SI=r137:SI|r136:SI;clobber flags:CC;} + 97: [r86:SI]=r135:SI REG_EQUAL [r86:SI]|[r132:SI*0x4+r133:SI] // next basic block The '+' code is the post-patch code, '-' is pre-patch (the continuous numbering is due to the '-' code being rejected post-past, so those two insn UIDs are lost). Post-patch the extra moves of insns 95 and 97 are never cleaned up. The extra move is introduced to make room for the REG_EQUAL note that is self-referencing in the pre-patch situation (SET of r86 with REG_EQUAL note using r86). So the pre-patch RTL was indeed invalid. Combine fails to restore the code to its pre-patch state because it combines: 94: r136:SI=[r132:SI*0x4+`setmask'] 95: r137:SI=[r86:SI] 96: {r135:SI=r137:SI|r136:SI;clobber flags:CC;} to: 95: r137:SI=[r86:SI] 96: {r135:SI=r137:SI|[r132:SI*0x4+`setmask'];clobber flags:CC;} and the 'setmask' reference make merging the SET of r135 and the store impossible: Trying 96 - 97: Failed to match this instruction: (parallel [ (set (mem:SI (reg/f:SI 86 [ D.1803 ]) [2 *_41+0 S4 A32]) (ior:SI (reg:SI 137 [ *_41 ]) (mem:SI (plus:SI (mult:SI (reg:SI 132 [ D.1801 ]) (const_int 4 [0x4])) (symbol_ref:SI (setmask) var_decl 0x763fb390 setmask)) [2 setmaskD.1732 S4 A32]))) (clobber (reg:CC 17 flags)) ]) Failed to match this instruction: (set (mem:SI (reg/f:SI 86 [ D.1803 ]) [2 *_41+0 S4 A32]) (ior:SI (reg:SI 137 [ *_41 ]) (mem:SI (plus:SI (mult:SI (reg:SI 132 [ D.1801 ]) (const_int 4 [0x4])) (symbol_ref:SI (setmask) var_decl 0x763fb390 setmask)) [2 setmaskD.1732 S4 A32])))
[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151 Steven Bosscher steven at gcc dot gnu.org changed: What|Removed |Added CC||steven at gcc dot gnu.org --- Comment #4 from Steven Bosscher steven at gcc dot gnu.org 2013-02-03 01:05:34 UTC --- Patch to help explain the problem: Index: optabs.c === --- optabs.c(revision 195687) +++ optabs.c(working copy) @@ -1452,8 +1452,13 @@ expand_binop_directly (enum machine_mode { /* If PAT is composed of more than one insn, try to add an appropriate REG_EQUAL note to it. If we can't because TEMP conflicts with an -operand, call expand_binop again, this time without a target. */ +operand, call expand_binop again, this time without a target. + +However, if target is a MEM, just accept the lossage of not having +a REG_EQUAL note. This avoids splitting up insns of the form +MEM=MEM op X, a form we may not be able to reconstruct later. */ if (INSN_P (pat) NEXT_INSN (pat) != NULL_RTX + ! (target MEM_P (target)) ! add_equal_note (pat, ops[0].value, optab_to_code (binoptab), ops[1].value, ops[2].value)) {
[Bug rtl-optimization/56151] [4.8 Regression] Performance degradation after r194054 on x86 Atom.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56151 Richard Biener rguenth at gcc dot gnu.org changed: What|Removed |Added Keywords||missed-optimization Target||i?86-*-* Target Milestone|--- |4.8.0 Summary|Performance degradation |[4.8 Regression] |after r194054 on x86 Atom. |Performance degradation ||after r194054 on x86 Atom.