[Bug lto/66752] spec2000 255.vortex performance compiled with GCC is ~20% lower than with CLANG
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752

--- Comment #18 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Author: law
Date: Fri Aug 28 16:23:12 2015
New Revision: 227307

URL: https://gcc.gnu.org/viewcvs?rev=227307&root=gcc&view=rev
Log:
    [PATCH][lto/66752] Fix missed FSM jump thread

    PR lto/66752
    * tree-ssa-threadedge.c (simplify_control_stmt_condition): If we are
    unable to find X NE 0 in the tables, return X as the simplified
    condition.
    (fsm_find_control_statement_thread_paths): If nodes in NEXT_PATH are
    in VISITED_BBS, then return failure.  Else add nodes from NEXT_PATH
    to VISITED_BBS.
    * tree-ssa-threadupdate.c (duplicate_thread_path): Fix up edge flags
    after removing the control flow statement and unnecessary edges.

    PR lto/66752
    * gcc.dg/tree-ssa/pr66752-2.c: New test.
    * gcc.dg/torture/pr66752-1.c: New test.
    * g++.dg/torture/pr66752-2.C: New test.

Added:
    trunk/gcc/testsuite/g++.dg/torture/pr66752-2.C
    trunk/gcc/testsuite/gcc.dg/torture/pr66752-1.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-threadedge.c
    trunk/gcc/tree-ssa-threadupdate.c
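The fsm_find_control_statement_thread_paths entry above describes an all-or-nothing check on a candidate path: reject it if any of its blocks was already visited, otherwise record all of them. A minimal C sketch of just that logic follows. It assumes basic blocks are plain integer indices and the visited set is a bool array; the function name add_path_if_unvisited is hypothetical, and GCC's own code uses its own basic-block and set types.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: blocks are small integer indices and VISITED is a
   bool array indexed by block number (not GCC's real data structures).
   Return false if any block in NEXT_PATH was already visited; otherwise
   mark every block in NEXT_PATH as visited and return true.  */
static bool
add_path_if_unvisited (const int *next_path, size_t len, bool *visited)
{
  /* First pass: fail without modifying VISITED if the path revisits
     any block.  */
  for (size_t i = 0; i < len; i++)
    if (visited[next_path[i]])
      return false;

  /* Second pass: the path is new, so record all of its blocks.  */
  for (size_t i = 0; i < len; i++)
    visited[next_path[i]] = true;
  return true;
}
```

Checking the whole path before marking anything means a rejected path leaves the visited set unchanged, so a later path that shares none of the previously recorded blocks can still be accepted.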
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752

Jeffrey A. Law <law at redhat dot com> changed:

           What    |Removed      |Added
             Status|REOPENED     |RESOLVED
         Resolution|---          |FIXED

--- Comment #19 from Jeffrey A. Law <law at redhat dot com> ---
Patch reinstalled on the trunk with a fix for the bootstrapping issue exposed by ppc64.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752

--- Comment #17 from Jeffrey A. Law <law at redhat dot com> ---
The fix for the ppc64 bootstrap regression looks good. I'm just having a bear of a time producing a reasonable test for the regression suite.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752

--- Comment #16 from Jeffrey A. Law <law at redhat dot com> ---
Just a status update. This patch causes the stage1 compiler to mis-compile tree-ssa-live, which in turn causes the stage2 compiler to incorrectly issue an error when building the stage3 compiler on ppc64.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752

Igor Zamyatin <izamyatin at gmail dot com> changed:

           What    |Removed      |Added
                 CC|             |izamyatin at gmail dot com

--- Comment #13 from Igor Zamyatin <izamyatin at gmail dot com> ---
Why has the patch been reverted?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed      |Added
             Status|RESOLVED     |REOPENED
         Resolution|FIXED        |---

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
Reopen.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752

--- Comment #15 from Jeffrey A. Law <law at redhat dot com> ---
Causes bootstrap failure on ppc64 that I haven't had time to dig into.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed by our testers.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752

--- Comment #12 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Author: law
Date: Sat Jul 25 05:45:42 2015
New Revision: 226206

URL: https://gcc.gnu.org/viewcvs?rev=226206&root=gcc&view=rev
Log:
    Revert:
    PR lto/66752
    * tree-ssa-threadedge.c (simplify_control_stmt_condition): If we are
    unable to find X NE 0 in the tables, return X as the simplified
    condition.
    (fsm_find_control_statement_thread_paths): If nodes in NEXT_PATH are
    in VISITED_BBS, then return failure.  Else add nodes from NEXT_PATH
    to VISITED_BBS.
    * tree-ssa-threadupdate.c (duplicate_thread_path): Fix up edge flags
    after removing the control flow statement and unnecessary edges.

    testsuite/
    PR lto/66752
    * gcc.dg/tree-ssa/pr66752-2.c: New test.
    * gcc.dg/torture/pr66752-1.c: New test.
    * g++.dg/torture/pr66752-2.C: New test.

Removed:
    trunk/gcc/testsuite/g++.dg/torture/pr66752-2.C
    trunk/gcc/testsuite/gcc.dg/torture/pr66752-1.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-threadedge.c
    trunk/gcc/tree-ssa-threadupdate.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752

--- Comment #8 from Jeffrey A. Law <law at redhat dot com> ---
After tracking down a couple bugs in the FSM support, I'm about ready to check in a patch that should address the missed jump threads.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752

Jeffrey A. Law <law at redhat dot com> changed:

           What    |Removed      |Added
             Status|NEW          |RESOLVED
         Resolution|---          |FIXED

--- Comment #10 from Jeffrey A. Law <law at redhat dot com> ---
Should be fixed on the trunk. If you could verify that 255.vortex's performance has improved, it'd be appreciated. Thanks.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752

--- Comment #9 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Author: law
Date: Thu Jul 23 20:42:15 2015
New Revision: 226125

URL: https://gcc.gnu.org/viewcvs?rev=226125&root=gcc&view=rev
Log:
    PR lto/66752
    * tree-ssa-threadedge.c (simplify_control_stmt_condition): If we are
    unable to find X NE 0 in the tables, return X as the simplified
    condition.
    (fsm_find_control_statement_thread_paths): If nodes in NEXT_PATH are
    in VISITED_BBS, then return failure.  Else add nodes from NEXT_PATH
    to VISITED_BBS.
    * tree-ssa-threadupdate.c (duplicate_thread_path): Fix up edge flags
    after removing the control flow statement and unnecessary edges.

    testsuite/
    PR lto/66752
    * gcc.dg/tree-ssa/pr66752-2.c: New test.
    * gcc.dg/torture/pr66752-1.c: New test.
    * g++.dg/torture/pr66752-2.C: New test.

Added:
    trunk/gcc/testsuite/g++.dg/torture/pr66752-2.C
    trunk/gcc/testsuite/gcc.dg/torture/pr66752-1.c
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-threadedge.c
    trunk/gcc/tree-ssa-threadupdate.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752

Jeffrey A. Law <law at redhat dot com> changed:

           What    |Removed      |Added
                 CC|             |law at redhat dot com

--- Comment #7 from Jeffrey A. Law <law at redhat dot com> ---
This is something I'd be expecting the FSM code to detect for us, but that code isn't firing at all due to what appears to be a relatively simple logic error. Hacking it up under GDB's control results in the FSM code discovering 3 threadable paths within the loop. The net result is that all the manipulation/testing of FLAG is eliminated. I'll be taking a deeper look over the next few days.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed      |Added
           Keywords|             |missed-optimization
             Status|UNCONFIRMED  |NEW
   Last reconfirmed|             |2015-07-13
                 CC|             |law at gcc dot gnu.org
     Ever confirmed|0            |1

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
This is simply a jump-threading opportunity that is not taken (with the complication that this threads through the loop header). I wonder whether the FSM threading machinery could catch this, though.
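The FSM threader is aimed at loops where the branch taken on the next iteration is decided by a variable that holds a known constant on each path back around the loop, as in a hand-written state machine. A hypothetical example of that shape is sketched below (count_ab_switch and count_ab_threaded are illustrative names, not GCC code). The second function is threaded by hand: each path jumps straight to the state its constant selects instead of going back through the switch, which is roughly the effect such threading aims for.

```c
#include <stddef.h>

/* Hypothetical state machine: count occurrences of "ab" in S.  */
enum state { START, SAW_A };

static int
count_ab_switch (const char *s)
{
  enum state st = START;
  int count = 0;
  for (size_t i = 0; s[i] != '\0'; i++)
    switch (st)  /* ST is a known constant on every path back here.  */
      {
      case START:
        st = (s[i] == 'a') ? SAW_A : START;
        break;
      case SAW_A:
        if (s[i] == 'b')
          count++;
        st = (s[i] == 'a') ? SAW_A : START;
        break;
      }
  return count;
}

/* Hand-threaded equivalent: the state variable and the switch are gone;
   each path jumps directly to the code for the state it would select.  */
static int
count_ab_threaded (const char *s)
{
  size_t i = 0;
  int count = 0;
start:
  if (s[i] == '\0')
    return count;
  if (s[i++] == 'a')
    goto saw_a;
  goto start;
saw_a:
  if (s[i] == '\0')
    return count;
  if (s[i] == 'b')
    count++;
  if (s[i++] == 'a')
    goto saw_a;
  goto start;
}
```

Both functions compute the same count for any input string; only the control flow differs.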
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Looks like bug 13876.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752

Yuri Rumyantsev <ysrumyan at gmail dot com> changed:

           What    |Removed      |Added
                 CC|             |ysrumyan at gmail dot com

--- Comment #3 from Yuri Rumyantsev <ysrumyan at gmail dot com> ---
We found a simple reproducer for this issue, which is attached. The problem is a redundant test in the loop:

  for (i = -1, flag = 1; ++i < N && flag;)
    if (a[i] == b)
      {
        set 'flag' to 0
        do something
      }

The test 'flag == 1' can clearly be deleted, since the only place where flag is set to zero is the then-part. Note that clang deletes it.
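A compilable sketch of the loop's shape is below. It assumes "do something" only records the matching index (the attached test case is not reproduced here), and find_flag and find_threaded are hypothetical names. The second function is the hand-threaded equivalent: on the path that clears flag, the next flag test must fail, so that path can go straight to the exit; on every other path the test must succeed, so flag disappears entirely.

```c
/* Shape of the reproducer: FLAG is cleared only in the then-part.
   "do something" is modelled (hypothetically) as recording POS.  */
static int
find_flag (const int *a, int n, int b)
{
  int i, flag, pos = -1;
  for (i = -1, flag = 1; ++i < n && flag;)
    if (a[i] == b)
      {
        flag = 0;  /* The only place FLAG is set to zero.  */
        pos = i;   /* Stand-in for "do something".  */
      }
  return pos;
}

/* Hand-threaded equivalent: after a match the next FLAG test is known
   to be false, so we exit directly; elsewhere it is known to be true,
   so the test is dropped.  */
static int
find_threaded (const int *a, int n, int b)
{
  for (int i = 0; i < n; i++)
    if (a[i] == b)
      return i;
  return -1;
}
```

Both functions return the index of the first match, or -1 if there is none, for any input.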
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752

--- Comment #4 from Yuri Rumyantsev <ysrumyan at gmail dot com> ---
Created attachment 35947
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35947&action=edit
test-case to reproduce

Compile with -Ofast -m32 -march=slm and notice the redundant test:

.L30:
        testl   %edx, %edx
        je      .L1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752

Stupachenko Evgeny <evstupac at gmail dot com> changed:

           What    |Removed      |Added
                 CC|             |rth at gcc dot gnu.org

--- Comment #2 from Stupachenko Evgeny <evstupac at gmail dot com> ---
I measured the 20% on Silvermont, but there should be some gap on all x86 machines. I'm not sure about other architectures, but the issue does not look target-specific.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66752

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed      |Added
             Target|             |i?86-*-*
                 CC|             |rguenth at gcc dot gnu.org

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
For me LTO helps quite a bit here. What kind of machine was this tested on?