[Bug tree-optimization/83326] [8 Regression] SPEC CPU2017 648.exchange2_s ~6% performance regression with r255267 (reproducer attached)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326

--- Comment #6 from Alexander Nesterovskiy ---
Thanks!
I see a performance gain on 648.exchange2_s (~6% on Broadwell and ~3% on Skylake-X), which brings performance back to the r255266 level (the Skylake-X regression was ~3%).
The loops with 2 and 3 iterations are now unrolled.
It's surely fixed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326

--- Comment #5 from Richard Biener ---
Author: rguenth
Date: Thu Dec 14 14:32:24 2017
New Revision: 255635

URL: https://gcc.gnu.org/viewcvs?rev=255635&root=gcc&view=rev
Log:
2017-12-14  Richard Biener

        PR tree-optimization/83326
        * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Add
        may_be_zero parameter and handle it by not marking the first
        peeled copy as not exiting the loop.
        (try_peel_loop): Likewise.
        (canonicalize_loop_induction_variables): Use number_of_iterations_exit
        to handle the case of constant or zero iterations and perform
        loop header copying on-the-fly.

        * gcc.dg/tree-ssa/pr81388-2.c: Adjust.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/tree-ssa/pr81388-2.c
    trunk/gcc/tree-ssa-loop-ivcanon.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #4 from Richard Biener ---
Should be fixed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326

--- Comment #3 from Richard Biener ---
Created attachment 42879
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42879&action=edit
patch in testing

Patch I am testing.  Performance evaluation appreciated.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326

--- Comment #2 from Richard Biener ---
We no longer unroll the inner loops in cunrolli because cunrolli will leave us
with exit checks.  We fail to compute the number of iterations of the inner
loop(s) (pre loop header copying):

   [local count: 21065692]:
  L.5:
  _3 = _1 + 1;
  _53 = (integer(kind=8)) _3;
  _4 = _1 + 2;
  _54 = (integer(kind=8)) _4;
  _55 = (integer(kind=8)) i1_25;
  _5 = _55 * 81;
  _56 = _5 + -91;

$3 =
(gdb) p debug_bb_n (7)
   [local count: 63197075]:
  _6 = S.0_27 * 9;
  _57 = _6 + _56;

   [local count: 189610187]:
  # S.1_28 = PHI <_53(7), S.1_59(9)>
  if (S.1_28 > _54)
    goto ; [33.33%]
  else
    goto ; [66.67%]

$1 =
(gdb) p debug_bb_n (9)
   [local count: 126413112]:
  _7 = S.1_28 + _57;
  _8 = test_array[_7];
  _9 = _8 + -10;
  test_array[_7] = _9;
  S.1_59 = S.1_28 + 1;
  goto ; [100.00%]

this one being a bit difficult, but the other (but not as interesting(?)):

   [local count: 119292717]:
  L.14:
  _14 = _1 + 1;
  _69 = (integer(kind=8)) _14;
  _15 = _1 + 2;
  _70 = (integer(kind=8)) _15;
  _71 = (integer(kind=8)) i2_26;
  _16 = _71 * 81;
  _72 = _16 + -91;

  # S.4_31 = PHI <_69(19), S.4_75(21)>
  if (S.4_31 > _70)
    goto ; [33.33%]
  else
    goto ; [66.67%]

   [local count: 715863674]:
  _18 = S.4_31 + _73;
  _19 = test_array[_18];
  _20 = _19 + 10;
  test_array[_18] = _20;
  S.4_75 = S.4_31 + 1;
  goto ; [100.00%]

looks like it should be doable.  And indeed it is - we are just "confused"
by the maybe_zero test.  IMHO we should allow constant zero or N iterations
by performing the loop header copying alongside the unrolling (leaving
the first exit test unremoved).

Testing a patch to do that.
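
For readers less familiar with the "constant zero or N iterations" case, here is a minimal C sketch of the idea in comment #2. It is an illustration only: it is neither the GIMPLE above nor the committed patch, and the names rolled, unrolled_keep_first_test and same_result are hypothetical. It assumes a loop whose bounds are such that it runs either zero times or exactly two times, and shows a hand-unrolled version that keeps only the first exit test.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N 16

/* Rolled form: same shape as the inner loops in the dump
   (exit when the counter exceeds the upper bound). */
static void rolled(int64_t *a, int64_t lo, int64_t hi)
{
    for (int64_t s = lo; s <= hi; s++)
        a[s] -= 10;
}

/* Hand-unrolled form, valid only under the assumption that the loop
   runs either 0 times (hi < lo) or exactly 2 times (hi == lo + 1). */
static void unrolled_keep_first_test(int64_t *a, int64_t lo, int64_t hi)
{
    if (lo > hi)        /* first exit test is kept: covers the zero case */
        return;
    a[lo] -= 10;        /* iteration 1 */
    a[lo + 1] -= 10;    /* iteration 2, no exit test in between */
}

/* Returns 1 if both forms leave the array in the same state. */
static int same_result(int64_t lo, int64_t hi)
{
    int64_t x[N], y[N];

    assert(lo >= 0 && lo + 1 < N);       /* stay inside the array */
    assert(hi < lo || hi == lo + 1);     /* the 0-or-2 assumption */

    for (int i = 0; i < N; i++)
        x[i] = y[i] = 100 + i;

    rolled(x, lo, hi);
    unrolled_keep_first_test(y, lo, hi);
    return memcmp(x, y, sizeof x) == 0;
}
```

Calling same_result(3, 4) exercises the two-iteration case and same_result(3, 2) the zero-iteration case. Keeping the first test is what makes the unrolled body safe without a separate loop header copying pass having run first.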
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326

Richard Biener changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                   |ASSIGNED
   Last reconfirmed|                              |2017-12-10
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
     Ever confirmed|0                             |1

--- Comment #1 from Richard Biener ---
I will investigate.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
   Target Milestone|---                         |8.0