[Bug tree-optimization/83326] [8 Regression] SPEC CPU2017 648.exchange2_s ~6% performance regression with r255267 (reproducer attached)

2017-12-21 Thread alexander.nesterovskiy at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326

--- Comment #6 from Alexander Nesterovskiy  ---
Thanks! I see a performance gain on 648.exchange2_s (~6% on Broadwell and ~3% on
Skylake-X), which restores performance to the r255266 level (the Skylake-X
regression was ~3%).
The loops with 2 and 3 iterations are now unrolled. It's surely fixed.

[Bug tree-optimization/83326] [8 Regression] SPEC CPU2017 648.exchange2_s ~6% performance regression with r255267 (reproducer attached)

2017-12-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326

--- Comment #5 from Richard Biener  ---
Author: rguenth
Date: Thu Dec 14 14:32:24 2017
New Revision: 255635

URL: https://gcc.gnu.org/viewcvs?rev=255635&root=gcc&view=rev
Log:
2017-12-14  Richard Biener  

PR tree-optimization/83326
* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Add
may_be_zero parameter and handle it by not marking the first
peeled copy as not exiting the loop.
(try_peel_loop): Likewise.
(canonicalize_loop_induction_variables): Use number_of_iterations_exit
to handle the case of constant or zero iterations and perform
loop header copying on-the-fly.

* gcc.dg/tree-ssa/pr81388-2.c: Adjust.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr81388-2.c
trunk/gcc/tree-ssa-loop-ivcanon.c

[Bug tree-optimization/83326] [8 Regression] SPEC CPU2017 648.exchange2_s ~6% performance regression with r255267 (reproducer attached)

2017-12-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #4 from Richard Biener  ---
Should be fixed.

[Bug tree-optimization/83326] [8 Regression] SPEC CPU2017 648.exchange2_s ~6% performance regression with r255267 (reproducer attached)

2017-12-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326

--- Comment #3 from Richard Biener  ---
Created attachment 42879
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42879&action=edit
patch in testing

Patch I am testing.  Performance evaluation appreciated.

[Bug tree-optimization/83326] [8 Regression] SPEC CPU2017 648.exchange2_s ~6% performance regression with r255267 (reproducer attached)

2017-12-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326

--- Comment #2 from Richard Biener  ---
We no longer unroll the inner loops in cunrolli because cunrolli will leave us
with exit checks.

We fail to compute the number of iterations of the inner loop(s) (pre loop
header copying):

 [local count: 21065692]:
L.5:
_3 = _1 + 1;
_53 = (integer(kind=8)) _3;
_4 = _1 + 2;
_54 = (integer(kind=8)) _4;
_55 = (integer(kind=8)) i1_25;
_5 = _55 * 81;
_56 = _5 + -91;

$3 = 
(gdb) p debug_bb_n (7)
 [local count: 63197075]:
_6 = S.0_27 * 9;
_57 = _6 + _56;

 [local count: 189610187]:
# S.1_28 = PHI <_53(7), S.1_59(9)>
if (S.1_28 > _54)
  goto ; [33.33%]
else
  goto ; [66.67%]

$1 = 
(gdb) p debug_bb_n (9)
 [local count: 126413112]:
_7 = S.1_28 + _57;
_8 = test_array[_7];
_9 = _8 + -10;
test_array[_7] = _9;
S.1_59 = S.1_28 + 1;
goto ; [100.00%]

This one is a bit difficult, but the other one (perhaps not as interesting?):

 [local count: 119292717]:
L.14:
_14 = _1 + 1;
_69 = (integer(kind=8)) _14;
_15 = _1 + 2;
_70 = (integer(kind=8)) _15;
_71 = (integer(kind=8)) i2_26;
_16 = _71 * 81;
_72 = _16 + -91;

# S.4_31 = PHI <_69(19), S.4_75(21)>
if (S.4_31 > _70)
  goto ; [33.33%]
else
  goto ; [66.67%]

 [local count: 715863674]:
_18 = S.4_31 + _73;
_19 = test_array[_18];
_20 = _19 + 10;
test_array[_18] = _20;
S.4_75 = S.4_31 + 1;
goto ; [100.00%]

looks like it should be doable.

And indeed it is - we are just "confused" by the may_be_zero test.  IMHO
we should allow a constant iteration count of either zero or N by performing
the loop header copying alongside the unrolling (leaving the first exit test
in place).

Testing a patch to do that.
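
For illustration only, here is a minimal C sketch of the idea, not the actual
GCC transformation or its output.  It assumes a loop whose trip count is
either zero or exactly two, which mirrors the S.1_28 loop above where the
bounds are _1 + 1 and _1 + 2.  The function names, the array size and the
checking helper are all hypothetical.

```c
#include <assert.h>
#include <string.h>

#define N_ELEMS 16  /* hypothetical array size for the demonstration */

/* Rolled form: runs zero times when lo > hi, else hi - lo + 1 times. */
void loop_rolled(int *a, long lo, long hi, long off)
{
    for (long s = lo; s <= hi; s++)
        a[s + off] -= 10;
}

/* Completely unrolled form.  ASSUMPTION: whenever lo <= hi, hi == lo + 1,
   so the loop runs either zero times or exactly twice.  The first exit
   test is kept to cover the zero-iteration case; the later exit tests
   are dropped because the remaining iteration count is then known. */
void loop_unrolled_two(int *a, long lo, long hi, long off)
{
    if (lo > hi)                /* first exit test, left in place */
        return;
    a[lo + off] -= 10;          /* iteration 1 */
    a[lo + 1 + off] -= 10;      /* iteration 2, no exit test in between */
}

/* Returns 1 if both forms leave the array in the same state. */
int same_result(long lo, long hi, long off)
{
    int x[N_ELEMS], y[N_ELEMS];

    /* Guard the demonstration against out-of-bounds indices. */
    assert(lo + off >= 0 && hi + off < N_ELEMS);

    for (int i = 0; i < N_ELEMS; i++)
        x[i] = y[i] = 100 + i;

    loop_rolled(x, lo, hi, off);
    loop_unrolled_two(y, lo, hi, off);
    return memcmp(x, y, sizeof x) == 0;
}
```

Calling same_result with hi == lo + 1 exercises the two-iteration path, and
calling it with lo > hi exercises the zero-iteration path; in both cases the
two forms should agree under the stated assumption.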

[Bug tree-optimization/83326] [8 Regression] SPEC CPU2017 648.exchange2_s ~6% performance regression with r255267 (reproducer attached)

2017-12-10 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326

Richard Biener  changed:

           What    |Removed                      |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                  |ASSIGNED
   Last reconfirmed|                             |2017-12-10
           Assignee|unassigned at gcc dot gnu.org|rguenth at gcc dot gnu.org
     Ever confirmed|0                            |1

--- Comment #1 from Richard Biener  ---
I will investigate.

[Bug tree-optimization/83326] [8 Regression] SPEC CPU2017 648.exchange2_s ~6% performance regression with r255267 (reproducer attached)

2017-12-08 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83326

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
   Target Milestone|---                         |8.0