[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-04-19 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #20 from Pat Haugen  ---
(In reply to Richard Biener from comment #18)
> Fixed (hopefully).

Yes, mgrid performance is back. Thanks.

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #19 from Richard Biener  ---
Author: rguenth
Date: Thu Apr 19 12:41:42 2018
New Revision: 259493

URL: https://gcc.gnu.org/viewcvs?rev=259493&root=gcc&view=rev
Log:
2018-04-19  Richard Biener  

PR tree-optimization/84737
* tree-vect-data-refs.c (vect_copy_ref_info): New function
copying restrict info.
(vect_setup_realignment): Use it.
* tree-vectorizer.h (vect_copy_ref_info): Declare.
* tree-vect-stmts.c (vectorizable_store): Copy ref info from
the first DR to all generated stores.
(vectorizable_load): Likewise for loads.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-vect-data-refs.c
trunk/gcc/tree-vect-stmts.c
trunk/gcc/tree-vectorizer.h

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #18 from Richard Biener  ---
Fixed (hopefully).

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

Richard Biener  changed:

           What    |Removed                      |Added
----------------------------------------------------------------------------
             Status|NEW                          |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org|rguenth at gcc dot gnu.org

--- Comment #17 from Richard Biener  ---
Created attachment 43984
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43984&action=edit
patch I am testing

It is a similar issue, if not the same.  With the attached patch I see pcom
performed - can you throw it on some ppc testing please?

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-04-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #16 from Richard Biener  ---
It's hard to tell - will try to look at more dumps produced by a cross which
hopefully matches your setup.

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-04-18 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

Pat Haugen  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #15 from Pat Haugen  ---
Richard, concerning my prior comment, any thoughts if this is a similar issue
to what you fixed in pr55334?

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-04-13 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #14 from Pat Haugen  ---
Created attachment 43928
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43928&action=edit
r256888 pcom dump

So the difference appears to be occurring in predictive commoning. In the
ipa-cp clone, resid.constprop, pcom is failing to hoist some loads/expressions
from the vectorized loop. This results in an additional 9 vector loads and 5
vector adds being executed each iteration of the loop.

I've attached a pcom dump of the original resid() and the clone
resid.constprop(). You can see that in the original resid(), pcom is moving
some loads/adds, but not in resid.constprop(). BB 6 is the vectorized loop in
resid(), BB 5 is the same loop in resid.constprop().

Not sure if this is a similar issue to pr55334 wrt losing restrict.
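
To illustrate why losing restrict would stop pcom from reusing loads, here is a
minimal C sketch. It is a hypothetical 1-D stand-in, not the actual resid()
loop from mgrid (which is Fortran and works on a 3-D grid); the function names
are made up for illustration. With restrict the compiler may assume that
stores to r never modify u, so values loaded from u in one iteration can be
carried into the next, which is roughly the rewrite shown in the second
function. Without that guarantee every store to r could clobber u and all
loads have to be redone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 1-D stand-in for a stencil loop.  'restrict' promises that
   stores to r never alias u, so a value loaded from u in one iteration is
   still valid in the next one.  */
void
stencil_plain (double *restrict r, const double *restrict u, size_t n)
{
  assert (n >= 2);
  for (size_t i = 1; i + 1 < n; i++)
    r[i] = u[i - 1] + u[i] + u[i + 1];  /* three loads per iteration */
}

/* Roughly the effect of predictive commoning, written by hand: u[i-1] and
   u[i] are carried in scalars across iterations, leaving a single load
   per iteration.  */
void
stencil_pcom (double *restrict r, const double *restrict u, size_t n)
{
  assert (n >= 2);
  double prev = u[0];
  double cur = u[1];
  for (size_t i = 1; i + 1 < n; i++)
    {
      double next = u[i + 1];           /* the only load from u */
      r[i] = prev + cur + next;
      prev = cur;
      cur = next;
    }
}
```

Both functions add the three values in the same order, so they produce
identical results when called on the same non-overlapping arrays.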

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-04-03 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|marxin at gcc dot gnu.org   |unassigned at gcc dot gnu.org

--- Comment #13 from Martin Liška  ---
I'm unassigning ..

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-03-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #12 from Richard Biener  ---
This still seems to lack proper analysis...  thus a candidate for deferring.

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-03-26 Thread jamborm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #11 from Martin Jambor  ---
I guess you'll need to check whether it is PR 55334 (i.e. not preserving
restrict across ipa-cp and/or inlining) coming back somehow...

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-03-14 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #10 from Pat Haugen  ---
(In reply to Pat Haugen from comment #9)
> (pr83497, which I'm still digging on). Ignoring output miscompare and just
> timing the two versions built with -fno-tree-vectorize, I see that the 
> performance is similar. So possibly a powerpc vector cost issue.
> 

And then again, maybe not. Running with -fno-tree-vectorize and removing
-ffast-math (which eliminates the output miscompare), I still see the
degradation.

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-03-13 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #9 from Pat Haugen  ---
(In reply to Martin Jambor from comment #7)
> Do I understand it correctly that you suspect that the new IPA-CP
> clone that is created from r256888 on is harmful?  In that case, you
> want to test that by trying higher values of ipa-cp-eval-threshold,
> something like --param ipa-cp-eval-threshold 610 (i.e. bigger than
> 606).  Of course, if there are other clones with evaluations between
> 500 and 610, it would affect them too.
> 

Building with --param ipa-cp-eval-threshold=610 prevented the creation of the
.resid_.constprop.1 clone and eliminated the performance degradation.

Looking at the profile more in depth, I saw that most of the time in
resid_.constprop was spent in the main vectorized loop. I tried both revisions
with -fno-tree-vectorize to see if vectorization in the clone is the real
problem on powerpc, but ran into issues with output miscompare (pr83497, which
I'm still digging on). Ignoring output miscompare and just timing the two
versions built with -fno-tree-vectorize, I see that the performance is
similar. So possibly a powerpc vector cost issue.


> You may also want to try both fast and slow revisions with
> -fno-ipa-cp-clone as the first step, actually.

Doing this showed r256888 about 4% slower, so not nearly as bad.

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-03-13 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #8 from Martin Liška  ---
(In reply to Martin Jambor from comment #7)
> Do I understand it correctly that you suspect that the new IPA-CP
> clone that is created from r256888 on is harmful?  In that case, you

Yes.

> want to test that by trying higher values of ipa-cp-eval-threshold,
> something like --param ipa-cp-eval-threshold 610 (i.e. bigger than
> 606).  Of course, if there are other clones with evaluations between
> 500 and 610, it would affect them too.
> 
> You may also want to try both fast and slow revisions with
> -fno-ipa-cp-clone as the first step, actually.

Btw, I can confirm that the same clone is created on x86_64, but the clone
creation has no speed impact there.

Is it also reproducible on ppc64{,le} ?

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-03-13 Thread jamborm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #7 from Martin Jambor  ---
Do I understand it correctly that you suspect that the new IPA-CP
clone that is created from r256888 on is harmful?  In that case, you
want to test that by trying higher values of ipa-cp-eval-threshold,
something like --param ipa-cp-eval-threshold 610 (i.e. bigger than
606).  Of course, if there are other clones with evaluations between
500 and 610, it would affect them too.

You may also want to try both fast and slow revisions with
-fno-ipa-cp-clone as the first step, actually.

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-03-13 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW
                 CC|                            |jamborm at gcc dot gnu.org

--- Comment #6 from Martin Liška  ---
I see, the change is only in resid_.constprop.1 that is created after the
patch:

before:

Evaluating opportunities for resid/1.
 - considering value &x.u for param #0 u (caller_count: 3)
 good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 38778) -> evaluation: 229, threshold: 500
 good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 38778) -> evaluation: 229, threshold: 500
 - considering value &x.v for param #1 v (caller_count: 3)
 good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 38778) -> evaluation: 229, threshold: 500
 good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 38778) -> evaluation: 229, threshold: 500
 - considering value &x.r for param #2 r (caller_count: 3)
 good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 38778) -> evaluation: 229, threshold: 500
 good_cloning_opportunity_p (time: 2, size: 246, freq_sum: 38778) -> evaluation: 315, threshold: 500
 - considering value &x.a for param #4 a (caller_count: 4)
 good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 42518) -> evaluation: 251, threshold: 500
 good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 42518) -> evaluation: 251, threshold: 500

after:

Evaluating opportunities for resid/1.
 - considering value &x.u for param #0 u (caller_count: 3)
 good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 74554) -> evaluation: 441, threshold: 500
 good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 74554) -> evaluation: 441, threshold: 500
 - considering value &x.v for param #1 v (caller_count: 3)
 good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 74554) -> evaluation: 441, threshold: 500
 good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 74554) -> evaluation: 441, threshold: 500
 - considering value &x.r for param #2 r (caller_count: 3)
 good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 74554) -> evaluation: 441, threshold: 500
 good_cloning_opportunity_p (time: 2, size: 246, freq_sum: 74554) -> evaluation: 606, threshold: 500
  Creating a specialized node of resid/1.

Martin, can you please take a look and evaluate how profitable that
transformation is? Maybe it can help you with tuning the param, as in
PR84149.

Thanks
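
The evaluation figures in the two dumps can be reproduced with a small C
sketch. It assumes evaluation = time * freq_sum / size with integer division,
and that a clone is created once the evaluation reaches the threshold. Both
assumptions are inferred from the numbers printed above, not copied from
ipa-cp.c, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper reproducing the "evaluation" value printed in the
   dumps, assuming evaluation = time * freq_sum / size (integer division).
   This formula is inferred from the dump numbers only.  */
int64_t
cloning_evaluation (int64_t time_benefit, int64_t freq_sum, int64_t size_cost)
{
  assert (size_cost > 0);
  return time_benefit * freq_sum / size_cost;
}

/* Assumption: cloning happens when the evaluation reaches the threshold
   (--param ipa-cp-eval-threshold, 500 in the dumps).  */
bool
would_clone (int64_t evaluation, int64_t threshold)
{
  return evaluation >= threshold;
}
```

Under these assumptions cloning_evaluation (2, 38778, 246) gives 315, below
the threshold of 500, while cloning_evaluation (2, 74554, 246) gives 606,
which exceeds it. The time and size inputs are the same before and after;
only freq_sum has roughly doubled, and that alone is what triggers the new
clone.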

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-03-08 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #5 from Pat Haugen  ---
Created attachment 43601
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43601&action=edit
ipa-cp dump (r256887)

(In reply to Martin Liška from comment #4)
> Thank you, may I please ask you for the IPA CP dump file for the unaffected
> revision (r256887)? Do I understand the numbers right that the version with
> .resid_.constprop.1 is slower?

Dump attached. And yes, the version with resid_.constprop.1 is slower.

Also, I tried the patch from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149#c5 and didn't see any
difference in execution time.

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-03-08 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #4 from Martin Liška  ---
Thank you, may I please ask you for the IPA CP dump file for the unaffected
revision (r256887)? Do I understand the numbers right that the version with
.resid_.constprop.1 is slower?

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-03-07 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #3 from Pat Haugen  ---
(In reply to Martin Liška from comment #1)
> Isn't that a dup of 84149? Can you please try setting --param
> ipa-cp-eval-threshold to 200, 300 and 400? Can you please attach the
> -fdump-ipa-cp-details file?

I tried the param with the 3 different values and none made any difference to
execution time.

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-03-07 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #2 from Pat Haugen  ---
Created attachment 43589
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43589&action=edit
ipa-cp dump

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-03-07 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

Martin Liška  changed:

           What    |Removed                      |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                  |WAITING
   Last reconfirmed|                             |2018-03-07
           Assignee|unassigned at gcc dot gnu.org|marxin at gcc dot gnu.org
     Ever confirmed|0                            |1

--- Comment #1 from Martin Liška  ---
Isn't that a dup of 84149? Can you please try setting --param
ipa-cp-eval-threshold to 200, 300 and 400? Can you please attach the
-fdump-ipa-cp-details file?

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-03-06 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
          Component|ipa                         |tree-optimization
   Target Milestone|---                         |8.0
            Summary|20% degradation in CPU2000  |[8 Regression] 20%
                   |172.mgrid starting with     |degradation in CPU2000
                   |r256888                     |172.mgrid starting with
                   |                            |r256888