[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737 --- Comment #20 from Pat Haugen ---
(In reply to Richard Biener from comment #18)
> Fixed (hopefully).

Yes, mgrid performance is back. Thanks.
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737 --- Comment #19 from Richard Biener ---
Author: rguenth
Date: Thu Apr 19 12:41:42 2018
New Revision: 259493

URL: https://gcc.gnu.org/viewcvs?rev=259493&root=gcc&view=rev
Log:
2018-04-19  Richard Biener

        PR tree-optimization/84737
        * tree-vect-data-refs.c (vect_copy_ref_info): New function
        copying restrict info.
        (vect_setup_realignment): Use it.
        * tree-vectorizer.h (vect_copy_ref_info): Declare.
        * tree-vect-stmts.c (vectorizable_store): Copy ref info from
        the first DR to all generated stores.
        (vectorizable_load): Likewise for loads.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-vect-data-refs.c
    trunk/gcc/tree-vect-stmts.c
    trunk/gcc/tree-vectorizer.h
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #18 from Richard Biener ---
Fixed (hopefully).
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org

--- Comment #17 from Richard Biener ---
Created attachment 43984
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43984&action=edit
patch I am testing

It is a similar issue, if not the same. With the attached I see the pcom performed - can you throw it on some ppc testing please?
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737 --- Comment #16 from Richard Biener --- It's hard to tell - will try to look at more dumps produced by a cross which hopefully matches your setup.
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

Pat Haugen changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #15 from Pat Haugen ---
Richard, concerning my prior comment, any thoughts if this is a similar issue to what you fixed in pr55334?
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737 --- Comment #14 from Pat Haugen ---
Created attachment 43928
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43928&action=edit
r256888 pcom dump

So the difference appears to be occurring in predictive commoning. In the ipa-cp clone, resid.constprop, pcom is failing to hoist some loads/expressions from the vectorized loop. This results in an additional 9 vector loads and 5 vector adds being executed each iteration of the loop.

I've attached a pcom dump of the original resid() and the clone resid.constprop(). You can see that in the original resid(), pcom is moving some loads/adds, but not in resid.constprop(). BB 6 is the vectorized loop in resid(), BB 5 is the same loop in resid.constprop().

Not sure if this is a similar issue to pr55334 wrt losing restrict.
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|marxin at gcc dot gnu.org   |unassigned at gcc dot gnu.org

--- Comment #13 from Martin Liška ---
I'm unassigning ..
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737 --- Comment #12 from Richard Biener --- This still seems to lack proper analysis... thus a candidate for deferring.
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737 --- Comment #11 from Martin Jambor --- I guess you'll need to check whether it is PR 55334 (i.e. not preserving restrict across ipa-cp and/or inlining) coming back somehow...
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737 --- Comment #10 from Pat Haugen ---
(In reply to Pat Haugen from comment #9)
> (pr83497, which I'm still digging on). Ignoring output miscompare and just
> timing the two versions built with -fno-tree-vectorize, I see that the
> performance is similar. So possibly a powerpc vector cost issue.
>

And then again, maybe not. Running with -fno-tree-vectorize and removing -ffast-math (which eliminates the output miscompare), I still see the degradation.
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737 --- Comment #9 from Pat Haugen ---
(In reply to Martin Jambor from comment #7)
> Do I understand it correctly that you suspect that the new IPA-CP
> clone that is created from r256888 on is harmful? In that case, you
> want to test that by trying higher values of ipa-cp-eval-threshold,
> something like --param ipa-cp-eval-threshold 610 (i.e. bigger than
> 606). Of course, if there are other clones with evaluations between
> 500 and 610, it would affect them too.
>

Building with --param ipa-cp-eval-threshold=610 prevented the creation of the .resid_.constprop.1 clone and eliminated the performance degradation.

Looking at the profile more in depth, I saw that most of the time in resid_.constprop was spent in the main vectorized loop. I tried both revisions with -fno-tree-vectorize to see if vectorization in the clone is the real problem on powerpc, but ran into issues with output miscompare (pr83497, which I'm still digging on). Ignoring output miscompare and just timing the two versions built with -fno-tree-vectorize, I see that the performance is similar. So possibly a powerpc vector cost issue.

> You may also want to try both fast and slow revisions with
> -fno-ipa-cp-clone as the first step, actually.

Doing this showed r256888 about 4% slower, so not near as bad.
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737 --- Comment #8 from Martin Liška ---
(In reply to Martin Jambor from comment #7)
> Do I understand it correctly that you suspect that the new IPA-CP
> clone that is created from r256888 on is harmful? In that case, you

Yes.

> want to test that by trying higher values of ipa-cp-eval-threshold,
> something like --param ipa-cp-eval-threshold 610 (i.e. bigger than
> 606). Of course, if there are other clones with evaluations between
> 500 and 610, it would affect them too.
>
> You may also want to try both fast and slow revisions with
> -fno-ipa-cp-clone as the first step, actually.

Btw. I can confirm that the same clone is created on x86_64, but the clone creation has no speed impact there. Is it also reproducible on ppc64{,le}?
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737 --- Comment #7 from Martin Jambor --- Do I understand it correctly that you suspect that the new IPA-CP clone that is created from r256888 on is harmful? In that case, you want to test that by trying higher values of ipa-cp-eval-threshold, something like --param ipa-cp-eval-threshold 610 (i.e. bigger than 606). Of course, if there are other clones with evaluations between 500 and 610, it would affect them too. You may also want to try both fast and slow revisions with -fno-ipa-cp-clone as the first step, actually.
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW
                 CC|                            |jamborm at gcc dot gnu.org

--- Comment #6 from Martin Liška ---
I see, the change is only in resid_.constprop.1 that is created after the patch:

before:

Evaluating opportunities for resid/1.
 - considering value &x.u for param #0 u (caller_count: 3)
     good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 38778) -> evaluation: 229, threshold: 500
     good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 38778) -> evaluation: 229, threshold: 500
 - considering value &x.v for param #1 v (caller_count: 3)
     good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 38778) -> evaluation: 229, threshold: 500
     good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 38778) -> evaluation: 229, threshold: 500
 - considering value &x.r for param #2 r (caller_count: 3)
     good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 38778) -> evaluation: 229, threshold: 500
     good_cloning_opportunity_p (time: 2, size: 246, freq_sum: 38778) -> evaluation: 315, threshold: 500
 - considering value &x.a for param #4 a (caller_count: 4)
     good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 42518) -> evaluation: 251, threshold: 500
     good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 42518) -> evaluation: 251, threshold: 500

after:

Evaluating opportunities for resid/1.
 - considering value &x.u for param #0 u (caller_count: 3)
     good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 74554) -> evaluation: 441, threshold: 500
     good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 74554) -> evaluation: 441, threshold: 500
 - considering value &x.v for param #1 v (caller_count: 3)
     good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 74554) -> evaluation: 441, threshold: 500
     good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 74554) -> evaluation: 441, threshold: 500
 - considering value &x.r for param #2 r (caller_count: 3)
     good_cloning_opportunity_p (time: 1, size: 169, freq_sum: 74554) -> evaluation: 441, threshold: 500
     good_cloning_opportunity_p (time: 2, size: 246, freq_sum: 74554) -> evaluation: 606, threshold: 500
  Creating a specialized node of resid/1.

Martin can you please take a look and evaluate how profitable is that transformation? Maybe it can help you with tuning of the param as in PR84149.

Thanks
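
For readers following the numbers in the dump above: every evaluation value shown is consistent with the integer quotient time * freq_sum / size. The sketch below only reproduces that arithmetic. It is an inference from the dump, not a copy of GCC's good_cloning_opportunity_p, which may apply further adjustments that do not show up in these lines. The function names and the >= comparison against the threshold are assumptions made for illustration.

```python
# Illustrative sketch only: reproduces the arithmetic visible in the
# ipa-cp dump. The names and the ">=" comparison are assumptions and
# are not taken from the GCC sources.

def evaluation(time_benefit: int, size_cost: int, freq_sum: int) -> int:
    """Integer quotient that matches every 'evaluation:' value in the dump."""
    return (time_benefit * freq_sum) // size_cost


def would_clone(time_benefit: int, size_cost: int, freq_sum: int,
                threshold: int = 500) -> bool:
    """Assumed decision rule: clone when the evaluation reaches the threshold."""
    return evaluation(time_benefit, size_cost, freq_sum) >= threshold


if __name__ == "__main__":
    # (label, time, size, freq_sum) tuples copied from the dump.
    rows = [
        ("before, param #2 r", 2, 246, 38778),
        ("after,  param #2 r", 2, 246, 74554),
    ]
    for label, t, s, f in rows:
        print(label, evaluation(t, s, f), would_clone(t, s, f))
    # Raising the threshold to 610, as suggested in comment #7,
    # puts the 'after' case back below the limit.
    print(would_clone(2, 246, 74554, threshold=610))
```

Under this reading, the only input that changed between the two revisions is freq_sum (38778 versus 74554), which is what moves the param #2 case across the default threshold of 500.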
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737 --- Comment #5 from Pat Haugen ---
Created attachment 43601
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43601&action=edit
ipa-cp dump (r256887)

(In reply to Martin Liška from comment #4)
> Thank you, may I please ask you for the IPA CP dump file for not affected
> revision (r256887). Do I understand the numbers right that version with
> .resid_.constprop.1 is slower?

Dump attached. And yes, the version with resid_.constprop.1 is slower.

Also, I tried the patch from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149#c5 and didn't see any difference in execution time.
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737 --- Comment #4 from Martin Liška --- Thank you, may I please ask you for the IPA CP dump file for not affected revision (r256887). Do I understand the numbers right that version with .resid_.constprop.1 is slower?
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737 --- Comment #3 from Pat Haugen ---
(In reply to Martin Liška from comment #1)
> Isn't that dup of 84149? Can you please tweak --param ipa-cp-eval-threshold
> to value to 200, 300, 400? Can you please attach -fdump-ipa-cp-details file?

I tried the param with the 3 different values and none made any difference to execution time.
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737 --- Comment #2 from Pat Haugen ---
Created attachment 43589
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43589&action=edit
ipa-cp dump
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2018-03-07
           Assignee|unassigned at gcc dot gnu.org |marxin at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Martin Liška ---
Isn't that dup of 84149? Can you please tweak --param ipa-cp-eval-threshold to value to 200, 300, 400? Can you please attach -fdump-ipa-cp-details file?
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
          Component|ipa                         |tree-optimization
   Target Milestone|---                         |8.0
            Summary|20% degradation in CPU2000  |[8 Regression] 20%
                   |172.mgrid starting with     |degradation in CPU2000
                   |r256888                     |172.mgrid starting with
                   |                            |r256888