[Bug tree-optimization/84037] [8 Regression] Speed regression of polyhedron benchmark since r256644

2018-04-24, rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 Bug 84037 depends on bug 85491, which changed state. Bug 85491 Summary: [8 Regression] nbench LU Decomposition test 15% slower than GCC 7, 30% slower than peak https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85491 What|Removed

2018-02-16, rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 --- Comment #33 from Richard Biener --- Author: rguenth Date: Fri Feb 16 13:47:25 2018 New Revision: 257734 URL: https://gcc.gnu.org/viewcvs?rev=257734&root=gcc&view=rev Log: 2018-02-16 Richard Biener PR

2018-02-16, rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 Richard Biener changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

2018-02-14, rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 Richard Biener changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned

2018-02-14, jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 --- Comment #30 from Jakub Jelinek --- Is this fixed now or is there more work to do?

2018-02-12, rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 --- Comment #29 from Richard Biener --- Author: rguenth Date: Mon Feb 12 13:55:04 2018 New Revision: 257588 URL: https://gcc.gnu.org/viewcvs?rev=257588&root=gcc&view=rev Log: 2018-02-12 Richard Biener PR

2018-02-12, rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 --- Comment #28 from Richard Biener --- Author: rguenth Date: Mon Feb 12 08:54:28 2018 New Revision: 257581 URL: https://gcc.gnu.org/viewcvs?rev=257581&root=gcc&view=rev Log: 2018-02-12 Richard Biener PR

2018-02-09, rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 --- Comment #27 from Richard Biener --- (In reply to amker from comment #26) > (In reply to amker from comment #25) > > I tend to believe this is a register pressure based strength-reduction + > > lim problem rather than ivopts. > > > > So given

2018-02-08, amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 --- Comment #26 from amker at gcc dot gnu.org --- (In reply to amker from comment #25) > I tend to believe this is a register pressure based strength-reduction + > lim problem rather than ivopts. > > So given class of memory references like: > >

2018-02-08, amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 --- Comment #25 from amker at gcc dot gnu.org --- I tend to believe this is a register pressure based strength-reduction + lim problem rather than ivopts. So given a class of memory references like: reg = ... Loop: MEM[iv_base + reg * 0];
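
The trade-off in comment #25 can be illustrated with a small C sketch. The comment's example is cut off after the first reference, so the four references at offsets `stride * 0` through `stride * 3` are an assumption made for illustration, and both function names are hypothetical. Form A keeps one induction variable and leaves the invariant offsets to loop-invariant motion; form B is the strength-reduced form with one pointer IV per reference, which removes the multiplications but keeps more IVs live.

```c
#include <stddef.h>

/* Hypothetical kernel: four memory references share one base IV and
   differ only by a loop-invariant multiple of `stride` (the "reg * k"
   pattern; offsets 0..3 are an illustrative assumption). */

/* Form A: a single induction variable `i`.  The invariant offsets
   stride * 1 .. stride * 3 are what loop-invariant motion (LIM) would
   hoist, and they then occupy registers for the whole loop. */
float sum_shared_iv(const float *base, size_t stride, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += base[i + stride * 0] + base[i + stride * 1]
           + base[i + stride * 2] + base[i + stride * 3];
    return s;
}

/* Form B: strength-reduced, one pointer IV per reference.  No
   multiplication is left in the loop body, but four IVs have to be
   live in registers and incremented every iteration. */
float sum_separate_ivs(const float *base, size_t stride, size_t n)
{
    const float *p0 = base;
    const float *p1 = base + stride;
    const float *p2 = base + 2 * stride;
    const float *p3 = base + 3 * stride;
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += *p0++ + *p1++ + *p2++ + *p3++;
    return s;
}
```

Both functions compute the same sum; which form is cheaper depends on how many registers are free, which is the register-pressure question the comment raises.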

2018-02-08, rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 Bug 84037 depends on bug 84278, which changed state. Bug 84278 Summary: claims initv4sfv2sf is available but inits through stack https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84278 What|Removed |Added

2018-02-08, rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 --- Comment #24 from Richard Biener --- (In reply to amker from comment #23) > (In reply to Richard Biener from comment #21) > > So after r257453 we improve the situation pre-IVOPTs to just > > 6 IVs (duplicated but trivially equivalent) plus

2018-02-07, amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 --- Comment #23 from amker at gcc dot gnu.org --- (In reply to Richard Biener from comment #21) > So after r257453 we improve the situation pre-IVOPTs to just > 6 IVs (duplicated but trivially equivalent) plus one counting IV. But then > when

2018-02-07, rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 --- Comment #22 from Richard Biener --- Author: rguenth Date: Wed Feb 7 15:46:17 2018 New Revision: 257453 URL: https://gcc.gnu.org/viewcvs?rev=257453&root=gcc&view=rev Log: 2018-02-07 Richard Biener PR

2018-02-07, rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 --- Comment #21 from Richard Biener --- So after r257453 we improve the situation pre-IVOPTs to just 6 IVs (duplicated but trivially equivalent) plus one counting IV. But then when SLP is enabled IVOPTs comes along and adds another 4 IVs which

2018-01-31, rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 --- Comment #20 from Richard Biener --- Note that targets already have the opportunity to limit vectorization by adjusting their finish_cost hook - here they even have more useful information available (kind of).

2018-01-31, rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 --- Comment #19 from Richard Biener --- On Zen I measure 23s with --param vect-max-version-for-alias-checks=0 (thus basically before the rev.) and 33s without. With the patch and the size parameter tuned to 146 I get 25s and with 90 it is
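
For scale, the Zen timings quoted in comment #19 can be turned into relative slowdowns. The helper below is a trivial sketch written for this note, not something from the thread.

```c
/* Relative slowdown of `t` compared with `baseline`, in percent. */
double slowdown_pct(double baseline, double t)
{
    return (t - baseline) / baseline * 100.0;
}
```

Taking the 23 s run with --param vect-max-version-for-alias-checks=0 as the baseline, `slowdown_pct(23.0, 33.0)` is about 43.5 (the default run) and `slowdown_pct(23.0, 25.0)` is about 8.7 (the patch with the size parameter at 146).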

2018-01-31, rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 --- Comment #18 from Richard Biener --- (In reply to Jan Hubicka from comment #17) > We already have > /* This function adjusts the unroll factor based on >the hardware capabilities. For ex, bdver3 has >a loop buffer which makes

2018-01-31, hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 --- Comment #17 from Jan Hubicka --- We already have /* This function adjusts the unroll factor based on the hardware capabilities. For ex, bdver3 has a loop buffer which makes unrolling of smaller loops less important. This function
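
The idea behind such an unroll-adjusting hook can be sketched as follows. This is a hypothetical illustration, not the i386 backend's code: the function name, the use of a memory-reference count as the size measure, and the `budget` parameter are all assumptions. It caps the unroll factor so that the unrolled body stays within an assumed loop-buffer budget.

```c
/* Hypothetical unroll adjustment: limit the unroll factor so that the
   unrolled body (measured here, as an assumption, in memory references)
   still fits an assumed loop-buffer budget. */
unsigned adjust_unroll(unsigned nunroll, unsigned mem_refs, unsigned budget)
{
    /* No memory references, or the body already exceeds the budget:
       the loop buffer gives no reason to change the factor. */
    if (mem_refs == 0 || mem_refs > budget)
        return nunroll;

    /* Largest factor that keeps mem_refs * factor <= budget. */
    unsigned cap = budget / mem_refs;
    return cap < nunroll ? cap : nunroll;
}
```

With a budget of 32 and 8 memory references per iteration, a requested factor of 8 is reduced to 4; a loop with only 2 references keeps its factor of 8.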

2018-01-30, rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 --- Comment #16 from Richard Biener --- So the discussion led to the proposal to add another unroll parameter, for example --param small-loop-size which serves as a "barrier" we may not cross when optimizing a loop. Thus for all loops <=
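
The comment is cut off before the exact rule is stated, so the following is only one reading of the proposal, sketched in C: a transformation may not take a loop whose size is at or below the threshold to a size above it. `growth_allowed` is a hypothetical name; only the parameter name small-loop-size comes from the comment.

```c
#include <stdbool.h>

/* Hypothetical "small-loop-size" barrier: reject a transformation that
   would grow a loop from at or below the threshold to above it.  Loops
   that already start above the threshold are not restricted here. */
bool growth_allowed(unsigned old_size, unsigned new_size,
                    unsigned small_loop_size)
{
    if (old_size <= small_loop_size && new_size > small_loop_size)
        return false;
    return true;
}
```

For example, with a threshold of 146 and a hypothetical original size of 60, growing the loop to 148 is rejected while growing it to 91 is allowed.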

2018-01-30, rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 --- Comment #15 from Richard Biener --- Oh, and if you don't disable inlining then you get down to sizes of 148 (SSE and SLP) and 91 and 75 (SSE and no SLP). So you won't get rid of two instances of vectorization regardless of the parameter

2018-01-30, rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 --- Comment #14 from Richard Biener --- Created attachment 43289 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43289&action=edit patch limiting growth So I played with a simple hack limiting the amount of growth in a vectorized loop based on

2018-01-29, rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 Richard Biener changed: What|Removed |Added CC||amker at gcc dot gnu.org --- Comment

2018-01-29, rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 --- Comment #12 from Richard Biener --- I have opened PR84102 for the missed optimizations in this particular loop. I believe now the interesting one is the other. 30.25% a.out a.out [.] __solv_cap_MOD_fourir2d 24.83%

2018-01-29, rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 --- Comment #11 from Richard Biener --- So probably the big slowdown is because the vectorized loop body is so much larger. Unvectorized: .L61: vmulss __solv_cap_MOD_d1(%rip), %xmm4, %xmm0 incl %ecx vmulss (%rdx),

2018-01-29, rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 --- Comment #10 from Richard Biener --- So strided stores are costed as /* Costs of the stores. */ if (memory_access_type == VMAT_ELEMENTWISE || memory_access_type == VMAT_GATHER_SCATTER) { /* N scalar stores plus
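
A toy version of this costing, for illustration only: it is not GCC's code, the function and parameter names are hypothetical, and because the comment is truncated it assumes that besides one scalar store per lane, each lane must also be extracted from the vector. It shows why an elementwise store grows with the number of lanes while a contiguous vector store does not.

```c
/* Toy cost model (assumption, not GCC's implementation).
   Elementwise: every lane of every vector copy needs one extract
   and one scalar store. */
unsigned elementwise_store_cost(unsigned ncopies, unsigned nunits,
                                unsigned scalar_store_cost,
                                unsigned extract_cost)
{
    unsigned lanes = ncopies * nunits;
    return lanes * (scalar_store_cost + extract_cost);
}

/* Contiguous: one vector store per vector copy. */
unsigned contiguous_store_cost(unsigned ncopies, unsigned vector_store_cost)
{
    return ncopies * vector_store_cost;
}
```

With unit costs and two copies of a four-lane vector, the elementwise store costs 16 against 2 for the contiguous store.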

2018-01-29, rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 --- Comment #9 from Richard Biener --- (In reply to Martin Liška from comment #7) > (In reply to Jakub Jelinek from comment #6) > > Is it really r256643 and not r256644 that is causing this though? > > Yes, I can verify that it's r256644 that's

2018-01-26, hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 --- Comment #8 from Jan Hubicka --- https://gcc.opensuse.org/gcc-old/c++bench-czerny/pb11/pb11-summary.txt-2-0.html runs with -Ofast -funroll-loops so indeed does not seem essential to trigger the regression (it may be two different ones of

2018-01-25, marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037 Martin Liška changed: What|Removed |Added Summary|[8 Regression] Speed|[8 Regression] Speed