https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
Bug 84037 depends on bug 85491, which changed state.
Bug 85491 Summary: [8 Regression] nbench LU Decomposition test 15% slower than
GCC 7, 30% slower than peak
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85491
What|Removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #33 from Richard Biener ---
Author: rguenth
Date: Fri Feb 16 13:47:25 2018
New Revision: 257734
URL: https://gcc.gnu.org/viewcvs?rev=257734&root=gcc&view=rev
Log:
2018-02-16 Richard Biener
PR tree-optimization/84037
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
Richard Biener changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
Richard Biener changed:
What|Removed |Added
Status|NEW |ASSIGNED
Assignee|unassigned
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #30 from Jakub Jelinek ---
Is this fixed now or is there more work to do?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #29 from Richard Biener ---
Author: rguenth
Date: Mon Feb 12 13:55:04 2018
New Revision: 257588
URL: https://gcc.gnu.org/viewcvs?rev=257588&root=gcc&view=rev
Log:
2018-02-12 Richard Biener
PR tree-optimization/84037
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #28 from Richard Biener ---
Author: rguenth
Date: Mon Feb 12 08:54:28 2018
New Revision: 257581
URL: https://gcc.gnu.org/viewcvs?rev=257581&root=gcc&view=rev
Log:
2018-02-12 Richard Biener
PR tree-optimization/84037
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #27 from Richard Biener ---
(In reply to amker from comment #26)
> (In reply to amker from comment #25)
> > I tend to believe this is a register-pressure-based strength-reduction +
> > lim problem rather than an ivopts one.
> >
> > So given class
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #26 from amker at gcc dot gnu.org ---
(In reply to amker from comment #25)
> I tend to believe this is a register-pressure-based strength-reduction +
> lim problem rather than an ivopts one.
>
> So given a class of memory references like:
>
> reg
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #25 from amker at gcc dot gnu.org ---
I tend to believe this is a register-pressure-based strength-reduction + lim
problem rather than an ivopts one.
So given a class of memory references like:
reg = ...
Loop:
MEM[iv_base + reg * 0];
MEM[iv_b
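A rough sketch of the trade-off this comment seems to describe, not the actual loop from the bug. The comment is cut off, so the reference pattern base[i + reg * k] for k = 0..3, the element type, and both function names are assumptions made for illustration only. The first function recomputes reg * k inside the loop; the second hoists the products out of the loop (strength reduction plus loop-invariant motion), which removes the multiplies but keeps four extra pointers live across the loop and so raises register pressure.

```c
#include <stddef.h>

/* Hypothetical loop: each iteration addresses base[i + reg * k] for
   k = 0..3, recomputing the product reg * k every time. */
long sum_multiplied(const int *base, size_t n, size_t reg)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < 4; k++)
            sum += base[i + reg * k];
    return sum;
}

/* Same result with the products hoisted out of the loop. No multiply
   remains in the loop body, but p0..p3 are all live across it. */
long sum_hoisted(const int *base, size_t n, size_t reg)
{
    const int *p0 = base;
    const int *p1 = base + reg;
    const int *p2 = base + reg * 2;
    const int *p3 = base + reg * 3;
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += p0[i];
        sum += p1[i];
        sum += p2[i];
        sum += p3[i];
    }
    return sum;
}
```

Whether the hoisted form wins depends on how many registers the target has left for the rest of the loop body, which is the kind of decision the comment attributes to strength reduction and lim rather than ivopts.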
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
Bug 84037 depends on bug 84278, which changed state.
Bug 84278 Summary: claims initv4sfv2sf is available but inits through stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84278
What|Removed |Added
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #24 from Richard Biener ---
(In reply to amker from comment #23)
> (In reply to Richard Biener from comment #21)
> > So after r257453 we improve the situation pre-IVOPTs to just
> > 6 IVs (duplicated but trivially equivalent) plus one
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #23 from amker at gcc dot gnu.org ---
(In reply to Richard Biener from comment #21)
> So after r257453 we improve the situation pre-IVOPTs to just
> 6 IVs (duplicated but trivially equivalent) plus one counting IV. But then
> when SLP
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #22 from Richard Biener ---
Author: rguenth
Date: Wed Feb 7 15:46:17 2018
New Revision: 257453
URL: https://gcc.gnu.org/viewcvs?rev=257453&root=gcc&view=rev
Log:
2018-02-07 Richard Biener
PR tree-optimization/84037
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #21 from Richard Biener ---
So after r257453 we improve the situation pre-IVOPTs to just
6 IVs (duplicated but trivially equivalent) plus one counting IV. But then
when SLP is enabled IVOPTs comes along and adds another 4 IVs which m
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #20 from Richard Biener ---
Note that targets already have the opportunity to limit vectorization by
adjusting their finish_cost hook - here they even have more useful information
available (kind of).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #19 from Richard Biener ---
On Zen I measure 23s with --param vect-max-version-for-alias-checks=0 (thus
basically before the rev.) and 33s without. With the patch and the size
parameter tuned to 146 I get 25s and with 90 it is 22.5s.
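To put the timings above in relative terms, here is a small helper; the function name is made up for this illustration. Taking 23s as the baseline, 33s is roughly 43% slower, 25s is roughly 9% slower, and 22.5s is about 2% faster.

```c
/* Percentage by which time t exceeds the baseline time.
   A negative result means t is faster than the baseline. */
double slowdown_percent(double baseline, double t)
{
    return (t - baseline) / baseline * 100.0;
}
```

For example, slowdown_percent(23.0, 33.0) gives the regression relative to the run with --param vect-max-version-for-alias-checks=0.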
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #18 from Richard Biener ---
(In reply to Jan Hubicka from comment #17)
> We already have
> /* This function adjusts the unroll factor based on
>    the hardware capabilities. For ex, bdver3 has
>    a loop buffer which makes unrolling
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #17 from Jan Hubicka ---
We already have
/* This function adjusts the unroll factor based on
   the hardware capabilities. For ex, bdver3 has
   a loop buffer which makes unrolling of smaller
   loops less important. This function dec
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #16 from Richard Biener ---
So discussion led to the proposal to add another unroll parameter, for example
--param small-loop-size which serves as a "barrier" we may not cross when
optimizing a loop. Thus for all loops <= small-loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #15 from Richard Biener ---
Oh, and if you don't disable inlining then you get down to sizes of 148
(SSE and SLP) and 91 and 75 (SSE and no SLP). So you won't get rid
of two instances of vectorization regardless of the parameter
(for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #14 from Richard Biener ---
Created attachment 43289
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43289&action=edit
patch limiting growth
So I played with a simple hack limiting the amount of growth in a vectorized
loop based
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
Richard Biener changed:
What|Removed |Added
CC||amker at gcc dot gnu.org
--- Comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #12 from Richard Biener ---
I have opened PR84102 for the missed optimizations in this particular loop. I
believe now the interesting one is the other.
30.25%  a.out  a.out  [.] __solv_cap_MOD_fourir2d
24.83% a.out
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #11 from Richard Biener ---
So probably the big slowdown is because the vectorized loop body is so much
larger. Unvectorized:
.L61:
        vmulss  __solv_cap_MOD_d1(%rip), %xmm4, %xmm0
        incl    %ecx
        vmulss  (%rdx), %
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #10 from Richard Biener ---
So strided stores are costed as
  /* Costs of the stores.  */
  if (memory_access_type == VMAT_ELEMENTWISE
      || memory_access_type == VMAT_GATHER_SCATTER)
    {
      /* N scalar stores plus extracting
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #9 from Richard Biener ---
(In reply to Martin Liška from comment #7)
> (In reply to Jakub Jelinek from comment #6)
> > Is it really r256643 and not r256644 that is causing this though?
>
> Yes, I can verify that it's r256644 that's
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #8 from Jan Hubicka ---
https://gcc.opensuse.org/gcc-old/c++bench-czerny/pb11/pb11-summary.txt-2-0.html
runs with -Ofast -funroll-loops so indeed does not seem essential to trigger
the regression (it may be two different ones of cour
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
Martin Liška changed:
What|Removed |Added
Summary|[8 Regression] Speed|[8 Regression] Speed
|r