https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148
--- Comment #7 from cuilili ---
(In reply to Martin Jambor from comment #6)
> I believe this has been fixed?
Yes.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148
--- Comment #3 from cuilili ---
I reproduced the S1244 regression on znver3.
Source code:
for (int i = 0; i < LEN_1D-1; i++)
{
    a[i] = b[i] + c[i] * c[i] + b[i] * b[i] + c[i];
    d[i] = a[i] + a[i+1];
}
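For anyone who wants to try the loop outside the benchmark, here is a minimal standalone sketch of the kernel. The function name s1244_kernel, the float element type, and the tiny LEN_1D value are assumptions for illustration and are not taken from the bug report.

```c
#include <assert.h>

/* Assumed small length for illustration; the benchmark's LEN_1D is larger. */
#define LEN_1D 8

/* Hypothetical wrapper around the loop quoted in the comment.
   Each iteration stores a[i] and then reads a[i+1], which has not yet
   been overwritten, so d[i] uses the new a[i] and the old a[i+1]. */
static void s1244_kernel(float *a, const float *b, const float *c, float *d)
{
    assert(a && b && c && d);
    for (int i = 0; i < LEN_1D - 1; i++) {
        a[i] = b[i] + c[i] * c[i] + b[i] * b[i] + c[i];
        d[i] = a[i] + a[i + 1];
    }
}
```

The last element of a and d is never written, because the loop stops at LEN_1D-1.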
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148
cuilili changed:
What |Removed |Added
CC   |        |lili.cui at intel dot com
--- Comment #2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104271
--- Comment #14 from cuilili ---
This regression has been fixed with the commit below and we can close this
ticket.
https://gcc.gnu.org/g:1b9a5cc9ec08e9f239dd2096edcc447b7a72f64a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110038
--- Comment #5 from cuilili ---
(In reply to Martin Jambor from comment #4)
> So is this now fixed?
Yes, the case in the attachment has been fixed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110038
--- Comment #2 from cuilili ---
(In reply to Richard Biener from comment #1)
> Probably best to limit the values to reassoc-width by adding the
> appropriate IntegerRange attribute in params.opt
>
> IntegerRange(0, 256)
>
> maybe?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104271
--- Comment #12 from cuilili ---
This regression is caused by a store forwarding issue. Inlining eliminates
the two redundant pairs of loads and stores that have the store forwarding
issue.
This regression has been fixed by
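
As a rough illustration of the pattern described above (not the actual benchmark code), the sketch below passes a small aggregate by value to a function that is kept out of line. On x86-64 System V, a 24-byte struct like this is passed in memory, so the caller stores it to the stack and the callee loads it back. If the store and load widths differ, forwarding may fail. The type vec3 and both function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical 24-byte aggregate; too large to be passed in registers
   under the x86-64 System V ABI, so it goes through the stack. */
typedef struct { double x, y, z; } vec3;

/* Kept out of line (GCC/Clang attribute): the caller must store u and v
   to memory and this function must load them again. */
__attribute__((noinline))
static double dot_outlined(vec3 u, vec3 v)
{
    return u.x * v.x + u.y * v.y + u.z * v.z;
}

/* Same computation, eligible for inlining: once inlined, the compiler
   can keep the fields in registers and drop the store/load pairs. */
static inline double dot_inlined(vec3 u, vec3 v)
{
    return u.x * v.x + u.y * v.y + u.z * v.z;
}

/* Hypothetical driver showing that both variants compute the same value. */
static int same_result(void)
{
    vec3 u = { 1.0, 2.0, 3.0 };
    vec3 v = { 4.0, 5.0, 6.0 };
    assert(dot_outlined(u, v) == dot_inlined(u, v));
    return dot_outlined(u, v) == 32.0;
}
```

Comparing the assembly of the two variants (for example with -O2 -S) is one way to see whether the extra stores and loads are present. The exact output depends on the compiler version and flags.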
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 105493, which changed state.
Bug 105493 Summary: [12/13 Regression] x86_64 538.imagick_r 6% regressions and
2% 525.x264_r regressions on Alder Lake after r12-7319-g90d693bdc9d718
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105493
cuilili changed:
What|Removed |Added
Resolution|--- |FIXED
Status|NEW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105493
--- Comment #2 from cuilili ---
(In reply to Richard Biener from comment #1)
> Martin is currently re-benchmarking GCC 12 on AMD, so let's see if there's
> anything left on those.
AMD may not have this issue; Richard fixed the AMD regression with
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105493
Bug ID: 105493
Summary: [12/13 Regression] x86_64 538.imagick_r 6% regressions
and 2% 525.x264_r regressions on Alder Lake after
r12-7319-g90d693bdc9d718
Product: gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723
--- Comment #11 from cuilili ---
(In reply to Jakub Jelinek from comment #10)
> And for the backend, the question is how big the penalty for the overlapping
> store is compared to doing multiple non-overlapping stores. Say for those
> 49
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104271
--- Comment #9 from cuilili ---
I really appreciate your reply. I debugged the SRA pass with the small testcase
and found that SRA does not handle this situation.
SRA cannot split the callee's first parameter because of "Do not decompose non-BLKmode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104271
--- Comment #7 from cuilili ---
Created attachment 52706
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52706&action=edit
Add a heuristic for eliminating redundant loads and stores in the inline pass.
Hi Richard,
Could you help take a look? This is my
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104271
--- Comment #6 from cuilili ---
I created a patch to fix this regression. The patch is under performance
testing; I will send it out later.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723
--- Comment #9 from cuilili ---
(In reply to cuilili from comment #3)
> (In reply to Hongtao.liu from comment #1)
> > STF issue here?
>
Correction to comment #3:
I used perf to collect the "ld_blocks.store_forward" event for those two test
cases,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723
--- Comment #3 from cuilili ---
(In reply to Hongtao.liu from comment #1)
> STF issue here?
Yes. Since "YMMWORD PTR [rsp-72]" crosses a cache line, there is an STLF issue
here.
vmovdqu64 YMMWORD PTR [rsp-72], ymm31 --> store 32 bytes from
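
To make the "crosses a cache line" condition concrete, here is a small sketch that checks whether an access of a given size spans two cache lines. The 64-byte line size and the helper name crosses_cache_line are assumptions for illustration; the actual address depends on the run-time value of rsp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; typical for current x86-64 parts. */
#define CACHE_LINE_BYTES 64u

/* Hypothetical helper: returns 1 if the byte range [addr, addr + size)
   touches more than one cache line, 0 otherwise. */
static int crosses_cache_line(uintptr_t addr, size_t size)
{
    assert(size > 0);
    uintptr_t first_line = addr / CACHE_LINE_BYTES;
    uintptr_t last_line = (addr + size - 1) / CACHE_LINE_BYTES;
    return first_line != last_line;
}
```

For example, if rsp happened to be a multiple of 64, then rsp-72 would sit at offset 56 within its line, and a 32-byte access there would cover offsets 56 to 87, which spans two lines.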
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #28 from cuilili ---
(In reply to H.J. Lu from comment #25)
> Can this be mitigated by removing redundant load and store?
Yes, inlining say_sphere can remove the redundant loads and stores. O3 does
the inlining, but O2 is more sensitive to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #24 from cuilili ---
(In reply to cuilili from comment #23)
> (In reply to Richard Biener from comment #17)
> > I do wonder though how CLX is fine with such access pattern ;) (did you
> > test
> > with just -O2?)
>
Sorry, correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
cuilili changed:
What |Removed |Added
CC   |        |lili.cui at intel dot com
--- Comment #23