[Bug tree-optimization/98813] loop is sub-optimized if index is unsigned int with offset
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813
--- Comment #8 from Jiu Fu Guo ---
The code in comment 4 is optimized because there is some range info for
"_2 = l_m_34 + _54;", where _54 > 0.
[Bug tree-optimization/98813] loop is sub-optimized if index is unsigned int with offset
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813
--- Comment #7 from Jiu Fu Guo ---
(In reply to Richard Biener from comment #6)
> (In reply to Andrew Pinski from comment #5)
> > (In reply to Jiu Fu Guo from comment #0)
> > > For the below code:
> > > ---t.c
> > > void
> > > foo (const double* __restrict__ A, const double* __restrict__ B, double*
> > > __restrict__ C,
> > > int n, int k, int m)
> > > {
> > > for (unsigned int l_m = 0; l_m < m; l_m++)
> > > C[n + l_m] += A[k + l_m] * B[k];
> > > }
> >
> > Try using unsigned long instead of unsigned int.
> > I think this is the same as PR 61247.
>
> Yes, I think we've seen plenty examples in the past where conversions in
> the SCEV chain prevent analysis.
Yes. Thanks for your comments and suggestions!
And for this code (unsigned int), I'm wondering whether we really need a
runtime scev/overflow check before vectorizing it, to guard
`n+m<4294967295 && m<4294967295`.
Without this guard, I'm not sure the optimization is correct for the code
in comment 4.
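To make the idea of a runtime guard concrete, here is a minimal hand-written sketch of loop versioning for the function from comment 0. It is an illustration only, not what GCC emits: the name foo_versioned and the exact form of the guard are assumptions, and the guard is written so that neither n + l_m nor k + l_m can wrap in 32-bit unsigned arithmetic for any l_m below the trip count.

```c
#include <stdint.h>

/* Hypothetical hand-versioned variant of foo from comment 0. */
void
foo_versioned (const double *restrict A, const double *restrict B,
               double *restrict C, int n, int k, int m)
{
  /* The original loop does its index arithmetic in unsigned int. */
  uint32_t un = (uint32_t) n, uk = (uint32_t) k, um = (uint32_t) m;

  if (um == 0)
    return;

  /* Assumed guard: un + l_m and uk + l_m do not wrap for l_m < um. */
  if (un <= UINT32_MAX - (um - 1) && uk <= UINT32_MAX - (um - 1))
    {
      /* Fast path: with no wrap the accesses are affine, so the
         offsets can be hoisted and a 64-bit counter used.  */
      const double *a = A + uk;
      double *c = C + un;
      for (uint64_t i = 0; i < um; i++)
        c[i] += a[i] * B[k];
    }
  else
    {
      /* Fallback: keep the original wrapping 32-bit index arithmetic.  */
      for (uint32_t l_m = 0; l_m < um; l_m++)
        C[un + l_m] += A[uk + l_m] * B[k];
    }
}
```

When the guard holds, both paths touch the same elements, so the guard only selects between an easily analyzable loop and the original one.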
[Bug tree-optimization/98813] loop is sub-optimized if index is unsigned int with offset
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813
--- Comment #6 from Richard Biener ---
(In reply to Andrew Pinski from comment #5)
> (In reply to Jiu Fu Guo from comment #0)
> > For the below code:
> > ---t.c
> > void
> > foo (const double* __restrict__ A, const double* __restrict__ B, double*
> > __restrict__ C,
> > int n, int k, int m)
> > {
> > for (unsigned int l_m = 0; l_m < m; l_m++)
> > C[n + l_m] += A[k + l_m] * B[k];
> > }
>
> Try using unsigned long instead of unsigned int.
> I think this is the same as PR 61247.
Yes, I think we've seen plenty examples in the past where conversions in
the SCEV chain prevent analysis.
[Bug tree-optimization/98813] loop is sub-optimized if index is unsigned int with offset
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813
--- Comment #5 from Andrew Pinski ---
(In reply to Jiu Fu Guo from comment #0)
> For the below code:
> ---t.c
> void
> foo (const double* __restrict__ A, const double* __restrict__ B, double*
> __restrict__ C,
> int n, int k, int m)
> {
> for (unsigned int l_m = 0; l_m < m; l_m++)
> C[n + l_m] += A[k + l_m] * B[k];
> }
Try using unsigned long instead of unsigned int.
I think this is the same as PR 61247.
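For reference, a sketch of what that suggestion would look like when applied to the function from comment 0. The name foo_ul is made up for illustration. Note that this is not a drop-in replacement: with an unsigned long index, n and k are converted to unsigned long rather than unsigned int, so the behaviour for negative n, k or m differs from the original.

```c
/* Hypothetical variant of foo with the index widened to unsigned long,
   so the index arithmetic no longer goes through a 32-bit type.  */
void
foo_ul (const double *restrict A, const double *restrict B,
        double *restrict C, int n, int k, int m)
{
  for (unsigned long l_m = 0; l_m < m; l_m++)
    C[n + l_m] += A[k + l_m] * B[k];
}
```

For non-negative n, k and m that fit the arrays, this computes the same result as the unsigned int version.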
[Bug tree-optimization/98813] loop is sub-optimized if index is unsigned int with offset
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813
--- Comment #4 from Jiu Fu Guo ---
Thanks, Richard!
One interesting thing: the code below is vectorized:
void
foo (const double *__restrict__ A, const double *__restrict__ B,
     double *__restrict__ C, int n, int k, int m)
{
  if (n > 0 && m > 0 && k > 0)
    for (unsigned int l_m = 0; l_m < m; l_m++)
      C[n + l_m] += A[k + l_m] * B[k];
}
[Bug tree-optimization/98813] loop is sub-optimized if index is unsigned int with offset
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-01-25

--- Comment #3 from Richard Biener ---
Confirmed. While niter analysis produces 'assumptions' there's no such
capability in SCEV analysis. In particular we have accesses like

  _54 = (unsigned int) n_24(D);
  ...
  _2 = l_m_34 + _54;
  _3 = (long unsigned int) _2;
  _4 = _3 * 8;
  _5 = C_25(D) + _4;
  _6 = *_5;

where SCEV analysis could "look through" the (long unsigned int) cast
in case _2 >= 0 && _2 <= INT_MAX / 8; it could, similar to niter analysis,
record this somewhere. The overflow analysis there possibly also has
similar issues as the split_constant_offset_1 one (which might also
benefit from tracking 'assumptions').

Btw, tracking 'assumptions' not as a GENERIC tree expression but in a form
that would be nicer to collect & simplify later would be nice. Maybe
for tracking purposes just note the SSA name _4, telling that its producer is
assumed to not overflow, leaving combining & producing of versioning
conditions to other helpers.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
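The effect of the cast in the GIMPLE above can be reproduced in plain C. The sketch below is an illustration with made-up helper names (index_u32, index_u64), and it assumes that long unsigned int is 64 bits wide, which is why uint64_t is used. It shows that adding in 32 bits and then widening gives a different index from widening first whenever the 32-bit sum wraps, which is why the cast cannot be looked through without an assumption about the range of _2.

```c
#include <stdint.h>

/* Index as computed in the loop: add in 32-bit unsigned arithmetic,
   then zero-extend.  The sum may wrap before it is widened.  */
static uint64_t
index_u32 (int n, uint32_t l_m)
{
  uint32_t sum = l_m + (uint32_t) n;   /* _2 = l_m_34 + _54 */
  return (uint64_t) sum;               /* _3 = (long unsigned int) _2 */
}

/* Index if the addition were done after widening: it cannot wrap
   for 32-bit inputs, so it is affine in l_m.  */
static uint64_t
index_u64 (int n, uint32_t l_m)
{
  return (uint64_t) (uint32_t) n + (uint64_t) l_m;
}
```

For n = 5 and l_m = 1 both helpers return 6. For n = -1 and l_m = 1 the 32-bit sum wraps, so index_u32 returns 0 while index_u64 returns 4294967296.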
[Bug tree-optimization/98813] loop is sub-optimized if index is unsigned int with offset
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813
--- Comment #2 from Jiu Fu Guo ---
For the code:
for (unsigned int k = 0; k < BS; k++)
  {
    s += A[k] * B[k];
  }
PR48052 handles this, and for this code the additional runtime check does not
seem to be required.
If there is an offset in the code:
for (unsigned int k = 0; k < BS; k++)
  {
    s += A[k+3] * B[k+3];
  }
then this code is not vectorized.
[Bug tree-optimization/98813] loop is sub-optimized if index is unsigned int with offset
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813
--- Comment #1 from Jiu Fu Guo ---
Since there are additional costs for the run-time check, we can see the benefit
if the upper bound `m` is large; if the upper bound is small (e.g. < 12), the
vectorized code (from clang) is worse than the un-vectorized binary.
