https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813
--- Comment #8 from Jiu Fu Guo ---
For the code in comment #4, it is optimized because there is range info for "_2
= l_m_34 + _54;", where _54 > 0.
--- Comment #7 from Jiu Fu Guo ---
(In reply to Richard Biener from comment #6)
> (In reply to Andrew Pinski from comment #5)
> > (In reply to Jiu Fu Guo from comment #0)
> > > For the below code:
> > > ---t.c
> > > void
> > > foo (const
--- Comment #6 from Richard Biener ---
(In reply to Andrew Pinski from comment #5)
> (In reply to Jiu Fu Guo from comment #0)
> > For the below code:
> > ---t.c
> > void
> > foo (const double* __restrict__ A, const double* __restrict__ B,
--- Comment #5 from Andrew Pinski ---
(In reply to Jiu Fu Guo from comment #0)
> For the below code:
> ---t.c
> void
> foo (const double* __restrict__ A, const double* __restrict__ B, double*
> __restrict__ C,
> int n, int k, int m)
> {
--- Comment #4 from Jiu Fu Guo ---
Thanks, Richard!
One interesting thing: the code below is vectorized:
void
foo (const double *__restrict__ A, const double *__restrict__ B,
double *__restrict__ C, int n, int k, int m)
{
if (n > 0 && m > 0
Richard Biener changed:

           What    |Removed    |Added
 ------------------------------------
             Blocks|           |53947
           Keywords|
--- Comment #2 from Jiu Fu Guo ---
For code:
for (unsigned int k = 0; k < BS; k++)
{
s += A[k] * B[k];
}
PR48052 handles this; for this code, the additional runtime check seems
unnecessary.
If there is offset in code:
--- Comment #1 from Jiu Fu Guo ---
Since the run-time check has additional cost, the benefit shows only when the
upper bound `m` is large; when the upper bound is small (e.g. < 12), the
vectorized code (from clang) is slower than the unvectorized binary.