[Bug tree-optimization/81558] Loop not vectorized

2023-07-21 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81558

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-07-21
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #4 from Andrew Pinski  ---
Confirmed.

[Bug tree-optimization/81558] Loop not vectorized

2021-07-20 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81558

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
           Severity|normal                      |enhancement

[Bug tree-optimization/81558] Loop not vectorized

2017-07-27 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81558

--- Comment #3 from rguenther at suse dot de  ---
On Thu, 27 Jul 2017, kugan at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81558
> 
> --- Comment #2 from kugan at gcc dot gnu.org ---
> 
> > Does LLVM do a runtime alias check here?  For foo1 GCC adds a runtime alias
> > check (BB vectorization cannot version for aliasing).
> 
> Yes. LLVM does not seem to be unrolling the inner loop. As you said, when
> disabling cunrolli it works. The cunroll pass will unroll after loop
> vectorisation.
> Can anything be done with the heuristics for this case? Thanks.

cunrolli sees

Loop 2 iterates 16 times.
...
  size:   1 imgY_org.6_2 = imgY_org;
  size:   0 _3 = (long unsigned int) y_15;
  size:   1 _4 = _3 * 8;
  size:   1 _5 = imgY_org.6_2 + _4;
  size:   1 _6 = *_5;
  size:   0 _7 = (long unsigned int) x_14;
  size:   1 _8 = _7 * 2;
  size:   1 _9 = _6 + _8;
  size:   1 orgptr_24 = orgptr_16 + 2;
  size:   1 _10 = *_9;
  size:   1 *orgptr_16 = _10;
  size:   1 x_26 = x_14 + 1;

A quick shot at a heuristic would notice that we'd vectorize this with
V8HImode/V16HImode and that, with a statically determined 16 iterations,
vectorizing should be profitable.

So yes, a heuristic is possible but it would be only a heuristic which
means there's likely a testcase that will regress in one way or another
(like missing simplifications exposed by unrolling).

Another thing is that IMHO cunrolli has too high a limit on the maximum
number of iterations it'll unroll.  Adding another param might help here,
or making it less aggressive.  Of course calculix relies heavily on
cunrolli aggressively unrolling ...
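
For readers following along, here is a rough C reading of the loop body in the
cunrolli dump above, together with the lane arithmetic behind the
V8HI/V16HI remark.  This is a sketch, not the testcase from the bug: the
function names copy_row and vector_iterations, the typedef pel, and the
choice of an unsigned 16-bit element are assumptions (the dump only shows
that x is scaled by 2 and y by 8).

```c
#include <stdint.h>

/* "Loop 2 iterates 16 times." */
enum { TRIP_COUNT = 16 };

/* Assumed element type: the dump scales x by 2, so elements are 2 bytes. */
typedef uint16_t pel;

/* Global array of row pointers, reloaded in every iteration in the dump. */
pel **imgY_org;

/* Hypothetical reconstruction of the inner loop: one load of the row
   pointer imgY_org[y], one load of element x, one store through orgptr. */
pel *copy_row(pel *orgptr, int y)
{
    for (int x = 0; x < TRIP_COUNT; x++)
        *orgptr++ = imgY_org[y][x];
    return orgptr;
}

/* Vector iterations needed for a loop whose trip count is a multiple of
   the lane count.  V8HI is 8 x 16-bit lanes (128 bits), V16HI is 16 x
   16-bit lanes (256 bits). */
int vector_iterations(int trip_count, int vector_bits, int element_bits)
{
    int lanes = vector_bits / element_bits;
    return trip_count / lanes;
}
```

With 16 iterations known statically, that is two vector iterations with V8HI
or a single one with V16HI, and no scalar epilogue in either case.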

[Bug tree-optimization/81558] Loop not vectorized

2017-07-26 Thread kugan at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81558

--- Comment #2 from kugan at gcc dot gnu.org ---

> Does LLVM do a runtime alias check here?  For foo1 GCC adds a runtime alias
> check (BB vectorization cannot version for aliasing).

Yes. LLVM does not seem to be unrolling the inner loop. As you said, when
disabling cunrolli it works. The cunroll pass will unroll after loop
vectorisation.
Can anything be done with the heuristics for this case? Thanks.

[Bug tree-optimization/81558] Loop not vectorized

2017-07-26 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81558

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
             Blocks|                            |53947

--- Comment #1 from Richard Biener  ---
The inner loop in foo2 is completely unrolled by GCC, and the load of the
row pointer for imgY_org[y][x] is

  _32 = (long unsigned int) y_103;
  _33 = _32 * 8;
  _34 = imgY_org.8_31 + _33;
  _35 = *_34;

where *_34 aliases *orgptr.  Thus it's not possible to vectorize this without
a runtime alias check.  The innermost loop in foo1 is vectorized, the unrolled
loop in foo2 is not basic-block vectorized because basic-block vectorization
runs into the very same dependence issue.

Does LLVM do a runtime alias check here?  For foo1 GCC adds a runtime alias
check (BB vectorization cannot version for aliasing).
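
As a generic illustration of what "versioning for aliasing" means, the sketch
below writes the runtime alias check by hand: test whether the two ranges
overlap, then pick either a copy with a no-overlap guarantee or a plain
scalar loop.  It is not the code GCC generates and not the testcase from this
bug, where the conflicting accesses are the row-pointer load and the store
through orgptr.  All names (ranges_disjoint, copy_noalias, copy_versioned)
are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero when the two n-element ranges starting at a and b do not overlap.
   Addresses are compared as integers so that pointers into unrelated
   objects can be compared. */
int ranges_disjoint(const uint16_t *a, const uint16_t *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;
    size_t bytes = n * sizeof(uint16_t);
    return pa + bytes <= pb || pb + bytes <= pa;
}

/* Fast version: restrict promises the compiler that dst and src do not
   overlap, so the loop carries no dependence that blocks vectorization. */
static void copy_noalias(uint16_t *restrict dst,
                         const uint16_t *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Versioned copy: a runtime check selects the fast version, otherwise the
   element-by-element loop preserves the original semantics. */
void copy_versioned(uint16_t *dst, const uint16_t *src, size_t n)
{
    if (ranges_disjoint(dst, src, n)) {
        copy_noalias(dst, src, n);
    } else {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    }
}
```

A loop vectorizer can emit this kind of check because it still has a loop to
duplicate; once the loop has been completely unrolled into straight-line
code, the basic-block vectorizer has no loop left to version, which is the
limitation noted above.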


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations