[Bug middle-end/43182] GCC does not pull out a[0] from loop that changes a[i] for i:[1,n]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43182

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

--- Comment #7 from Andrew Pinski ---
So even though we can vectorize this loop these days, the non-vectorized loop
still has the load each iteration.
at -O2:

.L3:
        movl    (%ecx), %edx
        addl    $4, %eax
        movl    %edx, -4(%eax)
        cmpl    %ebx, %eax
        jne     .L3
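For readers without the original test case at hand, here is a minimal C sketch of the kind of loop the bug title describes, next to the hand-hoisted form the report asks the compiler to produce. The function names `fill_naive` and `fill_hoisted` and the use of `size_t` are assumptions made for illustration; they are not taken from the report.

```c
#include <stddef.h>

/* Hypothetical reconstruction of the loop under discussion: every a[i]
   for i in [1, n] is overwritten with a[0].  As written, the load of a[0]
   sits inside the loop body. */
void fill_naive(int *a, size_t n)
{
    for (size_t i = 1; i <= n; i++)
        a[i] = a[0];
}

/* Hand-hoisted form: the load of a[0] is done once, before the loop.
   This is valid because the stores only touch a[1] .. a[n], never a[0]. */
void fill_hoisted(int *a, size_t n)
{
    const int first = a[0];
    for (size_t i = 1; i <= n; i++)
        a[i] = first;
}
```

Both functions leave the array in the same state; the difference is only whether the load is repeated on every iteration, which is what the assembly in comment #7 shows.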
--- Comment #4 from changpeng dot fang at amd dot com 2010-02-26 18:53 ---
Here is another similar case but more general. We know that a(j) and a(i)
never access the same memory location. intel ifort can vectorize this
triangular loop:

      do 10 j = 1,n
        do 20 i = j+1, n
          a(i) = a(i) - aa(i,j) * a(j)
 20     continue
 10   continue

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43182
--- Comment #5 from pinskia at gcc dot gnu dot org 2010-02-26 18:55 ---
(In reply to comment #4)
> Here is another similar case but more general.

Actually it is a totally different case. Please file a new bug with that
case; though there might already be a bug about that one.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43182
--- Comment #6 from changpeng dot fang at amd dot com 2010-02-26 19:06 ---
> Actually it is a totally different case. Please file a new bug with that
> case; though there might already be a bug about that one.

I could not see the difference even though j is not a compile-time constant.
(it is an invariant to the innermost loop).

I can say:
GCC does not pull out a[j] from loop that changes a[i] for i:[j+1,n]

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43182
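To make the a[j] claim concrete, here is a rough C rendering of the triangular loop with and without the invariant load pulled out of the inner loop. It is a sketch under assumptions: indices are shifted to 0-based, `aa` is stored as a flat row-major array indexed `aa[i * n + j]`, and the names `tri_naive` and `tri_hoisted` are invented for this example.

```c
#include <stddef.h>

/* 0-based translation of the triangular loop.  The row-major layout of aa
   is an assumption; the Fortran original uses aa(i,j). */
void tri_naive(double *a, const double *aa, size_t n)
{
    for (size_t j = 0; j < n; j++)
        for (size_t i = j + 1; i < n; i++)
            a[i] = a[i] - aa[i * n + j] * a[j];   /* a[j] read every time */
}

/* Same computation with a[j] loaded once per outer iteration.  The inner
   loop stores only to a[i] with i > j, so a[j] cannot change inside it. */
void tri_hoisted(double *a, const double *aa, size_t n)
{
    for (size_t j = 0; j < n; j++) {
        const double aj = a[j];
        for (size_t i = j + 1; i < n; i++)
            a[i] = a[i] - aa[i * n + j] * aj;
    }
}
```

The hoist is safe for the same reason as in the a[0] case: the store index range [j+1, n) excludes j. What differs is that the excluded index is a loop-invariant variable rather than a constant.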
--- Comment #2 from pinskia at gcc dot gnu dot org 2010-02-25 23:50 ---
So currently inside LIM (which does load motion in general):

  D.2724_7 = a_6(D) + D.2723_5;
  D.2725_8 = *a_6(D);
  *D.2724_7 = D.2725_8;

But LIM/alias oracle does not know that D.2723_5 has a range of [4, n_3*4]
which means D.2724_7 can never equal a_6 so we don't pull out the load from
a_6.

--
pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|-00-00 00:00:00             |2010-02-25 23:50:10
               date|                            |
            Summary|gcc could not vectorize this|GCC does not pull out a[0]
                   |simple loop (un-handled     |from loop that changes a[i]
                   |data-ref)                   |for i:[1,n]

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43182
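The range argument in comment #2 can be spelled out in C terms. The sketch below walks the same byte offsets the GIMPLE computes and reports whether any store address ever coincides with the load address. The function name `store_hits_load` is hypothetical, and the lower bound of 4 in the comment corresponds to `sizeof(int)` here, which is an assumption about the target.

```c
#include <stddef.h>

/* Returns 1 if any store address in the loop equals the load address a,
   0 otherwise.  The caller must pass an array of at least n + 1 ints. */
int store_hits_load(const int *a, size_t n)
{
    const char *base = (const char *)a;          /* plays the role of a_6 */
    for (size_t i = 1; i <= n; i++) {
        size_t offset = i * sizeof(int);         /* D.2723_5: at least sizeof(int) */
        const char *store = base + offset;       /* D.2724_7 = a_6 + D.2723_5 */
        if (store == base)
            return 1;
    }
    return 0;
}
```

Because the offset is never zero, the function returns 0 for every valid input. That is the fact a range-aware alias query would need in order to prove that the load from `a` is invariant.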
--- Comment #3 from pinskia at gcc dot gnu dot org 2010-02-25 23:54 ---
Related to PR 29751 but that only does a simple method and does not handle
this case as we need range info.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43182