Hi, Can a gatekeeper please review the attached vectorizer patch?
Sample testcases that fail at -O3:

#include <stdio.h>

int a[1000], b[1000], c[1000];

void init (void)
{
  int i;
  for (i = 0; i < 1000; i++) {
    a[i] = 1;
    b[i] = 2;
    c[i] = 3;
  }
}

int main (int argc, char** argv)
{
  int i;
  init ();
  for (i = 2; i < 1000; i++) {
    b[i] = a[i-2];
    a[i] = c[i];
  }
  fprintf (stderr, "%d\n", b[6]);
  return 0;
}

---

#include <stdio.h>
#define N 1000

double d1[N], d2[N], d3[N];
float f1[N], f2[N];

void foo (void)
{
  int i;
  for (i = 4; i < N; i++) {
    d1[i] = d2[i];   /* s1 */
    f1[i] = f2[i];   /* s2 */
    d3[i] = d1[i+3]; /* s3 */
  }
}

void init (void)
{
  int i;
  for (i = 0; i < N; i++) {
    d1[i] = 1.0;
    d2[i] = 2.0;
    d3[i] = 3.0;
  }
}

int main (int argc, char** argv)
{
  init ();
  foo ();
  fprintf (stderr, "%f\n", (float)d3[4]);
  return 0;
}

---

#include <stdio.h>
#define N 1000

double d1[N], d2[N], d3[N];
char f1[N], f2[N];

void foo (void)
{
  int i;
  for (i = 0; i < N - 4; i++) {
    d1[i] = d2[i] + d3[i+1]; /* s1 */
    f1[i] = f2[i];           /* s2 */
    d3[i] = d1[i+3];         /* s3 */
  }
}

void init (void)
{
  int i;
  for (i = 0; i < N; i++) {
    d1[i] = 1.0 * i;
    d2[i] = 2.0 * i;
    d3[i] = 3.0 * i;
  }
}

int main (int argc, char** argv)
{
  int i;
  init ();
  foo ();
  for (i=0; i<10; i++){
    fprintf (stderr, "%f ", (float)d3[i]);
  }
  fprintf(stderr, "\n");
  return 0;
}

---

#include <stdio.h>
#define N 1000

double d1[N], d2[N], d3[N];
char f1[N], f2[N];

int main (int argc, char** argv)
{
  int i;
  for (i = 0; i < N; i++) {
    d3[i] = 3.0 * i; /* s1 */
    f1[i] = f2[i];   /* s2 */
  }
  for (i=0; i<10; i++){
    fprintf (stderr, "%f ", (float)d3[i]);
  }
  fprintf(stderr, "\n");
  return 0;
}

---

Problem/Fix-Description:

The central issue is incorrect ordering of vectorized statements.

In the first example, the vector length is 4. Therefore, we need to re-order the two statements in order to perform vectorization correctly.

    b[i] = a[i-2];
    a[i] = c[i];

To fix the problem, we topologically reorder the statements right before performing vectorization.
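The ordering problem in the first example can be reproduced without a vectorizer. The following is only a sketch, not Open64 code: it assumes a vector statement loads all of its lanes before storing any, and emulates that with a helper. The names fill, vcopy, vector_source_order, vector_reordered, matches_scalar and self_check are made up for illustration.

```c
#include <assert.h>
#include <string.h>

#define N  1000
#define VL 4    /* vector length in elements, as in the first example */

/* Same initial values as init() in the first testcase. */
void fill(int *a, int *b, int *c)
{
    for (int i = 0; i < N; i++) { a[i] = 1; b[i] = 2; c[i] = 3; }
}

/* Reference semantics: the original scalar loop. */
void scalar_loop(int *a, int *b, const int *c)
{
    for (int i = 2; i < N; i++) {
        b[i] = a[i - 2];
        a[i] = c[i];
    }
}

/* Emulated vector copy: all VL lanes are loaded before any lane is stored. */
void vcopy(int *dst, const int *src)
{
    int lanes[VL];
    memcpy(lanes, src, sizeof lanes);
    memcpy(dst, lanes, sizeof lanes);
}

/* Vectorized with the statements left in source order (the failing case). */
void vector_source_order(int *a, int *b, const int *c)
{
    int i;
    for (i = 2; i + VL <= N; i += VL) {
        vcopy(&b[i], &a[i - 2]);  /* b[i:i+3] = a[i-2:i+1], reads stale a[i], a[i+1] */
        vcopy(&a[i], &c[i]);      /* a[i:i+3] = c[i:i+3] */
    }
    for (; i < N; i++) { b[i] = a[i - 2]; a[i] = c[i]; }  /* scalar remainder */
}

/* Vectorized with the store to a[] moved ahead of the load from a[]. */
void vector_reordered(int *a, int *b, const int *c)
{
    int i;
    for (i = 2; i + VL <= N; i += VL) {
        vcopy(&a[i], &c[i]);      /* a[i:i+3] = c[i:i+3] */
        vcopy(&b[i], &a[i - 2]);  /* b[i:i+3] = a[i-2:i+1], now sees the new a[i], a[i+1] */
    }
    for (; i < N; i++) { b[i] = a[i - 2]; a[i] = c[i]; }  /* scalar remainder */
}

/* Returns 1 if `loop` leaves a[] and b[] identical to the scalar loop. */
int matches_scalar(void (*loop)(int *, int *, const int *))
{
    int a0[N], b0[N], c0[N], a1[N], b1[N], c1[N];
    fill(a0, b0, c0);
    fill(a1, b1, c1);
    scalar_loop(a0, b0, c0);
    loop(a1, b1, c1);
    return memcmp(a0, a1, sizeof a0) == 0 && memcmp(b0, b1, sizeof b0) == 0;
}

/* The reordered form agrees with the scalar loop; the source-order form does not. */
void self_check(void)
{
    assert(matches_scalar(vector_reordered));
    assert(!matches_scalar(vector_source_order));
}
```

Calling self_check() from a main() exercises both orderings. The dependence distance between the write a[i] and the read a[i-2] is 2, which is smaller than the vector length of 4, so the last two lanes of each load from a[] must observe the store made in the same vector iteration.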
In the other examples, the problem is caused by the incorrect order of certain unrolled statements during vectorization, sometimes coupled (in the last two examples) with incorrect handling of induction variables in loop unrolling during SIMD.

Before this fix, copies of an unrolled statement were inserted right after the current statement, without checking dependencies. This causes problems for the following example code:

    d1[i] = d2[i] + d3[i+1]; /* s1 */
    f1[i] = f2[i];           /* s2 */
    d3[i] = d1[i+3];         /* s3 */

Due to the presence of s2, s1 and s3 need to be unrolled by 2. So the code becomes the following:

    d1[i:i+1]   = d2[i:i+1]   + d3[i+1:i+2]; /* s1  */
    d1[i+2:i+3] = d2[i+2:i+3] + d3[i+3:i+4]; /* s1' */
    f1[i:i+3]   = f2[i:i+3];                 /* s2  */
    d3[i:i+1]   = d1[i+3:i+4];               /* s3  */
    d3[i+2:i+3] = d1[i+5:i+6];               /* s3' */

Because of s1', d1[i+3] is written before it is read by s3, which is not the case in the original code. The fix is to insert unrolled copies of statements in the order of their appearance in the original loop.

In the last example, when vectorizing `3.0 * i' (use of the index variable outside an array subscript), the first loop is changed to (conceptually):

    II  = [start:start+4]
    INC = [16, 16, 16, 16]
    for (i = start; i < end; i += 16) {
      d3[i:i+1]  = 3.0 * II;    /* vectorized version of s1 */
      f1[i:i+15] = f2[i:i+15];  /* vectorized version of s2 */
      II += INC;                /* s3, created by SIMD */
    }

Then, unrolling is performed. Since the loop is unrolled 16 times, we need to fill in the remaining copies of s1. SIMD does this by creating the following unrolled copy and then inserting it into the loop:

    KK = [2, 2, 2, 2]
    d3[i+2:i+3] = 3.0 * (II + KK);

In the original code, the unrolled copy was inserted after the last statement of the loop. However, in this case it should be inserted after s2, not after s3.

The other problem is that the vectorized version of s1 needs to be unrolled 8 times, so the increments to the base should be 2, 4, 6, 8, 10, 12, 14. However, the original code only handles increments of 1, 2, 4, and 8 (hence the assertion failure).

Thanks.
Pallavi
vectorizer.p