Hi Mei, See the attached updated and re-tested patch.
For comment 1: Removed redundant WN_MAP_Set. For comment 2: Removed redundant initialization as the original code took advantage of these being initialized to NULL or FALSE according to C standard. For comment 3: 'vec_unroll_preg_store' corresponds to 'II' in the conceptual transformation shown for the fourth test case and its size corresponds to the vector length of the vectorized operation. For the example shown, since d3 is an array of doubles, we need II = [i, i+1], but if d3 was an array of floats, we would need II = [i, i+1, i+2, i+3] and so on. For a vector operation packing 16 scalar operations into one, we would need a size 16 array (though I don't think currently the vectorizer can generate such operations). Also, depending on the unrolling factor, the index variable may need to be incremented by 2,4,6,8,10,12,16. The unrolling is done in batch after vectorization. For any statement unrolled several times, each time, an increment is added to the "add_to_base" to compute how much increment is needed for the unrolling. Therefore, for a single statement (e.g. s1 in the last example) whose unroll factor is 2, each increment to the index should be a multiple of 2. Thanks. Pallavi From: Ye, Mei Sent: Wednesday, May 25, 2011 5:25 PM To: Mathew, Pallavi; open64-devel@lists.sourceforge.net Subject: RE: Code review request for vectorizer bugfix patch -L117, "WN_MAP_Set" seems to be redundant here. -L31/32, you changed the size of "vec_unroll_preg_created" and "vec_unroll_preg_store" but did not change the trip count in the loop that initializes these arrays. -L35, If the index variable can only be updated by factors: 1, 2, 4, 8, 16, then we only need to create 5 types of pregs instead of 16. -Mei From: Mathew, Pallavi [mailto:pallavi.mat...@amd.com] Sent: Wednesday, May 25, 2011 1:34 PM To: open64-devel@lists.sourceforge.net Subject: [Open64-devel] Code review request for vectorizer bugfix patch Hi, Can a gatekeeper please review the attached vectorizer patch? Sample testcases that fail at -O3: #include <stdio.h> int a[1000], b[1000], c[1000]; void init (void) { int i; for (i = 0; i < 1000; i++) { a[i] = 1; b[i] = 2; c[i] = 3; } } int main (int argc, char** argv) { int i; init (); for (i = 2; i < 1000; i++) { b[i] = a[i-2]; a[i] = c[i]; } fprintf (stderr, "%d\n", b[6]); return 0; } --- #include <stdio.h> #define N 1000 double d1[N], d2[N], d3[N]; float f1[N], f2[N]; void foo (void) { int i; for (i = 4; i < N; i++) { d1[i] = d2[i]; /* s1 */ f1[i] = f2[i]; /* s2 */ d3[i] = d1[i+3]; /* s3 */ } } void init (void) { int i; for (i = 0; i < N; i++) { d1[i] = 1.0; d2[i] = 2.0; d3[i] = 3.0; } } int main (int argc, char** argv) { init (); foo (); fprintf (stderr, "%f\n", (float)d3[4]); return 0; } --- #include <stdio.h> #define N 1000 double d1[N], d2[N], d3[N]; char f1[N], f2[N]; void foo (void) { int i; for (i = 0; i < N - 4; i++) { d1[i] = d2[i] + d3[i+1]; /* s1 */ f1[i] = f2[i]; /* s2 */ d3[i] = d1[i+3]; /* s3 */ } } void init (void) { int i; for (i = 0; i < N; i++) { d1[i] = 1.0 * i; d2[i] = 2.0 * i; d3[i] = 3.0 * i; } } int main (int argc, char** argv) { int i; init (); foo (); for (i=0; i<10; i++){ fprintf (stderr, "%f ", (float)d3[i]); } fprintf(stderr, "\n"); return 0; } --- #include <stdio.h> #define N 1000 double d1[N], d2[N], d3[N]; char f1[N], f2[N]; int main (int argc, char** argv) { int i; for (i = 0; i < N; i++) { d3[i] = 3.0 * i; /* s1 */ f1[i] = f2[i]; /* s2 */ } for (i=0; i<10; i++){ fprintf (stderr, "%f ", (float)d3[i]); } fprintf(stderr, "\n"); return 0; } --- Problem/Fix-Description: The central issue is incorrect ordering of vectorized statements. In the first example, the vector length is 4. Therefore, we need to re-order the two statements in order to perform vectorization correctly. b[i] = a[i-2]; a[i] = c[i]; To fix the problem, we topologically reorder the statements right before performing vectorization. In the other examples the problem is caused by incorrect order of certain unrolled statements during vectorization, coupled sometimes (last two examples) with incorrect handling of induction variables in loop unrolling during SIMD. Before this fix, copies of unrolled statement are inserted right after current statement, without checking dependencies. This causes problems to the following example code: d1[i] = d2[i] + d3[i+1]; /* s1 */ f1[i] = f2[i]; /* s2 */ d3[i] = d1[i+3]; /* s3 */ Due to the presence of s2, s1 and s3 need to be unrolled by 2. So, the code will become the following: d1[i:i+1] = d2[i:i+1] + d3[i+1:i+2];/* s1 */ d1[i+2:i+3] = d2[i+2:i+3] + d3[i+3:i+4];/* s1' */ f1[i:i+3] = f2[i:i+3]; /* s2 */ d3[i:i+1] = d1[i+3:i+4]; /* s3 */ d3[i+2:i+3] = d1[i+5:i+6]; /* s3' */ Due to s1', d1[i+3] is written before read which is not so in original code. The fix is to insert unrolled copies of statements in the order of their appearence in the original loop. In the last example, when vectorizing `3.0 * I' (use of index variable in non array subscript operation), the first loop is changed to (conceptually): II = [start:start+4] INC= [16, 16, 16, 16] for (i = start; i < end; i+=16) { d3[i:i+1] = 3.0 * II; /* vectorized version of s1 */ f1[i:i+15]= f2[i:i+15]; /* vectorized version of s2 */ II += INC; /* s3, created by SIMD */ } Then, unrolling is performed. Since the loop is unrolled 16 times, we need to fill remaining copies of s1. SIMD does this by creating the following unrolled copy and then insert it to the loop. KK = [2, 2, 2, 2] d3[i+2:i+3] = 3.0 * (II + KK); In the original code, unrolled copy was inserted after the last statement of the loop. However, in this case, it should be s2, not s3. The other problem is that, vectorized version of s1 needs to be unrolled 8 times, so the increment to base should be 2, 4, 6, 8, 10, 12, 14. However, the original code only deals with increment being 1, 2, 4, 8 case (hence the assertion failure). Thanks. Pallavi
vectorizer2.p
Description: vectorizer2.p
------------------------------------------------------------------------------ Simplify data backup and recovery for your virtual environment with vRanger. Installation's a snap, and flexible recovery options mean your data is safe, secure and there when you need it. Data protection magic? Nope - It's vRanger. Get your free trial download today. http://p.sf.net/sfu/quest-sfdev2dev
_______________________________________________ Open64-devel mailing list Open64-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/open64-devel