-L117, "WN_MAP_Set" seems to be redundant here.
-L31/32, you changed the size of "vec_unroll_preg_created" and 
"vec_unroll_preg_store" but did not change the trip count in the loop that 
initializes these arrays.
-L35, If the index variable can only be updated by factors: 1, 2, 4, 8, 16, 
then we only need to create 5 types of pregs instead of 16.

-Mei

From: Mathew, Pallavi [mailto:pallavi.mat...@amd.com]
Sent: Wednesday, May 25, 2011 1:34 PM
To: open64-devel@lists.sourceforge.net
Subject: [Open64-devel] Code review request for vectorizer bugfix patch

Hi,

Can a gatekeeper please review the attached vectorizer patch?

Sample testcases that fail at -O3:
#include <stdio.h>
int a[1000], b[1000], c[1000];
void init (void) {
    int i;
    for (i = 0; i < 1000; i++) {
        a[i] = 1;
        b[i] = 2;
        c[i] = 3;
    }
}

int
main (int argc, char** argv) {
    int i;
    init ();
    for (i = 2; i < 1000; i++) {
        b[i] = a[i-2];
        a[i] = c[i];
    }
    fprintf (stderr, "%d\n", b[6]);
    return 0;
}
---
#include <stdio.h>

#define N 1000
double d1[N], d2[N], d3[N];
float f1[N], f2[N];

void foo (void) {
    int i;
    for (i = 4; i < N; i++) {
        d1[i] = d2[i];         /* s1 */
        f1[i] = f2[i];         /* s2 */
        d3[i] = d1[i+3];       /* s3 */
    }
}

void
init (void) {
    int i;
    for (i = 0; i < N; i++) {
        d1[i] = 1.0;
        d2[i] = 2.0;
        d3[i] = 3.0;
    }
}

int
main (int argc, char** argv) {
    init ();
    foo ();
    fprintf (stderr, "%f\n", (float)d3[4]);
    return 0;
}

---
#include <stdio.h>

#define N 1000
double d1[N], d2[N], d3[N];
char f1[N], f2[N];

void foo (void) {
    int i;
    for (i = 0; i < N - 4; i++) {
        d1[i] = d2[i] + d3[i+1];         /* s1 */
        f1[i] = f2[i];         /* s2 */
        d3[i] = d1[i+3];       /* s3 */
    }
}

void init (void) {
    int i;
    for (i = 0; i < N; i++) {
        d1[i] = 1.0 * i;
        d2[i] = 2.0 * i;
        d3[i] = 3.0 * i;
    }
}

int main (int argc, char** argv) {
    int i;
    init ();
    foo ();
    for (i=0; i<10; i++){
        fprintf (stderr, "%f ", (float)d3[i]);
    }
    fprintf(stderr, "\n");
    return 0;
}
---
#include <stdio.h>
#define N 1000
double d1[N], d2[N], d3[N];
char f1[N], f2[N];

int main (int argc, char** argv) {
  int i;
  for (i = 0; i < N; i++) {
    d3[i] = 3.0 * i;       /* s1 */
    f1[i] = f2[i];         /* s2 */
  }
  for (i=0; i<10; i++){
    fprintf (stderr, "%f ", (float)d3[i]);
  }
  fprintf(stderr, "\n");
  return 0;
}

---
Problem/Fix-Description:
 The central issue is incorrect ordering of vectorized statements.

 In the first example, the vector length is 4. Therefore, we need to
 re-order the two statements in order to perform vectorization correctly.

 b[i] = a[i-2];
 a[i] = c[i];

 To fix the problem, we topologically reorder the statements right
 before performing vectorization.

 In the other examples the problem is caused by incorrect order of certain
 unrolled statements during vectorization, coupled sometimes (last two examples)
 with incorrect handling of induction variables in loop unrolling during SIMD.
 Before this fix, copies of unrolled statement are inserted right after current 
statement,
 without checking dependencies. This causes problems to the following example 
code:

 d1[i] = d2[i] + d3[i+1]; /* s1 */
 f1[i] = f2[i]; /* s2 */
 d3[i] = d1[i+3]; /* s3 */

 Due to the presence of s2, s1 and s3 need to be unrolled by 2. So,
 the code will become the following:

 d1[i:i+1] = d2[i:i+1] + d3[i+1:i+2];/* s1 */
 d1[i+2:i+3] = d2[i+2:i+3] + d3[i+3:i+4];/* s1' */
 f1[i:i+3] = f2[i:i+3]; /* s2 */
 d3[i:i+1] = d1[i+3:i+4]; /* s3 */
 d3[i+2:i+3] = d1[i+5:i+6]; /* s3' */

 Due to s1', d1[i+3] is written before read which is not so in original
 code. The fix is to insert unrolled copies of statements in the order
 of their appearence in the original loop.

 In the last example, when vectorizing `3.0 * I' (use of index variable in non 
array
 subscript operation), the first loop is changed to (conceptually):

 II = [start:start+4]
 INC= [16, 16, 16, 16]
 for (i = start; i < end; i+=16) {
    d3[i:i+1] = 3.0 * II;      /* vectorized version of s1 */
    f1[i:i+15]= f2[i:i+15];    /* vectorized version of s2 */
    II += INC;                 /* s3, created by SIMD */
 }

 Then, unrolling is performed. Since the loop is unrolled 16 times, we
 need to fill remaining copies of s1. SIMD does this by creating the
 following unrolled copy and then insert it to the loop.

 KK = [2, 2, 2, 2]
 d3[i+2:i+3] = 3.0 * (II + KK);

 In the original code, unrolled copy was inserted after the last
 statement of the loop. However, in this case, it should be s2, not
 s3. The other problem is that, vectorized version of s1 needs to be
 unrolled 8 times, so the increment to base should be 2, 4, 6, 8, 10,
 12, 14. However, the original code only deals with increment being 1,
 2, 4, 8 case (hence the assertion failure).

Thanks.
Pallavi

------------------------------------------------------------------------------
vRanger cuts backup time in half-while increasing security.
With the market-leading solution for virtual backup and recovery, 
you get blazing-fast, flexible, and affordable data protection.
Download your free trial now. 
http://p.sf.net/sfu/quest-d2dcopy1
_______________________________________________
Open64-devel mailing list
Open64-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/open64-devel

Reply via email to