Hi Yiran,

The reason is that PRE is not applied to the increment amount of an IV (induction variable) update statement. In your example,

x += N*N << 3;

is an IV update statement. If PRE is applied, the rhs of the IV update statement may be transformed so that the statement is no longer an IV update statement, which may in turn disable other IV-related optimizations.

If you change the above statement to, say:

x = z + (N*N << 3);

(N*N << 3) will then be hoisted out of the loop, because the statement is no longer an IV update statement.
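
As a source-level illustration of the hoisting discussed above, here is a minimal sketch that computes the increment once before the loop, so the IV update keeps the simple form "x += step". The names foo_hoisted, step and self_check are hypothetical and used only for this example. The sketch shows only that the two forms compute the same result; how WOPT treats the hand-hoisted form has not been checked here.

```c
#include <assert.h>

/* Original form from the report: the increment N*N << 3 is written
   inside the IV update statement for x. */
int foo(int N, int j, int *x, int *z)
{
  int y = N;
  N += 7;
  N >>= 3;
  int i;
  for (i = 0; i < j; i++)
  {
    x += N*N << 3;
    z = x + N;
    y = y + *x + *z;
  }
  return y;
}

/* Hypothetical hand-hoisted variant: the loop-invariant increment is
   computed once, and the loop body keeps a plain IV update. */
int foo_hoisted(int N, int j, int *x, int *z)
{
  int y = N;
  N += 7;
  N >>= 3;
  int step = N*N << 3;   /* loop-invariant: N does not change in the loop */
  int i;
  for (i = 0; i < j; i++)
  {
    x += step;
    z = x + N;
    y = y + *x + *z;
  }
  return y;
}

/* Hypothetical check that both variants agree on a small buffer.
   With N = 1 and j = 3 the highest element read is buf[25]. */
void self_check(void)
{
  int buf[64];
  int k;
  for (k = 0; k < 64; k++)
    buf[k] = k;
  assert(foo(1, 3, buf, buf) == foo_hoisted(1, 3, buf, buf));
}
```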

You can grep for "Set_omitted()" in opt_etable.cxx and see that "occur->Stmt()->Iv_update()" is one reason an expression is set omitted.

Fred

On 06/26/2013 05:07 PM, Yiran Wang wrote:
Hi All,

This one looks somewhat similar to the last example, but is different.

int foo(int N, int j, int *x, int *z)
{
  int y = N;
  N += 7;
  N >>= 3;
  int i;
  for(i = 0; i< j; i++)
  {
    x += N*N << 3;
    z = x + N;
    y = y + *x + *z;
  }
  return y;
}

Assembly of the loop at -O3.
        .p2align 4,,15
.Lt_0_3586:
 #<loop> Loop body line 7, nesting depth: 1, estimated iterations: 1000
        .loc 1 9 0
 #   8    {
 #   9      x += N*N << 3;
        movl %eax,%ebx                # [0]
        .loc 1 11 0
 #  10      z = x + N;
 #  11      y = y + *x + *z;
        addl $1,%ebp                  # [0]
        .loc 1 9 0
        imull %eax,%ebx               # [1]
        shll $3,%ebx                  # [4]
        shll $2,%ebx                  # [5]
        addl %ebx,%edi                # [6]
        addl %ebx,%esi                # [6]
        .loc 1 11 0
        movl 0(%edi),%ecx             # [7] id:23
        addl 0(%esi),%ecx             # [10]
        addl %ecx,%edx                # [13]
        cmpl 36(%esp),%ebp            # [13] j
        jl .Lt_0_3586                 # [16]

As we see, the imul instruction remains in the loop.
(So do two consecutive shll instructions. My guess is that CG does not expect such input from WOPT, so it does not combine them, although doing so would be simple.)
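
To make the remark about the two shifts concrete: the first shll appears to come from the "<< 3" in the source, and the second from scaling the int * increment by sizeof(int), which is 4 on this 32-bit target. The sketch below only checks the arithmetic that the pair is equivalent to a single shift by 5. The function names are hypothetical, and unsigned values are used so that the shifts are well defined in C.

```c
#include <stdint.h>

/* Byte offset computed the way the emitted loop does it:
   shift by 3 (source-level "<< 3"), then by 2 (times sizeof(int) == 4). */
static uint32_t offset_two_shifts(uint32_t nn)
{
  uint32_t t = nn << 3;
  return t << 2;
}

/* The same byte offset with the two shifts folded into one. */
static uint32_t offset_one_shift(uint32_t nn)
{
  return nn << 5;
}
```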

It looks like SSA PRE omitted the rhs of the Iv_update statement x += N*N << 3, and VNFRE only does one level of CSE, i.e. it promotes the ASHR + LDC 3 out of the loop.

I am curious why SSA PRE omits the expression here. When I disable this omission in opt_etable.cxx, the result looks good for this test case. Would disabling it cause correctness or performance problems for other test cases?

Note that one strength-reduction transformation is applied to z in this case. Also, replacing "N>>=3;" with "N*=5;" results in similar sub-optimal code.

Best Regards,
Yiran Wang




------------------------------------------------------------------------------
This SF.net email is sponsored by Windows:

Build for Windows Store.

http://p.sf.net/sfu/windows-dev2dev


_______________________________________________
Open64-devel mailing list
Open64-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/open64-devel
