Re: [Open64-devel] one more question about strength reduction and SSA PRE

Yiran Wang Thu, 27 Jun 2013 10:16:41 -0700

Thanks for your comments.

No, the assembly looks the same.


For better or worse, the compiler cleans up the temporary completely,
e.g. through copy propagation and DSE.
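
For reference, here is a hand-written sketch of the form one would hope
the optimizer could reach for the loop quoted below, with the
loop-invariant N*N << 3 computed once before the loop. The names
foo_hoisted and step are made up for illustration and are not compiler
output; the original foo is repeated only so the two can be compared.

```c
/* Original test case from the quoted message. */
int foo(int N, int j, int *x, int *z)
{
  int y = N;
  N += 7;
  N >>= 3;
  int i;
  for (i = 0; i < j; i++)
  {
    x += N*N << 3;      /* N*N << 3 is recomputed on every iteration */
    z = x + N;
    y = y + *x + *z;
  }
  return y;
}

/* Hypothetical hand-hoisted equivalent (illustration only). */
int foo_hoisted(int N, int j, int *x, int *z)
{
  int y = N;
  N += 7;
  N >>= 3;
  /* N does not change inside the loop, so the step is loop invariant. */
  const int step = N*N << 3;
  int i;
  for (i = 0; i < j; i++)
  {
    x += step;          /* only the pointer increment stays in the loop */
    z = x + N;
    y = y + *x + *z;
  }
  return y;
}
```

Both versions should return the same value as long as N*N << 3 does not
overflow. Note that the hand-hoisted source evaluates the multiply even
when j <= 0.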

Regards,
Yiran



On Thu, Jun 27, 2013 at 12:26 AM, Jian-Xin Lai <[email protected]> wrote:

> From your description, if the code is changed a little:
>
>   for(i = 0; i< j; i++)
>   {
>     int t = N*N;
>     x += t << 3;
>     z = x + N;
>     y = y + *x + *z;
>   }
>
> Will the N*N be hoisted?
>
>
> 2013/6/27 Yiran Wang <[email protected]>
>
>> Hi All,
>>
>> This one looks somewhat similar to the last example, but is different.
>>
>> int foo(int N, int j, int *x, int *z)
>> {
>>   int y = N;
>>   N += 7;
>>   N >>= 3;
>>   int i;
>>   for(i = 0; i< j; i++)
>>   {
>>     x += N*N << 3;
>>     z = x + N;
>>     y = y + *x + *z;
>>   }
>>   return y;
>> }
>>
>> Assembly of the loop at -O3.
>>  .p2align 4,,15
>> .Lt_0_3586:
>>  #<loop> Loop body line 7, nesting depth: 1, estimated iterations: 1000
>>  .loc 1 9 0
>>  #   8    {
>>  #   9      x += N*N << 3;
>>  movl %eax,%ebx                 # [0]
>>  .loc 1 11 0
>>  #  10      z = x + N;
>>  #  11      y = y + *x + *z;
>>  addl $1,%ebp                   # [0]
>>  .loc 1 9 0
>>  imull %eax,%ebx                # [1]
>>  shll $3,%ebx                   # [4]
>>  shll $2,%ebx                   # [5]
>>  addl %ebx,%edi                 # [6]
>>  addl %ebx,%esi                 # [6]
>>  .loc 1 11 0
>>  movl 0(%edi),%ecx              # [7] id:23
>>  addl 0(%esi),%ecx              # [10]
>>  addl %ecx,%edx                 # [13]
>>  cmpl 36(%esp),%ebp             # [13] j
>>  jl .Lt_0_3586                  # [16]
>>
>> As we can see, the imull instruction remains in the loop, along with
>> two consecutive shll instructions. (My guess is that CG assumes WOPT
>> never produces such input, so CG does not optimize it, even though it
>> is simple.)
>>
>> It looks like SSA PRE omits the RHS of the Iv_update statement
>> x += N*N << 3, and VNFRE does only one level of CSE, that is, it
>> promotes the ASHR + LDC 3 out of the loop.
>>
>> I am curious why SSA PRE omits the expression here. With this omission
>> disabled in opt_etable.cxx, the result looks good for this test case.
>> Could disabling it cause a correctness issue for some other test case,
>> or a performance issue?
>>
>> Note that one strength reduction transformation is done for z in this
>> case. Also, replacing "N>>=3;" with "N*=5;" results in similarly
>> sub-optimal code.
>>
>>  Best Regards,
>> Yiran Wang
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> This SF.net email is sponsored by Windows:
>>
>> Build for Windows Store.
>>
>> http://p.sf.net/sfu/windows-dev2dev
>> _______________________________________________
>> Open64-devel mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/open64-devel
>>
>>
>
>
> --
> Regards,
> Lai Jian-Xin
>

