Using option "-WOPT:icopy=off:copy=off" to disable all copy-propagation?


2013/6/28 Yiran Wang <yiran.w...@gmail.com>

> Thanks for your comments.
>
> No, the assembly looks the same.
>
> Good or bad, the compiler is able to clean up the temporary completely,
> say, copy propagation and DSE.
>
> Regards,
> Yiran
>
>
>
> On Thu, Jun 27, 2013 at 12:26 AM, Jian-Xin Lai <laij...@gmail.com> wrote:
>
>> From your description, if the code is change a little:
>>
>>   for(i = 0; i< j; i++)
>>   {
>>     int t = N*N;
>>     x += t << 3;
>>     z = x + N;
>>     y = y + *x + *z;
>>   }
>>
>> Will the N*N be hoisted?
>>
>>
>> 2013/6/27 Yiran Wang <yiran.w...@gmail.com>
>>
>>> Hi All,
>>>
>>> This one looks somewhat similar to the last example, but is different.
>>>
>>> int foo(int N, int j, int *x, int *z)
>>> {
>>>   int y = N;
>>>   N += 7;
>>>   N >>= 3;
>>>   int i;
>>>   for(i = 0; i< j; i++)
>>>   {
>>>     x += N*N << 3;
>>>     z = x + N;
>>>     y = y + *x + *z;
>>>   }
>>>   return y;
>>> }
>>>
>>> Assembly of the loop at -O3.
>>> .p2align 4,,15
>>> .Lt_0_3586:
>>>  #<loop> Loop body line 7, nesting depth: 1, estimated iterations: 1000
>>>  .loc 1 9 0
>>>  #   8    {
>>>  #   9      x += N*N << 3;
>>> movl %eax,%ebx                 # [0]
>>> .loc 1 11 0
>>>  #  10      z = x + N;
>>>  #  11      y = y + *x + *z;
>>> addl $1,%ebp                   # [0]
>>> .loc 1 9 0
>>>  imull %eax,%ebx               # [1]
>>> shll $3,%ebx                   # [4]
>>>  shll $2,%ebx                   # [5]
>>> addl %ebx,%edi                 # [6]
>>>  addl %ebx,%esi                 # [6]
>>> .loc 1 11 0
>>>  movl 0(%edi),%ecx             # [7] id:23
>>> addl 0(%esi),%ecx             # [10]
>>>  addl %ecx,%edx                 # [13]
>>> cmpl 36(%esp),%ebp             # [13] j
>>>  jl .Lt_0_3586                 # [16]
>>>
>>> As we see, the imul instruction remains in the loop.
>>> (and two consequent shll instructions, my guess is that CG is thinking
>>> there should not be such input from WOPT, so it is not optimized in CG,
>>> though it is simple. )
>>>
>>> It looks like SSA PRE omitted the rhs of Iv_update statement x+= N*N<<3,
>>> and VNFRE is only doing one level of CSE, say, promoting the ASHR + LDC 3
>>> out of the loop.
>>>
>>> I am curious why SSA PRE is omitting the expression here.  By disabling
>>> this in opt_etable.cxx, the result looks good for this test case. I wonder
>>> if there is any correctness issue for some other test case, or performance
>>> issue?
>>>
>>> It should be noted one strength reduction transformation is done for z
>>> for this case. Also replacing "N>>=3;" with "N*=5;" results in similar
>>> sub-optimal code.
>>>
>>>  Best Regards,
>>> Yiran Wang
>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> This SF.net email is sponsored by Windows:
>>>
>>> Build for Windows Store.
>>>
>>> http://p.sf.net/sfu/windows-dev2dev
>>> _______________________________________________
>>> Open64-devel mailing list
>>> Open64-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/open64-devel
>>>
>>>
>>
>>
>> --
>> Regards,
>> Lai Jian-Xin
>>
>
>


-- 
Regards,
Lai Jian-Xin
------------------------------------------------------------------------------
This SF.net email is sponsored by Windows:

Build for Windows Store.

http://p.sf.net/sfu/windows-dev2dev
_______________________________________________
Open64-devel mailing list
Open64-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/open64-devel

Reply via email to