On Sat, Jul 9, 2011 at 12:40 AM, Ding-Kai Chen <[email protected]> wrote:

>
> Have you looked into binding to ZDL inside CG instead of at WHIRL level?
>
>
   Yes, that was our original approach. We ran into a lot of trouble deleting
the IV-related instructions, and there were other restrictions because CG
cannot provide much high-level information.


> The reasons are:
>
> 1. ZDL will be fairly architecture dependent and I am not sure if you want
> to include that in WHIRL. Potentially, you could generate ZDL instruction
> within CG or do it when lowering from low WHIRL to CGIR.
>
>
   Yes, ZDL is highly target dependent. But what all ZDL targets have in
common is that the counting mechanism is hidden and the bottom branch is
removed. Relying only on this general property, we generate ZDLBR for
trip-counted loops.
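
To make the "hidden counter, no bottom branch" idea concrete, here is a minimal C sketch based on the while loop from the proposal quoted below. It is only an illustration, not Open64 code: the function names `sum_while`, `trip_count` and `sum_counted` are made up, and the guard for `k3 < 10` is an assumption about how the trip count has to be computed when the loop may not be entered at all.

```c
#include <assert.h>

/* Original form: the IV k3 is tested and decremented on every iteration. */
int sum_while(int sum, int k1, int k3) {
    while (k3 >= 10) {
        sum += k1;
        k3--;
    }
    return sum;
}

/* Trip count of the loop above, computed once before the loop.
   It is k3 - 9 when the loop is entered and 0 otherwise. */
int trip_count(int k3) {
    return k3 >= 10 ? k3 - 9 : 0;
}

/* Counted form: the body no longer mentions k3. The counter n stands in
   for the hardware loop counter that a ZDL target would keep hidden. */
int sum_counted(int sum, int k1, int k3) {
    int trips = trip_count(k3);
    assert(trips >= 0);
    for (int n = trips; n > 0; n--) {
        sum += k1;
    }
    return sum;
}
```

Both functions return the same value for any `k3`; the only difference is that the second one needs the count up front, which is what makes a loop "trip-counted".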


> 2. Will ZDL be used for innermost loop only? Committing to a WHIRL loop
> that early creates trouble if that loop is removed/fully unrolled in CG.
>

   No, the SL arch is a nested ZDL arch. This is a WOPT/CG co-operation
approach: transforming a loop to ZDLBR does not by itself commit it to being
a ZDL. It is a *lazy* implementation; the real ZDL generation is done after
unrolling. We believe unrolling still works without much trouble. We will put
the code changes out next week.
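
As a rough illustration of why deciding after unrolling is harmless for a trip-counted loop, the C sketch below splits a known trip count into an unrolled part and a remainder. This is an assumption-laden sketch, not the Open64 implementation: `unroll_plan`, `plan_unroll` and `sum_unrolled` are hypothetical names, and the body is the `sum += k1` statement from the proposal.

```c
#include <assert.h>

/* Hypothetical helper type, not an Open64 structure. */
typedef struct {
    unsigned main_trips;  /* iterations of the unrolled body */
    unsigned remainder;   /* leftover iterations, run one at a time */
} unroll_plan;

/* Split a trip count for a given unroll factor (factor must be nonzero). */
unroll_plan plan_unroll(unsigned trips, unsigned factor) {
    unroll_plan p;
    assert(factor != 0);
    p.main_trips = trips / factor;
    p.remainder = trips % factor;
    return p;
}

/* Runs the body `trips` times in total. The inner loop over j stands in
   for `factor` replicated copies of the body. */
int sum_unrolled(int sum, int k1, unsigned trips, unsigned factor) {
    unroll_plan p = plan_unroll(trips, factor);
    for (unsigned i = 0; i < p.main_trips; i++) {
        for (unsigned j = 0; j < factor; j++) {
            sum += k1;
        }
    }
    for (unsigned i = 0; i < p.remainder; i++) {
        sum += k1;
    }
    return sum;
}
```

Under this reading, a loop counter would simply be initialized from the post-unroll count (`main_trips` here), and a loop that is fully unrolled leaves nothing to turn into a ZDL.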

Thanks.
Gang

> My $0.02.
>
> Ding-Kai
>
>
> Gang Yu wrote:
>
>> Hi,
>>
>>   We are planning to merge our newly developed ZDL work to trunk. Before
>> that, according to Sun Chan's guide, we should provide a formal WHIRL change
>> proposal and get it approved by Fred Chow first. The proposal is below; it
>> has been reviewed and revised by Fred. We still look forward to hearing
>> from the community, and comments are highly appreciated. We'll submit the
>> code changes for review subsequently.
>>
>> A). A brief introduction to ZDL
>>     Zero-Delay-Loop (also called Zero-Overhead-Loop) is a commonly used DSP
>> feature which efficiently implements "do" loops. Before entering the
>> loop segment, registers specifying the number of times (L) the loop executes
>> and the beginning (START_PC) and end (END_PC) of the loop body are set up.
>> Then, without explicit branches embedded in the code, the loop body is
>> executed L times before proceeding to the next segment. Typically there is a
>> loop counter which is initialized to L and is decremented with each
>> execution of the loop. The "zero delay" comes from the fact that the branch
>> instruction and the instruction used to increment/decrement the counter are
>> eliminated, resulting in fewer dynamic instructions and reducing branch
>> misses to "zero".
>>
>> B). The proposed OPR_ZDLBR operator
>>
>>   To do ZDL, we can transform the loop:
>>
>>    while (k3 >= 10) {
>>      sum += k1;
>>      k3--;
>>    }
>>
>>   into the form:
>>
>>    zdl_loop(k3 - 9) {
>>      sum += k1;
>>    }
>>
>>   So, we introduce a new ZDLBR operator, which represents the loop as:
>>
>>   LABEL L2050 0 {line: 0}
>>  LOOP_INFO 0 1 1
>>   I4I4LDID 73 <1,2,.preg_I4> T<4,.predef_I4,4> # k3
>>   I4I4LDID 77 <1,2,.preg_I4> T<4,.predef_I4,4> # <preg>
>>  END_LOOP_INFO
>>   I4I4LDID 74 <1,2,.preg_I4> T<4,.predef_I4,4> # k1
>>   I4I4LDID 75 <1,2,.preg_I4> T<4,.predef_I4,4> # sum
>>  I4ADD
>>  I4STID 75 <1,2,.preg_I4> T<4,.predef_I4,4> # sum {line: 5}
>>  ZDLBR L2050 {line: 0}
>>
>>  Deleting the IV k3 is now done here to relieve the burden on CG. With the
>> right CG handling, we believe the code is still compiled correctly.
>>
>>  We propose a specification for the OPR_ZDLBR:
>>
>> ****************************************************************************************
>> A non-structured branch operation similar to OPR_FALSEBR/OPR_TRUEBR, which
>> specifies a label to branch to conditionally, but the label must occur
>> before this instruction and must contain the LOOP_INFO structure.
>> ZDLBR's branch condition is implicitly specified by the LOOP_INFO
>> associated with the label.
>> Thus, the counting mechanism and branch condition are hidden and cannot
>> be optimized.
>> This operator can be used to indicate a zero-delay loop for targets that
>> provide such looping instructions.
>> ****************************************************************************************
>>
>> The lowering requirement for OPR_ZDLBR is suggested as follows:
>>
>> ****************************************************************************************
>> OPR_ZDLBR occurs at the M level WHIRL as a control flow operator;
>> it is either produced by the global scalar optimizer WOPT or by lowering
>> a specific form of the DO_LOOP operator.
>> There is no special handling when OPR_ZDLBR is lowered to L and VL level
>> WHIRL.
>> ****************************************************************************************
>>
>> C). Expressibility of ZDLBR
>> With ZDLBR added, and special handling in CG, we believe Open64 has
>> expressive power equal to GCC's doloop_begin, doloop_end and
>> decrement_and_branch_until_zero patterns.
>>
>>
>> Thanks
>> Gang
>>
>
>
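
For readers new to the mechanism described in section A of the quoted proposal, here is a toy C model of a zero-delay loop. It is a sketch under stated assumptions, not a model of the SL hardware: the two-opcode instruction set, the `zdl_regs` layout and the `run_zdl` function are invented for illustration, and the model assumes the loop count L is at least 1.

```c
#include <assert.h>
#include <stddef.h>

/* Toy instruction set, invented for this illustration only. */
typedef enum { OP_ADD_K1, OP_NOP } opcode;

/* Loop registers named after the description in section A. */
typedef struct {
    size_t start_pc;  /* first instruction of the loop body */
    size_t end_pc;    /* last instruction of the loop body */
    unsigned count;   /* remaining executions, initialized to L */
} zdl_regs;

/* Executes prog and returns the final sum. *executed receives the number
   of dynamic instructions, none of which is a branch or a decrement. */
int run_zdl(const opcode *prog, size_t len, zdl_regs zdl, int sum, int k1,
            unsigned *executed) {
    size_t pc = 0;
    assert(zdl.count >= 1 && zdl.start_pc <= zdl.end_pc && zdl.end_pc < len);
    *executed = 0;
    while (pc < len) {
        if (prog[pc] == OP_ADD_K1) {
            sum += k1;
        }
        (*executed)++;
        /* The loop-back test is done by the "hardware" on the PC value,
           so no branch instruction appears in the program itself. */
        if (pc == zdl.end_pc && zdl.count > 1) {
            zdl.count--;
            pc = zdl.start_pc;
        } else {
            pc++;
        }
    }
    return sum;
}
```

With a three-instruction program whose middle instruction is the loop body and L = 3, the model executes five instructions in total: the body three times plus the two surrounding instructions.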
_______________________________________________
Open64-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/open64-devel
