Hi,
We are planning to merge our newly developed ZDL work to trunk. Before
that, according to Sun Chan's guide, we should provide a formal WHIRL change
proposal and get it approved by Fred Chow first. The proposal is below; it
has been reviewed and revised by Fred. We still look forward to hearing
from the community, and comments are highly appreciated. We'll submit the code
changes for review subsequently.
A). A brief introduction to ZDL
Zero-Delay-Loop (also called Zero-Overhead-Loop) is a commonly used DSP
feature that efficiently implements "do" loops. Before entering the
loop segment, registers specifying the number of iterations (L) and the
beginning (START_PC) and end (END_PC) of the loop body are set up. Then,
without explicit branches embedded in the code, the loop body is executed
L times before proceeding to the next segment. Typically there is a loop
counter which is initialized to L and is decremented with each execution of
the loop. The ‘zero delay’ comes from the fact that the branch instruction
and the instruction used to increment/decrement the counter are
eliminated, resulting in fewer dynamic instructions and reducing branch
misses to "zero".
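To make the saving concrete, here is a rough C sketch of a toy cost model. It is not a description of any particular DSP: the one-slot-per-instruction cost, the single setup instruction, and the names zdl_regs, zdl_dynamic_insns and branch_loop_dynamic_insns are all assumptions made for illustration.

```c
/* Toy model (assumption): every issued instruction costs one dynamic slot. */
typedef struct {
    unsigned count;     /* remaining iterations, initialised to L */
    unsigned start_pc;  /* first instruction of the loop body */
    unsigned end_pc;    /* last instruction of the loop body */
} zdl_regs;

/* Count the dynamic instructions of a hardware-controlled loop whose body
 * occupies [start_pc, end_pc]. Assumes start_pc <= end_pc and one setup
 * instruction that loads the loop registers. */
unsigned long zdl_dynamic_insns(unsigned L, unsigned start_pc, unsigned end_pc) {
    zdl_regs r = { L, start_pc, end_pc };
    unsigned long executed = 1;      /* the assumed setup instruction */
    unsigned pc = r.start_pc;
    while (r.count > 0) {
        executed++;                  /* one body instruction is issued */
        if (pc == r.end_pc) {        /* hardware wraps and decrements; */
            r.count--;               /* no extra instruction is issued */
            pc = r.start_pc;
        } else {
            pc++;
        }
    }
    return executed;
}

/* Same loop with an explicit counter: one counter initialisation, then per
 * iteration the body plus a decrement and a conditional branch. */
unsigned long branch_loop_dynamic_insns(unsigned L, unsigned body_len) {
    return 1 + (unsigned long)L * (body_len + 2);
}
```

Under these assumptions, a 3-instruction body with L = 4 costs 13 dynamic instructions in the ZDL form against 21 with an explicit decrement and branch.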
B). The proposed OPR_ZDLBR operator
To do ZDL, we can transform the loop:
while (k3 >= 10) {
    sum += k1;
    k3--;
}
into the form:
zdl_loop(k3-9) {
    sum += k1;
}
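As a sanity check on the trip count k3-9, the two forms can be compared in plain C. This is only a sketch: zdl_loop is modelled with an ordinary counted loop, clamping a non-positive count to zero is an assumption (the proposal does not say how such counts are handled), and the function names are hypothetical.

```c
#include <assert.h>

/* Original form: compare, decrement, and branch on every iteration. */
int sum_while(int k3, int k1, int sum) {
    while (k3 >= 10) {
        sum += k1;
        k3--;
    }
    return sum;
}

/* ZDL-style form: the trip count is computed once, before the loop.
 * Clamping a non-positive count to zero is an assumption of this sketch. */
int sum_counted(int k3, int k1, int sum) {
    int trips = (k3 >= 10) ? k3 - 9 : 0;
    for (int i = 0; i < trips; i++) {
        sum += k1;
    }
    return sum;
}

/* Hypothetical helper: both forms must produce the same sum. */
void check_equivalence(int k3, int k1) {
    assert(sum_while(k3, k1, 0) == sum_counted(k3, k1, 0));
}
```

For k3 = 12 the original loop runs for k3 = 12, 11, 10, which is 3 = 12 - 9 iterations. Only sum is compared, since the counted form no longer updates k3.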
So, we introduce a new ZDLBR operator, which represents the loop as:
LABEL L2050 0 {line: 0}
LOOP_INFO 0 1 1
I4I4LDID 73 <1,2,.preg_I4> T<4,.predef_I4,4> # k3
I4I4LDID 77 <1,2,.preg_I4> T<4,.predef_I4,4> # <preg>
END_LOOP_INFO
I4I4LDID 74 <1,2,.preg_I4> T<4,.predef_I4,4> # k1
I4I4LDID 75 <1,2,.preg_I4> T<4,.predef_I4,4> # sum
I4ADD
I4STID 75 <1,2,.preg_I4> T<4,.predef_I4,4> # sum {line: 5}
ZDLBR L2050 {line: 0}
The IV k3 is now deleted to relieve the burden on CG. With the right CG
handling, we believe correct compilation can still be preserved.
We propose a specification for the OPR_ZDLBR:
************************************************************************************
A non-structured branch operation similar to OPR_FALSEBR/OPR_TRUEBR, which
specifies a label to branch to conditionally, but the label must occur
before this instruction and must contain the LOOP_INFO structure.
ZDLBR's branch condition is implicitly specified by the LOOP_INFO
associated with the label.
Thus, the counting mechanism and branch condition are hidden and cannot be
optimized.
This operator can be used to indicate a zero-delay loop for targets that
provide such looping instructions.
************************************************************************************
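The two structural constraints in this specification (the label precedes the ZDLBR and carries LOOP_INFO) could be checked along the following lines. This is a sketch over a flat statement list with hypothetical types; it does not use Open64's actual WN data structures or API.

```c
#include <stddef.h>

/* Hypothetical, simplified statement kinds; NOT Open64's WN types. */
typedef enum { STMT_LABEL, STMT_ZDLBR, STMT_OTHER } stmt_kind;

typedef struct {
    stmt_kind kind;
    int label;          /* label number for STMT_LABEL and STMT_ZDLBR */
    int has_loop_info;  /* nonzero if a STMT_LABEL carries LOOP_INFO */
} stmt;

/* Return 1 if every ZDLBR targets a label that appears earlier in the
 * list and carries LOOP_INFO; return 0 otherwise. */
int zdlbr_well_formed(const stmt *s, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (s[i].kind != STMT_ZDLBR)
            continue;
        int ok = 0;
        for (size_t j = 0; j < i; j++) {
            if (s[j].kind == STMT_LABEL && s[j].label == s[i].label &&
                s[j].has_loop_info) {
                ok = 1;
                break;
            }
        }
        if (!ok)
            return 0;
    }
    return 1;
}
```

A list of the shape LABEL L2050 (with LOOP_INFO), loop body, ZDLBR L2050 passes this check; a ZDLBR that precedes its label, or whose label lacks LOOP_INFO, is rejected.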
The lowering requirement for OPR_ZDLBR is suggested as:
************************************************************************************
OPR_ZDLBR occurs in M-level WHIRL as a control flow operator;
it is produced either by the global scalar optimizer WOPT or by lowering
a specific form of the DO_LOOP operator.
There is no special handling when OPR_ZDLBR is lowered to L and VL level
WHIRL.
************************************************************************************
C). Expressibility of ZDLBR
With ZDLBR added, and special handling in CG, we believe Open64 has
expressive power equal to gcc's doloop_begin, doloop_end and
decrement_and_branch_until_zero patterns.
Thanks
Gang
_______________________________________________
Open64-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/open64-devel