On Fri, Sep 30, 2011 at 5:22 PM, Roman Zhuykov zhr...@ispras.ru wrote:
2011/7/21 zhr...@ispras.ru:
This patch should be applied only after pending patches by Revital.
Ping. New version is attached, it suits current trunk without
additional patches.
Thanks for the ping.
Also this related patch needs approval:
http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01804.html
The loop should meet the following requirements.
First three are the same as for loop with doloop pattern:
...
The next three describe the control part of new supported loops.
- the last jump instruction should look like: pc=(regF!=0)?label:pc, regF is
you'd probably want to bump to next instruction if falling through,
e.g., pc=(regF!=0)?label:pc+4
flag register;
- the last instruction which sets regF should be: regF=COMPARE(regC,X),
where X
is a constant, or maybe a register, which is not changed inside a loop;
- only one instruction modifies regC inside a loop (other can use regC, but
not
write), and it should simply adjust it by a constant: regC=regC+step, where
step is a constant.
When doloop is succesfully scheduled by SMS, its number of
iterations of loop kernel should be decreased by the number of stages in a
schedule minus one, while other iterations expand to prologue and epilogue.
In new supported loops such approach can't be used, because some
instructions can use count register (regC). Instead of this,
the final register value X in compare instruction regF=COMPARE(regC,X)
is changed to another value Y respective to the stage this instruction
is scheduled (Y = X - stage * step).
making sure this does not underflow; i.e., that the number of
iterations is no less than stage (you've addressed this towards the
end below).
The main difference from doloop case is that regC can be used by some
instructions in loop body.
That's why we are unable to simply adjust regC initial value, but have
to keep it's value correct on each particular iteration.
So, we change comparison instruction accordingly.
An example:
int a[100];
int main()
{
int i;
for (i = 85; i 12; i -= 5)
a[i] = i * i;
return a[15]-225;
}
ARM assembler with -O2 -fno-auto-inc-dec:
ldr r0, .L5
mov r3, #85
mov r2, r0
.L2:
mul r1, r3, r3
sub r3, r3, #5
cmp r3, #10
str r1, [r2, #340]
sub r2, r2, #20
bne .L2
ldr r0, [r0, #60]
sub r0, r0, #225
bx lr
.L5:
.word a
Loop body is executed 15 times.
When compiling with SMS, it finds a schedule with ii=7, stage_count=3
and following times:
Stage Time Insn
0 5 mul r1, r3, r3
1 10 sub r3, r3, #5
1 11 cmp r3, #10
1 11 str r1, [r2, #340]
1 13 bne .L2
2 16 sub r2, r2, #20
branch is not scheduled last?
To make new schedule correct the loop body
should be executed 14 times and we change compare instruction:
the loop itself should execute 13 times.
regF=COMPARE(regC,X) to regF=COMPARE(regC,Y) where Y = X - stage * step.
In our example regC is r3, X is 10, step = -5, compare instruction
is scheduled on stage 1, so it should be Y = 10 - 1 * (-5) = 15.
right. In general, if the compare is on stage s (starting from 0), it
will be executed s times in the epilog, so it should exit the loop
upon reaching Y = X - s * step.
So, after SMS it looks like:
ldr r0, .L5
mov r3, #85
mov r2, r0
;;prologue
mul r1, r3, r3 ;;from stage 0 first iteration
sub r3, r3, #5 ;;3 insns from stage 1 first iteration
cmp r3, #10
str r1, [r2, #340]
mul r1, r3, r3 ;;from stage 0 second iteration
;;body
.L2:
sub r3, r3, #5
sub r2, r2, #20
cmp r3, #15 ;; new value to compare with is Y=15
str r1, [r2, #340]
mul r1, r3, r3
bne .L2
;;epilogue
sub r2, r2, #20 ;;from stage 2 pre-last iteration
sub r3, r3, #5 ;;3 insns from stage 1 last iteration
cmp r3, #10
str r1, [r2, #340]
sub r2, r2, #20 ;;from stage 2 last iteration
ldr r0, [r0, #60]
sub r0, r0, #225
bx lr
.L5:
.word a
Real ARM assembler with SMS (after some optimizations and without dead code):
mov r3, #85
ldr r0, .L8
mul r1, r3, r3
sub r3, r3, #5
mov r2, r0
str r1, [r0, #340]
mul r1, r3, r3
.L2:
sub r3, r3, #5
sub r2, r2, #20
cmp r3, #15
str r1, [r2, #340]
mul r1, r3, r3
bne .L2
str r1, [r2, #320]
ldr r0, [r0, #60]
sub r0, r0, #225
bx lr
.L8:
.word a