Re: [PATCH 5/9] [SMS] Support new loop pattern

2011-10-11 Thread Ayal Zaks
On Fri, Sep 30, 2011 at 5:22 PM, Roman Zhuykov zhr...@ispras.ru wrote:
 2011/7/21  zhr...@ispras.ru:
 This patch should be applied only after pending patches by Revital.


 Ping. New version is attached, it suits current trunk without
 additional patches.

Thanks for the ping.

 Also this related patch needs approval:
 http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01804.html

 The loop should meet the following requirements.
 First three are the same as for loop with doloop pattern:
 ...
 The next three describe the control part of new supported loops.
 - the last jump instruction should look like:  pc=(regF!=0)?label:pc, regF is

you'd probably want to bump to next instruction if falling through,
e.g., pc=(regF!=0)?label:pc+4

  flag register;
 - the last instruction which sets regF should be: regF=COMPARE(regC,X), 
 where X
  is a constant, or maybe a register, which is not changed inside a loop;
 - only one instruction modifies regC inside a loop (other can use regC, but 
 not
  write), and it should simply adjust it by a constant: regC=regC+step, where
  step is a constant.

 When doloop is succesfully scheduled by SMS, its number of
 iterations of loop kernel should be decreased by the number of stages in a
 schedule minus one, while other iterations expand to prologue and epilogue.
 In new supported loops such approach can't be used, because some
 instructions can use count register (regC).  Instead of this,
 the final register value X in compare instruction regF=COMPARE(regC,X)
 is changed to another value Y respective to the stage this instruction
 is scheduled (Y = X - stage * step).

making sure this does not underflow; i.e., that the number of
iterations is no less than stage (you've addressed this towards the
end below).


 The main difference from doloop case is that regC can be used by some
 instructions in loop body.
 That's why we are unable to simply adjust regC initial value, but have
 to keep it's value correct on each particular iteration.
 So, we change comparison instruction accordingly.

 An example:
 int a[100];
 int main()
 {
  int i;
  for (i = 85; i  12; i -= 5)
      a[i] = i * i;
  return a[15]-225;
 }
 ARM assembler with -O2 -fno-auto-inc-dec:
        ldr     r0, .L5
        mov     r3, #85
        mov     r2, r0
 .L2:
        mul     r1, r3, r3
        sub     r3, r3, #5
        cmp     r3, #10
        str     r1, [r2, #340]
        sub     r2, r2, #20
        bne     .L2
        ldr     r0, [r0, #60]
        sub     r0, r0, #225
        bx      lr
 .L5:
        .word   a

 Loop body is executed 15 times.
 When compiling with SMS, it finds a schedule with ii=7, stage_count=3
 and following times:
 Stage  Time       Insn
 0          5      mul     r1, r3, r3
 1         10     sub     r3, r3, #5
 1         11     cmp     r3, #10
 1         11     str     r1, [r2, #340]
 1         13     bne     .L2
 2         16     sub     r2, r2, #20


branch is not scheduled last?

 To make new schedule correct the loop body
 should be executed 14 times and we change compare instruction:

the loop itself should execute 13 times.

 regF=COMPARE(regC,X) to regF=COMPARE(regC,Y) where Y = X - stage * step.
 In our example regC is r3, X is 10, step = -5, compare instruction
 is scheduled on stage 1, so it should be Y = 10 - 1 * (-5) = 15.


right. In general, if the compare is on stage s (starting from 0), it
will be executed s times in the epilog, so it should exit the loop
upon reaching Y = X - s * step.

 So, after SMS it looks like:
        ldr     r0, .L5
        mov     r3, #85
        mov     r2, r0
 ;;prologue
        mul     r1, r3, r3      ;;from stage 0 first iteration
        sub     r3, r3, #5      ;;3 insns from stage 1 first iteration
        cmp     r3, #10
        str     r1, [r2, #340]
        mul     r1, r3, r3      ;;from stage 0 second iteration
 ;;body
 .L2:
        sub     r3, r3, #5
        sub     r2, r2, #20
        cmp     r3, #15         ;; new value to compare with is Y=15
        str     r1, [r2, #340]
        mul     r1, r3, r3
        bne     .L2
 ;;epilogue
        sub     r2, r2, #20     ;;from stage 2 pre-last iteration
        sub     r3, r3, #5      ;;3 insns from stage 1 last iteration
        cmp     r3, #10
        str     r1, [r2, #340]
        sub     r2, r2, #20     ;;from stage 2 last iteration

        ldr     r0, [r0, #60]
        sub     r0, r0, #225
        bx      lr
 .L5:
        .word   a

 Real ARM assembler with SMS (after some optimizations and without dead code):
        mov     r3, #85
        ldr     r0, .L8
        mul     r1, r3, r3
        sub     r3, r3, #5
        mov     r2, r0
        str     r1, [r0, #340]
        mul     r1, r3, r3
 .L2:
        sub     r3, r3, #5
        sub     r2, r2, #20
        cmp     r3, #15
        str     r1, [r2, #340]
        mul     r1, r3, r3
        bne     .L2
        str     r1, [r2, #320]
        ldr     r0, [r0, #60]
        sub     r0, r0, #225
        bx      lr
 .L8:
        .word   a


 

Re: [PATCH 5/9] [SMS] Support new loop pattern

2011-07-27 Thread Roman Zhuykov
2011/7/26 Richard Sandiford richard.sandif...@linaro.org:
 Note that on ARM, the comparison and loop counter addition can happen
 as a single parallel:

Certainly, I notice such subs ARM instructions.  IMHO, this pattern seems to
appear rarely in real loops.  For loops without doloop_end pattern we have to
make the following instruction transformation as I have noticed already:

The final register value X in compare instruction regF=COMPARE(regC,X) is
changed to another value Y respective to the stage this instruction is
scheduled: (Y = X - stage * step)

In subs instruction we are unable to do this, because we can't change the
number to compare with.  It seems there are three following ways of
solving this.

The first way is to check that counter register is not used by non-control-flow
instructions before running SMS on such loops.  The same condition is
checked in doloop_condition_get.

The second way is to allow SMS to process loop with subs instruction, but when
the schedule is already computed, then allow to apply it only if X == Y
(otherwise new schedule lead to miscompilation).

The third way is to create a pair of sub and cmp instructions instead of subs
when needed.

 I think we'd need to handle that too before getting rid of the
 ARM doloop_end pattern.

I think all three ways are complicated enough and decide to begin with
implementing SMS without such loops support.

--
Roman Zhuykov
zhr...@ispras.ru


Re: [PATCH 5/9] [SMS] Support new loop pattern

2011-07-26 Thread Richard Sandiford
zhr...@ispras.ru writes:
 The next three describe the control part of new supported loops.
 - the last jump instruction should look like:  pc=(regF!=0)?label:pc, regF is
   flag register;
 - the last instruction which sets regF should be: regF=COMPARE(regC,X), where 
 X
   is a constant, or maybe a register, which is not changed inside a loop;
 - only one instruction modifies regC inside a loop (other can use regC, but 
 not
   write), and it should simply adjust it by a constant: regC=regC+step, where
   step is a constant.

Note that on ARM, the comparison and loop counter addition can happen
as a single parallel:

(insn 29 27 30 3 (parallel [
(set (reg:CC_NOOV 24 cc)
(compare:CC_NOOV (plus:SI (reg:SI 142 [ ivtmp.9 ])
(const_int -1 [0x]))
(const_int 0 [0])))
(set (reg:SI 142 [ ivtmp.9 ])
(plus:SI (reg:SI 142 [ ivtmp.9 ])
(const_int -1 [0x])))
]) /tmp/bar5.c:9 6 {addsi3_compare0}
 (nil))

I think we'd need to handle that too before getting rid of the
ARM doloop_end pattern.

Richard


Re: [PATCH 5/9] [SMS] Support new loop pattern

2011-07-24 Thread Revital1 Eres
Hello Roman,

 This patch should be applied only after pending patches by Revital. This
patch
 significantly enhances the existing implementation of the SMS.  Patch
adds
 support of scheduling loops without doloop pattern.  The loop should meet
the
 following requirements.

Thanks for the patch!
I will try to go through it in more details soon.
I'm testing now whether the recent bootstrap
failure on ARM machine (PR49789) is revolved with your patch.
I also plan to see the effect of it on PowerPC and SPU which currently
support doloop pattern.

Thanks,
Revital