Re: IRA update_equiv_regs for (was Re: ICE for interim fix for PR/110748)

2023-08-14 Thread Jeff Law via Gcc-patches




On 8/14/23 18:35, Vineet Gupta wrote:


On 8/11/23 17:04, Jeff Law wrote:


I'm wondering (naively) if there is some way to tune this - for a 
given backend. In general it would make sense to do the replacement, 
but not if the cost changes (e.g. consts could be embedded in an x86 
insn freely, but not for RISC-V where this is costly, and if something 
is split, it might have been intentional).

I'm not immediately aware of a way to tune.

When it comes to tuning, the toplevel question is whether we have any of 
the info we need to tune at the point where the transformation occurs. 
The two most obvious pieces here would be loop info and register pressure.


ie, do we have enough loop structure to know if the def is at a 
shallower loop nest than the use.  There's a reasonable chance we have 
this information as my recollection is this analysis is done fairly 
early in IRA.


But that means we likely don't have any sense of register pressure at 
the points between the def and use.   So the most useful metric for 
tuning isn't really available.


I'd argue that even if the register pressure were high, in some cases 
there's just no way around it and RA needs to honor what the backend did 
a priori (split in this case), otherwise we end up with something which 
literally doesn't compute and leads to an ICE. I'm puzzled that in this 
case an intentional implementation is getting in the way. So while I don't 
care about the -0.0 case in itself, it seems that with the current framework 
we can't achieve the desired result, other than via the roundabout 
peephole2 approach you alluded to.
I think you'll run into a lot of resistance with that approach.   The 
fact is we're being a bit sneaky and telling a bit of a fib in the 
backend (claiming support for certain capabilities that don't actually 
exist).


As many have said, lie to GCC and ultimately it will get its revenge.  This 
is but one example.


When we lie to some parts of gcc, we may well trigger undesirable 
behavior later in the pipeline.  It's a tradeoff and sometimes we have 
to back out those little lies.


The one thing that stands out is we don't do this transformation at 
all when register pressure sensitive scheduling is enabled. And we 
really should be turning that on by default.  Our data shows register 
pressure sensitive scheduling is about a 6-7% cycle improvement on 
x264 as it avoids spilling in those key satd loops.



 /* Don't move insns if live range shrinkage or register
 pressure-sensitive scheduling were done because it will not
 improve allocation but likely worsen insn scheduling.  */
  if (optimize
  && !flag_live_range_shrinkage
  && !(flag_sched_pressure && flag_schedule_insns))
    combine_and_move_insns ();



So you might want to look at register pressure sensitive scheduling 
first.  Go into x264_r from specint and look at x264_pixel_satd_8x4. 
First verify the loops are fully unrolled. If they are, then look for 
32-bit loads/stores into the stack.  If you have them, then you're 
spilling and getting crappy performance.  Using register pressure 
sensitive scheduling should help significantly.


Is that -fira-loop-pressure ?

-fsched-pressure I think.




We've certainly seen that internally.  The plan was to submit a patch 
to make register pressure sensitive scheduling the default when the 
scheduler is enabled.  We just haven't pushed on it.  If you can 
verify that you're seeing spilling as well, then it'd certainly 
bolster the argument that register-pressure-sensitive-scheduling is 
desirable.


I can confirm that the loop is fully unrolled and there's a zillion 
stack spills there for intermediate computes (-Ofast 
-march=rv64gc_zba_zbb_zbs, no V in that build).
Yea, you'll take a big hit from those spills.  Good to get a 
confirmation that you're seeing it too.


The fix should be pretty simple.  We just turn on -fsched-pressure in 
the RV backend.
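For reference, a minimal sketch of what that could look like, assuming the 
change goes through the RISC-V port's default-options table (the struct 
default_options machinery in the common backend file); this is illustrative 
only, not the actual patch, and existing entries are omitted:

  /* Sketch: enable -fsched-pressure by default for RISC-V unless the user
     sets it explicitly; OPT_LEVELS_1_PLUS applies it at -O1 and above.  */
  static const struct default_options riscv_option_optimization_table[] =
    {
      /* ... existing entries elided ... */
      { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
      { OPT_LEVELS_NONE, 0, NULL, 0 }
    };

Given the ira.c guard quoted above, having both -fsched-pressure and 
-fschedule-insns in effect also skips the single-def/single-use 
combine_and_move_insns () transformation.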



Jeff


Re: IRA update_equiv_regs for (was Re: ICE for interim fix for PR/110748)

2023-08-14 Thread Jeff Law via Gcc-patches




On 8/12/23 10:44, Jivan Hakobyan wrote:

Yes, as Jeff mentioned, I have some work in that scope.

The first is related to address computation when it has a large constant 
part.

Suppose we have this code:

     int  consume (void *);
     int foo (void) {
        int x[1000];
        return consume (x);
     }

before IRA we have the following sequence
     19: r140:DI=0xf000
     20: r136:DI=r140:DI+0x60
       REG_EQUAL 0xf060
     8: a0:DI=frame:DI+r136:DI
       REG_DEAD r136:DI

but during IRA (eliminate_regs_in_insn) insn 8 transforms to
    8: a0:DI=r136:DI+0xfa0+frame:DI
         REG_DEAD r136:DI

and in the end, we get the wrong sequence.
    21: r136:DI=0xf060
       REG_EQUIV 0xf060
    25: r143:DI=0x1000
    26: r142:DI=r143:DI-0x60
       REG_DEAD r143:DI
       REG_EQUAL 0xfa0
    27: r142:DI=r142:DI+r136:DI
       REG_DEAD r136:DI
    8: a0:DI=r142:DI+frame:DI
       REG_DEAD r142:DI

My changes prevent that transformation.
I have tested on spec and did not get regressions.
Besides, it executed 40B fewer instructions.
Right.  And this looks like a generic failing of the register 
elimination code to simplify after eliminating fp/ap to sp.  It's a bit 
of a surprise as I thought that code had some simplification 
capabilities.   But clearly if it has that ability it isn't working 
well.  Part of me wondered if it's falling down due to constants not 
fitting in a 12-bit signed immediate.   I've got a TODO to look at your 
patch in this space.  Maybe tonight if I can keep moving things off my 
TODO list ;-)
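(Purely as an aside, an illustration of the 12-bit constraint being referred 
to; my sketch, not code from the patch:

  /* RISC-V I-type immediates (e.g. the offset of an addi) are signed
     12-bit values, so only offsets in [-2048, 2047] can be folded into a
     single instruction.  The 0xfa0 (4000) offset in insn 8 above is
     outside that range.  */
  static inline bool fits_simm12 (long long val)
  {
    return val >= -2048 && val <= 2047;
  }

which would presumably explain why the eliminated form cannot simply be 
folded back into a single addi.)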




The second work is related to hoisting out loop-invariant code.
I have a test case where SP + const can be hoisted out.
..
.L3:
       call foo
       addi a5,sp,16
       sh3add a0,a0,a5
...

Before IRA that code is already out of the loop, but IRA moves it back.
My approach was done in update_equiv_regs().
It prevents the move if the register's def and use are held in a single 
place and it is used in the loop.

Currently, that improvement is under evaluation.
Yea, we're going to need to sit down with this.  IRA is working per 
design and we may be able to avoid these problems with -fsched-pressure, 
but it feels a bit hackish.



Jeff


Re: IRA update_equiv_regs for (was Re: ICE for interim fix for PR/110748)

2023-08-14 Thread Vineet Gupta



On 8/11/23 17:04, Jeff Law wrote:


I'm wondering (naively) if there is some way to tune this - for a 
given backend. In general it would make sense to do the replacement, 
but not if the cost changes (e.g. consts could be embedded in an x86 
insn freely, but not for RISC-V where this is costly, and if something 
is split, it might have been intentional).

I'm not immediately aware of a way to tune.

When it comes to tuning, the toplevel question is whether we have any of 
the info we need to tune at the point where the transformation occurs. 
The two most obvious pieces here would be loop info and register pressure.


ie, do we have enough loop structure to know if the def is at a 
shallower loop nest than the use.  There's a reasonable chance we have 
this information as my recollection is this analysis is done fairly 
early in IRA.


But that means we likely don't have any sense of register pressure at 
the points between the def and use.   So the most useful metric for 
tuning isn't really available.


I'd argue that even if the register pressure were high, in some cases 
there's just no way around it and RA needs to honor what the backend did 
a priori (split in this case), otherwise we end up with something which 
literally doesn't compute and leads to an ICE. I'm puzzled that in this 
case an intentional implementation is getting in the way. So while I don't 
care about the -0.0 case in itself, it seems that with the current framework 
we can't achieve the desired result, other than via the roundabout 
peephole2 approach you alluded to.




The one thing that stands out is we don't do this transformation at 
all when register pressure sensitive scheduling is enabled. And we 
really should be turning that on by default.  Our data shows register 
pressure sensitive scheduling is about a 6-7% cycle improvement on 
x264 as it avoids spilling in those key satd loops.



 /* Don't move insns if live range shrinkage or register
 pressure-sensitive scheduling were done because it will not
 improve allocation but likely worsen insn scheduling.  */
  if (optimize
  && !flag_live_range_shrinkage
  && !(flag_sched_pressure && flag_schedule_insns))
    combine_and_move_insns ();



So you might want to look at register pressure sensitive scheduling 
first.  Go into x264_r from specint and look at x264_pixel_satd_8x4. 
First verify the loops are fully unrolled. If they are, then look for 
32-bit loads/stores into the stack.  If you have them, then you're 
spilling and getting crappy performance.  Using register pressure 
sensitive scheduling should help significantly.


Is that -fira-loop-pressure ?


We've certainly seen that internally.  The plan was to submit a patch 
to make register pressure sensitive scheduling the default when the 
scheduler is enabled.  We just haven't pushed on it.  If you can 
verify that you're seeing spilling as well, then it'd certainly 
bolster the argument that register-pressure-sensitive-scheduling is 
desirable.


I can confirm that the loop is fully unrolled and there's a zillion 
stack spills there for intermediate computes (-Ofast 
-march=rv64gc_zba_zbb_zbs, no V in that build).


Thx,
-Vineet


Re: IRA update_equiv_regs for (was Re: ICE for interim fix for PR/110748)

2023-08-12 Thread Jivan Hakobyan via Gcc-patches
Yes, as Jeff mentioned, I have some work in that scope.

The first is related to address computation when it has a large constant
part.
Suppose we have this code:

int  consume (void *);
int foo (void) {
   int x[1000];
   return consume (x);
}

before IRA we have the following sequence
19: r140:DI=0xf000
20: r136:DI=r140:DI+0x60
  REG_EQUAL 0xf060
8: a0:DI=frame:DI+r136:DI
  REG_DEAD r136:DI

but during IRA (eliminate_regs_in_insn) insn 8 transforms to
   8: a0:DI=r136:DI+0xfa0+frame:DI
REG_DEAD r136:DI

and in the end, we get the wrong sequence.
   21: r136:DI=0xf060
  REG_EQUIV 0xf060
   25: r143:DI=0x1000
   26: r142:DI=r143:DI-0x60
  REG_DEAD r143:DI
  REG_EQUAL 0xfa0
   27: r142:DI=r142:DI+r136:DI
  REG_DEAD r136:DI
   8: a0:DI=r142:DI+frame:DI
  REG_DEAD r142:DI

My changes prevent that transformation.
I have tested on spec and did not get regressions.
Besides, it executed 40B fewer instructions.

The second work is related to hoisting out loop-invariant code.
I have a test case where SP + const can be hoisted out.
..
.L3:
  call foo
  addi a5,sp,16
  sh3add a0,a0,a5
...

Before IRA that code is already out of the loop, but IRA moves it back.
My approach was done in update_equiv_regs().
It prevents the move if the register's def and use are held in a single 
place and it is used in the loop.
Currently, that improvement is under evaluation.
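For reference, a hypothetical C shape that produces a loop of this form (my 
reconstruction, assuming the sh3add is indexing an 8-byte-element stack 
array; the real test case may differ):

  extern int foo (void);
  extern void consume2 (long *);

  void use_buf (int n)
  {
    long buf[16];                    /* lives at sp + small offset */
    for (int i = 0; i < n; i++)
      consume2 (&buf[foo () & 15]);  /* addi a5,sp,OFF is loop-invariant;
                                        sh3add forms &buf[index]        */
  }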


On Sat, Aug 12, 2023 at 4:05 AM Jeff Law via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

>
>
> On 8/11/23 17:32, Vineet Gupta wrote:
> >
> > On 8/1/23 12:17, Vineet Gupta wrote:
> >> Hi Jeff,
> >>
> >> As discussed this morning, I'm sending over dumps for the optim of DF
> >> const -0.0 (PR/110748)  [1]
> >> For rv64gc_zbs build, IRA is undoing the split which eventually leads
> >> to ICE in final pass.
> >>
> >> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110748#c15
> >>
> >> void znd(double *d) {  *d = -0.0;   }
> >>
> >>
> >> *split1*
> >>
> >> (insn 10 3 11 2 (set (reg:DI 136)
> >> (const_int [0x8000])) "neg.c":4:5 -1
> >>
> >> (insn 11 10 0 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
> >> (subreg:DF (reg:DI 136) 0)) "neg.c":4:5 -1
> >>
> >> *ira*
> >>
> >> (insn 11 9 12 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
> >> (const_double:DF -0.0 [-0x0.0p+0])) "neg.c":4:5 190
> >> {*movdf_hardfloat_rv64}
> >>  (expr_list:REG_DEAD (reg:DI 135)
> >>
> >>
> >> For the working case, the large const is not involved and not subject
> >> to IRA playing foul.
> >
> > I investigated this some more. So IRA update_equiv_regs () has code
> > identifying potential replacements: if a reg is referenced exactly
> > twice: set once and used once.
> >
> >if (REG_N_REFS (regno) == 2
> >&& (rtx_equal_p (replacement, src)
> >|| ! equiv_init_varies_p (src))
> >&& NONJUMP_INSN_P (insn)
> >&& equiv_init_movable_p (PATTERN (insn), regno))
> >  reg_equiv[regno].replace = 1;
> >  }
> >
> > And combine_and_move_insns () does the replacement, undoing the split1
> > above.
> Right.  This is as expected.  There was actually similar code that goes
> back even before the introduction of IRA -- like to the 80s and 90s.
>
> Conceptually the idea is a value with an equivalence that has a single
> set and single use isn't a good use of a hard register.  Better to
> narrow the live range to a single pair of instructions.
>
> It's not always a good tradeoff.  Consider if the equivalence was also a
> loop invariant and hoisted out of the loop and register pressure is low.
>
>
> >
> > In fact this is the reason for many more split1 being undone. See the
> > suboptimal codegen for large const for Andrew Pinski's test case [1]
> No doubt.  I think it's also a problem with some of Jivan's work.
>
>
> >
> > I'm wondering (naively) if there is some way to tune this - for a given
> > backend. In general it would make sense to do the replacement, but not
> > if the cost changes (e.g. consts could be embedded in an x86 insn freely,
> > but not for RISC-V where this is costly, and if something is split, it
> > might have been intentional).
> I'm not immediately aware of a way to tune.
>
> When it comes to tuning, the toplevel question is whether we have any of
> the info we need to tune at the point where the transformation occurs.
> The two most obvious pieces here would be loop info and register pressure.
>
> ie, do we have enough loop structure to know if the def is at a
> shallower loop nest than the use.  There's a reasonable chance we have
> this information as my recollection is this analysis is done fairly
> early in IRA.
>
> But that means we likely don't have any sense of register pressure at
> the points between the def and use.   So the most useful metric for
> tuning isn't really available.
>
> The one thing that stands out is we don't do this 

Re: IRA update_equiv_regs for (was Re: ICE for interim fix for PR/110748)

2023-08-11 Thread Jeff Law via Gcc-patches




On 8/11/23 17:32, Vineet Gupta wrote:


On 8/1/23 12:17, Vineet Gupta wrote:

Hi Jeff,

As discussed this morning, I'm sending over dumps for the optim of DF 
const -0.0 (PR/110748)  [1]
For rv64gc_zbs build, IRA is undoing the split which eventually leads 
to ICE in final pass.


[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110748#c15

void znd(double *d) {  *d = -0.0;   }


*split1*

(insn 10 3 11 2 (set (reg:DI 136)
    (const_int [0x8000])) "neg.c":4:5 -1

(insn 11 10 0 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
    (subreg:DF (reg:DI 136) 0)) "neg.c":4:5 -1

*ira*

(insn 11 9 12 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
    (const_double:DF -0.0 [-0x0.0p+0])) "neg.c":4:5 190 
{*movdf_hardfloat_rv64}

 (expr_list:REG_DEAD (reg:DI 135)


For the working case, the large const is not involved and not subject 
to IRA playing foul.


I investigated this some more. So IRA update_equiv_regs () has code 
identifying potential replacements: if a reg is referenced exactly 
twice: set once and used once.


       if (REG_N_REFS (regno) == 2
       && (rtx_equal_p (replacement, src)
           || ! equiv_init_varies_p (src))
       && NONJUMP_INSN_P (insn)
       && equiv_init_movable_p (PATTERN (insn), regno))
         reg_equiv[regno].replace = 1;
     }

And combine_and_move_insns () does the replacement, undoing the split1 
above.
Right.  This is as expected.  There was actually similar code that goes 
back even before the introduction of IRA -- like to the 80s and 90s.


Conceptually the idea is a value with an equivalence that has a single 
set and single use isn't a good use of a hard register.  Better to 
narrow the live range to a single pair of instructions.


It's not always a good tradeoff.  Consider if the equivalence was also a 
loop invariant and hoisted out of the loop and register pressure is low.
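A minimal, hypothetical example of that tradeoff (mine, not one of the test 
cases in this thread): the 64-bit constant below is loop-invariant with a 
single static set and a single static use, so the heuristic would sink its 
materialization back into the loop body:

  void fill (unsigned long long *a, int n)
  {
    for (int i = 0; i < n; i++)
      a[i] = 0x8000000000000000ULL;   /* constant load re-executed every
                                         iteration if sunk to its use   */
  }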





In fact this is the reason for many more split1 being undone. See the 
suboptimal codegen for large const for Andrew Pinski's test case [1]

No doubt.  I think it's also a problem with some of Jivan's work.




I'm wondering (naively) if there is some way to tune this - for a given 
backend. In general it would make sense to do the replacement, but not 
if the cost changes (e.g. consts could be embedded in an x86 insn freely, 
but not for RISC-V where this is costly, and if something is split, it 
might have been intentional).

I'm not immediately aware of a way to tune.

When it comes to tuning, the toplevel question is whether we have any of 
the info we need to tune at the point where the transformation occurs. 
The two most obvious pieces here would be loop info and register pressure.


ie, do we have enough loop structure to know if the def is at a 
shallower loop nest than the use.  There's a reasonable chance we have 
this information as my recollection is this analysis is done fairly 
early in IRA.


But that means we likely don't have any sense of register pressure at 
the points between the def and use.   So the most useful metric for 
tuning isn't really available.


The one thing that stands out is we don't do this transformation at all 
when register pressure sensitive scheduling is enabled.  And we really 
should be turning that on by default.  Our data shows register pressure 
sensitive scheduling is about a 6-7% cycle improvement on x264 as it 
avoids spilling in those key satd loops.



 /* Don't move insns if live range shrinkage or register
 pressure-sensitive scheduling were done because it will not
 improve allocation but likely worsen insn scheduling.  */
  if (optimize
  && !flag_live_range_shrinkage
  && !(flag_sched_pressure && flag_schedule_insns))
combine_and_move_insns ();



So you might want to look at register pressure sensitive scheduling 
first.  Go into x264_r from specint and look at x264_pixel_satd_8x4. 
First verify the loops are fully unrolled.  If they are, then look for 
32-bit loads/stores into the stack.  If you have them, then you're 
spilling and getting crappy performance.  Using register pressure 
sensitive scheduling should help significantly.


We've certainly seen that internally.  The plan was to submit a patch to 
make register pressure sensitive scheduling the default when the 
scheduler is enabled.  We just haven't pushed on it.  If you can verify 
that you're seeing spilling as well, then it'd certainly bolster the 
argument that register-pressure-sensitive-scheduling is desirable.


Jeff



IRA update_equiv_regs for (was Re: ICE for interim fix for PR/110748)

2023-08-11 Thread Vineet Gupta



On 8/1/23 12:17, Vineet Gupta wrote:

Hi Jeff,

As discussed this morning, I'm sending over dumps for the optim of DF 
const -0.0 (PR/110748)  [1]
For rv64gc_zbs build, IRA is undoing the split which eventually leads 
to ICE in final pass.


[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110748#c15

void znd(double *d) {  *d = -0.0;   }


*split1*

(insn 10 3 11 2 (set (reg:DI 136)
    (const_int [0x8000])) "neg.c":4:5 -1

(insn 11 10 0 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
    (subreg:DF (reg:DI 136) 0)) "neg.c":4:5 -1

*ira*

(insn 11 9 12 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
    (const_double:DF -0.0 [-0x0.0p+0])) "neg.c":4:5 190 
{*movdf_hardfloat_rv64}

 (expr_list:REG_DEAD (reg:DI 135)


For the working case, the large const is not involved and not subject 
to IRA playing foul.


I investigated this some more. So IRA update_equiv_regs () has code 
identifying potential replacements: if a reg is referenced exactly 
twice: set once and used once.


      if (REG_N_REFS (regno) == 2
      && (rtx_equal_p (replacement, src)
          || ! equiv_init_varies_p (src))
      && NONJUMP_INSN_P (insn)
      && equiv_init_movable_p (PATTERN (insn), regno))
        reg_equiv[regno].replace = 1;
    }

And combine_and_move_insns () does the replacement, undoing the split1 
above.


As a hack if I disable this code, the ICE above goes away and we get the 
right output.


    bseti    a5,zero,63
    sd    a5,0(a0)
    ret
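(As a sanity check on why that is the right output: DFmode -0.0 is just the 
sign bit, i.e. bit 63, which is exactly what the single bseti materializes. 
A small stand-alone illustration, mine rather than part of the report:

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  int main (void)
  {
    double d = -0.0;
    uint64_t bits;
    memcpy (&bits, &d, sizeof bits);                /* well-defined type pun */
    printf ("%#llx\n", (unsigned long long) bits);  /* 0x8000000000000000 */
    return 0;
  }
)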

In fact this is the reason for many more split1 being undone. See the 
suboptimal codegen for large const for Andrew Pinski's test case [1]


  long long f(void)
  {
    unsigned t = 0x101_0101;
    long long t1 = t;
    long long t2 = ((unsigned long long)t) << 32;
    asm("":"+r"(t1));
    return t1 | t2;
  }

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620413.html

Again if I use the hacked compiler the redundant immediate goes away.

    upstream                 |   ira hack              |
    li   a0,0x101_           |   li   a5,0x101         |
    addi a0,a0,0x101         |   addi a5,a5,0x101      |
    li   a5,0x101_           |   mv   a0,a5            |
    slli a0,a0,32            |   slli a5,a5,32         |
    addi a5,a5,0x101         |   or   a0,a0,a5         |
    or   a0,a5,a0            |   ret                   |
    ret                      |                         |



This code has been there since 2009, when ira.c was created, so it is 
obviously per design / expected.


I'm wondering (naively) if there is some way to tune this - for a given 
backend. In general it would make sense to do the replacement, but not 
if the cost changes (e.g. consts could be embedded in an x86 insn freely, 
but not for RISC-V where this is costly, and if something is split, it 
might have been intentional).


Thx,
-Vineet


ICE for interim fix for PR/110748

2023-08-01 Thread Vineet Gupta

Hi Jeff,

As discussed this morning, I'm sending over dumps for the optim of DF 
const -0.0 (PR/110748)  [1]
For rv64gc_zbs build, IRA is undoing the split which eventually leads to 
ICE in final pass.


[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110748#c15

void znd(double *d) {  *d = -0.0;   }


*split1*

(insn 10 3 11 2 (set (reg:DI 136)
    (const_int [0x8000])) "neg.c":4:5 -1

(insn 11 10 0 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
    (subreg:DF (reg:DI 136) 0)) "neg.c":4:5 -1

*ira*

(insn 11 9 12 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
    (const_double:DF -0.0 [-0x0.0p+0])) "neg.c":4:5 190 
{*movdf_hardfloat_rv64}

 (expr_list:REG_DEAD (reg:DI 135)


For the working case, the large const is not involved and not subject to 
IRA playing foul.


Attached are split1 and IRA dumps for OK (rv64gc) and NOK (rv64gc_zbs) 
cases.


Thx,
-Vineet
;; Function znd (znd, funcdef_no=0, decl_uid=2278, cgraph_uid=1, symbol_order=0)

Starting decreasing number of live ranges...
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
;; 1 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2
;; 2 succs { 1 }
rescanning insn with uid = 11.
deleting insn with uid = 10.
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
df_worklist_dataflow_doublequeue: n_basic_blocks 3 n_edges 2 count 3 (1)
Reg 135 uninteresting
;; 1 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2
;; 2 succs { 1 }
Building IRA IR
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called

Pass 0 for finding pseudo/allocno costs

a0 (r135,l0) best GR_REGS, allocno GR_REGS

  a0(r135,l0) costs: SIBCALL_REGS:2000,2000 JALR_REGS:2000,2000 
GR_REGS:2000,2000 MEM:1,1


Pass 1 for finding pseudo/allocno costs

r135: preferred GR_REGS, alternative NO_REGS, allocno GR_REGS

  a0(r135,l0) costs: GR_REGS:2000,2000 MEM:1,1

   Insn 11(l0): point = 0
   Insn 9(l0): point = 2
 a0(r135): [1..2]
Compressing live ranges: from 5 to 2 - 40%
Ranges after the compression:
 a0(r135): [0..1]
+++Allocating 0 bytes for conflict table (uncompressed size 8)
;; a0(r135,l0) conflicts:
;; total conflict hard regs:
;; conflict hard regs:


  pref0:a0(r135)<-hr10@2000
  regions=1, blocks=3, points=2
allocnos=1 (big 0), copies=0, conflicts=0, ranges=1

 Allocnos coloring:


  Loop 0 (parent -1, header bb2, depth 0)
bbs: 2
all: 0r135
modified regnos: 135
border:
Pressure: GR_REGS=2
Hard reg set forest:
  0:( 1 5-63)@0
1:( 5-31)@24000
  Allocno a0r135 of GR_REGS(28) has 27 avail. regs  5-31, node:  5-31 
(confl regs =  0-4 32-127)
  Forming thread from colorable bucket:
  Pushing a0(r135,l0)(cost 0)
  Popping a0(r135,l0)  -- assign reg 10
Disposition:
0:r135 l010
New iteration of spill/restore move
+++Costs: overall -2000, reg -2000, mem 0, ld 0, st 0, move 0
+++   move loops 0, new jumps 0


znd

Dataflow summary:
;;  fully invalidated by EH  0 [zero] 3 [gp] 4 [tp] 5 [t0] 6 [t1] 7 [t2] 10 
[a0] 11 [a1] 12 [a2] 13 [a3] 14 [a4] 15 [a5] 16 [a6] 17 [a7] 28 [t3] 29 [t4] 30 
[t5] 31 [t6] 32 [ft0] 33 [ft1] 34 [ft2] 35 [ft3] 36 [ft4] 37 [ft5] 38 [ft6] 39 
[ft7] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47 [fa5] 48 [fa6] 49 [fa7] 
60 [ft8] 61 [ft9] 62 [ft10] 63 [ft11] 66 [vl] 67 [vtype] 68 [vxrm] 69 [frm] 70 
[N/A] 71 [N/A] 72 [N/A] 73 [N/A] 74 [N/A] 75 [N/A] 76 [N/A] 77 [N/A] 78 [N/A] 
79 [N/A] 80 [N/A] 81 [N/A] 82 [N/A] 83 [N/A] 84 [N/A] 85 [N/A] 86 [N/A] 87 
[N/A] 88 [N/A] 89 [N/A] 90 [N/A] 91 [N/A] 92 [N/A] 93 [N/A] 94 [N/A] 95 [N/A] 
96 [v0] 97 [v1] 98 [v2] 99 [v3] 100 [v4] 101 [v5] 102 [v6] 103 [v7] 104 [v8] 
105 [v9] 106 [v10] 107 [v11] 108 [v12] 109 [v13] 110 [v14] 111 [v15] 112 [v16] 
113 [v17] 114 [v18] 115 [v19] 116 [v20] 117 [v21] 118 [v22] 119 [v23] 120 [v24] 
121 [v25] 122 [v26] 123 [v27] 124 [v28] 125 [v29] 126 [v30] 127 [v31]
;;  hardware regs used   2 [sp] 64 [arg] 65 [frame]
;;  regular block artificial uses2 [sp] 8 [s0] 64 [arg] 65 [frame]
;;  eh block artificial uses 2 [sp] 8 [s0] 64 [arg] 65 [frame]
;;  entry block defs 1 [ra] 2 [sp] 8 [s0] 10 [a0] 11 [a1] 12 [a2] 13 [a3] 
14 [a4] 15 [a5] 16 [a6] 17 [a7] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4] 47 
[fa5] 48 [fa6] 49 [fa7] 64 [arg] 65 [frame]
;;  exit block uses  1 [ra] 2 [sp] 8 [s0] 65 [frame]
;;  regs ever live   10 [a0]
;;  ref usage   r1={1d,1u} r2={1d,2u} r8={1d,2u} r10={1d,1u} r11={1d} r12={1d} 
r13={1d} r14={1d} r15={1d} r16={1d} r17={1d} r42={1d} r43={1d} r44={1d} 
r45={1d} r46={1d} r47={1d} r48={1d} r49={1d} r64={1d,1u} r65={1d,2u} 
r135={1d,1u} 
;;total ref usage 32{22d,10u,0e} in 2{2 regular + 0 call} insns.
(note 1 0 4 NOTE_INSN_DELETED)
(note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 4 3 2