Re: IRA update_equiv_regs for (was Re: ICE for interim fix for PR/110748)
On 8/14/23 18:35, Vineet Gupta wrote:
> On 8/11/23 17:04, Jeff Law wrote:
>>> I'm wondering (naively) if there is some way to tune this - for a
>>> given backend. In general it would make sense to do the replacement,
>>> but not if the cost changes (e.g. consts can be embedded in x86 insns
>>> freely, but not for RISC-V where this is costly; and if something was
>>> split, it might have been intentional).
>>
>> I'm not immediately aware of a way to tune.
>>
>> When it comes to tuning, the toplevel question is do we have any of
>> the info we need to tune at the point where the transformation occurs.
>> The two most obvious pieces here would be loop info and register
>> pressure.
>>
>> ie, do we have enough loop structure to know if the def is at a
>> shallower loop nest than the use. There's a reasonable chance we have
>> this information, as my recollection is this analysis is done fairly
>> early in IRA.
>>
>> But that means we likely don't have any sense of register pressure at
>> the points between the def and use. So the most useful metric for
>> tuning isn't really available.
>
> I'd argue that even if the register pressure were high, in some cases
> there's just no way around it and RA needs to honor what the backend
> did a priori (split, in this case); otherwise we end up with something
> which doesn't compute, literally, and leads to an ICE. I'm puzzled
> that in this case an intentional implementation is getting in the way.
> So while I don't care about the -0.0 case in itself, it seems with the
> current framework we can't achieve the result, other than by the
> roundabout way of the peephole2 you alluded to.

I think you'll run into a lot of resistance with that approach. The
fact is we're being a bit sneaky and telling a bit of a fib in the
backend (claiming support for certain capabilities that don't actually
exist). As many have said, lie to GCC and ultimately it will get its
revenge. This is but one example. When we lie to some parts of gcc, we
may well trigger undesirable behavior later in the pipeline.
It's a tradeoff, and sometimes we have to back out those little lies.

>> The one thing that stands out is we don't do this transformation at
>> all when register pressure sensitive scheduling is enabled. And we
>> really should be turning that on by default. Our data shows register
>> pressure sensitive scheduling is about a 6-7% cycle improvement on
>> x264 as it avoids spilling in those key satd loops.
>>
>>   /* Don't move insns if live range shrinkage or register
>>      pressure-sensitive scheduling were done because it will not
>>      improve allocation but likely worsen insn scheduling.  */
>>   if (optimize
>>       && !flag_live_range_shrinkage
>>       && !(flag_sched_pressure && flag_schedule_insns))
>>     combine_and_move_insns ();
>>
>> So you might want to look at register pressure sensitive scheduling
>> first. If you go into x264_r from specint and look at
>> x264_pixel_satd_8x4, first verify the loops are fully unrolled. If
>> they are, then look for 32bit loads/stores into the stack. If you
>> have them, then you're spilling and getting crappy performance.
>> Using register pressure sensitive scheduling should help
>> significantly.
>
> Is that -fira-loop-pressure ?

-fsched-pressure I think.

>> We've certainly seen that internally. The plan was to submit a patch
>> to make register pressure sensitive scheduling the default when the
>> scheduler is enabled. We just haven't pushed on it. If you can
>> verify that you're seeing spilling as well, then it'd certainly
>> bolster the argument that register-pressure-sensitive-scheduling is
>> desirable.
>
> I can confirm that the loop is fully unrolled and there's a zillion
> stack spills there for intermediate computes (-Ofast
> -march=rv64gc_zba_zbb_zbs, no V in that build).

Yea, you'll take a big hit from those spills. Good to get a
confirmation that you're seeing it too. The fix should be pretty
simple: we just turn on -fsched-pressure in the RV backend.

Jeff
Re: IRA update_equiv_regs for (was Re: ICE for interim fix for PR/110748)
On 8/12/23 10:44, Jivan Hakobyan wrote:
> Yes, as Jeff mentioned, I have some work in that scope. The first is
> related to address computation when it has a large constant part.
> Suppose we have this code:
>
>   int consume (void *);
>   int foo (void)
>   {
>     int x[1000];
>     return consume (x);
>   }
>
> Before IRA we have the following sequence:
>
>   19: r140:DI=0xf000
>   20: r136:DI=r140:DI+0x60
>         REG_EQUAL 0xf060
>    8: a0:DI=frame:DI+r136:DI
>         REG_DEAD r136:DI
>
> but during IRA (eliminate_regs_in_insn) insn 8 transforms to
>
>    8: a0:DI=r136:DI+0xfa0+frame:DI
>         REG_DEAD r136:DI
>
> and in the end we get the wrong sequence:
>
>   21: r136:DI=0xf060
>         REG_EQUIV 0xf060
>   25: r143:DI=0x1000
>   26: r142:DI=r143:DI-0x60
>         REG_DEAD r143:DI
>         REG_EQUAL 0xfa0
>   27: r142:DI=r142:DI+r136:DI
>         REG_DEAD r136:DI
>    8: a0:DI=r142:DI+frame:DI
>         REG_DEAD r142:DI
>
> My changes prevent that transformation. I have tested on spec and did
> not get regressions. Besides, it executed 40B fewer instructions.

Right. And this looks like a generic failing of the register
elimination code to simplify after eliminating fp/ap to sp. It's a bit
of a surprise as I thought that code had some simplification
capabilities. But clearly if it has that ability it isn't working
well. Part of me wondered if it's falling down due to constants not
fitting in a 12 bit signed immediate.

I've got a TODO to look at your patch in this space. Maybe tonight if
I can keep moving things off my TODO list ;-)

> The second work is related to hoisting loop-invariant code out. I
> have a test case where SP + const can be hoisted out:
>
>   ...
>   .L3:
>     call foo
>     addi a5,sp,16
>     sh3add a0,a0,a5
>   ...
>
> Before IRA that code is already out of the loop, but IRA moves it
> back. My approach was done in update_equiv_regs(). It prevents any
> move if its uses and defs are held in a single place, and it is used
> in the loop. Currently, that improvement is under evaluation.

Yea, we're going to need to sit down with this. IRA is working per
design, and we may be able to avoid these problems with
-fsched-pressure, but it feels a bit hackish.

Jeff
Re: IRA update_equiv_regs for (was Re: ICE for interim fix for PR/110748)
On 8/11/23 17:04, Jeff Law wrote:
>> I'm wondering (naively) if there is some way to tune this - for a
>> given backend. In general it would make sense to do the replacement,
>> but not if the cost changes (e.g. consts can be embedded in x86 insns
>> freely, but not for RISC-V where this is costly; and if something was
>> split, it might have been intentional).
>
> I'm not immediately aware of a way to tune.
>
> When it comes to tuning, the toplevel question is do we have any of
> the info we need to tune at the point where the transformation
> occurs. The two most obvious pieces here would be loop info and
> register pressure.
>
> ie, do we have enough loop structure to know if the def is at a
> shallower loop nest than the use. There's a reasonable chance we have
> this information, as my recollection is this analysis is done fairly
> early in IRA.
>
> But that means we likely don't have any sense of register pressure at
> the points between the def and use. So the most useful metric for
> tuning isn't really available.

I'd argue that even if the register pressure were high, in some cases
there's just no way around it and RA needs to honor what the backend
did a priori (split, in this case); otherwise we end up with something
which doesn't compute, literally, and leads to an ICE. I'm puzzled
that in this case an intentional implementation is getting in the way.
So while I don't care about the -0.0 case in itself, it seems with the
current framework we can't achieve the result, other than by the
roundabout way of the peephole2 you alluded to.

> The one thing that stands out is we don't do this transformation at
> all when register pressure sensitive scheduling is enabled. And we
> really should be turning that on by default. Our data shows register
> pressure sensitive scheduling is about a 6-7% cycle improvement on
> x264 as it avoids spilling in those key satd loops.
>
>   /* Don't move insns if live range shrinkage or register
>      pressure-sensitive scheduling were done because it will not
>      improve allocation but likely worsen insn scheduling.  */
>   if (optimize
>       && !flag_live_range_shrinkage
>       && !(flag_sched_pressure && flag_schedule_insns))
>     combine_and_move_insns ();
>
> So you might want to look at register pressure sensitive scheduling
> first. If you go into x264_r from specint and look at
> x264_pixel_satd_8x4, first verify the loops are fully unrolled. If
> they are, then look for 32bit loads/stores into the stack. If you
> have them, then you're spilling and getting crappy performance.
> Using register pressure sensitive scheduling should help
> significantly.

Is that -fira-loop-pressure ?

> We've certainly seen that internally. The plan was to submit a patch
> to make register pressure sensitive scheduling the default when the
> scheduler is enabled. We just haven't pushed on it. If you can
> verify that you're seeing spilling as well, then it'd certainly
> bolster the argument that register-pressure-sensitive-scheduling is
> desirable.

I can confirm that the loop is fully unrolled and there's a zillion
stack spills there for intermediate computes (-Ofast
-march=rv64gc_zba_zbb_zbs, no V in that build).

Thx,
-Vineet
Re: IRA update_equiv_regs for (was Re: ICE for interim fix for PR/110748)
Yes, as Jeff mentioned, I have some work in that scope. The first is
related to address computation when it has a large constant part.
Suppose we have this code:

  int consume (void *);
  int foo (void)
  {
    int x[1000];
    return consume (x);
  }

Before IRA we have the following sequence:

  19: r140:DI=0xf000
  20: r136:DI=r140:DI+0x60
        REG_EQUAL 0xf060
   8: a0:DI=frame:DI+r136:DI
        REG_DEAD r136:DI

but during IRA (eliminate_regs_in_insn) insn 8 transforms to

   8: a0:DI=r136:DI+0xfa0+frame:DI
        REG_DEAD r136:DI

and in the end we get the wrong sequence:

  21: r136:DI=0xf060
        REG_EQUIV 0xf060
  25: r143:DI=0x1000
  26: r142:DI=r143:DI-0x60
        REG_DEAD r143:DI
        REG_EQUAL 0xfa0
  27: r142:DI=r142:DI+r136:DI
        REG_DEAD r136:DI
   8: a0:DI=r142:DI+frame:DI
        REG_DEAD r142:DI

My changes prevent that transformation. I have tested on spec and did
not get regressions. Besides, it executed 40B fewer instructions.

The second work is related to hoisting loop-invariant code out. I have
a test case where SP + const can be hoisted out:

  ...
  .L3:
    call foo
    addi a5,sp,16
    sh3add a0,a0,a5
  ...

Before IRA that code is already out of the loop, but IRA moves it
back. My approach was done in update_equiv_regs(). It prevents any
move if its uses and defs are held in a single place, and it is used
in the loop. Currently, that improvement is under evaluation.

On Sat, Aug 12, 2023 at 4:05 AM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On 8/11/23 17:32, Vineet Gupta wrote:
>> On 8/1/23 12:17, Vineet Gupta wrote:
>>> Hi Jeff,
>>>
>>> As discussed this morning, I'm sending over dumps for the optim of
>>> DF const -0.0 (PR/110748) [1]
>>> For rv64gc_zbs build, IRA is undoing the split which eventually
>>> leads to ICE in final pass.
>>>
>>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110748#c15
>>>
>>> void znd(double *d) { *d = -0.0; }
>>>
>>> *split1*
>>>
>>> (insn 10 3 11 2 (set (reg:DI 136)
>>>         (const_int [0x8000])) "neg.c":4:5 -1
>>>
>>> (insn 11 10 0 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
>>>         (subreg:DF (reg:DI 136) 0)) "neg.c":4:5 -1
>>>
>>> *ira*
>>>
>>> (insn 11 9 12 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
>>>         (const_double:DF -0.0 [-0x0.0p+0])) "neg.c":4:5 190
>>>      {*movdf_hardfloat_rv64}
>>>      (expr_list:REG_DEAD (reg:DI 135)
>>>
>>> For the working case, the large const is not involved and not
>>> subject to IRA playing foul.
>>
>> I investigated this some more. So IRA update_equiv_regs () has code
>> identifying potential replacements: if a reg is referenced exactly
>> twice: set once and used once.
>>
>>   if (REG_N_REFS (regno) == 2
>>       && (rtx_equal_p (replacement, src)
>>           || ! equiv_init_varies_p (src))
>>       && NONJUMP_INSN_P (insn)
>>       && equiv_init_movable_p (PATTERN (insn), regno))
>>     reg_equiv[regno].replace = 1;
>>   }
>>
>> And combine_and_move_insns () does the replacement, undoing the
>> split1 above.
>
> Right. This is as expected. There was actually similar code that goes
> back even before the introduction of IRA -- like to the 80s and 90s.
>
> Conceptually the idea is a value with an equivalence that has a
> single set and single use isn't a good use of a hard register. Better
> to narrow the live range to a single pair of instructions.
>
> It's not always a good tradeoff. Consider if the equivalence was also
> a loop invariant and hoisted out of the loop and register pressure is
> low.
>
>> In fact this is the reason for many more split1 being undone. See
>> the suboptimal codegen for large const for Andrew Pinski's test
>> case [1]
>
> No doubt. I think it's also a problem with some of Jivan's work.
>
>> I'm wondering (naively) if there is some way to tune this - for a
>> given backend. In general it would make sense to do the replacement,
>> but not if the cost changes (e.g. consts can be embedded in x86
>> insns freely, but not for RISC-V where this is costly; and if
>> something was split, it might have been intentional).
>
> I'm not immediately aware of a way to tune.
>
> When it comes to tuning, the toplevel question is do we have any of
> the info we need to tune at the point where the transformation
> occurs. The two most obvious pieces here would be loop info and
> register pressure.
>
> ie, do we have enough loop structure to know if the def is at a
> shallower loop nest than the use. There's a reasonable chance we have
> this information as my recollection is this analysis is done fairly
> early in IRA.
>
> But that means we likely don't have any sense of register pressure at
> the points between the def and use. So the most useful metric for
> tuning isn't really available.
>
> The one thing that stands out is we don't do this
Re: IRA update_equiv_regs for (was Re: ICE for interim fix for PR/110748)
On 8/11/23 17:32, Vineet Gupta wrote:
> On 8/1/23 12:17, Vineet Gupta wrote:
>> Hi Jeff,
>>
>> As discussed this morning, I'm sending over dumps for the optim of
>> DF const -0.0 (PR/110748) [1]
>> For rv64gc_zbs build, IRA is undoing the split which eventually
>> leads to ICE in final pass.
>>
>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110748#c15
>>
>> void znd(double *d) { *d = -0.0; }
>>
>> *split1*
>>
>> (insn 10 3 11 2 (set (reg:DI 136)
>>         (const_int [0x8000])) "neg.c":4:5 -1
>>
>> (insn 11 10 0 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
>>         (subreg:DF (reg:DI 136) 0)) "neg.c":4:5 -1
>>
>> *ira*
>>
>> (insn 11 9 12 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
>>         (const_double:DF -0.0 [-0x0.0p+0])) "neg.c":4:5 190
>>      {*movdf_hardfloat_rv64}
>>      (expr_list:REG_DEAD (reg:DI 135)
>>
>> For the working case, the large const is not involved and not
>> subject to IRA playing foul.
>
> I investigated this some more. So IRA update_equiv_regs () has code
> identifying potential replacements: if a reg is referenced exactly
> twice: set once and used once.
>
>   if (REG_N_REFS (regno) == 2
>       && (rtx_equal_p (replacement, src)
>           || ! equiv_init_varies_p (src))
>       && NONJUMP_INSN_P (insn)
>       && equiv_init_movable_p (PATTERN (insn), regno))
>     reg_equiv[regno].replace = 1;
>   }
>
> And combine_and_move_insns () does the replacement, undoing the
> split1 above.

Right. This is as expected. There was actually similar code that goes
back even before the introduction of IRA -- like to the 80s and 90s.

Conceptually the idea is a value with an equivalence that has a single
set and single use isn't a good use of a hard register. Better to
narrow the live range to a single pair of instructions.

It's not always a good tradeoff. Consider if the equivalence was also
a loop invariant and hoisted out of the loop and register pressure is
low.

> In fact this is the reason for many more split1 being undone. See the
> suboptimal codegen for large const for Andrew Pinski's test case [1]

No doubt. I think it's also a problem with some of Jivan's work.
> I'm wondering (naively) if there is some way to tune this - for a
> given backend. In general it would make sense to do the replacement,
> but not if the cost changes (e.g. consts can be embedded in x86 insns
> freely, but not for RISC-V where this is costly; and if something was
> split, it might have been intentional).

I'm not immediately aware of a way to tune.

When it comes to tuning, the toplevel question is do we have any of
the info we need to tune at the point where the transformation occurs.
The two most obvious pieces here would be loop info and register
pressure.

ie, do we have enough loop structure to know if the def is at a
shallower loop nest than the use. There's a reasonable chance we have
this information as my recollection is this analysis is done fairly
early in IRA.

But that means we likely don't have any sense of register pressure at
the points between the def and use. So the most useful metric for
tuning isn't really available.

The one thing that stands out is we don't do this transformation at
all when register pressure sensitive scheduling is enabled. And we
really should be turning that on by default. Our data shows register
pressure sensitive scheduling is about a 6-7% cycle improvement on
x264 as it avoids spilling in those key satd loops.

  /* Don't move insns if live range shrinkage or register
     pressure-sensitive scheduling were done because it will not
     improve allocation but likely worsen insn scheduling.  */
  if (optimize
      && !flag_live_range_shrinkage
      && !(flag_sched_pressure && flag_schedule_insns))
    combine_and_move_insns ();

So you might want to look at register pressure sensitive scheduling
first. If you go into x264_r from specint and look at
x264_pixel_satd_8x4, first verify the loops are fully unrolled. If
they are, then look for 32bit loads/stores into the stack. If you have
them, then you're spilling and getting crappy performance. Using
register pressure sensitive scheduling should help significantly.

We've certainly seen that internally.
The plan was to submit a patch to make register pressure sensitive
scheduling the default when the scheduler is enabled. We just haven't
pushed on it. If you can verify that you're seeing spilling as well,
then it'd certainly bolster the argument that
register-pressure-sensitive-scheduling is desirable.

Jeff
IRA update_equiv_regs for (was Re: ICE for interim fix for PR/110748)
On 8/1/23 12:17, Vineet Gupta wrote:
> Hi Jeff,
>
> As discussed this morning, I'm sending over dumps for the optim of DF
> const -0.0 (PR/110748) [1]
> For rv64gc_zbs build, IRA is undoing the split which eventually leads
> to ICE in final pass.
>
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110748#c15
>
> void znd(double *d) { *d = -0.0; }
>
> *split1*
>
> (insn 10 3 11 2 (set (reg:DI 136)
>         (const_int [0x8000])) "neg.c":4:5 -1
>
> (insn 11 10 0 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
>         (subreg:DF (reg:DI 136) 0)) "neg.c":4:5 -1
>
> *ira*
>
> (insn 11 9 12 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
>         (const_double:DF -0.0 [-0x0.0p+0])) "neg.c":4:5 190
>      {*movdf_hardfloat_rv64}
>      (expr_list:REG_DEAD (reg:DI 135)
>
> For the working case, the large const is not involved and not subject
> to IRA playing foul.

I investigated this some more. So IRA update_equiv_regs () has code
identifying potential replacements: if a reg is referenced exactly
twice: set once and used once.

  if (REG_N_REFS (regno) == 2
      && (rtx_equal_p (replacement, src)
          || ! equiv_init_varies_p (src))
      && NONJUMP_INSN_P (insn)
      && equiv_init_movable_p (PATTERN (insn), regno))
    reg_equiv[regno].replace = 1;
  }

And combine_and_move_insns () does the replacement, undoing the split1
above.

As a hack, if I disable this code, the ICE above goes away and we get
the right output:

  bseti a5,zero,63
  sd    a5,0(a0)
  ret

In fact this is the reason for many more split1 being undone. See the
suboptimal codegen for large const for Andrew Pinski's test case [1]:

  long long f(void)
  {
    unsigned t = 0x101_0101;
    long long t1 = t;
    long long t2 = ((unsigned long long)t) << 32;
    asm("":"+r"(t1));
    return t1 | t2;
  }

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620413.html

Again, if I use the hacked compiler the redundant immediate goes away.
  upstream:               ira hack:
  li   a0,0x101_          li   a5,0x101
  addi a0,a0,0x101        addi a5,a5,0x101
  li   a5,0x101_          mv   a0,a5
  slli a0,a0,32           slli a5,a5,32
  addi a5,a5,0x101        or   a0,a0,a5
  or   a0,a5,a0           ret
  ret

This code has been there since 2009, when ira.c was created, so it is
obviously per design/expected.

I'm wondering (naively) if there is some way to tune this - for a
given backend. In general it would make sense to do the replacement,
but not if the cost changes (e.g. consts can be embedded in x86 insns
freely, but not for RISC-V where this is costly; and if something was
split, it might have been intentional).

Thx,
-Vineet
ICE for interim fix for PR/110748
Hi Jeff,

As discussed this morning, I'm sending over dumps for the optim of DF
const -0.0 (PR/110748) [1]
For rv64gc_zbs build, IRA is undoing the split which eventually leads
to ICE in final pass.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110748#c15

void znd(double *d) { *d = -0.0; }

*split1*

(insn 10 3 11 2 (set (reg:DI 136)
        (const_int [0x8000])) "neg.c":4:5 -1

(insn 11 10 0 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
        (subreg:DF (reg:DI 136) 0)) "neg.c":4:5 -1

*ira*

(insn 11 9 12 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
        (const_double:DF -0.0 [-0x0.0p+0])) "neg.c":4:5 190
     {*movdf_hardfloat_rv64}
     (expr_list:REG_DEAD (reg:DI 135)

For the working case, the large const is not involved and not subject
to IRA playing foul. Attached are split1 and IRA dumps for OK (rv64gc)
and NOK (rv64gc_zbs) cases.

Thx,
-Vineet

;; Function znd (znd, funcdef_no=0, decl_uid=2278, cgraph_uid=1, symbol_order=0)

Starting decreasing number of live ranges...
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called

;; 1 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2
;; 2 succs { 1 }
rescanning insn with uid = 11.
deleting insn with uid = 10.
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
df_worklist_dataflow_doublequeue: n_basic_blocks 3 n_edges 2 count 3 (1)
Reg 135 uninteresting

;; 1 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2
;; 2 succs { 1 }

Building IRA IR
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called

Pass 0 for finding pseudo/allocno costs
    a0 (r135,l0) best GR_REGS, allocno GR_REGS
  a0(r135,l0) costs: SIBCALL_REGS:2000,2000 JALR_REGS:2000,2000
      GR_REGS:2000,2000 MEM:1,1

Pass 1 for finding pseudo/allocno costs
    r135: preferred GR_REGS, alternative NO_REGS, allocno GR_REGS
  a0(r135,l0) costs: GR_REGS:2000,2000 MEM:1,1

 Insn 11(l0): point = 0
 Insn 9(l0): point = 2
 a0(r135): [1..2]
Compressing live ranges: from 5 to 2 - 40%
Ranges after the compression:
 a0(r135): [0..1]
+++Allocating 0 bytes for conflict table (uncompressed size 8)
;; a0(r135,l0) conflicts:
;;     total conflict hard regs:
;;     conflict hard regs:

  pref0:a0(r135)<-hr10@2000
  regions=1, blocks=3, points=2
    allocnos=1 (big 0), copies=0, conflicts=0, ranges=1

Allocnos coloring:

  Loop 0 (parent -1, header bb2, depth 0)
    bbs: 2
    all: 0r135
    modified regnos: 135
    border:
    Pressure: GR_REGS=2
  Hard reg set forest:
    0:( 1 5-63)@0
      1:( 5-31)@24000
  Allocno a0r135 of GR_REGS(28) has 27 avail.
regs 5-31, node: 5-31 (confl regs = 0-4 32-127)
  Forming thread from colorable bucket:
  Pushing a0(r135,l0)(cost 0)
  Popping a0(r135,l0)  --  assign reg 10
Disposition:
    0:r135 l0    10
New iteration of spill/restore move
+++Costs: overall -2000, reg -2000, mem 0, ld 0, st 0, move 0
+++       move loops 0, new jumps 0

znd

Dataflow summary:
;;  fully invalidated by EH  0 [zero] 3 [gp] 4 [tp] 5 [t0] 6 [t1]
;;      7 [t2] 10 [a0] 11 [a1] 12 [a2] 13 [a3] 14 [a4] 15 [a5]
;;      16 [a6] 17 [a7] 28 [t3] 29 [t4] 30 [t5] 31 [t6] 32 [ft0]
;;      33 [ft1] 34 [ft2] 35 [ft3] 36 [ft4] 37 [ft5] 38 [ft6]
;;      39 [ft7] 42 [fa0] 43 [fa1] 44 [fa2] 45 [fa3] 46 [fa4]
;;      47 [fa5] 48 [fa6] 49 [fa7] 60 [ft8] 61 [ft9] 62 [ft10]
;;      63 [ft11] 66 [vl] 67 [vtype] 68 [vxrm] 69 [frm] 70 [N/A]
;;      71 [N/A] 72 [N/A] 73 [N/A] 74 [N/A] 75 [N/A] 76 [N/A]
;;      77 [N/A] 78 [N/A] 79 [N/A] 80 [N/A] 81 [N/A] 82 [N/A]
;;      83 [N/A] 84 [N/A] 85 [N/A] 86 [N/A] 87 [N/A] 88 [N/A]
;;      89 [N/A] 90 [N/A] 91 [N/A] 92 [N/A] 93 [N/A] 94 [N/A]
;;      95 [N/A] 96 [v0] 97 [v1] 98 [v2] 99 [v3] 100 [v4] 101 [v5]
;;      102 [v6] 103 [v7] 104 [v8] 105 [v9] 106 [v10] 107 [v11]
;;      108 [v12] 109 [v13] 110 [v14] 111 [v15] 112 [v16] 113 [v17]
;;      114 [v18] 115 [v19] 116 [v20] 117 [v21] 118 [v22] 119 [v23]
;;      120 [v24] 121 [v25] 122 [v26] 123 [v27] 124 [v28] 125 [v29]
;;      126 [v30] 127 [v31]
;;  hardware regs used  2 [sp] 64 [arg] 65 [frame]
;;  regular block artificial uses  2 [sp] 8 [s0] 64 [arg] 65 [frame]
;;  eh block artificial uses  2 [sp] 8 [s0] 64 [arg] 65 [frame]
;;  entry block defs  1 [ra] 2 [sp] 8 [s0] 10 [a0] 11 [a1] 12 [a2]
;;      13 [a3] 14 [a4] 15 [a5] 16 [a6] 17 [a7] 42 [fa0] 43 [fa1]
;;      44 [fa2] 45 [fa3] 46 [fa4] 47 [fa5] 48 [fa6] 49 [fa7]
;;      64 [arg] 65 [frame]
;;  exit block uses  1 [ra] 2 [sp] 8 [s0] 65 [frame]
;;  regs ever live  10 [a0]
;;  ref usage  r1={1d,1u} r2={1d,2u} r8={1d,2u} r10={1d,1u} r11={1d}
;;      r12={1d} r13={1d} r14={1d} r15={1d} r16={1d} r17={1d}
;;      r42={1d} r43={1d} r44={1d} r45={1d} r46={1d} r47={1d}
;;      r48={1d} r49={1d} r64={1d,1u} r65={1d,2u} r135={1d,1u}
;;  total ref usage 32{22d,10u,0e} in 2{2 regular + 0 call} insns.
(note 1 0 4 NOTE_INSN_DELETED)
(note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 4 3 2