Re: LRA for avr: help with FP and elimination

2023-08-11 Thread Vladimir Makarov via Gcc



On 8/10/23 07:33, senthilkumar.selva...@microchip.com wrote:


Hi Vlad,

   I can confirm your commit 
(https://gcc.gnu.org/git?p=gcc.git;a=commit;h=2971ff7b1d564ac04b537d907c70e6093af70832)
   fixes the above problem, thank you. However, I see execution failures if a
   pseudo assigned to FP has to be spilled because of stack slot creation.

   To reproduce, build the compiler just like above, and then do

$ avr-gcc -mmcu=avr51 
/gcc/testsuite/gcc.c-torture/execute/20050224-1.c -O2 -S 
-fdump-rtl-all

   The execution failure occurs at this point

movw r24,r2
sbiw r24,36
brne .L8

r2 is never set anywhere at all in the assembly.

The relevant insns (in the IRA dump) are

(insn 3 15 4 3 (set (reg/v:HI 51 [ j ])
 (const_int 0 [0])) 
"gcc/gcc/testsuite/gcc.c-torture/execute/20050224-1.c":19:21 101 {*movhi_split}
  (expr_list:REG_EQUAL (const_int 0 [0])
 (nil)))
...
(insn 28 27 67 8 (parallel [
 (set (reg/v:HI 51 [ j ])
 (plus:HI (reg/v:HI 51 [ j ])
 (const_int 1 [0x1])))
 (clobber (scratch:QI))
 ]) 
"/home/i41766/code/personal/gcc/gcc/testsuite/gcc.c-torture/execute/20050224-1.c":28:8
 175 {addhi3_clobber}
  (nil))
...
(jump_insn 44 43 45 13 (parallel [
 (set (pc)
 (if_then_else (ne (reg/v:HI 51 [ j ])
 (const_int 36 [0x24]))
 (label_ref:HI 103)
 (pc)))
 (clobber (scratch:QI))
 ]) 
"/home/i41766/code/personal/gcc/gcc/testsuite/gcc.c-torture/execute/20050224-1.c":11:16
 discrim 1 713 {cbranchhi4_insn}
  (expr_list:REG_DEAD (reg/v:HI 51 [ j ])
 (int_list:REG_BR_PROB 7 (nil)))
  -> 103)

   LRA deletes insns 3 and 28, and uses r2 in the jump_insn.

   In the reload dump, for pseudo r51, I'm seeing this
subreg regs:
   Frame pointer can not be eliminated anymore
   Spilling non-eliminable hard regs: 28 29
 Spilling r51(28)
   Slot 0 regnos (width = 0):46
   Slot 1 regnos (width = 0):45

   lra_update_fp2sp_elimination calls spill_pseudos with
   HARD_FRAME_POINTER_REGNUM, and that sets reg_renumber[51] to -1.

   Later down the line, process_bb_lives is called with dead_insn_p=true from
   lra_create_lives_ranges_1 on the relevant BB (#8), and df_get_live_out
   on that BB does not contain 51 (even though previous calls to the same BB 
did).
   
Breakpoint 8, process_bb_lives (bb=0x7fffea570240, curr_point=@0x7fffd838: 25, dead_insn_p=true) at gcc/gcc/lra-lives.cc:664

664   function_abi last_call_abi = default_function_abi;
(gdb) n
666   reg_live_out = df_get_live_out (bb);
(gdb)
667   sparseset_clear (pseudos_live);
(gdb) p debug_bitmap(reg_live_out)

first = 0x321c128 current = 0x321c128 indx = 0
0x321c128 next = (nil) prev = (nil) indx = 0
bits = { 28 32 34 43 44 47 48 49 50 }

   process_bb_lives then considers the insn setting 51 (and
   the reload insns LRA created) as dead, and removes them.

BB 8
Insn 67: point = 31, n_alt = -1
Insn 114: point = 31, n_alt = 3
Deleting dead insn 114
deleting insn with uid = 114.
Insn 28: point = 31, n_alt = 1
Deleting dead insn 28
deleting insn with uid = 28.
Insn 113: point = 31, n_alt = 2
Deleting dead insn 113

Same for insn 3 as well

BB 3
Insn 92: point = 40, n_alt = -1
Insn 5: point = 40, n_alt = 1
Insn 4: point = 41, n_alt = 3
Insn 3: point = 42, n_alt = 3
Deleting dead insn 3
deleting insn with uid = 3.

   Yet when it prints "Global pseudo live data has been updated" after
   all this, r51 is live again :(

BB 8:
 livein: 8:

43   44   47   48   49   50   51
 liveout: 8:

28   32   34   43   44   47   48   49   50   51

   Eventually, it assigns 2 to r51, resulting in just the compare and
   branch instruction remaining in the assembly.

   Is this an LRA bug or is the target doing something wrong?

I've reproduced this.  Probably it is a bug with live info update when 
fp->sp elimination became invalid.


I'll start to work on this problem on the next week and hope to have a 
fix soon after that.





Re: LRA for avr: help with FP and elimination

2023-08-10 Thread SenthilKumar.Selvaraj--- via Gcc
On Tue, 2023-07-18 at 11:04 -0400, Vladimir Makarov wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the 
> content is safe
> 
> On 7/17/23 03:17, senthilkumar.selva...@microchip.com wrote:
> > On Fri, 2023-07-14 at 09:29 -0400, Vladimir Makarov wrote:
> > > If you send me the preprocessed test, I could start to work on it to fix
> > > the problems.  I think it is hard to fix them right for a person having
> > > a little experience with LRA.
> > > 
> > > 
> > Ok, this is a reduced test case that reproduces the failure.
> > 
> > $ cat case.c
> > typedef int HItype __attribute__ ((mode (HI)));
> > HItype
> > __mulvhi3 (HItype a, HItype b)
> > {
> >HItype w;
> > 
> >if (__builtin_mul_overflow (a, b, ))
> >  __builtin_trap ();
> > 
> >return w;
> > }
> > 
> > On latest master, this trivial patch turns on LRA for avr
> > --- gcc/config/avr/avr.cc
> > +++ gcc/config/avr/avr.cc
> > @@ -15244,9 +15244,6 @@ avr_float_lib_compare_returns_bool (machine_mode 
> > mode, enum rtx_code)
> >   #undef  TARGET_CONVERT_TO_TYPE
> >   #define TARGET_CONVERT_TO_TYPE avr_convert_to_type
> > 
> > -#undef TARGET_LRA_P
> > -#define TARGET_LRA_P hook_bool_void_false
> > -
> >   #undef  TARGET_ADDR_SPACE_SUBSET_P
> >   #define TARGET_ADDR_SPACE_SUBSET_P avr_addr_space_subset_p
> > 
> > Then configuring and building for avr without attempting to build libgcc
> > 
> > $ configure --target=avr --prefix= --enable-languages=c && make 
> > all-host && make install-host
> > 
> > And finally to reproduce the failure
> > $ /bin/avr-gcc -mmcu=avr25 case.c -Os
> 
> Thank you.  I've reproduced the bug and started to work on it
> yesterday.  The problem is a bit tricky than I initially thought but I
> believe I'll fix it on this week.
> 
> 
Hi Vlad,

  I can confirm your commit 
(https://gcc.gnu.org/git?p=gcc.git;a=commit;h=2971ff7b1d564ac04b537d907c70e6093af70832)
  fixes the above problem, thank you. However, I see execution failures if a 
  pseudo assigned to FP has to be spilled because of stack slot creation.

  To reproduce, build the compiler just like above, and then do

$ avr-gcc -mmcu=avr51 
/gcc/testsuite/gcc.c-torture/execute/20050224-1.c -O2 -S 
-fdump-rtl-all

  The execution failure occurs at this point

movw r24,r2
sbiw r24,36
brne .L8

   r2 is never set anywhere at all in the assembly.

The relevant insns (in the IRA dump) are

(insn 3 15 4 3 (set (reg/v:HI 51 [ j ])
(const_int 0 [0])) 
"gcc/gcc/testsuite/gcc.c-torture/execute/20050224-1.c":19:21 101 {*movhi_split}
 (expr_list:REG_EQUAL (const_int 0 [0])
(nil)))
...
(insn 28 27 67 8 (parallel [
(set (reg/v:HI 51 [ j ])
(plus:HI (reg/v:HI 51 [ j ])
(const_int 1 [0x1])))
(clobber (scratch:QI))
]) 
"/home/i41766/code/personal/gcc/gcc/testsuite/gcc.c-torture/execute/20050224-1.c":28:8
 175 {addhi3_clobber}
 (nil))
...
(jump_insn 44 43 45 13 (parallel [
(set (pc)
(if_then_else (ne (reg/v:HI 51 [ j ])
(const_int 36 [0x24]))
(label_ref:HI 103)
(pc)))
(clobber (scratch:QI))
]) 
"/home/i41766/code/personal/gcc/gcc/testsuite/gcc.c-torture/execute/20050224-1.c":11:16
 discrim 1 713 {cbranchhi4_insn}
 (expr_list:REG_DEAD (reg/v:HI 51 [ j ])
(int_list:REG_BR_PROB 7 (nil)))
 -> 103)

  LRA deletes insns 3 and 28, and uses r2 in the jump_insn.

  In the reload dump, for pseudo r51, I'm seeing this
subreg regs:
   Frame pointer can not be eliminated anymore
   Spilling non-eliminable hard regs: 28 29
 Spilling r51(28)
  Slot 0 regnos (width = 0): 46
  Slot 1 regnos (width = 0): 45

  lra_update_fp2sp_elimination calls spill_pseudos with
  HARD_FRAME_POINTER_REGNUM, and that sets reg_renumber[51] to -1.

  Later down the line, process_bb_lives is called with dead_insn_p=true from
  lra_create_lives_ranges_1 on the relevant BB (#8), and df_get_live_out
  on that BB does not contain 51 (even though previous calls to the same BB 
did).
  
Breakpoint 8, process_bb_lives (bb=0x7fffea570240, curr_point=@0x7fffd838: 
25, dead_insn_p=true) at gcc/gcc/lra-lives.cc:664
664   function_abi last_call_abi = default_function_abi;
(gdb) n
666   reg_live_out = df_get_live_out (bb);
(gdb) 
667   sparseset_clear (pseudos_live);
(gdb) p debug_bitmap(reg_live_out)

first = 0x321c128 current = 0x321c128 indx = 0
0x321c128 next = (nil) prev = (nil) indx = 0
bits = { 28 32 34 43 44 47 48 49 50 }

  process_bb_lives then considers the insn setting 51 (and
  the reload insns LRA created) as dead, and removes them.

BB 8
   Insn 67: point = 31, n_alt = -1
   Insn 114: point = 31, n_alt = 3
   Deleting dead insn 114
deleting insn with uid = 114.
   Insn 28: point = 31, n_alt = 1
   Deleting dead insn 28
deleting insn with uid = 28.
   Insn 113: point = 31, n_alt 

Re: LRA for avr: help with FP and elimination

2023-07-27 Thread Paul Koning via Gcc



> On Jul 27, 2023, at 7:50 AM, Maciej W. Rozycki  wrote:
> 
> On Fri, 14 Jul 2023, Vladimir Makarov via Gcc wrote:
> 
>>>  On the avr, the stack pointer (SP)
>>>   is not used to access stack slots
>> It is very uncommon target then.
> 
> Same with the VAX target.  SP is used for outgoing function arguments, 
> function calls, alloca only.  AP is used for incoming function arguments 
> and is set automatically by hardware at function entry.  FP is used for 
> local variables and is likewise set by hardware at function entry.

While most other targets maintain FP in software, doesn't the same description 
apply to any target that can have a frame pointer.  The frame pointer may be 
used only some of the time (PDP-11) or always (VAX) but when it's used local 
variable references and argument references would go through FP, not SP, right?

paul




Re: LRA for avr: help with FP and elimination

2023-07-27 Thread Maciej W. Rozycki
On Fri, 14 Jul 2023, Vladimir Makarov via Gcc wrote:

> >   On the avr, the stack pointer (SP)
> >is not used to access stack slots
> It is very uncommon target then.

 Same with the VAX target.  SP is used for outgoing function arguments, 
function calls, alloca only.  AP is used for incoming function arguments 
and is set automatically by hardware at function entry.  FP is used for 
local variables and is likewise set by hardware at function entry.  The 
RET instruction sets SP from FP automatically as the first step of the 
function return sequence.

 I guess it'll affect LRA conversion of the target too.

  Maciej


Re: LRA for avr: help with FP and elimination

2023-07-18 Thread Vladimir Makarov via Gcc



On 7/17/23 03:17, senthilkumar.selva...@microchip.com wrote:

On Fri, 2023-07-14 at 09:29 -0400, Vladimir Makarov wrote:

If you send me the preprocessed test, I could start to work on it to fix
the problems.  I think it is hard to fix them right for a person having
a little experience with LRA.



Ok, this is a reduced test case that reproduces the failure.

$ cat case.c
typedef int HItype __attribute__ ((mode (HI)));
HItype
__mulvhi3 (HItype a, HItype b)
{
   HItype w;

   if (__builtin_mul_overflow (a, b, ))
 __builtin_trap ();

   return w;
}

On latest master, this trivial patch turns on LRA for avr
--- gcc/config/avr/avr.cc
+++ gcc/config/avr/avr.cc
@@ -15244,9 +15244,6 @@ avr_float_lib_compare_returns_bool (machine_mode mode, 
enum rtx_code)
  #undef  TARGET_CONVERT_TO_TYPE
  #define TARGET_CONVERT_TO_TYPE avr_convert_to_type
  
-#undef TARGET_LRA_P

-#define TARGET_LRA_P hook_bool_void_false
-
  #undef  TARGET_ADDR_SPACE_SUBSET_P
  #define TARGET_ADDR_SPACE_SUBSET_P avr_addr_space_subset_p
  
Then configuring and building for avr without attempting to build libgcc


$ configure --target=avr --prefix= --enable-languages=c && make all-host 
&& make install-host

And finally to reproduce the failure
$ /bin/avr-gcc -mmcu=avr25 case.c -Os


Thank you.  I've reproduced the bug and started to work on it 
yesterday.  The problem is a bit tricky than I initially thought but I 
believe I'll fix it on this week.





Re: LRA for avr: help with FP and elimination

2023-07-17 Thread SenthilKumar.Selvaraj--- via Gcc
On Fri, 2023-07-14 at 09:29 -0400, Vladimir Makarov wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the 
> content is safe
> 
> On 7/13/23 05:27, SenthilKumar.Selvaraj--- via Gcc wrote:
> > Hi,
> > 
> >I've been spending some (spare) time checking what it would take to
> >make LRA work for the avr target.
> > 
> >Right after I removed the TARGET_LRA_P hook disabling LRA, building
> >libgcc failed with a weird ICE.
> >   On the avr, the stack pointer (SP)
> >is not used to access stack slots
> It is very uncommon target then.
> >   - TARGET_CAN_ELIMINATE returns false
> >if frame_pointer_needed, and TARGET_FRAME_POINTER_REQUIRED returns true
> >if get_frame_size() > 0.
> > 
> >With LRA, however, reload generates
> > 
> > (insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
> >  (const_int 1 [0x1])) [2 %sfp+1 S1 A8])
> >  (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 
> > {movqi_insn_split}
> >   (nil))
> > 
> >and the backend code errors out when it finds SP is being used as a
> >pointer register.
> > 
> >Digging through the RTL dumps, I found the following. For the
> >following insn sequence in *.ira
> > 
> > (insn 189 128 159 7 (set (reg:HI 58 [ b ])
> >  (const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
> >   (nil))
> > (insn 159 189 160 7 (set (subreg:QI (reg:HI 58 [ b ]) 0)
> >  (reg:QI 86 [ a ])) "case.c":7:7 86 {movqi_insn_split}
> >   (nil))
> > (insn 160 159 32 7 (set (subreg:QI (reg:HI 58 [ b ]) 1)
> >  (reg:QI 87 [ a+1 ])) "case.c":7:7 86 {movqi_insn_split}
> >   (nil))
> > 
> >1. For r58, IRA picks R28:R29, which is the frame pointer for avr.
> > 
> >Popping a13(r58,l0)  -- assign reg 28
> > 
> >2. LRA sees the subreg in insn 159 and generates a reload reg
> >(r125).  simplify_subreg_regno (lra-constraints.cc:1810) however
> >bails (returns -1) if the reg involved is FRAME_POINTER_REGNUM and
> >reload isn't completed yet. LRA therefore decides rclass for the
> >pseudo reg is NO_REGS.
> > 
> > 
> > Creating newreg=125 from oldreg=58, assigning class NO_REGS to subreg reg 
> > r125
> >159: r125:HI#0=r86:QI
> > 
> >4. As rclass is NO_REGS, LRA picks an insn alternative that involves 
> > memory.
> >That is my understanding, please correct me if I'm wrong.
> > 
> >  0 Small class reload: reject+=3
> >  0 Non input pseudo reload: reject++
> >  Cycle danger: overall += LRA_MAX_REJECT
> >alt=0,overall=610,losers=1,rld_nregs=1
> >  0 Small class reload: reject+=3
> >  0 Non input pseudo reload: reject++
> >  alt=1: Bad operand -- refuse
> >  0 Non pseudo reload: reject++
> >alt=2,overall=1,losers=0,rld_nregs=0
> >Choosing alt 2 in insn 159:  (0) Qm  (1) rY00 {movqi_insn_split}
> > 
> >5. LRA creates stack slots, and then uses the FP register to access
> >the slots. This is despite r58 already being assigned R28:R29.
> > 
> >6. TARGET_FRAME_POINTER_REQUIRED is never called, and therefore
> >   frame_pointer_needed is not set, despite the creation of stack
> >   slots. TARGET_CAN_ELIMINATE therefore okays elimination of FP to SP,
> >   and this eventually causes the ICE when the avr backend sees SP being
> >   used as a pointer register.
> > 
> >This is the relevant sequence after reload
> > 
> > (insn 189 128 239 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
> >  (const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
> >   (nil))
> > (insn 239 189 159 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
> >  (const_int 1 [0x1])) [2 %sfp+1 S2 A8])
> >  (reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split}
> >   (nil))
> > (insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
> >  (const_int 1 [0x1])) [2 %sfp+1 S1 A8])
> >  (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 
> > {movqi_insn_split}
> >   (nil))
> > (insn 240 159 241 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
> >  (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
> >  (const_int 1 [0x1])) [2 %sfp+1 S2 A8])) "case.c":7:7 101 
> > {*movhi_split}
> >   (nil))
> > (insn 241 240 160 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
> >  (const_int 1 [0x1])) [2 %sfp+1 S2 A8])
> >  (reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split}
> >   (nil))
> > (insn 160 241 242 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
> >  (const_int 2 [0x2])) [2 %sfp+2 S1 A8])
> >  (reg:QI 18 r18 [orig:87 a+1 ] [87])) "case.c":7:7 86 
> > {movqi_insn_split}
> >   (nil))
> > (insn 242 160 33 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
> >  (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
> >  (const_int 1 [0x1])) [2 %sfp+1 S2 

Re: LRA for avr: help with FP and elimination

2023-07-14 Thread Vladimir Makarov via Gcc



On 7/13/23 05:27, SenthilKumar.Selvaraj--- via Gcc wrote:

Hi,

   I've been spending some (spare) time checking what it would take to
   make LRA work for the avr target.

   Right after I removed the TARGET_LRA_P hook disabling LRA, building
   libgcc failed with a weird ICE.



  On the avr, the stack pointer (SP)
   is not used to access stack slots

It is very uncommon target then.

  - TARGET_CAN_ELIMINATE returns false
   if frame_pointer_needed, and TARGET_FRAME_POINTER_REQUIRED returns true
   if get_frame_size() > 0.

   With LRA, however, reload generates

(insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
 (const_int 1 [0x1])) [2 %sfp+1 S1 A8])
 (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 {movqi_insn_split}
  (nil))

   and the backend code errors out when it finds SP is being used as a
   pointer register.

   Digging through the RTL dumps, I found the following. For the
   following insn sequence in *.ira

(insn 189 128 159 7 (set (reg:HI 58 [ b ])
 (const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
  (nil))
(insn 159 189 160 7 (set (subreg:QI (reg:HI 58 [ b ]) 0)
 (reg:QI 86 [ a ])) "case.c":7:7 86 {movqi_insn_split}
  (nil))
(insn 160 159 32 7 (set (subreg:QI (reg:HI 58 [ b ]) 1)
 (reg:QI 87 [ a+1 ])) "case.c":7:7 86 {movqi_insn_split}
  (nil))

   1. For r58, IRA picks R28:R29, which is the frame pointer for avr.

   Popping a13(r58,l0)  -- assign reg 28

   2. LRA sees the subreg in insn 159 and generates a reload reg
   (r125).  simplify_subreg_regno (lra-constraints.cc:1810) however
   bails (returns -1) if the reg involved is FRAME_POINTER_REGNUM and
   reload isn't completed yet. LRA therefore decides rclass for the
   pseudo reg is NO_REGS.


Creating newreg=125 from oldreg=58, assigning class NO_REGS to subreg reg r125
   159: r125:HI#0=r86:QI

   4. As rclass is NO_REGS, LRA picks an insn alternative that involves memory.
   That is my understanding, please correct me if I'm wrong.

 0 Small class reload: reject+=3
 0 Non input pseudo reload: reject++
 Cycle danger: overall += LRA_MAX_REJECT
   alt=0,overall=610,losers=1,rld_nregs=1
 0 Small class reload: reject+=3
 0 Non input pseudo reload: reject++
 alt=1: Bad operand -- refuse
 0 Non pseudo reload: reject++
   alt=2,overall=1,losers=0,rld_nregs=0
 Choosing alt 2 in insn 159:  (0) Qm  (1) rY00 {movqi_insn_split}

   5. LRA creates stack slots, and then uses the FP register to access
   the slots. This is despite r58 already being assigned R28:R29.

   6. TARGET_FRAME_POINTER_REQUIRED is never called, and therefore
  frame_pointer_needed is not set, despite the creation of stack
  slots. TARGET_CAN_ELIMINATE therefore okays elimination of FP to SP,
  and this eventually causes the ICE when the avr backend sees SP being
  used as a pointer register.

   This is the relevant sequence after reload

(insn 189 128 239 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
 (const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
  (nil))
(insn 239 189 159 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
 (const_int 1 [0x1])) [2 %sfp+1 S2 A8])
 (reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split}
  (nil))
(insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
 (const_int 1 [0x1])) [2 %sfp+1 S1 A8])
 (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 {movqi_insn_split}
  (nil))
(insn 240 159 241 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
 (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
 (const_int 1 [0x1])) [2 %sfp+1 S2 A8])) "case.c":7:7 101 
{*movhi_split}
  (nil))
(insn 241 240 160 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
 (const_int 1 [0x1])) [2 %sfp+1 S2 A8])
 (reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split}
  (nil))
(insn 160 241 242 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
 (const_int 2 [0x2])) [2 %sfp+2 S1 A8])
 (reg:QI 18 r18 [orig:87 a+1 ] [87])) "case.c":7:7 86 {movqi_insn_split}
  (nil))
(insn 242 160 33 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
 (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
 (const_int 1 [0x1])) [2 %sfp+1 S2 A8])) "case.c":7:7 101 
{*movhi_split}
  (nil))

   For choices other than FP, simplify_subreg_regno returns the correct part
   of the wider HImode reg, so rclass is not NO_REGS, and things workout fine.

   I checked what classic reload does in the same situation - it picks a
   different register (R25) instead of spilling to a stack slot.


(insn 189 128 159 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
 (const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
  (nil))
(insn 159 189 226 7 (set (reg:QI 25 r25)
 (reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 

LRA for avr: help with FP and elimination

2023-07-13 Thread SenthilKumar.Selvaraj--- via Gcc
Hi,

  I've been spending some (spare) time checking what it would take to
  make LRA work for the avr target.

  Right after I removed the TARGET_LRA_P hook disabling LRA, building
  libgcc failed with a weird ICE. On the avr, the stack pointer (SP)
  is not used to access stack slots - TARGET_CAN_ELIMINATE returns false
  if frame_pointer_needed, and TARGET_FRAME_POINTER_REQUIRED returns true
  if get_frame_size() > 0.

  With LRA, however, reload generates

(insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
(const_int 1 [0x1])) [2 %sfp+1 S1 A8])
(reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 {movqi_insn_split}
 (nil))

  and the backend code errors out when it finds SP is being used as a
  pointer register.

  Digging through the RTL dumps, I found the following. For the
  following insn sequence in *.ira

(insn 189 128 159 7 (set (reg:HI 58 [ b ])
(const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
 (nil))
(insn 159 189 160 7 (set (subreg:QI (reg:HI 58 [ b ]) 0)
(reg:QI 86 [ a ])) "case.c":7:7 86 {movqi_insn_split}
 (nil))
(insn 160 159 32 7 (set (subreg:QI (reg:HI 58 [ b ]) 1)
(reg:QI 87 [ a+1 ])) "case.c":7:7 86 {movqi_insn_split}
 (nil))

  1. For r58, IRA picks R28:R29, which is the frame pointer for avr.

  Popping a13(r58,l0)  -- assign reg 28

  2. LRA sees the subreg in insn 159 and generates a reload reg
  (r125).  simplify_subreg_regno (lra-constraints.cc:1810) however
  bails (returns -1) if the reg involved is FRAME_POINTER_REGNUM and
  reload isn't completed yet. LRA therefore decides rclass for the
  pseudo reg is NO_REGS.


Creating newreg=125 from oldreg=58, assigning class NO_REGS to subreg reg r125
  159: r125:HI#0=r86:QI

  4. As rclass is NO_REGS, LRA picks an insn alternative that involves memory.
  That is my understanding, please correct me if I'm wrong.

0 Small class reload: reject+=3
0 Non input pseudo reload: reject++
Cycle danger: overall += LRA_MAX_REJECT
  alt=0,overall=610,losers=1,rld_nregs=1
0 Small class reload: reject+=3
0 Non input pseudo reload: reject++
alt=1: Bad operand -- refuse
0 Non pseudo reload: reject++
  alt=2,overall=1,losers=0,rld_nregs=0
 Choosing alt 2 in insn 159:  (0) Qm  (1) rY00 {movqi_insn_split}

  5. LRA creates stack slots, and then uses the FP register to access
  the slots. This is despite r58 already being assigned R28:R29. 

  6. TARGET_FRAME_POINTER_REQUIRED is never called, and therefore
 frame_pointer_needed is not set, despite the creation of stack
 slots. TARGET_CAN_ELIMINATE therefore okays elimination of FP to SP,
 and this eventually causes the ICE when the avr backend sees SP being
 used as a pointer register.

  This is the relevant sequence after reload

(insn 189 128 239 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
(const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
 (nil))
(insn 239 189 159 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
(const_int 1 [0x1])) [2 %sfp+1 S2 A8])
(reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split}
 (nil))
(insn 159 239 240 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
(const_int 1 [0x1])) [2 %sfp+1 S1 A8])
(reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 {movqi_insn_split}
 (nil))
(insn 240 159 241 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
(mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
(const_int 1 [0x1])) [2 %sfp+1 S2 A8])) "case.c":7:7 101 
{*movhi_split}
 (nil))
(insn 241 240 160 7 (set (mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
(const_int 1 [0x1])) [2 %sfp+1 S2 A8])
(reg:HI 28 r28 [orig:58 b ] [58])) "case.c":7:7 101 {*movhi_split}
 (nil))
(insn 160 241 242 7 (set (mem/c:QI (plus:HI (reg/f:HI 32 __SP_L__)
(const_int 2 [0x2])) [2 %sfp+2 S1 A8])
(reg:QI 18 r18 [orig:87 a+1 ] [87])) "case.c":7:7 86 {movqi_insn_split}
 (nil))
(insn 242 160 33 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
(mem/c:HI (plus:HI (reg/f:HI 32 __SP_L__)
(const_int 1 [0x1])) [2 %sfp+1 S2 A8])) "case.c":7:7 101 
{*movhi_split}
 (nil))

  For choices other than FP, simplify_subreg_regno returns the correct part
  of the wider HImode reg, so rclass is not NO_REGS, and things workout fine.

  I checked what classic reload does in the same situation - it picks a
  different register (R25) instead of spilling to a stack slot.


(insn 189 128 159 7 (set (reg:HI 28 r28 [orig:58 b ] [58])
(const_int 0 [0])) "case.c":7:7 101 {*movhi_split}
 (nil))
(insn 159 189 226 7 (set (reg:QI 25 r25)
(reg:QI 24 r24 [orig:86 a ] [86])) "case.c":7:7 86 {movqi_insn_split}
 (nil))
(insn 226 159 160 7 (set (reg:QI 28 r28)
(reg:QI 25 r25)) "case.c":7:7 86 {movqi_insn_split}
 (nil))
(insn 160 226 227 7 (set (reg:QI 25