Re: Special Memory Constraint [was Re: Indirect memory addresses vs. lra]

2019-08-20 Thread Richard Biener
On Tue, Aug 20, 2019 at 9:07 AM John Darrington
 wrote:
>
> On Tue, Aug 20, 2019 at 08:56:39AM +0200, Richard Biener wrote:
>
>  > Most of these suggestions involve adding some sort of virtual registers
>  > So I hacked the machine description to add two new registers Z1 and Z2
>  > with the same mode as X and Y.
>  >
>  > Obviously the assembler balks at this.  However the compiler still
>  > ICEs at the same place as before.
>  >
>  > So this suggests that our original diagnosis, viz: there are not enough
>  > address registers was not accurate, and in fact there is some other
>  > problem?
>
>  That sounds likely.  Given you have indirect addressing you could
>  simulate N virtual regs by placing them in a virtual reg table in memory
>  and accessed via a fixed address register (assuming all instructions
>  that would need an address reg also can take that indirect from memory).
>
> That was my plan.  Accordingly, extending the md to provide N additional
> regs (N currently = 2) was the first step.  Having doubled the number
> of available address registers, I had expected this would fix most of the
> ICEs (but cause a lot of assembler errors).
>
> However it hasn't eliminated any ICEs.  lra is still complaining
> "unable to find a register to spill" So the plan seems to have fallen
> over at the first hurdle.  Why can it still not spill registers despite
> having a lot more of them?

You really have to sit down and trace the LRA code with a debugger
to tell...  unfortunately the dumps aren't verbose enough to tell.
Usually after spilling the insn constraints can still not be satisfied,
the main question is usually why.

Richard.

> J'


Re: Special Memory Constraint [was Re: Indirect memory addresses vs. lra]

2019-08-20 Thread John Darrington
On Tue, Aug 20, 2019 at 08:56:39AM +0200, Richard Biener wrote:

 > Most of these suggestions involve adding some sort of virtual registers
 > So I hacked the machine description to add two new registers Z1 and Z2
 > with the same mode as X and Y.
 >
 > Obviously the assembler balks at this.  However the compiler still
 > ICEs at the same place as before.
 >
 > So this suggests that our original diagnosis, viz: there are not enough
 > address registers was not accurate, and in fact there is some other
 > problem?
 
 That sounds likely.  Given you have indirect addressing you could
 simulate N virtual regs by placing them in a virtual reg table in memory
 and accessed via a fixed address register (assuming all instructions
 that would need an address reg also can take that indirect from memory).
 
That was my plan.  Accordingly, extending the md to provide N additional
regs (N currently = 2) was the first step.  Having doubled the number
of available address registers, I had expected this would fix most of the 
ICEs (but cause a lot of assembler errors).

However it hasn't eliminated any ICEs.  lra is still complaining
"unable to find a register to spill".  So the plan seems to have fallen
over at the first hurdle.  Why can it still not spill registers despite
having a lot more of them?

J'
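
The virtual register table discussed above can be pictured with a small
standalone C model.  This is only a sketch and not GCC or port code: the
names vreg_table, vreg_base, vreg_read, vreg_write and load_via_vreg are
invented for illustration, and N_VIRTUAL_REGS = 2 simply mirrors the two
extra registers (Z1 and Z2) mentioned in the thread.  The pointer
vreg_base stands in for the fixed address register that would always
point at the table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: two virtual address registers, like Z1 and Z2. */
enum { N_VIRTUAL_REGS = 2 };

/* The virtual registers live in ordinary memory. */
static uint32_t vreg_table[N_VIRTUAL_REGS];

/* Stand-in for the fixed address register that points at the table. */
static uint32_t *const vreg_base = vreg_table;

/* Store an address into virtual register IDX. */
static void vreg_write(unsigned idx, uint32_t addr)
{
    assert(idx < N_VIRTUAL_REGS);
    vreg_base[idx] = addr;
}

/* Fetch the address held in virtual register IDX. */
static uint32_t vreg_read(unsigned idx)
{
    assert(idx < N_VIRTUAL_REGS);
    return vreg_base[idx];
}

/* Load a byte through a virtual register: the address is first fetched
   from the table, which is the memory-indirect step. */
static uint8_t load_via_vreg(const uint8_t *memory, unsigned idx)
{
    return memory[vreg_read(idx)];
}
```

Every access through a virtual register costs one extra memory read, so
this only helps if each instruction that needs an address register can
also take its address indirectly from memory, as Richard notes.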


Re: Special Memory Constraint [was Re: Indirect memory addresses vs. lra]

2019-08-20 Thread Richard Biener
On Mon, Aug 19, 2019 at 8:06 PM John Darrington
 wrote:
>
> On Mon, Aug 19, 2019 at 10:07:11AM -0500, Segher Boessenkool wrote:
>
>  > ? As I remember there were a few other ideas from Richard Biener and
>  > Segher Boessenkool.? I also proposed to add a new address register 
> which
>  > will be always a fixed stack memory slot at the end. Unfortunately I am
>  > not familiar with the target and the port to say in details how to do
>  > it.? But I think it is worth to try.
>
>  The m68hc11 port used the fake Z register approach, and I believe it had
>  some special machine pass to get rid of it right before assembler output.
>
>  (r171302 is when it was removed -- last version was
>  
> https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/m68hc11/m68hc11.c;h=1e414102c3f1fed985e4fb8db7954342e965190b;hb=bae8bb65d842d7ffefe990c1f0ac004491f3c105#l4061
>  for the machine reorg stuff).
>
>  No idea how well it works...  But it's only needed if you are forced to
>  have a frame pointer IIUC?
>
>
>  Segher
>
>
> Most of these suggestions involve adding some sort of virtual registers
> So I hacked the machine description to add two new registers Z1 and Z2
> with the same mode as X and Y.
>
> Obviously the assembler balks at this.  However the compiler still
> ICEs at the same place as before.
>
> So this suggests that our original diagnosis, viz: there are not enough
> address registers was not accurate, and in fact there is some other
> problem?

That sounds likely.  Given you have indirect addressing you could
simulate N virtual regs by placing them in a virtual reg table in memory
and accessed via a fixed address register (assuming all instructions
that would need an address reg also can take that indirect from memory).

Richard.

> J'
>
> --
> Avoid eavesdropping.  Send strong encrypted email.
> PGP Public key ID: 1024D/2DE827B3
> fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
> See http://sks-keyservers.net or any PGP keyserver for public key.
>


Re: Special Memory Constraint [was Re: Indirect memory addresses vs. lra]

2019-08-19 Thread John Darrington
On Mon, Aug 19, 2019 at 10:07:11AM -0500, Segher Boessenkool wrote:

 > ? As I remember there were a few other ideas from Richard Biener and 
 > Segher Boessenkool.? I also proposed to add a new address register which 
 > will be always a fixed stack memory slot at the end. Unfortunately I am 
 > not familiar with the target and the port to say in details how to do 
 > it.? But I think it is worth to try.
 
 The m68hc11 port used the fake Z register approach, and I believe it had
 some special machine pass to get rid of it right before assembler output.
 
 (r171302 is when it was removed -- last version was
 
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/m68hc11/m68hc11.c;h=1e414102c3f1fed985e4fb8db7954342e965190b;hb=bae8bb65d842d7ffefe990c1f0ac004491f3c105#l4061
 for the machine reorg stuff).
 
 No idea how well it works...  But it's only needed if you are forced to
 have a frame pointer IIUC?
 
 
 Segher


Most of these suggestions involve adding some sort of virtual registers.
So I hacked the machine description to add two new registers Z1 and Z2
with the same mode as X and Y.

Obviously the assembler balks at this.  However the compiler still
ICEs at the same place as before.

So this suggests that our original diagnosis, viz. that there are not
enough address registers, was not accurate, and in fact there is some
other problem?

J'




Re: Special Memory Constraint [was Re: Indirect memory addresses vs. lra]

2019-08-19 Thread Segher Boessenkool
On Mon, Aug 19, 2019 at 09:14:22AM -0400, Vladimir Makarov wrote:
> On 2019-08-19 3:35 a.m., John Darrington wrote:
> >On Fri, Aug 16, 2019 at 10:50:13AM -0400, Vladimir Makarov wrote:
> >  No I meant something like that
> >  
> >  (define_special_memory_constraint "a" ...)
> >  (define_predicate "my_special_predicate" ...
> > 
> >   {
> > if (lra_in_progress_p)
> >   return REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER && 
> >   reg_renumber[REGNO(op)] < 0;
> > return true if memory with sp addressing;
> >  })
> >  
> >  I think LRA spills pseudo-register and it will be memory addressed 
> >  by sp
> >  at the end of LRA.
> >
> >What I've done is this:
> >
> >(define_predicate "my_special_predicate"
> > (match_operand 0 "memory_operand")
> >  {
> >debug_rtx (op);
> >gcc_assert (MEM_P (op));
> >op = XEXP (op, 0);
> >if (GET_CODE (op) == PLUS)
> >  op = XEXP (op, 0);
> >
> >if (lra_in_progress)
> >  {
> >fprintf (stderr, "%s:%d\n", __FILE__, __LINE__);
> >return REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER && 
> >reg_renumber[REGNO(op)] < 0;
> >  }
> >
> >
> >if (REG_P (op))
> >  {
> >int regno = REGNO (op);
> >return (regno == 10); // register is the stack pointer
> >  }
> >
> >return true;
> >  })
> >
> >  (and many variations)  Unfortunately, any moderately complicated input
> >  still results in a (mem (reg) ) insn repeatedly entering the
> >  lra_in_progress case and returning false, and eventually terminating with
> >  
> >  "internal compiler error: maximum number of generated reload insns per 
> >  insn achieved (90)"
> >
> >
> >Any other ideas?
>   As I remember there were a few other ideas from Richard Biener and 
> Segher Boessenkool.  I also proposed to add a new address register which 
> will be always a fixed stack memory slot at the end. Unfortunately I am 
> not familiar with the target and the port to say in details how to do 
> it.  But I think it is worth to try.

The m68hc11 port used the fake Z register approach, and I believe it had
some special machine pass to get rid of it right before assembler output.

(r171302 is when it was removed -- last version was
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/m68hc11/m68hc11.c;h=1e414102c3f1fed985e4fb8db7954342e965190b;hb=bae8bb65d842d7ffefe990c1f0ac004491f3c105#l4061
for the machine reorg stuff).

No idea how well it works...  But it's only needed if you are forced to
have a frame pointer IIUC?


Segher


Re: Special Memory Constraint [was Re: Indirect memory addresses vs. lra]

2019-08-19 Thread Vladimir Makarov



On 2019-08-19 3:35 a.m., John Darrington wrote:

On Fri, Aug 16, 2019 at 10:50:13AM -0400, Vladimir Makarov wrote:
  
  
  No I meant something like that
  
  (define_special_memory_constraint "a" ...)

  (define_predicate "my_special_predicate" ...

   {
 if (lra_in_progress_p)
   return REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER && 
reg_renumber[REGNO(op)] < 0;
 return true if memory with sp addressing;
  })
  
  I think LRA spills pseudo-register and it will be memory addressed by sp

  at the end of LRA.

What I've done is this:

(define_predicate "my_special_predicate"
(match_operand 0 "memory_operand")
  {
debug_rtx (op);
gcc_assert (MEM_P (op));
op = XEXP (op, 0);
if (GET_CODE (op) == PLUS)
  op = XEXP (op, 0);

if (lra_in_progress)
  {
fprintf (stderr, "%s:%d\n", __FILE__, __LINE__);
return REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER && 
reg_renumber[REGNO(op)] < 0;
  }


if (REG_P (op))
  {
int regno = REGNO (op);
return (regno == 10); // register is the stack pointer
  }

return true;
  })

  (and many variations)  Unfortunately, any moderately complicated input
  still results in a (mem (reg) ) insn repeatedly entering the
  lra_in_progress case and returning false, and eventually terminating with
  
  "internal compiler error: maximum number of generated reload insns per insn achieved (90)"



Any other ideas?
  As I remember there were a few other ideas from Richard Biener and 
Segher Boessenkool.  I also proposed to add a new address register which 
will be always a fixed stack memory slot at the end. Unfortunately I am 
not familiar with the target and the port to say in details how to do 
it.  But I think it is worth to try.


Re: Special Memory Constraint [was Re: Indirect memory addresses vs. lra]

2019-08-19 Thread John Darrington
On Fri, Aug 16, 2019 at 10:50:13AM -0400, Vladimir Makarov wrote:
 
 
 No I meant something like that
 
 (define_special_memory_constraint "a" ...)
 (define_predicate "my_special_predicate" ...

  {
if (lra_in_progress_p)
  return REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER && 
reg_renumber[REGNO(op)] < 0;
return true if memory with sp addressing;
 })
 
 I think LRA spills pseudo-register and it will be memory addressed by sp
 at the end of LRA.

What I've done is this:

(define_predicate "my_special_predicate"
  (match_operand 0 "memory_operand")
{
  debug_rtx (op);
  gcc_assert (MEM_P (op));
  op = XEXP (op, 0);
  if (GET_CODE (op) == PLUS)
    op = XEXP (op, 0);

  if (lra_in_progress)
    {
      fprintf (stderr, "%s:%d\n", __FILE__, __LINE__);
      return (REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER
              && reg_renumber[REGNO (op)] < 0);
    }

  if (REG_P (op))
    {
      int regno = REGNO (op);
      return (regno == 10); // register is the stack pointer
    }

  return true;
})

 (and many variations)  Unfortunately, any moderately complicated input
 still results in a (mem (reg)) insn repeatedly entering the
 lra_in_progress case and returning false, and eventually terminating with
 
 "internal compiler error: maximum number of generated reload insns per insn achieved (90)"


Any other ideas?

J'
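
The decision the predicate makes on the base register of the address
can be modelled outside GCC in plain C.  This is a sketch under stated
assumptions, not the port's code: FIRST_PSEUDO, MAX_REGS, renumber,
lra_running and base_reg_ok are made-up stand-ins for
FIRST_PSEUDO_REGISTER, reg_renumber and lra_in_progress, and the value
16 for the first pseudo is arbitrary.  Only the stack pointer number
(10) comes from the code quoted above.

```c
#include <assert.h>
#include <stdbool.h>

enum {
    STACK_POINTER = 10,  /* from the thread: regno 10 is the stack pointer */
    FIRST_PSEUDO  = 16,  /* hypothetical stand-in for FIRST_PSEUDO_REGISTER */
    MAX_REGS      = 64   /* hypothetical size of the model's register file */
};

/* Stand-in for reg_renumber: a negative entry means the pseudo has no
   hard register, i.e. it has been spilled. */
static int renumber[MAX_REGS];

/* Stand-in for lra_in_progress. */
static bool lra_running;

/* Model of the test applied to the base register of the address. */
static bool base_reg_ok(int regno)
{
    assert(regno >= 0 && regno < MAX_REGS);
    if (lra_running)
        /* During LRA only a spilled pseudo is accepted. */
        return regno >= FIRST_PSEUDO && renumber[regno] < 0;
    /* Outside LRA only stack-pointer-relative addresses are accepted. */
    return regno == STACK_POINTER;
}
```

In this model a pseudo that has already been given a hard register is
rejected while LRA runs, which matches the observation above that a
(mem (reg)) operand keeps entering the lra_in_progress case and
returning false.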


Re: Special Memory Constraint [was Re: Indirect memory addresses vs. lra]

2019-08-16 Thread Vladimir Makarov



On 2019-08-16 7:23 a.m., John Darrington wrote:

On Thu, Aug 15, 2019 at 02:23:45PM -0400, Vladimir Makarov wrote:

  > I tried this solution earlier.  But unfortunately it makes things 
worse.  What happens is it libgcc cannot
  > even be built -- ICEs occur on a memory from  address reg insn such as:
  > (insn 117 2981 3697 5 (set (mem/f:PSI (plus:PSI (reg:PSI 1309)
  >  (const_int 102 [0x66])) [3 fs_129(D)->pc+0 S4 A8])
  >  (reg:PSI 1310)) 
"/home/jmd/Source/GCC2/libgcc/unwind-dw2.c":977:9 96 {movpsi}
  >
  I see.  Then for the insn, you could try to create a pattern
  "memory,special memory constraint".  The special memory constraint
  should satisfy only spilled pseudo (pseudo with reg_renumber == -1).  I
  believe lra-constraints.c can spill the pseudo and the end you will have
  mem[disp1 + r8|r9|sp] = mem[disp1+sp].

You mean something like this:

(define_special_memory_constraint "a"
  "My special memory constraint"
  (match_operand 0 "my_special_predicate")
)

(define_predicate "my_special_predicate"
(match_operand 0 "memory_operand")
  {
   debug_rtx (op);
   if (MEM_P (op))
   {
 op = XEXP (op, 0);
 if (GET_CODE (op) == PLUS)
   {
op = XEXP (op, 0);
if (REG_P (op))
  {
fprintf (stderr, "Reg number is %d\n", REGNO (op));
if (REGNO (op) >= 0)
  return false;
  }
   }
   }
   return true;
})


No I meant something like that

(define_special_memory_constraint "a" ...)
(define_predicate "my_special_predicate" ...
{
  if (lra_in_progress_p)
    return REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER &&
           reg_renumber[REGNO(op)] < 0;
  return true if memory with sp addressing;
})

I think LRA spills pseudo-register and it will be memory addressed by sp 
at the end of LRA.



When I use this I get lots of the following ICEs

  "internal compiler error: maximum number of generated reload insns per insn achieved (90)"

It seems logical to me that this would happen since the constraint is not going
to match any operand with resolved registers.  Thus it will continually reload.

... which makes me think I've probably misunderstood what you are saying.

J'




Special Memory Constraint [was Re: Indirect memory addresses vs. lra]

2019-08-16 Thread John Darrington
On Thu, Aug 15, 2019 at 02:23:45PM -0400, Vladimir Makarov wrote:

 > I tried this solution earlier.  But unfortunately it makes things worse. 
 What happens is it libgcc cannot
 > even be built -- ICEs occur on a memory from  address reg insn such as:
 > (insn 117 2981 3697 5 (set (mem/f:PSI (plus:PSI (reg:PSI 1309)
 >  (const_int 102 [0x66])) [3 fs_129(D)->pc+0 S4 A8])
 >  (reg:PSI 1310)) 
"/home/jmd/Source/GCC2/libgcc/unwind-dw2.c":977:9 96 {movpsi}
 > 
 I see.  Then for the insn, you could try to create a pattern
 "memory,special memory constraint".  The special memory constraint
 should satisfy only spilled pseudo (pseudo with reg_renumber == -1).  I
 believe lra-constraints.c can spill the pseudo and the end you will have
 mem[disp1 + r8|r9|sp] = mem[disp1+sp].

You mean something like this:

(define_special_memory_constraint "a"
  "My special memory constraint"
  (match_operand 0 "my_special_predicate")
)

(define_predicate "my_special_predicate"
  (match_operand 0 "memory_operand")
{
  debug_rtx (op);
  if (MEM_P (op))
    {
      op = XEXP (op, 0);
      if (GET_CODE (op) == PLUS)
        {
          op = XEXP (op, 0);
          if (REG_P (op))
            {
              fprintf (stderr, "Reg number is %d\n", REGNO (op));
              if (REGNO (op) >= 0)
                return false;
            }
        }
    }
  return true;
})

When I use this I get lots of the following ICEs

 "internal compiler error: maximum number of generated reload insns per insn achieved (90)"

It seems logical to me that this would happen since the constraint is not going
to match any operand with resolved registers.  Thus it will continually reload.

... which makes me think I've probably misunderstood what you are saying.

J'





Re: Indirect memory addresses vs. lra

2019-08-15 Thread Segher Boessenkool
On Thu, Aug 15, 2019 at 02:30:19PM -0400, Vladimir Makarov wrote:
> >Couldn't we spill the frame pointer? Basically we should be able to 
> >compute the first address into a reg, spill that, do the second (both 
> >could require the frame pointer), spill the frame pointer, reload the 
> >first computed address from the stack, execute the insn and then reload 
> >the frame pointer.
> >
> >Maybe the frame pointer can also be implemented 'virually' in an index 
> >register that you keep updated so that sp + reg
> >Is the FP. Or frame accesses can use a
> >Stack slot as FP and the indirect memory
> >Addressing... (is there an indirect lea?)
> >
> Yes, it could be a solution.  It just needs some target maintainer 
> creativity.  There are a lot of things (tricks) can be done in 
> machine-dependent code which would not require RA changes.

You can even go as far as not having the hard frame pointer be a machine
register at all.  In RTL it will still be a reg, but that doesn't mean
the machine code you emit should be like that; you can use a special
fixed memory location for it, for example.


Segher


Re: Indirect memory addresses vs. lra

2019-08-15 Thread Vladimir Makarov

On 8/15/19 12:38 PM, Richard Biener wrote:

On August 15, 2019 6:29:13 PM GMT+02:00, Vladimir Makarov  
wrote:

On 8/10/19 2:05 AM, John Darrington wrote:

On Fri, Aug 09, 2019 at 01:34:36PM -0400, Vladimir Makarov wrote:
   
   If you provide LRA dump for such test (it is better to use
   -fira-verbose=15 to output full RA info into stderr), I probably could
   say more.

I've attached such a dump (generated from
gcc/testsuite/gcc.c-torture/compile/pr53410-2.c).

   The less regs the architecture has, the easier to run into such error
   message if something described wrong in the back-end.  I see your
   architecture is 16-bit micro-controller with only 8 regs, some of them is
   specialized.  So your architecture is really register constrained.

That's not quite correct.  It is a 24-bit micro-controller (the address
space is 24 bits wide).  There are 2 address registers (plus stack
pointer and program counter) and there are 8 general purpose data
registers (of differing sizes).
   


J'


Thank you for providing the sources.  It helped me to understand what
is
going on.  So the test crashes on

/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c:
In function ‘f1’:
/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c:10:1:
error: unable to find a register to spill
/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c:10:1:
error: this is the insn:
(insn 14 49 15 2 (set (mem:SI (plus:PSI (reg/f:PSI 40 [34])
 (const_int 32 [0x20])) [2  S4 A64])
(mem:SI (reg:PSI 41) [2 *p_5(D)+0 S4 A8]))
"/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c":9:9
95 {*movsi}
  (expr_list:REG_DEAD (reg:PSI 41)
 (expr_list:REG_DEAD (reg/f:PSI 40 [34])
 (nil

Your target has only 2 non-fixed addr registers (r8, r9).  One (r9) is
defined as a hard reg pointer pointer. Honestly, I never saw a target
with such register constraints.

-O0 assumes -fno-omit-frame-pointer.  So in -O0 mode we have only *one*
free addr reg for insn which requires *2* of them.  That is why the GCC
port crashes on this test.  If you add -fomit-frame-pointer, the test
succeeds.

But even if use -fomit-frame-pointer,  it is not guaranteed that hard
reg pointer will be substituted by stack pointer.  There are many cases
where it is not possible (e.g. in case of alloca usage).

So what can be done, imho.  The simplest solution would be preventing
insns with more one memory operand.  The more difficult solution would
be permitting two memory one with address pseudo and another one with
stack pointer.

Couldn't we spill the frame pointer? Basically we should be able to compute the 
first address into a reg, spill that, do the second (both could require the 
frame pointer), spill the frame pointer, reload the first computed address from 
the stack, execute the insn and then reload the frame pointer.

Maybe the frame pointer can also be implemented 'virtually' in an index register
that you keep updated so that sp + reg is the FP. Or frame accesses can use a
stack slot as FP and the indirect memory addressing... (is there an indirect lea?)

Yes, it could be a solution.  It just needs some target maintainer 
creativity.  There are a lot of things (tricks) can be done in 
machine-dependent code which would not require RA changes.
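
The spill sequence Richard outlines above can be walked through with a
toy machine in C.  This is a hedged sketch, not a description of what
LRA or the port actually emits: struct machine, its fields and
move_mem_to_mem are invented for illustration, and the model assumes one
free address register (x), the frame pointer register (fp) and two
stack slots for spilling.

```c
#include <assert.h>
#include <stdint.h>

enum { MEM_SIZE = 64 };  /* hypothetical size of the toy memory */

struct machine {
    uint32_t x;              /* the single free address register */
    uint32_t fp;             /* the frame pointer register */
    uint32_t slot[2];        /* two stack slots used for spilling */
    uint8_t  mem[MEM_SIZE];  /* toy memory */
};

/* Copy one byte between two frame-relative locations while only ever
   holding two address registers at the same time. */
static void move_mem_to_mem(struct machine *m, uint32_t dst_off,
                            uint32_t src_off)
{
    m->x = m->fp + dst_off;   /* compute the first address into a reg */
    m->slot[0] = m->x;        /* spill that address */
    m->x = m->fp + src_off;   /* compute the second address */
    m->slot[1] = m->fp;       /* spill the frame pointer */
    m->fp = m->slot[0];       /* reload the first address from the stack */
    assert(m->fp < MEM_SIZE && m->x < MEM_SIZE);
    m->mem[m->fp] = m->mem[m->x];  /* the insn needing two address regs */
    m->fp = m->slot[1];       /* reload the frame pointer */
}
```

The frame pointer holds a different value for the duration of the
memory-to-memory insn, so in this model nothing between the spill and
the reload may rely on it.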


Re: Indirect memory addresses vs. lra

2019-08-15 Thread Vladimir Makarov

On 8/15/19 1:35 PM, John Darrington wrote:

On Thu, Aug 15, 2019 at 12:29:13PM -0400, Vladimir Makarov wrote:


  Thank you for providing the sources.  It helped me to understand what is
  going on.  So the test crashes on
  
  /home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c: In function ‘f1’:

  
/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c:10:1: 
error: unable to find a register to spill
  
/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c:10:1: 
error: this is the insn:
  (insn 14 49 15 2 (set (mem:SI (plus:PSI (reg/f:PSI 40 [34])
  (const_int 32 [0x20])) [2  S4 A64])
  (mem:SI (reg:PSI 41) [2 *p_5(D)+0 S4 A8])) 
"/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c":9:9 95 
{*movsi}
   (expr_list:REG_DEAD (reg:PSI 41)
  (expr_list:REG_DEAD (reg/f:PSI 40 [34])
  (nil

Thanks for taking a look.
  
  Your target has only 2 non-fixed addr registers (r8, r9).  One (r9) is defined as a hard reg pointer pointer.


That is correct.

  Honestly, I never saw a target with such register constraints.

My recollection is that MC68HC11 was the same.
  
  So what can be done, imho.  The simplest solution would be preventing insns with more one memory operand.


I tried this solution earlier.  But unfortunately it makes things worse.  What 
happens is it libgcc cannot
even be built -- ICEs occur on a memory from  address reg insn such as:
  
(insn 117 2981 3697 5 (set (mem/f:PSI (plus:PSI (reg:PSI 1309)

 (const_int 102 [0x66])) [3 fs_129(D)->pc+0 S4 A8])
(reg:PSI 1310)) 
"/home/jmd/Source/GCC2/libgcc/unwind-dw2.c":977:9 96 {movpsi}

I see.  Then for the insn, you could try to create a pattern 
"memory,special memory constraint".  The special memory constraint 
should satisfy only spilled pseudo (pseudo with reg_renumber == -1).  I 
believe lra-constraints.c can spill the pseudo and the end you will have 
mem[disp1 + r8|r9|sp] = mem[disp1+sp].


It might work.  If it is not, we could modify LRA to do this.

Another solution would be adding unexisting register Z and for mem:psi 
[psi:r] = Z you could emit an assembler insn : mem[psi:r] = a stack slot 
corresponding Z.




Re: Indirect memory addresses vs. lra

2019-08-15 Thread John Darrington
On Thu, Aug 15, 2019 at 06:38:30PM +0200, Richard Biener wrote:

   Couldn't we spill the frame pointer? Basically we should be able to
   compute the first address into a reg, spill that, do the second
   (both could require the frame pointer), spill the frame pointer,
   reload the first computed address from the stack, execute the insn
   and then reload the frame pointer. 
 
 Maybe the frame pointer can also be implemented 'virually' in an index 
register that you keep updated so that sp + reg
 Is the FP. Or frame accesses can use a  Stack slot as FP and the indirect 
memory 
 Addressing... (is there an indirect lea?)

Yes.  lea x, [4,x] is a valid instruction.

J'




Re: Indirect memory addresses vs. lra

2019-08-15 Thread John Darrington
On Thu, Aug 15, 2019 at 12:29:13PM -0400, Vladimir Makarov wrote:


 Thank you for providing the sources.  It helped me to understand what is
 going on.  So the test crashes on
 
 /home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c: In function ‘f1’:
 
/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c:10:1: 
error: unable to find a register to spill
 
/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c:10:1: 
error: this is the insn:
 (insn 14 49 15 2 (set (mem:SI (plus:PSI (reg/f:PSI 40 [34])
 (const_int 32 [0x20])) [2  S4 A64])
 (mem:SI (reg:PSI 41) [2 *p_5(D)+0 S4 A8])) 
"/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c":9:9 95 
{*movsi}
  (expr_list:REG_DEAD (reg:PSI 41)
 (expr_list:REG_DEAD (reg/f:PSI 40 [34])
 (nil

Thanks for taking a look.
 
 Your target has only 2 non-fixed addr registers (r8, r9).  One (r9) is 
defined as a hard reg pointer pointer.

That is correct.

 Honestly, I never saw a target with such register constraints.

My recollection is that MC68HC11 was the same.
 
 So what can be done, imho.  The simplest solution would be preventing 
insns with more one memory operand.

I tried this solution earlier.  But unfortunately it makes things worse.  What 
happens is it libgcc cannot
even be built -- ICEs occur on a memory from  address reg insn such as:
 
(insn 117 2981 3697 5 (set (mem/f:PSI (plus:PSI (reg:PSI 1309)
(const_int 102 [0x66])) [3 fs_129(D)->pc+0 S4 A8])
(reg:PSI 1310)) 
"/home/jmd/Source/GCC2/libgcc/unwind-dw2.c":977:9 96 {movpsi}


J'
 




Re: Indirect memory addresses vs. lra

2019-08-15 Thread Richard Biener
On August 15, 2019 6:29:13 PM GMT+02:00, Vladimir Makarov  
wrote:
>On 8/10/19 2:05 AM, John Darrington wrote:
>> On Fri, Aug 09, 2019 at 01:34:36PM -0400, Vladimir Makarov wrote:
>>   
>>   If you provide LRA dump for such test (it is better to use
>>   -fira-verbose=15 to output full RA info into stderr), I
>probably could
>>   say more.
>>
>> I've attached such a dump (generated from
>gcc/testsuite/gcc.c-torture/compile/pr53410-2.c).
>>   
>>   The less regs the architecture has, thoke easier to run into
>such error
>>   message if something described wrong in the back-end.?? I see
>your
>>   architecture is 16-bit micro-controller with only 8 regs, some
>of them is
>>   specialized.?? So your architecture is really register
>constrained.
>>
>> That's not quite correct.  It is a 24-bit micro-controller (the
>address
>> space is 24 bits wide).  There are 2 address registers (plus stack
>> pointer and program counter) and there are 8 general purpose data
>> registers (of differing sizes).
>>   
>>
>> J'
>>
>Thank you for providing the sources.  It helped me to understand what
>is 
>going on.  So the test crashes on
>
>/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c:
>In function ‘f1’:
>/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c:10:1:
>error: unable to find a register to spill
>/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c:10:1:
>error: this is the insn:
>(insn 14 49 15 2 (set (mem:SI (plus:PSI (reg/f:PSI 40 [34])
> (const_int 32 [0x20])) [2  S4 A64])
>(mem:SI (reg:PSI 41) [2 *p_5(D)+0 S4 A8]))
>"/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c":9:9
>95 {*movsi}
>  (expr_list:REG_DEAD (reg:PSI 41)
> (expr_list:REG_DEAD (reg/f:PSI 40 [34])
> (nil
>
>Your target has only 2 non-fixed addr registers (r8, r9).  One (r9) is
>defined as a hard reg pointer pointer. Honestly, I never saw a target
>with such register constraints.
>
>-O0 assumes -fno-omit-frame-pointer.  So in -O0 mode we have only *one*
>free addr reg for insn which requires *2* of them.  That is why the GCC
>port crashes on this test.  If you add -fomit-frame-pointer, the test
>succeeds.
>
>But even if use -fomit-frame-pointer,  it is not guaranteed that hard
>reg pointer will be substituted by stack pointer.  There are many cases
>where it is not possible (e.g. in case of alloca usage).
>
>So what can be done, imho.  The simplest solution would be preventing
>insns with more one memory operand.  The more difficult solution would
>be permitting two memory one with address pseudo and another one with
>stack pointer.

Couldn't we spill the frame pointer? Basically we should be able to compute the 
first address into a reg, spill that, do the second (both could require the 
frame pointer), spill the frame pointer, reload the first computed address from 
the stack, execute the insn and then reload the frame pointer.

Maybe the frame pointer can also be implemented 'virtually' in an index register
that you keep updated so that sp + reg is the FP. Or frame accesses can use a
stack slot as FP and the indirect memory addressing... (is there an indirect lea?)

>I think only after solving this problem, you could think about
>implementing indirect memory addressing.
>
>  



Re: Indirect memory addresses vs. lra

2019-08-15 Thread Vladimir Makarov

On 8/10/19 2:05 AM, John Darrington wrote:

On Fri, Aug 09, 2019 at 01:34:36PM -0400, Vladimir Makarov wrote:
  
  If you provide LRA dump for such test (it is better to use

  -fira-verbose=15 to output full RA info into stderr), I probably could
  say more.

I've attached such a dump (generated from 
gcc/testsuite/gcc.c-torture/compile/pr53410-2.c).
  
  The less regs the architecture has, the easier to run into such error
  message if something described wrong in the back-end.  I see your
  architecture is 16-bit micro-controller with only 8 regs, some of them is
  specialized.  So your architecture is really register constrained.

That's not quite correct.  It is a 24-bit micro-controller (the address
space is 24 bits wide).  There are 2 address registers (plus stack
pointer and program counter) and there are 8 general purpose data
registers (of differing sizes).
  


J'

Thank you for providing the sources.  It helped me to understand what is 
going on.  So the test crashes on


/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c: In 
function ‘f1’:
/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c:10:1: 
error: unable to find a register to spill
/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c:10:1: 
error: this is the insn:
(insn 14 49 15 2 (set (mem:SI (plus:PSI (reg/f:PSI 40 [34])
(const_int 32 [0x20])) [2  S4 A64])
(mem:SI (reg:PSI 41) [2 *p_5(D)+0 S4 A8])) 
"/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c":9:9 95 
{*movsi}
 (expr_list:REG_DEAD (reg:PSI 41)
(expr_list:REG_DEAD (reg/f:PSI 40 [34])
(nil

Your target has only 2 non-fixed addr registers (r8, r9).  One (r9) is defined 
as a hard reg pointer pointer. Honestly, I never saw a target with such 
register constraints.

-O0 assumes -fno-omit-frame-pointer.  So in -O0 mode we have only *one* free 
addr reg for an insn which requires *2* of them.  That is why the GCC port 
crashes on this test.  If you add -fomit-frame-pointer, the test succeeds.

But even if you use -fomit-frame-pointer, it is not guaranteed that the hard 
frame pointer will be replaced by the stack pointer.  There are many cases 
where that is not possible (e.g. when alloca is used).

So what can be done, imho.  The simplest solution would be to prevent insns 
with more than one memory operand.  The more difficult solution would be to 
permit two memory operands, one addressed through a pseudo and the other 
through the stack pointer.

I think only after solving this problem could you think about implementing 
indirect memory addressing.
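
The register arithmetic in the message above can be sketched as a toy model.  This is only an illustration in Python, not GCC code: the function names are hypothetical, the register names x and y are taken from the dumps in this thread, and treating y (r9) as the frame pointer is an assumption based on "[y:PSI+0x2f]" appearing in the reload insns.

```python
# Toy model of the address-register budget described above.
# Nothing here is GCC code; all names are hypothetical.

ADDRESS_REGS = {"x", "y"}   # the two non-fixed address registers (r8, r9)
FRAME_POINTER = "y"         # assumed: r9 is the hard frame pointer


def free_address_regs(omit_frame_pointer: bool) -> set:
    """Address registers the allocator may hand out inside a function."""
    if omit_frame_pointer:
        return set(ADDRESS_REGS)
    # With a frame pointer in use, y is reserved for the whole function.
    return ADDRESS_REGS - {FRAME_POINTER}


def can_allocate(address_regs_needed: int, omit_frame_pointer: bool) -> bool:
    """True if one insn's address-register demand fits the budget."""
    return address_regs_needed <= len(free_address_regs(omit_frame_pointer))


# insn 14 is a mem-to-mem move; each of its two addresses needs a register.
print(can_allocate(2, omit_frame_pointer=False))  # the -O0 default
print(can_allocate(2, omit_frame_pointer=True))   # with -fomit-frame-pointer
```

The model deliberately ignores the caveat made above: even with -fomit-frame-pointer the frame pointer cannot always be eliminated (alloca, for instance), so the second case is the best case rather than a guarantee.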

 



Re: Indirect memory addresses vs. lra

2019-08-12 Thread Vladimir Makarov



On 2019-08-10 2:05 a.m., John Darrington wrote:

On Fri, Aug 09, 2019 at 01:34:36PM -0400, Vladimir Makarov wrote:
  
  If you provide LRA dump for such test (it is better to use
  -fira-verbose=15 to output full RA info into stderr), I probably could
  say more.

I've attached such a dump (generated from 
gcc/testsuite/gcc.c-torture/compile/pr53410-2.c).


Unfortunately, this info is not enough for me to say what the problem 
is.  The only suspicious thing I found is that LRA tries to assign a few 
registers to a pseudo register and fails even though these registers are 
not assigned to anything.  Probably HARD_REGNO_MODE_OK prevents this.  
So it would be interesting to know how many registers of Pmode are 
actually available.


In any case I'll try to look at this problem more this week using 
your built gcc on gcc135.


  
  The fewer regs the architecture has, the easier it is to run into such an
  error message if something is described wrongly in the back end.  I see your
  architecture is a 16-bit micro-controller with only 8 regs, some of which
  are specialized.  So your architecture is really register constrained.

That's not quite correct.  It is a 24-bit micro-controller (the address
space is 24 bits wide).  There are 2 address registers (plus stack
pointer and program counter) and there are 8 general purpose data
registers (of differing sizes).
  



Re: Indirect memory addresses vs. lra

2019-08-12 Thread Segher Boessenkool
Hi John,

On Mon, Aug 12, 2019 at 08:47:43AM +0200, John Darrington wrote:
> On Sat, Aug 10, 2019 at 11:12:18AM -0500, Segher Boessenkool wrote:
>  On Sat, Aug 10, 2019 at 08:05:53AM +0200, John Darrington wrote:
>  > Choosing alt 5 in insn 14:  (0) m  (1) m {*movsi}
>  >14: [r40:PSI+0x20]=[r41:PSI]
>  > Inserting insn reload before:
>  >48: r40:PSI=r34:PSI
>  >49: r41:PSI=[y:PSI+0x2f]
>  
>  insn 14 is a mem-to-mem move (another feature not many more modern /
>  more RISCy CPUs have).  That requires both of your address registers.
>  So far, so good.  The reloads (insn 48 and 49) require address
>  registers themselves; that isn't necessarily a problem either.
> 
> So far as I can see, insn 48 is completely redundant.  It's copying a
> pseudo reg (74) into another pseudo reg (40).
> This is pointless and a waste, since insn 14 does not modify 74.
> I don't understand why lra feels the need to do it.

LRA always does this, I think...  it reloads all inputs to all insns
that may need reloading.  It later optimises most of that away again,
but this gives it a lot of freedom to move things around.

Or that is what it always looked like to me.  I haven't looked at the
code to see if that is the real reason, blush.

> If lra knew about (mem (mem ...)) style addressing, then insn 49 would
> also be redundant (which is why I raised the topic).

Yes.  But it probably should be able to deal with things like this, too,
or some other testcases will die a horrible death.

> In summary, what we have is:
> 
> (insn 48 84 49 2 (set (reg/f:PSI 40 [34])
> (reg/f:PSI 74 [34]))
>  (nil))
> (insn 49 48 14 2 (set (reg:PSI 41)
> (mem/f/c:PSI (plus:PSI (reg/f:PSI 9 y)
> (const_int 47 [0x2f])) [3 p+0 S4 A8]))
>  (nil))
> (insn 14 49 15 2 (set (mem:SI (plus:PSI (reg/f:PSI 40 [34])
> (const_int 32 [0x20])) [2  S4 A64])
> (mem:SI (reg:PSI 41) [2 *p_5(D)+0 S4 A8])) 
> 
> where, like you say, insns 48 and 49 are reloads.  But these two reloads 
> are unnecessary and cause the machine to run out of PSImode registers.

Anyway, please have patience, and see what Vladimir comes up with.  These
things take time.


Segher


Re: Indirect memory addresses vs. lra

2019-08-12 Thread John Darrington
On Sat, Aug 10, 2019 at 11:12:18AM -0500, Segher Boessenkool wrote:
 Hi!
 
 On Sat, Aug 10, 2019 at 08:05:53AM +0200, John Darrington wrote:
 >   Choosing alt 5 in insn 14:  (0) m  (1) m {*movsi}
 >14: [r40:PSI+0x20]=[r41:PSI]
 > Inserting insn reload before:
 >48: r40:PSI=r34:PSI
 >49: r41:PSI=[y:PSI+0x2f]
 
 insn 14 is a mem-to-mem move (another feature not many more modern /
 more RISCy CPUs have).  That requires both of your address registers.
 So far, so good.  The reloads (insn 48 and 49) require address
 registers themselves; that isn't necessarily a problem either.

So far as I can see, insn 48 is completely redundant.  It's copying a
pseudo reg (74) into another pseudo reg (40).
This is pointless and a waste, since insn 14 does not modify 74.
I don't understand why lra feels the need to do it.

If lra knew about (mem (mem ...)) style addressing, then insn 49 would
also be redundant (which is why I raised the topic).

In summary, what we have is:

(insn 48 84 49 2 (set (reg/f:PSI 40 [34])
(reg/f:PSI 74 [34]))
 (nil))
(insn 49 48 14 2 (set (reg:PSI 41)
(mem/f/c:PSI (plus:PSI (reg/f:PSI 9 y)
(const_int 47 [0x2f])) [3 p+0 S4 A8]))
 (nil))
(insn 14 49 15 2 (set (mem:SI (plus:PSI (reg/f:PSI 40 [34])
(const_int 32 [0x20])) [2  S4 A64])
(mem:SI (reg:PSI 41) [2 *p_5(D)+0 S4 A8])) 

where, like you say, insns 48 and 49 are reloads.  But these two reloads 
are unnecessary and cause the machine to run out of PSImode registers.
The above could be easier and more efficiently done simply as:

(insn 14 11 15 2 (set 
(mem:SI (plus:PSI (reg/f:PSI 74 [34]) (const_int 32 [0x20])) [2  S4 
A64])
(mem/f/c:PSI (mem:PSI (plus:PSI (reg/f:PSI 9 y)
(const_int 47 [0x2f])) [3 p+0 S4 A8])))


This is exactly what we had before lra messed with things.  It can be
represented in the ISA with one assembler instruction: 
  mov.p (32, x), [47, y]
and if I'm not mistaken, alternative 5 of my "movpsi" pattern should do
this just fine.
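
To make the comparison concrete, here is a toy memory model in Python.  It is a sketch only: the addresses and the value 0xCAFE are invented, the function names are hypothetical, and the second return value is simply a count of the address registers each form needs besides the frame pointer y.

```python
# Toy memory model comparing the reloaded sequence (insns 48, 49, 14)
# with a single double-indirect move.  All addresses and values are made up.

def reloaded_sequence(mem, regs):
    """The three insns as they stand after LRA inserted its reloads."""
    mem = dict(mem)
    r40 = regs["r74"]                # insn 48: r40 = r74
    r41 = mem[regs["y"] + 0x2F]      # insn 49: r41 = [y + 0x2f]
    mem[r40 + 0x20] = mem[r41]       # insn 14: [r40 + 0x20] = [r41]
    return mem, 2                    # r40 and r41 both need an address reg


def double_indirect(mem, regs):
    """One move whose source address is itself fetched from memory."""
    mem = dict(mem)
    mem[regs["r74"] + 0x20] = mem[mem[regs["y"] + 0x2F]]
    return mem, 1                    # only r74 needs an address reg


regs = {"r74": 0x1000, "y": 0x2000}
mem = {0x2000 + 0x2F: 0x3000,        # the pointer p, stored in the frame
       0x3000: 0xCAFE}               # the value *p

after_reload, pressure_reload = reloaded_sequence(mem, regs)
after_single, pressure_single = double_indirect(mem, regs)
print(after_reload == after_single, pressure_reload, pressure_single)
```

Both forms leave memory in the same state; the difference is only in how many address registers must be live at insn 14, which is the quantity that matters when one of the two is tied up as the frame pointer.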


 But
 this requires careful juggling.  Maybe you will need some backend code

Could you give a hint into which set of hooks/constraints/predicates
this backend code should go?
 

-- 
Avoid eavesdropping.  Send strong encrypted email.
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.



Re: Indirect memory addresses vs. lra

2019-08-10 Thread Segher Boessenkool
On Sat, Aug 10, 2019 at 08:10:27AM +0200, John Darrington wrote:
> On Fri, Aug 09, 2019 at 09:16:44AM -0500, Segher Boessenkool wrote:
> 
>  Is your code in some branch in our git?  
> 
> No.  But it could be pushed there if people think it would be
> appropriate to do so, and if I'm given the permissions to do so.
>  
>  Or in some other public git?
> 
> It's in my repo on gcc135 ~jmd/gcc-s12z (branch s12z)

That will work fine, for me at least.

>  Do you have a representative testcase?
> 
> I think gcc/testsuite/gcc.c-torture/compile/pr53410-2.c is as
> representative as any.

Okido, thanks!


Segher


Re: Indirect memory addresses vs. lra

2019-08-10 Thread Segher Boessenkool
Hi!

On Sat, Aug 10, 2019 at 08:05:53AM +0200, John Darrington wrote:
>Choosing alt 5 in insn 14:  (0) m  (1) m {*movsi}
>14: [r40:PSI+0x20]=[r41:PSI]
> Inserting insn reload before:
>48: r40:PSI=r34:PSI
>49: r41:PSI=[y:PSI+0x2f]

insn 14 is a mem-to-mem move (another feature not many more modern /
more RISCy CPUs have).  That requires both of your address registers.
So far, so good.  The reloads (insn 48 and 49) require address
registers themselves; that isn't necessarily a problem either.  But
this requires careful juggling.  Maybe you will need some backend code
for this, or to optimise this (although right now you just want it to
*work* :-) )

For some reason LRA didn't manage.  Register inheritance seems to be
implicated (but that might be a red herring).  Vladimir will probably
find out more, and/or correct me :-)


Segher


Re: Indirect memory addresses vs. lra

2019-08-10 Thread John Darrington
On Fri, Aug 09, 2019 at 09:16:44AM -0500, Segher Boessenkool wrote:

 Is your code in some branch in our git?  

No.  But it could be pushed there if people think it would be
appropriate to do so, and if I'm given the permissions to do so.
 
 Or in some other public git?

It's in my repo on gcc135 ~jmd/gcc-s12z (branch s12z)


 Do you have a representative testcase?

I think gcc/testsuite/gcc.c-torture/compile/pr53410-2.c is as
representative as any.
 

J'

 



Re: Indirect memory addresses vs. lra

2019-08-10 Thread John Darrington
On Fri, Aug 09, 2019 at 01:34:36PM -0400, Vladimir Makarov wrote:
 
 If you provide LRA dump for such test (it is better to use
 -fira-verbose=15 to output full RA info into stderr), I probably could
 say more.

I've attached such a dump (generated from 
gcc/testsuite/gcc.c-torture/compile/pr53410-2.c).
 
 The fewer regs the architecture has, the easier it is to run into such an
 error message if something is described wrongly in the back end.  I see your
 architecture is a 16-bit micro-controller with only 8 regs, some of which
 are specialized.  So your architecture is really register constrained.

That's not quite correct.  It is a 24-bit micro-controller (the address
space is 24 bits wide).  There are 2 address registers (plus stack
pointer and program counter) and there are 8 general purpose data
registers (of differing sizes).
 

J'


Building IRA IR

Pass 0 for finding pseudo/allocno costs

r36: preferred X_REG, alternative NO_REGS, allocno X_REG
a0 (r36,l0) best X_REG, allocno X_REG
r35: preferred X_REG, alternative NO_REGS, allocno X_REG
a10 (r35,l0) best X_REG, allocno X_REG
r34: preferred X_REG, alternative NO_REGS, allocno X_REG
a1 (r34,l0) best X_REG, allocno X_REG
r33: preferred DATA_REGS, alternative NO_REGS, allocno DATA_REGS
a11 (r33,l0) best DATA_REGS, allocno DATA_REGS
r32: preferred DATA_REGS, alternative NO_REGS, allocno DATA_REGS
a12 (r32,l0) best DATA_REGS, allocno DATA_REGS
r31: preferred DATA_REGS, alternative NO_REGS, allocno DATA_REGS
a14 (r31,l0) best DATA_REGS, allocno DATA_REGS
r30: preferred NO_REGS, alternative NO_REGS, allocno NO_REGS
a13 (r30,l0) best NO_REGS, allocno NO_REGS
r29: preferred X_REG, alternative NO_REGS, allocno X_REG
a15 (r29,l0) best X_REG, allocno X_REG
r28: preferred X_REG, alternative NO_REGS, allocno X_REG
a16 (r28,l0) best X_REG, allocno X_REG
r27: preferred X_REG, alternative NO_REGS, allocno X_REG
a17 (r27,l0) best X_REG, allocno X_REG
r26: preferred DATA_REGS, alternative NO_REGS, allocno DATA_REGS
a2 (r26,l0) best DATA_REGS, allocno DATA_REGS
r25: preferred DATA_REGS, alternative NO_REGS, allocno DATA_REGS
a4 (r25,l0) best DATA_REGS, allocno DATA_REGS
r24: preferred DATA_REGS, alternative NO_REGS, allocno DATA_REGS
a3 (r24,l0) best DATA_REGS, allocno DATA_REGS
r23: preferred DATA_REGS, alternative NO_REGS, allocno DATA_REGS
a5 (r23,l0) best DATA_REGS, allocno DATA_REGS
r22: preferred DATA_REGS, alternative NO_REGS, allocno DATA_REGS
a6 (r22,l0) best DATA_REGS, allocno DATA_REGS
r21: preferred DATA_REGS, alternative NO_REGS, allocno DATA_REGS
a8 (r21,l0) best DATA_REGS, allocno DATA_REGS
r20: preferred DATA_REGS, alternative NO_REGS, allocno DATA_REGS
a7 (r20,l0) best DATA_REGS, allocno DATA_REGS
r19: preferred DATA_REGS, alternative NO_REGS, allocno DATA_REGS
a9 (r19,l0) best DATA_REGS, allocno DATA_REGS

  a0(r36,l0) costs: X_REG:0 MEM:5000
  a1(r34,l0) costs: X_REG:0 MEM:84000
  a2(r26,l0) costs: DATA_REGS:0 MEM:5000
  a3(r24,l0) costs: DATA_REGS:0 MEM:5000
  a4(r25,l0) costs: DATA_REGS:0 MEM:5000
  a5(r23,l0) costs: DATA_REGS:0 MEM:5000
  a6(r22,l0) costs: DATA_REGS:0 MEM:5000
  a7(r20,l0) costs: DATA_REGS:0 MEM:5000
  a8(r21,l0) costs: DATA_REGS:0 MEM:5000
  a9(r19,l0) costs: DATA_REGS:0 MEM:5000
  a10(r35,l0) costs: X_REG:0 MEM:5000
  a11(r33,l0) costs: DATA_REGS:0 MEM:8000
  a12(r32,l0) costs: DATA_REGS:0 MEM:7000
  a13(r30,l0) costs: MEM:8000
  a14(r31,l0) costs: DATA_REGS:0 MEM:7000
  a15(r29,l0) costs: X_REG:0 MEM:8000
  a16(r28,l0) costs: X_REG:0 MEM:8000
  a17(r27,l0) costs: X_REG:2000 MEM:8000

   Insn 43(l0): point = 0
   Insn 39(l0): point = 3
   Insn 38(l0): point = 5
   Insn 37(l0): point = 7
   Insn 36(l0): point = 9
   Insn 35(l0): point = 11
   Insn 34(l0): point = 13
   Insn 33(l0): point = 15
   Insn 32(l0): point = 17
   Insn 31(l0): point = 19
   Insn 30(l0): point = 21
   Insn 29(l0): point = 23
   Insn 28(l0): point = 25
   Insn 27(l0): point = 27
   Insn 26(l0): point = 29
   Insn 25(l0): point = 31
   Insn 24(l0): point = 33
   Insn 23(l0): point = 35
   Insn 22(l0): point = 37
   Insn 21(l0): point = 39
   Insn 20(l0): point = 41
   Insn 19(l0): point = 43
   Insn 18(l0): point = 45
   Insn 17(l0): point = 47
   Insn 16(l0): point = 49
   Insn 15(l0): point = 51
   Insn 14(l0): point = 53
   Insn 9(l0): point = 55
   Insn 8(l0): point = 57
   Insn 7(l0): point = 59
   Insn 6(l0): point = 61
   Insn 5(l0): point = 63
   Insn 4(l0): point = 65
   Insn 3(l0): point = 67
   Insn 2(l0): point = 69
   Insn 10(l0): point = 71
 a0(r36): [4..5]
 a1(r34): [4..55]
 a2(r26): [18..21]
 a3(r24): [20..25]
 a4(r25): [22..23]
 a5(r23): [26..27]
 

Re: Indirect memory addresses vs. lra

2019-08-09 Thread Vladimir Makarov



On 2019-08-09 4:14 a.m., John Darrington wrote:

On Thu, Aug 08, 2019 at 01:57:41PM -0600, Jeff Law wrote:

  Yea, it's certainly designed with the more mainstream architectures in
  mind.  The double-indirect case that's being talked about here is well
  out of the mainstream and not a feature of anything LRA has targeted to
  date.  So I'm not surprised it's not working.
  
  My suggestion would be to ignore the double-indirect aspect of the
  architecture right now, get the port working, then come back and try to
  make double-indirect addressing modes work.
  
This sounds like sensible advice.  However I wonder if this issue is
related to the other major outstanding problem I have, viz: the large
number of test failures which report "Unable to find a register to
spill" - So far, nobody has been able to explain how to solve that
issue and even the people who appear to be more knowledgeable have
expressed surprise that it is even happening at all.


Basically, LRA behaves here like the older reload.  If an RTL insn needs hard 
regs and there are no free regs, LRA/reload puts pseudos that are assigned to 
hard regs and live through the insn into memory.  So it is very hard to run 
into the problem "unable to find a register to spill" if the insn needs 
fewer regs than the architecture provides.  That is why people are surprised.  
Still, it can happen, as one RTL insn can be implemented by a few machine 
insns.  The most frequent case here is GCC asm insns requiring a lot of 
input, output and clobbered regs/operands.
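
The rule described in the paragraph above can be written down as a small toy model.  This is a simplified sketch in Python and not how LRA is implemented: the function name, its parameters and the single register class are all assumptions made for illustration.

```python
# Toy version of "spill pseudos that live through the insn until the insn
# itself has enough hard registers".  Hypothetical names; not LRA's code.

def regs_for_insn(hard_regs, fixed, live_through, needed):
    """Return the pseudos spilled to memory so the insn can be allocated.

    hard_regs:    hard registers of the class the insn needs
    fixed:        registers the allocator may never use (e.g. frame pointer)
    live_through: {pseudo: hard_reg} for pseudos live across the insn
                  but not used by it
    needed:       number of registers of this class the insn itself needs
    """
    usable = [r for r in hard_regs if r not in fixed]
    occupied = set(live_through.values())
    free = [r for r in usable if r not in occupied]
    spilled = []
    for pseudo, reg in live_through.items():
        if len(free) >= needed:
            break
        if reg in usable and reg not in free:
            spilled.append(pseudo)   # move this pseudo to memory
            free.append(reg)         # its hard register becomes available
    if len(free) < needed:
        # Nothing left to spill and the insn still does not fit.
        raise RuntimeError("unable to find a register to spill")
    return spilled


# Two usable address regs: spilling p1 is enough for an insn needing two.
print(regs_for_insn(["x", "y"], set(), {"p1": "x"}, 2))

# y reserved as frame pointer: no amount of spilling yields two registers.
try:
    regs_for_insn(["x", "y"], {"y"}, {"p1": "x"}, 2)
except RuntimeError as err:
    print(err)
```

In this model the error can only appear when the insn needs more registers than remain after the fixed ones are removed, which matches the observation that the message is rare unless the register description is very tight or something is described wrongly.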


If you provide LRA dump for such test (it is better to use 
-fira-verbose=15 to output full RA info into stderr), I probably could 
say more.


The fewer regs the architecture has, the easier it is to run into such an 
error message if something is described wrongly in the back end.  I see your 
architecture is a 16-bit micro-controller with only 8 regs, some of which 
are specialized.  So your architecture is really register constrained.



Even if it should turn out not to be related, the message I've been
receiving in this thread is lra should not be expected to work for
non "mainstream" backends.  So perhaps there is another, yet to be
discovered, restriction which prevents my backend from ever working?

On the other hand, given my lack of experience with gcc,  it could be
that lra is working perfectly, and I have simply done something
incorrectly.  But the uncertainty voiced in this thread means that it
is hard to be sure that I'm not trying to do something which is
currently unsupported.


LRA/reload is the most machine-dependent machine-independent pass in 
GCC.  It is connected to machine-dependent code in numerous ways.  A big 
part of making a new backend is getting LRA/reload and the 
machine-dependent code to communicate in the right way.


Sometimes it is hard to decide who is responsible for RA-related bugs: 
RA or the back end.  Sometimes an innocent change in RA solving one problem 
for a particular target might result in numerous new bugs for other 
targets.  Therefore it is very difficult to say whether your small change 
to permit indirect memory addressing will work in the general case.




Re: Indirect memory addresses vs. lra

2019-08-09 Thread Jeff Law
On 8/9/19 2:14 AM, John Darrington wrote:
> On Thu, Aug 08, 2019 at 01:57:41PM -0600, Jeff Law wrote:
> 
>  Yea, it's certainly designed with the more mainstream architectures in
>  mind.  The double-indirect case that's being talked about here is well
>  out of the mainstream and not a feature of anything LRA has targeted to
>  date.  So I'm not surprised it's not working.
>  
>  My suggestion would be to ignore the double-indirect aspect of the
>  architecture right now, get the port working, then come back and try to
>  make double-indirect addressing modes work.
>  
> This sounds like sensible advice.  However I wonder if this issue is
> related to the other major outstanding problem I have, viz: the large 
> number of test failures which report "Unable to find a register to
> spill" - So far, nobody has been able to explain how to solve that
> issue and even the people who appear to be more knowledgeable have
> expressed surprise that it is even happening at all.
You're going to have to debug what LRA is doing and why.  There are really
no shortcuts here.  We can't really do it for you.  Even if you weren't
using LRA you'd be doing the same process, just on an even more
difficult-to-understand codebase.

> 
> Even if it should turn out not to be related, the message I've been
> receiving in this thread is lra should not be expected to work for
> non "mainstream" backends.  So perhaps there is another, yet to be
> discovered, restriction which prevents my backend from ever working?
It's possible.  But that's not really any different than reload.
There's certainly various aspects of architectures that reload can't
handle as well -- even on architectures that were mainstream processors
when reload was under active development and maintenance.  There's even
a good chance reload won't handle double-indirect addressing modes well
-- they were far from mainstream and as a result the code which does
purport to handle double-indirect addressing modes hasn't been
used/tested all that much over the last 25+ years.

> 
> On the other hand, given my lack of experience with gcc,  it could be
> that lra is working perfectly, and I have simply done something
> incorrectly.  But the uncertainty voiced in this thread means that it
> is hard to be sure that I'm not trying to do something which is
> currently unsupported.
My recommendation is to continue with the LRA path.

jeff


Re: Indirect memory addresses vs. lra

2019-08-09 Thread Paul Koning



> On Aug 9, 2019, at 10:16 AM, Segher Boessenkool wrote:
> 
> Hi!
> 
> On Fri, Aug 09, 2019 at 10:14:39AM +0200, John Darrington wrote:
>> On Thu, Aug 08, 2019 at 01:57:41PM -0600, Jeff Law wrote:
>> 
>>  ...  However I wonder if this issue is
>> related to the other major outstanding problem I have, viz: the large 
>> number of test failures which report "Unable to find a register to
>> spill" - So far, nobody has been able to explain how to solve that
>> issue and even the people who appear to be more knowledgeable have
>> expressed surprise that it is even happening at all.
> 
> No one is surprised.  It is just the funny way that LRA says "whoops I
> am going in circles, there is no progress and there will never be, I'd
> better stop that".  Everyone doing new ports / new conversions to LRA
> sees that error all the time.
> 
> The error could be pretty much *anywhere* in your port.  You have to
> look at what LRA did, and why, and why that is wrong, and fix that.

I've run into this a number of times.  The difficulty is that, for someone who 
understands the back end and the documented rules but not the internals of LRA, 
it tends to be hard to figure out what the problem is.  And since the causes 
tend to be obscure and undocumented, I find myself having to relearn the 
analysis from time to time. 

It has been stated that LRA is more dependent on correct back end definitions 
than Reload is, but unfortunately the precise definition of "correct" can be 
less than obvious to a back end maintainer.

paul




Re: Indirect memory addresses vs. lra

2019-08-09 Thread Segher Boessenkool
Hi!

On Fri, Aug 09, 2019 at 10:14:39AM +0200, John Darrington wrote:
> On Thu, Aug 08, 2019 at 01:57:41PM -0600, Jeff Law wrote:
> 
>  Yea, it's certainly designed with the more mainstream architectures in
>  mind.  The double-indirect case that's being talked about here is well
>  out of the mainstream and not a feature of anything LRA has targeted to
>  date.  So I'm not surprised it's not working.
>  
>  My suggestion would be to ignore the double-indirect aspect of the
>  architecture right now, get the port working, then come back and try to
>  make double-indirect addressing modes work.
>  
> This sounds like sensible advice.  However I wonder if this issue is
> related to the other major outstanding problem I have, viz: the large 
> number of test failures which report "Unable to find a register to
> spill" - So far, nobody has been able to explain how to solve that
> issue and even the people who appear to be more knowledgeable have
> expressed surprise that it is even happening at all.

No one is surprised.  It is just the funny way that LRA says "whoops I
am going in circles, there is no progress and there will never be, I'd
better stop that".  Everyone doing new ports / new conversions to LRA
sees that error all the time.

The error could be pretty much *anywhere* in your port.  You have to
look at what LRA did, and why, and why that is wrong, and fix that.

> Even if it should turn out not to be related, the message I've been
> receiving in this thread is lra should not be expected to work for
> non "mainstream" backends.

LRA is more likely to have problems in situations where it has not been
tested before.  You can replace LRA by anything else, and this isn't
limited to GCC (or software, or human endeavours, or humanity even).

> So perhaps there is another, yet to be
> discovered, restriction which prevents my backend from ever working?

From ever?  Nah, we can patch.  Also, Occam's razor says there likely
is an error in your backend you haven't found yet.

> On the other hand, given my lack of experience with gcc,  it could be
> that lra is working perfectly, and I have simply done something
> incorrectly.  But the uncertainty voiced in this thread means that it
> is hard to be sure that I'm not trying to do something which is
> currently unsupported.

Is your code in some branch in our git?  Or in some other public git?
Do you have a representative testcase?


Segher


Re: Indirect memory addresses vs. lra

2019-08-09 Thread John Darrington
On Thu, Aug 08, 2019 at 01:57:41PM -0600, Jeff Law wrote:

 Yea, it's certainly designed with the more mainstream architectures in
 mind.  The double-indirect case that's being talked about here is well
 out of the mainstream and not a feature of anything LRA has targeted to
 date.  So I'm not surprised it's not working.
 
 My suggestion would be to ignore the double-indirect aspect of the
 architecture right now, get the port working, then come back and try to
 make double-indirect addressing modes work.
 
This sounds like sensible advice.  However I wonder if this issue is
related to the other major outstanding problem I have, viz: the large 
number of test failures which report "Unable to find a register to
spill" - So far, nobody has been able to explain how to solve that
issue and even the people who appear to be more knowledgeable have
expressed surprise that it is even happening at all.

Even if it should turn out not to be related, the message I've been
receiving in this thread is lra should not be expected to work for
non "mainstream" backends.  So perhaps there is another, yet to be
discovered, restriction which prevents my backend from ever working?

On the other hand, given my lack of experience with gcc,  it could be
that lra is working perfectly, and I have simply done something
incorrectly.  But the uncertainty voiced in this thread means that it
is hard to be sure that I'm not trying to do something which is
currently unsupported.

J'




Re: Indirect memory addresses vs. lra

2019-08-08 Thread Jeff Law
On 8/8/19 1:19 PM, Segher Boessenkool wrote:
> On Thu, Aug 08, 2019 at 01:30:41PM -0400, Paul Koning wrote:
>>
>>
>>> On Aug 8, 2019, at 1:21 PM, Segher Boessenkool wrote:
>>>
>>> On Thu, Aug 08, 2019 at 12:43:52PM -0400, Paul Koning wrote:
>>>>> On Aug 8, 2019, at 12:25 PM, Vladimir Makarov wrote:
>>>>> The old reload (reload[1].c) supports such addressing.  As modern
>>>>> mainstream architectures do not have this kind of addressing, it was
>>>>> not implemented in LRA.
>>>>
>>>> Is LRA only intended for "modern mainstream architectures"?
>>>
>>> I sure hope not!  But it has only been *used* and *tested* much on such,
>>> so far. 
>>
>> That's not entirely accurate.  At the prodding of people pushing for
>> the removal of CC0 and reload, I've added LRA support to pdp11 in the
>> V9 cycle.
> 
> I said "much" :-)
> 
> Pretty much all design input so far has been from "modern mainstream
> architectures", as far as I can make out.  Now one of those has the
> most "interesting" (for RA) features that many less mainstream archs
> have (a not-so-very-flat register file), so it should still work pretty
> well hopefully.
Yea, it's certainly designed with the more mainstream architectures in
mind.  The double-indirect case that's being talked about here is well
out of the mainstream and not a feature of anything LRA has targeted to
date.  So I'm not surprised it's not working.

My suggestion would be to ignore the double-indirect aspect of the
architecture right now, get the port working, then come back and try to
make double-indirect addressing modes work.

> 
>> And it works pretty well, in the sense of passing the
>> compile tests.  But I haven't yet examined the code quality vs. the
>> old one in any detail.
> 
> That would be quite interesting to see, also for the other ports that
> still need conversion: how much (if any) degradation should you expect
> from a straight-up conversion of a port to LRA, without any retuning?
I did the v850 last year where it was a wash or perhaps a slight
improvement for codesize, which is a reasonable approximation for
performance on that target.

I was working a bit on converting the H8 away from cc0 with an eye
towards LRA as well.  Given how registers overlap on the H8, the most
straightforward port should end up with properties much like 32bit x86.
I suspect the independent addressing of the high/low register parts
might be better handled by LRA, but I wasn't going to do anything beyond
the "just make it work".

jeff


Re: Indirect memory addresses vs. lra

2019-08-08 Thread Segher Boessenkool
On Thu, Aug 08, 2019 at 01:30:41PM -0400, Paul Koning wrote:
> 
> 
> > On Aug 8, 2019, at 1:21 PM, Segher Boessenkool wrote:
> > 
> > On Thu, Aug 08, 2019 at 12:43:52PM -0400, Paul Koning wrote:
> >>> On Aug 8, 2019, at 12:25 PM, Vladimir Makarov wrote:
> >>> The old reload (reload[1].c) supports such addressing.  As modern 
> >>> mainstream architectures do not have this kind of addressing, it was 
> >>> not implemented in LRA.
> >> 
> >> Is LRA only intended for "modern mainstream architectures"?
> > 
> > I sure hope not!  But it has only been *used* and *tested* much on such,
> > so far. 
> 
> That's not entirely accurate.  At the prodding of people pushing for
> the removal of CC0 and reload, I've added LRA support to pdp11 in the
> V9 cycle.

I said "much" :-)

Pretty much all design input so far has been from "modern mainstream
architectures", as far as I can make out.  Now one of those has the
most "interesting" (for RA) features that many less mainstream archs
have (a not-so-very-flat register file), so it should still work pretty
well hopefully.

> And it works pretty well, in the sense of passing the
> compile tests.  But I haven't yet examined the code quality vs. the
> old one in any detail.

That would be quite interesting to see, also for the other ports that
still need conversion: how much (if any) degradation should you expect
from a straight-up conversion of a port to LRA, without any retuning?


Segher


Re: Indirect memory addresses vs. lra

2019-08-08 Thread Segher Boessenkool
On Thu, Aug 08, 2019 at 01:25:27PM -0400, Paul Koning wrote:
> > On Aug 8, 2019, at 1:21 PM, Segher Boessenkool wrote:
> > On Thu, Aug 08, 2019 at 12:43:52PM -0400, Paul Koning wrote:
> >> Indirect addressing is a key feature in size-optimized code.
> > 
> > That doesn't mean that LRA has to support it, btw, not necessarily; it
> > may well be possible to do a good job of this in the later passes?
> > Maybe postreload, maybe some peepholes, etc.?
> 
> Possibly.  But as Vladimir points out, indirect addressing affects
> register allocation (reducing register pressure).

Yeah, good point, esp. if you have only one or two registers that you
can use for addressing at all.  So it will have to happen during (or
before?) RA, alright.


Segher


Re: Indirect memory addresses vs. lra

2019-08-08 Thread Vladimir Makarov



On 2019-08-08 12:43 p.m., Paul Koning wrote:



On Aug 8, 2019, at 12:25 PM, Vladimir Makarov  wrote:


On 2019-08-04 3:18 p.m., John Darrington wrote:

I'm trying to write a back-end for an architecture (s12z - the ISA you can
download from [1]).  This arch accepts indirect memory addresses.  That is to
say, those of the form (mem (mem (...))), and although my
TARGET_LEGITIMATE_ADDRESS function returns true for such addresses, LRA
insists on reloading them out of existence.
...

The old reload (reload[1].c) supports such addressing.  As modern mainstream 
architectures do not have this kind of addressing, it was not implemented in 
LRA.

Is LRA only intended for "modern mainstream architectures"?



No.  As I wrote, patches implementing indirect addressing are welcome.  
It is hard for one person to implement everything at once.




If yes, why is the old reload being deprecated?  You can't have it both ways.
Unless you want to obsolete all "not modern mainstream architectures" in GCC,
it doesn't make sense to get rid of core functionality used by those
architectures.

Indirect addressing is a key feature in size-optimized code.




Re: Indirect memory addresses vs. lra

2019-08-08 Thread Paul Koning



> On Aug 8, 2019, at 1:21 PM, Segher Boessenkool wrote:
> 
> On Thu, Aug 08, 2019 at 12:43:52PM -0400, Paul Koning wrote:
>>> On Aug 8, 2019, at 12:25 PM, Vladimir Makarov wrote:
>>> The old reload (reload[1].c) supports such addressing.  As modern 
>>> mainstream architectures do not have this kind of addressing, it was 
>>> not implemented in LRA.
>> 
>> Is LRA only intended for "modern mainstream architectures"?
> 
> I sure hope not!  But it has only been *used* and *tested* much on such,
> so far. 

That's not entirely accurate.  At the prodding of people pushing for the 
removal of CC0 and reload, I've added LRA support to pdp11 in the V9 cycle.  
And it works pretty well, in the sense of passing the compile tests.  But I 
haven't yet examined the code quality vs. the old one in any detail.

paul



Re: Indirect memory addresses vs. lra

2019-08-08 Thread Paul Koning



> On Aug 8, 2019, at 1:21 PM, Segher Boessenkool wrote:
> 
> On Thu, Aug 08, 2019 at 12:43:52PM -0400, Paul Koning wrote:
>>> On Aug 8, 2019, at 12:25 PM, Vladimir Makarov  wrote:
>>> The old reload (reload[1].c) supports such addressing.  As modern 
>>> mainstream architectures have no this kind of addressing, it was not 
>>> implemented in LRA.
>> 
>> Is LRA only intended for "modern mainstream architectures"?
> 
> I sure hope not!  But it has only been *used* and *tested* much on such,
> so far.  Things are designed to work well for modern archs.
> 
>> If yes, why is the old reload being deprecated?  You can't have it both 
>> ways.  Unless you want to obsolete all "not modern mainstream architectures" 
>> in GCC, it doesn't make sense to get rid of core functionality used by those 
>> architectures.
>> 
>> Indirect addressing is a key feature in size-optimized code.
> 
> That doesn't mean that LRA has to support it, btw, not necessarily; it
> may well be possible to do a good job of this in the later passes?
> Maybe postreload, maybe some peepholes, etc.?

Possibly.  But as Vladimir points out, indirect addressing affects register 
allocation (reducing register pressure).  In older architectures that implement 
indirect addressing, that is one of the key ways in which the feature reduces 
code size.  While I can see how peephole optimization can convert an address 
load plus a register indirect into a memory indirect instruction, does that 
help the register become available for other uses or is post-LRA too late for 
that?  My impression is that it is too late, since at this point we're dealing 
with hard registers and making one free via peephole helps no one else.

paul




Re: Indirect memory addresses vs. lra

2019-08-08 Thread Segher Boessenkool
On Thu, Aug 08, 2019 at 12:43:52PM -0400, Paul Koning wrote:
> > On Aug 8, 2019, at 12:25 PM, Vladimir Makarov  wrote:
> > The old reload (reload[1].c) supports such addressing.  As modern 
> > mainstream architectures have no this kind of addressing, it was not 
> > implemented in LRA.
> 
> Is LRA only intended for "modern mainstream architectures"?

I sure hope not!  But it has only been *used* and *tested* much on such,
so far.  Things are designed to work well for modern archs.

> If yes, why is the old reload being deprecated?  You can't have it both ways. 
>  Unless you want to obsolete all "not modern mainstream architectures" in 
> GCC, it doesn't make sense to get rid of core functionality used by those 
> architectures.
> 
> Indirect addressing is a key feature in size-optimized code.

That doesn't mean that LRA has to support it, btw, not necessarily; it
may well be possible to do a good job of this in the later passes?
Maybe postreload, maybe some peepholes, etc.?


Segher


Re: Indirect memory addresses vs. lra

2019-08-08 Thread Paul Koning



> On Aug 8, 2019, at 12:25 PM, Vladimir Makarov  wrote:
> 
> 
> On 2019-08-04 3:18 p.m., John Darrington wrote:
>> I'm trying to write a back-end for an architecture (s12z - the ISA you can
>> download from [1]).  This arch accepts indirect memory addresses.   That is
>> to say, those of the form (mem (mem (...)))  and although my
>> TARGET_LEGITIMATE_ADDRESS function returns true for such addresses, LRA
>> insists on reloading them out of existence.
>> ...
> The old reload (reload[1].c) supports such addressing.  As modern mainstream 
> architectures have no this kind of addressing, it was not implemented in LRA.

Is LRA only intended for "modern mainstream architectures"?

If yes, why is the old reload being deprecated?  You can't have it both ways.  
Unless you want to obsolete all "not modern mainstream architectures" in GCC, 
it doesn't make sense to get rid of core functionality used by those 
architectures.

Indirect addressing is a key feature in size-optimized code.

paul



Re: Indirect memory addresses vs. lra

2019-08-08 Thread Vladimir Makarov



On 2019-08-04 3:18 p.m., John Darrington wrote:

I'm trying to write a back-end for an architecture (s12z - the ISA you can
download from [1]).  This arch accepts indirect memory addresses.   That is to
say, those of the form (mem (mem (...)))  and although my
TARGET_LEGITIMATE_ADDRESS function returns true for such addresses, LRA
insists on reloading them out of existence.

For example, when compiling a code fragment:

   volatile unsigned char *led = (volatile unsigned char *) 0x2F2;
   *led = 1;

the ira dump file shows:

(insn 7 6 8 2 (set (mem/f/c:PSI (reg/f:PSI 9 y) [3 led+0 S4 A8])
 (const_int 754 [0x2f2])) "/home/jmd/MemMem/memmem.c":15:27 96 {movpsi}
  (nil))
(insn 8 7 14 2 (set (mem/v:QI (mem/f/c:PSI (reg/f:PSI 9 y) [3 led+0 S4 A8]) [0 *led_7+0 S1 A8])
 (const_int 1 [0x1])) "/home/jmd/MemMem/memmem.c":16:8 98 {movqi}
  (nil))

which is a perfectly valid insn, and the most efficient assembler for it is:
mov.p #0x2f2, y
mov.b #1, [0,y]

However the reload dump shows this has been changed to:

(insn 7 6 22 2 (set (mem/f/c:PSI (reg/f:PSI 9 y) [3 led+0 S4 A8])
 (const_int 754 [0x2f2])) "/home/jmd/MemMem/memmem.c":15:27 96 {movpsi}
  (nil))
(insn 22 7 8 2 (set (reg:PSI 8 x [22])
 (mem/f/c:PSI (reg/f:PSI 9 y) [3 led+0 S4 A8])) "/home/jmd/MemMem/memmem.c":16:8 96 {movpsi}
  (nil))
(insn 8 22 14 2 (set (mem/v:QI (reg:PSI 8 x [22]) [0 *led_7+0 S1 A8])
 (const_int 1 [0x1])) "/home/jmd/MemMem/memmem.c":16:8 98 {movqi}
  (nil))

and ends up as:

mov.p #0x2f2, y
mov.p (0,y), x
mov.b #1, (0,x)

So this wastes a register (which leads to other issues which I don't want to go
into in this email).

After a lot of debugging I tracked down the part of lra which is doing this
reload to the function process_addr_reg at lra-constraints.c:1378

  if (! REG_P (reg))
    {
      if (check_only_p)
        return true;
      /* Always reload memory in an address even if the target supports
         such addresses.  */
      new_reg = lra_create_new_reg_with_unique_value (mode, reg, cl,
                                                      "address");
      before_p = true;
    }

Changing this to

  if (! REG_P (reg))
    {
      if (check_only_p)
        return true;
      return false;
    }

solves my immediate problem.  However I imagine there was a reason for doing
this reload, and presumably a better way of avoiding it.

Can someone explain the reason for this reload, and how I can best ensure that
indirect memory operands are left in the compiled code?

The old reload (reload[1].c) supports such addressing.  As modern
mainstream architectures do not have this kind of addressing, it was not
implemented in LRA.


I don't think the above simple change will work fully.  For example, you
need to constrain memory nesting.  The constraints should be described,
maybe some hooks should be implemented (maybe not, and
TARGET_LEGITIMATE_ADDRESS will be enough), maybe additional address
analysis and transformations should be implemented in LRA, etc.  But
maybe implementing this is not hard either.
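
As a rough illustration of what "constraining memory nesting" could mean,
here is a minimal sketch in plain C.  It uses a toy expression tree, not
GCC's rtx API: the names toy_rtx, toy_mem_depth and toy_legitimate_address_p
are hypothetical and exist only for this example.  The idea is simply to
count how many MEMs are nested inside an address and to reject addresses
that exceed what the hardware can encode (one level for an address such as
[0,y]).

```c
/* Toy model of nested memory addresses.  This is NOT GCC's rtx API;
   all names here are hypothetical and for illustration only.  */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_code { TOY_REG, TOY_CONST, TOY_PLUS, TOY_MEM };

struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *op0;  /* MEM: the address; PLUS: first operand.  */
  const struct toy_rtx *op1;  /* PLUS: second operand; otherwise NULL.  */
};

/* Return the deepest chain of nested MEMs found inside address X.
   A plain register or constant has depth 0; (mem (reg)) has depth 1.  */
int
toy_mem_depth (const struct toy_rtx *x)
{
  assert (x != NULL);
  switch (x->code)
    {
    case TOY_MEM:
      return 1 + toy_mem_depth (x->op0);
    case TOY_PLUS:
      {
        int a = toy_mem_depth (x->op0);
        int b = toy_mem_depth (x->op1);
        return a > b ? a : b;
      }
    default:
      return 0;
    }
}

/* ADDR is the expression inside the outermost MEM of an operand.
   Accept it only if it needs at most MAX_INDIRECT levels of memory
   indirection.  */
bool
toy_legitimate_address_p (const struct toy_rtx *addr, int max_indirect)
{
  return toy_mem_depth (addr) <= max_indirect;
}
```

With max_indirect set to 1, the address in (mem (mem (reg y))) is accepted
because its inner part (mem (reg y)) has depth 1, while a third level of
nesting is rejected and would still have to be reloaded.  A real
implementation would presumably also have to check the mode and form of
each inner address, which this sketch ignores.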


It is also difficult for me to say whether it is worth doing.  Removing
such addressing helps to remove redundant memory reads.  On the other
hand, its usage can decrease the number of insns, save registers for
better RA, and utilize hardware on whose design a lot of effort was spent.
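
The trade-off can be made concrete with a small back-of-the-envelope
model, sketched here in C.  It is only an assumption-laden toy: the struct
toy_cost and both functions are hypothetical, it counts N stores through a
single pointer that lives in memory, it does not count the base register
(y in the example above) because both variants need it, and it assumes the
compiler is allowed to keep the pointer in a register across the stores.

```c
/* Toy cost model for N stores through a pointer held in memory.
   All names are hypothetical; the numbers are counts, not cycles.  */
#include <assert.h>

struct toy_cost
{
  int insns;          /* Instructions emitted.  */
  int pointer_loads;  /* Times the pointer itself is read from memory.  */
  int addr_regs;      /* Extra address registers tied up.  */
};

/* Every store uses memory-indirect addressing, as in
   "mov.b #1, [0,y]": no extra register, but the pointer is re-read
   by each instruction.  */
struct toy_cost
toy_cost_indirect (int n_stores)
{
  assert (n_stores >= 1);
  struct toy_cost c = { n_stores, n_stores, 0 };
  return c;
}

/* The pointer is first loaded into an address register, then every
   store goes through that register, as in the reload output above.  */
struct toy_cost
toy_cost_reloaded (int n_stores)
{
  assert (n_stores >= 1);
  struct toy_cost c = { n_stores + 1, 1, 1 };
  return c;
}
```

For a single store this gives 1 insn and no extra register with indirect
addressing versus 2 insns and 1 register after the reload, which matches
the example at the start of this thread.  For several stores the reloaded
form reads the pointer only once, which is the "redundant memory reads"
side of the argument.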


In any case, if somebody implements this, it can be included in LRA.



[1] https://www.nxp.com/docs/en/reference-manual/S12ZCPU_RM_V1.pdf