Re: Special Memory Constraint [was Re: Indirect memory addresses vs. lra]
On Tue, Aug 20, 2019 at 9:07 AM John Darrington wrote:
>
> On Tue, Aug 20, 2019 at 08:56:39AM +0200, Richard Biener wrote:
>
>      > Most of these suggestions involve adding some sort of virtual registers
>      > So I hacked the machine description to add two new registers Z1 and Z2
>      > with the same mode as X and Y.
>      >
>      > Obviously the assembler balks at this.  However the compiler still
>      > ICEs at the same place as before.
>      >
>      > So this suggests that our original diagnosis, viz: there are not enough
>      > address registers was not accurate, and in fact there is some other
>      > problem?
>
>      That sounds likely.  Given you have indirect addressing you could
>      simulate N virtual regs by placing them in a virtual reg table in memory
>      and accessed via a fixed address register (assuming all instructions
>      that would need an address reg also can take that indirect from memory).
>
> That was my plan.  Accordingly, extending the md to provide N additional
> regs (N currently = 2) was the first step.  Having doubled the number
> of available address registers, I had expected this would fix most of the
> ICEs (but cause a lot of assembler errors).
>
> However it hasn't eliminated any ICEs.  lra is still complaining
> "unable to find a register to spill"  So the plan seems to have fallen
> over at the first hurdle.  Why can it still not spill registers despite
> having a lot more of them?

You really have to sit down and trace the LRA code with a debugger to tell...
unfortunately the dumps aren't verbose enough to tell.  Usually after
spilling the insn constraints can still not be satisfied, the main
question is usually why.

Richard.

> J'
Re: Special Memory Constraint [was Re: Indirect memory addresses vs. lra]
On Tue, Aug 20, 2019 at 08:56:39AM +0200, Richard Biener wrote:

     > Most of these suggestions involve adding some sort of virtual registers
     > So I hacked the machine description to add two new registers Z1 and Z2
     > with the same mode as X and Y.
     >
     > Obviously the assembler balks at this.  However the compiler still
     > ICEs at the same place as before.
     >
     > So this suggests that our original diagnosis, viz: there are not enough
     > address registers was not accurate, and in fact there is some other
     > problem?

     That sounds likely.  Given you have indirect addressing you could
     simulate N virtual regs by placing them in a virtual reg table in memory
     and accessed via a fixed address register (assuming all instructions
     that would need an address reg also can take that indirect from memory).

That was my plan.  Accordingly, extending the md to provide N additional
regs (N currently = 2) was the first step.  Having doubled the number
of available address registers, I had expected this would fix most of the
ICEs (but cause a lot of assembler errors).

However it hasn't eliminated any ICEs.  lra is still complaining
"unable to find a register to spill"  So the plan seems to have fallen
over at the first hurdle.  Why can it still not spill registers despite
having a lot more of them?

J'
Re: Special Memory Constraint [was Re: Indirect memory addresses vs. lra]
On Mon, Aug 19, 2019 at 8:06 PM John Darrington wrote:
>
> On Mon, Aug 19, 2019 at 10:07:11AM -0500, Segher Boessenkool wrote:
>
>      > As I remember there were a few other ideas from Richard Biener and
>      > Segher Boessenkool.  I also proposed to add a new address register which
>      > will be always a fixed stack memory slot at the end. Unfortunately I am
>      > not familiar with the target and the port to say in details how to do
>      > it.  But I think it is worth to try.
>
>      The m68hc11 port used the fake Z register approach, and I believe it had
>      some special machine pass to get rid of it right before assembler output.
>
>      (r171302 is when it was removed -- last version was
>      https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/m68hc11/m68hc11.c;h=1e414102c3f1fed985e4fb8db7954342e965190b;hb=bae8bb65d842d7ffefe990c1f0ac004491f3c105#l4061
>      for the machine reorg stuff).
>
>      No idea how well it works...  But it's only needed if you are forced to
>      have a frame pointer IIUC?
>
>
>      Segher
>
>
> Most of these suggestions involve adding some sort of virtual registers
> So I hacked the machine description to add two new registers Z1 and Z2
> with the same mode as X and Y.
>
> Obviously the assembler balks at this.  However the compiler still
> ICEs at the same place as before.
>
> So this suggests that our original diagnosis, viz: there are not enough
> address registers was not accurate, and in fact there is some other
> problem?

That sounds likely.  Given you have indirect addressing you could
simulate N virtual regs by placing them in a virtual reg table in memory
and accessed via a fixed address register (assuming all instructions
that would need an address reg also can take that indirect from memory).

Richard.

> J'
>
> --
> Avoid eavesdropping.  Send strong encrypted email.
> PGP Public key ID: 1024D/2DE827B3
> fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
> See http://sks-keyservers.net or any PGP keyserver for public key.
>
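
The "virtual reg table in memory" idea can be pictured with a small host-side model. This is only a sketch under stated assumptions: the table size, the names `virt_reg_table`, `virt_reg_set`, `load_via_virt_reg` and `store_via_virt_reg`, and the byte-array memory are all hypothetical and are not part of GCC or of the port under discussion. On the real target the table would presumably be reached through one fixed address register; here a plain C array stands in for it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: N virtual address registers kept in a small
   table in memory.  A C array stands in for the table that the real
   target would reach through one fixed address register.  */
enum { NUM_VIRT_REGS = 2 };
static uint32_t virt_reg_table[NUM_VIRT_REGS];

/* Writing a virtual register is just a store into its table slot.  */
static void
virt_reg_set (int n, uint32_t value)
{
  virt_reg_table[n] = value;
}

/* Loading through a virtual register: fetch the address from the table,
   add the displacement, then access memory.  This models a
   memory-indirect access.  */
static uint8_t
load_via_virt_reg (const uint8_t *memory, int n, uint32_t disp)
{
  return memory[virt_reg_table[n] + disp];
}

/* Storing through a virtual register, same addressing scheme.  */
static void
store_via_virt_reg (uint8_t *memory, int n, uint32_t disp, uint8_t value)
{
  memory[virt_reg_table[n] + disp] = value;
}
```

In this model a mem-to-mem move like insn 14 becomes `store_via_virt_reg (mem, 1, 32, load_via_virt_reg (mem, 0, 0))`, which uses no hardware address register beyond the one that locates the table. Whether every instruction of the real target can take its address indirectly from memory is the open assumption Richard flags in his message.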
Re: Special Memory Constraint [was Re: Indirect memory addresses vs. lra]
On Mon, Aug 19, 2019 at 10:07:11AM -0500, Segher Boessenkool wrote:

     > As I remember there were a few other ideas from Richard Biener and
     > Segher Boessenkool.  I also proposed to add a new address register which
     > will be always a fixed stack memory slot at the end. Unfortunately I am
     > not familiar with the target and the port to say in details how to do
     > it.  But I think it is worth to try.

     The m68hc11 port used the fake Z register approach, and I believe it had
     some special machine pass to get rid of it right before assembler output.

     (r171302 is when it was removed -- last version was
     https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/m68hc11/m68hc11.c;h=1e414102c3f1fed985e4fb8db7954342e965190b;hb=bae8bb65d842d7ffefe990c1f0ac004491f3c105#l4061
     for the machine reorg stuff).

     No idea how well it works...  But it's only needed if you are forced to
     have a frame pointer IIUC?


     Segher


Most of these suggestions involve adding some sort of virtual registers
So I hacked the machine description to add two new registers Z1 and Z2
with the same mode as X and Y.

Obviously the assembler balks at this.  However the compiler still
ICEs at the same place as before.

So this suggests that our original diagnosis, viz: there are not enough
address registers was not accurate, and in fact there is some other
problem?

J'

-- 
Avoid eavesdropping.  Send strong encrypted email.
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.
Re: Special Memory Constraint [was Re: Indirect memory addresses vs. lra]
On Mon, Aug 19, 2019 at 09:14:22AM -0400, Vladimir Makarov wrote:
> On 2019-08-19 3:35 a.m., John Darrington wrote:
> >On Fri, Aug 16, 2019 at 10:50:13AM -0400, Vladimir Makarov wrote:
> >     No I meant something like that
> >
> >     (define_special_memory_constraint "a" ...)
> >
> >     (define_predicate "my_special_predicate" ...
> >     {
> >       if (lra_in_progress_p)
> >         return REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER &&
> >                reg_renumber[REGNO(op)] < 0;
> >       return true if memory with sp addressing;
> >     })
> >
> >     I think LRA spills pseudo-register and it will be memory addressed
> >     by sp at the end of LRA.
> >
> >What I've done is this:
> >
> >(define_predicate "my_special_predicate"
> >   (match_operand 0 "memory_operand")
> > {
> >   debug_rtx (op);
> >   gcc_assert (MEM_P (op));
> >   op = XEXP (op, 0);
> >   if (GET_CODE (op) == PLUS)
> >     op = XEXP (op, 0);
> >
> >   if (lra_in_progress)
> >     {
> >       fprintf (stderr, "%s:%d\n", __FILE__, __LINE__);
> >       return REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER &&
> >              reg_renumber[REGNO(op)] < 0;
> >     }
> >
> >
> >   if (REG_P (op))
> >     {
> >       int regno = REGNO (op);
> >       return (regno == 10); // register is the stack pointer
> >     }
> >
> >   return true;
> > })
> >
> > (and many variations)  Unfortunately, any moderately complicated input
> > still results in a (mem (reg) ) insn repeatedly entering the
> > lra_in_progress case and returning false, and eventually terminating with
> >
> > "internal compiler error: maximum number of generated reload insns per
> > insn achieved (90)"
> >
> >
> >Any other ideas?
> As I remember there were a few other ideas from Richard Biener and
> Segher Boessenkool.  I also proposed to add a new address register which
> will be always a fixed stack memory slot at the end. Unfortunately I am
> not familiar with the target and the port to say in details how to do
> it.  But I think it is worth to try.

The m68hc11 port used the fake Z register approach, and I believe it had
some special machine pass to get rid of it right before assembler output.

(r171302 is when it was removed -- last version was
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/m68hc11/m68hc11.c;h=1e414102c3f1fed985e4fb8db7954342e965190b;hb=bae8bb65d842d7ffefe990c1f0ac004491f3c105#l4061
for the machine reorg stuff).

No idea how well it works...  But it's only needed if you are forced to
have a frame pointer IIUC?


Segher
Re: Special Memory Constraint [was Re: Indirect memory addresses vs. lra]
On 2019-08-19 3:35 a.m., John Darrington wrote:
> On Fri, Aug 16, 2019 at 10:50:13AM -0400, Vladimir Makarov wrote:
>
>      No I meant something like that
>
>      (define_special_memory_constraint "a" ...)
>
>      (define_predicate "my_special_predicate" ...
>      {
>        if (lra_in_progress_p)
>          return REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER &&
>                 reg_renumber[REGNO(op)] < 0;
>        return true if memory with sp addressing;
>      })
>
>      I think LRA spills pseudo-register and it will be memory addressed
>      by sp at the end of LRA.
>
> What I've done is this:
>
> (define_predicate "my_special_predicate"
>    (match_operand 0 "memory_operand")
>  {
>    debug_rtx (op);
>    gcc_assert (MEM_P (op));
>    op = XEXP (op, 0);
>    if (GET_CODE (op) == PLUS)
>      op = XEXP (op, 0);
>
>    if (lra_in_progress)
>      {
>        fprintf (stderr, "%s:%d\n", __FILE__, __LINE__);
>        return REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER &&
>               reg_renumber[REGNO(op)] < 0;
>      }
>
>    if (REG_P (op))
>      {
>        int regno = REGNO (op);
>        return (regno == 10); // register is the stack pointer
>      }
>
>    return true;
>  })
>
> (and many variations)  Unfortunately, any moderately complicated input
> still results in a (mem (reg) ) insn repeatedly entering the
> lra_in_progress case and returning false, and eventually terminating with
>
> "internal compiler error: maximum number of generated reload insns per
> insn achieved (90)"
>
> Any other ideas?

As I remember there were a few other ideas from Richard Biener and
Segher Boessenkool.  I also proposed to add a new address register which
will be always a fixed stack memory slot at the end. Unfortunately I am
not familiar with the target and the port to say in details how to do
it.  But I think it is worth to try.
Re: Special Memory Constraint [was Re: Indirect memory addresses vs. lra]
On Fri, Aug 16, 2019 at 10:50:13AM -0400, Vladimir Makarov wrote:

     No I meant something like that

     (define_special_memory_constraint "a" ...)

     (define_predicate "my_special_predicate" ...
     {
       if (lra_in_progress_p)
         return REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER &&
                reg_renumber[REGNO(op)] < 0;
       return true if memory with sp addressing;
     })

     I think LRA spills pseudo-register and it will be memory addressed
     by sp at the end of LRA.

What I've done is this:

(define_predicate "my_special_predicate"
   (match_operand 0 "memory_operand")
 {
   debug_rtx (op);
   gcc_assert (MEM_P (op));
   op = XEXP (op, 0);
   if (GET_CODE (op) == PLUS)
     op = XEXP (op, 0);

   if (lra_in_progress)
     {
       fprintf (stderr, "%s:%d\n", __FILE__, __LINE__);
       return REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER &&
              reg_renumber[REGNO(op)] < 0;
     }

   if (REG_P (op))
     {
       int regno = REGNO (op);
       return (regno == 10); // register is the stack pointer
     }

   return true;
 })

(and many variations)  Unfortunately, any moderately complicated input
still results in a (mem (reg) ) insn repeatedly entering the
lra_in_progress case and returning false, and eventually terminating with

"internal compiler error: maximum number of generated reload insns per
insn achieved (90)"

Any other ideas?

J'
Re: Special Memory Constraint [was Re: Indirect memory addresses vs. lra]
On 2019-08-16 7:23 a.m., John Darrington wrote:
> On Thu, Aug 15, 2019 at 02:23:45PM -0400, Vladimir Makarov wrote:
>
>      > I tried this solution earlier.  But unfortunately it makes things worse.
>      > What happens is that libgcc cannot even be built -- ICEs occur on a
>      > memory from address reg insn such as:
>      > (insn 117 2981 3697 5 (set (mem/f:PSI (plus:PSI (reg:PSI 1309)
>      >                 (const_int 102 [0x66])) [3 fs_129(D)->pc+0 S4 A8])
>      >         (reg:PSI 1310)) "/home/jmd/Source/GCC2/libgcc/unwind-dw2.c":977:9 96 {movpsi}
>      >
>      I see.  Then for the insn, you could try to create a pattern
>      "memory,special memory constraint".  The special memory constraint
>      should satisfy only spilled pseudo (pseudo with reg_renumber == -1).
>      I believe lra-constraints.c can spill the pseudo and at the end you
>      will have mem[disp1 + r8|r9|sp] = mem[disp1+sp].
>
> You mean something like this:
>
> (define_special_memory_constraint "a"
>   "My special memory constraint"
>   (match_operand 0 "my_special_predicate")
> )
>
> (define_predicate "my_special_predicate"
>    (match_operand 0 "memory_operand")
>  {
>    debug_rtx (op);
>    if (MEM_P (op))
>      {
>        op = XEXP (op, 0);
>        if (GET_CODE (op) == PLUS)
>          {
>            op = XEXP (op, 0);
>            if (REG_P (op))
>              {
>                fprintf (stderr, "Reg number is %d\n", REGNO (op));
>                if (REGNO (op) >= 0)
>                  return false;
>              }
>          }
>      }
>    return true;
>  })

No I meant something like that

(define_special_memory_constraint "a" ...)

(define_predicate "my_special_predicate" ...
{
  if (lra_in_progress_p)
    return REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER &&
           reg_renumber[REGNO(op)] < 0;
  return true if memory with sp addressing;
})

I think LRA spills pseudo-register and it will be memory addressed by sp
at the end of LRA.

> When I use this I get lots of the following ICEs
> "internal compiler error: maximum number of generated reload insns per
> insn achieved (90)"
>
> It seems logical to me that this would happen since the constraint is not
> going to match any operand with resolved registers.  Thus it will
> continually reload.
>
> ... which makes me think I've probably misunderstood what you are saying.
>
> J'
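
One possible reading of the sketch above is that, while LRA is running, the predicate tests the operand itself (still a pseudo REG that received no hard register) rather than the base register inside a MEM. The toy model below illustrates that reading only; it is not GCC code. `struct toy_rtx`, `toy_special_predicate`, and `TOY_FIRST_PSEUDO` are hypothetical stand-ins, `hard_regno` models `reg_renumber[REGNO (op)]`, and the stack pointer number 10 is taken from the predicate quoted later in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for GCC's rtx; every name here is hypothetical.  */
enum toy_code { TOY_REG, TOY_MEM };

struct toy_rtx
{
  enum toy_code code;
  int regno;       /* TOY_REG: register number; TOY_MEM: base register.  */
  int hard_regno;  /* TOY_REG only: assigned hard register, or -1 if the
                      pseudo was spilled (models reg_renumber).  */
};

enum { TOY_FIRST_PSEUDO = 16, TOY_SP_REGNO = 10 };

static bool
toy_special_predicate (const struct toy_rtx *op, bool lra_in_progress)
{
  if (lra_in_progress)
    /* During LRA accept only a pseudo without a hard register.  The
       operand is still a REG at this point, not a MEM.  */
    return (op->code == TOY_REG
            && op->regno >= TOY_FIRST_PSEUDO
            && op->hard_regno < 0);

  /* Outside LRA accept only memory addressed through the stack pointer.  */
  return op->code == TOY_MEM && op->regno == TOY_SP_REGNO;
}
```

Under this model a predicate that first requires `memory_operand` and then strips the MEM can never see the spilled-pseudo case, which may be one reason the reload loop never terminates; that is a guess from the thread, not something the participants confirm.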
Special Memory Constraint [was Re: Indirect memory addresses vs. lra]
On Thu, Aug 15, 2019 at 02:23:45PM -0400, Vladimir Makarov wrote:

     > I tried this solution earlier.  But unfortunately it makes things worse.
     > What happens is that libgcc cannot even be built -- ICEs occur on a
     > memory from address reg insn such as:
     > (insn 117 2981 3697 5 (set (mem/f:PSI (plus:PSI (reg:PSI 1309)
     >                 (const_int 102 [0x66])) [3 fs_129(D)->pc+0 S4 A8])
     >         (reg:PSI 1310)) "/home/jmd/Source/GCC2/libgcc/unwind-dw2.c":977:9 96 {movpsi}
     >
     I see.  Then for the insn, you could try to create a pattern
     "memory,special memory constraint".  The special memory constraint
     should satisfy only spilled pseudo (pseudo with reg_renumber == -1).
     I believe lra-constraints.c can spill the pseudo and at the end you
     will have mem[disp1 + r8|r9|sp] = mem[disp1+sp].

You mean something like this:

(define_special_memory_constraint "a"
  "My special memory constraint"
  (match_operand 0 "my_special_predicate")
)

(define_predicate "my_special_predicate"
   (match_operand 0 "memory_operand")
 {
   debug_rtx (op);
   if (MEM_P (op))
     {
       op = XEXP (op, 0);
       if (GET_CODE (op) == PLUS)
         {
           op = XEXP (op, 0);
           if (REG_P (op))
             {
               fprintf (stderr, "Reg number is %d\n", REGNO (op));
               if (REGNO (op) >= 0)
                 return false;
             }
         }
     }
   return true;
 })

When I use this I get lots of the following ICEs
"internal compiler error: maximum number of generated reload insns per
insn achieved (90)"

It seems logical to me that this would happen since the constraint is not
going to match any operand with resolved registers.  Thus it will
continually reload.

... which makes me think I've probably misunderstood what you are saying.

J'

-- 
Avoid eavesdropping.  Send strong encrypted email.
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.
Re: Indirect memory addresses vs. lra
On Thu, Aug 15, 2019 at 02:30:19PM -0400, Vladimir Makarov wrote:
> >Couldn't we spill the frame pointer? Basically we should be able to
> >compute the first address into a reg, spill that, do the second (both
> >could require the frame pointer), spill the frame pointer, reload the
> >first computed address from the stack, execute the insn and then reload
> >the frame pointer.
> >
> >Maybe the frame pointer can also be implemented 'virtually' in an index
> >register that you keep updated so that sp + reg
> >is the FP. Or frame accesses can use a
> >stack slot as FP and the indirect memory
> >addressing... (is there an indirect lea?)
> >
> Yes, it could be a solution.  It just needs some target maintainer
> creativity.  There are a lot of things (tricks) can be done in
> machine-dependent code which would not require RA changes.

You can even go as far as not having the hard frame pointer be a machine
register at all.  In RTL it will still be a reg, but that doesn't mean
the machine code you emit should be like that; you can use a special
fixed memory location for it, for example.


Segher
Re: Indirect memory addresses vs. lra
On 8/15/19 12:38 PM, Richard Biener wrote:
> On August 15, 2019 6:29:13 PM GMT+02:00, Vladimir Makarov wrote:
> >On 8/10/19 2:05 AM, John Darrington wrote:
> >> On Fri, Aug 09, 2019 at 01:34:36PM -0400, Vladimir Makarov wrote:
> >>
> >>       If you provide LRA dump for such test (it is better to use
> >>       -fira-verbose=15 to output full RA info into stderr), I probably
> >>       could say more.
> >>
> >> I've attached such a dump (generated from
> >> gcc/testsuite/gcc.c-torture/compile/pr53410-2.c).
> >>
> >>       The less regs the architecture has, the easier to run into such
> >>       error message if something described wrong in the back-end.  I see
> >>       your architecture is 16-bit micro-controller with only 8 regs, some
> >>       of them is specialized.  So your architecture is really register
> >>       constrained.
> >>
> >> That's not quite correct.  It is a 24-bit micro-controller (the address
> >> space is 24 bits wide).  There are 2 address registers (plus stack
> >> pointer and program counter) and there are 8 general purpose data
> >> registers (of differing sizes).
> >>
> >> J'
> >>
> >Thank you for providing the sources.  It helped me to understand what is
> >going on.  So the test crashes on
> >
> >/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c: In function ‘f1’:
> >/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c:10:1: error: unable to find a register to spill
> >/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c:10:1: error: this is the insn:
> >(insn 14 49 15 2 (set (mem:SI (plus:PSI (reg/f:PSI 40 [34])
> >                 (const_int 32 [0x20])) [2 S4 A64])
> >         (mem:SI (reg:PSI 41) [2 *p_5(D)+0 S4 A8])) "/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c":9:9 95 {*movsi}
> >      (expr_list:REG_DEAD (reg:PSI 41)
> >         (expr_list:REG_DEAD (reg/f:PSI 40 [34])
> >             (nil))))
> >
> >Your target has only 2 non-fixed addr registers (r8, r9).  One (r9) is
> >defined as a hard frame pointer.  Honestly, I never saw a target
> >with such register constraints.
> >
> >-O0 assumes -fno-omit-frame-pointer.  So in -O0 mode we have only *one*
> >free addr reg for insn which requires *2* of them.  That is why the GCC
> >port crashes on this test.  If you add -fomit-frame-pointer, the test
> >succeeds.
> >
> >But even if use -fomit-frame-pointer, it is not guaranteed that hard
> >frame pointer will be substituted by stack pointer.  There are many cases
> >where it is not possible (e.g. in case of alloca usage).
> >
> >So what can be done, imho.  The simplest solution would be preventing
> >insns with more than one memory operand.  The more difficult solution
> >would be permitting two memory one with address pseudo and another one
> >with stack pointer.
>
> Couldn't we spill the frame pointer? Basically we should be able to
> compute the first address into a reg, spill that, do the second (both
> could require the frame pointer), spill the frame pointer, reload the
> first computed address from the stack, execute the insn and then reload
> the frame pointer.
>
> Maybe the frame pointer can also be implemented 'virtually' in an index
> register that you keep updated so that sp + reg is the FP. Or frame
> accesses can use a stack slot as FP and the indirect memory
> addressing... (is there an indirect lea?)

Yes, it could be a solution.  It just needs some target maintainer
creativity.  There are a lot of things (tricks) can be done in
machine-dependent code which would not require RA changes.
Re: Indirect memory addresses vs. lra
On 8/15/19 1:35 PM, John Darrington wrote:
> On Thu, Aug 15, 2019 at 12:29:13PM -0400, Vladimir Makarov wrote:
>
>      Thank you for providing the sources.  It helped me to understand what is
>      going on.  So the test crashes on
>
>      /home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c: In function ‘f1’:
>      /home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c:10:1: error: unable to find a register to spill
>      /home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c:10:1: error: this is the insn:
>      (insn 14 49 15 2 (set (mem:SI (plus:PSI (reg/f:PSI 40 [34])
>                      (const_int 32 [0x20])) [2 S4 A64])
>              (mem:SI (reg:PSI 41) [2 *p_5(D)+0 S4 A8])) "/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c":9:9 95 {*movsi}
>           (expr_list:REG_DEAD (reg:PSI 41)
>              (expr_list:REG_DEAD (reg/f:PSI 40 [34])
>                  (nil))))
>
> Thanks for taking a look.
>
>      Your target has only 2 non-fixed addr registers (r8, r9).  One (r9) is
>      defined as a hard frame pointer.
>
> That is correct.
>
>      Honestly, I never saw a target with such register constraints.
>
> My recollection is that MC68HC11 was the same.
>
>      So what can be done, imho.  The simplest solution would be preventing
>      insns with more than one memory operand.
>
> I tried this solution earlier.  But unfortunately it makes things worse.
> What happens is that libgcc cannot even be built -- ICEs occur on a
> memory from address reg insn such as:
>
> (insn 117 2981 3697 5 (set (mem/f:PSI (plus:PSI (reg:PSI 1309)
>                 (const_int 102 [0x66])) [3 fs_129(D)->pc+0 S4 A8])
>         (reg:PSI 1310)) "/home/jmd/Source/GCC2/libgcc/unwind-dw2.c":977:9 96 {movpsi}

I see.  Then for the insn, you could try to create a pattern
"memory,special memory constraint".  The special memory constraint should
satisfy only spilled pseudo (pseudo with reg_renumber == -1).  I believe
lra-constraints.c can spill the pseudo and at the end you will have
mem[disp1 + r8|r9|sp] = mem[disp1+sp].

It might work.  If it does not, we could modify LRA to do this.

Another solution would be adding a nonexistent register Z, and for
mem:psi [psi:r] = Z you could emit an assembler insn:
mem[psi:r] = a stack slot corresponding to Z.
Re: Indirect memory addresses vs. lra
On Thu, Aug 15, 2019 at 06:38:30PM +0200, Richard Biener wrote:

     Couldn't we spill the frame pointer? Basically we should be able to
     compute the first address into a reg, spill that, do the second (both
     could require the frame pointer), spill the frame pointer, reload the
     first computed address from the stack, execute the insn and then reload
     the frame pointer.

     Maybe the frame pointer can also be implemented 'virtually' in an index
     register that you keep updated so that sp + reg is the FP. Or frame
     accesses can use a stack slot as FP and the indirect memory
     addressing... (is there an indirect lea?)

Yes.  lea x, [4,x] is a valid instruction.

J'

-- 
Avoid eavesdropping.  Send strong encrypted email.
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.
Re: Indirect memory addresses vs. lra
On Thu, Aug 15, 2019 at 12:29:13PM -0400, Vladimir Makarov wrote:

     Thank you for providing the sources.  It helped me to understand what is
     going on.  So the test crashes on

     /home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c: In function ‘f1’:
     /home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c:10:1: error: unable to find a register to spill
     /home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c:10:1: error: this is the insn:
     (insn 14 49 15 2 (set (mem:SI (plus:PSI (reg/f:PSI 40 [34])
                     (const_int 32 [0x20])) [2 S4 A64])
             (mem:SI (reg:PSI 41) [2 *p_5(D)+0 S4 A8])) "/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c":9:9 95 {*movsi}
          (expr_list:REG_DEAD (reg:PSI 41)
             (expr_list:REG_DEAD (reg/f:PSI 40 [34])
                 (nil))))

Thanks for taking a look.

     Your target has only 2 non-fixed addr registers (r8, r9).  One (r9) is
     defined as a hard frame pointer.

That is correct.

     Honestly, I never saw a target with such register constraints.

My recollection is that MC68HC11 was the same.

     So what can be done, imho.  The simplest solution would be preventing
     insns with more than one memory operand.

I tried this solution earlier.  But unfortunately it makes things worse.
What happens is that libgcc cannot even be built -- ICEs occur on a
memory from address reg insn such as:

(insn 117 2981 3697 5 (set (mem/f:PSI (plus:PSI (reg:PSI 1309)
                (const_int 102 [0x66])) [3 fs_129(D)->pc+0 S4 A8])
        (reg:PSI 1310)) "/home/jmd/Source/GCC2/libgcc/unwind-dw2.c":977:9 96 {movpsi}

J'

-- 
Avoid eavesdropping.  Send strong encrypted email.
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.
Re: Indirect memory addresses vs. lra
On August 15, 2019 6:29:13 PM GMT+02:00, Vladimir Makarov wrote:
>On 8/10/19 2:05 AM, John Darrington wrote:
>> On Fri, Aug 09, 2019 at 01:34:36PM -0400, Vladimir Makarov wrote:
>>
>>       If you provide LRA dump for such test (it is better to use
>>       -fira-verbose=15 to output full RA info into stderr), I probably
>>       could say more.
>>
>> I've attached such a dump (generated from
>> gcc/testsuite/gcc.c-torture/compile/pr53410-2.c).
>>
>>       The less regs the architecture has, the easier to run into such
>>       error message if something described wrong in the back-end.  I see
>>       your architecture is 16-bit micro-controller with only 8 regs, some
>>       of them is specialized.  So your architecture is really register
>>       constrained.
>>
>> That's not quite correct.  It is a 24-bit micro-controller (the address
>> space is 24 bits wide).  There are 2 address registers (plus stack
>> pointer and program counter) and there are 8 general purpose data
>> registers (of differing sizes).
>>
>>
>> J'
>>
>Thank you for providing the sources.  It helped me to understand what is
>going on.  So the test crashes on
>
>/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c: In function ‘f1’:
>/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c:10:1: error: unable to find a register to spill
>/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c:10:1: error: this is the insn:
>(insn 14 49 15 2 (set (mem:SI (plus:PSI (reg/f:PSI 40 [34])
>                (const_int 32 [0x20])) [2 S4 A64])
>        (mem:SI (reg:PSI 41) [2 *p_5(D)+0 S4 A8])) "/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c":9:9 95 {*movsi}
>     (expr_list:REG_DEAD (reg:PSI 41)
>        (expr_list:REG_DEAD (reg/f:PSI 40 [34])
>            (nil))))
>
>Your target has only 2 non-fixed addr registers (r8, r9).  One (r9) is
>defined as a hard frame pointer.  Honestly, I never saw a target
>with such register constraints.
>
>-O0 assumes -fno-omit-frame-pointer.  So in -O0 mode we have only *one*
>free addr reg for insn which requires *2* of them.  That is why the GCC
>port crashes on this test.  If you add -fomit-frame-pointer, the test
>succeeds.
>
>But even if use -fomit-frame-pointer, it is not guaranteed that hard
>frame pointer will be substituted by stack pointer.  There are many cases
>where it is not possible (e.g. in case of alloca usage).
>
>So what can be done, imho.  The simplest solution would be preventing
>insns with more than one memory operand.  The more difficult solution
>would be permitting two memory one with address pseudo and another one
>with stack pointer.

Couldn't we spill the frame pointer? Basically we should be able to
compute the first address into a reg, spill that, do the second (both
could require the frame pointer), spill the frame pointer, reload the
first computed address from the stack, execute the insn and then reload
the frame pointer.

Maybe the frame pointer can also be implemented 'virtually' in an index
register that you keep updated so that sp + reg is the FP. Or frame
accesses can use a stack slot as FP and the indirect memory
addressing... (is there an indirect lea?)

>I think only after solving this problem, you could think about
>implementing indirect memory addressing.
>
>
Re: Indirect memory addresses vs. lra
On 8/10/19 2:05 AM, John Darrington wrote:
> On Fri, Aug 09, 2019 at 01:34:36PM -0400, Vladimir Makarov wrote:
>
>      If you provide LRA dump for such test (it is better to use
>      -fira-verbose=15 to output full RA info into stderr), I probably
>      could say more.
>
> I've attached such a dump (generated from
> gcc/testsuite/gcc.c-torture/compile/pr53410-2.c).
>
>      The less regs the architecture has, the easier to run into such
>      error message if something described wrong in the back-end.  I see
>      your architecture is 16-bit micro-controller with only 8 regs, some
>      of them is specialized.  So your architecture is really register
>      constrained.
>
> That's not quite correct.  It is a 24-bit micro-controller (the address
> space is 24 bits wide).  There are 2 address registers (plus stack
> pointer and program counter) and there are 8 general purpose data
> registers (of differing sizes).
>
> J'

Thank you for providing the sources.  It helped me to understand what is
going on.  So the test crashes on

/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c: In function ‘f1’:
/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c:10:1: error: unable to find a register to spill
/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c:10:1: error: this is the insn:
(insn 14 49 15 2 (set (mem:SI (plus:PSI (reg/f:PSI 40 [34])
                (const_int 32 [0x20])) [2 S4 A64])
        (mem:SI (reg:PSI 41) [2 *p_5(D)+0 S4 A8])) "/home/jmd/Source/GCC2/gcc/testsuite/gcc.c-torture/compile/pr53410-2.c":9:9 95 {*movsi}
     (expr_list:REG_DEAD (reg:PSI 41)
        (expr_list:REG_DEAD (reg/f:PSI 40 [34])
            (nil))))

Your target has only 2 non-fixed addr registers (r8, r9).  One (r9) is
defined as a hard frame pointer.  Honestly, I never saw a target with
such register constraints.

-O0 assumes -fno-omit-frame-pointer.  So in -O0 mode we have only *one*
free addr reg for insn which requires *2* of them.  That is why the GCC
port crashes on this test.  If you add -fomit-frame-pointer, the test
succeeds.

But even if use -fomit-frame-pointer, it is not guaranteed that hard
frame pointer will be substituted by stack pointer.  There are many cases
where it is not possible (e.g. in case of alloca usage).

So what can be done, imho.  The simplest solution would be preventing
insns with more than one memory operand.  The more difficult solution
would be permitting two memory one with address pseudo and another one
with stack pointer.

I think only after solving this problem, you could think about
implementing indirect memory addressing.
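
The register arithmetic in this message can be written down as a tiny model. It is a sketch, not GCC code: `free_addr_regs` and `insn_allocatable` are hypothetical helpers, and the only facts taken from the thread are that the target has two non-fixed address registers (r8, r9) and that r9 is reserved when a frame pointer is needed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: the thread says the target has two non-fixed address
   registers (r8 and r9) and that r9 serves as the hard frame pointer.  */
enum { NUM_ADDR_REGS = 2 };

/* Address registers the allocator may use freely.  */
static int
free_addr_regs (bool frame_pointer_needed)
{
  /* When a frame pointer is required, r9 is reserved for it.  */
  return NUM_ADDR_REGS - (frame_pointer_needed ? 1 : 0);
}

/* A mem-to-mem move such as insn 14 needs one address register for
   each memory operand whose address lives in a pseudo.  */
static bool
insn_allocatable (int addr_regs_needed, bool frame_pointer_needed)
{
  return addr_regs_needed <= free_addr_regs (frame_pointer_needed);
}
```

With a frame pointer in use, `insn_allocatable (2, true)` is false, matching the "unable to find a register to spill" failure at -O0, while `insn_allocatable (2, false)` is true, matching the observation that the test succeeds with -fomit-frame-pointer. The model ignores cases such as alloca, where the message notes that the frame pointer cannot be eliminated.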
Re: Indirect memory addresses vs. lra
On 2019-08-10 2:05 a.m., John Darrington wrote:
> On Fri, Aug 09, 2019 at 01:34:36PM -0400, Vladimir Makarov wrote:
>
>      If you provide LRA dump for such test (it is better to use
>      -fira-verbose=15 to output full RA info into stderr), I probably
>      could say more.
>
> I've attached such a dump (generated from
> gcc/testsuite/gcc.c-torture/compile/pr53410-2.c).

Unfortunately, this info is not enough for me to say what is the
problem.  I only found suspicious that LRA is trying to assign a few
registers to a pseudo register and fails even though these registers are
not assigned to anything.  Probably HARD_REGNO_MODE_OK prevents this.  So
it would be interesting to know how many registers of Pmode are actually
available.

In any case I'll try to look at this problem more on this week using your
built gcc on gcc135.

>      The less regs the architecture has, the easier to run into such
>      error message if something described wrong in the back-end.  I see
>      your architecture is 16-bit micro-controller with only 8 regs, some
>      of them is specialized.  So your architecture is really register
>      constrained.
>
> That's not quite correct.  It is a 24-bit micro-controller (the address
> space is 24 bits wide).  There are 2 address registers (plus stack
> pointer and program counter) and there are 8 general purpose data
> registers (of differing sizes).
Re: Indirect memory addresses vs. lra
Hi John, On Mon, Aug 12, 2019 at 08:47:43AM +0200, John Darrington wrote: > On Sat, Aug 10, 2019 at 11:12:18AM -0500, Segher Boessenkool wrote: > On Sat, Aug 10, 2019 at 08:05:53AM +0200, John Darrington wrote: > > Choosing alt 5 in insn 14: (0) m (1) m {*movsi} > >14: [r40:PSI+0x20]=[r41:PSI] > > Inserting insn reload before: > >48: r40:PSI=r34:PSI > >49: r41:PSI=[y:PSI+0x2f] > > insn 14 is a mem-to-mem move (another feature not many more modern / > more RISCy CPUs have). That requires both of your address registers. > So far, so good. The reloads (insn 48 and 49) require address > registers themselves; that isn't necessarily a problem either. > > So far as I can see, insn 48 is completely redundant. It's copying a > pseudo reg (74) into another pseudo reg (40). > This is pointless and a waste, since insn 14 does not modify 74. > I don't understand why lra feels the need to do it. LRA always does this, I think... it reloads all inputs to all insns that may need reloading. It later optimises most of that away again, but this gives it a lot of freedom to move things around. Or that is what it always looked like to me. I haven't looked at the code to see if that is the real reason, blush. > If lra knew about (mem (mem ...)) style addressing, then insn 49 would > also be redundant (which is why I raised the topic). Yes. But it probably should be able to deal with things like this, too, or some other testcases will die a horrible death. > In summary, what we have is: > > (insn 48 84 49 2 (set (reg/f:PSI 40 [34]) > (reg/f:PSI 74 [34])) > (nil)) > (insn 49 48 14 2 (set (reg:PSI 41) > (mem/f/c:PSI (plus:PSI (reg/f:PSI 9 y) > (const_int 47 [0x2f])) [3 p+0 S4 A8])) > (nil)) > (insn 14 49 15 2 (set (mem:SI (plus:PSI (reg/f:PSI 40 [34]) > (const_int 32 [0x20])) [2 S4 A64]) > (mem:SI (reg:PSI 41) [2 *p_5(D)+0 S4 A8])) > > where, like you say, insns 48 and 49 are reloads. But these two reloads > are unnecessary and cause the machine to run out of PSImode registers. 
Anyway, please have patience, and see what Vladimir comes up with. These things take time. Segher
Re: Indirect memory addresses vs. lra
On Sat, Aug 10, 2019 at 11:12:18AM -0500, Segher Boessenkool wrote: Hi! On Sat, Aug 10, 2019 at 08:05:53AM +0200, John Darrington wrote: > Choosing alt 5 in insn 14: (0) m (1) m {*movsi} >14: [r40:PSI+0x20]=[r41:PSI] > Inserting insn reload before: >48: r40:PSI=r34:PSI >49: r41:PSI=[y:PSI+0x2f] insn 14 is a mem-to-mem move (another feature not many more modern / more RISCy CPUs have). That requires both of your address registers. So far, so good. The reloads (insn 48 and 49) require address registers themselves; that isn't necessarily a problem either. So far as I can see, insn 48 is completely redundant. It's copying a pseudo reg (74) into another pseudo reg (40). This is pointless and a waste, since insn 14 does not modify 74. I don't understand why lra feels the need to do it. If lra knew about (mem (mem ...)) style addressing, then insn 49 would also be redundant (which is why I raised the topic). In summary, what we have is: (insn 48 84 49 2 (set (reg/f:PSI 40 [34]) (reg/f:PSI 74 [34])) (nil)) (insn 49 48 14 2 (set (reg:PSI 41) (mem/f/c:PSI (plus:PSI (reg/f:PSI 9 y) (const_int 47 [0x2f])) [3 p+0 S4 A8])) (nil)) (insn 14 49 15 2 (set (mem:SI (plus:PSI (reg/f:PSI 40 [34]) (const_int 32 [0x20])) [2 S4 A64]) (mem:SI (reg:PSI 41) [2 *p_5(D)+0 S4 A8])) where, like you say, insns 48 and 49 are reloads. But these two reloads are unnecessary and cause the machine to run out of PSImode registers. The above could be easier and more efficiently done simply as: (insn 14 11 15 2 (set (mem:SI (plus:PSI (reg/f:PSI 74 [34]) (const_int 32 [0x20])) [2 S4 A64]) (mem/f/c:PSI (mem:PSI (plus:PSI (reg/f:PSI 9 y) (const_int 47 [0x2f])) [3 p+0 S4 A8]))) This is exactly what we had before lra messed with things. It can be represented in the ISA with one assembler instruction: mov.p (32, x), [47, y] and if I'm not mistaken, alternative 5 of my "movpsi" pattern should do this just fine. But this requires careful juggling. 
Maybe you will need some backend code Could you give a hint into which set of hooks/constraints/predicates this backend code should go? -- Avoid eavesdropping. Send strong encrypted email. PGP Public key ID: 1024D/2DE827B3 fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3 See http://sks-keyservers.net or any PGP keyserver for public key.
Re: Indirect memory addresses vs. lra
On Sat, Aug 10, 2019 at 08:10:27AM +0200, John Darrington wrote: > On Fri, Aug 09, 2019 at 09:16:44AM -0500, Segher Boessenkool wrote: > > Is your code in some branch in our git? > > No. But it could be pushed there if people think it would be > appropriate to do so, and if I'm given the permissions to do so. > > Or in some other public git? > > It's in my repo on gcc135 ~jmd/gcc-s12z (branch s12z) That will work fine, for me at least. > Do you have a representative testcase? > > I think gcc/testsuite/gcc.c-torture/compile/pr53410-2.c is as > representative as any. Okido, thanks! Segher
Re: Indirect memory addresses vs. lra
Hi! On Sat, Aug 10, 2019 at 08:05:53AM +0200, John Darrington wrote: >Choosing alt 5 in insn 14: (0) m (1) m {*movsi} >14: [r40:PSI+0x20]=[r41:PSI] > Inserting insn reload before: >48: r40:PSI=r34:PSI >49: r41:PSI=[y:PSI+0x2f] insn 14 is a mem-to-mem move (another feature not many more modern / more RISCy CPUs have). That requires both of your address registers. So far, so good. The reloads (insn 48 and 49) require address registers themselves; that isn't necessarily a problem either. But this requires careful juggling. Maybe you will need some backend code for this, or to optimise this (although right now you just want it to *work* :-) ) For some reason LRA didn't manage. Register inheritance seems to be implicated (but that might be a red herring). Vladimir will probably find out more, and/or correct me :-) Segher
Re: Indirect memory addresses vs. lra
On Fri, Aug 09, 2019 at 09:16:44AM -0500, Segher Boessenkool wrote: Is your code in some branch in our git? No. But it could be pushed there if people think it would be appropriate to do so, and if I'm given the permissions to do so. Or in some other public git? It's in my repo on gcc135 ~jmd/gcc-s12z (branch s12z) Do you have a representative testcase? I think gcc/testsuite/gcc.c-torture/compile/pr53410-2.c is as representative as any. J'
Re: Indirect memory addresses vs. lra
On Fri, Aug 09, 2019 at 01:34:36PM -0400, Vladimir Makarov wrote:

If you provide LRA dump for such test (it is better to use -fira-verbose=15 to output full RA info into stderr), I probably could say more.

I've attached such a dump (generated from gcc/testsuite/gcc.c-torture/compile/pr53410-2.c).

The less regs the architecture has, the easier to run into such error message if something described wrong in the back-end. I see your architecture is 16-bit micro-controller with only 8 regs, some of them is specialized. So your architecture is really register constrained.

That's not quite correct. It is a 24-bit micro-controller (the address space is 24 bits wide). There are 2 address registers (plus stack pointer and program counter) and there are 8 general purpose data registers (of differing sizes).

J'
Building IRA IR Pass 0 for finding pseudo/allocno costs r36: preferred X_REG, alternative NO_REGS, allocno X_REG a0 (r36,l0) best X_REG, allocno X_REG r35: preferred X_REG, alternative NO_REGS, allocno X_REG a10 (r35,l0) best X_REG, allocno X_REG r34: preferred X_REG, alternative NO_REGS, allocno X_REG a1 (r34,l0) best X_REG, allocno X_REG r33: preferred DATA_REGS, alternative NO_REGS, allocno DATA_REGS a11 (r33,l0) best DATA_REGS, allocno DATA_REGS r32: preferred DATA_REGS, alternative NO_REGS, allocno DATA_REGS a12 (r32,l0) best DATA_REGS, allocno DATA_REGS r31: preferred DATA_REGS, alternative NO_REGS, allocno DATA_REGS a14 (r31,l0) best DATA_REGS, allocno DATA_REGS r30: preferred NO_REGS, alternative NO_REGS, allocno NO_REGS a13 (r30,l0) best NO_REGS, allocno NO_REGS r29: preferred X_REG, alternative NO_REGS, allocno X_REG a15 (r29,l0) best X_REG, allocno X_REG r28: preferred X_REG, alternative NO_REGS, allocno X_REG a16 (r28,l0) best X_REG, allocno X_REG r27: preferred X_REG, alternative NO_REGS, allocno X_REG a17 (r27,l0) best X_REG, allocno X_REG r26: preferred DATA_REGS, alternative NO_REGS, allocno DATA_REGS a2 (r26,l0) best DATA_REGS, allocno DATA_REGS r25: preferred DATA_REGS, alternative NO_REGS, allocno DATA_REGS a4 (r25,l0) best DATA_REGS, allocno DATA_REGS r24: preferred DATA_REGS, alternative NO_REGS, allocno DATA_REGS a3 (r24,l0) best DATA_REGS, allocno DATA_REGS r23: preferred DATA_REGS, alternative NO_REGS, allocno DATA_REGS a5 (r23,l0) best DATA_REGS, allocno DATA_REGS r22: preferred DATA_REGS, alternative NO_REGS, allocno DATA_REGS a6 (r22,l0) best DATA_REGS, allocno DATA_REGS r21: preferred DATA_REGS, alternative NO_REGS, allocno DATA_REGS a8 (r21,l0) best DATA_REGS, allocno DATA_REGS r20: preferred DATA_REGS, alternative NO_REGS, allocno DATA_REGS a7 (r20,l0) best DATA_REGS, allocno DATA_REGS r19: preferred DATA_REGS, alternative NO_REGS, allocno DATA_REGS a9 (r19,l0) best DATA_REGS, allocno DATA_REGS a0(r36,l0) costs: X_REG:0 MEM:5000 
a1(r34,l0) costs: X_REG:0 MEM:84000 a2(r26,l0) costs: DATA_REGS:0 MEM:5000 a3(r24,l0) costs: DATA_REGS:0 MEM:5000 a4(r25,l0) costs: DATA_REGS:0 MEM:5000 a5(r23,l0) costs: DATA_REGS:0 MEM:5000 a6(r22,l0) costs: DATA_REGS:0 MEM:5000 a7(r20,l0) costs: DATA_REGS:0 MEM:5000 a8(r21,l0) costs: DATA_REGS:0 MEM:5000 a9(r19,l0) costs: DATA_REGS:0 MEM:5000 a10(r35,l0) costs: X_REG:0 MEM:5000 a11(r33,l0) costs: DATA_REGS:0 MEM:8000 a12(r32,l0) costs: DATA_REGS:0 MEM:7000 a13(r30,l0) costs: MEM:8000 a14(r31,l0) costs: DATA_REGS:0 MEM:7000 a15(r29,l0) costs: X_REG:0 MEM:8000 a16(r28,l0) costs: X_REG:0 MEM:8000 a17(r27,l0) costs: X_REG:2000 MEM:8000 Insn 43(l0): point = 0 Insn 39(l0): point = 3 Insn 38(l0): point = 5 Insn 37(l0): point = 7 Insn 36(l0): point = 9 Insn 35(l0): point = 11 Insn 34(l0): point = 13 Insn 33(l0): point = 15 Insn 32(l0): point = 17 Insn 31(l0): point = 19 Insn 30(l0): point = 21 Insn 29(l0): point = 23 Insn 28(l0): point = 25 Insn 27(l0): point = 27 Insn 26(l0): point = 29 Insn 25(l0): point = 31 Insn 24(l0): point = 33 Insn 23(l0): point = 35 Insn 22(l0): point = 37 Insn 21(l0): point = 39 Insn 20(l0): point = 41 Insn 19(l0): point = 43 Insn 18(l0): point = 45 Insn 17(l0): point = 47 Insn 16(l0): point = 49 Insn 15(l0): point = 51 Insn 14(l0): point = 53 Insn 9(l0): point = 55 Insn 8(l0): point = 57 Insn 7(l0): point = 59 Insn 6(l0): point = 61 Insn 5(l0): point = 63 Insn 4(l0): point = 65 Insn 3(l0): point = 67 Insn 2(l0): point = 69 Insn 10(l0): point = 71 a0(r36): [4..5] a1(r34): [4..55] a2(r26): [18..21] a3(r24): [20..25] a4(r25): [22..23] a5(r23): [26..27]
Re: Indirect memory addresses vs. lra
On 2019-08-09 4:14 a.m., John Darrington wrote:

On Thu, Aug 08, 2019 at 01:57:41PM -0600, Jeff Law wrote:

Yea, it's certainly designed with the more mainstream architectures in mind. The double-indirect case that's being talked about here is well out of the mainstream and not a feature of anything LRA has targeted to date. So I'm not surprised it's not working. My suggestion would be to ignore the double-indirect aspect of the architecture right now, get the port working, then come back and try to make double-indirect addressing modes work.

This sounds like sensible advice. However I wonder if this issue is related to the other major outstanding problem I have, viz: the large number of test failures which report "Unable to find a register to spill" - So far, nobody has been able to explain how to solve that issue and even the people who appear to be more knowledgeable have expressed surprise that it is even happening at all.

Basically, LRA behaves here like the older reload. If an RTL insn needs hard regs and there are no free regs, LRA/reload put pseudos assigned to hard regs and living through the insn into memory. So it is very hard to run into the problem "unable to find a register to spill" if the insn needs fewer regs than the architecture provides. That is why people are surprised. Still, it can happen, as one RTL insn can be implemented by a few machine insns. The most frequent case here is GCC asm insns requiring a lot of input/output/clobbered regs/operands.

If you provide LRA dump for such test (it is better to use -fira-verbose=15 to output full RA info into stderr), I probably could say more.

The less regs the architecture has, the easier to run into such error message if something described wrong in the back-end. I see your architecture is 16-bit micro-controller with only 8 regs, some of them is specialized. So your architecture is really register constrained.
Even if it should turn out not to be related, the message I've been receiving in this thread is lra should not be expected to work for non "mainstream" backends. So perhaps there is another, yet to be discovered, restriction which prevents my backend from ever working? On the other hand, given my lack of experience with gcc, it could be that lra is working perfectly, and I have simply done something incorrectly. But the uncertainty voiced in this thread means that it is hard to be sure that I'm not trying to do something which is currently unsupported.

LRA/reload is the most machine-dependent machine-independent pass in GCC. It is connected to machine-dependent code in numerous ways. A big part of making a new backend is getting LRA/reload and the machine-dependent code to communicate in the right way. Sometimes it is hard to decide who is responsible for RA related bugs: RA or back-end. Sometimes an innocent change in RA solving one problem for a particular target might result in numerous new bugs for other targets. Therefore it is very difficult to say whether your small change to permit indirect memory addressing will work in the general case.
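The spilling behaviour described in this message can be pictured with a small toy model in C. This is only a sketch of the idea, not LRA's actual algorithm or data structures: `struct toy_hard_reg` and `toy_find_reg` are hypothetical names, and the model ignores register classes, modes and costs entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; these types are not GCC's.  */
struct toy_hard_reg {
    bool fixed;          /* reserved (e.g. frame pointer), never allocatable */
    int  pseudo;         /* pseudo currently held, or -1 if the reg is free */
    bool used_by_insn;   /* holds an operand of the insn being processed */
};

/* Return the index of a register that can be handed to the insn: a free
   one if available, otherwise one whose pseudo merely lives through the
   insn (that pseudo would be moved to memory).  Return -1 when neither
   exists, the situation reported as "unable to find a register to spill".  */
static int toy_find_reg(const struct toy_hard_reg *regs, size_t n)
{
    assert(regs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (!regs[i].fixed && regs[i].pseudo < 0)
            return (int) i;
    for (size_t i = 0; i < n; i++)
        if (!regs[i].fixed && !regs[i].used_by_insn)
            return (int) i;
    return -1;
}
```

In this model a search over two address registers fails only when one is fixed and the other already holds an operand of the same insn, which matches the explanation above that the error should be rare unless the insn needs more registers than are really available.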
Re: Indirect memory addresses vs. lra
On 8/9/19 2:14 AM, John Darrington wrote:
> On Thu, Aug 08, 2019 at 01:57:41PM -0600, Jeff Law wrote:
>
> Yea, it's certainly designed with the more mainstream architectures in
> mind. The double-indirect case that's being talked about here is well
> out of the mainstream and not a feature of anything LRA has targeted to
> date. So I'm not surprised it's not working.
>
> My suggestion would be to ignore the double-indirect aspect of the
> architecture right now, get the port working, then come back and try to
> make double-indirect addressing modes work.
>
> This sounds like sensible advice. However I wonder if this issue is
> related to the other major outstanding problem I have, viz: the large
> number of test failures which report "Unable to find a register to
> spill" - So far, nobody has been able to explain how to solve that
> issue and even the people who appear to be more knowledgeable have
> expressed surprise that it is even happening at all.

You're going to have to debug what LRA is doing and why. There's really no short-cuts here. We can't really do it for you. Even if you weren't using LRA you'd be doing the same process, just on an even more difficult to understand codebase.

>
> Even if it should turn out not to be related, the message I've been
> receiving in this thread is lra should not be expected to work for
> non "mainstream" backends. So perhaps there is another, yet to be
> discovered, restriction which prevents my backend from ever working?

It's possible. But that's not really any different than reload. There's certainly various aspects of architectures that reload can't handle as well -- even on architectures that were mainstream processors when reload was under active development and maintenance.
There's even a good chance reload won't handle double-indirect addressing modes well -- they were far from mainstream and as a result the code which does purport to handle double-indirect addressing modes hasn't been used/tested all that much over the last 25+ years.

>
> On the other hand, given my lack of experience with gcc, it could be
> that lra is working perfectly, and I have simply done something
> incorrectly. But the uncertainty voiced in this thread means that it
> is hard to be sure that I'm not trying to do something which is
> currently unsupported.

My recommendation is to continue with the LRA path.

jeff
Re: Indirect memory addresses vs. lra
> On Aug 9, 2019, at 10:16 AM, Segher Boessenkool wrote:
>
> Hi!
>
> On Fri, Aug 09, 2019 at 10:14:39AM +0200, John Darrington wrote:
>> On Thu, Aug 08, 2019 at 01:57:41PM -0600, Jeff Law wrote:
>>
>> ... However I wonder if this issue is
>> related to the other major outstanding problem I have, viz: the large
>> number of test failures which report "Unable to find a register to
>> spill" - So far, nobody has been able to explain how to solve that
>> issue and even the people who appear to be more knowledgeable have
>> expressed surprise that it is even happening at all.
>
> No one is surprised. It is just the funny way that LRA says "whoops I
> am going in circles, there is no progress and there will never be, I'd
> better stop that". Everyone doing new ports / new conversions to LRA
> sees that error all the time.
>
> The error could be pretty much *anywhere* in your port. You have to
> look at what LRA did, and why, and why that is wrong, and fix that.

I've run into this a number of times. The difficulty is that, for someone who understands the back end and the documented rules but not the internals of LRA, it tends to be hard to figure out what the problem is. And since the causes tend to be obscure and undocumented, I find myself having to relearn the analysis from time to time.

It has been stated that LRA is more dependent on correct back end definitions than Reload is, but unfortunately the precise definition of "correct" can be less than obvious to a back end maintainer.

paul
Re: Indirect memory addresses vs. lra
Hi!

On Fri, Aug 09, 2019 at 10:14:39AM +0200, John Darrington wrote:
> On Thu, Aug 08, 2019 at 01:57:41PM -0600, Jeff Law wrote:
>
> Yea, it's certainly designed with the more mainstream architectures in
> mind. The double-indirect case that's being talked about here is well
> out of the mainstream and not a feature of anything LRA has targeted to
> date. So I'm not surprised it's not working.
>
> My suggestion would be to ignore the double-indirect aspect of the
> architecture right now, get the port working, then come back and try to
> make double-indirect addressing modes work.
>
> This sounds like sensible advice. However I wonder if this issue is
> related to the other major outstanding problem I have, viz: the large
> number of test failures which report "Unable to find a register to
> spill" - So far, nobody has been able to explain how to solve that
> issue and even the people who appear to be more knowledgeable have
> expressed surprise that it is even happening at all.

No one is surprised. It is just the funny way that LRA says "whoops I am going in circles, there is no progress and there will never be, I'd better stop that". Everyone doing new ports / new conversions to LRA sees that error all the time.

The error could be pretty much *anywhere* in your port. You have to look at what LRA did, and why, and why that is wrong, and fix that.

> Even if it should turn out not to be related, the message I've been
> receiving in this thread is lra should not be expected to work for
> non "mainstream" backends.

LRA is more likely to have problems in situations where it has not been tested before. You can replace LRA by anything else, and this isn't limited to GCC (or software, or human endeavours, or humanity even).

> So perhaps there is another, yet to be
> discovered, restriction which prevents my backend from ever working?

From ever? Nah, we can patch. Also, Occam's razor says there likely is an error in your backend you haven't found yet.

> On the other hand, given my lack of experience with gcc, it could be
> that lra is working perfectly, and I have simply done something
> incorrectly. But the uncertainty voiced in this thread means that it
> is hard to be sure that I'm not trying to do something which is
> currently unsupported.

Is your code in some branch in our git? Or in some other public git?

Do you have a representative testcase?

Segher
Re: Indirect memory addresses vs. lra
On Thu, Aug 08, 2019 at 01:57:41PM -0600, Jeff Law wrote:

Yea, it's certainly designed with the more mainstream architectures in mind. The double-indirect case that's being talked about here is well out of the mainstream and not a feature of anything LRA has targeted to date. So I'm not surprised it's not working.

My suggestion would be to ignore the double-indirect aspect of the architecture right now, get the port working, then come back and try to make double-indirect addressing modes work.

This sounds like sensible advice. However I wonder if this issue is related to the other major outstanding problem I have, viz: the large number of test failures which report "Unable to find a register to spill" - So far, nobody has been able to explain how to solve that issue and even the people who appear to be more knowledgeable have expressed surprise that it is even happening at all.

Even if it should turn out not to be related, the message I've been receiving in this thread is lra should not be expected to work for non "mainstream" backends. So perhaps there is another, yet to be discovered, restriction which prevents my backend from ever working?

On the other hand, given my lack of experience with gcc, it could be that lra is working perfectly, and I have simply done something incorrectly. But the uncertainty voiced in this thread means that it is hard to be sure that I'm not trying to do something which is currently unsupported.

J'
Re: Indirect memory addresses vs. lra
On 8/8/19 1:19 PM, Segher Boessenkool wrote: > On Thu, Aug 08, 2019 at 01:30:41PM -0400, Paul Koning wrote: >> >> >>> On Aug 8, 2019, at 1:21 PM, Segher Boessenkool >>> wrote: >>> >>> On Thu, Aug 08, 2019 at 12:43:52PM -0400, Paul Koning wrote: > On Aug 8, 2019, at 12:25 PM, Vladimir Makarov wrote: > The old reload (reload[1].c) supports such addressing. As modern > mainstream architectures have no this kind of addressing, it was not > implemented in LRA. Is LRA only intended for "modern mainstream architectures"? >>> >>> I sure hope not! But it has only been *used* and *tested* much on such, >>> so far. >> >> That's not entirely accurate. At the prodding of people pushing for >> the removal of CC0 and reload, I've added LRA support to pdp11 in the >> V9 cycle. > > I said "much" :-) > > Pretty much all design input so far has been from "modern mainstream > architectures", as far as I can make out. Now one of those has the > most "interesting" (for RA) features that many less mainstream archs > have (a not-so-very-flat register file), so it should still work pretty > well hopefully. Yea, it's certainly designed with the more mainstream architectures in mind. The double-indirect case that's being talked about here is well out of the mainstream and not a feature of anything LRA has targeted to date. So I'm not surprised it's not working. My suggestion would be to ignore the double-indirect aspect of the architecture right now, get the port working, then come back and try to make double-indirect addressing modes work. > >> And it works pretty well, in the sense of passing the >> compile tests. But I haven't yet examined the code quality vs. the >> old one in any detail. > > That would be quite interesting to see, also for the other ports that > still need conversion: how much (if any) degradation should you expect > from a straight-up conversion of a port to LRA, without any retuning? 
I did the v850 last year where it was a wash or perhaps a slight improvement for codesize, which is a reasonable approximation for performance on that target. I was working a bit on converting the H8 away from cc0 with an eye towards LRA as well. Given how registers overlap on the H8, the most straightforward port should end up with properties much like 32bit x86. I suspect the independent addressing of the high/low register parts might be better handled by LRA, but I wasn't going to do anything beyond the "just make it work". jeff
Re: Indirect memory addresses vs. lra
On Thu, Aug 08, 2019 at 01:30:41PM -0400, Paul Koning wrote: > > > > On Aug 8, 2019, at 1:21 PM, Segher Boessenkool > > wrote: > > > > On Thu, Aug 08, 2019 at 12:43:52PM -0400, Paul Koning wrote: > >>> On Aug 8, 2019, at 12:25 PM, Vladimir Makarov wrote: > >>> The old reload (reload[1].c) supports such addressing. As modern > >>> mainstream architectures have no this kind of addressing, it was not > >>> implemented in LRA. > >> > >> Is LRA only intended for "modern mainstream architectures"? > > > > I sure hope not! But it has only been *used* and *tested* much on such, > > so far. > > That's not entirely accurate. At the prodding of people pushing for > the removal of CC0 and reload, I've added LRA support to pdp11 in the > V9 cycle. I said "much" :-) Pretty much all design input so far has been from "modern mainstream architectures", as far as I can make out. Now one of those has the most "interesting" (for RA) features that many less mainstream archs have (a not-so-very-flat register file), so it should still work pretty well hopefully. > And it works pretty well, in the sense of passing the > compile tests. But I haven't yet examined the code quality vs. the > old one in any detail. That would be quite interesting to see, also for the other ports that still need conversion: how much (if any) degradation should you expect from a straight-up conversion of a port to LRA, without any retuning? Segher
Re: Indirect memory addresses vs. lra
On Thu, Aug 08, 2019 at 01:25:27PM -0400, Paul Koning wrote: > > On Aug 8, 2019, at 1:21 PM, Segher Boessenkool > > wrote: > > On Thu, Aug 08, 2019 at 12:43:52PM -0400, Paul Koning wrote: > >> Indirect addressing is a key feature in size-optimized code. > > > > That doesn't mean that LRA has to support it, btw, not necessarily; it > > may well be possible to do a good job of this in the later passes? > > Maybe postreload, maybe some peepholes, etc.? > > Possibly. But as Vladimir points out, indirect addressing affects > register allocation (reducing register pressure). Yeah, good point, esp. if you have only one or two registers that you can use for addressing at all. So it will have to happen during (or before?) RA, alright. Segher
Re: Indirect memory addresses vs. lra
On 2019-08-08 12:43 p.m., Paul Koning wrote: On Aug 8, 2019, at 12:25 PM, Vladimir Makarov wrote: On 2019-08-04 3:18 p.m., John Darrington wrote: I'm trying to write a back-end for an architecture (s12z - the ISA you can download from [1]). This arch accepts indirect memory addresses. That is to say, those of the form (mem (mem (...))) and although my TARGET_LEGITIMATE_ADDRESS function returns true for such addresses, LRA insists on reloading them out of existence. ... The old reload (reload[1].c) supports such addressing. As modern mainstream architectures have no this kind of addressing, it was not implemented in LRA. Is LRA only intended for "modern mainstream architectures"? No. As I wrote patches implementing indirect addressing is welcomed. It is hard to implement everything at once and by one person. If yes, why is the old reload being deprecated? You can't have it both ways. Unless you want to obsolete all "not modern mainstream architectures" in GCC, it doesn't make sense to get rid of core functionality used by those architectures. Indirect addressing is a key feature in size-optimized code.
Re: Indirect memory addresses vs. lra
> On Aug 8, 2019, at 1:21 PM, Segher Boessenkool > wrote: > > On Thu, Aug 08, 2019 at 12:43:52PM -0400, Paul Koning wrote: >>> On Aug 8, 2019, at 12:25 PM, Vladimir Makarov wrote: >>> The old reload (reload[1].c) supports such addressing. As modern >>> mainstream architectures have no this kind of addressing, it was not >>> implemented in LRA. >> >> Is LRA only intended for "modern mainstream architectures"? > > I sure hope not! But it has only been *used* and *tested* much on such, > so far. That's not entirely accurate. At the prodding of people pushing for the removal of CC0 and reload, I've added LRA support to pdp11 in the V9 cycle. And it works pretty well, in the sense of passing the compile tests. But I haven't yet examined the code quality vs. the old one in any detail. paul
Re: Indirect memory addresses vs. lra
> On Aug 8, 2019, at 1:21 PM, Segher Boessenkool wrote:
>
> On Thu, Aug 08, 2019 at 12:43:52PM -0400, Paul Koning wrote:
>>> On Aug 8, 2019, at 12:25 PM, Vladimir Makarov wrote:
>>> The old reload (reload[1].c) supports such addressing. As modern
>>> mainstream architectures have no this kind of addressing, it was not
>>> implemented in LRA.
>>
>> Is LRA only intended for "modern mainstream architectures"?
>
> I sure hope not! But it has only been *used* and *tested* much on such,
> so far. Things are designed to work well for modern archs.
>
>> If yes, why is the old reload being deprecated? You can't have it both
>> ways. Unless you want to obsolete all "not modern mainstream architectures"
>> in GCC, it doesn't make sense to get rid of core functionality used by those
>> architectures.
>>
>> Indirect addressing is a key feature in size-optimized code.
>
> That doesn't mean that LRA has to support it, btw, not necessarily; it
> may well be possible to do a good job of this in the later passes?
> Maybe postreload, maybe some peepholes, etc.?

Possibly. But as Vladimir points out, indirect addressing affects register allocation (reducing register pressure). In older architectures that implement indirect addressing, that is one of the key ways in which the feature reduces code size. While I can see how peephole optimization can convert an address load plus a register indirect into a memory indirect instruction, does that help the register become available for other uses or is post-LRA too late for that? My impression is that it is too late, since at this point we're dealing with hard registers and making one free via peephole helps no one else.

paul
Re: Indirect memory addresses vs. lra
On Thu, Aug 08, 2019 at 12:43:52PM -0400, Paul Koning wrote: > > On Aug 8, 2019, at 12:25 PM, Vladimir Makarov wrote: > > The old reload (reload[1].c) supports such addressing. As modern > > mainstream architectures have no this kind of addressing, it was not > > implemented in LRA. > > Is LRA only intended for "modern mainstream architectures"? I sure hope not! But it has only been *used* and *tested* much on such, so far. Things are designed to work well for modern archs. > If yes, why is the old reload being deprecated? You can't have it both ways. > Unless you want to obsolete all "not modern mainstream architectures" in > GCC, it doesn't make sense to get rid of core functionality used by those > architectures. > > Indirect addressing is a key feature in size-optimized code. That doesn't mean that LRA has to support it, btw, not necessarily; it may well be possible to do a good job of this in the later passes? Maybe postreload, maybe some peepholes, etc.? Segher
Re: Indirect memory addresses vs. lra
> On Aug 8, 2019, at 12:25 PM, Vladimir Makarov wrote:
>
> On 2019-08-04 3:18 p.m., John Darrington wrote:
>> I'm trying to write a back-end for an architecture (s12z - the ISA you can
>> download from [1]). This arch accepts indirect memory addresses. That is to
>> say, those of the form (mem (mem (...))) and although my
>> TARGET_LEGITIMATE_ADDRESS function returns true for such addresses, LRA
>> insists on reloading them out of existence.
>> ...
> The old reload (reload[1].c) supports such addressing. As modern mainstream
> architectures have no this kind of addressing, it was not implemented in LRA.

Is LRA only intended for "modern mainstream architectures"?

If yes, why is the old reload being deprecated? You can't have it both ways.
Unless you want to obsolete all "not modern mainstream architectures" in GCC,
it doesn't make sense to get rid of core functionality used by those
architectures.

Indirect addressing is a key feature in size-optimized code.

paul
Re: Indirect memory addresses vs. lra
On 2019-08-04 3:18 p.m., John Darrington wrote:
> I'm trying to write a back-end for an architecture (s12z - the ISA you can
> download from [1]). This arch accepts indirect memory addresses. That is to
> say, those of the form (mem (mem (...))) and although my
> TARGET_LEGITIMATE_ADDRESS function returns true for such addresses, LRA
> insists on reloading them out of existence.
>
> For example, when compiling a code fragment:
>
>     volatile unsigned char *led = 0x2F2;
>     *led = 1;
>
> the ira dump file shows:
>
> (insn 7 6 8 2 (set (mem/f/c:PSI (reg/f:PSI 9 y) [3 led+0 S4 A8])
>         (const_int 754 [0x2f2])) "/home/jmd/MemMem/memmem.c":15:27 96 {movpsi}
>      (nil))
> (insn 8 7 14 2 (set (mem/v:QI (mem/f/c:PSI (reg/f:PSI 9 y) [3 led+0 S4 A8]) [0 *led_7+0 S1 A8])
>         (const_int 1 [0x1])) "/home/jmd/MemMem/memmem.c":16:8 98 {movqi}
>      (nil))
>
> which is a perfectly valid insn, and the most efficient assembler for it is:
>
>     mov.p #0x2f2, y
>     mov.b #1, [0,y]
>
> However the reload dump shows this has been changed to:
>
> (insn 7 6 22 2 (set (mem/f/c:PSI (reg/f:PSI 9 y) [3 led+0 S4 A8])
>         (const_int 754 [0x2f2])) "/home/jmd/MemMem/memmem.c":15:27 96 {movpsi}
>      (nil))
> (insn 22 7 8 2 (set (reg:PSI 8 x [22])
>         (mem/f/c:PSI (reg/f:PSI 9 y) [3 led+0 S4 A8])) "/home/jmd/MemMem/memmem.c":16:8 96 {movpsi}
>      (nil))
> (insn 8 22 14 2 (set (mem/v:QI (reg:PSI 8 x [22]) [0 *led_7+0 S1 A8])
>         (const_int 1 [0x1])) "/home/jmd/MemMem/memmem.c":16:8 98 {movqi}
>      (nil))
>
> and ends up as:
>
>     mov.p #0x2f2, y
>     mov.p (0,y), x
>     mov.b #1, (0,x)
>
> So this wastes a register (which leads to other issues which I don't want
> to go into in this email).
>
> After a lot of debugging I tracked down the part of lra which is doing this
> reload to the function process_addr_reg at lra-constraints.c:1378
>
>   if (! REG_P (reg))
>     {
>       if (check_only_p)
>         return true;
>       /* Always reload memory in an address even if the target supports
>          such addresses.  */
>       new_reg = lra_create_new_reg_with_unique_value (mode, reg, cl, "address");
>       before_p = true;
>     }
>
> Changing this to
>
>   if (! REG_P (reg))
>     {
>       if (check_only_p)
>         return true;
>       return false;
>     }
>
> solves my immediate problem. However I imagine there was a reason for doing
> this reload, and presumably a better way of avoiding it.
>
> Can someone explain the reason for this reload, and how I can best ensure
> that indirect memory operands are left in the compiled code?

The old reload (reload[1].c) supports such addressing. As modern mainstream
architectures have no this kind of addressing, it was not implemented in LRA.

I don't think the above simple change will work fully. For example, you need
to constrain memory nesting. The constraints should be described, maybe some
hooks should be implemented (maybe not, and TARGET_LEGITIMATE_ADDRESS will be
enough), maybe additional address analysis and transformations should be
implemented in LRA, etc. But maybe implementing this is not hard either.

It is also difficult for me to say whether it is worth doing. Removing such
addressing helps to remove redundant memory reads. On the other hand, its
usage can decrease #insns and save registers for better RA, and utilize
hardware on whose design a lot of effort was spent.

In any case, if somebody implements this, it can be included in LRA.

> [1] https://www.nxp.com/docs/en/reference-manual/S12ZCPU_RM_V1.pdf