RE: [PATCH] MIPS16/GCC: Optimise `__call_stub_fp_' call/return stubs

2014-12-01 Thread Maciej W. Rozycki
On Fri, 21 Nov 2014, Moore, Catherine wrote:

> >  gcc/
> >  * config/mips/mips.c (mips16_build_call_stub): Move the save of
> >  the return address in $18 ahead of passing arguments to FPRs.
> >
> >   Maciej
>
> This looks OK.  Please commit.

 Applied, thanks for your review.

  Maciej


RE: [PATCH] MIPS16/GCC: Optimise `__call_stub_fp_' call/return stubs

2014-11-21 Thread Moore, Catherine


> -----Original Message-----
> From: Rozycki, Maciej
> Sent: Wednesday, November 19, 2014 8:05 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Moore, Catherine; Eric Christopher; Matthew Fortune
> Subject: [PATCH] MIPS16/GCC: Optimise `__call_stub_fp_' call/return stubs
>
> 2014-11-19  Maciej W. Rozycki  ma...@codesourcery.com
>
>  gcc/
>  * config/mips/mips.c (mips16_build_call_stub): Move the save of
>  the return address in $18 ahead of passing arguments to FPRs.
>
>   Maciej

This looks OK.  Please commit.
 


[PATCH] MIPS16/GCC: Optimise `__call_stub_fp_' call/return stubs

2014-11-19 Thread Maciej W. Rozycki
Hi,

 It has come to my attention that we create suboptimal code for the 
`__call_stub_fp_' variant of the MIPS16 call stubs.  These stubs are 
used for outgoing calls made from MIPS16 code to standard MIPS code that 
return floating-point results and may also pass floating-point 
arguments.

 This can be illustrated with this small program:

$ cat foo.c
double bar (double d);

int main (int argc, char **argv)
{
  return bar (argc);
}
$ mips-linux-gnu-gcc -O2 -mips16 -c foo.c
$ mips-linux-gnu-objdump -dr foo.o

foo.o: file format elf32-tradbigmips

Disassembly of section .mips16.call.fp.bar:

00000000 <__call_stub_fp_bar>:
   0:   44856000    mtc1    a1,$f12
   4:   44846800    mtc1    a0,$f13
   8:   03e09021    move    s2,ra
   c:   0c000000    jal     0 <__call_stub_fp_bar>
                    c: R_MIPS_26    bar
  10:   00000000    nop
  14:   44030000    mfc1    v1,$f0
  18:   02400008    jr      s2
  1c:   44020800    mfc1    v0,$f1

Disassembly of section .text.startup:

00000000 <main>:
   0:   f100 64c4   save    32,ra,s2
   4:   1800 0000   jal     0 <main>
                    4: R_MIPS16_26  __mips16_floatsidf
   8:   6500        nop
   a:   67a3        move    a1,v1
   c:   1800 0000   jal     0 <main>
                    c: R_MIPS16_26  bar
  10:   6782        move    a0,v0
  12:   67a3        move    a1,v1
  14:   1800 0000   jal     0 <main>
                    14: R_MIPS16_26 __mips16_fix_truncdfsi
  18:   6782        move    a0,v0
  1a:   f100 6444   restore 32,ra,s2
  1e:   e8a0        jrc     ra

-- as you can see in `__call_stub_fp_bar' above the jump delay slot 
remains empty because the move to $s2 instruction cannot be scheduled 
there due to a data dependency on $ra.

 However, there is no need to save $ra last, and since the MIPS IV ISA 
there have been no coprocessor move delay slots, so the last MTC1 
instruction could be scheduled in the jump delay slot instead.
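
 To make the dependency argument concrete, here is a toy model in C.  It 
is only a sketch and not how GCC's delay slot filling is implemented; the 
`struct insn' type, the register enumeration and `can_fill_delay_slot' 
are all made up for this illustration.  It records which registers each 
instruction reads and writes and refuses to put an instruction in a 
jump's delay slot if the two touch a common register in a conflicting 
way:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register identifiers, used by this toy model only.  */
enum reg { REG_A0, REG_A1, REG_S2, REG_RA, REG_F12, REG_F13 };

#define BIT(r) (UINT32_C (1) << (r))

/* A toy instruction: its text plus the registers it reads and writes.  */
struct insn
{
  const char *text;
  uint32_t reads;
  uint32_t writes;
};

/* The three instructions of interest from `__call_stub_fp_bar'.  */
const struct insn jal_bar = { "jal bar", 0, BIT (REG_RA) };
const struct insn save_ra = { "move s2,ra", BIT (REG_RA), BIT (REG_S2) };
const struct insn mtc1_a0 = { "mtc1 a0,$f13", BIT (REG_A0), BIT (REG_F13) };

/* Return true if CAND, which originally preceded JUMP, may be moved
   into JUMP's delay slot.  Moving it there makes it execute after
   JUMP, so it must not read or write anything JUMP writes, and must
   not write anything JUMP reads.  */
bool
can_fill_delay_slot (const struct insn *jump, const struct insn *cand)
{
  if (jump->writes & (cand->reads | cand->writes))
    return false;
  if (cand->writes & jump->reads)
    return false;
  return true;
}
```

 In this model `can_fill_delay_slot (&jal_bar, &save_ra)' is false, 
because JAL writes $ra and the move reads it, while 
`can_fill_delay_slot (&jal_bar, &mtc1_a0)' is true, as JAL and the MTC1 
share no registers.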

 So here is a change to avoid this empty delay slot.  With this change 
in place we get this code instead:

$ mips-linux-gnu-objdump -dr foo.o

foo.o: file format elf32-tradbigmips

Disassembly of section .mips16.call.fp.bar:

00000000 <__call_stub_fp_bar>:
   0:   03e09021    move    s2,ra
   4:   44856000    mtc1    a1,$f12
   8:   0c000000    jal     0 <__call_stub_fp_bar>
                    8: R_MIPS_26    bar
   c:   44846800    mtc1    a0,$f13
  10:   44030000    mfc1    v1,$f0
  14:   02400008    jr      s2
  18:   44020800    mfc1    v0,$f1

Disassembly of section .text.startup:

00000000 <main>:
   0:   f100 64c4   save    32,ra,s2
   4:   1800 0000   jal     0 <main>
                    4: R_MIPS16_26  __mips16_floatsidf
   8:   6500        nop
   a:   67a3        move    a1,v1
   c:   1800 0000   jal     0 <main>
                    c: R_MIPS16_26  bar
  10:   6782        move    a0,v0
  12:   67a3        move    a1,v1
  14:   1800 0000   jal     0 <main>
                    14: R_MIPS16_26 __mips16_fix_truncdfsi
  18:   6782        move    a0,v0
  1a:   f100 6444   restore 32,ra,s2
  1e:   e8a0        jrc     ra

-- as you can see the last MTC1 instruction has now been placed in the 
delay slot.

 For ISAs that do have coprocessor move delay slots there is no gain, 
but no loss either; both jump delay slots remain empty due to data 
dependencies (the coprocessor move delay slots):

$ mips-linux-gnu-gcc -O2 -mips1 -mips16 -c foo.c
$ mips-linux-gnu-objdump -dr foo.o

foo.o: file format elf32-tradbigmips

Disassembly of section .mips16.call.fp.bar:

00000000 <__call_stub_fp_bar>:
   0:   03e09021    move    s2,ra
   4:   44856000    mtc1    a1,$f12
   8:   44846800    mtc1    a0,$f13
   c:   0c000000    jal     0 <__call_stub_fp_bar>
                    c: R_MIPS_26    bar
  10:   00000000    nop
  14:   44030000    mfc1    v1,$f0
  18:   44020800    mfc1    v0,$f1
  1c:   02400008    jr      s2
  20:   00000000    nop

Disassembly of section .text.startup:

00000000 <main>:
   0:   63fc        addiu   sp,-32
   2:   6772        move    v1,s2
   4:   6207        sw      ra,28(sp)
   6:   1800 0000   jal     0 <main>
                    6: R_MIPS16_26  __mips16_floatsidf
   a:   d306        sw      v1,24(sp)
   c:   67a3        move    a1,v1
   e:   1800 0000   jal     0 <main>
                    e: R_MIPS16_26  bar
  12:   6782        move    a0,v0
  14:   67a3        move    a1,v1
  16:   1800 0000   jal     0 <main>
                    16: R_MIPS16_26 __mips16_fix_truncdfsi
  1a:   6782        move    a0,v0
  1c:   9606        lw      a2,24(sp)
  1e:   9707        lw      a3,28(sp)
  20:   6556        move    s2,a2
  22:   ef00        jr      a3
  24:   6304        addiu   sp,32
  26:   6500        nop

-- though I think this consideration is actually academic as no