Re: [PATCH,rs6000] Optimize pcrel access of globals [ping]
Hi!

On Wed, Dec 09, 2020 at 11:04:44AM -0600, acsaw...@linux.ibm.com wrote:
> This patch implements a RTL pass that looks for pc-relative loads of the
> address of an external variable using the PCREL_GOT relocation and a
> single load or store that uses that external address.

> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -509,7 +509,7 @@ or1k*-*-*)
>  	;;
>  powerpc*-*-*)
>  	cpu_type=rs6000
> -	extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o"
> +	extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o pcrel-opt.o"

Make this fit on its line?  Just like extra_headers for example:

>  	extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
>  	extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h"
>  	extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"

> +/* This file implements a RTL pass that looks for pc-relative loads of the
> +   address of an external variable using the PCREL_GOT relocation and a single
> +   load that uses that external address.  If that is found we create the
> +   PCREL_OPT relocation to possibly convert:
> +
> +        pld addr_reg,var@pcrel@got
> +
> +
> +
> +        lwz data_reg,0(addr_reg)
> +
> +   into:
> +
> +        plwz data_reg,var@pcrel
> +
> +
> +
> +        nop

The nop first seems much simpler, but you cannot replace a 4-byte insn
with an 8-byte one (without huge effort).  Pity.  Maybe mention that
somewhere?

> +   If the variable is not defined in the main program or the code using it is
> +   not in the main program, the linker puts the address in the .got section and
> +   generates:
> +
> +        .section .got
> +   .Lvar_got:
> +        .dword var
> +
> +        .section .text
> +        pld addr_reg,.Lvar_got@pcrel
> +
> +
> +
> +        lwz data_reg,0(addr_reg)

What is the advantage of this, over what we started with?

> +   We look for a single usage in the basic block where the external
> +   address is loaded.  Multiple uses or references in another basic block will
> +   force us to not use the PCREL_OPT relocation.
> +
> +   We also optimize stores to the address of an external variable using the
> +   PCREL_GOT relocation and a single store that uses that external address.  If
> +   that is found we create the PCREL_OPT relocation to possibly convert:
> +
> +        pld addr_reg,var@pcrel@got
> +
> +
> +
> +        stw data_reg,0(addr_reg)
> +
> +   into:
> +
> +        pstw data_reg,var@pcrel
> +
> +
> +
> +        nop
> +
> +   If the variable is not defined in the main program or the code using it is
> +   not in the main program, the linker put the address in the .got section and
> +   do:

"puts and does"?  Or "will put and will do"?

> +        .section .got
> +   .Lvar_got:
> +        .dword var
> +
> +        .section .text
> +        pld addr_reg,.Lvar_got@pcrel
> +
> +
> +
> +        stw data_reg,0(addr_reg)
> +
> +   We only look for a single usage in the basic block where the external
> +   address is loaded.  Multiple uses or references in another basic block will
> +   force us to not use the PCREL_OPT relocation.  */

That sounds like it is a restriction, but you cannot do better at all,
not without communicating a lot more info to the linker anyway.

> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "rtl.h"
> +#include "tree.h"
> +#include "memmodel.h"
> +#include "expmed.h"
> +#include "optabs.h"
> +#include "recog.h"
> +#include "df.h"
> +#include "tm_p.h"
> +#include "ira.h"
> +#include "print-tree.h"
> +#include "varasm.h"
> +#include "explow.h"
> +#include "expr.h"
> +#include "output.h"
> +#include "tree-pass.h"
> +#include "rtx-vector-builder.h"
> +#include "print-rtl.h"
> +#include "insn-attr.h"
> +#include "insn-codes.h"

Do you need all these header files?  It looks quite reduced, and in the
right order, but did you check :-)

> +/* Various counters.  */
> +static struct {
> +  unsigned long extern_addrs;
> +  unsigned long loads;
> +  unsigned long adjacent_loads;
> +  unsigned long failed_loads;
> +  unsigned long stores;
> +  unsigned long adjacent_stores;
> +  unsigned long failed_stores;
> +} counters;

There is the whole statistics.[ch] you could use for this.  I've never
used it, no idea if there are actual advantages to it :-)

> +/* Return a marker to identify the PCREL_OPT load address and
> +   load/store instruction.  We use a unique integer which is appended
> +   to ".Lpcrel" to make the label.  */
> +
> +static rtx
> +pcrel_opt_next_marker (void)
> +{
> +  static unsigned int pcrel_opt_next_num;
> +
> +  pcrel_opt_next_num++;
> +  return GEN_INT (pcrel_opt_next_num);
> +}

You could just open-code it the whole two places you use this, that is
more straightforward.

> +/* Optimize a PC-relative load address to be used in a load.
> +
> +   If the
Re: [PATCH,rs6000] Optimize pcrel access of globals [ping]
Ping.

Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain

> On Dec 9, 2020, at 11:04 AM, acsaw...@linux.ibm.com wrote:
>
> From: Aaron Sawdey
>
> Ping. I've folded in the changes to comments suggested by Will Schmidt.
>
> This patch implements a RTL pass that looks for pc-relative loads of the
> address of an external variable using the PCREL_GOT relocation and a
> single load or store that uses that external address.
>
> Produced by a cast of thousands:
> * Michael Meissner
> * Peter Bergner
> * Bill Schmidt
> * Alan Modra
> * Segher Boessenkool
> * Aaron Sawdey
>
> Passes bootstrap/regtest on ppc64le power10. Should have no effect on
> other processors. OK for trunk?
>
> Thanks!
>    Aaron
>
> gcc/ChangeLog:
>
> 	* config.gcc: Add pcrel-opt.c and pcrel-opt.o.
> 	* config/rs6000/pcrel-opt.c: New file.
> 	* config/rs6000/pcrel-opt.md: New file.
> 	* config/rs6000/predicates.md: Add d_form_memory predicate.
> 	* config/rs6000/rs6000-cpus.def: Add OPTION_MASK_PCREL_OPT.
> 	* config/rs6000/rs6000-passes.def: Add pass_pcrel_opt.
> 	* config/rs6000/rs6000-protos.h: Add reg_to_non_prefixed(),
> 	offsettable_non_prefixed_memory(), output_pcrel_opt_reloc(),
> 	and make_pass_pcrel_opt().
> 	* config/rs6000/rs6000.c (reg_to_non_prefixed): Make global.
> 	(rs6000_option_override_internal): Add pcrel-opt.
> 	(rs6000_delegitimize_address): Support pcrel-opt.
> 	(rs6000_opt_masks): Add pcrel-opt.
> 	(offsettable_non_prefixed_memory): New function.
> 	(reg_to_non_prefixed): Make global.
> 	(rs6000_asm_output_opcode): Reset next_insn_prefixed_p.
> 	(output_pcrel_opt_reloc): New function.
> 	* config/rs6000/rs6000.md (loads_extern_addr): New attr.
> 	(pcrel_extern_addr): Set loads_extern_addr.
> 	Add include for pcrel-opt.md.
> 	* config/rs6000/rs6000.opt: Add -mpcrel-opt.
> 	* config/rs6000/t-rs6000: Add rules for pcrel-opt.c and
> 	pcrel-opt.md.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/powerpc/pcrel-opt-inc-di.c: New test.
> 	* gcc.target/powerpc/pcrel-opt-ld-df.c: New test.
> 	* gcc.target/powerpc/pcrel-opt-ld-di.c: New test.
> 	* gcc.target/powerpc/pcrel-opt-ld-hi.c: New test.
> 	* gcc.target/powerpc/pcrel-opt-ld-qi.c: New test.
> 	* gcc.target/powerpc/pcrel-opt-ld-sf.c: New test.
> 	* gcc.target/powerpc/pcrel-opt-ld-si.c: New test.
> 	* gcc.target/powerpc/pcrel-opt-ld-vector.c: New test.
> 	* gcc.target/powerpc/pcrel-opt-st-df.c: New test.
> 	* gcc.target/powerpc/pcrel-opt-st-di.c: New test.
> 	* gcc.target/powerpc/pcrel-opt-st-hi.c: New test.
> 	* gcc.target/powerpc/pcrel-opt-st-qi.c: New test.
> 	* gcc.target/powerpc/pcrel-opt-st-sf.c: New test.
> 	* gcc.target/powerpc/pcrel-opt-st-si.c: New test.
> 	* gcc.target/powerpc/pcrel-opt-st-vector.c: New test.
> ---
>  gcc/config.gcc                                |   6 +-
>  gcc/config/rs6000/pcrel-opt.c                 | 888 ++
>  gcc/config/rs6000/pcrel-opt.md                | 386
>  gcc/config/rs6000/predicates.md               |  23 +
>  gcc/config/rs6000/rs6000-cpus.def             |   2 +
>  gcc/config/rs6000/rs6000-passes.def           |   8 +
>  gcc/config/rs6000/rs6000-protos.h             |   4 +
>  gcc/config/rs6000/rs6000.c                    | 116 ++-
>  gcc/config/rs6000/rs6000.md                   |   8 +-
>  gcc/config/rs6000/rs6000.opt                  |   4 +
>  gcc/config/rs6000/t-rs6000                    |   7 +-
>  .../gcc.target/powerpc/pcrel-opt-inc-di.c     |  18 +
>  .../gcc.target/powerpc/pcrel-opt-ld-df.c      |  36 +
>  .../gcc.target/powerpc/pcrel-opt-ld-di.c      |  43 +
>  .../gcc.target/powerpc/pcrel-opt-ld-hi.c      |  42 +
>  .../gcc.target/powerpc/pcrel-opt-ld-qi.c      |  42 +
>  .../gcc.target/powerpc/pcrel-opt-ld-sf.c      |  42 +
>  .../gcc.target/powerpc/pcrel-opt-ld-si.c      |  41 +
>  .../gcc.target/powerpc/pcrel-opt-ld-vector.c  |  36 +
>  .../gcc.target/powerpc/pcrel-opt-st-df.c      |  36 +
>  .../gcc.target/powerpc/pcrel-opt-st-di.c      |  37 +
>  .../gcc.target/powerpc/pcrel-opt-st-hi.c      |  42 +
>  .../gcc.target/powerpc/pcrel-opt-st-qi.c      |  42 +
>  .../gcc.target/powerpc/pcrel-opt-st-sf.c      |  36 +
>  .../gcc.target/powerpc/pcrel-opt-st-si.c      |  41 +
>  .../gcc.target/powerpc/pcrel-opt-st-vector.c  |  36 +
>  26 files changed, 2013 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/config/rs6000/pcrel-opt.c
>  create mode 100644 gcc/config/rs6000/pcrel-opt.md
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-hi.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-qi.c
>  create mode 100644
[PATCH,rs6000] Optimize pcrel access of globals [ping]
From: Aaron Sawdey

Ping. I've folded in the changes to comments suggested by Will Schmidt.

This patch implements a RTL pass that looks for pc-relative loads of the
address of an external variable using the PCREL_GOT relocation and a
single load or store that uses that external address.

Produced by a cast of thousands:
* Michael Meissner
* Peter Bergner
* Bill Schmidt
* Alan Modra
* Segher Boessenkool
* Aaron Sawdey

Passes bootstrap/regtest on ppc64le power10. Should have no effect on
other processors. OK for trunk?

Thanks!
   Aaron

gcc/ChangeLog:

	* config.gcc: Add pcrel-opt.c and pcrel-opt.o.
	* config/rs6000/pcrel-opt.c: New file.
	* config/rs6000/pcrel-opt.md: New file.
	* config/rs6000/predicates.md: Add d_form_memory predicate.
	* config/rs6000/rs6000-cpus.def: Add OPTION_MASK_PCREL_OPT.
	* config/rs6000/rs6000-passes.def: Add pass_pcrel_opt.
	* config/rs6000/rs6000-protos.h: Add reg_to_non_prefixed(),
	offsettable_non_prefixed_memory(), output_pcrel_opt_reloc(),
	and make_pass_pcrel_opt().
	* config/rs6000/rs6000.c (reg_to_non_prefixed): Make global.
	(rs6000_option_override_internal): Add pcrel-opt.
	(rs6000_delegitimize_address): Support pcrel-opt.
	(rs6000_opt_masks): Add pcrel-opt.
	(offsettable_non_prefixed_memory): New function.
	(reg_to_non_prefixed): Make global.
	(rs6000_asm_output_opcode): Reset next_insn_prefixed_p.
	(output_pcrel_opt_reloc): New function.
	* config/rs6000/rs6000.md (loads_extern_addr): New attr.
	(pcrel_extern_addr): Set loads_extern_addr.
	Add include for pcrel-opt.md.
	* config/rs6000/rs6000.opt: Add -mpcrel-opt.
	* config/rs6000/t-rs6000: Add rules for pcrel-opt.c and
	pcrel-opt.md.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pcrel-opt-inc-di.c: New test.
	* gcc.target/powerpc/pcrel-opt-ld-df.c: New test.
	* gcc.target/powerpc/pcrel-opt-ld-di.c: New test.
	* gcc.target/powerpc/pcrel-opt-ld-hi.c: New test.
	* gcc.target/powerpc/pcrel-opt-ld-qi.c: New test.
	* gcc.target/powerpc/pcrel-opt-ld-sf.c: New test.
	* gcc.target/powerpc/pcrel-opt-ld-si.c: New test.
	* gcc.target/powerpc/pcrel-opt-ld-vector.c: New test.
	* gcc.target/powerpc/pcrel-opt-st-df.c: New test.
	* gcc.target/powerpc/pcrel-opt-st-di.c: New test.
	* gcc.target/powerpc/pcrel-opt-st-hi.c: New test.
	* gcc.target/powerpc/pcrel-opt-st-qi.c: New test.
	* gcc.target/powerpc/pcrel-opt-st-sf.c: New test.
	* gcc.target/powerpc/pcrel-opt-st-si.c: New test.
	* gcc.target/powerpc/pcrel-opt-st-vector.c: New test.
---
 gcc/config.gcc                                |   6 +-
 gcc/config/rs6000/pcrel-opt.c                 | 888 ++
 gcc/config/rs6000/pcrel-opt.md                | 386
 gcc/config/rs6000/predicates.md               |  23 +
 gcc/config/rs6000/rs6000-cpus.def             |   2 +
 gcc/config/rs6000/rs6000-passes.def           |   8 +
 gcc/config/rs6000/rs6000-protos.h             |   4 +
 gcc/config/rs6000/rs6000.c                    | 116 ++-
 gcc/config/rs6000/rs6000.md                   |   8 +-
 gcc/config/rs6000/rs6000.opt                  |   4 +
 gcc/config/rs6000/t-rs6000                    |   7 +-
 .../gcc.target/powerpc/pcrel-opt-inc-di.c     |  18 +
 .../gcc.target/powerpc/pcrel-opt-ld-df.c      |  36 +
 .../gcc.target/powerpc/pcrel-opt-ld-di.c      |  43 +
 .../gcc.target/powerpc/pcrel-opt-ld-hi.c      |  42 +
 .../gcc.target/powerpc/pcrel-opt-ld-qi.c      |  42 +
 .../gcc.target/powerpc/pcrel-opt-ld-sf.c      |  42 +
 .../gcc.target/powerpc/pcrel-opt-ld-si.c      |  41 +
 .../gcc.target/powerpc/pcrel-opt-ld-vector.c  |  36 +
 .../gcc.target/powerpc/pcrel-opt-st-df.c      |  36 +
 .../gcc.target/powerpc/pcrel-opt-st-di.c      |  37 +
 .../gcc.target/powerpc/pcrel-opt-st-hi.c      |  42 +
 .../gcc.target/powerpc/pcrel-opt-st-qi.c      |  42 +
 .../gcc.target/powerpc/pcrel-opt-st-sf.c      |  36 +
 .../gcc.target/powerpc/pcrel-opt-st-si.c      |  41 +
 .../gcc.target/powerpc/pcrel-opt-st-vector.c  |  36 +
 26 files changed, 2013 insertions(+), 9 deletions(-)
 create mode 100644 gcc/config/rs6000/pcrel-opt.c
 create mode 100644 gcc/config/rs6000/pcrel-opt.md
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-hi.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-qi.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-sf.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-si.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-vector.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-df.c
 create