Re: [PATCH,rs6000] Optimize pcrel access of globals [ping]

2021-02-11 Thread Segher Boessenkool
Hi!

On Wed, Dec 09, 2020 at 11:04:44AM -0600, acsaw...@linux.ibm.com wrote:
> This patch implements an RTL pass that looks for pc-relative loads of the
> address of an external variable using the PCREL_GOT relocation and a
> single load or store that uses that external address.

> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -509,7 +509,7 @@ or1k*-*-*)
>   ;;
>  powerpc*-*-*)
>   cpu_type=rs6000
> - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o"
> + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o pcrel-opt.o"

Make this fit on its line?  Just like extra_headers for example:

>   extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
>   extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h"
>   extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"


> +/* This file implements a RTL pass that looks for pc-relative loads of the
> +   address of an external variable using the PCREL_GOT relocation and a single
> +   load that uses that external address.  If that is found we create the
> +   PCREL_OPT relocation to possibly convert:
> +
> + pld addr_reg,var@pcrel@got
> +
> + 
> +
> + lwz data_reg,0(addr_reg)
> +
> +   into:
> +
> + plwz data_reg,var@pcrel
> +
> + 
> +
> + nop

The nop first seems much simpler, but you cannot replace a 4-byte insn
with an 8-byte one (without huge effort).  Pity.  Maybe mention that
somewhere?
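
To make the size argument concrete, here is a toy sketch in C.  All the
names in it are made up for illustration; nothing here is from the patch
or from the linker.  It only models that the prefixed insns (pld, plwz)
take 8 bytes and the others (lwz, nop) take 4, and that a rewritten insn
has to fit in the slot of the insn it replaces.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not GCC or linker code: an instruction is just a name
   and its size in bytes.  */
struct toy_insn
{
  const char *name;
  size_t size;
};

/* Prefixed instructions are 8 bytes, the others are 4.  */
static const struct toy_insn toy_pld  = { "pld",  8 };
static const struct toy_insn toy_lwz  = { "lwz",  4 };
static const struct toy_insn toy_plwz = { "plwz", 8 };
static const struct toy_insn toy_nop  = { "nop",  4 };

/* Nothing else in the section may move, so a replacement insn must
   not be bigger than the insn whose slot it takes over.  */
static int
fits_in_slot (struct toy_insn old_insn, struct toy_insn new_insn)
{
  return new_insn.size <= old_insn.size;
}

/* The two orderings discussed above.  */
void
toy_check (void)
{
  /* plwz first, nop second: both replacements fit.  */
  assert (fits_in_slot (toy_pld, toy_plwz));
  assert (fits_in_slot (toy_lwz, toy_nop));

  /* nop first, plwz second: the plwz does not fit where the lwz was.  */
  assert (!fits_in_slot (toy_lwz, toy_plwz));
}
```

Calling toy_check () passes all three assertions, which is the whole
point: only the "prefixed load first, nop second" order works in place.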

> +   If the variable is not defined in the main program or the code using it is
> +   not in the main program, the linker puts the address in the .got section and
> +   generates:
> +
> + .section .got
> + .Lvar_got:
> + .dword var
> +
> + .section .text
> + pld addr_reg,.Lvar_got@pcrel
> +
> + 
> +
> + lwz data_reg,0(addr_reg)

What is the advantage of this, over what we started with?

> +   We look for a single usage in the basic block where the external
> +   address is loaded.  Multiple uses or references in another basic block will
> +   force us to not use the PCREL_OPT relocation.
> +
> +   We also optimize stores to the address of an external variable using the
> +   PCREL_GOT relocation and a single store that uses that external address.  If
> +   that is found we create the PCREL_OPT relocation to possibly convert:
> +
> + pld addr_reg,var@pcrel@got
> +
> + 
> +
> + stw data_reg,0(addr_reg)
> +
> +   into:
> +
> + pstw data_reg,var@pcrel
> +
> + 
> +
> + nop
> +
> +   If the variable is not defined in the main program or the code using it is
> +   not in the main program, the linker put the address in the .got section and
> +   do:

"puts and does"?  Or "will put and will do"?

> + .section .got
> + .Lvar_got:
> + .dword var
> +
> + .section .text
> + pld addr_reg,.Lvar_got@pcrel
> +
> + 
> +
> + stw data_reg,0(addr_reg)
> +
> +   We only look for a single usage in the basic block where the external
> +   address is loaded.  Multiple uses or references in another basic block will
> +   force us to not use the PCREL_OPT relocation.  */

That sounds like it is a restriction, but you cannot do better at all,
not without communicating a lot more info to the linker anyway.
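
To spell out why a second use cannot be handled: once the linker has
rewritten the pair, addr_reg is never loaded at all, so any other insn
reading it would get garbage.  A toy version of the check in C follows.
The struct and function names are hypothetical and this is not the code
in pcrel-opt.c, just the rule from the comment above written down.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not the pass from the patch: for each insn we only record
   which basic block it is in and whether it reads addr_reg.  */
struct toy_use
{
  int bb;         /* Basic block index.  */
  int uses_addr;  /* Nonzero if the insn reads addr_reg.  */
};

/* Return nonzero if addr_reg, loaded in block LOAD_BB, has exactly one
   use and that use is in LOAD_BB as well.  */
int
toy_single_use_in_bb (const struct toy_use *insns, size_t n, int load_bb)
{
  size_t uses = 0;

  assert (insns != NULL || n == 0);

  for (size_t i = 0; i < n; i++)
    if (insns[i].uses_addr)
      {
        /* A use in another basic block rules the optimization out.  */
        if (insns[i].bb != load_bb)
          return 0;
        uses++;
      }

  /* More than one use (or none) rules it out too.  */
  return uses == 1;
}
```

With one use in the same block this returns 1; add a second use, or move
the use to a different block, and it returns 0.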

> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "rtl.h"
> +#include "tree.h"
> +#include "memmodel.h"
> +#include "expmed.h"
> +#include "optabs.h"
> +#include "recog.h"
> +#include "df.h"
> +#include "tm_p.h"
> +#include "ira.h"
> +#include "print-tree.h"
> +#include "varasm.h"
> +#include "explow.h"
> +#include "expr.h"
> +#include "output.h"
> +#include "tree-pass.h"
> +#include "rtx-vector-builder.h"
> +#include "print-rtl.h"
> +#include "insn-attr.h"
> +#include "insn-codes.h"

Do you need all these header files?  It looks quite reduced, and in the
right order, but did you check :-)

> +/* Various counters.  */
> +static struct {
> +  unsigned long extern_addrs;
> +  unsigned long loads;
> +  unsigned long adjacent_loads;
> +  unsigned long failed_loads;
> +  unsigned long stores;
> +  unsigned long adjacent_stores;
> +  unsigned long failed_stores;
> +} counters;

There is the whole statistics.[ch] you could use for this.  I've never
used it, no idea if there are actual advantages to it :-)

> +/* Return a marker to identify the PCREL_OPT load address and
> +   load/store instruction.  We use a unique integer which is appended
> +   to ".Lpcrel" to make the label.  */
> +
> +static rtx
> +pcrel_opt_next_marker (void)
> +{
> +  static unsigned int pcrel_opt_next_num;
> +
> +  pcrel_opt_next_num++;
> +  return GEN_INT (pcrel_opt_next_num);
> +}

You could just open-code it in the whole two places you use this, that is
more straightforward.
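
Something like the following, as a standalone toy in C.  The names
toy_next_num, toy_mark_load and toy_mark_store are invented here, and
the real code would wrap the number in GEN_INT rather than print a
label; this only shows a file-scope counter bumped inline at both use
sites, with the ".Lpcrel" prefix taken from the comment above.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for the marker counter, at file scope so that both use
   sites can bump it directly.  */
static unsigned int toy_next_num;

/* First use site (think: the load case).  Bump the counter inline and
   format the ".Lpcrel<N>" label into BUF.  */
unsigned int
toy_mark_load (char *buf, size_t len)
{
  assert (buf != NULL && len > 0);
  unsigned int num = ++toy_next_num;
  snprintf (buf, len, ".Lpcrel%u", num);
  return num;
}

/* Second use site (think: the store case), open-coded the same way.  */
unsigned int
toy_mark_store (char *buf, size_t len)
{
  assert (buf != NULL && len > 0);
  unsigned int num = ++toy_next_num;
  snprintf (buf, len, ".Lpcrel%u", num);
  return num;
}
```

Each call hands out the next number, so markers stay unique across both
sites without the helper function.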

> +/* Optimize a PC-relative load address to be used in a load.
> +
> +   If the 

Re: [PATCH,rs6000] Optimize pcrel access of globals [ping]

2021-01-18 Thread Aaron Sawdey via Gcc-patches
Ping.

Aaron Sawdey, Ph.D. saw...@linux.ibm.com
IBM Linux on POWER Toolchain
 

> On Dec 9, 2020, at 11:04 AM, acsaw...@linux.ibm.com wrote:
> 
> From: Aaron Sawdey 
> 
> Ping. I've folded in the changes to comments suggested by Will Schmidt.
> 
> This patch implements an RTL pass that looks for pc-relative loads of the
> address of an external variable using the PCREL_GOT relocation and a
> single load or store that uses that external address.
> 
> Produced by a cast of thousands:
> * Michael Meissner
> * Peter Bergner
> * Bill Schmidt
> * Alan Modra
> * Segher Boessenkool
> * Aaron Sawdey
> 
> Passes bootstrap/regtest on ppc64le power10. Should have no effect on
> other processors. OK for trunk?
> 
> Thanks!
>   Aaron
> 
> gcc/ChangeLog:
> 
>   * config.gcc: Add pcrel-opt.c and pcrel-opt.o.
>   * config/rs6000/pcrel-opt.c: New file.
>   * config/rs6000/pcrel-opt.md: New file.
>   * config/rs6000/predicates.md: Add d_form_memory predicate.
>   * config/rs6000/rs6000-cpus.def: Add OPTION_MASK_PCREL_OPT.
>   * config/rs6000/rs6000-passes.def: Add pass_pcrel_opt.
>   * config/rs6000/rs6000-protos.h: Add reg_to_non_prefixed(),
>   offsettable_non_prefixed_memory(), output_pcrel_opt_reloc(),
>   and make_pass_pcrel_opt().
>   * config/rs6000/rs6000.c (reg_to_non_prefixed): Make global.
>   (rs6000_option_override_internal): Add pcrel-opt.
>   (rs6000_delegitimize_address): Support pcrel-opt.
>   (rs6000_opt_masks): Add pcrel-opt.
>   (offsettable_non_prefixed_memory): New function.
>   (reg_to_non_prefixed): Make global.
>   (rs6000_asm_output_opcode): Reset next_insn_prefixed_p.
>   (output_pcrel_opt_reloc): New function.
>   * config/rs6000/rs6000.md (loads_extern_addr): New attr.
>   (pcrel_extern_addr): Set loads_extern_addr.
>   Add include for pcrel-opt.md.
>   * config/rs6000/rs6000.opt: Add -mpcrel-opt.
>   * config/rs6000/t-rs6000: Add rules for pcrel-opt.c and
>   pcrel-opt.md.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/pcrel-opt-inc-di.c: New test.
>   * gcc.target/powerpc/pcrel-opt-ld-df.c: New test.
>   * gcc.target/powerpc/pcrel-opt-ld-di.c: New test.
>   * gcc.target/powerpc/pcrel-opt-ld-hi.c: New test.
>   * gcc.target/powerpc/pcrel-opt-ld-qi.c: New test.
>   * gcc.target/powerpc/pcrel-opt-ld-sf.c: New test.
>   * gcc.target/powerpc/pcrel-opt-ld-si.c: New test.
>   * gcc.target/powerpc/pcrel-opt-ld-vector.c: New test.
>   * gcc.target/powerpc/pcrel-opt-st-df.c: New test.
>   * gcc.target/powerpc/pcrel-opt-st-di.c: New test.
>   * gcc.target/powerpc/pcrel-opt-st-hi.c: New test.
>   * gcc.target/powerpc/pcrel-opt-st-qi.c: New test.
>   * gcc.target/powerpc/pcrel-opt-st-sf.c: New test.
>   * gcc.target/powerpc/pcrel-opt-st-si.c: New test.
>   * gcc.target/powerpc/pcrel-opt-st-vector.c: New test.
> ---
> gcc/config.gcc                               |   6 +-
> gcc/config/rs6000/pcrel-opt.c                | 888 ++
> gcc/config/rs6000/pcrel-opt.md               | 386
> gcc/config/rs6000/predicates.md              |  23 +
> gcc/config/rs6000/rs6000-cpus.def            |   2 +
> gcc/config/rs6000/rs6000-passes.def          |   8 +
> gcc/config/rs6000/rs6000-protos.h            |   4 +
> gcc/config/rs6000/rs6000.c                   | 116 ++-
> gcc/config/rs6000/rs6000.md                  |   8 +-
> gcc/config/rs6000/rs6000.opt                 |   4 +
> gcc/config/rs6000/t-rs6000                   |   7 +-
> .../gcc.target/powerpc/pcrel-opt-inc-di.c    |  18 +
> .../gcc.target/powerpc/pcrel-opt-ld-df.c     |  36 +
> .../gcc.target/powerpc/pcrel-opt-ld-di.c     |  43 +
> .../gcc.target/powerpc/pcrel-opt-ld-hi.c     |  42 +
> .../gcc.target/powerpc/pcrel-opt-ld-qi.c     |  42 +
> .../gcc.target/powerpc/pcrel-opt-ld-sf.c     |  42 +
> .../gcc.target/powerpc/pcrel-opt-ld-si.c     |  41 +
> .../gcc.target/powerpc/pcrel-opt-ld-vector.c |  36 +
> .../gcc.target/powerpc/pcrel-opt-st-df.c     |  36 +
> .../gcc.target/powerpc/pcrel-opt-st-di.c     |  37 +
> .../gcc.target/powerpc/pcrel-opt-st-hi.c     |  42 +
> .../gcc.target/powerpc/pcrel-opt-st-qi.c     |  42 +
> .../gcc.target/powerpc/pcrel-opt-st-sf.c     |  36 +
> .../gcc.target/powerpc/pcrel-opt-st-si.c     |  41 +
> .../gcc.target/powerpc/pcrel-opt-st-vector.c |  36 +
> 26 files changed, 2013 insertions(+), 9 deletions(-)
> create mode 100644 gcc/config/rs6000/pcrel-opt.c
> create mode 100644 gcc/config/rs6000/pcrel-opt.md
> create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c
> create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c
> create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c
> create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-hi.c
> create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-qi.c
> create mode 100644 

[PATCH,rs6000] Optimize pcrel access of globals [ping]

2020-12-09 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey 

Ping. I've folded in the changes to comments suggested by Will Schmidt.

This patch implements an RTL pass that looks for pc-relative loads of the
address of an external variable using the PCREL_GOT relocation and a
single load or store that uses that external address.

Produced by a cast of thousands:
 * Michael Meissner
 * Peter Bergner
 * Bill Schmidt
 * Alan Modra
 * Segher Boessenkool
 * Aaron Sawdey

Passes bootstrap/regtest on ppc64le power10. Should have no effect on
other processors. OK for trunk?

Thanks!
   Aaron

gcc/ChangeLog:

* config.gcc: Add pcrel-opt.c and pcrel-opt.o.
* config/rs6000/pcrel-opt.c: New file.
* config/rs6000/pcrel-opt.md: New file.
* config/rs6000/predicates.md: Add d_form_memory predicate.
* config/rs6000/rs6000-cpus.def: Add OPTION_MASK_PCREL_OPT.
* config/rs6000/rs6000-passes.def: Add pass_pcrel_opt.
* config/rs6000/rs6000-protos.h: Add reg_to_non_prefixed(),
offsettable_non_prefixed_memory(), output_pcrel_opt_reloc(),
and make_pass_pcrel_opt().
* config/rs6000/rs6000.c (reg_to_non_prefixed): Make global.
(rs6000_option_override_internal): Add pcrel-opt.
(rs6000_delegitimize_address): Support pcrel-opt.
(rs6000_opt_masks): Add pcrel-opt.
(offsettable_non_prefixed_memory): New function.
(reg_to_non_prefixed): Make global.
(rs6000_asm_output_opcode): Reset next_insn_prefixed_p.
(output_pcrel_opt_reloc): New function.
* config/rs6000/rs6000.md (loads_extern_addr): New attr.
(pcrel_extern_addr): Set loads_extern_addr.
Add include for pcrel-opt.md.
* config/rs6000/rs6000.opt: Add -mpcrel-opt.
* config/rs6000/t-rs6000: Add rules for pcrel-opt.c and
pcrel-opt.md.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pcrel-opt-inc-di.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-df.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-di.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-hi.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-qi.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-sf.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-si.c: New test.
* gcc.target/powerpc/pcrel-opt-ld-vector.c: New test.
* gcc.target/powerpc/pcrel-opt-st-df.c: New test.
* gcc.target/powerpc/pcrel-opt-st-di.c: New test.
* gcc.target/powerpc/pcrel-opt-st-hi.c: New test.
* gcc.target/powerpc/pcrel-opt-st-qi.c: New test.
* gcc.target/powerpc/pcrel-opt-st-sf.c: New test.
* gcc.target/powerpc/pcrel-opt-st-si.c: New test.
* gcc.target/powerpc/pcrel-opt-st-vector.c: New test.
---
 gcc/config.gcc                               |   6 +-
 gcc/config/rs6000/pcrel-opt.c                | 888 ++
 gcc/config/rs6000/pcrel-opt.md               | 386
 gcc/config/rs6000/predicates.md              |  23 +
 gcc/config/rs6000/rs6000-cpus.def            |   2 +
 gcc/config/rs6000/rs6000-passes.def          |   8 +
 gcc/config/rs6000/rs6000-protos.h            |   4 +
 gcc/config/rs6000/rs6000.c                   | 116 ++-
 gcc/config/rs6000/rs6000.md                  |   8 +-
 gcc/config/rs6000/rs6000.opt                 |   4 +
 gcc/config/rs6000/t-rs6000                   |   7 +-
 .../gcc.target/powerpc/pcrel-opt-inc-di.c    |  18 +
 .../gcc.target/powerpc/pcrel-opt-ld-df.c     |  36 +
 .../gcc.target/powerpc/pcrel-opt-ld-di.c     |  43 +
 .../gcc.target/powerpc/pcrel-opt-ld-hi.c     |  42 +
 .../gcc.target/powerpc/pcrel-opt-ld-qi.c     |  42 +
 .../gcc.target/powerpc/pcrel-opt-ld-sf.c     |  42 +
 .../gcc.target/powerpc/pcrel-opt-ld-si.c     |  41 +
 .../gcc.target/powerpc/pcrel-opt-ld-vector.c |  36 +
 .../gcc.target/powerpc/pcrel-opt-st-df.c     |  36 +
 .../gcc.target/powerpc/pcrel-opt-st-di.c     |  37 +
 .../gcc.target/powerpc/pcrel-opt-st-hi.c     |  42 +
 .../gcc.target/powerpc/pcrel-opt-st-qi.c     |  42 +
 .../gcc.target/powerpc/pcrel-opt-st-sf.c     |  36 +
 .../gcc.target/powerpc/pcrel-opt-st-si.c     |  41 +
 .../gcc.target/powerpc/pcrel-opt-st-vector.c |  36 +
 26 files changed, 2013 insertions(+), 9 deletions(-)
 create mode 100644 gcc/config/rs6000/pcrel-opt.c
 create mode 100644 gcc/config/rs6000/pcrel-opt.md
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-hi.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-qi.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-sf.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-si.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-vector.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-df.c
 create