[PATCH 1/3] power10: Add PCREL_OPT load support.

2020-09-04 Thread Michael Meissner via Gcc-patches
power10: Add PCREL_OPT load support.

This patch adds support for optimizing power10 loads of an external variable to
eliminate loading the address of the variable, and then doing a subsequent load
using that address.

The next patch will add the support for optimizing power10 stores to an
external variable.  The third patch will add the test suite for these patches.

I have built compilers with and without these set of 3 patches doing a
bootstrap build and make check.  There were no regressions, and the new tests
passed.  Can I check these patches into the master branch for GCC?  Because
this is new functionality, I do not intend to back port these patches to GCC 10
at this time.

gcc/
2020-09-05  Michael Meissner  

* config.gcc (powerpc*-*-*): Add pcrel-opt.o.
(rs6000*-*-*): Add pcrel-opt.o.
* config/rs6000/pcrel-opt.c: New file.
* config/rs6000/pcrel-opt.md: New file.
* config/rs6000/predicates.md (d_form_memory): New predicate.
* config/rs6000/rs6000-cpus.def (OTHER_POWER10_MASKS): Add
-mpcrel-opt.
(POWERPC_MASKS): Add -mpcrel-opt.
* config/rs6000/rs6000-passes.def: Add comment for existing power8
swaps pass.  Add PCREL_OPT pass.
* config/rs6000/rs6000-protos.h (reg_to_non_prefixed): New
declaration.
(offsettable_non_prefixed_memory): New declaration.
(output_pcrel_opt_reloc): New declaration.
(make_pass_pcrel_opt): New declaration.
* config/rs6000/rs6000.c (reg_to_non_prefixed): Make function
globally visible.
(rs6000_option_override_internal): Add support for -mpcrel-opt.
(rs6000_delegitimize_address): Add support for the PCREL_OPT
addresses.
(rs6000_opt_masks): Add -mpcrel-opt.
(offsettable_non_prefixed_memory): New helper function.
(rs6000_asm_output_opcode): Reset prefixed flag after first use.
(output_pcrel_opt_reloc): New function.
* config/rs6000/rs6000.md (loads_extern_addr): New insn
attribute.
(pcrel_extern_addr): Set loads_extern_addr attribute.
(toplevel): Include pcrel-opt.md.
* config/rs6000/rs6000.opt (-mpcrel-opt): New option.
* config/rs6000/t-rs6000 (pcrel-opt.o): Add build rules.
(MD_INCLUDES): Add pcrel-opt.md
---
 gcc/config.gcc  |   6 +-
 gcc/config/rs6000/pcrel-opt.c   | 590 
 gcc/config/rs6000/pcrel-opt.md  | 256 
 gcc/config/rs6000/predicates.md |  23 ++
 gcc/config/rs6000/rs6000-cpus.def   |   2 +
 gcc/config/rs6000/rs6000-passes.def |   8 +
 gcc/config/rs6000/rs6000-protos.h   |   4 +
 gcc/config/rs6000/rs6000.c  | 105 -
 gcc/config/rs6000/rs6000.md |   8 +-
 gcc/config/rs6000/rs6000.opt|   4 +
 gcc/config/rs6000/t-rs6000  |   7 +-
 11 files changed, 1004 insertions(+), 9 deletions(-)
 create mode 100644 gcc/config/rs6000/pcrel-opt.c
 create mode 100644 gcc/config/rs6000/pcrel-opt.md

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 797f0ad5edd..c677e4145e3 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -505,7 +505,7 @@ or1k*-*-*)
;;
 powerpc*-*-*)
cpu_type=rs6000
-   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o 
rs6000-call.o"
+   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o 
rs6000-call.o pcrel-opt.o"
extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h"
extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
@@ -520,6 +520,7 @@ powerpc*-*-*)
esac
extra_options="${extra_options} g.opt fused-madd.opt 
rs6000/rs6000-tables.opt"
target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.c 
\$(srcdir)/config/rs6000/rs6000-call.c"
+   target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/pcrel-opt.c"
;;
 pru-*-*)
cpu_type=pru
@@ -531,8 +532,9 @@ riscv*)
;;
 rs6000*-*-*)
extra_options="${extra_options} g.opt fused-madd.opt 
rs6000/rs6000-tables.opt"
-   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o 
rs6000-call.o"
+   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o 
rs6000-call.o pcrel-opt.o"
target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.c 
\$(srcdir)/config/rs6000/rs6000-call.c"
+   target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/pcrel-opt.c"
;;
 sparc*-*-*)
cpu_type=sparc
diff --git a/gcc/config/rs6000/pcrel-opt.c b/gcc/config/rs6000/pcrel-opt.c
new file mode 100644
index 000..f831853c90b
--- /dev/null
+++ b/gcc/config/rs6000/pcrel-opt.c
@@ -0,0 +1,590 @@
+/* Subroutines used support the pc-relative linker optimization.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of t

Re: [PATCH 1/3] Power10: Add PCREL_OPT load support

2020-09-03 Thread Michael Meissner via Gcc-patches
On Thu, Aug 20, 2020 at 09:09:40PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Aug 18, 2020 at 02:34:01AM -0400, Michael Meissner wrote:
> > +// Maximum number of insns to scan between the load address and the load 
> > that
> 
> Please don't mix /* and // comments.  Just stick to /* comments, like
> all the rest of our backend?
> 
> > +const int MAX_PCREL_OPT_INSNS  = 10;
> 
> "static const", and not tab please (just a space).

With the switch to using flow control, this will go away.

> > +  // LWA is a DS format instruction, but LWZ is a D format instruction.  
> > We use
> > +  // DImode for the mode to force checking whether the bottom 2 bits are 0.
> 
> How so?  Can't you just check if the resulting insn is accepted? That
> would be so much more robust (and can have a correct description more
> easily, too!)

Unfortunately no.  The prefixed form will be accepted, and the whole PCREL_OPT
optimization requires a non-prefixed instruction.

> > +  // However FPR and vector registers uses the LFIWAX instruction which is
> > +  // indexed only.
> 
> (vectors use lxsiwax)

Yes.

> > +  if (GET_CODE (mem) == SIGN_EXTEND && GET_MODE (XEXP (mem, 0)) == SImode)
> 
> You're checking here whether the address of the MEM is SImode.

Whoops.

> > +{
> > +  if (!INT_REGNO_P (reg_regno))
> 
> That should use base_reg_operand instead?
> 
> > +  // The optimization will only work on non-prefixed offsettable loads.
> > +  rtx addr = XEXP (mem_inner, 0);
> > +  enum insn_form iform = address_to_insn_form (addr, mem_mode, 
> > non_prefixed);
> > +  if (iform != INSN_FORM_BASE_REG
> > +  && iform != INSN_FORM_D
> > +  && iform != INSN_FORM_DS
> > +  && iform != INSN_FORM_DQ)
> > +return false;
> 
> Sounds like something for a utility function?  Do we use this elsewhere?

I will make a utility function.  But no, we currently do not use this anywhere
else.  It might be useful for fusion.

> > +  ++pcrel_opt_next_num;
> 
> Normal style is postfix increment.  Empty lines around this are nice, it
> indicates this is the point where we decided to do the thing.  Or add a
> comment even, maybe?
> 
> > +  unsigned int addr_regno = reg_or_subregno (addr_reg);
> > +  rtx label_num = GEN_INT (pcrel_opt_next_num);
> > +  rtx reg_di = gen_rtx_REG (DImode, reg_regno);
> > +
> > +  PATTERN (addr_insn)
> > += ((addr_regno != reg_regno)
> > +   ? gen_pcrel_opt_ld_addr (addr_reg, addr_symbol, label_num, reg_di)
> > +   : gen_pcrel_opt_ld_addr_same_reg (addr_reg, addr_symbol, 
> > label_num));
> 
> Please use if() instead of multi-line expressions.  The outer parens are
> unnecessary, too.

Ok.  However, the outer parens make it easier to format in emacs.

> > +  // Revalidate the insn, backing out of the optimization if the insn is 
> > not
> > +  // supported.
> 
> So the count will be incorrect then?

No, the count is correct, since we don't count the PCREL_OPTs until everything 
is validated.

> 
> > +  INSN_CODE (addr_insn) = recog (PATTERN (addr_insn), addr_insn, 0);
> > +  if (INSN_CODE (addr_insn) < 0)
> > +{
> > +  PATTERN (addr_insn) = addr_set;
> > +  INSN_CODE (addr_insn) = recog (PATTERN (addr_insn), addr_insn, 0);
> > +  return false;
> > +}
> 
> Can you use the normal undo mechanisms, instead?  apply_change_group
> etc.

I will think about it.

> > +static rtx_insn *
> > +next_active_insn_in_basic_block (rtx_insn *insn)
> > +{
> > +  insn = NEXT_INSN (insn);
> > +
> > +  while (insn != NULL_RTX)
> > +{
> > +  /* If the basic block ends or there is a jump of some kind, exit the
> > +loop.  */
> > +  if (CALL_P (insn)
> > + || JUMP_P (insn)
> > + || JUMP_TABLE_DATA_P (insn)
> > + || LABEL_P (insn)
> > + || BARRIER_P (insn))
> > +   return NULL;
> > +
> > +  /* If this is a real insn, return it.  */
> > +  if (!insn->deleted ()
> > + && NONJUMP_INSN_P (insn)
> > + && GET_CODE (PATTERN (insn)) != USE
> > + && GET_CODE (PATTERN (insn)) != CLOBBER)
> > +   return insn;
> > +
> > +  /* Loop for USE, CLOBBER, DEBUG_INSN, NOTEs.  */
> > +  insn = NEXT_INSN (insn);
> > +}
> > +
> > +  return NULL;
> > +}
> 
> There are things to do this.  Please don't write code manually parsing
> RTL unless you have to.

I will switch to using the DF flow in the next set of patches.

> > +// Validate that a load is actually a single instruction that can be 
> > optimized
> > +// with the PCREL_OPT optimization.
> > +
> > +static bool
> > +is_single_instruction (rtx_insn *insn, rtx reg)
> 
> Of course it is a single instruction!  A single RTL instruction...
> Clarify as "single machine instruction"?

My experience is some of the insns lie in terms of things like size.  The size
is set when it is split, but this pass may/may not run before those
instructions are split.  But I will change it to just testing if the length is
4, and hope for the best.

> > +{
> > +  if (!REG_P (reg) && !SUBREG_P (reg))
> > +return false;
> 
>

Re: [PATCH 1/3] Power10: Add PCREL_OPT load support

2020-08-20 Thread Segher Boessenkool
Hi!

On Tue, Aug 18, 2020 at 02:34:01AM -0400, Michael Meissner wrote:
> +// Maximum number of insns to scan between the load address and the load that

Please don't mix /* and // comments.  Just stick to /* comments, like
all the rest of our backend?

> +const int MAX_PCREL_OPT_INSNS= 10;

"static const", and not tab please (just a space).

> +  // LWA is a DS format instruction, but LWZ is a D format instruction.  We 
> use
> +  // DImode for the mode to force checking whether the bottom 2 bits are 0.

How so?  Can't you just check if the resulting insn is accepted? That
would be so much more robust (and can have a correct description more
easily, too!)

> +  // However FPR and vector registers uses the LFIWAX instruction which is
> +  // indexed only.

(vectors use lxsiwax)

> +  if (GET_CODE (mem) == SIGN_EXTEND && GET_MODE (XEXP (mem, 0)) == SImode)

You're checking here whether the address of the MEM is SImode.

> +{
> +  if (!INT_REGNO_P (reg_regno))

That should use base_reg_operand instead?

> +  // The optimization will only work on non-prefixed offsettable loads.
> +  rtx addr = XEXP (mem_inner, 0);
> +  enum insn_form iform = address_to_insn_form (addr, mem_mode, non_prefixed);
> +  if (iform != INSN_FORM_BASE_REG
> +  && iform != INSN_FORM_D
> +  && iform != INSN_FORM_DS
> +  && iform != INSN_FORM_DQ)
> +return false;

Sounds like something for a utility function?  Do we use this elsewhere?

> +  ++pcrel_opt_next_num;

Normal style is postfix increment.  Empty lines around this are nice, it
indicates this is the point where we decided to do the thing.  Or add a
comment even, maybe?

> +  unsigned int addr_regno = reg_or_subregno (addr_reg);
> +  rtx label_num = GEN_INT (pcrel_opt_next_num);
> +  rtx reg_di = gen_rtx_REG (DImode, reg_regno);
> +
> +  PATTERN (addr_insn)
> += ((addr_regno != reg_regno)
> +   ? gen_pcrel_opt_ld_addr (addr_reg, addr_symbol, label_num, reg_di)
> +   : gen_pcrel_opt_ld_addr_same_reg (addr_reg, addr_symbol, label_num));

Please use if() instead of multi-line expressions.  The outer parens are
unnecessary, too.

> +  // Revalidate the insn, backing out of the optimization if the insn is not
> +  // supported.

So the count will be incorrect then?

> +  INSN_CODE (addr_insn) = recog (PATTERN (addr_insn), addr_insn, 0);
> +  if (INSN_CODE (addr_insn) < 0)
> +{
> +  PATTERN (addr_insn) = addr_set;
> +  INSN_CODE (addr_insn) = recog (PATTERN (addr_insn), addr_insn, 0);
> +  return false;
> +}

Can you use the normal undo mechanisms, instead?  apply_change_group
etc.

> +static rtx_insn *
> +next_active_insn_in_basic_block (rtx_insn *insn)
> +{
> +  insn = NEXT_INSN (insn);
> +
> +  while (insn != NULL_RTX)
> +{
> +  /* If the basic block ends or there is a jump of some kind, exit the
> +  loop.  */
> +  if (CALL_P (insn)
> +   || JUMP_P (insn)
> +   || JUMP_TABLE_DATA_P (insn)
> +   || LABEL_P (insn)
> +   || BARRIER_P (insn))
> + return NULL;
> +
> +  /* If this is a real insn, return it.  */
> +  if (!insn->deleted ()
> +   && NONJUMP_INSN_P (insn)
> +   && GET_CODE (PATTERN (insn)) != USE
> +   && GET_CODE (PATTERN (insn)) != CLOBBER)
> + return insn;
> +
> +  /* Loop for USE, CLOBBER, DEBUG_INSN, NOTEs.  */
> +  insn = NEXT_INSN (insn);
> +}
> +
> +  return NULL;
> +}

There are things to do this.  Please don't write code manually parsing
RTL unless you have to.

> +// Validate that a load is actually a single instruction that can be 
> optimized
> +// with the PCREL_OPT optimization.
> +
> +static bool
> +is_single_instruction (rtx_insn *insn, rtx reg)

Of course it is a single instruction!  A single RTL instruction...
Clarify as "single machine instruction"?

> +{
> +  if (!REG_P (reg) && !SUBREG_P (reg))
> +return false;

You need to check if is a subreg of reg, then.  There are subregs of
other things.  Not that you should see those here, but still.

> +  // _Decimal128 and IBM extended double are always multiple instructions.
> +  machine_mode mode = GET_MODE (reg);
> +  if (mode == TFmode && !TARGET_IEEEQUAD)
> +return false;
> +
> +  if (mode == TDmode || mode == IFmode)
> +return false;

Don't we have a helper for this?  The ibm128 part at least.

Maybe we should just have an attribute on the insns that *can* get this
optimisation?

> +  return (base_reg_operand (XEXP (addr, 0), Pmode)
> +   && satisfies_constraint_I (XEXP (addr, 1)));

short_cint_operand.  But maybe just rs6000_legitimate_offset_address_p?

>  /* Flags that need to be turned off if -mno-power10.  */
>  #define OTHER_POWER10_MASKS  (OPTION_MASK_MMA\
>| OPTION_MASK_PCREL\
> +  | OPTION_MASK_PCREL_OPT\
>| OPTION_MASK_PREFIXED)

This isn't a CPU flag.  Instead, it is just an optimisation option.

>

Re: [PATCH 1/3] Power10: Add PCREL_OPT load support

2020-08-17 Thread Michael Meissner via Gcc-patches
[PATCH 1/3] Power10: Add PCREL_OPT load support.

This patch adds support for optimizing power10 loads of an external variable to
eliminate loading the address of the variable, and then doing a subsequent load
using that address.

I have built compilers with and without these set of 3 patches doing a
bootstrap build and make check.  There were no regressions, and the new tests
passed.  Can I check these patches into the master branch for GCC?  Because
this is new functionality, I do not intend to back port these patches to GCC 10
at this time.

gcc/
2020-08-18  Michael Meissner  

* config.gcc (powerpc*-*-*): Add pcrel-opt.o.
(rs6000*-*-*): Add pcrel-opt.o.
* config/rs6000/pcrel-opt.c: New file.
* config/rs6000/pcrel-opt.md: New file.
* config/rs6000/predicates.md (d_form_memory): New predicate.
* config/rs6000/rs6000-cpus.def (OTHER_POWER10_MASKS): Add
-mpcrel-opt.
(POWERPC_MASKS): Add -mpcrel-opt.
* config/rs6000/rs6000-passes.def: Add PCREL_OPT pass.
* config/rs6000/rs6000-protos.h (reg_to_non_prefixed): New
declaration.
(make_pass_pcrel_opt): New declaration.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Add
support for -mpcrel-opt.
(rs6000_delegitimize_address): Add support for PCREL_OPT
addresses.
(print_operand, 'r' case): New operand for PCREL_OPT.
(rs6000_opt_masks): Add -mpcrel-opt.
(rs6000_asm_output_opcode): Reset flag to emit the initial 'p'
after use.
* config/rs6000/rs6000.md (loads_extern_addr attribute): New
attribute.
(isa attribute): Add pcrel_opt sub-case.
(enabled attribute): Add support for pcrel_opt.
(pcrel_extern_addr): Set loads_extern_addr attribute.
(toplevel): Include pcrel-opt.md.
* config/rs6000/rs6000.opt (-mpcrel-opt): New debug option.
* config/rs6000/t-rs6000 (pcrel-opt.o): Add build rule.
(MD_INCLUDES): Add pcrel-opt.md.
---
 gcc/config.gcc  |   6 +-
 gcc/config/rs6000/pcrel-opt.c   | 656 
 gcc/config/rs6000/pcrel-opt.md  | 248 ++
 gcc/config/rs6000/predicates.md |  23 ++
 gcc/config/rs6000/rs6000-cpus.def   |   2 +
 gcc/config/rs6000/rs6000-passes.def |   8 +
 gcc/config/rs6000/rs6000-protos.h   |   2 +
 gcc/config/rs6000/rs6000.c  |  40 ++-
 gcc/config/rs6000/rs6000.md |  14 +-
 gcc/config/rs6000/rs6000.opt|   4 +
 gcc/config/rs6000/t-rs6000  |   7 +-
 11 files changed, 1001 insertions(+), 9 deletions(-)
 create mode 100644 gcc/config/rs6000/pcrel-opt.c
 create mode 100644 gcc/config/rs6000/pcrel-opt.md

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 2370368..605d743 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -505,7 +505,7 @@ or1k*-*-*)
;;
 powerpc*-*-*)
cpu_type=rs6000
-   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o 
rs6000-call.o"
+   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o 
rs6000-call.o pcrel-opt.o"
extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h"
extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
@@ -520,6 +520,7 @@ powerpc*-*-*)
esac
extra_options="${extra_options} g.opt fused-madd.opt 
rs6000/rs6000-tables.opt"
target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.c 
\$(srcdir)/config/rs6000/rs6000-call.c"
+   target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/pcrel-opt.c"
;;
 pru-*-*)
cpu_type=pru
@@ -531,8 +532,9 @@ riscv*)
;;
 rs6000*-*-*)
extra_options="${extra_options} g.opt fused-madd.opt 
rs6000/rs6000-tables.opt"
-   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o 
rs6000-call.o"
+   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o 
rs6000-call.o pcrel-opt.o"
target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.c 
\$(srcdir)/config/rs6000/rs6000-call.c"
+   target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/pcrel-opt.c"
;;
 sparc*-*-*)
cpu_type=sparc
diff --git a/gcc/config/rs6000/pcrel-opt.c b/gcc/config/rs6000/pcrel-opt.c
new file mode 100644
index 000..10b4bc4
--- /dev/null
+++ b/gcc/config/rs6000/pcrel-opt.c
@@ -0,0 +1,656 @@
+/* Subroutines used support the pc-relative linker optimization.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.