Re: Ping: [PATCH v4, rs6000] Replace X-form addressing with D-form addressing in new pass for Power9

Kelvin Nilsen Wed, 13 Nov 2019 14:12:45 -0800

On 10/25/19 8:30 PM, Kelvin Nilsen wrote:
> 
> This patch adds a new optimization pass for rs6000 targets.
> 
> This new pass scans existing rtl expressions and replaces X-form loads and 
> stores with rtl expressions that favor selection of the D-form instructions 
> in contexts for which the D-form instructions are preferred.  The new pass 
> runs after the RTL loop optimizations since loop unrolling often introduces 
> opportunities for beneficial replacements of X-form addressing instructions.
> 
> For each of the new tests, multiple X-form instructions are replaced with 
> D-form instructions, some addi instructions are replaced with add 
> instructions, and some addi instructions are eliminated.  The typical 
> improvement for the included tests is a decrease of 4.28% to 12.12% in the 
> number of instructions executed on each iteration of the loop.  The 
> optimization has not shown measurable improvement on specmark tests, 
> presumably because the typical loops that are benefited by this optimization 
> are memory bounded and this optimization does not eliminate memory loads or 
> stores.  However, it is anticipated that multi-threaded workloads and 
> measurements of total power and cooling costs for heavy server workloads 
> would benefit.
> 
> This version 4 patch responds to feedback and numerous suggestions by Segher:
> 
>   1. Further improvements to comments and discussion of computational 
> complexity.
> 
>   2. Changed the name of insn_sequence_no to luid.
> 
>   3. Fixed some typos in comments.
> 
>   4. Added macro-defined constants to enforce upper bounds on the sizes (and 
> number of required iterations) for certain data structures.  The intent is to 
> bound compile time for programs that represent large numbers of opportunities 
> for D-form replacements.  This optimization pass ignores  parts of a source 
> program that exceed these macro-defined size limits.
> 
> In a separate mail, I have sent discussion regarding the behavior of 
> preceding passes and how this behavior relates to this new pass.
> 
> I have built and regression tested this patch on powerpc64le-unknown-linux 
> target with no regressions.
> 
> Is this ok for trunk?
> 
> gcc/ChangeLog:
> 
> 2019-10-25  Kelvin Nilsen  <kel...@gcc.gnu.org>
> 
>       * config/rs6000/rs6000-p9dform.c: New file.
>       * config/rs6000/rs6000-passes.def: Add pass_insert_dform.
>       * config/rs6000/rs6000-protos.h
>       (rs6000_target_supports_dform_offset_p): New function prototype.
>       (make_pass_insert_dform): Likewise.
>       * config/rs6000/rs6000.c (rs6000_target_supports_dform_offset_p):
>       New function.
>       * config/rs6000/t-rs6000 (rs6000-p9dform.o): New build target.
>       * config.gcc: Add rs6000-p9dform.o object file.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-10-25  Kelvin Nilsen  <kel...@gcc.gnu.org>
> 
>       * gcc.target/powerpc/p9-dform-0.c: New test.
>       * gcc.target/powerpc/p9-dform-1.c: New test.
>       * gcc.target/powerpc/p9-dform-10.c: New test.
>       * gcc.target/powerpc/p9-dform-11.c: New test.
>       * gcc.target/powerpc/p9-dform-12.c: New test.
>       * gcc.target/powerpc/p9-dform-13.c: New test.
>       * gcc.target/powerpc/p9-dform-14.c: New test.
>       * gcc.target/powerpc/p9-dform-15.c: New test.
>       * gcc.target/powerpc/p9-dform-2.c: New test.
>       * gcc.target/powerpc/p9-dform-3.c: New test.
>       * gcc.target/powerpc/p9-dform-4.c: New test.
>       * gcc.target/powerpc/p9-dform-5.c: New test.
>       * gcc.target/powerpc/p9-dform-6.c: New test.
>       * gcc.target/powerpc/p9-dform-7.c: New test.
>       * gcc.target/powerpc/p9-dform-8.c: New test.
>       * gcc.target/powerpc/p9-dform-9.c: New test.
>       * gcc.target/powerpc/p9-dform-generic.h: New test.
> 
> Index: gcc/config/rs6000/rs6000-p9dform.c
> ===================================================================
> --- gcc/config/rs6000/rs6000-p9dform.c        (nonexistent)
> +++ gcc/config/rs6000/rs6000-p9dform.c        (working copy)
> @@ -0,0 +1,1763 @@
> +/* Subroutines used to transform array subscripting expressions into
> +   forms that are more amenable to d-form instruction selection for p9
> +   little-endian VSX code.
> +   Copyright (C) 1991-2019 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published
> +   by the Free Software Foundation; either version 3, or (at your
> +   option) any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but WITHOUT
> +   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +   License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "rtl.h"
> +#include "tree.h"
> +#include "memmodel.h"
> +#include "df.h"
> +#include "tm_p.h"
> +#include "ira.h"
> +#include "print-tree.h"
> +#include "varasm.h"
> +#include "explow.h"
> +#include "expr.h"
> +#include "output.h"
> +#include "tree-pass.h"
> +#include "rtx-vector-builder.h"
> +#include "cfgloop.h"
> +
> +#include "insn-config.h"
> +#include "recog.h"
> +
> +#include "print-rtl.h"
> +#include "tree-pretty-print.h"
> +
> +#include "genrtl.h"
> +
> +/* This pass transforms array indexing expressions from a form that
> +   favors selection of X-form instructions into a form that favors
> +   selection of D-form instructions.
> +
> +   Showing favor for D-form instructions is especially important when
> +   targeting Power9, as the Power9 architecture added a number of new
> +   D-form instruction capabilities.
> +
> +   Consider, for example, the following loop, excerpted from an actual
> +   program:
> +
> +    double sacc, x[], y[], z[];
> +    sacc = 0.00;
> +    for (unsigned long long int i = 0; i < N; i++) {
> +      z[i] = x[i] * y[i];
> +      sacc += z[i];
> +    }
> +
> +   Compile this program with the following gcc options which enable both
> +   vectorization and loop unrolling:
> +    -m64 -fdump-rtl-all-details -mcpu=power9 -mtune=power9 -funroll-loops -O3
> +
> +   Without this pass, this loop is represented by the following:
> +
> +     lxvx:          16
> +     addi:           8
> +     xvmuldp:        8
> +     stxvx:          8
> +     fmr:            8
> +     xxpermdi:       8
> +     fadd:          16
> +     bdnz:           1
> +                   ___
> +           total:   73 instructions
> +
> +.L3:
> +     lxvx 0,29,11
> +     lxvx 12,30,11
> +     addi 12,11,16
> +     addi 0,11,48
> +     addi 5,11,64
> +     addi 9,11,32
> +     addi 6,11,80
> +     addi 7,11,96
> +     addi 8,11,112
> +     lxvx 2,29,12
> +     lxvx 3,30,12
> +     lxvx 4,29,0
> +     lxvx 5,30,0
> +     lxvx 10,30,9
> +     lxvx 11,29,5
> +     xvmuldp 6,0,12
> +     lxvx 13,30,5
> +     lxvx 8,29,9
> +     lxvx 27,29,6
> +     lxvx 28,30,6
> +     xvmuldp 7,2,3
> +     lxvx 29,29,7
> +     lxvx 30,30,7
> +     xvmuldp 9,4,5
> +     lxvx 3,30,8
> +     lxvx 0,29,8
> +     xvmuldp 8,8,10
> +     xvmuldp 10,11,13
> +     xvmuldp 11,27,28
> +     xxpermdi 26,6,6,3
> +     fmr 2,6
> +     stxvx 6,31,11
> +     xvmuldp 12,29,30
> +     addi 11,11,128
> +     fadd 1,26,1
> +     xxpermdi 26,7,7,3
> +     stxvx 7,31,12
> +     fmr 27,7
> +     xvmuldp 0,0,3
> +     xxpermdi 30,9,9,3
> +     fmr 31,9
> +     stxvx 8,31,9
> +     xxpermdi 13,10,10,3
> +     xxpermdi 28,8,8,3
> +     stxvx 9,31,0
> +     stxvx 10,31,5
> +     fadd 1,2,1
> +     fmr 2,10
> +     xxpermdi 3,11,11,3
> +     stxvx 11,31,6
> +     fmr 4,11
> +     fmr 29,8
> +     xxpermdi 5,12,12,3
> +     fmr 6,12
> +     stxvx 12,31,7
> +     xxpermdi 9,0,0,3
> +     fmr 8,0
> +     fadd 7,26,1
> +     stxvx 0,31,8
> +     fadd 10,27,7
> +     fadd 11,28,10
> +     fadd 12,29,11
> +     fadd 26,30,12
> +     fadd 27,31,26
> +     fadd 30,13,27
> +     fadd 31,2,30
> +     fadd 0,3,31
> +     fadd 28,4,0
> +     fadd 29,5,28
> +     fadd 13,6,29
> +     fadd 1,9,13
> +     fadd 1,8,1
> +     bdnz .L3
> +
> +   With this pass, the same loop is represented by:
> +
> +     lxvx:           4
> +     lxv:           12
> +     addi:           2
> +     add:            3
> +     xvmuldp:        8
> +     stxvx:          2
> +     stxv:           6
> +     fmr:            8
> +     xxpermdi:       8
> +     fadd:          16
> +     bdnz:           1
> +                   ___
> +           total:   70 instructions
> +
> +.L3:
> +     lxvx 0,29,6
> +     lxvx 12,30,6
> +     addi 10,6,16
> +     add 7,29,10
> +     add 8,30,10
> +     lxvx 2,29,10
> +     lxvx 3,30,10
> +     add 11,31,10
> +     lxv 4,16(7)
> +     lxv 9,16(8)
> +     xvmuldp 6,0,12
> +     lxv 13,64(8)
> +     lxv 5,32(7)
> +     lxv 28,32(8)
> +     lxv 11,64(7)
> +     xvmuldp 7,2,3
> +     lxv 30,80(7)
> +     lxv 12,80(8)
> +     lxv 31,48(8)
> +     lxv 10,48(7)
> +     xvmuldp 8,4,9
> +     lxv 3,96(8)
> +     lxv 0,96(7)
> +     xvmuldp 9,5,28
> +     xvmuldp 11,11,13
> +     xxpermdi 29,6,6,3
> +     fmr 4,6
> +     stxvx 6,31,6
> +     xvmuldp 12,30,12
> +     addi 6,6,128
> +     fadd 1,29,1
> +     xxpermdi 28,7,7,3
> +     fmr 29,7
> +     stxvx 7,31,10
> +     xvmuldp 10,10,31
> +     xxpermdi 30,8,8,3
> +     fmr 31,8
> +     stxv 8,16(11)
> +     xvmuldp 0,0,3
> +     xxpermdi 5,11,11,3
> +     fmr 6,11
> +     xxpermdi 13,9,9,3
> +     fadd 1,4,1
> +     stxv 11,64(11)
> +     xxpermdi 7,12,12,3
> +     fmr 8,12
> +     stxv 12,80(11)
> +     fmr 2,9
> +     xxpermdi 3,10,10,3
> +     fmr 4,10
> +     stxv 9,32(11)
> +     stxv 10,48(11)
> +     fadd 28,28,1
> +     xxpermdi 9,0,0,3
> +     fmr 10,0
> +     stxv 0,96(11)
> +     fadd 11,29,28
> +     fadd 29,30,11
> +     fadd 12,31,29
> +     fadd 30,13,12
> +     fadd 31,2,30
> +     fadd 13,3,31
> +     fadd 2,4,13
> +     fadd 1,5,2
> +     fadd 0,6,1
> +     fadd 3,7,0
> +     fadd 4,8,3
> +     fadd 5,9,4
> +     fadd 1,10,5
> +     bdnz .L3
> +
> +   The optimized loop body replaces 12 lxvx instructions with lxv
> +   instructions, 6 stxvx instructions with stxv, and has 3 fewer add
> +   operations.
> +
> +   This pass runs immediately after pass_loop2.  Loops have already
> +   been unrolled.  The pass searches for sequences of code of the following
> +   form.  These code sequences often appear within the expanded loop bodies
> +   that result from unrolling.  The memory access patterns below match
> +   both load and store instructions.  The set of memory operations
> +   that derive from the same originating expression are grouped together
> +   by this algorithm into a collection identified within the code as an
> +   equivalence class.
> +
> +   A0: *(array_base + offset)
> +   ;; The above is known as an originating access to memory.
> +
> +   Aij: offset += constant (i, j)
> +   ;; Between consecutive accesses to memory, there may appear zero or
> +   ;; more constant adjustments to the memory offset subexpression.
> +
> +   Ai: *(array_base + offset)
> +   ;; The memory address for each subsequent access to memory differs
> +   ;; from the originating memory access by a constant offset, which
> +   ;; is computed by adding together all of the preceding constant
> +   ;; (i,j) values.
> +
> +   ;; In any given equivalence class, there may be multiple subsequent
> +   ;; memory accesses, identifed as A2, A3, ... AN, and there may be
> +   ;; multiple constant adjustments to the offset expression between
> +   ;; each pair Ai-1 and Ai where the N intervening constant
> +   ;; adjustments are identified as Aij for j ranging from 0 to N-1.
> +
> +   ;; It is required that each element of the matched pattern dominate
> +   ;; the element that follows.  In other words, the flow through the
> +   ;; various matched elements must be unconditional.  Otherwise, the
> +   ;; matched elements cannot be considered to reside within the same
> +   ;; equivalence class for purposes of this optimization.
> +
> +   This pass replaces the above-matched sequences with:
> +
> +   Ai: derived_pointer = array_base + offset
> +       *(derived_pointer)
> +
> +   Aij: leave these alone.  expect that subsequent optimization deletes
> +        this code as it may become dead (since we don't use the
> +        indexing expression following our code transformations.)
> +
> +   Ai:
> +   *(derived_pointer + constant_i)
> +     (where constant_i equals sum of constant (n,j) for all n from 1
> +      to i paired with all j from 1 to Kn,
> +
> +   For example, if the original code is of the form
> +     int offset = some_var;
> +     x0 = base[offset];
> +     offset -= 8;
> +     x1 = base[offset];
> +     other_offset = offset + 16;
> +     x2 = base[other_offset]
> +
> +   This is transformed into
> +
> +     iv_ptr = base+some_var;
> +     x0 = iv_ptr[0];
> +     x1 = iv_ptr[-8];
> +     x2 = iv_ptr[8];
> +
> +   This pass makes no effort to assure that the calculated iv_ptr
> +   value points into the middle of the range of addresses to be
> +   spanned by the pointer.  We simply choose the value of the
> +   addressing expression that dominates all the other addressing
> +   instructions within the same equivalence class.  In theory, a more
> +   judicious choice of the calculated iv_ptr value can result in fewer
> +   instructions not matching the d-form addressing mode because the
> +   constant offset exceeds the reach of the instruction's encoding
> +   limits.  However, this is very rare and the extra effort required
> +   to optimize the choice of the computed iv_ptr value is deemed not
> +   worth the effort at the current time.
> +
> +   Note that there may be multiple equivalence classes, each
> +   associated with the same or possibly a different array_base value
> +   within each function that is processed by this optimization pass.  */
> +
> +/* This is based on the union-find logic in web.c.  web_entry_base is
> +   defined in df.h.  */
> +class indexing_web_entry: public web_entry_base
> +{
> + public:
> +  rtx_insn *insn;
> +  basic_block bb;
> +
> +  /* A unique sequence number, identified as the local unique id, or
> +     luid, is assigned to each instruction for the purpose of
> +     simplifying domination tests.  Within each basic block, local
> +     unique id numbers are assigned in strictly increasing order.
> +     Thus, for any two instructions known to reside in the same basic
> +     block, the instruction with a lower luid is known to dominate the
> +     instruction with a higher luid.  */
> +  unsigned int luid;
> +
> +  /* If this insn is relevant, it is a load or store with a memory
> +     address that is comprised of a base pointer (e.g. the address of
> +     an array or array slice) and an index expression (e.g. an index
> +     within the array).  By convention, the base expression is assumed
> +     to be the left argument to the addressing arithmetic plus
> +     operator and the index expression is assumed to be the right
> +     argument.  Symmetric presentation of the same two arguments of
> +     the plus operator will not be recognized as belonging to the same
> +     equivalence class.  The original_base_use and original_index_use
> +     fields represent the numbers of the instructions that define the
> +     base and index values which are summed together with a constant
> +     value to determine the value of this instruction's memory
> +     address.  */
> +  unsigned int original_base_use;
> +  unsigned int original_index_use;
> +
> +  /* If this insn is relevant, the register assigned by insn
> +     original_base_use is original_base_reg.  The insn assigned by insn
> +     original_index_use is original_index_reg.  */
> +  unsigned int original_base_reg;
> +  unsigned int original_index_reg;
> +
> +  /* If this insn is_relevant, this is the constant that is added to
> +     the originating expression to calculate the value of this insn's
> +     memory address.  */
> +  int base_delta;
> +  int index_delta;
> +
> +  /* If this insn is relevant, it belongs to an equivalence class.
> +     The equivalence classes are identified by the definitions that
> +     define the inputs to this insn.   */
> +  unsigned int base_equivalence_hash;
> +  unsigned int index_equivalence_hash;
> +
> +  /* When multiple insns fall within the same equivalence class, they
> +     are linked together through this field.  The value UINT_MAX
> +     represents the end of this list.  */
> +  unsigned int equivalent_sibling;
> +
> +  /* Only instructions that represent loads or stores for which the
> +     memory address computation is in a particular simple form are
> +     considered relevant to this d-form optimization pass.
> +
> +     If a particular entry is identified as is_relevant == false, the
> +     values of the following fields are all undefined: is_load,
> +     is_store, is_originating, original_base_use, original_index_use,
> +     original_base_reg, original_index_reg, base_delta, index_delta,
> +     base_equivalence_hash, index_equivalence_hash, and
> +     equivalent_sibling.  */
> +  unsigned int is_relevant : 1;
> +  unsigned int is_load : 1;
> +  unsigned int is_store : 1;
> +  unsigned int is_originating : 1;
> +};
> +
> +/* How many relevant uses are contained within this function?  A
> +   relevant insn is a memory store or load insn for which the memory
> +   address is computed as the sum or difference of a base address and
> +   an index expression and there is exactly one definition of the base
> +   expression and there is exactly one definition of the index
> +   expression.  The complexity of this pass is determined in part by
> +   this number. */
> +static unsigned int num_relevant_insns;
> +
> +/* What is the largest number of definitions that reach relevant
> +   uses within this function?  The complexity of this pass
> +   is determined in part by this number.  */
> +static unsigned int max_use_defs;
> +
> +/* How many otherwise relevant instructions were rejected due to
> +   overflow of the MAX_RELEVANT_INSNS or BOUND_MAX_USE_DEFS
> +   limitations?  */
> +static unsigned int rejected_otherwise_relevant;
> +
> +/* The complexity of this pass is O (num_relevant_insns *
> +   max_use_defs).  If num_relevant_insns > MAX_RELEVANT_INSNS, suspend
> +   further optimization so as to manage compilation complexity.  */
> +#define MAX_RELEVANT_INSNS 128
> +
> +/* The complexity of this pass is O (num_relevant_insns *
> +   max_use_defs).  If max_use_defs > BOUND_MAX_USE_DEFS, suspend
> +   further optimization so as to manage compilation complexity.  */
> +#define BOUND_MAX_USE_DEFS 8
> +
> +/* A relevant instruction is expected to make use of no more than 3
> +   inputs: the base register, the index register, and, for store
> +   operations, the register holding the value to be stored.  If an
> +   instruction has more uses, consider that instruction to be not
> +   relevant.  */
> +#define BOUND_RELEVANT_USES 3
> +
> +/* Count how many definitions reach the use that is represented by the
> +   DEF_LINK argument.  Update max_use_defs if returned count exceeds
> +   previously encountered maximum count value.  */
> +static unsigned int
> +count_links (struct df_link *def_link)
> +{
> +  unsigned int count;
> +  for (count = 0; def_link != NULL; count++)
> +    def_link = def_link->next;
> +  if (count > max_use_defs)
> +    max_use_defs = count;
> +  return count;
> +}
> +
> +/* Helper comparison function for use by qsort.  */
> +static int
> +int_compare (const void *v1, const void *v2)
> +{
> +  const int *i1 = (const int *) v1;
> +  const int *i2 = (const int *) v2;
> +  return *i1 - *i2;
> +}
> +
> +/* Calculate the hash value for the use represented by DEF_LINK, given
> +   that COUNT definitions are known to reach this use.
> +
> +   Complexity is n*log(n) (for qsort) where n is COUNT.  */
> +static unsigned int
> +help_hash (unsigned int count, struct df_link *def_link)
> +{
> +  int *ids;
> +  int i = 0;
> +
> +  ids = (int *) alloca (count * sizeof (ids[0]));
> +
> +  while (def_link != NULL)
> +    {
> +      ids[i++] = DF_REF_ID (def_link->ref);
> +      def_link = def_link->next;
> +    }
> +
> +  /* sort to put ids in ascending order. */
> +  qsort ((void *) ids, count, sizeof (ids[0]), int_compare);
> +
> +  unsigned int result = 0;
> +  for (unsigned i = 0; i < count; i++)
> +    {
> +      result = (result << 6) ^ ((result >> 28) & 0x0f);
> +      result += ids[i];
> +    }
> +  return result;
> +}
> +
> +/* Calculate a hash code that represents all of the definitions that
> +   contribute to a given variable's computed value.  */
> +static unsigned int
> +equivalence_hash (struct df_link *def_link)
> +{
> +  unsigned int count = count_links (def_link);
> +  return help_hash (count, def_link);
> +}
> +
> +/* Set *header to point to the chain of originating definitions
> +   corresponding to USE.  */
> +static void
> +overwrite_defs_header (indexing_web_entry *insn_entry, unsigned int uid,
> +                    df_ref use, struct df_link **header)
> +{
> +  struct df_link *def_link = DF_REF_CHAIN (use);
> +  unsigned int uid2;
> +
> +  /* If there is no def, or the definition is artificial, or if there
> +     are multiple definitions, this is an originating use.  */
> +  if (!def_link || !def_link->ref
> +      || DF_REF_IS_ARTIFICIAL (def_link->ref) || def_link->next)
> +    *header = def_link;
> +  else if ((uid2 = insn_entry[uid].original_base_use) > 0)
> +    {
> +      df_ref use2;
> +      rtx_insn *insn2 = insn_entry[uid2].insn;
> +      struct df_insn_info *insn_info2 = DF_INSN_INFO_GET (insn2);
> +      use2 = DF_INSN_INFO_USES (insn_info2);
> +      if (use2)
> +     *header = DF_REF_CHAIN (use2);
> +      else
> +     *header = NULL;
> +    }
> +  else
> +    *header = NULL;
> +}
> +
> +/* Given that UID represents a "relevant" instruction, overwrite
> +   *insn_base with a pointer to the chain of originating definitions
> +   for the base expression.  Overwrite *insn_index with a pointer
> +   to the chain of originating definitions for the index expression.  */
> +static void
> +help_find_defs (indexing_web_entry *insn_entry,
> +             unsigned int uid, rtx base_reg, rtx index_reg,
> +             struct df_link **insn_base, struct df_link **insn_index)
> +{
> +  rtx_insn *insn = insn_entry[uid].insn;
> +  struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
> +  df_ref use;
> +
> +  /* Iterate over the uses consumed by this insn: loop iterates at
> +     most 3 times, once for use representing the base register, once
> +     for use representing the index register, and, for store
> +     operations, once for the use representing the value to be stored
> +     to memory.  */
> +  FOR_EACH_INSN_INFO_USE (use, insn_info)
> +    {
> +      if (rtx_equal_p (DF_REF_REG (use), base_reg))
> +     overwrite_defs_header (insn_entry, uid, use, insn_base);
> +      else if (rtx_equal_p (DF_REF_REG (use), index_reg))
> +     overwrite_defs_header (insn_entry, uid, use, insn_index);
> +    }
> +}
> +
> +/* Given that INSN is known to be relevant, find the linked list of
> +   definitions that define the base register and the the linked list
> +   of definitions that define the index register, and overwrite
> +   *INSN_BASE and *INSN_INDEX with pointers to the two linked lists
> +   respectively.
> +
> +   Since INSN is "relevant", it is known to represent a load or store
> +   operation and the associated memory address is known to be
> +   represented by a sum or difference of a base register and an index
> +   offset register.  */
> +static void
> +find_defs (indexing_web_entry *insn_entry, rtx_insn *insn,
> +        struct df_link **insn_base, struct df_link **insn_index)
> +{
> +  unsigned int uid = INSN_UID (insn);
> +  rtx body = PATTERN (insn);
> +  rtx mem = NULL;
> +
> +  gcc_assert (GET_CODE (body) == SET);
> +
> +  if (MEM_P (SET_SRC (body)))
> +    mem = XEXP (SET_SRC (body), 0);
> +  else if (MEM_P (SET_DEST (body)))
> +    mem = XEXP (SET_DEST (body), 0);
> +  else
> +    gcc_unreachable ();
> +
> +  enum rtx_code code = GET_CODE (mem);
> +  gcc_assert ((code == PLUS) || (code == MINUS));
> +
> +  rtx base_reg = XEXP (mem, 0);
> +  rtx index_reg = XEXP (mem, 1);
> +  if (REG_P (base_reg) && REG_P (index_reg))
> +    help_find_defs (insn_entry, uid,
> +                 base_reg, index_reg, insn_base, insn_index);
> +}
> +
> +/* Return non-zero if and only if USE represents a compile-time constant.  */
> +static bool
> +represents_constant_p (df_ref use)
> +{
> +  struct df_link *def_link = DF_REF_CHAIN (use);
> +
> +  /* If there is no definition, or the definition is artificial,
> +     or if there are multiple definitions, this is an originating use
> +     which is not a constant.  */
> +  if (!def_link || !def_link->ref
> +      || DF_REF_IS_ARTIFICIAL (def_link->ref) || def_link->next)
> +    return false;
> +  else
> +    {
> +      /* Consult the instruction that provides definition of this use.  */
> +      rtx def_insn = DF_REF_INSN (def_link->ref);
> +      rtx body = PATTERN (def_insn);
> +      if (CONST_INT_P (body))
> +     return true;
> +      else if (GET_CODE (body) == SET)
> +     {
> +       /* Recurse on the uses that define the value of this definition.  */
> +       struct df_insn_info *inner_insn_info = DF_INSN_INFO_GET (def_insn);
> +       df_ref inner_use;
> +       FOR_EACH_INSN_INFO_USE (inner_use, inner_insn_info)
> +         {
> +           if (!represents_constant_p (inner_use))
> +             return false;
> +         }
> +       /* All uses are constant.  */
> +       return true;
> +     }
> +      else
> +     return false;   /* Treat unrecognized codes as not constant.  */
> +    }
> +}
> +
> +/* This function helps analyze opportunities for copy propogation and
> +   constant folding.
> +
> +   An originator represents the first point at which the value of
> +   DEF_LINK is derived from potentially more than one input
> +   definition, or the point at which DEF_LINK's value is defined by an
> +   algebraic expression involving only constants,
> +
> +   If DEF_LINK's value depends on a constant combined with a single
> +   variable or a simple propagation of a single variable, continue
> +   the search for the originator by examining the origin of the source
> +   variable's value.
> +
> +   The value of *ADJUSTMENT is overwritten with the constant value that is
> +   added to the originator expression to obtain the value intended to
> +   be represented by DEF_LINK.  In the case that find_true_originator
> +   returns NULL, the value held in *ADJUSTMENT is undefined.
> +
> +   Returns NULL if there is no single true originator.  In general, the 
> search
> +   for an originator expression only spans SET operations that are
> +   based on simple algebraic expressions, each of which involves no
> +   more than one variable input.
> +
> +   Complexity: Linear in the number of definitions used by this
> +   instruction (<= BOUND_MAX_USE_DEFS) multiplied (for recursive
> +   calls) by the maximum depth of the potential copy propogation
> +   chain.
> +  */
> +static rtx
> +find_true_originator (struct df_link *def_link, long long int *adjustment)
> +{
> +  rtx def_insn = DF_REF_INSN (def_link->ref);
> +
> +  rtx inner_body = PATTERN (def_insn);
> +  if (GET_CODE (inner_body) == SET)
> +    {
> +      struct df_insn_info *inner_insn_info = DF_INSN_INFO_GET (def_insn);
> +      df_ref inner_use;
> +
> +      /* We're only happy with multiple uses if all but one represent
> +      constant values.  */
> +      int non_constant_uses = 0;
> +      rtx result = NULL;
> +      FOR_EACH_INSN_INFO_USE (inner_use, inner_insn_info)
> +     {
> +       if (!represents_constant_p (inner_use))
> +         {
> +           non_constant_uses++;
> +           /* There should be only one non-constant use, and it should
> +              satisfy find_true_originator.  */
> +           struct df_link *def_link = DF_REF_CHAIN (inner_use);
> +
> +           /* If there is no definition, or the definition is artificial,
> +              or there are multiple definitions, this is an
> +              originating use.  */
> +           if (!def_link || !def_link->ref
> +               || DF_REF_IS_ARTIFICIAL (def_link->ref) || def_link->next)
> +             result = def_insn;
> +           else
> +             result = find_true_originator (def_link, adjustment);
> +         }
> +     }
> +
> +      /* If non_constant_uses > 1, the value of result is not well
> +      defined because it is overwritten during multiple iterations
> +      of the above loop.  In the case that non_constant_uses > 1,
> +      we ignore the result value computed above.  */
> +      if (non_constant_uses == 1) {
> +
> +     /* If my SET looks like a simple register copy, or if it looks
> +        like PLUS or MINUS of a constant and a register, this is
> +        what we optimize.  Otherwise, punt.  */
> +
> +     if (result == NULL)
> +       /* Doing constant arithmetic with unknown originator is not
> +          useful.  */
> +       return def_insn;
> +
> +     rtx source_expr = SET_SRC (inner_body);
> +     int source_code = GET_CODE (source_expr);
> +     if (source_code == PLUS)
> +       {
> +         rtx op1 = XEXP (source_expr, 0);
> +         rtx op2 = XEXP (source_expr, 1);
> +
> +         if (CONST_INT_P (op1) && CONST_INT_P (op2))
> +           *adjustment += INTVAL (op1);
> +         else if (!CONST_INT_P (op1) && CONST_INT_P (op2))
> +           *adjustment += INTVAL (op2);
> +       }
> +     else if (source_code == MINUS)
> +       {
> +         rtx op1 = XEXP (source_expr, 0);
> +         rtx op2 = XEXP (source_expr, 1);
> +
> +         if (!CONST_INT_P (op1) && CONST_INT_P (op2))
> +           *adjustment -= INTVAL (op1);
> +         else
> +           /* assumption is that *adjustment is added to a positive variable
> +              expression, so don't optimize this rare condition.  */
> +           result = def_insn;
> +       }
> +     else if (source_code != REG)
> +       /* We don't handle ashift, UNSPEC, etc..  */
> +       result = def_insn;
> +     /* else, register copy expression does not impact adjustment.  */
> +
> +     return result;
> +      }
> +      else
> +     /* Same behavior if there are too many non-constant inputs or if
> +        all inputs are constant.  */
> +     return def_insn;
> +    }
> +  else
> +    /* This is not a SET.  It does not serve as a true originator. */
> +    return NULL;
> +}
> +
> +/* The size of the insn_entry array.  Note that this array does not
> +   represent instructions created during this optimization pass.  */
> +static unsigned int max_uid_at_start;
> +
> +/* Return true if and only if ELEMENT is on LIST.  This is linear in
> +   the length of the list.  The list length is no longer than
> +   BOUND_MAX_USE_DEFS.  */
> +static bool
> +in_use_list (struct df_link *list, struct df_link *element)
> +{
> +  while (list != NULL)
> +    {
> +      if (element->ref == list->ref)
> +     return true;
> +      list = list->next;
> +    }
> +  /* Got to end of list without finding element.  */
> +  return false;
> +}
> +
> +/* Return true iff the instruction represented by uid_1 is in the same
> +   equivalence class as the instruction represented by uid_2.
> +
> +   Returns false generally in constant time (based on unequal hash
> +   values).  Returning true, as currently implemented, requires
> +   quadratic time in the number of definitions that contribute to
> +   either the base or index expressions of the equivalence class.
> +   Number of iterations is no greater than (BOUND_MAX_USE_DEFS *
> +   BOUND_MAX_USE_DEFS).
> +
> +   To improve complexity to linear in the number of contributing
> +   definitions, allocate and remember the sorted list of all
> +   definitions for each relevant value.   */
> +static bool
> +equivalent_p (indexing_web_entry *insn_entry,
> +           unsigned int uid_1, unsigned int uid_2)
> +{
> +  if ((insn_entry[uid_1].base_equivalence_hash !=
> +       insn_entry[uid_2].base_equivalence_hash) ||
> +      (insn_entry[uid_1].index_equivalence_hash !=
> +       insn_entry[uid_2].index_equivalence_hash))
> +    return false;
> +
> +  /* Hash codes match.  Check details.  */
> +  rtx_insn *insn_1, *insn_2;
> +  insn_1 = insn_entry[uid_1].insn;
> +  insn_2 = insn_entry[uid_2].insn;
> +
> +  struct df_link *insn1_base_defs, *insn1_index_defs;
> +  struct df_link *insn2_base_defs, *insn2_index_defs;
> +
> +  find_defs (insn_entry, insn_1, &insn1_base_defs, &insn1_index_defs);
> +  find_defs (insn_entry, insn_2, &insn2_base_defs, &insn2_index_defs);
> +
> +  int base_count_1 = count_links (insn1_base_defs);
> +  int index_count_1 = count_links (insn1_index_defs);
> +  int base_count_2 = count_links (insn2_base_defs);
> +  int index_count_2 = count_links (insn2_index_defs);
> +
> +  if ((base_count_1 != base_count_2) || (index_count_1 != index_count_2))
> +    return false;
> +
> +  if (insn_entry [uid_1].original_base_reg
> +      != insn_entry [uid_2].original_base_reg)
> +    return false;
> +  else if (insn_entry [uid_1].original_index_reg
> +        != insn_entry [uid_2].original_index_reg)
> +    return false;
> +
> +  /* Counts are the same.  Make sure elements match.  */
> +  /* The following comparison code is quadratic in counts.  Improve to
> +     n*log(n) by sorting the four arrays and comparing eleents pairwise.
> +     Improve to linear by remembering the sorted contents of each of
> +     the four arrays from when the hash values were first computed.
> +     Since the typical sizes of the count variables are fairly small,
> +     leave as is unless performance measurements justify increased
> +     complexity.  */
> +  while (insn1_base_defs != NULL)
> +    {
> +      if (!in_use_list (insn2_base_defs, insn1_base_defs))
> +     return false;
> +      insn1_base_defs = insn1_base_defs->next;
> +    }
> +  /* base patterns match, but stil need to consider index matches.  */
> +  while (insn1_index_defs != NULL)
> +    {
> +      if (!in_use_list (insn2_index_defs, insn1_index_defs))
> +     return false;
> +      insn1_index_defs = insn1_index_defs->next;
> +    }
> +
> +  return true;
> +}
> +
> +/* Return true iff instruction E2 dominates instruction E1.  Note
> +   that insn_dominated_by_p defined in ira.c is declared static and
> +   requires initialization of auxilary data not computed in this
> +   context. */
> +static bool
> +insn_dominated_by_p (indexing_web_entry *e1, indexing_web_entry *e2)
> +{
> +  basic_block bb1 = e1->bb;
> +  basic_block bb2 = e2->bb;
> +
> +  if (bb1 == bb2)
> +    return e2->luid <= e1->luid;
> +  else
> +    return dominated_by_p (CDI_DOMINATORS, bb1, bb2);
> +}
> +
> +/* Confirm that every insn in the equivalence class that is
> +   represented by the EQUIVALENCE_HASH [INDEX] linked list can be
> +   represented using d-form memory addresses.  For any insn that
> +   cannot be so represented, decrement the value of the
> +   REPLACEMENT_COUNT argument.  Return the adjusted value of
> +   REPLACEMENT_COUNT.
> +
> +   Complexity is linear in the size of the equivalence class.  */
> +static unsigned int
> +confirm_dform_insns (indexing_web_entry *insn_entry,
> +                  unsigned int *equivalence_hash, unsigned int index,
> +                  unsigned int the_dominator, unsigned replacement_count)
> +{
> +  unsigned int uid;
> +  long long int dominator_delta = (insn_entry [the_dominator].base_delta
> +                                + insn_entry [the_dominator].index_delta);
> +  for (uid = equivalence_hash [index]; uid != UINT_MAX;
> +       uid = insn_entry [uid].equivalent_sibling)
> +    {
> +      if (uid != the_dominator)
> +     {
> +       long long int dominated_delta = (insn_entry [uid].base_delta
> +                                        + insn_entry [uid].index_delta);
> +       dominated_delta -= dominator_delta;
> +
> +       rtx_insn *insn = insn_entry [uid].insn;
> +       rtx body = PATTERN (insn);
> +       rtx mem;
> +
> +       gcc_assert (GET_CODE (body) == SET);
> +
> +       if (MEM_P (SET_SRC (body))) /* load */
> +         {
> +           mem = SET_SRC (body);
> +           if (!rs6000_target_supports_dform_offset_p
> +               (GET_MODE (mem), dominated_delta))
> +             replacement_count--;
> +         }
> +       else
> +         {
> +           mem = SET_DEST (body); /* store */
> +           if (!rs6000_target_supports_dform_offset_p
> +               (GET_MODE (mem), dominated_delta))
> +             replacement_count--;
> +         }
> +     }
> +    }
> +  return replacement_count;
> +}
> +
> +/* Replace all xform insns in equivalence class with dform insns.
> +
> +   Complexity is linear in the size of the equivalence class.  */
> +void replace_xform_insns (indexing_web_entry *insn_entry,
> +                       unsigned int *equivalence_hash,
> +                       unsigned int index,
> +                       unsigned int the_dominator)
> +{
> +  /* First, fix up the_dominator instruction.  */
> +  rtx derived_ptr_reg = gen_reg_rtx (Pmode);
> +  rtx_insn *insn = insn_entry [the_dominator].insn;
> +  rtx body = PATTERN (insn);
> +  rtx base_reg, index_reg;
> +  rtx addr, mem;
> +  rtx new_init_expr;
> +
> +  if (dump_file)
> +    {
> +      fprintf (dump_file,
> +            "Endeavoring to replace originating insn %d: ", the_dominator);
> +      print_inline_rtx (dump_file, insn, 2);
> +      fprintf (dump_file, "\n");
> +    }
> +
> +  gcc_assert (GET_CODE (body) == SET);
> +  if (MEM_P (SET_SRC (body)))
> +    {
> +      /* originating instruction is a load */
> +      mem = SET_SRC (body);
> +      addr = XEXP (SET_SRC (body), 0);
> +    }
> +  else
> +    { /* originating instruction is a store */
> +      gcc_assert (MEM_P (SET_DEST (body)));
> +      mem = SET_DEST (body);
> +      addr = XEXP (SET_DEST (body), 0);
> +    }
> +
> +  enum rtx_code code = GET_CODE (addr);
> +  gcc_assert ((code == PLUS) || (code == MINUS));
> +  base_reg = XEXP (addr, 0);
> +  index_reg = XEXP (addr, 1);
> +
> +  if (code == PLUS)
> +    new_init_expr = gen_rtx_PLUS (Pmode, base_reg, index_reg);
> +  else
> +    new_init_expr = gen_rtx_MINUS (Pmode, base_reg, index_reg);
> +  new_init_expr = gen_rtx_SET (derived_ptr_reg, new_init_expr);
> +
> +  rtx_insn *new_insn = emit_insn_before (new_init_expr, insn);
> +  set_block_for_insn (new_insn, BLOCK_FOR_INSN (insn));
> +  INSN_CODE (new_insn) = -1; /* force re-recogniition. */
> +  df_insn_rescan (new_insn);
> +
> +  if (dump_file)
> +    {
> +      fprintf (dump_file, "with insn %d: ", INSN_UID (new_insn));
> +      print_inline_rtx (dump_file, new_insn, 2);
> +      fprintf (dump_file, "\n");
> +    }
> +
> +  /* If dominator_delta != 0, we need to make adjustments for dominator_delta
> +     in the D-form constant offsets associated with the propagating
> +     instructions.  */
> +
> +  rtx new_mem = gen_rtx_MEM (GET_MODE (mem), derived_ptr_reg);
> +  MEM_COPY_ATTRIBUTES (new_mem, mem);
> +  rtx new_expr;
> +  if (insn_entry [the_dominator].is_load)
> +    new_expr = gen_rtx_SET (SET_DEST (body), new_mem);
> +  else
> +    new_expr = gen_rtx_SET (new_mem, SET_SRC (body));
> +
> +  if (!validate_change (insn, &PATTERN(insn), new_expr, false))
> +    {        /* proposed change was rejected */
> +      if (dump_file)
> +     {
> +       fprintf (dump_file,
> +                "Dform optimization rejected by validate_change\n");
> +       print_inline_rtx (dump_file, new_insn, 2);
> +       fprintf (dump_file, "\n");
> +     }
> +    }
> +  else if (dump_file)
> +    {
> +      fprintf (dump_file, "and with insn %d: ", INSN_UID (insn));
> +      print_inline_rtx (dump_file, insn, 2);
> +      fprintf (dump_file, "\n");
> +    }
> +
> +  for (unsigned int uid = equivalence_hash [index]; uid != UINT_MAX;
> +       uid = insn_entry [uid].equivalent_sibling)
> +    {
> +      if (uid != the_dominator)
> +     {
> +       long long int dominated_delta = (insn_entry [uid].base_delta
> +                                        + insn_entry [uid].index_delta);
> +       long long int dominator_delta
> +         = (insn_entry [the_dominator].base_delta
> +            + insn_entry [the_dominator].index_delta);
> +       dominated_delta -= dominator_delta;
> +
> +       rtx_insn *insn = insn_entry [uid].insn;
> +       rtx body = PATTERN (insn);
> +       rtx mem;
> +
> +       if (dump_file)
> +         {
> +           fprintf (dump_file,
> +                    "Endeavoring to replace propagating insn %d: ", uid);
> +           print_inline_rtx (dump_file, insn, 2);
> +           fprintf (dump_file, "\n");
> +         }
> +
> +       gcc_assert (GET_CODE (body) == SET);
> +       if (MEM_P (SET_SRC (body))) /* load */
> +         mem = SET_SRC (body);
> +       else
> +         mem = SET_DEST (body); /* store */
> +
> +       rtx ci = gen_rtx_raw_CONST_INT (Pmode, dominated_delta);
> +       rtx addr_expr = gen_rtx_PLUS (Pmode,
> +                                     derived_ptr_reg, ci);
> +       rtx new_mem = gen_rtx_MEM (GET_MODE (mem), addr_expr);
> +       MEM_COPY_ATTRIBUTES (new_mem, mem);
> +
> +       rtx new_expr;
> +       if (insn_entry [uid].is_load)
> +         new_expr = gen_rtx_SET (SET_DEST (body), new_mem);
> +       else
> +         new_expr = gen_rtx_SET (new_mem, SET_SRC (body));
> +
> +       if (!validate_change (insn, &PATTERN(insn), new_expr, false))
> +         {   /* proposed change was rejected */
> +           if (dump_file)
> +             {
> +               fprintf (dump_file,
> +                        "Dform optimization rejected by validate_change\n");
> +               print_inline_rtx (dump_file, new_expr, 2);
> +               fprintf (dump_file, "\n");
> +             }
> +         }
> +       else if (dump_file)
> +         {
> +           fprintf (dump_file, "with insn %d: ", INSN_UID (insn));
> +           print_inline_rtx (dump_file, insn, 2);
> +           fprintf (dump_file, "\n");
> +         }
> +     }
> +    }
> +}
> +
> +/* Organize all "relevant" instructions into equivalence classes.
> +   Relevant instructions are instructions that load or store memory
> +   where the memory address is represented by a sum or difference of a
> +   base address register and an integer offset register.
> +
> +   An equivalence class holds all of the load and store operations
> +   that refer to the same computed base and/or index variables plus or
> +   minus some constant value.
> +
> +   All expressions in each equivalence class are replaced with d-form
> +   instructions in the emitted code on P9 or above.
> +
> +   Calculation of hash functions is linear in the number of
> +   definitions that potentially contribute to the computation of a
> +   particular variable's value.
> +
> +   Assume hash table insertion and lookup is constant time (i.e. we
> +   normally do not experience collision on hash values).
> +
> +   Complexity of of this function is O(m*n) where m is number of
> +   equivalence classes and n is maximum number of entries in the
> +   equivalence class.  Since each insn can be a member of only one
> +   equivalence class, this is linear in the number of insns within
> +   the function.  */
> +static void
> +build_and_fixup_equivalence_classes (indexing_web_entry *insn_entry)
> +{
> +  unsigned int i;
> +  /* There can be no more equivalence classes than the total number of
> +     instructions in the analyzed function.  Usually, there are far
> +     fewer instructions because many of the instructions are not load
> +     or store instructions, and some of those that are load and store
> +     instructions may end up in the same equivalence class.  */
> +  unsigned int *equivalence_hash =
> +    (unsigned int *) alloca (max_uid_at_start * sizeof (unsigned int));
> +
> +  /* Initialize the equivalence_hash array.  */
> +  for (i = 0; i < max_uid_at_start; i++)
> +    equivalence_hash [i] = UINT_MAX;
> +
> +  /* Place each relevant instruction into an equivalence class, either
> +     a class consisting only of itself, or a class that includes other
> +     relevant instructions.
> +
> +     Complexity of this loop is O(m*n^2) where m is the number of relevant
> +     instructions and n is number of definitions of the registers that
> +     hold the base or index components of the memory operation's
> +     address.  The number of iterations of the slow path through the
> +     loop is less than m * (BOUND_MAX_USE_DEFS * BOUND_MAX_USE_DEFS).  */
> +  for (unsigned int uid = 0; uid < max_uid_at_start; uid++)
> +    {
> +      if (insn_entry [uid].is_relevant)
> +     {
> +       unsigned int hash = ((insn_entry [uid].base_equivalence_hash
> +                             + insn_entry [uid].index_equivalence_hash)
> +                            % max_uid_at_start);
> +
> +       if (equivalence_hash [hash] == UINT_MAX)
> +         {                   /* first mention of this class */
> +           equivalence_hash [hash] = uid;
> +           insn_entry [uid].equivalent_sibling = UINT_MAX;
> +         }
> +       else
> +         {
> +           while ((equivalence_hash [hash] != UINT_MAX)
> +                  && !equivalent_p (insn_entry, uid,
> +                                    equivalence_hash [hash]))
> +             hash = (hash + 1) % max_uid_at_start;
> +
> +           if (equivalence_hash [hash] != UINT_MAX)
> +             {
> +               /* Found an equivalent instruction. */
> +               insn_entry [uid].equivalent_sibling =
> +                 equivalence_hash [hash];
> +               equivalence_hash [hash] = uid;
> +             }
> +           else
> +             {
> +               /* Equivalence class doesn't yet exist.  */
> +               equivalence_hash [hash] = uid;
> +               insn_entry [uid].equivalent_sibling = UINT_MAX;
> +             }
> +         }
> +     }
> +    }
> +
> +  /* Scrutinize each equivalence class.  For any entries in the
> +     equivalence class that are on conditional control flows (such
> +     that they do not dominate the other entries), remove these
> +     entries from the equivalence class.  */
> +  for (unsigned int i = 0; i < max_uid_at_start; i++)
> +    {
> +      while (equivalence_hash [i] != UINT_MAX)
> +     {
> +       unsigned int the_dominator = equivalence_hash [i];
> +       unsigned int uid;
> +
> +       if (dump_file && (dump_flags & TDF_DETAILS))
> +         fprintf (dump_file, "Equivalence class consists of\n");
> +
> +       /* Find the dominator for this equivalence class.
> +
> +          Complexity of following loop body is O(m*n) where m is number
> +          of equivalence classes and n is maximum number of entries
> +          in the equivalence class.  */
> +       for (uid = the_dominator; uid != UINT_MAX;
> +            uid = insn_entry [uid].equivalent_sibling)
> +         {
> +           if (insn_dominated_by_p (&insn_entry [the_dominator],
> +                                    &insn_entry [uid]))
> +             the_dominator = uid;
> +           if (dump_file && (dump_flags & TDF_DETAILS))
> +             fprintf (dump_file, "  member: %d\n", uid);
> +         }
> +
> +       unsigned int size_of_equivalence = 0;
> +       unsigned int removed_partition = UINT_MAX;
> +       unsigned int preceding_uid = UINT_MAX;
> +       unsigned int next_uid;
> +
> +       /* Having found a dominator, remove from this equivalence
> +          class any element that is not dominated by the_dominator.
> +
> +          Though this loop appears to be O(m*n), for m being number
> +          of equivalence classes and n being maximum number of
> +          entries in an equivalence class, it can also be described
> +          as O(i), where i is the number of instructions in the
> +          functions implementation.  Each iteration of the loop
> +          deals with a distinct instruction.  */
> +       for (uid = equivalence_hash [i]; uid != UINT_MAX; uid = next_uid)
> +         {
> +           next_uid = insn_entry [uid].equivalent_sibling;
> +           if (!insn_dominated_by_p (&insn_entry [uid],
> +                                     &insn_entry [the_dominator]))
> +             {
> +               /* insn uid thinks its in this equivalence class, but
> +                  it's not dominated by the_dominator, so remove it.  */
> +               insn_entry [uid].equivalent_sibling = removed_partition;
> +               removed_partition = uid;
> +               if (preceding_uid == UINT_MAX)
> +                 equivalence_hash [i] = next_uid;
> +               else
> +                 insn_entry [preceding_uid].equivalent_sibling = next_uid;
> +             }
> +           else
> +             {
> +               size_of_equivalence++;
> +               preceding_uid = uid;
> +             }
> +         }
> +
> +       /* By same argument as above, confirm_dform_insns and
> +          replace_xform_insns are also O(i), where i is the number
> +          of instructions in the function.  */
> +       if (confirm_dform_insns (insn_entry, equivalence_hash, i,
> +                                the_dominator, size_of_equivalence) > 1)
> +         replace_xform_insns (insn_entry, equivalence_hash,
> +                              i, the_dominator);
> +       else if (dump_file)
> +         fprintf (dump_file,
> +                  "Abandoning dform optimization: too few dform insns\n");
> +
> +       /* if (removed_partition != UINT_MAX), need to reprocess the
> +          contents of the removed_partition.  There may be
> +          additional opportunity to optimize within the set of
> +          insns that were not dominated by the selected dominator.
> +
> +          Each time through this loop, at least one dominator and
> +          any instructions it dominates are "processed".  Anything
> +          not dominated by the selected dominator remains in the
> +          "removed partition".  The "removed partition" gets
> +          smaller on each iteration, assuring eventual termination.  */
> +       equivalence_hash [i] = removed_partition;
> +     }
> +    }
> +}
> +
> +/* Assess whether the instruction represented by uid is relevant,
> +   setting *is_relevant to true if so, and setting *is_originating to
> +   true if this use is an originating definition.
> +
> +   If the insn is determined to be relevant and is_base is true, overwrite
> +   the base_delta, original_base_reg, original_base_use, and
> +   base_equivalence_hash fields of insn_entry[uid].
> +
> +   Otherwise, if the insn is determined to be relevant and is_base is
> +   not true, overwrite the index_delta, original_index_reg,
> +   original_index_use, and index_equivalence_hash fields of
> +   insn_entry[uid].
> +
> +   In case the insn is not determined to be relevant, certain fields
> +   fields of insn_entry[uid] may be overwritten with scratch values
> +   that have no significance as these fields are not consulted
> +   subequently in the case that the insn is not relevant.
> +
> +   Complexity: linear in the length of the number of entries on the
> +   use-definition chain, as represented by argument USE.
> +*/
> +static void
> +assess_use_relevance (bool is_base, rtx reg, bool *is_relevant,
> +                   bool *is_originating, df_ref use,
> +                   unsigned int uid, indexing_web_entry *insn_entry)
> +{
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file, "Found use corresponding to %s\n",
> +            is_base? "base": "index");
> +      df_ref_debug (use, dump_file);
> +    }
> +  struct df_link *def_link = DF_REF_CHAIN (use);
> +
> +  /* If there is no definition, or the definition is artificial, or there
> +     are multiple definitions, this is originating use.  */
> +  if (!def_link || !def_link->ref
> +      || DF_REF_IS_ARTIFICIAL (def_link->ref) || def_link->next)
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +     fprintf (dump_file, "Use is originating!\n");
> +      *is_relevant = true;
> +      *is_originating = true;
> +      unsigned int hash = equivalence_hash (def_link);
> +
> +      if (is_base)
> +     {
> +       insn_entry[uid].base_delta = 0;
> +       insn_entry[uid].original_base_reg = REGNO (reg);
> +       insn_entry[uid].original_base_use = uid;
> +       insn_entry[uid].base_equivalence_hash = hash;
> +     }
> +      else
> +     {
> +       insn_entry[uid].index_delta = 0;
> +       insn_entry[uid].original_index_reg = REGNO (reg);
> +       insn_entry[uid].original_index_use = uid;
> +       insn_entry[uid].index_equivalence_hash = hash;
> +     }
> +    }
> +  else
> +    {
> +      /* Only one definition.  Dig deeper.  */
> +      long long int delta = 0;
> +      rtx insn2 = find_true_originator (def_link, &delta);
> +      if (insn2)
> +     {
> +       unsigned uid2 = INSN_UID (insn2);
> +       df_ref use2;
> +
> +       if (dump_file  && (dump_flags & TDF_DETAILS))
> +         fprintf (dump_file, "Use may propagate from %d\n", uid2);
> +
> +       struct df_insn_info *insn_info2 = DF_INSN_INFO_GET (insn2);
> +
> +       if (insn_info2)
> +         use2 = DF_INSN_INFO_USES (insn_info2);
> +       else
> +         use2 = NULL;
> +
> +       if (!use2 || !DF_REF_NEXT_LOC (use2))
> +         {
> +           *is_originating = false;
> +
> +           rtx body = PATTERN (insn2);
> +           gcc_assert (GET_CODE (body) == SET);
> +           gcc_assert (REG_P (SET_DEST (body)));
> +
> +           if (is_base)
> +             {
> +               insn_entry[uid].original_base_reg = REGNO (SET_DEST (body));
> +               insn_entry[uid].original_base_use  = uid2;
> +               insn_entry[uid].base_delta = delta;
> +             }
> +           else              /* !is_base means is_index.  */
> +             {
> +               insn_entry[uid].original_index_reg = REGNO (SET_DEST(body));
> +               insn_entry[uid].original_index_use = uid2;
> +               insn_entry[uid].index_delta = delta;
> +             }
> +
> +           if (use2)
> +             {
> +               struct df_link *def_link = DF_REF_CHAIN (use2);
> +               unsigned int hash = equivalence_hash (def_link);
> +               *is_relevant = true;
> +               if (is_base)
> +                 insn_entry[uid].base_equivalence_hash = hash;
> +               else
> +                 insn_entry[uid].index_equivalence_hash = hash;
> +             }
> +           /* else, use is not relevant.  */
> +
> +           if (dump_file && (dump_flags & TDF_DETAILS))
> +             fprintf (dump_file,
> +                      " propagates from originating insn %d"
> +                      " with delta: %lld\n", uid2, delta);
> +         }
> +       else if (dump_file && (dump_flags & TDF_DETAILS))
> +         fprintf (dump_file, " Dependencies are too"
> +                  " complicated for this optimization\n");
> +     }
> +    }
> +}
> +
> +/* Given that INSN represents a memory store or load operation, that
> +   MEM is the expression that computes the address to or from which
> +   memory is transferred and INSN_ENTRY holds the array representing
> +   all of the indexing_web_entry structures associated with the
> +   instructions of this function, set the is_relevant field of
> +   INSN_ENTRY [UID] if the form of the memory address expression is
> +   compatible with dform optimization.
> +
> +   If the insn is considered relevant, this function also initializes
> +   the following fields of the corresponding insn_entry:
> +
> +     is_originating
> +
> +   If is_relevant and is_originating, we set:
> +
> +     original_base_reg
> +     original_base_use
> +     base_delta
> +     base_equivalence_hash
> +
> +     original_index_reg = REGNO (index_reg);
> +     original_index_use = uid;
> +     index_delta = 0;
> +     index_equivalence_hash
> +
> +   If is_relevant and !is_originating, we set:
> +
> +     original_index_reg
> +     original_index_use
> +     index_delta = (non-zero)
> +     index_equivalence_hash (only if use2 != NULL)
> +
> +   Complexity is linear in the number of variables "used" by this
> +   instruction, multiplied by the number of definitions of each
> +   variable.  Loop iterations are no greater than
> +   BOUND_RELEVANT_USES * BOUND_MAX_USE_DEFS.  */
> +static void
> +assess_relevance (rtx mem, rtx_insn *insn, indexing_web_entry *insn_entry)
> +{
> +  unsigned int uid = INSN_UID (insn);
> +  rtx base_reg = XEXP (mem,0);
> +  rtx index_reg = XEXP (mem, 1);
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file, " memory is base +/- index, ");
> +      fprintf (dump_file, "base: ");
> +      print_inline_rtx (dump_file, base_reg, 2);
> +      fprintf (dump_file, "\n index: ");
> +      print_inline_rtx (dump_file, index_reg, 2);
> +      fprintf (dump_file, "\n");
> +    }
> +
> +  if (REG_P (base_reg) && REG_P (index_reg))
> +    {
> +      struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
> +      /* Since insn is known to represent a sum or difference, this
> +      insn is likely to use at least two input variables.  */
> +
> +      int num_base_defs = 0;
> +      int num_index_defs = 0;
> +      bool base_is_relevant = false;
> +      bool index_is_relevant = false;
> +      bool base_is_originating = false;
> +      bool index_is_originating = false;
> +      df_ref use;
> +      int use_count = 0;
> +
> +      /* Iterate over the number of definitions used by this
> +      instruction to find the definitions that correspond to the
> +      base register and the index register.  */
> +      FOR_EACH_INSN_INFO_USE (use, insn_info)
> +     {
> +       if (use_count++ >= BOUND_RELEVANT_USES)
> +         {
> +           /* Relevant insns generally have no more than 3 uses.  */
> +           num_base_defs = 0;
> +           num_index_defs = 0;
> +           break;
> +         }
> +       else if (rtx_equal_p (DF_REF_REG (use), base_reg))
> +         {
> +           assess_use_relevance (true, base_reg, &base_is_relevant,
> +                                &base_is_originating,
> +                                use, uid, insn_entry);
> +           num_base_defs++;
> +         }
> +       else if (rtx_equal_p (DF_REF_REG (use), index_reg))
> +         {
> +           assess_use_relevance (false, index_reg, &index_is_relevant,
> +                                &index_is_originating,
> +                                use, uid, insn_entry);
> +           num_index_defs++;
> +         }
> +     }
> +
> +      /* This insn is only relevant if there is exactly one definition of
> +      base and one definition of index and they are both considered to
> +      be relevant.  */
> +      if ((num_base_defs == 1) && (num_index_defs == 1)
> +       && base_is_relevant && index_is_relevant
> +       && (num_relevant_insns++ < MAX_RELEVANT_INSNS)
> +       && (max_use_defs <= BOUND_MAX_USE_DEFS))
> +     {
> +       insn_entry[uid].is_relevant = true;
> +       insn_entry[uid].is_originating =
> +         (base_is_originating && index_is_originating);
> +     }
> +      else if (dump_file)
> +     {
> +       int rejection_count = 0;
> +       fprintf (dump_file,
> +                "Rejecting dform optimization of insn %d\n", uid);
> +       if (num_base_defs != 1)
> +         {
> +           fprintf (dump_file, "Too %s (%d) base definitions\n",
> +                    (num_base_defs > 1)? "many": "few", num_base_defs);
> +           rejection_count++;
> +         }
> +       if (num_index_defs != 1)
> +         {
> +           fprintf (dump_file, "Too %s (%d) index definitions\n",
> +                    (num_index_defs > 1)? "many": "few", num_index_defs);
> +           rejection_count++;
> +         }
> +       if (!base_is_relevant)
> +         {
> +           fprintf (dump_file,
> +                    "The available base definition is not relevant\n");
> +           rejection_count++;
> +         }
> +       if (!index_is_relevant)
> +         {
> +           fprintf (dump_file,
> +                    "The available index definition is not relevant\n");
> +           rejection_count++;
> +         }
> +       if ((rejection_count == 0)
> +           && (num_relevant_insns >= MAX_RELEVANT_INSNS))
> +         {
> +           fprintf (dump_file, "Too many relevant insns\n");
> +           rejected_otherwise_relevant++;
> +         }
> +       else if (max_use_defs >= BOUND_MAX_USE_DEFS)
> +         {
> +           fprintf (dump_file, "Too many relevant use definitions\n");
> +           rejected_otherwise_relevant++;
> +         }
> +     }
> +      else
> +     {
> +       if ((num_base_defs == 1) && (num_index_defs == 1)
> +           && base_is_relevant && index_is_relevant)
> +         rejected_otherwise_relevant++;
> +     }
> +    }
> +  else if (dump_file && (dump_flags & TDF_DETAILS))
> +    fprintf (dump_file, " punting because base or index not registers\n");
> +}
> +
> +
> +/* Main entry point for this pass.
> +
> +   Complexity is linear in the number of instructions in the function
> +   plus the complexity of build_and_fixup_equivalence_classes, which
> +   is also linear in the number of instruction (since each instruction
> +   is in no more than one equivalence class).  */
> +unsigned int
> +rs6000_insert_dform (function *fun)
> +{
> +  basic_block bb;
> +  rtx_insn *insn, *curr_insn = 0;
> +  indexing_web_entry *insn_entry;
> +  unsigned int luid = 0;
> +
> +  num_relevant_insns = 0;
> +  max_use_defs = 0;
> +  rejected_otherwise_relevant = 0;
> +
> +  calculate_dominance_info (CDI_DOMINATORS);
> +
> +  /* Dataflow analysis for use-def chains.  */
> +  df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
> +  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
> +  df_analyze ();
> +
> +  /* Since this pass creates new instructions, get_max_uid () may
> +     return different values at different times during this pass.  The
> +     insn_entry array represents only the instructions that were
> +     present in this function's representation at the start of this
> +     pass.  */
> +  max_uid_at_start = get_max_uid ();
> +  insn_entry = XCNEWVEC (indexing_web_entry, max_uid_at_start);
> +
> +  if (dump_file)
> +    fprintf (dump_file, "Creating insn_entry array with %d entries\n",
> +          max_uid_at_start);
> +
> +  /* The general approach is to:
> +
> +       1. Look for multiple array indexing expressions that refer to
> +          the same array base address such as are represented by the
> +       rtl excerpts below..
> +
> +       2. Group these into subsets for which the indexing expression
> +          derives from the same initial_value + some accumulation of
> +          constant values added thereto.
> +
> +      (cinsn 2 (set (reg/v/f:DI <27> [ x ])
> +                    (reg:DI 3 [ x ])) "ddot-c.c":12
> +       (expr_list:REG_DEAD (reg:DI 3 [ x ])))
> +
> +      ...
> +
> +      (cinsn 31 (set (reg:V2DF <35> [ vect__3.7 ])
> +                     (mem:V2DF (plus:DI (reg/v/f:DI <27> [ x ])
> +                                        (reg:DI <9> [ ivtmp.18 ]))
> +                      [1 MEM[base: x_20(D), index: ivtmp.18_35,
> +                       offset: 0B]+0 S16 A64])) "ddot-c.c":18)
> +
> +      ...
> +
> +      (cinsn 304 (set (reg:DI <70>)
> +                      (plus:DI (reg:DI <9> [ ivtmp.18 ])
> +                               (const_int 16)))
> +       (expr_list:REG_DEAD (reg:DI <9> [ ivtmp.18 ])))
> +      (cinsn 34 (set (reg:DI <9> [ ivtmp.18 ])
> +                     (reg:DI <70>)))
> +   */
> +
> +  /* Walk the insns to gather basic data: complexity of inner-nested
> +     loop body is linear in total number of insns within function.  */
> +  FOR_ALL_BB_FN (bb, fun)
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +     fprintf (dump_file, "Scrutinizing bb %d\n", bb->index);
> +
> +      FOR_BB_INSNS_SAFE (bb, insn, curr_insn)
> +     {
> +       unsigned int uid = INSN_UID (insn);
> +
> +       insn_entry[uid].insn = insn;
> +       insn_entry[uid].bb = BLOCK_FOR_INSN (insn);
> +       insn_entry[uid].luid = luid++;
> +       insn_entry[uid].is_relevant = false;
> +
> +       if (dump_file && (dump_flags & TDF_DETAILS))
> +         {
> +           fprintf (dump_file, "\nLooking at insn: %d\n", uid);
> +           df_dump_insn_top (insn, dump_file);
> +           dump_insn_slim (dump_file, insn);
> +           df_dump_insn_bottom (insn, dump_file);
> +         }
> +
> +       /* First, look for all memory[base + index] expressions.
> +        * Then group these by base.
> +        * Then for all instructions in each group, scrutinize the index
> +        * definition. Partition this group according to the origin
> +        * variable upon which the the definitions of i are based.
> +        *
> +        * How do we define "origin variable"?
> +        *
> +        *  If i has multiple definitions, it is its own origin
> +        *  variable.  Likewise, if i has a single definition and the
> +        *  definition is NOT the sum or difference of a constant value
> +        *  and some other variable, then i is its own origin variable.
> +        *
> +        *  Otherwise, i has the same origin variable as the expression
> +        *  that represents its definition.
> +        *
> +        * After we've created these partitions, for each partition
> +        * whose size is greater than 1:
> +        *
> +        *  1. introduce derived_ptr = base + origin_variable
> +        *     immediately following the instruction that defines
> +        *     origin_variable.
> +        *
> +        *  2. for each member of the partition, replace the expression
> +        *     memory [base + index] with derived_ptr [constant], where
> +        *     constant is the sum of all constant values added to the
> +        *     origin variable to represent this particular value of i.  */
> +       if (NONDEBUG_INSN_P (insn))
> +         {
> +           rtx body = PATTERN (insn);
> +           rtx mem;
> +           if ((GET_CODE (body) == SET) && MEM_P (SET_SRC (body)))
> +             {
> +               mem = XEXP (SET_SRC (body), 0);
> +               insn_entry[uid].is_load = true;
> +               insn_entry[uid].is_store = false;
> +               if (dump_file && (dump_flags & TDF_DETAILS))
> +                 {
> +                   fprintf (dump_file,
> +                            " this insn is fetching data from memory: ");
> +                   print_inline_rtx (dump_file, mem, 2);
> +                   fprintf (dump_file, "\n");
> +                 }
> +             }
> +           else if ((GET_CODE (body) == SET) && MEM_P (SET_DEST (body)))
> +             {
> +               mem = XEXP (SET_DEST (body), 0);
> +               insn_entry[uid].is_load = false;
> +               insn_entry[uid].is_store = true;
> +               if (dump_file && (dump_flags & TDF_DETAILS))
> +                 {
> +                   fprintf (dump_file,
> +                            " this insn is storing data to memory: ");
> +                   print_inline_rtx (dump_file, mem, 2);
> +                   fprintf (dump_file, "\n");
> +                 }
> +             }
> +           else
> +             {
> +               if (dump_file && (dump_flags & TDF_DETAILS))
> +                 fprintf (dump_file,
> +                          " this insn is neither load nor store\n");
> +               continue;             /* Not a load or store */
> +             }
> +
> +           enum rtx_code code = GET_CODE (mem);
> +           if ((code == PLUS) || (code == MINUS))
> +             assess_relevance (mem, insn, insn_entry);
> +           else if (dump_file && (dump_flags & TDF_DETAILS))
> +             fprintf (dump_file,
> +                      " address not sum or difference of values\n");
> +         }
> +       /* else, this is a DEBUG_INSN_P (insn) so ignore it.  */
> +     }
> +
> +     if (dump_file && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file, "\n");
> +    }
> +
> +  build_and_fixup_equivalence_classes (insn_entry);
> +  free_dominance_info (CDI_DOMINATORS);
> +
> +  if (dump_file)
> +    {
> +      fprintf (dump_file, "\n");
> +      fprintf (dump_file,
> +            "Total number of potentially relevant instructions: %d\n",
> +            num_relevant_insns);
> +      fprintf (dump_file,
> +            "Maximum number of definitions used by potentially "
> +            "relevant instructions: %d\n", max_use_defs);
> +      fprintf (dump_file,
> +            "Total number of potentially relevant but rejected "
> +            "instructions: %d\n\n", rejected_otherwise_relevant);
> +    }
> +
> +  return 0;
> +}  // anon namespace
> +
> +
> +const pass_data pass_data_insert_dform =
> +{
> +  RTL_PASS, /* type */
> +  "dform", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags, or could use OPTGROUP_LOOP */
> +  TV_NONE, /* tv_id, or could use TV_LOOP_UNROLL */
> +  0, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  TODO_df_finish, /* todo_flags_finish */
> +};
> +
> +class pass_insert_dform: public rtl_opt_pass
> +{
> +public:
> +  pass_insert_dform(gcc::context *ctxt)
> +    : rtl_opt_pass(pass_data_insert_dform, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *)
> +    {
> +      // This is most relevant to P9 and subsequent targets since P9
> +      // introduces new D-form instructions, but this may pay off on
> +      // other architectures as well.  Additional experimentation with
> +      // other targets may be worthwhile.
> +      return (optimize > 0 && !BYTES_BIG_ENDIAN && TARGET_VSX
> +           && TARGET_P9_VECTOR);
> +    }
> +
> +  virtual unsigned int execute (function *fun)
> +    {
> +      return rs6000_insert_dform (fun);
> +    }
> +
> +  opt_pass *clone ()
> +    {
> +      return new pass_insert_dform (m_ctxt);
> +    }
> +
> +}; // class pass_insert_dform
> +
> +rtl_opt_pass *make_pass_insert_dform (gcc::context *ctxt)
> +{
> +  return new pass_insert_dform (ctxt);
> +}
> Index: gcc/config/rs6000/rs6000-passes.def
> ===================================================================
> --- gcc/config/rs6000/rs6000-passes.def       (revision 275051)
> +++ gcc/config/rs6000/rs6000-passes.def       (working copy)
> @@ -22,6 +22,8 @@ along with GCC; see the file COPYING3.  If not see
>     INSERT_PASS_AFTER (PASS, INSTANCE, TGT_PASS)
>     INSERT_PASS_BEFORE (PASS, INSTANCE, TGT_PASS)
>     REPLACE_PASS (PASS, INSTANCE, TGT_PASS)
> +   Be advised: gawk program does not parse C comments if inserted below.
>   */
> 
>    INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps);
> +  INSERT_PASS_AFTER (pass_loop2, 1, pass_insert_dform);
> Index: gcc/config/rs6000/rs6000-protos.h
> ===================================================================
> --- gcc/config/rs6000/rs6000-protos.h (revision 275051)
> +++ gcc/config/rs6000/rs6000-protos.h (working copy)
> @@ -47,6 +47,8 @@ extern bool legitimate_indirect_address_p (rtx, in
>  extern bool legitimate_indexed_address_p (rtx, int);
>  extern bool avoiding_indexed_address_p (machine_mode);
>  extern rtx rs6000_force_indexed_or_indirect_mem (rtx x);
> +extern bool rs6000_target_supports_dform_offset_p (machine_mode,
> +                                                HOST_WIDE_INT);
> 
>  extern rtx rs6000_got_register (rtx);
>  extern rtx find_addr_reg (rtx);
> @@ -246,6 +248,8 @@ namespace gcc { class context; }
>  class rtl_opt_pass;
> 
>  extern rtl_opt_pass *make_pass_analyze_swaps (gcc::context *);
> +extern rtl_opt_pass *make_pass_insert_dform (gcc::context *);
> +
>  extern bool rs6000_sum_of_two_registers_p (const_rtx expr);
>  extern bool rs6000_quadword_masked_address_p (const_rtx exp);
>  extern rtx rs6000_gen_lvx (enum machine_mode, rtx, rtx);
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c        (revision 275051)
> +++ gcc/config/rs6000/rs6000.c        (working copy)
> @@ -8725,6 +8725,152 @@ rs6000_debug_legitimate_address_p (machine_mode mo
>    return ret;
>  }
> 
> +/* This function provides an approximation of which d-form addressing
> +   expressions are valid on any given target configuration.  This
> +   approximation guides optimization choices.  Secondary validation
> +   of the addressing mode is performed before code generation.
> +
> +   Return true iff target has instructions to perform a memory
> +   operation at the specified BYTE_OFFSET from an address held
> +   in a general purpose register.  */
> +bool
> +rs6000_target_supports_dform_offset_p (machine_mode mode,
> +                                    HOST_WIDE_INT byte_offset)
> +{
> +  const HOST_WIDE_INT max_16bit_signed = (0x7fff);
> +  const HOST_WIDE_INT min_16bit_signed = -1 - max_16bit_signed;
> +
> +  /* Available d-form instructions with P1 (the original Power architecture):
> +
> +     lbz RT,D(RA) - load byte and zero d-form
> +     lhz RT,D(RA) - load half word and zero d-form
> +     lha RT,D(RA) - load half word algebraic d-form
> +     lwz RT,D(RA) - load word and zero d-form
> +     lfs FRT,D(RA) - load floating-point single d-form
> +     lfd FRT,D(RA) - load floating-point double d-form
> +
> +     stb RS,D(RA) - store byte d-form
> +     sth RS,D(RA) - store half word d-form
> +     stfs FRS,D(RA) - store floating point single d-form
> +     stfd FRS,D(RA) - store floating point double d-form  */
> +
> +  /* Available d-form instructions with PPC (prior to v2.00):
> +     (option mpowerpc "existed in the past" but is now "always on"
> +
> +     lwa RT,DS(RA) - load word algebraic ds-form (2 bottom bits zero)
> +     ld RT,DS(RA) - load double word ds-form (2 bottom bits zero)
> +
> +     std RS,DS(RA) - store double word ds-form (2 bottom bits zero)
> +
> +     Consider lwa redundant with insn available in prior processors.  */
> +  switch (mode)
> +    {
> +    case E_QImode:
> +    case E_HImode:
> +    case E_SImode:
> +    case E_SFmode:
> +    case E_DFmode:
> +      if (IN_RANGE (byte_offset, min_16bit_signed, max_16bit_signed))
> +     return true;
> +      break;
> +
> +    case E_DImode:
> +      if (IN_RANGE (byte_offset, min_16bit_signed, max_16bit_signed)
> +       && ((byte_offset & 0x03) == 0))
> +     return true;
> +      break;
> +
> +    default:
> +      ;         /* Fall through to see if other instructions will work.  */
> +
> +    }
> +
> +  /* Available d-form instructions with v2.03:
> +
> +     lq RTp,DQ(RA) - load quadword dq-form (4 bottom bits zero)
> +
> +     stq RSp,DS(RA) - store quadword ds-form (2 bottom bits zero)
> +
> +     These instructions are not recommended for general use as they
> +     are expected to be very inefficient.  Their design was apparently
> +     motivated by a need to support atomic quad-word access, which is
> +     difficult to implement even in hardware on some architectures.
> +     Furthermore, the design of these instructions apparently does the
> +     "wrong" thing with regards to swapping of double words on load
> +     and store for little-endian targets.
> +
> +     Therefore, this routine assumes v2.03 does NOT support quadword
> +     d-form addressing.  */
> +
> +  /* Available d-form instructions with v2.05
> +
> +     (There are some floating-point load and store double-pair
> +      instructions.  Consider them "not available".  There are
> +      described as phasing out, which means they are expected
> +      to have poor performance.)  */
> +
> +  /* Available d-form instructions with 3.0
> +
> +     lxsd VRT,DS(RA) - Load VSX scalar double word ds-form (2 bottom bits 
> zero)
> +                       (redundant with lfd from P1)
> +     lxssp VRT,DS(RA) - Load VSX scalar single precision ds-form
> +                        (bottom 2 bits zero)
> +                        (redundant with lfs from P1)
> +     lxv XT,DQ(RA) - Load VSX Vector dq-form (4 bottom bits zero)
> +                     (Works on little endian for any element type, but
> +                   does not preserve lanes.)
> +
> +     stxsd VRS,DS(RA) - Store VSX scalar double-word DS form
> +                        (bottom 2 bits zero)
> +                        (redundant with stfd from P1)
> +     stxssp VRS,DS(RA) - Store VSX scalar single precision DS-form
> +                         (bottom 2 bits zero)
> +                         (redundant with stfs from P1)
> +     stxv XS,DQ(RA) - Store VSX vector dq-form (4 bottom bits zero)
> +                      (Works on little endian for any element type,
> +                    but does not preserve lanes.)
> +
> +     lxv and stxv load/store to/from any VSX register, including
> +     registers that overlay with floating point and altivec register
> +     sets.  */
> +
> +  if (rs6000_isa_flags & OPTION_MASK_MODULO) /* ISA 3.0 */
> +    {
> +      switch (mode)
> +     {
> +     case E_V16QImode:
> +     case E_V8HImode:
> +     case E_V4SFmode:
> +     case E_V4SImode:
> +     case E_V2DFmode:
> +     case E_V2DImode:
> +     case E_V1TImode:
> +     case E_TImode:
> +
> +     case E_KFmode:          /* ieee 754 128-bit floating point */
> +     case E_IFmode:          /* IBM extended 128-bit double */
> +     case E_TFmode:          /* 128-bit double (form depends on
> +                                gcc command line, which may be
> +                                either -mabi=ieeelongdouble (KF)
> +                                or -mabi=ibmlongdouble (IF). */
> +       /* All 128-bit loads and stores are handled by lxv and stxv.  */
> +       if (IN_RANGE (byte_offset, min_16bit_signed, max_16bit_signed)
> +           && ((byte_offset & 0x0f) == 0))
> +         return true;
> +       break;
> +
> +     default:
> +       ; /* fall through to see if other instructions will work.  */
> +     }
> +    }
> +
> +  /* Todo: add support for any new instructions provided by future
> +     archictures when support for those future architectures is
> +     enabled.  */
> +
> +  return false;
> +}
> +
>  /* Implement TARGET_MODE_DEPENDENT_ADDRESS_P.  */
> 
>  static bool
> Index: gcc/config/rs6000/t-rs6000
> ===================================================================
> --- gcc/config/rs6000/t-rs6000        (revision 275051)
> +++ gcc/config/rs6000/t-rs6000        (working copy)
> @@ -47,6 +47,10 @@ rs6000-call.o: $(srcdir)/config/rs6000/rs6000-call
>       $(COMPILE) $<
>       $(POSTCOMPILE)
> 
> +rs6000-p9dform.o: $(srcdir)/config/rs6000/rs6000-p9dform.c
> +     $(COMPILE) $<
> +     $(POSTCOMPILE)
> +
>  $(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh 
> \
>    $(srcdir)/config/rs6000/rs6000-cpus.def
>       $(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \
> Index: gcc/config.gcc
> ===================================================================
> --- gcc/config.gcc    (revision 275051)
> +++ gcc/config.gcc    (working copy)
> @@ -499,7 +499,7 @@ or1k*-*-*)
>       ;;
>  powerpc*-*-*)
>       cpu_type=rs6000
> -     extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o 
> rs6000-call.o"
> +     extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-p9dform.o 
> rs6000-logue.o rs6000-call.o"
>       extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
>       extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h"
>       extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
> Index: gcc/testsuite/gcc.target/powerpc/p9-dform-0.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/p9-dform-0.c     (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/p9-dform-0.c     (working copy)
> @@ -0,0 +1,44 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-skip-if "" { powerpc*-*-aix* } } */
> +/* { dg-options "-O3 -mdejagnu-cpu=power9 -funroll-loops" } */
> +
> +/* This test confirms that the dform instructions are selected in the
> +   translation of this main program.  */
> +
> +extern void first_dummy ();
> +extern void dummy (double sacc, int n);
> +extern void other_dummy ();
> +
> +extern float opt_value;
> +extern char *opt_desc;
> +
> +#define M 128
> +#define N 512
> +
> +double x [N];
> +double y [N];
> +
> +int main (int argc, char *argv []) {
> +  double sacc;
> +
> +  first_dummy ();
> +  for (int j = 0; j < M; j++) {
> +
> +    sacc = 0.00;
> +    for (unsigned long long int i = 0; i < N; i++) {
> +      sacc += x[i] * y[i];
> +    }
> +    dummy (sacc, N);
> +  }
> +  opt_value = ((float) N) * 2 * ((float) M);
> +  opt_desc = "flops";
> +  other_dummy ();
> +}
> +
> +/* At time the dform optimization pass was merged with trunk, 12
> +   lxv instructions were emitted in place of the same number of lxvx
> +   instructions.  No need to require exactly this number, as it may
> +   change when other optimization passes evolve.  */
> +
> +/* { dg-final { scan-assembler {\mlxv\M} } } */
> Index: gcc/testsuite/gcc.target/powerpc/p9-dform-1.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/p9-dform-1.c     (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/p9-dform-1.c     (working copy)
> @@ -0,0 +1,56 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-skip-if "" { powerpc*-*-aix* } } */
> +/* { dg-options "-O3 -mdejagnu-cpu=power9 -funroll-loops" } */
> +
> +/* This test confirms that the dform instructions are selected in the
> +   translation of this main program.  */
> +
> +extern void first_dummy ();
> +extern void dummy (double sacc, int n);
> +extern void other_dummy ();
> +
> +extern float opt_value;
> +extern char *opt_desc;
> +
> +#define M 128
> +#define N 512
> +
> +double x [N];
> +double y [N];
> +double z [N];
> +
> +int main (int argc, char *argv []) {
> +  double sacc;
> +
> +  first_dummy ();
> +  for (int j = 0; j < M; j++) {
> +
> +    sacc = 0.00;
> +    for (unsigned long long int i = 0; i < N; i++) {
> +      z[i] = x[i] * y[i];
> +      sacc += z[i];
> +    }
> +    dummy (sacc, N);
> +  }
> +  opt_value = ((float) N) * 2 * ((float) M);
> +  opt_desc = "flops";
> +  other_dummy ();
> +}
> +
> +
> +
> +/* At time the dform optimization pass was merged with trunk, 12
> +   lxv instructions were emitted in place of the same number of lxvx
> +   instructions.  No need to require exactly this number, as it may
> +   change when other optimization passes evolve.  */
> +
> +/* { dg-final { scan-assembler {\mlxv\M} } } */
> +
> +/* At time the dform optimization pass was merged with trunk, 6
> +   stxv instructions were emitted in place of the same number of stxvx
> +   instructions.  No need to require exactly this number, as it may
> +   change when other optimization passes evolve.  */
> +
> +/* { dg-final { scan-assembler {\mstxv\M} } } */
> +
> Index: gcc/testsuite/gcc.target/powerpc/p9-dform-10.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/p9-dform-10.c    (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/p9-dform-10.c    (working copy)
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-skip-if "" { powerpc*-*-aix* } } */
> +/* { dg-options "-O3 -mdejagnu-cpu=power9 -funroll-loops -Wall -no-pie" } */
> +
> +#define TYPE signed int
> +#include "p9-dform-generic.h"
> +
> +/* The precise number of lxv and stxv instructions may be impacted by
> +   complex interactions between optimization passes, but we expect at
> +   least one of each.  */
> +/* { dg-final { scan-assembler {\mlxv\M} } } */
> +/* { dg-final { scan-assembler {\mstxv\M} } } */
> Index: gcc/testsuite/gcc.target/powerpc/p9-dform-11.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/p9-dform-11.c    (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/p9-dform-11.c    (working copy)
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-skip-if "" { powerpc*-*-aix* } } */
> +/* { dg-options "-O3 -mdejagnu-cpu=power9 -funroll-loops -Wall -no-pie" } */
> +
> +#define TYPE unsigned long long
> +#include "p9-dform-generic.h"
> +
> +/* The precise number of lxv and stxv instructions may be impacted by
> +   complex interactions between optimization passes, but we expect at
> +   least one of each.  */
> +/* { dg-final { scan-assembler {\mld\M} } } */
> +/* { dg-final { scan-assembler {\mstd\M} } } */
> Index: gcc/testsuite/gcc.target/powerpc/p9-dform-12.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/p9-dform-12.c    (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/p9-dform-12.c    (working copy)
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-skip-if "" { powerpc*-*-aix* } } */
> +/* { dg-options "-O3 -mdejagnu-cpu=power9 -funroll-loops -Wall -no-pie" } */
> +
> +#define TYPE signed long long
> +#include "p9-dform-generic.h"
> +
> +/* The precise number of lxv and stxv instructions may be impacted by
> +   complex interactions between optimization passes, but we expect at
> +   least one of each.  */
> +/* { dg-final { scan-assembler {\mld\M} } } */
> +/* { dg-final { scan-assembler {\mstd\M} } } */
> Index: gcc/testsuite/gcc.target/powerpc/p9-dform-13.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/p9-dform-13.c    (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/p9-dform-13.c    (working copy)
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-skip-if "" { powerpc*-*-aix* } } */
> +/* { dg-options "-O3 -mdejagnu-cpu=power9 -funroll-loops -Wall -no-pie" } */
> +
> +#define TYPE unsigned __int128
> +#include "p9-dform-generic.h"
> +
> +/* The precise number of lxv and stxv instructions may be impacted by
> +   complex interactions between optimization passes, but we expect at
> +   least one of each.  */
> +/* { dg-final { scan-assembler {\mld\M} } } */
> +/* { dg-final { scan-assembler {\mstd\M} } } */
> Index: gcc/testsuite/gcc.target/powerpc/p9-dform-14.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/p9-dform-14.c    (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/p9-dform-14.c    (working copy)
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-skip-if "" { powerpc*-*-aix* } } */
> +/* { dg-options "-O3 -mdejagnu-cpu=power9 -funroll-loops -Wall -no-pie" } */
> +
> +#define TYPE signed __int128
> +#include "p9-dform-generic.h"
> +
> +/* The precise number of lxv and stxv instructions may be impacted by
> +   complex interactions between optimization passes, but we expect at
> +   least one of each.  */
> +/* { dg-final { scan-assembler {\mld\M} } } */
> +/* { dg-final { scan-assembler {\mstd\M} } } */
> Index: gcc/testsuite/gcc.target/powerpc/p9-dform-15.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/p9-dform-15.c    (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/p9-dform-15.c    (working copy)
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-skip-if "" { powerpc*-*-aix* } } */
> +/* { dg-options "-O3 -mdejagnu-cpu=power9 -funroll-loops -Wall -no-pie 
> -mfloat128" } */
> +
> +#define TYPE __float128
> +#include "p9-dform-generic.h"
> +
> +/* The precise number of lxv and stxv instructions may be impacted by
> +   complex interactions between optimization passes, but we expect at
> +   least one of each.  */
> +/* { dg-final { scan-assembler {\mlxv\M} } } */
> +/* { dg-final { scan-assembler {\mstxv\M} } } */
> Index: gcc/testsuite/gcc.target/powerpc/p9-dform-2.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/p9-dform-2.c     (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/p9-dform-2.c     (working copy)
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-skip-if "" { powerpc*-*-aix* } } */
> +/* { dg-options "-O3 -mdejagnu-cpu=power9 -funroll-loops -Wall -no-pie" } */
> +
> +
> +#define TYPE float
> +#include "p9-dform-generic.h"
> +
> +/* The precise number of lxv and stxv instructions may be impacted by
> +   complex interactions between optimization passes, but we expect at
> +   least one of each.  */
> +/* { dg-final { scan-assembler {\mlxv\M} } } */
> +/* { dg-final { scan-assembler {\mstxv\M} } } */
> Index: gcc/testsuite/gcc.target/powerpc/p9-dform-3.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/p9-dform-3.c     (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/p9-dform-3.c     (working copy)
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-skip-if "" { powerpc*-*-aix* } } */
> +/* { dg-options "-O3 -mdejagnu-cpu=power9 -funroll-loops -Wall -no-pie" } */
> +
> +#define TYPE double
> +#include "p9-dform-generic.h"
> +
> +/* At time the dform optimization pass was merged with trunk, 6
> +   lxv instructions were emitted in place of the same number of lxvx
> +   instructions and 8 stxv instructions replace the same number of
> +   stxvx instructions.  No need to require exactly this number, as it
> +   may change when other optimization passes evolve.  */
> +
> +/* { dg-final { scan-assembler {\mlxv\M} } } */
> +/* { dg-final { scan-assembler {\mstxv\M} } } */
> Index: gcc/testsuite/gcc.target/powerpc/p9-dform-4.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/p9-dform-4.c     (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/p9-dform-4.c     (working copy)
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-skip-if "" { powerpc*-*-aix* } } */
> +/* { dg-options "-O3 -mdejagnu-cpu=power9 -funroll-loops -Wall -no-pie" } */
> +
> +#define TYPE long double
> +#include "p9-dform-generic.h"
> +
> +/* The precise number of lxv and stxv instructions may be impacted by
> +   complex interactions between optimization passes, but we expect at
> +   least one of each.  */
> +/* { dg-final { scan-assembler {\mlfd\M} } } */
> +/* { dg-final { scan-assembler {\mstfd\M} } } */
> Index: gcc/testsuite/gcc.target/powerpc/p9-dform-5.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/p9-dform-5.c     (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/p9-dform-5.c     (working copy)
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-skip-if "" { powerpc*-*-aix* } } */
> +/* { dg-options "-O3 -mdejagnu-cpu=power9 -funroll-loops -Wall -no-pie" } */
> +
> +#define TYPE unsigned char
> +#include "p9-dform-generic.h"
> +
> +/* The precise number of lxv and stxv instructions may be impacted by
> +   complex interactions between optimization passes, but we expect at
> +   least one of each.  */
> +/* { dg-final { scan-assembler {\mlxv\M} } } */
> +/* { dg-final { scan-assembler {\mstxv\M} } } */
> Index: gcc/testsuite/gcc.target/powerpc/p9-dform-6.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/p9-dform-6.c     (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/p9-dform-6.c     (working copy)
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-skip-if "" { powerpc*-*-aix* } } */
> +/* { dg-options "-O3 -mdejagnu-cpu=power9 -funroll-loops -Wall -no-pie" } */
> +
> +#define TYPE signed char
> +#include "p9-dform-generic.h"
> +
> +/* The precise number of lxv and stxv instructions may be impacted by
> +   complex interactions between optimization passes, but we expect at
> +   least one of each.  */
> +/* { dg-final { scan-assembler {\mlxv\M} } } */
> +/* { dg-final { scan-assembler {\mstxv\M} } } */
> Index: gcc/testsuite/gcc.target/powerpc/p9-dform-7.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/p9-dform-7.c     (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/p9-dform-7.c     (working copy)
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-skip-if "" { powerpc*-*-aix* } } */
> +/* { dg-options "-O3 -mdejagnu-cpu=power9 -funroll-loops -Wall -no-pie" } */
> +
> +#define TYPE unsigned short
> +#include "p9-dform-generic.h"
> +
> +/* The precise number of lxv and stxv instructions may be impacted by
> +   complex interactions between optimization passes, but we expect at
> +   least one of each.  */
> +/* { dg-final { scan-assembler {\mlxv\M} } } */
> +/* { dg-final { scan-assembler {\mstxv\M} } } */
> Index: gcc/testsuite/gcc.target/powerpc/p9-dform-8.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/p9-dform-8.c     (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/p9-dform-8.c     (working copy)
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-skip-if "" { powerpc*-*-aix* } } */
> +/* { dg-options "-O3 -mdejagnu-cpu=power9 -funroll-loops -Wall -no-pie" } */
> +
> +#define TYPE signed short
> +#include "p9-dform-generic.h"
> +
> +/* The precise number of lxv and stxv instructions may be impacted by
> +   complex interactions between optimization passes, but we expect at
> +   least one of each.  */
> +/* { dg-final { scan-assembler {\mlxv\M} } } */
> +/* { dg-final { scan-assembler {\mstxv\M} } } */
> Index: gcc/testsuite/gcc.target/powerpc/p9-dform-9.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/p9-dform-9.c     (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/p9-dform-9.c     (working copy)
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-skip-if "" { powerpc*-*-aix* } } */
> +/* { dg-options "-O3 -mdejagnu-cpu=power9 -funroll-loops -Wall -no-pie" } */
> +
> +#define TYPE unsigned int
> +#include "p9-dform-generic.h"
> +
> +/* The precise number of lxv and stxv instructions may be impacted by
> +   complex interactions between optimization passes, but we expect at
> +   least one of each.  */
> +/* { dg-final { scan-assembler {\mlxv\M} } } */
> +/* { dg-final { scan-assembler {\mstxv\M} } } */
> Index: gcc/testsuite/gcc.target/powerpc/p9-dform-generic.h
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/p9-dform-generic.h       (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/p9-dform-generic.h       (working copy)
> @@ -0,0 +1,34 @@
> +
> +#define ITERATIONS 1000000
> +
> +#define SIZE (16384/sizeof(TYPE))
> +
> +static TYPE x[SIZE] __attribute__ ((aligned (16)));
> +static TYPE y[SIZE] __attribute__ ((aligned (16)));
> +static TYPE a;
> +
> +void obfuscate(void *a, ...);
> +
> +static void __attribute__((noinline)) do_one(void)
> +{
> +  unsigned long i;
> +
> +  obfuscate(x, y, &a);
> +
> +  for (i = 0; i < SIZE; i++)
> +    y[i] = a * x[i];
> +
> +  obfuscate(x, y, &a);
> +
> +}
> +
> +int main(void)
> +{
> +  unsigned long i;
> +
> +  for (i = 0; i < ITERATIONS; i++)
> +    do_one();
> +
> +  return 0;
> +
> +}
>
Re: Ping: [PATCH v4, rs6000] Replace X-form addressing with D-form addressing in new pass for Power9

Reply via email to