Re: [PATCH,rs6000] Replace swap of a loaded vector constant with load of a swapped vector constant

2017-09-19 Thread Bill Schmidt
Hi Kelvin,

This is in quite good shape.  In addition to Segher's comments, I have a few
questions/requests below.

> On Sep 15, 2017, at 4:04 PM, Kelvin Nilsen  
> wrote:
> 
> @@ -385,6 +396,25 @@ const_load_sequence_p (swap_web_entry *insn_entry,
> split_const (XVECEXP (tocrel_base, 0, 0), &base, &offset);
> if (GET_CODE (base) != SYMBOL_REF || !CONSTANT_POOL_ADDRESS_P (base))
>   return false;
> +   else
> + {
> +   /* FIXME: The conditions under which
> +*  ((GET_CODE (const_vector) == SYMBOL_REF) &&
> +*   !CONSTANT_POOL_ADDRESS_P (const_vector))
> +* are not well understood.  This code prevents
> +* an internal compiler error which will occur in
> +* replace_swapped_load_constant () if we were to return
> +* true.  Some day, we should figure out how to properly
> +* handle this condition in
> +* replace_swapped_load_constant () and then we can
> +* remove this special test.  */
> +   rtx const_vector = get_pool_constant (base);
> +   if (GET_CODE (const_vector) == SYMBOL_REF)
> + {
> +   if (!CONSTANT_POOL_ADDRESS_P (const_vector))
> + return false;
> + }

Rewrite as

if (GET_CODE (const_vector) == SYMBOL_REF
&& !CONSTANT_POOL_ADDRESS_P (const_vector))
  return false;

> + }
>   }
> }
>   return true;
> @@ -1281,6 +1311,189 @@ replace_swap_with_copy (swap_web_entry *insn_entry
>   insn->set_deleted ();
> }
> 
> +/* Given that swap_insn represents a swap of a load of a constant
> +   vector value, replace with a single instruction that loads a
> +   swapped variant of the original constant. 
> +
> +   The "natural" representation of a byte array in memory is the same
> +   for big endian and little endian.
> +
> +   unsigned char byte_array[] =
> + { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f };
> +
> +   However, when loaded into a vector register, the representation
> +   depends on endian conventions.
> +
> +   In big-endian mode, the register holds:
> +
> + MSBLSB
> + [ f, e, d, c, b, a, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 ]
> +
> +   In little-endian mode, the register holds:
> +
> + MSBLSB
> + [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f ]
> +
> +
> +   Word arrays require different handling.  Consider the word array:
> +
> +   unsigned int word_array[] =
> + { 0x00010203, 0x04050607, 0x08090a0b, 0x0c0d0e0f };
> +
> +   The in-memory representation depends on endian configuration.  The
> +   equivalent array, declared as a byte array, in memory would be:
> +
> +   unsigned char big_endian_word_array_data[] =
> + { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f }
> +
> +   unsigned char little_endian_word_array_data[] =
> + { 3, 2, 1, 0, 7, 6, 5, 4, b, a, 9, 8, f, e, d, c }
> +
> +   In big-endian mode, the register holds:
> +
> + MSBLSB
> + [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f ]
> +
> +   In little-endian mode, the register holds:
> +
> + MSBLSB
> + [ c, d, e, f, 8, 9, a, b, 4, 5, 6, 7, 0, 1, 2, 3 ]
> +
> +
> +  Similar transformations apply to the vector of half-word and vector
> +  of double-word representations.

Thanks for providing these examples.

> +
> +  For now, don't handle vectors of quad-precision values.  Just return.
> +  A better solution is to fix the code generator to emit lvx/stvx for
> +  those.  */
> +static void
> +replace_swapped_load_constant (swap_web_entry *insn_entry, rtx swap_insn)
> +{
> +  /* Find the load.  */
> +  struct df_insn_info *insn_info = DF_INSN_INFO_GET (swap_insn);
> +  rtx_insn *load_insn = 0;
> +  df_ref use;
> +
> +  FOR_EACH_INSN_INFO_USE (use, insn_info)
> +{
> +  struct df_link *def_link = DF_REF_CHAIN (use);
> +  gcc_assert (def_link && !def_link->next);
> +  load_insn = DF_REF_INSN (def_link->ref);
> +  break;
> +}
> +  gcc_assert (load_insn);

I agree with Segher, please rewrite the above to not use a loop.  The
code here is correct, just awkward.  You are assuming that the first
use in a swap instruction will be the register you're looking for, and
that is correct.

> +
> +  /* Find the TOC-relative symbol access.  */
> +  insn_info = DF_INSN_INFO_GET (load_insn);
> +  rtx_insn *tocrel_insn = 0;
> +  FOR_EACH_INSN_INFO_USE (use, insn_info)
> +{
> +  struct df_link *def_link = DF_REF_CHAIN (use);
> +  gcc_assert (def_link && !def_link->next);
> +  tocrel_insn = DF_REF_INSN (def_link->ref);
> +  break;
> +}
> +  gcc_assert (tocrel_insn);

Please check all the uses here.  I am a little nervous about making
a similar assumption about a base register being the first use in
the load.  While this is likely provab

Re: [PATCH,rs6000] Replace swap of a loaded vector constant with load of a swapped vector constant

2017-09-18 Thread Segher Boessenkool
Hi Kelvin,

On Fri, Sep 15, 2017 at 03:04:52PM -0600, Kelvin Nilsen wrote:
> On Power8 little endian, two instructions are needed to load from the
> natural in-memory representation of a vector into a vector register: a
> load followed by a swap.  When the vector value to be loaded is a
> constant, more efficient code can be achieved by swapping the
> representation of the constant in memory so that only a load instruction
> is required.

I'll leave the review of the actual swaps part to Bill...  But some
comments:

> --- gcc/config/rs6000/rs6000-p8swap.c (revision 252768)
> +++ gcc/config/rs6000/rs6000-p8swap.c (working copy)
> @@ -342,7 +342,8 @@ const_load_sequence_p (swap_web_entry *insn_entry,
>FOR_EACH_INSN_INFO_USE (use, insn_info)
>  {
>struct df_link *def_link = DF_REF_CHAIN (use);
> -  if (!def_link || def_link->next)
> +  if (!def_link || !def_link->ref || DF_REF_IS_ARTIFICIAL (def_link->ref)
> +   || def_link->next)
>   return false;

You probably should adjust the comment before this a bit:
  /* Find the unique use in the swap and locate its def.  If the def
 isn't unique, punt.  */
no longer says what this does.

> @@ -370,6 +373,14 @@ const_load_sequence_p (swap_web_entry *insn_entry,
> if (!base_def_link || base_def_link->next)
>   return false;
>  
> +   /* Constants held on the stack are not "true" constants
> +* because their values are not part of the static load
> +* image.  If this constant's base reference is a stack
> +* or frame pointer, it is seen as an artificial
> +* reference. */

No leading asterisks in block comments.

> @@ -385,6 +396,25 @@ const_load_sequence_p (swap_web_entry *insn_entry,
> split_const (XVECEXP (tocrel_base, 0, 0), &base, &offset);
> if (GET_CODE (base) != SYMBOL_REF || !CONSTANT_POOL_ADDRESS_P (base))
>   return false;
> +   else
> + {
> +   /* FIXME: The conditions under which
> +*  ((GET_CODE (const_vector) == SYMBOL_REF) &&
> +*   !CONSTANT_POOL_ADDRESS_P (const_vector))
> +* are not well understood.  This code prevents
> +* an internal compiler error which will occur in
> +* replace_swapped_load_constant () if we were to return
> +* true.  Some day, we should figure out how to properly
> +* handle this condition in
> +* replace_swapped_load_constant () and then we can
> +* remove this special test.  */
> +   rtx const_vector = get_pool_constant (base);
> +   if (GET_CODE (const_vector) == SYMBOL_REF)
> + {
> +   if (!CONSTANT_POOL_ADDRESS_P (const_vector))
> + return false;
> + }
> + }
>   }
>  }
>return true;

It would be good to understand what this is about.  Some day :-)

> @@ -1281,6 +1311,189 @@ replace_swap_with_copy (swap_web_entry *insn_entry
>insn->set_deleted ();
>  }
>  
> +/* Given that swap_insn represents a swap of a load of a constant
> +   vector value, replace with a single instruction that loads a
> +   swapped variant of the original constant. 

(Trailing space).

> +static void
> +replace_swapped_load_constant (swap_web_entry *insn_entry, rtx swap_insn)
> +{
> +  /* Find the load.  */
> +  struct df_insn_info *insn_info = DF_INSN_INFO_GET (swap_insn);
> +  rtx_insn *load_insn = 0;
> +  df_ref use;
> +
> +  FOR_EACH_INSN_INFO_USE (use, insn_info)
> +{
> +  struct df_link *def_link = DF_REF_CHAIN (use);
> +  gcc_assert (def_link && !def_link->next);
> +  load_insn = DF_REF_INSN (def_link->ref);
> +  break;
> +}
> +  gcc_assert (load_insn);

A loop where you always break after the first iteration?  You probably
can write this simpler without a loop?  Or is this normal idiom?

> +  /* Find the embedded CONST_VECTOR.  We have to call toc_relative_expr_p
> + to set tocrel_base; otherwise it would be unnecessary as we've
> + already established it will return true.  */
> +  rtx base, offset;
> +  rtx tocrel_expr = SET_SRC (PATTERN (tocrel_insn));
> +  const_rtx tocrel_base;
> +  /* There is an extra level of indirection for small/large code models.  */
> +  if (GET_CODE (tocrel_expr) == MEM)
> +tocrel_expr = XEXP (tocrel_expr, 0);
> +  if (!toc_relative_expr_p (tocrel_expr, false, &tocrel_base, NULL))
> +gcc_unreachable ();
> +  split_const (XVECEXP (tocrel_base, 0, 0), &base, &offset);
> +  rtx const_vector = get_pool_constant (base);
> +  /* With the extra indirection, get_pool_constant will produce the
> + real constant from the reg_equal expression, so get the real
> + constant.  */
> +  if (GET_CODE (const_vector) == SYMBOL_REF)
> +const_vector = get_pool_constant (const_vector);
> +  gcc_assert (GET_CODE (const_vector) == CONST_VECTOR);
> +
> +  rtx new_mem;
> +  enum machine_mode mode = GET_MODE (const_vector);
> +  /* Create an adjusted constant from th

[PATCH,rs6000] Replace swap of a loaded vector constant with load of a swapped vector constant

2017-09-15 Thread Kelvin Nilsen

On Power8 little endian, two instructions are needed to load from the
natural in-memory representation of a vector into a vector register: a
load followed by a swap.  When the vector value to be loaded is a
constant, more efficient code can be achieved by swapping the
representation of the constant in memory so that only a load instruction
is required.

This patch has been bootstrapped and tested without regressions on
powerpc64le-unknown-linux (P8) and on powerpc-unknown-linux (P8,
big-endian, with both -m32 and -m64 target options).

Is this ok for trunk?

gcc/ChangeLog:

2017-09-14  Kelvin Nilsen  

* config/rs6000/rs6000-p8swap.c (const_load_sequence_p): Revise
this function to return false if the definition used by the swap
instruction is artificial, or if the memory address from which the
constant value is loaded is not represented by a base address held
in a register or if the base address register is a frame or stack
pointer.  Additionally, return false if the base address of the
loaded constant is a SYMBOL_REF but is not considered to be a
constant.
(replace_swapped_load_constant): New function.
(rs6000_analyze_swaps): Add a new pass to replace a swap of a
loaded constant vector with a load of a swapped constant vector.

gcc/testsuite/ChangeLog:

2017-09-14  Kelvin Nilsen  

* gcc.target/powerpc/swaps-p8-28.c: New test.
* gcc.target/powerpc/swaps-p8-29.c: New test.
* gcc.target/powerpc/swps-p8-30.c: New test.

Index: gcc/config/rs6000/rs6000-p8swap.c
===
--- gcc/config/rs6000/rs6000-p8swap.c   (revision 252768)
+++ gcc/config/rs6000/rs6000-p8swap.c   (working copy)
@@ -342,7 +342,8 @@ const_load_sequence_p (swap_web_entry *insn_entry,
   FOR_EACH_INSN_INFO_USE (use, insn_info)
 {
   struct df_link *def_link = DF_REF_CHAIN (use);
-  if (!def_link || def_link->next)
+  if (!def_link || !def_link->ref || DF_REF_IS_ARTIFICIAL (def_link->ref)
+ || def_link->next)
return false;
 
   rtx def_insn = DF_REF_INSN (def_link->ref);
@@ -358,6 +359,8 @@ const_load_sequence_p (swap_web_entry *insn_entry,
 
   rtx mem = XEXP (SET_SRC (body), 0);
   rtx base_reg = XEXP (mem, 0);
+  if (!REG_P (base_reg))
+   return false;
 
   df_ref base_use;
   insn_info = DF_INSN_INFO_GET (def_insn);
@@ -370,6 +373,14 @@ const_load_sequence_p (swap_web_entry *insn_entry,
  if (!base_def_link || base_def_link->next)
return false;
 
+ /* Constants held on the stack are not "true" constants
+  * because their values are not part of the static load
+  * image.  If this constant's base reference is a stack
+  * or frame pointer, it is seen as an artificial
+  * reference. */
+ if (DF_REF_IS_ARTIFICIAL (base_def_link->ref))
+   return false;
+
  rtx tocrel_insn = DF_REF_INSN (base_def_link->ref);
  rtx tocrel_body = PATTERN (tocrel_insn);
  rtx base, offset;
@@ -385,6 +396,25 @@ const_load_sequence_p (swap_web_entry *insn_entry,
  split_const (XVECEXP (tocrel_base, 0, 0), &base, &offset);
  if (GET_CODE (base) != SYMBOL_REF || !CONSTANT_POOL_ADDRESS_P (base))
return false;
+ else
+   {
+ /* FIXME: The conditions under which
+  *  ((GET_CODE (const_vector) == SYMBOL_REF) &&
+  *   !CONSTANT_POOL_ADDRESS_P (const_vector))
+  * are not well understood.  This code prevents
+  * an internal compiler error which will occur in
+  * replace_swapped_load_constant () if we were to return
+  * true.  Some day, we should figure out how to properly
+  * handle this condition in
+  * replace_swapped_load_constant () and then we can
+  * remove this special test.  */
+ rtx const_vector = get_pool_constant (base);
+ if (GET_CODE (const_vector) == SYMBOL_REF)
+   {
+ if (!CONSTANT_POOL_ADDRESS_P (const_vector))
+   return false;
+   }
+   }
}
 }
   return true;
@@ -1281,6 +1311,189 @@ replace_swap_with_copy (swap_web_entry *insn_entry
   insn->set_deleted ();
 }
 
+/* Given that swap_insn represents a swap of a load of a constant
+   vector value, replace with a single instruction that loads a
+   swapped variant of the original constant. 
+
+   The "natural" representation of a byte array in memory is the same
+   for big endian and little endian.
+
+   unsigned char byte_array[] =
+ { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f };
+
+   However, when loaded into a vector register, the representation
+   depends on endian conventions.
+
+   In big-endian mode, the register holds:
+
+ MSBLSB
+ [ f, e, d