[PATCH] MAINTAINERS: Change to my personal email address

2019-11-15 Thread Kelvin Nilsen
I'm leaving IBM and am changing my email to my personal address.

ChangeLog:

2019-11-15  Kelvin Nilsen  

* MAINTAINERS: Change my email address as maintainer.

Index: MAINTAINERS
===
--- MAINTAINERS (revision 278306)
+++ MAINTAINERS (working copy)
@@ -524,7 +524,7 @@ Quentin Neill   

 Adam Nemet 
 Thomas Neumann 
 Dan Nicolaescu 
-Kelvin Nilsen  
+Kelvin Nilsen  
 James Norris
 Diego Novillo  
 Dorit Nuzman   


Re: Ping: [PATCH v4, rs6000] Replace X-form addressing with D-form addressing in new pass for Power9

2019-11-13 Thread Kelvin Nilsen



On 10/25/19 8:30 PM, Kelvin Nilsen wrote:
> 
> This patch adds a new optimization pass for rs6000 targets.
> 
> This new pass scans existing rtl expressions and replaces X-form loads and 
> stores with rtl expressions that favor selection of the D-form instructions 
> in contexts for which the D-form instructions are preferred.  The new pass 
> runs after the RTL loop optimizations since loop unrolling often introduces 
> opportunities for beneficial replacements of X-form addressing instructions.
> 
> For each of the new tests, multiple X-form instructions are replaced with 
> D-form instructions, some addi instructions are replaced with add 
> instructions, and some addi instructions are eliminated.  The typical 
> improvement for the included tests is a decrease of 4.28% to 12.12% in the 
> number of instructions executed on each iteration of the loop.  The 
> optimization has not shown measurable improvement on specmark tests, 
> presumably because the typical loops that are benefited by this optimization 
> are memory bounded and this optimization does not eliminate memory loads or 
> stores.  However, it is anticipated that multi-threaded workloads and 
> measurements of total power and cooling costs for heavy server workloads 
> would benefit.
> 
> This version 4 patch responds to feedback and numerous suggestions by Segher:
> 
>   1. Further improvements to comments and discussion of computational 
> complexity.
> 
>   2. Changed the name of insn_sequence_no to luid.
> 
>   3. Fixed some typos in comments.
> 
>   4. Added macro-defined constants to enforce upper bounds on the sizes (and 
> number of required iterations) for certain data structures.  The intent is to 
> bound compile time for programs that represent large numbers of opportunities 
> for D-form replacements.  This optimization pass ignores parts of a source 
> program that exceed these macro-defined size limits.
> 
> In a separate mail, I have sent discussion regarding the behavior of 
> preceding passes and how this behavior relates to this new pass.
> 
> I have built and regression tested this patch on powerpc64le-unknown-linux 
> target with no regressions.
> 
> Is this ok for trunk?
> 
> gcc/ChangeLog:
> 
> 2019-10-25  Kelvin Nilsen  
> 
>   * config/rs6000/rs6000-p9dform.c: New file.
>   * config/rs6000/rs6000-passes.def: Add pass_insert_dform.
>   * config/rs6000/rs6000-protos.h
>   (rs6000_target_supports_dform_offset_p): New function prototype.
>   (make_pass_insert_dform): Likewise.
>   * config/rs6000/rs6000.c (rs6000_target_supports_dform_offset_p):
>   New function.
>   * config/rs6000/t-rs6000 (rs6000-p9dform.o): New build target.
>   * config.gcc: Add rs6000-p9dform.o object file.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-10-25  Kelvin Nilsen  
> 
>   * gcc.target/powerpc/p9-dform-0.c: New test.
>   * gcc.target/powerpc/p9-dform-1.c: New test.
>   * gcc.target/powerpc/p9-dform-10.c: New test.
>   * gcc.target/powerpc/p9-dform-11.c: New test.
>   * gcc.target/powerpc/p9-dform-12.c: New test.
>   * gcc.target/powerpc/p9-dform-13.c: New test.
>   * gcc.target/powerpc/p9-dform-14.c: New test.
>   * gcc.target/powerpc/p9-dform-15.c: New test.
>   * gcc.target/powerpc/p9-dform-2.c: New test.
>   * gcc.target/powerpc/p9-dform-3.c: New test.
>   * gcc.target/powerpc/p9-dform-4.c: New test.
>   * gcc.target/powerpc/p9-dform-5.c: New test.
>   * gcc.target/powerpc/p9-dform-6.c: New test.
>   * gcc.target/powerpc/p9-dform-7.c: New test.
>   * gcc.target/powerpc/p9-dform-8.c: New test.
>   * gcc.target/powerpc/p9-dform-9.c: New test.
>   * gcc.target/powerpc/p9-dform-generic.h: New test.
> 
> Index: gcc/config/rs6000/rs6000-p9dform.c
> ===
> --- gcc/config/rs6000/rs6000-p9dform.c(nonexistent)
> +++ gcc/config/rs6000/rs6000-p9dform.c(working copy)
> @@ -0,0 +1,1763 @@
> +/* Subroutines used to transform array subscripting expressions into
> +   forms that are more amenable to d-form instruction selection for p9
> +   little-endian VSX code.
> +   Copyright (C) 1991-2019 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published
> +   by the Free Software Foundation; either version 3, or (at your
> +   option) any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but WITHOUT
> +   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +   or FITNESS FOR A PAR

[PATCH, rs6000] Add xxswapd support for V2DF and V2DI modes

2019-11-06 Thread Kelvin Nilsen


It was recently discovered that the existing xxswapd instruction patterns lack 
support for the V2DF and V2DI modes.  Support for these modes is required for 
certain new instruction patterns that are being implemented.

This patch adds the desired support.
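
To illustrate what the new pattern expresses, here is a minimal C sketch that models a vec_select with (parallel [(const_int 1) (const_int 0)]) on a two-element vector: the result takes source element 1 first and source element 0 second.  The names v2di_model and swap_doublewords are hypothetical and exist only for this illustration; they are not part of the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for a V2DI value: two 64-bit elements.  */
typedef struct { uint64_t elt[2]; } v2di_model;

/* Model of (vec_select v (parallel [1 0])): result element 0 comes from
   source element 1, and result element 1 comes from source element 0.  */
static v2di_model
swap_doublewords (v2di_model v)
{
  v2di_model r;
  r.elt[0] = v.elt[1];
  r.elt[1] = v.elt[0];
  return r;
}
```

Applying swap_doublewords twice returns the original value, which is one quick sanity check on the model.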

The patch has been bootstrapped and tested without regressions on 
powerpc64le-unknown-linux.

Is this ok for trunk?

gcc/ChangeLog:

2019-11-06  Kelvin Nilsen  

* config/rs6000/vsx.md (xxswapd_<mode>): Add support for V2DF and
V2DI modes.

Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 277861)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -2987,6 +2987,17 @@
   "xxpermdi %x0,%x1,%x1,2"
   [(set_attr "type" "vecperm")])
 
+(define_insn "xxswapd_<mode>"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+   (vec_select:VSX_D
+ (match_operand:VSX_D 1 "vsx_register_operand" "wa")
+ (parallel [(const_int 1) (const_int 0)])))]
+  "TARGET_VSX"
+;; AIX does not support extended mnemonic xxswapd.  Use the basic
+;; mnemonic xxpermdi instead.
+  "xxpermdi %x0,%x1,%x1,2"
+  [(set_attr "type" "vecperm")])
+
 ;; lxvd2x for little endian loads.  We need several of
 ;; these since the form of the PARALLEL differs by mode.
 (define_insn "*vsx_lxvd2x2_le_<mode>"


[PATCH v4, rs6000] Replace X-form addressing with D-form addressing in new pass for Power9

2019-10-25 Thread Kelvin Nilsen


This patch adds a new optimization pass for rs6000 targets.

This new pass scans existing rtl expressions and replaces X-form loads and 
stores with rtl expressions that favor selection of the D-form instructions in 
contexts for which the D-form instructions are preferred.  The new pass runs 
after the RTL loop optimizations since loop unrolling often introduces 
opportunities for beneficial replacements of X-form addressing instructions.
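
As a rough source-level illustration of the distinction, the C sketch below contrasts the two addressing shapes.  X-form addressing forms an address from a base register plus an index register, while D-form addressing forms it from a base register plus a constant displacement.  The function names are hypothetical, and which instructions the compiler actually selects for this code is not guaranteed; the sketch only shows why an unrolled loop body offers room for base-plus-constant accesses.

```c
#include <stddef.h>

/* Indexed style: each access names its own index, which corresponds to
   base-register + index-register addressing (X-form).  */
static double
sum4_indexed (const double *x, size_t i0, size_t i1, size_t i2, size_t i3)
{
  return x[i0] + x[i1] + x[i2] + x[i3];
}

/* Displacement style: one derived base pointer plus small constant
   offsets, which corresponds to base-register + immediate addressing
   (D-form).  */
static double
sum4_displaced (const double *x, size_t i)
{
  const double *p = x + i;
  return p[0] + p[1] + p[2] + p[3];
}
```

Both functions compute the same value when the four indices are consecutive; the second needs only one address computation for the group.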

For each of the new tests, multiple X-form instructions are replaced with 
D-form instructions, some addi instructions are replaced with add instructions, 
and some addi instructions are eliminated.  The typical improvement for the 
included tests is a decrease of 4.28% to 12.12% in the number of instructions 
executed on each iteration of the loop.  The optimization has not shown 
measurable improvement on specmark tests, presumably because the typical loops 
that are benefited by this optimization are memory bounded and this 
optimization does not eliminate memory loads or stores.  However, it is 
anticipated that multi-threaded workloads and measurements of total power and 
cooling costs for heavy server workloads would benefit.

This version 4 patch responds to feedback and numerous suggestions by Segher:

  1. Further improvements to comments and discussion of computational 
complexity.

  2. Changed the name of insn_sequence_no to luid.

  3. Fixed some typos in comments.

  4. Added macro-defined constants to enforce upper bounds on the sizes (and 
number of required iterations) for certain data structures.  The intent is to 
bound compile time for programs that represent large numbers of opportunities 
for D-form replacements.  This optimization pass ignores parts of a source 
program that exceed these macro-defined size limits.
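
The idea in item 4 can be sketched as follows.  This is only an illustration under stated assumptions: MAX_CANDIDATES_PER_REGION, region_within_limit, and count_processed_regions are hypothetical names, and the limit value is made up; the real macros and their values are defined in the patch itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit; the patch's real macro names and values are not
   shown in this message.  */
#define MAX_CANDIDATES_PER_REGION 4

/* Return true if a region with COUNT replacement candidates is small
   enough to analyze; larger regions are left untouched.  */
static bool
region_within_limit (size_t count)
{
  return count <= MAX_CANDIDATES_PER_REGION;
}

/* Count how many of the N regions would be processed.  */
static size_t
count_processed_regions (const size_t *candidate_counts, size_t n)
{
  size_t processed = 0;
  for (size_t i = 0; i < n; i++)
    if (region_within_limit (candidate_counts[i]))
      processed++;
  return processed;
}
```

Skipping oversized regions trades some missed replacements for a compile-time cost that stays bounded regardless of input size.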

In a separate mail, I have sent discussion regarding the behavior of preceding 
passes and how this behavior relates to this new pass.

I have built and regression tested this patch on powerpc64le-unknown-linux 
target with no regressions.

Is this ok for trunk?

gcc/ChangeLog:

2019-10-25  Kelvin Nilsen  

* config/rs6000/rs6000-p9dform.c: New file.
* config/rs6000/rs6000-passes.def: Add pass_insert_dform.
* config/rs6000/rs6000-protos.h
(rs6000_target_supports_dform_offset_p): New function prototype.
(make_pass_insert_dform): Likewise.
* config/rs6000/rs6000.c (rs6000_target_supports_dform_offset_p):
New function.
* config/rs6000/t-rs6000 (rs6000-p9dform.o): New build target.
* config.gcc: Add rs6000-p9dform.o object file.

gcc/testsuite/ChangeLog:

2019-10-25  Kelvin Nilsen  

* gcc.target/powerpc/p9-dform-0.c: New test.
* gcc.target/powerpc/p9-dform-1.c: New test.
* gcc.target/powerpc/p9-dform-10.c: New test.
* gcc.target/powerpc/p9-dform-11.c: New test.
* gcc.target/powerpc/p9-dform-12.c: New test.
* gcc.target/powerpc/p9-dform-13.c: New test.
* gcc.target/powerpc/p9-dform-14.c: New test.
* gcc.target/powerpc/p9-dform-15.c: New test.
* gcc.target/powerpc/p9-dform-2.c: New test.
* gcc.target/powerpc/p9-dform-3.c: New test.
* gcc.target/powerpc/p9-dform-4.c: New test.
* gcc.target/powerpc/p9-dform-5.c: New test.
* gcc.target/powerpc/p9-dform-6.c: New test.
* gcc.target/powerpc/p9-dform-7.c: New test.
* gcc.target/powerpc/p9-dform-8.c: New test.
* gcc.target/powerpc/p9-dform-9.c: New test.
* gcc.target/powerpc/p9-dform-generic.h: New test.

Index: gcc/config/rs6000/rs6000-p9dform.c
===
--- gcc/config/rs6000/rs6000-p9dform.c  (nonexistent)
+++ gcc/config/rs6000/rs6000-p9dform.c  (working copy)
@@ -0,0 +1,1763 @@
+/* Subroutines used to transform array subscripting expressions into
+   forms that are more amenable to d-form instruction selection for p9
+   little-endian VSX code.
+   Copyright (C) 1991-2019 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "df.h"
+#include "

Re: [PATCH v3, rs6000] Replace X-form addressing with D-form addressing in new pass for Power9

2019-10-22 Thread Kelvin Nilsen



On 10/17/19 5:57 PM, Segher Boessenkool wrote:
> Hi Kelvin,
> 
> On Wed, Oct 09, 2019 at 03:28:45PM -0500, Kelvin Nilsen wrote:
>> This new pass scans existing rtl expressions and replaces them with rtl 
>> expressions that favor selection of the D-form instructions in contexts for 
>> which the D-form instructions are preferred.  The new pass runs after the 
>> RTL loop optimizations since loop unrolling often introduces opportunities 
>> for beneficial replacements of X-form addressing instructions.
>>
>> For each of the new tests, multiple X-form instructions are replaced with 
>> D-form instructions, some addi instructions are replaced with add 
>> instructions, and some addi instructions are eliminated.  The typical 
>> improvement for the included tests is a decrease of 4.28% to 12.12% in the 
>> number of instructions executed on each iteration of the loop.  The 
>> optimization has not shown measurable improvement on specmark tests, 
>> presumably because the typical loops that are benefited by this optimization 
>> are memory bounded and this optimization does not eliminate memory loads or 
>> stores.  However, it is anticipated that multi-threaded workloads and 
>> measurements of total power and cooling costs for heavy server workloads 
>> would benefit.
> 
> My first question is, why did ivopts choose the suboptimal solution?
> _Did_ it, or did something later mess things up?
> 
> This new pass can help us investigate that.  It certainly sounds like we
> could do better earlier already.
> 
> I think it is a good design to make fixes late in the pass pipeline, *but*
> we should try to make good choices earlier, too -- the "late tweaks" should
> be just that, tweaks; 4%-12% is a bit much.
> 
> (It's not that super late here; but still, why does it help so much?)
> 

Thanks, Segher, for looking over my draft patch and providing your comments.  
When I first began work on this reported performance problem, I did look at the 
earlier passes in hopes of identifying a better place to address the poor 
instruction selection.

It is difficult to know exactly where we want to accomplish the improved code 
generation.  Some of the "earlier" candidate passes are disadvantaged because 
they are "blind" to instruction costs and do not even have an awareness of 
which addressing modes are supported by which instructions.

Below, I'm providing some of the earlier pass information for one of the sample 
programs that motivates this patch.  Please feel free to comment.  I welcome 
suggestions as to alternative ways to attack this.

Thanks.


 
Consider the following program:

extern float opt_value;
extern char *opt_desc;

#define M 128
#define N 512

double x [N];
double y [N];

int main (int argc, char *argv []) {
  double sacc;

  first_dummy ();
  for (int j = 0; j < M; j++) {
    sacc = 0.00;
    for (unsigned long long int i = 0; i < N; i++)
      sacc += x[i] * y[i];
    dummy (sacc, N);
  }
  opt_value = ((float) N) * 2 * ((float) M);
  opt_desc = "flops";
  other_dummy ();
}


Compile this with the following command-line options on a Power target:

xgcc p9-dform-0.c -da -m64 -fdump-tree-all -fno-diagnostics-show-caret \
  -fno-diagnostics-show-line-numbers -fdiagnostics-color=never -O3 \
  -mcpu=power9 -mtune=power9 -funroll-loops -ffat-lto-objects -fno-ident


*
* Auto-vectorization transforms this program into approximately the
* following C code
*

int main (int argc, char *argv []) {
  double sacc;
  vector double x_values, y_values, xy_product;
  vector double *vectp_x, *vectp_y;

  first_dummy ();
  for (int j = 0; j < M; j++) {
    sacc = 0.00;
    vectp_x = x;
    vectp_y = y;
    for (unsigned int ivtmp_31 = 0; ivtmp_31 != N / 2; ivtmp_31++) {
      x_values = *vectp_x;
      y_values = *vectp_y;
      xy_product = x_values * y_values;
      sacc += xy_product[0];
      sacc += xy_product[1];
      vectp_x++;
      vectp_y++;
    }
    dummy (sacc, N);
  }
  opt_value = ((float) N) * 2 * ((float) M);
  opt_desc = "flops";
  other_dummy ();
}

*
* Induction variable optimization transforms this program into approximately
* the following C code
*


int main (int argc, char *argv []) {
  double sacc;
  vector double x_values, y_values, xy_product;

  first_dummy ();
  for (int j = 0; j < M; j++) {
    sacc = 0.00;
    for (unsigned int ivtmp_14 = 0; ivtmp_14 != 4096; ivtmp_14 += 16) {
      x_values = x [ivtmp_14];
      y_values = y [ivtmp_14];
      xy_product = x_values * y_values;
      sacc += xy_product[0];
      sacc += xy_product[1];
      /* Note: induction variable optimization has removed 2 p

[PATCH v3, rs6000] Replace X-form addressing with D-form addressing in new pass for Power9

2019-10-09 Thread Kelvin Nilsen
This patch is a refinement of a patch first submitted to this list on Nov. 10, 
2018, with revisions submitted to this list on Dec. 13, 2018 and Sep. 3, 2019.

This new pass scans existing rtl expressions and replaces them with rtl 
expressions that favor selection of the D-form instructions in contexts for 
which the D-form instructions are preferred.  The new pass runs after the RTL 
loop optimizations since loop unrolling often introduces opportunities for 
beneficial replacements of X-form addressing instructions.

For each of the new tests, multiple X-form instructions are replaced with 
D-form instructions, some addi instructions are replaced with add instructions, 
and some addi instructions are eliminated.  The typical improvement for the 
included tests is a decrease of 4.28% to 12.12% in the number of instructions 
executed on each iteration of the loop.  The optimization has not shown 
measurable improvement on specmark tests, presumably because the typical loops 
that are benefited by this optimization are memory bounded and this 
optimization does not eliminate memory loads or stores.  However, it is 
anticipated that multi-threaded workloads and measurements of total power and 
cooling costs for heavy server workloads would benefit.

This version 3 patch responds to feedback and numerous suggestions by Segher:

1. Fixed multiple typos.

2. Improved comments and added discussion of computational complexity.

3. Added a field to the indexing_web_entry class, allowing constant-time test 
for dominance of instructions within a common basic block.

4. Improved implementation of the equivalence hash function.

5. Refactored the code to divide into smaller functions and provide more 
descriptive commentary.

6. Improved indentation.

7. Corrected definition of max_16bit_signed value.

8. Added To-do comment in rs6000_target_supports_dform_offset_p, to alert 
maintainers that adding support for future hardware architectures will require 
code to be added to this function.

9. Simplified the dg directives in the new test cases. 
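
Item 3 above can be illustrated with a small sketch.  Within one basic block, an instruction dominates every instruction that follows it, so storing a per-instruction sequence number makes the test a single comparison.  The struct insn_entry_model and the function dominates_in_same_block are hypothetical simplifications for illustration; the actual indexing_web_entry class in the patch has a different layout.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for one entry; the real
   indexing_web_entry class in the patch has more fields.  */
struct insn_entry_model
{
  int bb_index;   /* Index of the containing basic block.  */
  int seq_no;     /* Position of the insn within that block.  */
};

/* Within a single basic block, an insn dominates itself and every insn
   that follows it, so comparing sequence numbers suffices.  Returns
   false when the entries are in different blocks, where this shortcut
   does not apply.  */
static bool
dominates_in_same_block (const struct insn_entry_model *a,
                         const struct insn_entry_model *b)
{
  return a->bb_index == b->bb_index && a->seq_no <= b->seq_no;
}
```

Without the stored sequence number, the same question would require walking the instruction chain of the block, which is linear in the block length.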

I have built and regression tested this patch on powerpc64le-unknown-linux 
target with no regressions.

Is this ok for trunk?

gcc/ChangeLog:

2019-10-09  Kelvin Nilsen  

* config/rs6000/rs6000-p9dform.c: New file.
* config/rs6000/rs6000-passes.def: Add pass_insert_dform.
* config/rs6000/rs6000-protos.h
(rs6000_target_supports_dform_offset_p): New function prototype.
(make_pass_insert_dform): Likewise.
* config/rs6000/rs6000.c (rs6000_target_supports_dform_offset_p):
New function.
* config/rs6000/t-rs6000 (rs6000-p9dform.o): New build target.
* config.gcc: Add rs6000-p9dform.o object file.

gcc/testsuite/ChangeLog:

2019-10-09  Kelvin Nilsen  

* gcc.target/powerpc/p9-dform-0.c: New test.
* gcc.target/powerpc/p9-dform-1.c: New test.
* gcc.target/powerpc/p9-dform-10.c: New test.
* gcc.target/powerpc/p9-dform-11.c: New test.
* gcc.target/powerpc/p9-dform-12.c: New test.
* gcc.target/powerpc/p9-dform-13.c: New test.
* gcc.target/powerpc/p9-dform-14.c: New test.
* gcc.target/powerpc/p9-dform-15.c: New test.
* gcc.target/powerpc/p9-dform-2.c: New test.
* gcc.target/powerpc/p9-dform-3.c: New test.
* gcc.target/powerpc/p9-dform-4.c: New test.
* gcc.target/powerpc/p9-dform-5.c: New test.
* gcc.target/powerpc/p9-dform-6.c: New test.
* gcc.target/powerpc/p9-dform-7.c: New test.
* gcc.target/powerpc/p9-dform-8.c: New test.
* gcc.target/powerpc/p9-dform-9.c: New test.
* gcc.target/powerpc/p9-dform-generic.h: New test.

Index: gcc/config/rs6000/rs6000-p9dform.c
===
--- gcc/config/rs6000/rs6000-p9dform.c  (nonexistent)
+++ gcc/config/rs6000/rs6000-p9dform.c  (working copy)
@@ -0,0 +1,1623 @@
+/* Subroutines used to transform array subscripting expressions into
+   forms that are more amenable to d-form instruction selection for p9
+   little-endian VSX code.
+   Copyright (C) 1991-2019 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"

[PATCH v2, rs6000] Replace X-form addressing with D-form addressing in new pass for Power 9

2019-09-03 Thread Kelvin Nilsen
This patch is a refinement of a patch first submitted to this list on Nov. 10, 
2018, with a revision submitted to this list on Dec. 13, 2018.  At the time of the 
last submission, it was deemed too close to the close of GCC 9, so was not 
considered at that time.

This new pass scans existing rtl expressions and replaces them with rtl 
expressions that favor selection of the D-form instructions in contexts for 
which the D-form instructions are preferred.  The new pass runs after the RTL 
loop optimizations since loop unrolling often introduces opportunities for 
beneficial replacements of X-form addressing instructions.

This version 2 of the patch includes new tests representing additional 
applications for which the existing code generator produces sub-optimal code.  
For each of the sample tests, multiple x-form instructions are replaced with 
D-form instructions, some addi instructions are replaced with add instructions, 
and some addi instructions are eliminated.  The typical improvement for the 
included tests is a decrease of 4.28% to 12.12% in the number of instructions 
executed on each iteration of the loop.  The optimization has not shown 
measurable improvement on, for example, specmark tests, presumably because the 
typical loops that are benefited by this optimization are memory bounded and 
this optimization does not eliminate memory loads or stores.  However, it is 
anticipated that multi-threaded workloads and, for example, measurements of 
total power and cooling costs for heavy server workloads would benefit.

I have built and regression tested this patch on powerpc64le-unknown-linux 
target with no regressions.

Is this ok for trunk?

gcc/ChangeLog:

2019-09-03  Kelvin Nilsen  

* config/rs6000/rs6000-p9dform.c: New file.
* config/rs6000/rs6000-passes.def: Add pass_insert_dform.
* config/rs6000/rs6000-protos.h
(rs6000_target_supports_dform_offset_p): New function prototype.
(make_pass_insert_dform): Likewise.
* config/rs6000/rs6000.c (rs6000_target_supports_dform_offset_p):
New function.
* config/rs6000/t-rs6000 (rs6000-p9dform.o): New build target.
* config.gcc: Add rs6000-p9dform.o object file.

gcc/testsuite/ChangeLog:

2019-09-03  Kelvin Nilsen  

* gcc.target/powerpc/p9-dform-0.c: New test.
* gcc.target/powerpc/p9-dform-1.c: New test.
* gcc.target/powerpc/p9-dform-10.c: New test.
* gcc.target/powerpc/p9-dform-11.c: New test.
* gcc.target/powerpc/p9-dform-12.c: New test.
* gcc.target/powerpc/p9-dform-13.c: New test.
* gcc.target/powerpc/p9-dform-14.c: New test.
* gcc.target/powerpc/p9-dform-15.c: New test.
* gcc.target/powerpc/p9-dform-2.c: New test.
* gcc.target/powerpc/p9-dform-3.c: New test.
* gcc.target/powerpc/p9-dform-4.c: New test.
* gcc.target/powerpc/p9-dform-5.c: New test.
* gcc.target/powerpc/p9-dform-6.c: New test.
* gcc.target/powerpc/p9-dform-7.c: New test.
* gcc.target/powerpc/p9-dform-8.c: New test.
* gcc.target/powerpc/p9-dform-9.c: New test.
* gcc.target/powerpc/p9-dform-generic.h: New header.


Index: gcc/config/rs6000/rs6000-p9dform.c
===
--- gcc/config/rs6000/rs6000-p9dform.c  (nonexistent)
+++ gcc/config/rs6000/rs6000-p9dform.c  (working copy)
@@ -0,0 +1,1487 @@
+/* Subroutines used to transform array subscripting expressions into
+   forms that are more amenable to d-form instruction selection for p9
+   little-endian VSX code.
+   Copyright (C) 1991-2018 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "df.h"
+#include "tm_p.h"
+#include "ira.h"
+#include "print-tree.h"
+#include "varasm.h"
+#include "explow.h"
+#include "expr.h"
+#include "output.h"
+#include "tree-pass.h"
+#include "rtx-vector-builder.h"
+#include "cfgloop.h"
+
+#include "insn-config.h"
+#include "recog.h"
+
+#include &

[PATCH, rs6000] PR89765: Multiple problems with vec-insert implementation on PowerPC

2019-04-30 Thread Kelvin Nilsen


In combination with a related recently committed patch 
(https://gcc.gnu.org/ml/gcc-patches/2019-04/msg00989.html), the attached patch 
resolves the issues described in this problem report.  This patch also includes 
tests to exercise the previously committed patch.

This patch includes redundant content from patch PR89424 
(https://gcc.gnu.org/ml/gcc-patches/2019-04/msg00994.html), which has already 
been approved by Segher for trunk and backports to GCC 7 and 8 but is awaiting 
the GCC 9 release.

The patch has been bootstrapped and tested without regressions on 
powerpc64le-unknown-linux-gnu (both P8 and P9) and on 
powerpc64-unknown-linux-gnu (P7 and P8, both -m32 and -m64).

Segher: After GCC9 release, is this ok for trunk and backports to GCC 7 and 
GCC8?

Jakub or Richi: Is this patch and the redundant PR89424 patch ok for backports 
to GCC9?
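
The modular selector arithmetic mentioned in the ChangeLog entry below can be pictured with a small C model.  This is a sketch under stated assumptions, not the code in rs6000-c.c: the type v2di_pair and the function insert_modulo are hypothetical names, and the model only shows that an out-of-range selector wraps around to a valid element instead of being rejected.

```c
#include <stdint.h>

/* Number of elements in the modelled two-element vector.  */
#define V2DI_NELTS 2

/* Hypothetical stand-in for a V2DI value.  */
typedef struct { int64_t elt[V2DI_NELTS]; } v2di_pair;

/* Insert VALUE into a copy of V at position SELECTOR modulo the element
   count, and return the updated copy.  */
static v2di_pair
insert_modulo (v2di_pair v, int64_t value, unsigned int selector)
{
  v.elt[selector % V2DI_NELTS] = value;
  return v;
}
```

For a one-element vector such as V1TI the same rule reduces every selector to 0, which matches the patch's use of a zero selector in that case.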

gcc/ChangeLog:

2019-04-30  Kelvin Nilsen  

PR target/89765
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
In handling of ALTIVEC_BUILTIN_VEC_INSERT, use modular arithmetic
to compute vector element selector for both constant and variable
operands.
* config/rs6000/rs6000.c (rs6000_expand_vector_extract): Add case
to handle V1TImode vectors.

gcc/testsuite/ChangeLog:

2019-04-30  Kelvin Nilsen  

PR target/89765
* gcc.target/powerpc/pr89765-mc.c: New test.
* gcc.target/powerpc/vsx-builtin-10c.c: New test.
* gcc.target/powerpc/vsx-builtin-10d.c: New test.
* gcc.target/powerpc/vsx-builtin-11c.c: New test.
* gcc.target/powerpc/vsx-builtin-11d.c: New test.
* gcc.target/powerpc/vsx-builtin-12c.c: New test.
* gcc.target/powerpc/vsx-builtin-12d.c: New test.
* gcc.target/powerpc/vsx-builtin-13c.c: New test.
* gcc.target/powerpc/vsx-builtin-13d.c: New test.
* gcc.target/powerpc/vsx-builtin-14c.c: New test.
* gcc.target/powerpc/vsx-builtin-14d.c: New test.
* gcc.target/powerpc/vsx-builtin-15c.c: New test.
* gcc.target/powerpc/vsx-builtin-15d.c: New test.
* gcc.target/powerpc/vsx-builtin-16c.c: New test.
* gcc.target/powerpc/vsx-builtin-16d.c: New test.
* gcc.target/powerpc/vsx-builtin-17c.c: New test.
* gcc.target/powerpc/vsx-builtin-17d.c: New test.
* gcc.target/powerpc/vsx-builtin-18c.c: New test.
* gcc.target/powerpc/vsx-builtin-18d.c: New test.
* gcc.target/powerpc/vsx-builtin-19c.c: New test.
* gcc.target/powerpc/vsx-builtin-19d.c: New test.
* gcc.target/powerpc/vsx-builtin-20c.c: New test.
* gcc.target/powerpc/vsx-builtin-20d.c: New test.
* gcc.target/powerpc/vsx-builtin-9c.c: New test.
* gcc.target/powerpc/vsx-builtin-9d.c: New test.
* gcc.target/powerpc/vsx-builtin-13a.c (PR89424): Define this
macro to increase coverage of test.
* gcc.target/powerpc/vsx-builtin-13b.c (PR89424): Likewise.
* gcc.target/powerpc/vsx-builtin-20a.c (PR89424): Likewise.
* gcc.target/powerpc/vsx-builtin-20b.c (PR89424): Likewise.

Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c(revision 270584)
+++ gcc/config/rs6000/rs6000-c.c(working copy)
@@ -6736,11 +6736,13 @@ altivec_resolve_overloaded_builtin (location_t loc
   /* If we can use the VSX xxpermdi instruction, use that for insert.  */
   mode = TYPE_MODE (arg1_type);
   if ((mode == V2DFmode || mode == V2DImode) && VECTOR_UNIT_VSX_P (mode)
- && TREE_CODE (arg2) == INTEGER_CST
- && wi::ltu_p (wi::to_wide (arg2), 2))
+ && TREE_CODE (arg2) == INTEGER_CST)
{
+ wide_int selector = wi::to_wide (arg2);
+ selector = wi::umod_trunc (selector, 2);
  tree call = NULL_TREE;
 
+ arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
  if (mode == V2DFmode)
call = rs6000_builtin_decls[VSX_BUILTIN_VEC_SET_V2DF];
  else if (mode == V2DImode)
@@ -6752,11 +6754,12 @@ altivec_resolve_overloaded_builtin (location_t loc
return build_call_expr (call, 3, arg1, arg0, arg2);
}
   else if (mode == V1TImode && VECTOR_UNIT_VSX_P (mode)
-  && TREE_CODE (arg2) == INTEGER_CST
-  && wi::eq_p (wi::to_wide (arg2), 0))
+  && TREE_CODE (arg2) == INTEGER_CST)
{
  tree call = rs6000_builtin_decls[VSX_BUILTIN_VEC_SET_V1TI];
+ wide_int selector = wi::zero(32);
 
+ arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
  /* Note, __builtin_vec_insert_ has vector and scalar types
 reversed.  */
  return build_call_expr (call, 3, arg1, arg0, arg2);
@@ -6764,10 +6767,13 @@ altivec_resolve_overloaded_builtin (location_t loc
 
   /* Build *(((arg1_inner_type*)&(vec

[PATCH, rs6000] PR89424: __builtin_vec_ext_v1ti (v, i) results in ICE with variable i (RS6000)

2019-04-25 Thread Kelvin Nilsen



The attached patch resolves the issue described in this problem report.  The 
patch has been bootstrapped and tested without regressions on 
powerpc64le-unknown-linux-gnu (both P8 and P9) and on 
powerpc64-unknown-linux-gnu (P7 and P8, both -m32 and -m64).

Is this ok for trunk and backports?


Thanks.

gcc/ChangeLog:

2019-04-25  Kelvin Nilsen  

PR target/89424
* config/rs6000/rs6000.c (rs6000_expand_vector_extract): Add
handling of V1TImode.

gcc/testsuite/ChangeLog:

2019-04-25  Kelvin Nilsen  

PR target/89424
* gcc.target/powerpc/pr89424-0.c: New test.
* gcc.target/powerpc/vsx-builtin-13a.c: Define macro PR89424 to
enable testing of newly patched capability.
* gcc.target/powerpc/vsx-builtin-13b.c: Likewise.
* gcc.target/powerpc/vsx-builtin-20a.c: Likewise.
* gcc.target/powerpc/vsx-builtin-20b.c: Likewise.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 270513)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6944,6 +6944,10 @@ rs6000_expand_vector_extract (rtx target, rtx vec,
 
   switch (mode)
{
+   case E_V1TImode:
+ emit_move_insn (target, gen_lowpart (TImode, vec));
+ return;
+
case E_V2DFmode:
  emit_insn (gen_vsx_extract_v2df_var (target, vec, elt));
  return;
Index: gcc/testsuite/gcc.target/powerpc/pr89424-0.c
===
--- gcc/testsuite/gcc.target/powerpc/pr89424-0.c(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr89424-0.c(working copy)
@@ -0,0 +1,78 @@
+/* { dg-do run { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { } } */
+/* { dg-options "-mvsx" } */
+
+/* This test should run the same on any target that supports vsx
+   instructions.  Intentionally not specifying cpu in order to test
+   all code generation paths.  */
+
+#include <altivec.h>
+
+extern void abort (void);
+
+/* Define PR89626 after that pr is addressed.  */
+#ifdef PR89626
+#define SIGNED
+#else
+#define SIGNED signed
+#endif
+
+#define CONST0 (((__int128) 31415926539) << 60)
+
+/* Test that indices > length of vector are applied modulo the vector
+   length.  */
+
+
+/* Test for variable selector and vector residing in register.  */
+__attribute__((noinline))
+__int128 ei (vector SIGNED __int128 v, int i)
+{
+  return __builtin_vec_ext_v1ti (v, i);
+}
+
+/* Test for variable selector and vector residing in memory.  */
+__int128 mei (vector SIGNED __int128 *vp, int i)
+{
+  return __builtin_vec_ext_v1ti (*vp, i);
+}
+
+int main (int argc, char *argv[]) {
+  vector SIGNED __int128 dv = { CONST0 };
+  __int128 d;
+
+  d = ei (dv, 0);
+  if (d != CONST0)
+abort ();
+
+  d = ei (dv, 1);
+  if (d != CONST0)
+abort ();
+
+  d = ei (dv, 2);
+  if (d != CONST0)
+abort ();
+
+  d = ei (dv, 3);
+  if (d != CONST0)
+abort ();
+
+  d = mei (&dv, 0);
+  if (d != CONST0)
+abort ();
+
+  d = mei (&dv, 1);
+  if (d != CONST0)
+abort ();
+
+  d = mei (&dv, 2);
+  if (d != CONST0)
+abort ();
+
+  d = mei (&dv, 3);
+  if (d != CONST0)
+abort ();
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/vsx-builtin-13a.c
===
--- gcc/testsuite/gcc.target/powerpc/vsx-builtin-13a.c  (revision 270513)
+++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-13a.c  (working copy)
@@ -9,7 +9,7 @@
 #include 
 
 /* Define this after PR89424 is addressed.  */
-#undef PR89424
+#define PR89424
 
 /* Define this after PR89626 is addressed.  */
 #undef PR89626
Index: gcc/testsuite/gcc.target/powerpc/vsx-builtin-13b.c
===
--- gcc/testsuite/gcc.target/powerpc/vsx-builtin-13b.c  (revision 270513)
+++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-13b.c  (working copy)
@@ -9,7 +9,7 @@
 #include 
 
 /* Define this after PR89424 is addressed.  */
-#undef PR89424
+#define PR89424
 
 /* Define this after PR89626 is addressed.  */
 #undef PR89626
Index: gcc/testsuite/gcc.target/powerpc/vsx-builtin-20a.c
===
--- gcc/testsuite/gcc.target/powerpc/vsx-builtin-20a.c  (revision 270513)
+++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-20a.c  (working copy)
@@ -9,7 +9,7 @@
 #include 
 
 /* Define this after PR89424 is addressed.  */
-#undef PR89424
+#define PR89424
 
 extern void abort (void);
 
Index: gcc/testsuite/gcc.target/powerpc/vsx-builtin-20b.c
===
--- gcc/testsuite/gcc.target/powerpc/vsx-builtin-20b.c  (revision 270513)
+++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-20b.

[PATCH, rs6000] PR89424: __builtin_vec_ext_v1ti (v, i) results in ICE with variable i (RS6000)

2019-04-25 Thread Kelvin Nilsen
The attached patch resolves the issue described in this problem report.  The 
patch has been bootstrapped and tested without regressions on 
powerpc64le-unknown-linux-gnu (both P8 and P9) and on powerpc64-linux (P7 and 
P8, both -m32 and -m64).

Is this ok for trunk and backports?


Thanks.

gcc/ChangeLog:

2019-04-25  Kelvin Nilsen  

PR target/89424
* config/rs6000/rs6000.c (rs6000_expand_vector_extract): Add
handling of V1TImode.

gcc/testsuite/ChangeLog:

2019-04-25  Kelvin Nilsen  

PR target/89424
* gcc.target/powerpc/pr89424-0.c: New test.
* gcc.target/powerpc/vsx-builtin-13a.c: Define macro PR89424 to
enable testing of newly patched capability.
* gcc.target/powerpc/vsx-builtin-13b.c: Likewise.
* gcc.target/powerpc/vsx-builtin-20a.c: Likewise.
* gcc.target/powerpc/vsx-builtin-20b.c: Likewise.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 270513)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6944,6 +6944,10 @@ rs6000_expand_vector_extract (rtx target, rtx vec,
 
   switch (mode)
{
+   case E_V1TImode:
+ emit_move_insn (target, gen_lowpart (TImode, vec));
+ return;
+
case E_V2DFmode:
  emit_insn (gen_vsx_extract_v2df_var (target, vec, elt));
  return;
Index: gcc/testsuite/gcc.target/powerpc/pr89424-0.c
===
--- gcc/testsuite/gcc.target/powerpc/pr89424-0.c(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr89424-0.c(working copy)
@@ -0,0 +1,78 @@
+/* { dg-do run { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { } } */
+/* { dg-options "-mvsx" } */
+
+/* This test should run the same on any target that supports vsx
+   instructions.  Intentionally not specifying cpu in order to test
+   all code generation paths.  */
+
+#include <altivec.h>
+
+extern void abort (void);
+
+/* Define PR89626 after that pr is addressed.  */
+#ifdef PR89626
+#define SIGNED
+#else
+#define SIGNED signed
+#endif
+
+#define CONST0 (((__int128) 31415926539) << 60)
+
+/* Test that indices > length of vector are applied modulo the vector
+   length.  */
+
+
+/* Test for variable selector and vector residing in register.  */
+__attribute__((noinline))
+__int128 ei (vector SIGNED __int128 v, int i)
+{
+  return __builtin_vec_ext_v1ti (v, i);
+}
+
+/* Test for variable selector and vector residing in memory.  */
+__int128 mei (vector SIGNED __int128 *vp, int i)
+{
+  return __builtin_vec_ext_v1ti (*vp, i);
+}
+
+int main (int argc, char *argv[]) {
+  vector SIGNED __int128 dv = { CONST0 };
+  __int128 d;
+
+  d = ei (dv, 0);
+  if (d != CONST0)
+abort ();
+
+  d = ei (dv, 1);
+  if (d != CONST0)
+abort ();
+
+  d = ei (dv, 2);
+  if (d != CONST0)
+abort ();
+
+  d = ei (dv, 3);
+  if (d != CONST0)
+abort ();
+
+  d = mei (&dv, 0);
+  if (d != CONST0)
+abort ();
+
+  d = mei (&dv, 1);
+  if (d != CONST0)
+abort ();
+
+  d = mei (&dv, 2);
+  if (d != CONST0)
+abort ();
+
+  d = mei (&dv, 3);
+  if (d != CONST0)
+abort ();
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/vsx-builtin-13a.c
===
--- gcc/testsuite/gcc.target/powerpc/vsx-builtin-13a.c  (revision 270513)
+++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-13a.c  (working copy)
@@ -9,7 +9,7 @@
 #include 
 
 /* Define this after PR89424 is addressed.  */
-#undef PR89424
+#define PR89424
 
 /* Define this after PR89626 is addressed.  */
 #undef PR89626
Index: gcc/testsuite/gcc.target/powerpc/vsx-builtin-13b.c
===
--- gcc/testsuite/gcc.target/powerpc/vsx-builtin-13b.c  (revision 270513)
+++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-13b.c  (working copy)
@@ -9,7 +9,7 @@
 #include 
 
 /* Define this after PR89424 is addressed.  */
-#undef PR89424
+#define PR89424
 
 /* Define this after PR89626 is addressed.  */
 #undef PR89626
Index: gcc/testsuite/gcc.target/powerpc/vsx-builtin-20a.c
===
--- gcc/testsuite/gcc.target/powerpc/vsx-builtin-20a.c  (revision 270513)
+++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-20a.c  (working copy)
@@ -9,7 +9,7 @@
 #include 
 
 /* Define this after PR89424 is addressed.  */
-#undef PR89424
+#define PR89424
 
 extern void abort (void);
 
Index: gcc/testsuite/gcc.target/powerpc/vsx-builtin-20b.c
===
--- gcc/testsuite/gcc.target/powerpc/vsx-builtin-20b.c  (revision 270513)
+++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-20b.c  (working copy)
@

[PATCH, rs6000] PR87532: Bad results from vec_extract (unsigned char, foo) dependent upon function inline

2019-04-09 Thread Kelvin Nilsen
A patch to address this problem report was committed on 3/15/2019.  Some of the 
new regression tests submitted with that initial patch failed on P8 big-endian 
and on P9 little-endian.

This new patch addresses the code generation problems that were uncovered by 
these failing tests.  Additionally, this new patch corrects some of the 
expected instruction counts for certain previously existing regression tests on 
certain targets to adjust for changes in the generated code.

This new patch has been bootstrapped and tested without regressions on 
powerpcle-unknown-linux (both P8 and P9) and on powerpc-linux (P7 and P8, both 
-m32 and -m64).

Is this ok for trunk and backports?

Thanks.

gcc/ChangeLog:

2019-04-09  Kelvin Nilsen  

PR target/87532
* config/rs6000/rs6000.c (rs6000_split_vec_extract_var): Use inner
mode of vector rather than mode of destination for move instruction.
* config/rs6000/vsx.md (*vsx_extract__mode_var):
Use QI inner mode with V16QI vector mode.

gcc/testsuite/ChangeLog:

2019-04-09  Kelvin Nilsen  

PR target/87532
* gcc.target/powerpc/fold-vec-extract-char.p8.c: Adjust expected
instruction counts.
* gcc.target/powerpc/fold-vec-extract-int.p8.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-short.p8.c: Likewise.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 270127)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -7167,7 +7167,7 @@
  rtx tmp_altivec)
 {
   machine_mode mode = GET_MODE (src);
-  machine_mode scalar_mode = GET_MODE (dest);
+  machine_mode scalar_mode = GET_MODE_INNER (GET_MODE (src));
   unsigned scalar_size = GET_MODE_SIZE (scalar_mode);
   int byte_shift = exact_log2 (scalar_size);
 
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 270127)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -3739,9 +3739,9 @@
   DONE;
 })
 
-(define_insn_and_split "*vsx_extract___var"
-  [(set (match_operand:SDI 0 "gpc_reg_operand" "=r,r,r")
-   (zero_extend:SDI
+(define_insn_and_split "*vsx_extract__mode_var"
+  [(set (match_operand: 0 "gpc_reg_operand" "=r,r,r")
+   (zero_extend:
 (unspec:
  [(match_operand:VSX_EXTRACT_I 1 "input_operand" "wK,v,m")
   (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
@@ -3753,7 +3753,7 @@
   "&& reload_completed"
   [(const_int 0)]
 {
-  machine_mode smode = mode;
+  machine_mode smode = mode;
   rs6000_split_vec_extract_var (gen_rtx_REG (smode, REGNO (operands[0])),
operands[1], operands[2],
operands[3], operands[4]);
Index: gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p8.c
===
--- gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p8.c (revision 270127)
+++ gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p8.c (working copy)
@@ -6,9 +6,9 @@
 /* { dg-options "-mdejagnu-cpu=power8 -O2" } */
 
 // six tests total. Targeting P8LE / P8BE.
-// P8 LE variable offset: rldicl, subfic, sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, (extsb)
+// P8 LE variable offset: rldicl, subfic, sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, rlwin, (extsb)
 // P8 LE constant offset: vspltb, mfvsrd, rlwinm, (extsb)
-// P8 BE variable offset: sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, (extsb)
+// P8 BE variable offset: sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, rlwinm, (extsb)
 // P8 BE constant offset: vspltb, mfvsrd, rlwinm, (extsb)
 
 /* { dg-final { scan-assembler-times {\mrldicl\M} 3 { target { le } } } } */
@@ -21,7 +21,7 @@
 /* { dg-final { scan-assembler-times {\msrdi\M} 3 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "extsb" 2 } } */
 /* { dg-final { scan-assembler-times {\mvspltb\M} 3 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mrlwinm\M} 2 { target lp64} } } */
+/* { dg-final { scan-assembler-times {\mrlwinm\M} 4 { target lp64} } } */
 
 /* multiple codegen variations for -m32. */
 /* { dg-final { scan-assembler-times {\mrlwinm\M} 3 { target ilp32} } } */
Index: gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c
===
--- gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c  (revision 270127)
+++ gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c  (working copy)
@@ -7,14 +7,14 @@
 
 // Targeting P8 (LE) and (BE).  6 tests total.
 // P8 LE constant:  vspltw, mfvsrwz, (1:extsw/2:rldicl)
-// P8 LE variables: rldicl, subfic,  sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, (1:extsw)
+// P8 LE variables: 

[PATCH, rs6000] PR89732: New test pr87532-mc.c fails on compiler not defaulting to VSX

2019-03-19 Thread Kelvin Nilsen
A recently added test was observed to fail when compiled without the -mvsx 
option.  This patch adds -mvsx to the dg-options directive.

Was bootstrapped and regression tested on powerpc-linux (P7 big-endian, both 
-m32 and -m64).

Was preapproved by seg...@gcc.gnu.org and has been merged with trunk.

gcc/testsuite/ChangeLog:

2019-03-19  Kelvin Nilsen  

PR target/89736
* gcc.target/powerpc/pr87532-mc.c: Modify dejagnu directives to
restrict this test to vsx targets.

Index: gcc/testsuite/gcc.target/powerpc/pr87532-mc.c
===
--- gcc/testsuite/gcc.target/powerpc/pr87532-mc.c   (revision 269782)
+++ gcc/testsuite/gcc.target/powerpc/pr87532-mc.c   (working copy)
@@ -1,8 +1,8 @@
 /* { dg-do run { target int128 } } */
-/* { dg-require-effective-target vmx_hw } */
-/* { dg-options "-maltivec -O2" } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-mvsx -O2" } */
 
-/* This test should run the same on any target that supports altivec/dfp
+/* This test should run the same on any target that supports vsx
instructions.  Intentionally not specifying cpu in order to test
all code generation paths.  */
 



[PATCH v2, rs6000] PR87532: Bad Results from vec_extract(unsigned char, foo) dependent upon function inline

2019-03-13 Thread Kelvin Nilsen


An initial draft patch was distributed on 3/8/19.  Thanks Segher for careful 
review and detailed feedback.  This second draft patch differs from the first 
in the following regards:

  1. Simplified dg directives in the new test cases:
 a) Removed { target { powerpc*-*-* } } from dg-do run directives because 
this is redundant with powerpc.exp
 b) Removed { dg-skip-if "" { powerpc*-*-darwin* } } directives because 
this is redundant with requiring vsx or altivec
 c) Changed effective target requirement from dfp_hw to vmx_hw
 d) Required { target { int128 } } instead of lp64 for tests that require 
int128 support
 e) Removed dg-skip-if for -mcpu= overrides, because these tests are not 
setting cpu
 f) Removed "-save-temps -dp -g" from the dg-options directive on certain 
tests
  2. Corrected certain __asm__ statements to require the "v" output constraint 
rather than the "wa" output constraint (when compiling with -maltivec)
  3. In rs6000-c.c, made modular computation of constant selector expression 
unconditional
  4. In rs6000.c:
 a) Changed computation of bits_in_element to use GET_MODE_INNER macro.
 b) Changed error message in case of selector expression overflow to not 
make reference to HOST_WIDE_INT


This problem report, though initially motivated by differences in behavior 
between constant and non-constant selector arguments, uncovered a number of 
inconsistencies in the implementation of vec_extract.

This patch provides several fixes to make handling of constant selector 
expressions the same as the handling of non-constant selector expressions.  In 
the process of testing, it was observed that certain existing regression tests 
were looking for the wrong instructions to be emitted and those tests have been 
updated.
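
The selector rule the patch settles on (a selector larger than the vector length is applied modulo the length) can be sketched in portable C.  This is only a model: wrap_selector and extract_v8hi_model are hypothetical names, a plain array stands in for a vector of eight shorts, and element ordering or endianness is not modeled.

```c
/* Hypothetical helper, not part of GCC: reduce a selector modulo the
   number of elements in the vector.  */
unsigned
wrap_selector (unsigned selector, unsigned n_elts)
{
  return selector % n_elts;
}

/* Portable stand-in for extracting one element from a vector of
   eight shorts.  The array only models the indexing rule.  */
short
extract_v8hi_model (const short v[8], unsigned selector)
{
  return v[wrap_selector (selector, 8)];
}
```

Under this model, selectors 21 and 30 on an eight-element vector pick elements 5 and 6, whether the selector is a compile-time constant or a run-time value.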

This has bootstrapped and tested without regressions on 
powerpc64le-unknown-linux (both P8 and P9) and on powerpc-linux (P7 big-endian, 
with both -m32 and -m64 target options).

Is this ok for trunk?

gcc/ChangeLog:

2019-03-13  Kelvin Nilsen  

PR target/87532
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
When handling vec_extract, use modular arithmetic to allow
constant selectors greater than vector length.
* config/rs6000/rs6000.c (rs6000_expand_vector_extract): Allow
V1TImode vectors to have constant selector values greater than 0.
Use modular arithmetic to compute vector index.
(rs6000_split_vec_extract_var): Use modular arithmetic to compute
index for in-memory vectors.  Correct code generation for
in-register vectors.
(altivec_expand_vec_ext_builtin): Use modular arithmetic to
compute index.

gcc/testsuite/ChangeLog:

2019-03-13  Kelvin Nilsen  

PR target/87532
* gcc.target/powerpc/fold-vec-extract-char.p8.c: Modify expected
instruction selection.
* gcc.target/powerpc/fold-vec-extract-int.p8.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-short.p8.c: Likewise.
* gcc.target/powerpc/pr87532-mc.c: New test.
* gcc.target/powerpc/pr87532.c: New test.
* gcc.target/powerpc/vec-extract-v16qiu-v2.h: New test.
* gcc.target/powerpc/vec-extract-v16qiu-v2a.c: New test.
* gcc.target/powerpc/vec-extract-v16qiu-v2b.c: New test.
* gcc.target/powerpc/vsx-builtin-10a.c: New test.
* gcc.target/powerpc/vsx-builtin-10b.c: New test.
* gcc.target/powerpc/vsx-builtin-11a.c: New test.
* gcc.target/powerpc/vsx-builtin-11b.c: New test.
* gcc.target/powerpc/vsx-builtin-12a.c: New test.
* gcc.target/powerpc/vsx-builtin-12b.c: New test.
* gcc.target/powerpc/vsx-builtin-13a.c: New test.
* gcc.target/powerpc/vsx-builtin-13b.c: New test.
* gcc.target/powerpc/vsx-builtin-14a.c: New test.
* gcc.target/powerpc/vsx-builtin-14b.c: New test.
* gcc.target/powerpc/vsx-builtin-15a.c: New test.
* gcc.target/powerpc/vsx-builtin-15b.c: New test.
* gcc.target/powerpc/vsx-builtin-16a.c: New test.
* gcc.target/powerpc/vsx-builtin-16b.c: New test.
* gcc.target/powerpc/vsx-builtin-17a.c: New test.
* gcc.target/powerpc/vsx-builtin-17b.c: New test.
* gcc.target/powerpc/vsx-builtin-18a.c: New test.
* gcc.target/powerpc/vsx-builtin-18b.c: New test.
* gcc.target/powerpc/vsx-builtin-19a.c: New test.
* gcc.target/powerpc/vsx-builtin-19b.c: New test.
* gcc.target/powerpc/vsx-builtin-20a.c: New test.
* gcc.target/powerpc/vsx-builtin-20b.c: New test.
* gcc.target/powerpc/vsx-builtin-9a.c: New test.
* gcc.target/powerpc/vsx-builtin-9b.c: New test.

Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c(revision 269492)
+++ gcc/config/rs6000/rs6000-c.c   

[PATCH, rs6000] PR87532: Bad Results from vec_extract(unsigned char, foo) dependent upon function inline

2019-03-08 Thread Kelvin Nilsen
This problem report, though initially motivated by differences in behavior 
between constant and non-constant selector arguments, uncovered a number of 
inconsistencies in the implementation of vec_extract.

This patch provides several fixes to make handling of constant selector 
expressions the same as the handling of non-constant selector expressions.  In 
the process of testing, it was observed that certain existing regression tests 
were looking for the wrong instructions to be emitted and those tests have been 
updated.

This has bootstrapped and tested without regressions on 
powerpc64le-unknown-linux (both P8 and P9) and on powerpc-linux (P7 big-endian, 
with both -m32 and -m64 target options).

Is this ok for trunk?

gcc/ChangeLog:

2019-03-08  Kelvin Nilsen  

PR target/87532
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
When handling vec_extract, use modular arithmetic to allow
constant selectors greater than vector length.
* config/rs6000/rs6000.c (rs6000_expand_vector_extract): Allow
V1TImode vectors to have constant selector values greater than 0.
Use modular arithmetic to compute vector index.
(rs6000_split_vec_extract_var): Use modular arithmetic to compute
index for in-memory vectors.  Correct code generation for
in-register vectors.
(altivec_expand_vec_ext_builtin): Use modular arithmetic to
compute index.

gcc/testsuite/ChangeLog:

2019-03-08  Kelvin Nilsen  

PR target/87532
* gcc.target/powerpc/vsx-builtin-10a.c: New test.
* gcc.target/powerpc/vsx-builtin-20a.c: New test.
* gcc.target/powerpc/vsx-builtin-11b.c: New test.
* gcc.target/powerpc/vsx-builtin-9b.c: New test.
* gcc.target/powerpc/vsx-builtin-12a.c: New test.
* gcc.target/powerpc/vsx-builtin-13b.c: New test.
* gcc.target/powerpc/vsx-builtin-14a.c: New test.
* gcc.target/powerpc/vec-extract-v16qiu-v2a.c: New test.
* gcc.target/powerpc/vsx-builtin-15b.c: New test.
* gcc.target/powerpc/vsx-builtin-16a.c: New test.
* gcc.target/powerpc/vsx-builtin-17b.c: New test.
* gcc.target/powerpc/vsx-builtin-18a.c: New test.
* gcc.target/powerpc/pr87532-mc.c: New test.
* gcc.target/powerpc/vsx-builtin-19b.c: New test.
* gcc.target/powerpc/vsx-builtin-10b.c: New test.
* gcc.target/powerpc/vsx-builtin-11a.c: New test.
* gcc.target/powerpc/vsx-builtin-9a.c: New test.
* gcc.target/powerpc/vsx-builtin-20b.c: New test.
* gcc.target/powerpc/vsx-builtin-12b.c: New test.
* gcc.target/powerpc/vsx-builtin-13a.c: New test.
* gcc.target/powerpc/vsx-builtin-14b.c: New test.
* gcc.target/powerpc/vsx-builtin-15a.c: New test.
* gcc.target/powerpc/vec-extract-v16qiu-v2b.c: New test.
* gcc.target/powerpc/pr87532.c: New test.
* gcc.target/powerpc/vsx-builtin-16b.c: New test.
* gcc.target/powerpc/vec-extract-v16qiu-v2.h: New test.
* gcc.target/powerpc/vsx-builtin-17a.c: New test.
* gcc.target/powerpc/vsx-builtin-18b.c: New test.
* gcc.target/powerpc/vsx-builtin-19a.c: New test.
* gcc.target/powerpc/fold-vec-extract-char.p8.c: Modify expected
instruction selection.
* gcc.target/powerpc/fold-vec-extract-short.p8.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-int.p8.c: Likewise.

Index: gcc/testsuite/gcc.target/powerpc/vsx-builtin-10a.c
===
--- gcc/testsuite/gcc.target/powerpc/vsx-builtin-10a.c  (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-10a.c  (revision 0)
@@ -0,0 +1,157 @@
+/* { dg-do run { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target dfp_hw } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { } } */
+/* { dg-options "-maltivec" } */
+
+/* This test should run the same on any target that supports altivec/dfp
+   instructions.  Intentionally not specifying cpu in order to test
+   all code generation paths.  */
+
+#include 
+
+extern void abort (void);
+
+#define CONST0 (0)
+#define CONST1 (1)
+#define CONST2 (2)
+#define CONST3 (3)
+#define CONST4 (4)
+#define CONST5 (5)
+#define CONST6 (6)
+#define CONST7 (7)
+
+
+/* Test that indices > length of vector are applied modulo the vector
+   length.  */
+
+/* Test for vector residing in register.  */
+short s3 (vector short v)
+{
+  return __builtin_vec_ext_v8hi (v, 3);
+}
+
+short s7 (vector short v)
+{
+  return __builtin_vec_ext_v8hi (v, 7);
+}
+
+short s21 (vector short v)
+{
+  return __builtin_vec_ext_v8hi (v, 21);
+}
+
+short s30 (vector short v)
+{
+  return __builtin_vec_ext_v8hi (v, 30);
+}
+
+/* Test for vector residing in memory.  */

[PATCH, rs6000] Correct dg directives on recently added vec-extract tests

2019-02-01 Thread Kelvin Nilsen


Overnight regression testing revealed a portability problem with several 
recently installed tests.  The tests were observed to fail on a power7 test 
platform.

The tests, which are intended to execute, are compiled with -mcpu=power8.  
Thus, they require power 8 hardware.

I have regression tested this on powerpc64-linux (P7 big-endian, both -m32 and 
-m64).  Is this ok for trunk and for various backports 
to which the original patch is to be directed?

gcc/testsuite/ChangeLog:

2019-02-01  Kelvin Nilsen  

* gcc.target/powerpc/vec-extract-slong-1.c: Require p8 execution
hardware.
* gcc.target/powerpc/vec-extract-schar-1.c: Likewise.
* gcc.target/powerpc/vec-extract-sint128-1.c: Likewise.
* gcc.target/powerpc/vec-extract-sshort-1.c: Likewise.
* gcc.target/powerpc/vec-extract-ulong-1.c: Likewise.
* gcc.target/powerpc/vec-extract-uchar-1.c: Likewise.
* gcc.target/powerpc/vec-extract-sint-1.c: Likewise.
* gcc.target/powerpc/vec-extract-uint128-1.c: Likewise.
* gcc.target/powerpc/vec-extract-ushort-1.c: Likewise.
* gcc.target/powerpc/vec-extract-uint-1.c: Likewise.

Index: gcc/testsuite/gcc.target/powerpc/vec-extract-slong-1.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-slong-1.c  (revision 268424)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-slong-1.c  (working copy)
@@ -2,7 +2,7 @@
signed longs remains signed.  */
 /* { dg-do run } */
 /* { dg-options "-ansi -mcpu=power8 " } */
-/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target p8vector_hw } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
 
 #include 
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-schar-1.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-schar-1.c  (revision 268424)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-schar-1.c  (working copy)
@@ -2,7 +2,7 @@
signed chars remains signed.  */
 /* { dg-do run } */
 /* { dg-options "-ansi -mcpu=power8 " } */
-/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target p8vector_hw } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
 
 #include 
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-sint128-1.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-sint128-1.c(revision 268424)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-sint128-1.c(working copy)
@@ -2,7 +2,7 @@
signed __int128s remains signed.  */
 /* { dg-do run } */
 /* { dg-options "-ansi -mcpu=power8 " } */
-/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target p8vector_hw } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
 
 #include 
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-sshort-1.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-sshort-1.c (revision 268424)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-sshort-1.c (working copy)
@@ -2,7 +2,7 @@
signed shorts remains signed.  */
 /* { dg-do run } */
 /* { dg-options "-ansi -mcpu=power8 " } */
-/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target p8vector_hw } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
 
 #include 
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-ulong-1.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-ulong-1.c  (revision 268424)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-ulong-1.c  (working copy)
@@ -2,7 +2,7 @@
unsigned longs remains unsigned.  */
 /* { dg-do run } */
 /* { dg-options "-ansi -mcpu=power8 " } */
-/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target p8vector_hw } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
 
 #include 
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-uchar-1.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-uchar-1.c  (revision 268424)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-uchar-1.c  (working copy)
@@ -2,7 +2,7 @@
unsigned chars remains unsigned.  */
 /* { dg-do run } */
 /* { dg-options "-ansi -mcpu=power8 " } */
-/* { dg-requir

[PATCH, rs6000] Fix invalid type returned from builtin vec_extract

2019-01-28 Thread Kelvin Nilsen
An error in the type returned from the built-in vec_extract function was 
recently reported, as represented in the following sample program:

#include <stdio.h>
#include <altivec.h>

int main() {
unsigned char uc = 0xf6;
printf("explicit cast: %x\n", (int)uc);

vector unsigned char v = vec_splats((unsigned char)0xf6);
printf("cast from vec_extract(): %x\n", (int)vec_extract(v, 0));
return 0;
}

When compiled with the current trunk, the output of running this program is:

$ ./a.out 
explicit cast: f6
cast from vec_extract(): fff6

The fix is to coerce the result of vec_extract so that it matches the type of 
the array element supplied as its first argument.
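
The effect of that coercion can be modeled in portable C.  This is a sketch under stated assumptions, not the GCC change itself: raw_extract_model and coerce_to_element_type are hypothetical names, and the model assumes the uncoerced result behaves like a sign-extended copy of the element byte.

```c
/* Hypothetical model, not GCC code: stands in for an extract whose
   result arrives sign-extended in a wider register.  */
long long
raw_extract_model (unsigned char elt)
{
  /* Sign-extend by hand so the result does not depend on
     implementation-defined narrowing conversions.  */
  return elt >= 128 ? (long long) elt - 256 : (long long) elt;
}

/* Coerce the wide result back to the vector element type.  Conversion
   to an unsigned type is well defined: it reduces modulo 256.  */
unsigned char
coerce_to_element_type (long long raw)
{
  return (unsigned char) raw;
}
```

With the coercion in place, casting the extracted element to int yields the same value as casting the original unsigned char; without it, the wide value is negative.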

I have built and regression tested this patch on powerpc64le-unknown-linux with 
no regressions.  Is this ok for trunk?

gcc/ChangeLog:

2019-01-28  Kelvin Nilsen  

* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
Change handling of ALTIVEC_BUILTIN_VEC_EXTRACT.  Coerce result to
type of vector element when vec_extract is implemented by direct
move.

gcc/testsuite/ChangeLog:

2019-01-28  Kelvin Nilsen  

* gcc.target/powerpc/vec-extract-schar-1.c: New test.
* gcc.target/powerpc/vec-extract-sint-1.c: New test.
* gcc.target/powerpc/vec-extract-sint128-1.c: New test.
* gcc.target/powerpc/vec-extract-slong-1.c: New test.
* gcc.target/powerpc/vec-extract-sshort-1.c: New test.
* gcc.target/powerpc/vec-extract-uchar-1.c: New test.
* gcc.target/powerpc/vec-extract-uint-1.c: New test.
* gcc.target/powerpc/vec-extract-uint128-1.c: New test.
* gcc.target/powerpc/vec-extract-ulong-1.c: New test.
* gcc.target/powerpc/vec-extract-ushort-1.c: New test.

Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c(revision 268196)
+++ gcc/config/rs6000/rs6000-c.c(working copy)
@@ -6645,7 +6645,13 @@
}
 
  if (call)
-   return build_call_expr (call, 2, arg1, arg2);
+   {
+ tree result = build_call_expr (call, 2, arg1, arg2);
+ /* Coerce the result to vector element type.  May be no-op.  */
+ arg1_inner_type = TREE_TYPE (arg1_type);
+ result = fold_convert (arg1_inner_type, result);
+ return result;
+   }
}
 
   /* Build *(((arg1_inner_type*)&(vector type){arg1})+arg2). */
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-schar-1.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-schar-1.c  (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-schar-1.c  (working copy)
@@ -0,0 +1,27 @@
+/* Test to verify that the vec_extract from a vector of
+   signed chars remains signed.  */
+/* { dg-do run } */
+/* { dg-options "-ansi -mcpu=power8 " } */
+
+#include 
+#include 
+#include 
+
+int test1(signed char sc) {
+  int sce;
+
+  vector signed char v = vec_splats(sc);
+  sce = vec_extract(v,0);
+
+  if (sce != sc)
+abort();
+  return 0;
+}
+
+int main()
+{
+  test1 (0xf6);
+  test1 (0x76);
+  test1 (0x06);
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-sint-1.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-sint-1.c   (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-sint-1.c   (working copy)
@@ -0,0 +1,27 @@
+/* Test to verify that the vec_extract from a vector of
+   signed ints remains signed.  */
+/* { dg-do run } */
+/* { dg-options "-ansi -mcpu=power8 " } */
+
+#include 
+#include 
+#include 
+
+int test1(signed int si) {
+  long long int sie;
+
+  vector signed int v = vec_splats(si);
+  sie = vec_extract(v,0);
+
+  if (sie != si)
+abort();
+  return 0;
+}
+
+int main()
+{
+  test1 (0xf600);
+  test1 (0x7600);
+  test1 (0x0600);
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-sint128-1.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-sint128-1.c(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-sint128-1.c(working copy)
@@ -0,0 +1,25 @@
+/* Test to verify that the vec_extract from a vector of
+   signed __int128s remains signed.  */
+/* { dg-do run } */
+/* { dg-options "-ansi -mcpu=power8 " } */
+
+#include 
+#include 
+#include 
+
+int test1(signed __int128 st) {
+
+  vector signed long long int v = vec_splats(st);
+
+  if (vec_extract (v, 0) > st)
+abort();
+  return 0;
+}
+
+int main()
+{
+  test1 (((__int128) 0xf600LL) << 64);
+  test1 (((__int128) 0x7600LL) << 64);
+  test1 (((__int128) 0x0600LL) << 64);
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-slong-1.c
==

[PATCH, rs6000] Replace X-form addressing with D-form addressing in new pass for Power9

2018-12-13 Thread Kelvin Nilsen


This patch is a refinement of a patch first submitted to this list on Nov. 10, 
2018.  This new patch incorporates improvements suggested by 
seg...@gcc.gnu.org.  Two regressions observed at the time this patch was 
previously distributed have been resolved as described here: 
https://sourceware.org/bugzilla/show_bug.cgi?id=23937

New D-form instructions available on Power9 introduce new code generation 
options that result in more efficient execution.

This new pass scans existing rtl expressions and replaces them with rtl 
expressions that favor selection of the D-form instructions in contexts for 
which the D-form instructions are preferred.  The new pass runs after the RTL 
loop optimizations since loop unrolling often introduces opportunities for 
beneficial replacements of X-form addressing instructions.
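
As a rough illustration of the kind of source this targets, the hand-unrolled loop below reads four elements per iteration at constant offsets from one moving base.  The function name sum4 is hypothetical, and no claim is made about what code GCC emits for it; it only shows the access pattern (same base, constant displacements) where base-plus-displacement addressing is a natural fit.

```c
/* Illustrative only: sums an array four elements at a time.  The
   accesses a[i], a[i + 1], a[i + 2] and a[i + 3] share a base and
   differ by constant displacements.  */
long
sum4 (const long *a, int n)
{
  long s = 0;
  int i;

  /* Main loop: four loads per iteration.  */
  for (i = 0; i + 3 < n; i += 4)
    s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];

  /* Remainder loop for the last n % 4 elements.  */
  for (; i < n; i++)
    s += a[i];

  return s;
}
```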

I have built and regression tested this patch on powerpc64le-unknown-linux 
(Power9) target with no regressions.

Is this ok for trunk?

gcc/ChangeLog:

2018-12-13  Kelvin Nilsen  

* config/rs6000/rs6000-p9dform.c: New file.
* config/rs6000/rs6000-passes.def: Add pass_insert_dform after
pass_loop2.
* config/rs6000/rs6000-protos.h
(rs6000_target_supports_dform_offset_p): New prototype.
(make_pass_insert_dform): New prototype.
* config/rs6000/rs6000.c (rs6000_target_supports_dform_offset_p):
New function.
* config/rs6000/t-rs6000: Add entry to compile rs6000-p9dform.c.
* config.gcc: Add entry to link new object file rs6000-p9dform.o.

gcc/testsuite/ChangeLog:

2018-12-13  Kelvin Nilsen  

* gcc.target/powerpc/p9-dform-0.c: New test.
* gcc.target/powerpc/p9-dform-1.c: New test.

Index: gcc/config/rs6000/rs6000-p9dform.c
===
--- gcc/config/rs6000/rs6000-p9dform.c  (nonexistent)
+++ gcc/config/rs6000/rs6000-p9dform.c  (working copy)
@@ -0,0 +1,1487 @@
+/* Subroutines used to transform array subscripting expressions into
+   forms that are more amenable to d-form instruction selection for p9
+   little-endian VSX code.
+   Copyright (C) 1991-2018 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "df.h"
+#include "tm_p.h"
+#include "ira.h"
+#include "print-tree.h"
+#include "varasm.h"
+#include "explow.h"
+#include "expr.h"
+#include "output.h"
+#include "tree-pass.h"
+#include "rtx-vector-builder.h"
+#include "cfgloop.h"
+
+#include "insn-config.h"
+#include "recog.h"
+
+#include "print-rtl.h"
+#include "tree-pretty-print.h"
+
+#include "genrtl.h"
+
+/* This pass transforms array indexing expressions from a form that
+   favors selection of X-form instructions into a form that favors
+   selection of D-form instructions.
+
+   Showing favor for D-form instructions is especially important when
+   targeting Power9, as the Power9 architecture added a number of new
+   D-form instruction capabilities.
+
+   Consider, for example, the following loop, excerpted from an actual
+   program:
+
+double sacc, x[], y[], z[];
+sacc = 0.00;
+for (unsigned long long int i = 0; i < N; i++) {
+  z[i] = x[i] * y[i];
+  sacc += z[i];
+}
+
+   Compile this program with the following gcc options which enable both
+   vectorization and loop unrolling:
+-m64 -fdump-rtl-all-details -mcpu=power9 -mtune=power9 -funroll-loops -O3
+
+   Without this pass, this loop is represented by the following:
+
+   lxvx:  16
+   addi:   8
+   xvmuldp:8
+   stxvx:  8
+   fmr:8
+   xxpermdi:   8
+   fadd:  16
+   bdnz:   1
+ ___
+ total:   73 instructions
+
+.L3:
+   lxvx 0,29,11
+   lxvx 12,30,11
+   addi 12,11,16
+   addi 0,11,48
+   addi 5,11,64
+   addi 9,11,32
+   addi 6,11,80
+   addi 7,11,96
+   addi 8,11,112
+   lxvx 2,29,12
+   lxvx 3,30,12
+   

[RFC][PATCH, rs6000] Replace X-form addressing with D-form addressing in new pass for Power9

2018-11-10 Thread Kelvin Nilsen


New D-form instructions available on Power9 introduce new code generation 
options that result in more efficient execution.

This new pass scans existing rtl expressions and replaces them with rtl 
expressions that favor selection of the D-form instructions in contexts for 
which the D-form instructions are preferred.

I have built and regression tested this patch on powerpc64le-unknown-linux 
(Power9) target with only two regressions.

Both regressions relate to resolution of ifuncs, and I have determined that the 
TOC pointer upon entry into the resolver functions is not valid.  I have not 
yet determined why this is happening, though I have observed that the same 
problem seems to occur with certain other versions of the compiler that 
predate my patched trunk.  The two failures are:

FAIL: gcc.dg/attr-ifunc-4.c execution test
FAIL: gcc.dg/ipa/ipa-pta-19.c execution test


I invite comments and suggestions regarding this draft patch at this time.

gcc/ChangeLog:

2018-11-10  Kelvin Nilsen  

* config.gcc: Add entry to compile new object rs6000-p9indexing.o.
* config/rs6000/rs6000-passes.def: Add pass_fix_indexing after
pass_loop2.
* config/rs6000/t-rs6000: Add entry to compile rs6000-p9indexing.c.
* config/rs6000/rs6000.c (rs6000_target_supports_dform_offset_p):
New function.
* config/rs6000/rs6000-protos.h
(rs6000_target_supports_dform_offset_p): New prototype.
(make_pass_fix_indexing): New prototype.
* config/rs6000/rs6000-p9indexing.c: New file.

Index: gcc/config/rs6000/t-rs6000
===
--- gcc/config/rs6000/t-rs6000  (revision 263589)
+++ gcc/config/rs6000/t-rs6000  (working copy)
@@ -35,6 +35,10 @@
$(COMPILE) $<
$(POSTCOMPILE)
 
+rs6000-p9indexing.o: $(srcdir)/config/rs6000/rs6000-p9indexing.c
+   $(COMPILE) $<
+   $(POSTCOMPILE)
+
 $(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \
   $(srcdir)/config/rs6000/rs6000-cpus.def
$(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \
Index: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h   (revision 263589)
+++ gcc/config/rs6000/rs6000-protos.h   (working copy)
@@ -47,6 +47,8 @@
 extern bool legitimate_indirect_address_p (rtx, int);
 extern bool legitimate_indexed_address_p (rtx, int);
 extern bool avoiding_indexed_address_p (machine_mode);
+extern bool rs6000_target_supports_dform_offset_p (bool, machine_mode,
+  HOST_WIDE_INT);
 
 extern rtx rs6000_got_register (rtx);
 extern rtx find_addr_reg (rtx);
@@ -244,6 +246,8 @@
 class rtl_opt_pass;
 
 extern rtl_opt_pass *make_pass_analyze_swaps (gcc::context *);
+extern rtl_opt_pass *make_pass_fix_indexing (gcc::context *);
+
 extern bool rs6000_sum_of_two_registers_p (const_rtx expr);
 extern bool rs6000_quadword_masked_address_p (const_rtx exp);
 extern rtx rs6000_gen_lvx (enum machine_mode, rtx, rtx);
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 263589)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -9263,6 +9263,169 @@
   return ret;
 }
 
+/* This function provides an approximation of which d-form addressing
+   expressions are valid on any given target configuration.  This
+   approximation guides optimization choices.  Secondary validation
+   of the addressing mode is performed before code generation.
+
+   Return true iff target has instructions to perform a memory
+   operation at the specified BYTE_OFFSET from an address held
+   in a general purpose register.  if IS_STORE is true, test for
+   availability of a store instruction.  Otherwise, test for
+   availability of a load instruction.  */
+bool
+rs6000_target_supports_dform_offset_p (bool is_store __attribute__((unused)),
+  machine_mode mode,
+  HOST_WIDE_INT byte_offset)
+{
+  const int max_16bit_signed = (0x7fff);
+  const int min_16bit_signed = -1 - max_16bit_signed;
+
+  /* available d-form instructions with P1 (the original Power architecture):
+
+ lbz RT,D(RA) - load byte and zero d-form
+ lhz RT,D(RA) - load half word and zero d-form
+ lha RT,D(RA) - load half word algebraic d-form
+ lwz RT,D(RA) - load word and zero d-form
+ lfs FRT,D(RA) - load floating-point single d-form
+ lfd FRT,D(RA) - load floating-point double d-form
+
+ stb RS,D(RA) - store byte d-form
+ sth RS,D(RA) - store half word d-form
+ stfs FRS,D(RA) - store floating point single d-form
+ stfd FRS,D(RA) - store floating point double d-form
+  */
+
+  /* available d-form instructions with PPC (prior to v2.00):
+ (option mpowerpc "existed in the past" but is now "always 
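
The range constants at the top of the function above suggest the core test: a 
D-form displacement is a signed 16-bit immediate.  A minimal sketch of such a 
check follows.  The helper name and the REQUIRED_ALIGNMENT parameter are 
hypothetical and are not taken from the patch.  The alignment argument models 
the DS-form and DQ-form encodings, whose displacements must be multiples of 4 
and 16 respectively.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of the patch.  Return true if
   BYTE_OFFSET fits in a signed 16-bit displacement and is a multiple
   of REQUIRED_ALIGNMENT (1 for plain D-form, 4 for DS-form, 16 for
   DQ-form).  REQUIRED_ALIGNMENT must be a power of two.  */
bool
dform_offset_ok (long long byte_offset, long long required_alignment)
{
  const long long max_16bit_signed = 0x7fff;
  const long long min_16bit_signed = -1 - max_16bit_signed;

  if (byte_offset < min_16bit_signed || byte_offset > max_16bit_signed)
    return false;

  /* Low bits must be clear for the aligned encodings.  */
  return (byte_offset & (required_alignment - 1)) == 0;
}
```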

Re: [PATCH, rs6000] Correct descriptions of __builtin_bcdadd* and _builtin_bcdsub* functions

2018-08-07 Thread Kelvin Nilsen


My "consistency" check was against the implementation.

On 8/2/18 11:38 AM, Segher Boessenkool wrote:
> Hi Kelvin,
> 
> On Wed, Aug 01, 2018 at 02:55:22PM -0500, Kelvin Nilsen wrote:
>> Several errors were discovered in the descriptions of the __builtin_bcdadd, 
>> __builtin_bcdadd_lt, __builtin_bcdadd_eq, __builtin_bcdadd_gt, 
>> __builtin_bcdadd_ov, __builtin_bcdsub, __builtin_bcdsub_lt, 
>> __builtin_bcdsub_eq, __builtin_bcdsub_gt, and __builtin_bcdsub_ov functions. 
>>  This patch corrects these documentation errors.
> 
> What did you check this against?  The ABI doc, or what is currently
> implemented?  Neither is very clear to me :-/
> 
> 
> Segher
> 
> 



[PATCH, rs6000] Correct descriptions of __builtin_bcdadd* and _builtin_bcdsub* functions

2018-08-01 Thread Kelvin Nilsen
Several errors were discovered in the descriptions of the __builtin_bcdadd, 
__builtin_bcdadd_lt, __builtin_bcdadd_eq, __builtin_bcdadd_gt, 
__builtin_bcdadd_ov, __builtin_bcdsub, __builtin_bcdsub_lt, 
__builtin_bcdsub_eq, __builtin_bcdsub_gt, and __builtin_bcdsub_ov functions.  
This patch corrects these documentation errors.

I have built the gcc.pdf file and reviewed the formatting, and all looks good.

Is this ok for trunk?

gcc/ChangeLog:

2018-08-01  Kelvin Nilsen  

* doc/extend.texi (PowerPC AltiVec Built-in Functions Available on
ISA 2.07): Correct spelling of bcdsub to be __builtin_bcdsub.  Add
third argument of type "const signed char" to descriptions of
__builtin_bcdadd, __builtin_bcdadd_lt, __builtin_bcdadd_eq,
__builtin_bcdadd_gt, __builtin_bcdadd_ov, __builtin_bcdsub,
__builtin_bcdsub_lt, __builtin_bcdsub_eq, __builtin_bcdsub_gt,
__builtin_bcdsub_ov functions.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 263068)
+++ gcc/doc/extend.texi (working copy)
@@ -18383,16 +18383,16 @@ vector __uint128 vec_vsubcuq (vector __uint128, ve
 __int128 vec_vsubuqm (__int128, __int128);
 __uint128 vec_vsubuqm (__uint128, __uint128);
 
-vector __int128 __builtin_bcdadd (vector __int128, vector __int128);
-int __builtin_bcdadd_lt (vector __int128, vector __int128);
-int __builtin_bcdadd_eq (vector __int128, vector __int128);
-int __builtin_bcdadd_gt (vector __int128, vector __int128);
-int __builtin_bcdadd_ov (vector __int128, vector __int128);
-vector __int128 bcdsub (vector __int128, vector __int128);
-int __builtin_bcdsub_lt (vector __int128, vector __int128);
-int __builtin_bcdsub_eq (vector __int128, vector __int128);
-int __builtin_bcdsub_gt (vector __int128, vector __int128);
-int __builtin_bcdsub_ov (vector __int128, vector __int128);
+vector __int128 __builtin_bcdadd (vector __int128, vector __int128, const 
signed char);
+int __builtin_bcdadd_lt (vector __int128, vector __int128, const signed char);
+int __builtin_bcdadd_eq (vector __int128, vector __int128, const signed char);
+int __builtin_bcdadd_gt (vector __int128, vector __int128, const signed char);
+int __builtin_bcdadd_ov (vector __int128, vector __int128, const signed char);
+vector __int128 __builtin_bcdsub (vector __int128, vector __int128, const 
signed char);
+int __builtin_bcdsub_lt (vector __int128, vector __int128, const signed char);
+int __builtin_bcdsub_eq (vector __int128, vector __int128, const signed char);
+int __builtin_bcdsub_gt (vector __int128, vector __int128, const signed char);
+int __builtin_bcdsub_ov (vector __int128, vector __int128, const signed char);
 @end smallexample
 
 @node PowerPC AltiVec Built-in Functions Available on ISA 3.0



Re: Fwd: [PATCH, rs6000] Replace __uint128_t and __int128_t with __uint128 and __int128 in Power PC built-in documentation

2018-07-27 Thread Kelvin Nilsen
Thanks for review and approval.  To respond to your question about error 
messages:
> 
> microdoc3.c:22:3: error: invalid parameter combination for AltiVec intrinsic 
> ‘__builtin_vec_vaddcuq’
>u1 = vec_vaddcuq (d2, d3);
>^~

On 7/26/18 9:54 AM, Segher Boessenkool wrote:
> On Thu, Jul 26, 2018 at 08:40:01AM -0500, Kelvin Nilsen wrote:
>> To improve internal consistency and to improve consistency with published 
>> ABI documents, this patch replaces the __uint128_t type with __uint128 and 
>> replaces __int128_t with __int128.
> 
>> Is this ok for trunk?
> 
> Looks good, thanks!  Most (all?) of these functions are not documented
> in the ABI, but this is a step forward anyway.  Okay for trunk.
> 
> What do things like error messages involving these functions look like?
> What types do those say?
> 
> 
> Segher
> 
> 



[PATCH, rs6000] Replace __uint128_t and __int128_t with __uint128 and __int128 in Power PC built-in documentation

2018-07-26 Thread Kelvin Nilsen
To improve internal consistency and to improve consistency with published ABI 
documents, this patch replaces the __uint128_t type with __uint128 and replaces 
__int128_t with __int128.

I have built and regression tested this patch on powerpc64le-unknown-linux with 
no regressions.  I have also built and reviewed the gcc.pdf file.

Is this ok for trunk?

gcc/ChangeLog:

2018-07-25  Kelvin Nilsen  

* doc/extend.texi (Basic PowerPC Built-in Functions Available on
ISA 2.05):  Replace __uint128_t with __uint128 and __int128_t with
__int128 in built-in function prototypes.
(PowerPC AltiVec Built-in Functions on ISA 2.07): Likewise.
(PowerPC AltiVec Built-in Functions on ISA 3.0): Likewise.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 262977)
+++ gcc/doc/extend.texi (working copy)
@@ -15762,9 +15762,9 @@ long long __builtin_divde (long long, long long);
 unsigned long long __builtin_divdeu (unsigned long long, unsigned long long);
 int __builtin_divwe (int, int);
 unsigned int __builtin_divweu (unsigned int, unsigned int);
-vector __int128_t __builtin_pack_vector_int128 (long long, long long);
+vector __int128 __builtin_pack_vector_int128 (long long, long long);
 void __builtin_rs6000_speculation_barrier (void);
-long long __builtin_unpack_vector_int128 (vector __int128_t, signed char);
+long long __builtin_unpack_vector_int128 (vector __int128, signed char);
 @end smallexample
 
 Of these, the @code{__builtin_divde} and @code{__builtin_divdeu} functions
@@ -18331,57 +18331,57 @@ vector unsigned long long vec_vupklsw (vector int)
 If the ISA 2.07 additions to the vector/scalar (power8-vector)
 instruction set are available, the following additional functions are
 available for 64-bit targets.  New vector types
-(@var{vector __int128_t} and @var{vector __uint128_t}) are available
-to hold the @var{__int128_t} and @var{__uint128_t} types to use these
+(@var{vector __int128} and @var{vector __uint128}) are available
+to hold the @var{__int128} and @var{__uint128} types to use these
 builtins.
 
 The normal vector extract, and set operations work on
-@var{vector __int128_t} and @var{vector __uint128_t} types,
+@var{vector __int128} and @var{vector __uint128} types,
 but the index value must be 0.
 
 @smallexample
-vector __int128_t vec_vaddcuq (vector __int128_t, vector __int128_t);
-vector __uint128_t vec_vaddcuq (vector __uint128_t, vector __uint128_t);
+vector __int128 vec_vaddcuq (vector __int128, vector __int128);
+vector __uint128 vec_vaddcuq (vector __uint128, vector __uint128);
 
-vector __int128_t vec_vadduqm (vector __int128_t, vector __int128_t);
-vector __uint128_t vec_vadduqm (vector __uint128_t, vector __uint128_t);
+vector __int128 vec_vadduqm (vector __int128, vector __int128);
+vector __uint128 vec_vadduqm (vector __uint128, vector __uint128);
 
-vector __int128_t vec_vaddecuq (vector __int128_t, vector __int128_t,
-vector __int128_t);
-vector __uint128_t vec_vaddecuq (vector __uint128_t, vector __uint128_t,
- vector __uint128_t);
+vector __int128 vec_vaddecuq (vector __int128, vector __int128,
+vector __int128);
+vector __uint128 vec_vaddecuq (vector __uint128, vector __uint128,
+ vector __uint128);
 
-vector __int128_t vec_vaddeuqm (vector __int128_t, vector __int128_t,
-vector __int128_t);
-vector __uint128_t vec_vaddeuqm (vector __uint128_t, vector __uint128_t,
- vector __uint128_t);
+vector __int128 vec_vaddeuqm (vector __int128, vector __int128,
+vector __int128);
+vector __uint128 vec_vaddeuqm (vector __uint128, vector __uint128,
+ vector __uint128);
 
-vector __int128_t vec_vsubecuq (vector __int128_t, vector __int128_t,
-vector __int128_t);
-vector __uint128_t vec_vsubecuq (vector __uint128_t, vector __uint128_t,
- vector __uint128_t);
+vector __int128 vec_vsubecuq (vector __int128, vector __int128,
+vector __int128);
+vector __uint128 vec_vsubecuq (vector __uint128, vector __uint128,
+ vector __uint128);
 
-vector __int128_t vec_vsubeuqm (vector __int128_t, vector __int128_t,
-vector __int128_t);
-vector __uint128_t vec_vsubeuqm (vector __uint128_t, vector __uint128_t,
- vector __uint128_t);
+vector __int128 vec_vsubeuqm (vector __int128, vector __int128,
+vector __int128);
+vector __uint128 vec_vsubeuqm (vector __uint128, vector __uint128,
+ vector __uint128);
 
-vector __int128_t vec_vsubcuq (vector __int128_t, vector __int128_t);
-vector __uint128_t

[PATCH, rs6000] Sort Altivec/VSX built-in functions into subsubsections according to configuration requirements

2018-07-17 Thread Kelvin Nilsen
The many PowerPC built-in functions (intrinsics) that are enabled by including 
<altivec.h> each have different configuration requirements.  To simplify the 
description of the requirements, this patch sorts these functions into 
different subsubsections.  

A subsequent patch will add and remove various functions from each section to 
correct incompatibilities between what is implemented and what is documented.

I have built and regression tested this patch on powerpc64le-unknown-linux and 
on powerpc-linux (P8 big-endian) with no regressions.  I have also built and 
reviewed the gcc.pdf file.

Is this ok for trunk?

gcc/ChangeLog:

2018-07-17  Kelvin Nilsen  

* doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions):
Corrected spelling of this subsection.  Moved some material to new
subsubsections "PowerPC AltiVec Built-in Functions on ISA 2.06" and
"PowerPC AltiVec Built-in Functions on ISA 2.07".
(PowerPC Altivec Built-in Functions on ISA 2.05): New subsubsection.
(PowerPC Altivec Built-in Functions on ISA 2.06): Likewise.
(PowerPC Altivec Built-in Functions on ISA 2.07): Likewise.
(PowerPC Altivec Built-in Functions on ISA 3.0): Likewise.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 262747)
+++ gcc/doc/extend.texi (working copy)
@@ -15941,10 +15941,8 @@ The @code{__builtin_dfp_dtstsfi_ov_dd} and
 require that the type of the @code{value} argument be
 @code{__Decimal64} and @code{__Decimal128} respectively.
 
-
-
 @node PowerPC AltiVec/VSX Built-in Functions
-@subsection PowerPC AltiVec Built-in Functions
+@subsection PowerPC AltiVec/VSX Built-in Functions
 
 GCC provides an interface for the PowerPC family of processors to access
 the AltiVec operations described in Motorola's AltiVec Programming
@@ -15969,19 +15967,6 @@ vector bool int
 vector float
 @end smallexample
 
-If @option{-mvsx} is used the following additional vector types are
-implemented.
-
-@smallexample
-vector unsigned long
-vector signed long
-vector double
-@end smallexample
-
-The long types are only implemented for 64-bit code generation, and
-the long type is only used in the floating point/integer conversion
-instructions.
-
 GCC's implementation of the high-level language interface available from
 C and C++ code differs from Motorola's documentation in several ways.
 
@@ -16039,6 +16024,16 @@ the interfaces described therein.  However, histor
 additional interfaces for access to vector instructions.  These are
 briefly described below.
 
+@menu
+* PowerPC AltiVec Built-in Functions on ISA 2.05::
+* PowerPC AltiVec Built-in Functions Available on ISA 2.06::
+* PowerPC AltiVec Built-in Functions Available on ISA 2.07::
+* PowerPC AltiVec Built-in Functions Available on ISA 3.0::
+@end menu
+
+@node PowerPC AltiVec Built-in Functions on ISA 2.05
+@subsubsection PowerPC AltiVec Built-in Functions on ISA 2.05
+
 The following interfaces are supported for the generic and specific
 AltiVec operations and the AltiVec predicates.  In cases where there
 is a direct mapping between generic and specific operations, only the
@@ -17581,132 +17576,152 @@ vector unsigned char vec_xor (vector unsigned char
 vector unsigned char vec_xor (vector unsigned char, vector unsigned char);
 @end smallexample
 
-The following built-in functions which are currently documented in
-this section are not alphabetized with other built-in functions of
-this section because they belong in different sections.
+@node PowerPC AltiVec Built-in Functions Available on ISA 2.06
+@subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.06
 
+The AltiVec built-in functions described in this section are
+available on the PowerPC family of processors starting with ISA 2.06
+or later.  These are normally enabled by adding @option{-mvsx} to the
+command line.
+
+When @option{-mvsx} is used, the following additional vector types are
+implemented.
+
 @smallexample
-/* __int128, long long, and double arguments and results require -mvsx.  */
+vector unsigned __int128
+vector signed __int128
+vector unsigned long long int
+vector signed long long int
+vector double
+@end smallexample
+
+The long long types are only implemented for 64-bit code generation.
+
+@smallexample
+
 vector bool long long vec_and (vector bool long long int, vector bool long 
long);
+
 vector double vec_ctf (vector unsigned long, const int);
 vector double vec_ctf (vector signed long, const int);
+
 vector signed long vec_cts (vector double, const int);
+
 vector unsigned long vec_ctu (vector double, const int);
+
 void vec_dst (const unsigned long *, int, const int);
 void vec_dst (const long *, int, const int);
+
 void vec_dststt (const unsigned long *, int, const int);
 void vec_dststt (const long *, int, const int);
+
 void vec_dstt (const unsigned long *, int, const int);
 void vec_dstt (const long *, int, const int);
+
 vector un

Re: [RFC] Induction variable candidates not sufficiently general

2018-07-16 Thread Kelvin Nilsen
Thanks for looking at this for me.  In simplifying the test case for a bug 
report, I've narrowed the "problem" to integer overflow considerations.  My len 
variable is declared int, and the target has 64-bit pointers.  I'm gathering 
that the "manual transformation" I quoted below is not considered "equivalent" 
to the original source code due to different integer overflow behaviors.  If I 
redeclare len to be unsigned long long, then I automatically get the 
optimizations that I was originally expecting.

I suppose this is really NOT a bug?

Is there a compiler optimization flag that allows the optimizer to ignore array 
index integer overflow in considering legal optimizations?
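
For reference, a minimal sketch of the two declarations being compared (the 
function names are hypothetical and this is not the code from the issue 
report).  Both functions return the same result for in-range inputs; they 
differ only in the type of len.  Whether gcc chooses pointer induction 
variables for either form depends on the compiler version and target.

```c
#include <stdint.h>

/* Index declared as int: narrower than a 64-bit pointer, so the index
   must be widened before each address is formed.  */
int
match_len_int (const uint8_t *pb, const uint8_t *cur,
               int len, int len_limit)
{
  while (++len != len_limit)
    if (pb[len] != cur[len])
      break;
  return len;
}

/* Index declared as unsigned long long: the same width as a pointer on
   a 64-bit target, so no widening is needed.  */
unsigned long long
match_len_wide (const uint8_t *pb, const uint8_t *cur,
                unsigned long long len, unsigned long long len_limit)
{
  while (++len != len_limit)
    if (pb[len] != cur[len])
      break;
  return len;
}
```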



On 7/13/18 9:14 PM, Bin.Cheng wrote:
> On Fri, Jul 13, 2018 at 6:04 AM, Kelvin Nilsen  wrote:
>> A somewhat old "issue report" pointed me to the code generated for a 4-fold 
>> manually unrolled version of the following loop:
>>
>>>   while (++len != len_limit) /* this is loop */
>>>   if (pb[len] != cur[len])
>>>   break;
>>
>> As unrolled, the loop appears as:
>>
>>> while (++len != len_limit) /* this is loop */ {
>>>   if (pb[len] != cur[len])
>>> break;
>>>   if (++len == len_limit)  /* unrolled 2nd iteration */
>>> break;
>>>   if (pb[len] != cur[len])
>>> break;
>>>   if (++len == len_limit)  /* unrolled 3rd iteration */
>>> break;
>>>   if (pb[len] != cur[len])
>>> break;
>>>   if (++len == len_limit)  /* unrolled 4th iteration */
>>> break;
>>>   if (pb[len] != cur[len])
>>> break;
>>> }
>>
>> In examining the behavior of tree-ssa-loop-ivopts.c, I've discovered the 
>> only induction variable candidates that are being considered are all forms 
>> of the len variable.  We are not considering any induction variables to 
>> represent the address expressions &pb[len] and &cur[len].
>>
>> I rewrote the source code for this loop to make the addressing expressions 
>> more explicit, as in the following:
>>
>>>   cur++;
>>>   while (++pb != last_pb) /* this is loop */ {
>>>   if (*pb != *cur)
>>> break;
>>>   ++cur;
>>>   if (++pb == last_pb)  /* unrolled 2nd iteration */
>>> break;
>>>   if (*pb != *cur)
>>> break;
>>>   ++cur;
>>>   if (++pb == last_pb)  /* unrolled 3rd iteration */
>>> break;
>>>   if (*pb != *cur)
>>> break;
>>>   ++cur;
>>>   if (++pb == last_pb)  /* unrolled 4th iteration */
>>> break;
>>>   if (*pb != *cur)
>>> break;
>>>   ++cur;
>>>   }
>>
>> Now, gcc does a better job of identifying the "address expression induction 
>> variables".  This version of the loop runs about 10% faster than the 
>> original on my target architecture.
>>
>> This would seem to be a textbook pattern for the induction variable 
>> analysis.  Does anyone have any thoughts on the best way to add these 
>> candidates to the set of induction variables that are considered by 
>> tree-ssa-loop-ivopts.c?
>>
>> Thanks in advance for any suggestions.
>>
> Hi,
> Could you please file a bug with your original slow test code
> attached?  I tried to construct a meaningful test case from your code
> snippet but was not successful.  There is a difference in the generated
> assembly, but it's not that fundamental.  So a bug with a preprocessed
> test would be highly appreciated.
> I think there are two potential issues in cost computation for such
> case: invariant expression and iv uses outside of loop handled as
> inside uses.
> 
> Thanks,
> bin
> 
> 

#include 
#include 

int
bt_skip_func(const __uint64_t len_limit, const __uint8_t *cur,
 long long int delta, __uint64_t len) {

  const __uint8_t *pb = cur - delta;

  while (++len != len_limit) {
if (pb[len] != cur[len])
  break;
if (++len == len_limit)
  break;
if (pb[len] != cur[len])
  break;
if (++len == len_limit)
  break;
if (pb[len] != cur[len])
  break;
if (++len == len_limit)
  break;
if (pb[len] != cur[len])
  break;
  }

  return len;
}

int main (int argc, 

[RFC] Induction variable candidates not sufficiently general

2018-07-12 Thread Kelvin Nilsen
A somewhat old "issue report" pointed me to the code generated for a 4-fold 
manually unrolled version of the following loop:

>   while (++len != len_limit) /* this is loop */
>   if (pb[len] != cur[len])
>   break;

As unrolled, the loop appears as:

> while (++len != len_limit) /* this is loop */ {
>   if (pb[len] != cur[len])
> break;
>   if (++len == len_limit)  /* unrolled 2nd iteration */
> break;
>   if (pb[len] != cur[len])
> break;
>   if (++len == len_limit)  /* unrolled 3rd iteration */
> break;
>   if (pb[len] != cur[len])
> break;
>   if (++len == len_limit)  /* unrolled 4th iteration */
> break;
>   if (pb[len] != cur[len])
> break;
> }

In examining the behavior of tree-ssa-loop-ivopts.c, I've discovered the only 
induction variable candidates that are being considered are all forms of the 
len variable.  We are not considering any induction variables to represent the 
address expressions &pb[len] and &cur[len].

I rewrote the source code for this loop to make the addressing expressions more 
explicit, as in the following:

>   cur++;
>   while (++pb != last_pb) /* this is loop */ {
>   if (*pb != *cur)
> break;
>   ++cur;
>   if (++pb == last_pb)  /* unrolled 2nd iteration */
> break;
>   if (*pb != *cur)
> break;
>   ++cur;
>   if (++pb == last_pb)  /* unrolled 3rd iteration */
> break;
>   if (*pb != *cur)
> break;
>   ++cur;
>   if (++pb == last_pb)  /* unrolled 4th iteration */
> break;
>   if (*pb != *cur)
> break;
>   ++cur;
>   }

Now, gcc does a better job of identifying the "address expression induction 
variables".  This version of the loop runs about 10% faster than the original 
on my target architecture.
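
The rewritten fragment above leaves the setup of last_pb implicit.  A minimal 
self-contained sketch of the non-unrolled loop in both forms follows, assuming 
last_pb is pb + len_limit and that both arrays hold at least len_limit 
elements with len < len_limit on entry.  The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Index form: one integer induction variable, two indexed accesses.  */
size_t
match_len_index (const uint8_t *pb, const uint8_t *cur,
                 size_t len, size_t len_limit)
{
  while (++len != len_limit)
    if (pb[len] != cur[len])
      break;
  return len;
}

/* Pointer form: the two address expressions are themselves the
   induction variables, and the loop bound becomes a pointer.  */
size_t
match_len_pointer (const uint8_t *pb, const uint8_t *cur,
                   size_t len, size_t len_limit)
{
  const uint8_t *p = pb + len;
  const uint8_t *c = cur + len;
  const uint8_t *last_pb = pb + len_limit;   /* Assumed definition.  */

  c++;
  while (++p != last_pb)
    {
      if (*p != *c)
        break;
      ++c;
    }
  return (size_t) (p - pb);   /* Recover the index from the pointer.  */
}
```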

This would seem to be a textbook pattern for the induction variable analysis.  
Does anyone have any thoughts on the best way to add these candidates to the 
set of induction variables that are considered by tree-ssa-loop-ivopts.c?

Thanks in advance for any suggestions.



[PATCH, rs6000] Alphabetize prototypes of AltiVec built-in functions in extend.texi

2018-07-10 Thread Kelvin Nilsen
This patch alphabetizes the list of AltiVec built-in function prototypes that 
consume about 15 pages of the gcc.pdf file.  As part of the alphabetization 
effort, certain functions that should not be documented in this section of the 
manual are separated from the others and moved to the end of the section with 
comments to explain their role.

This patch prepares the way for future patches that will remove certain 
prototypes from this section and will insert certain prototypes that are 
currently missing from this section.  It also improves readability and 
maintainability of the section.

This patch has bootstrapped and tested without regressions on 
powerpc64le-unknown-linux (P8).  I have also built the gcc.pdf file and 
reviewed its contents.

In total, the diffs may appear daunting.  A condensation of the diffs is 
obtained by separating out the insertions (+ in the first column) from the 
deletions (- in the first column), sorting the respective files, and performing 
a diff.  This condensed diff reveals that the entirety of this patch results 
only in the following "net changes", all of which are (temporary) additions to 
the extend.texi file:

< 
< 
< 
< 
< 
< @end smallexample
< /* __int128, long long, and double arguments and results require -mvsx.  */
< @smallexample
< The following built-in functions which are currently documented in
< this section are not alphabetized with other built-in functions of
< this section because they belong in different sections.
< /* vec_doublee requires -mvsx.  */
< /* vec_doubleh requires -mvsx.  */
< /* vec_doublel requires -mvsx.  */
< /* vec_doubleo requires -mvsx.  */
< /* vec_float2 requires -mvsx.  */
< /* vec_floate requires -mvsx.  */
< /* vec_floato requires -mvsx.  */
< /* vec_float requires -mvsx.  */
< /* vec_neg requires P8_vector */
< /* vec_signed2 requires -mcpu=power8.  */
< /* vec_signede requires -mvsx.  */
< /* vec_signedo requires -mvsx.  */
< /* vec_signed requires -mvsx.  */
< /* vec_sldw requires -mvsx.  */
< /* vec_unsignede requires -mcpu=power8.  */
< /* vec_unsignede requires -mvsx.  */
< /* vec_unsignedo requires -mvsx.  */
< /* vec_unsigned requires -mvsx.  */

Is this patch ok for trunk?

gcc/ChangeLog:

2018-07-10  Kelvin Nilsen  

* doc/extend.texi (PowerPC AltiVec Built-in Functions):
Alphabetize prototypes of built-in functions, separating out
built-in functions that are listed in this section but should be
described elsewhere.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 262542)
+++ gcc/doc/extend.texi (working copy)
@@ -16065,29 +16065,6 @@ vector unsigned int vec_add (vector unsigned int,
 vector unsigned int vec_add (vector unsigned int, vector unsigned int);
 vector float vec_add (vector float, vector float);
 
-vector float vec_vaddfp (vector float, vector float);
-
-vector signed int vec_vadduwm (vector bool int, vector signed int);
-vector signed int vec_vadduwm (vector signed int, vector bool int);
-vector signed int vec_vadduwm (vector signed int, vector signed int);
-vector unsigned int vec_vadduwm (vector bool int, vector unsigned int);
-vector unsigned int vec_vadduwm (vector unsigned int, vector bool int);
-vector unsigned int vec_vadduwm (vector unsigned int, vector unsigned int);
-
-vector signed short vec_vadduhm (vector bool short, vector signed short);
-vector signed short vec_vadduhm (vector signed short, vector bool short);
-vector signed short vec_vadduhm (vector signed short, vector signed short);
-vector unsigned short vec_vadduhm (vector bool short, vector unsigned short);
-vector unsigned short vec_vadduhm (vector unsigned short, vector bool short);
-vector unsigned short vec_vadduhm (vector unsigned short, vector unsigned 
short);
-
-vector signed char vec_vaddubm (vector bool char, vector signed char);
-vector signed char vec_vaddubm (vector signed char, vector bool char);
-vector signed char vec_vaddubm (vector signed char, vector signed char);
-vector unsigned char vec_vaddubm (vector bool char, vector unsigned char);
-vector unsigned char vec_vaddubm (vector unsigned char, vector bool char);
-vector unsigned char vec_vaddubm (vector unsigned char, vector unsigned char);
-
 vector unsigned int vec_addc (vector unsigned int, vector unsigned int);
 
 vector unsigned char vec_adds (vector bool char, vector unsigned char);
@@ -16109,34 +16086,151 @@ vector signed int vec_adds (vector bool int, vecto
 vector signed int vec_adds (vector signed int, vector bool int);
 vector signed int vec_adds (vector signed int, vector signed int);
 
-vector signed int vec_vaddsws (vector bool int, vector signed int);
-vector signed int vec_vaddsws (vector signed int, vector bool int);
-vector signed int vec_vaddsws (vector signed int, vector signed int);
+int vec_all_eq (vector signed char, vector bool 

[PATCH] Backport testsuite: Introduce be/le selectors

2018-06-27 Thread Kelvin Nilsen
Hi Jeff,

Is it ok to backport this patch to gcc 8?  There are other backports of test 
programs that would like to use the new selector options.

Thanks.


On 5/23/18 12:31 PM, Segher Boessenkool wrote:
> On Tue, May 22, 2018 at 03:21:30PM -0600, Jeff Law wrote:
>> On 05/21/2018 03:46 PM, Segher Boessenkool wrote:
>>> This patch creates "be" and "le" selectors, which can be used by all
>>> architectures, similar to ilp32 and lp64.
>>
>> I think this is fine.  "be" "le" are used all over the place in gcc and
>> the kernel to denote big/little endian.
> 
> Thanks.  This is what I checked in (to trunk):
> 
> 
> 2017-05-23  Segher Boessenkool  
> 
>   * doc/sourcebuild.texi (Endianness): New subsubsection.
> 
> gcc/testsuite/
>   * lib/target-supports.exp (check_effective_target_be): New.
>   (check_effective_target_le): New.
> 
> 
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index dfb0578..596007d 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -1313,6 +1313,16 @@ By convention, keywords ending in @code{_nocache} can also include options
>  specified for the particular test in an earlier @code{dg-options} or
>  @code{dg-add-options} directive.
> 
> +@subsubsection Endianness
> +
> +@table @code
> +@item be
> +Target uses big-endian memory order for multi-byte and multi-word data.
> +
> +@item le
> +Target uses little-endian memory order for multi-byte and multi-word data.
> +@end table
> +
>  @subsubsection Data type sizes
> 
>  @table @code
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index aa1296e6..0a53d7b 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -2523,6 +2523,22 @@ proc check_effective_target_next_runtime { } {
>  }]
>  }
> 
> +# Return 1 if we're generating code for big-endian memory order.
> +
> +proc check_effective_target_be { } {
> +return [check_no_compiler_messages be object {
> + int dummy[__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ ? 1 : -1];
> +}]
> +}
> +
> +# Return 1 if we're generating code for little-endian memory order.
> +
> +proc check_effective_target_le { } {
> +return [check_no_compiler_messages le object {
> + int dummy[__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ ? 1 : -1];
> +}]
> +}
> +
>  # Return 1 if we're generating 32-bit code using default options, 0
>  # otherwise.
> 
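For readers unfamiliar with the idiom in the two new procs: the array
declaration compiles only when the condition holds, because a negative
array size is a hard error.  What follows is a minimal C sketch of the same
predicate, not part of the patch.  The helper names are hypothetical, and it
assumes a compiler that predefines __BYTE_ORDER__ (GCC and Clang do).

```c
#include <stdint.h>
#include <string.h>

/* Compile-time guard in the style of the effective-target checks: this
   typedef is valid only when the target is big- or little-endian.  */
typedef char known_byte_order[(__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
                               || __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
                              ? 1 : -1];

/* Hypothetical helper: the condition tested by the "be" selector.  */
static int
target_is_be (void)
{
  return __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__;
}

/* Hypothetical helper: run-time cross-check that looks at which byte of a
   32-bit word is stored first in memory.  */
static int
first_byte_is_most_significant (void)
{
  uint32_t word = 0x01020304u;
  unsigned char bytes[sizeof word];

  memcpy (bytes, &word, sizeof word);
  return bytes[0] == 0x01;
}
```

On a big- or little-endian target the two helpers agree.  Within the
testsuite the selector is then used as a target qualifier, for example
{ target be } or { target le } on a dg-final directive.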



Re: [PATCH, rs6000] Backport Fix tests that are failing in gcc.target/powerpc/bfp with -m32

2018-06-26 Thread Kelvin Nilsen


Hi Segher,

This patch, as revised in response to your suggestions, was committed to trunk 
on 4/17/2018.

Is this ok for backporting to gcc8, gcc7, and gcc6?

Thanks.


On 4/13/18 3:15 PM, Kelvin Nilsen wrote:
> Twelve failures have been occurring in the bfp test directory during -m32
> regression testing.
> 
> The cause of these failures was two-fold:
> 
> 1. Patches added subsequent to development of the tests caused new error
> messages to be emitted that are different than the error messages expected
> in the dejagnu patterns.  These new patches also changed which built-in
> functions are legal when compiling with the -m32 command-line option.
> 
> 2. The implementation of overloaded built-in functions maps overloaded
> function names to non-overloaded names.  Depending on the stage at which
> an error is recognized, error messages may refer either to the overloaded
> built-in function name or the non-overloaded name.
> 
> This patch:
> 
> 1. Changes the expected error messages in certain test programs.
> 
> 2. Disables certain test programs from being exercised on 32-bit targets.
> 
> 3. Adds a "note" error message to explain the mapping from overloaded
> built-in functions to non-overloaded built-in functions.
> 
> 
> This patch has bootstrapped and tested without regressions on both
> powerpc64le-unknown-linux (P8) and on powerpc-linux (P7 big-endian, with
> both -m32 and -m64 target options).
> 
> Is this ok for trunk?
> 
> gcc/ChangeLog:
> 
> 2018-04-13  Kelvin Nilsen  
> 
>     * config/rs6000/rs6000-protos.h (rs6000_builtin_is_supported_p):
>     New prototype.
>     * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
>     Add note to error message to explain internal mapping of overloaded
>     built-in function name to non-overloaded built-in function name.
>     * config/rs6000/rs6000.c (rs6000_builtin_is_supported_p): New
>     function.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-04-13  Kelvin Nilsen  
> 
>     * gcc.target/powerpc/bfp/scalar-extract-sig-5.c: Simplify to
>     prevent cascading of errors and change expected error message.
>     * gcc.target/powerpc/bfp/scalar-test-neg-4.c: Restrict this test
>     to 64-bit targets.
>     * gcc.target/powerpc/bfp/scalar-test-data-class-8.c: Likewise.
>     * gcc.target/powerpc/bfp/scalar-test-data-class-9.c: Likewise.
>     * gcc.target/powerpc/bfp/scalar-test-data-class-10.c: Likewise.
>     * gcc.target/powerpc/bfp/scalar-insert-exp-11.c: Change expected
>     error message.
>     * gcc.target/powerpc/bfp/scalar-extract-exp-5.c: Likewise.
> 
> Index: gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-5.c
> ===
> --- gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-5.c   (revision 259316)
> +++ gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-5.c   (working copy)
> @@ -8,10 +8,10 @@
>     error because the builtin requires 64 bits.  */
>  #include 
>  
> -unsigned __int128 /* { dg-error "'__int128' is not supported on this target" } */
> +unsigned long long int
>  get_significand (__ieee128 *p)
>  {
>    __ieee128 source = *p;
>  
> -  return __builtin_vec_scalar_extract_sig (source); /* { dg-error
> "builtin function '__builtin_vec_scalar_extract_sig' not supported in
> this compiler configuration" } */
> +  return (long long int) __builtin_vec_scalar_extract_sig (source); /*
> { dg-error "requires ISA 3.0 IEEE 128-bit floating point" } */
>  }
> Index: gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-4.c
> ===
> --- gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-4.c   (revision 259316)
> +++ gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-4.c   (working copy)
> @@ -1,5 +1,6 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
>  /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" }
> { "-mcpu=power9" } } */
> +/* { dg-require-effective-target lp64 } */
>  /* { dg-require-effective-target powerpc_p9vector_ok } */
>  /* { dg-options "-mcpu=power9" } */
>  
> @@ -11,6 +12,8 @@
>  {
>    __ieee128 source = *p;
>  
> +  /* IEEE 128-bit floating point operations are only supported
> + on 64-bit targets.  */
>    return scalar_test_neg (source);
>  }
>  
> Index: gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-8.c
> ===
> --- gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-8.c   
> (revision 259316)
&

[PATCH, rs6000] Obvious patch to fix erroneous comment

2018-06-26 Thread Kelvin Nilsen
In a recently committed patch to correct code generation for the vec_packsu 
(vector unsigned long long, vector unsigned long long) built-in function, I 
accidentally left a comment in place that was not relevant to the final patch 
that was committed.

This patch fixes that comment.  After regression testing, I have committed this 
patch as obvious.

gcc/testsuite/ChangeLog:

2018-06-26  Kelvin Nilsen  

* gcc.target/powerpc/builtins-1.c: Correct a comment.

Index: gcc/testsuite/gcc.target/powerpc/builtins-1.c
===
--- gcc/testsuite/gcc.target/powerpc/builtins-1.c   (revision 262149)
+++ gcc/testsuite/gcc.target/powerpc/builtins-1.c   (working copy)
@@ -288,7 +288,7 @@ int main ()
vec_mul mulld | mullw, mulhwu
vec_nor xxlnor
vec_or  xxlor
-   vec_packsu  vpkudus (matches twice due to -dp option)
+   vec_packsu  vpkudus
vec_perm vperm
vec_round xvrdpi
vec_sel xxsel



Re: [PATCH v2, rs6000] Backport Fix implementation of vec_pack (vector double, vector double) built-in function

2018-06-22 Thread Kelvin Nilsen
Hi Segher,

After waiting a few days for this newly committed patch to settle, is it ok to 
backport to gcc 6, gcc 7, and gcc 8?

Thanks.


On 6/22/18 5:34 PM, Kelvin Nilsen wrote:
> Thanks for the feedback.  It turns out that the vmrgew and vmrgow instructions 
> require power 8.
> 
> After coordinating with Segher on minor refinements to the test cases, I have 
> committed the patch as quoted below to the trunk.
> 
> On 6/19/18 5:37 PM, Segher Boessenkool wrote:
>> Hi!
>>
>> On Tue, Jun 19, 2018 at 01:37:51PM -0500, Kelvin Nilsen wrote:
>>> --- gcc/testsuite/gcc.target/powerpc/builtins-9.c   (nonexistent)
>>> +++ gcc/testsuite/gcc.target/powerpc/builtins-9.c   (working copy)
>>> @@ -0,0 +1,21 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-require-effective-target powerpc_p8vector_ok } */
>>> +/* Expect same instruction selecton on p8 and above.  Fix if future
>>> +   targets behave differently.  */
>>> +/* { dg-options "-O3 -maltivec" } */
>>
>> But this doesn't use -mcpu=power8 or similar.  Does it need it anyway?
>> Both xxpermdi and xvcvdpsp are Power7 (ISA 2.06) and the rest is AltiVec?
>> So maybe just powerpc_vsx_ok?
>>
>>> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } } */
>>
>> You do not use -mcpu= so you don't need this.
>>
>> Same issues in the next test.  Rest looks good though :-)
>>
>>
>> Segher
>>
>>
> 
> gcc/ChangeLog:
> 
> 2018-06-22  Kelvin Nilsen  
> 
>   * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Change
>   behavior of vec_pack (vector double, vector double) to match
>   behavior of vec_float2 (vector double, vector double).
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-06-22  Kelvin Nilsen  
> 
>   * gcc.target/powerpc/builtins-3-p8.c (test_pack_float): Remove
>   this test.
>   * gcc.target/powerpc/builtins-9.c: New test.
>   * gcc.target/powerpc/fold-vec-pack-double.c: Modify dg directives
>   to expect different code generation on big-endian vs.
>   little-endian targets.
> 
> Index: gcc/config/rs6000/rs6000-c.c
> ===
> --- gcc/config/rs6000/rs6000-c.c  (revision 261775)
> +++ gcc/config/rs6000/rs6000-c.c  (working copy)
> @@ -2425,7 +2425,7 @@ const struct altivec_builtin_types altivec_overloa
>  RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
>{ ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_VPKUDUM,
>  RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, 0 },
> -  { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_VPKUDUM,
> +  { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_FLOAT2_V2DF,
>  RS6000_BTI_V4SF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 },
> 
>{ P8V_BUILTIN_VEC_NEG, P8V_BUILTIN_NEG_V16QI,
> Index: gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c
> ===
> --- gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c  (revision 261775)
> +++ gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c  (working copy)
> @@ -11,12 +11,6 @@ test_eq_long_long (vector bool long long x, vector
>   return vec_cmpeq (x, y);
>  }
> 
> -vector float
> -test_pack_float (vector double x, vector double y)
> -{
> -  return vec_pack (x, y);
> -}
> -
>  vector unsigned char
>  test_vsi_packs_vusi_vusi (vector unsigned short x,
>vector unsigned short y)
> @@ -214,7 +208,6 @@ test_neg_double (vector double x)
>  /* Expected test results:
> 
>   test_eq_long_long 1 vcmpequd inst
> - test_pack_float   1 vpkudum inst
>   test_vsi_packs_vsll_vsll  1 vpksdss
>   test_vui_packs_vull_vull  1 vpkudus
>   test_vui_packs_vssi_vssi  1 vpkshss
> @@ -239,7 +232,6 @@ test_neg_double (vector double x)
>   */
> 
>  /* { dg-final { scan-assembler-times "vcmpequd" 1 } } */
> -/* { dg-final { scan-assembler-times "vpkudum"  1 } } */
>  /* { dg-final { scan-assembler-times "vpksdss"  1 } } */
>  /* { dg-final { scan-assembler-times "vpkudus"  1 } } */  
>  /* { dg-final { scan-assembler-times "vpkuhus"  2 } } */
> Index: gcc/testsuite/gcc.target/powerpc/builtins-9.c
> ===
> --- gcc/testsuite/gcc.target/powerpc/builtins-9.c (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/builtins-9.c (working copy)
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-ta

Re: [PATCH v2, rs6000] Fix implementation of vec_pack (vector double, vector double) built-in function

2018-06-22 Thread Kelvin Nilsen
Thanks for the feedback.  It turns out that the vmrgew and vmrgow instructions 
require power 8.

After coordinating with Segher on minor refinements to the test cases, I have 
committed the patch as quoted below to the trunk.

On 6/19/18 5:37 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Jun 19, 2018 at 01:37:51PM -0500, Kelvin Nilsen wrote:
>> --- gcc/testsuite/gcc.target/powerpc/builtins-9.c(nonexistent)
>> +++ gcc/testsuite/gcc.target/powerpc/builtins-9.c(working copy)
>> @@ -0,0 +1,21 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target powerpc_p8vector_ok } */
>> +/* Expect same instruction selecton on p8 and above.  Fix if future
>> +   targets behave differently.  */
>> +/* { dg-options "-O3 -maltivec" } */
> 
> But this doesn't use -mcpu=power8 or similar.  Does it need it anyway?
> Both xxpermdi and xvcvdpsp are Power7 (ISA 2.06) and the rest is AltiVec?
> So maybe just powerpc_vsx_ok?
> 
>> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } } */
> 
> You do not use -mcpu= so you don't need this.
> 
> Same issues in the next test.  Rest looks good though :-)
> 
> 
> Segher
> 
> 

gcc/ChangeLog:

2018-06-22  Kelvin Nilsen  

* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Change
behavior of vec_pack (vector double, vector double) to match
behavior of vec_float2 (vector double, vector double).

gcc/testsuite/ChangeLog:

2018-06-22  Kelvin Nilsen  

* gcc.target/powerpc/builtins-3-p8.c (test_pack_float): Remove
this test.
* gcc.target/powerpc/builtins-9.c: New test.
* gcc.target/powerpc/fold-vec-pack-double.c: Modify dg directives
to expect different code generation on big-endian vs.
little-endian targets.

Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c(revision 261775)
+++ gcc/config/rs6000/rs6000-c.c(working copy)
@@ -2425,7 +2425,7 @@ const struct altivec_builtin_types altivec_overloa
 RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V2DI, 
RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_VPKUDUM,
 RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, 0 },
-  { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_VPKUDUM,
+  { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_FLOAT2_V2DF,
 RS6000_BTI_V4SF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 },
 
   { P8V_BUILTIN_VEC_NEG, P8V_BUILTIN_NEG_V16QI,
Index: gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c
===
--- gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c(revision 261775)
+++ gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c(working copy)
@@ -11,12 +11,6 @@ test_eq_long_long (vector bool long long x, vector
return vec_cmpeq (x, y);
 }
 
-vector float
-test_pack_float (vector double x, vector double y)
-{
-  return vec_pack (x, y);
-}
-
 vector unsigned char
 test_vsi_packs_vusi_vusi (vector unsigned short x,
   vector unsigned short y)
@@ -214,7 +208,6 @@ test_neg_double (vector double x)
 /* Expected test results:
 
  test_eq_long_long 1 vcmpequd inst
- test_pack_float   1 vpkudum inst
  test_vsi_packs_vsll_vsll  1 vpksdss
  test_vui_packs_vull_vull  1 vpkudus
  test_vui_packs_vssi_vssi  1 vpkshss
@@ -239,7 +232,6 @@ test_neg_double (vector double x)
  */
 
 /* { dg-final { scan-assembler-times "vcmpequd" 1 } } */
-/* { dg-final { scan-assembler-times "vpkudum"  1 } } */
 /* { dg-final { scan-assembler-times "vpksdss"  1 } } */
 /* { dg-final { scan-assembler-times "vpkudus"  1 } } */  
 /* { dg-final { scan-assembler-times "vpkuhus"  2 } } */
Index: gcc/testsuite/gcc.target/powerpc/builtins-9.c
===
--- gcc/testsuite/gcc.target/powerpc/builtins-9.c   (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/builtins-9.c   (working copy)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-maltivec -mcpu=power8 -O3" } */
+
+#include 
+
+vector float
+test_pack_float (vector double x, vector double y)
+{
+  return vec_pack (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vmrgew" 1 { target be } } } */
+/* { dg-final { scan-assembler-times "vmrgow"  1 { target le } } } */
+
+/* { dg-final { scan-assembler-times "xvcvdpsp"  2 } } */
+/* { dg-final { scan-assembler-times "xxpermdi"  2 } } */
+
Index: g

Re: [PATCH, rs6000] Backport Fix implementation of vec_packsu (vector unsigned long long, vector unsigned long long) built-in function

2018-06-22 Thread Kelvin Nilsen
This has been committed to trunk.

Is this ok to backport to gcc6, gcc7, and gcc8?

Thanks.

On 6/19/18 2:30 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Jun 18, 2018 at 11:29:55AM -0500, Kelvin Nilsen wrote:
>> +/* A single vpkudus matches twice because this is compiled with -dp,
>> +   causing diagnostic comments to appear in the resulting .s file, one
>> +   of which matches vpkudus.  */
> 
> -dp prints the name of the instruction pattern, which is altivec_vpkudus.
> So if you look for the full word instead, this problem isn't there I
> think?
> 
>> +/* { dg-final { scan-assembler-times "vpkudus" 2 } } */
> 
> /* { dg-final { scan-assembler-times {\mvpkudus\M} 1 } } */
> 
> Okay with that change (and comment changes).  Thanks!
> 
> 
> Segher
> 
> 
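The double match discussed above can be reproduced outside the testsuite.
The following is a small C sketch, not taken from the patch, that contrasts
a plain substring count with a whole-word count in the spirit of the
{\mvpkudus\M} pattern.  The sample line only imitates the shape of -dp
output and is not real compiler output; all names are hypothetical.  Word
characters are taken to be letters, digits and underscore.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative line: the mnemonic, then a comment naming the insn pattern.  */
static const char sample_line[] =
  "\tvpkudus 2,2,3\t # 9 [c=4 l=4]  altivec_vpkudus\n";

static int
is_word_char (char c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* Count non-overlapping occurrences of NEEDLE anywhere in TEXT.  */
static int
count_substring (const char *text, const char *needle)
{
  size_t len = strlen (needle);
  int n = 0;

  assert (len > 0);
  for (const char *p = strstr (text, needle); p; p = strstr (p + len, needle))
    n++;
  return n;
}

/* Count occurrences of NEEDLE that are not adjacent to a word character
   on either side.  */
static int
count_whole_word (const char *text, const char *needle)
{
  size_t len = strlen (needle);
  int n = 0;

  assert (len > 0);
  for (const char *p = strstr (text, needle); p; p = strstr (p + 1, needle))
    {
      int starts_word = (p == text) || !is_word_char (p[-1]);
      int ends_word = !is_word_char (p[len]);

      if (starts_word && ends_word)
        n++;
    }
  return n;
}
```

On the sample line the substring count sees both the mnemonic and the
pattern name, while the whole-word count rejects altivec_vpkudus because
the underscore before vpkudus is a word character.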



[PATCH v2, rs6000] Fix implementation of vec_pack (vector double, vector double) built-in function

2018-06-19 Thread Kelvin Nilsen


This patch fixes an error in the code generation for vec_pack (vector double, 
vector double).  As previously implemented, this built-in function translates 
to the vpkudum instruction.

This patch causes vec_pack (vector double, vector double) to behave the same as 
vec_float2 for the same type signature, producing the vmrgow instruction on 
little-endian targets and the vmrgew instruction on big-endian targets.
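To see why the old mapping was wrong, it helps to model one element in
portable C.  This is only a sketch under stated assumptions, not the GCC
implementation: it assumes IEEE 754 binary64/binary32 for double/float, it
models vpkudum as keeping the low-order 32 bits of each doubleword, and the
helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* What vec_pack on doubles is meant to do per element: convert the value.  */
static float
convert_value (double d)
{
  return (float) d;
}

/* Model of an integer pack applied to a double: keep the low-order 32 bits
   of the 64-bit pattern and reinterpret them as a float.  */
static float
truncate_bit_pattern (double d)
{
  uint64_t bits;
  uint32_t low;
  float f;

  assert (sizeof bits == sizeof d && sizeof low == sizeof f);
  memcpy (&bits, &d, sizeof bits);
  low = (uint32_t) bits;
  memcpy (&f, &low, sizeof f);
  return f;
}
```

For an input of 1.0 the conversion yields 1.0f, whereas the bit-pattern
truncation yields 0.0f, since the low 32 bits of the binary64 encoding of
1.0 are all zero.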

This revision differs from the initial patch submission in that it combines all 
of the new testing into two test programs, using target qualifiers on the dg 
scan-assembler-times directives.

This patch has bootstrapped and tested without regressions on 
powerpc64le-unknown-linux (both P8 and P9) and on powerpc-linux (P8 big-endian, 
both -m32 and -m64).

Is this ok for the trunk?

gcc/ChangeLog:

2018-06-19  Kelvin Nilsen  

* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Change
behavior of vec_pack (vector double, vector double) to match
behavior of vec_float2 (vector double, vector double).

gcc/testsuite/ChangeLog:

2018-06-19  Kelvin Nilsen  

* gcc.target/powerpc/builtins-3-p8.c (test_pack_float): Remove
this test.
* gcc.target/powerpc/builtins-9.c: New test.
* gcc.target/powerpc/fold-vec-pack-double.c: Modify dg directives
to expect different code generation on big-endian
vs. little-endian targets.

Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c(revision 261341)
+++ gcc/config/rs6000/rs6000-c.c(working copy)
@@ -2425,7 +2425,7 @@ const struct altivec_builtin_types altivec_overloa
 RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V2DI, 
RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_VPKUDUM,
 RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, 0 },
-  { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_VPKUDUM,
+  { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_FLOAT2_V2DF,
 RS6000_BTI_V4SF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 },
 
   { P8V_BUILTIN_VEC_NEG, P8V_BUILTIN_NEG_V16QI,
Index: gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c
===
--- gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c(revision 261341)
+++ gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c(working copy)
@@ -11,12 +11,6 @@ test_eq_long_long (vector bool long long x, vector
return vec_cmpeq (x, y);
 }
 
-vector float
-test_pack_float (vector double x, vector double y)
-{
-  return vec_pack (x, y);
-}
-
 vector unsigned char
 test_vsi_packs_vusi_vusi (vector unsigned short x,
   vector unsigned short y)
@@ -214,7 +208,6 @@ test_neg_double (vector double x)
 /* Expected test results:
 
  test_eq_long_long 1 vcmpequd inst
- test_pack_float   1 vpkudum inst
  test_vsi_packs_vsll_vsll  1 vpksdss
  test_vui_packs_vull_vull  1 vpkudus
  test_vui_packs_vssi_vssi  1 vpkshss
@@ -239,7 +232,6 @@ test_neg_double (vector double x)
  */
 
 /* { dg-final { scan-assembler-times "vcmpequd" 1 } } */
-/* { dg-final { scan-assembler-times "vpkudum"  1 } } */
 /* { dg-final { scan-assembler-times "vpksdss"  1 } } */
 /* { dg-final { scan-assembler-times "vpkudus"  1 } } */  
 /* { dg-final { scan-assembler-times "vpkuhus"  2 } } */
Index: gcc/testsuite/gcc.target/powerpc/builtins-9.c
===
--- gcc/testsuite/gcc.target/powerpc/builtins-9.c   (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/builtins-9.c   (working copy)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* Expect same instruction selecton on p8 and above.  Fix if future
+   targets behave differently.  */
+/* { dg-options "-O3 -maltivec" } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } } */
+
+#include 
+
+vector float
+test_pack_float (vector double x, vector double y)
+{
+  return vec_pack (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vmrgew" 1 { target be } } } */
+/* { dg-final { scan-assembler-times "vmrgow"  1 { target le } } } */
+
+/* { dg-final { scan-assembler-times "xvcvdpsp"  2 } } */
+/* { dg-final { scan-assembler-times "xxpermdi"  2 } } */
+
Index: gcc/testsuite/gcc.target/powerpc/fold-vec-pack-double.c
===
--- gcc/testsuite/gcc.target/powerpc/fold-vec-pack-double.c (revision 261341)
+++ gcc/testsuite/gcc.target/powerpc/fold-vec-pack-double.c (working copy)
@@ -3,7 +3,10 @@
 
 /* { dg-do compile } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
-/* { dg-options "-mvsx 

[PATCH, rs6000] Fix implementation of vec_packsu (vector unsigned long long, vector unsigned long long) built-in function

2018-06-18 Thread Kelvin Nilsen


This patch fixes an error in the code generation for vec_packsu (vector 
unsigned long long, vector unsigned long long).  As previously implemented, 
this built-in function translates to the vpksdus instruction.

This patch causes vec_packsu (vector unsigned long long, vector unsigned long 
long) to behave the same as vec_packs (vector unsigned long long, vector 
unsigned long long) for the same type signature, producing the vpkudus 
instruction.
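The practical difference between the two instructions shows up for inputs
with the top bit set.  Below is a scalar C model of one element, offered as
a sketch rather than as the actual instruction definitions: it assumes
vpksdus saturates a signed 64-bit value to an unsigned 32-bit result and
vpkudus saturates an unsigned 64-bit value to an unsigned 32-bit result.
The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of an unsigned doubleword as a signed one.  */
static int64_t
reinterpret_as_signed (uint64_t u)
{
  int64_t s;

  memcpy (&s, &u, sizeof s);
  return s;
}

/* Model of a signed-to-unsigned saturating pack (vpksdus style).  */
static uint32_t
saturate_signed_to_u32 (int64_t x)
{
  if (x < 0)
    return 0;
  if (x > (int64_t) UINT32_MAX)
    return UINT32_MAX;
  return (uint32_t) x;
}

/* Model of an unsigned-to-unsigned saturating pack (vpkudus style).  */
static uint32_t
saturate_unsigned_to_u32 (uint64_t x)
{
  return x > UINT32_MAX ? UINT32_MAX : (uint32_t) x;
}
```

Under this model a large unsigned input such as UINT64_MAX packs to
UINT32_MAX with the unsigned variant, but to 0 when its bits are treated as
a signed value.  Small inputs give the same result either way.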

This patch has bootstrapped and tested without regressions on 
powerpc64le-unknown-linux (both P8 and P9) and on powerpc-linux (P8 big-endian, 
both -m32 and -m64).

Is this ok for the trunk?

gcc/ChangeLog:

2018-06-18  Kelvin Nilsen  

* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Change
behavior of vec_packsu (vector unsigned long long, vector unsigned
long long) to match behavior of vec_packs with same signature.

gcc/testsuite/ChangeLog:

2018-06-18  Kelvin Nilsen  

* gcc.target/powerpc/builtins-1.c: Adjust dg directives to scan
for vpkudus in place of vpksdus.
* gcc.target/powerpc/builtins-3-p8.c: Likewise.

Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c(revision 261599)
+++ gcc/config/rs6000/rs6000-c.c(working copy)
@@ -2544,7 +2544,7 @@ const struct altivec_builtin_types altivec_overloa
 RS6000_BTI_unsigned_V8HI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_PACKSU, P8V_BUILTIN_VPKSDUS,
 RS6000_BTI_unsigned_V4SI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
-  { ALTIVEC_BUILTIN_VEC_PACKSU, P8V_BUILTIN_VPKSDUS,
+  { ALTIVEC_BUILTIN_VEC_PACKSU, P8V_BUILTIN_VPKUDUS,
 RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_VPKSWUS, ALTIVEC_BUILTIN_VPKSWUS,
 RS6000_BTI_unsigned_V8HI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
Index: gcc/testsuite/gcc.target/powerpc/builtins-1.c
===
--- gcc/testsuite/gcc.target/powerpc/builtins-1.c   (revision 261599)
+++ gcc/testsuite/gcc.target/powerpc/builtins-1.c   (working copy)
@@ -297,7 +297,7 @@ int main ()
vec_mul mulld | mullw, mulhwu
vec_nor xxlnor
vec_or  xxlor
-   vec_packsu  vpksdus
+   vec_packsu  vpkudus (matches twice due to -dp option)
vec_perm vperm
vec_round xvrdpi
vec_sel xxsel
@@ -335,7 +335,11 @@ int main ()
 /* { dg-final { scan-assembler-times "xxlnor" 6 } } */
 /* { dg-final { scan-assembler-times "xxlor" 11 { target { ilp32 } } } } */
 /* { dg-final { scan-assembler-times "xxlor" 7  { target { lp64 } } } } */
-/* { dg-final { scan-assembler-times "vpksdus" 2 } } */
+
+/* A single vpkudus matches twice because this is compiled with -dp,
+   causing diagnostic comments to appear in the resulting .s file, one
+   of which matches vpkudus.  */
+/* { dg-final { scan-assembler-times "vpkudus" 2 } } */
 /* { dg-final { scan-assembler-times "vperm" 4 } } */
 /* { dg-final { scan-assembler-times "xvrdpi" 2 } } */
 /* { dg-final { scan-assembler-times "xxsel" 10 } } */
Index: gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c
===
--- gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c(revision 261599)
+++ gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c(working copy)
@@ -219,6 +219,8 @@ test_neg_double (vector double x)
  test_vui_packs_vull_vull  1 vpkudus
  test_vui_packs_vssi_vssi  1 vpkshss
  test_vsi_packsu_vssi_vssi 1 vpkshus
+ test_vsi_packsu_vsll_vsll 1 vpksdus
+ test_vsi_packsu_vull_vull 1 vpkudus
  test_unsigned_char_popcnt_signed_char 1 vpopcntb
  test_unsigned_char_popcnt_unsigned_char   1 vpopcntb
  test_unsigned_short_popcnt_signed_short   1 vpopcnth
@@ -241,11 +243,11 @@ test_neg_double (vector double x)
 /* { dg-final { scan-assembler-times "vcmpequd" 1 } } */
 /* { dg-final { scan-assembler-times "vpkudum"  1 } } */
 /* { dg-final { scan-assembler-times "vpksdss"  1 } } */
-/* { dg-final { scan-assembler-times "vpkudus"  1 } } */  
+/* { dg-final { scan-assembler-times "vpkudus"  2 } } */  
 /* { dg-final { scan-assembler-times "vpkuhus"  2 } } */
 /* { dg-final { scan-assembler-times "vpkshss"  1 } } */  
 /* { dg-final { scan-assembler-times "vpkshus"  1 } } */  
-/* { dg-final { scan-assembler-times "vpksdus"  2 } } */  
+/* { dg-final { scan-assembler-times "vpksdus"  1 } } */  
 /* { dg-final { scan-assembler-times "vpkuwus"  2 } } */  
 /* { dg-final { scan-assembler-times "vpopcntb" 2 } } */
 /* { dg-final { scan-assembler-times "vpopcnth" 2 } } */



[PATCH, rs6000] Fix implementation of vec_pack (vector double, vector double) built-in function

2018-06-15 Thread Kelvin Nilsen
This patch fixes an error in the code generation for vec_pack (vector double, 
vector double).  As previously implemented, this built-in function translates 
to the vpkudum instruction.

This patch causes vec_pack (vector double, vector double) to behave the same as 
vec_float2 for the same type signature, producing the vmrgow instruction on 
little-endian targets and the vmrgew instruction on big-endian targets.

This patch has bootstrapped and tested without regressions on 
powerpc64le-unknown-linux (both P8 and P9) and on powerpc-linux (P8 big-endian, 
both -m32 and -m64).

Is this ok for the trunk?

gcc/ChangeLog:

2018-06-14  Kelvin Nilsen  

* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Change
behavior of vec_pack (double, double) to match behavior of
vec_float2 (double, double).

gcc/testsuite/ChangeLog:

2018-06-14  Kelvin Nilsen  

* gcc.target/powerpc/builtins-3-p8.c (test_pack_float): Remove
this test.
* gcc.target/powerpc/builtins-9-p8-be.c: New test.
* gcc.target/powerpc/builtins-9-p8-le.c: New test.
* gcc.target/powerpc/builtins-9-p9-le.c: New test.
* gcc.target/powerpc/fold-vec-pack-double-p8-be.c: New test.
* gcc.target/powerpc/fold-vec-pack-double-p8-le.c: New test.
* gcc.target/powerpc/fold-vec-pack-double.c: Specialize this test
for p9 little-endian.

Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c(revision 261341)
+++ gcc/config/rs6000/rs6000-c.c(working copy)
@@ -2425,7 +2425,7 @@ const struct altivec_builtin_types altivec_overloa
 RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V2DI, 
RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_VPKUDUM,
 RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, 0 },
-  { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_VPKUDUM,
+  { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_FLOAT2_V2DF,
 RS6000_BTI_V4SF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 },
 
   { P8V_BUILTIN_VEC_NEG, P8V_BUILTIN_NEG_V16QI,
Index: gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c
===
--- gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c(revision 261341)
+++ gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c(working copy)
@@ -11,12 +11,6 @@ test_eq_long_long (vector bool long long x, vector
return vec_cmpeq (x, y);
 }
 
-vector float
-test_pack_float (vector double x, vector double y)
-{
-  return vec_pack (x, y);
-}
-
 vector unsigned char
 test_vsi_packs_vusi_vusi (vector unsigned short x,
   vector unsigned short y)
@@ -214,7 +208,6 @@ test_neg_double (vector double x)
 /* Expected test results:
 
  test_eq_long_long 1 vcmpequd inst
- test_pack_float   1 vpkudum inst
  test_vsi_packs_vsll_vsll  1 vpksdss
  test_vui_packs_vull_vull  1 vpkudus
  test_vui_packs_vssi_vssi  1 vpkshss
@@ -239,7 +232,6 @@ test_neg_double (vector double x)
  */
 
 /* { dg-final { scan-assembler-times "vcmpequd" 1 } } */
-/* { dg-final { scan-assembler-times "vpkudum"  1 } } */
 /* { dg-final { scan-assembler-times "vpksdss"  1 } } */
 /* { dg-final { scan-assembler-times "vpkudus"  1 } } */  
 /* { dg-final { scan-assembler-times "vpkuhus"  2 } } */
Index: gcc/testsuite/gcc.target/powerpc/builtins-9-p8-be.c
===
--- gcc/testsuite/gcc.target/powerpc/builtins-9-p8-be.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/builtins-9-p8-be.c (working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target be } */ /* Require big-endian.  */
+/* { dg-options "-O3 -maltivec -mcpu=power8" } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+
+#include 
+
+vector float
+test_pack_float (vector double x, vector double y)
+{
+  return vec_pack (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vmrgew"  1 } } */
+/* { dg-final { scan-assembler-times "xvcvdpsp"  2 } } */
+/* { dg-final { scan-assembler-times "xxpermdi"  2 } } */
+
Index: gcc/testsuite/gcc.target/powerpc/builtins-9-p8-le.c
===
--- gcc/testsuite/gcc.target/powerpc/builtins-9-p8-le.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/builtins-9-p8-le.c (working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target le } */ /* Require little-endian.  */
+/* { dg-options "-O3 -maltivec -mcpu=power8" } */
+/* { dg-skip-if "do not

[PATCH, rs6000] Improve indentation of prototype documentation

2018-06-04 Thread Kelvin Nilsen


This patch removes extraneous line breaks, shortening the "PowerPC AltiVec 
Built-in Functions" section of the gcc.pdf manual by about 7 pages.  Besides 
improving the appearance of this documentation, there are two additional 
benefits:

1. Subsequent patches that move prototype definitions in order to alphabetize 
definitions or in order to group definitions requiring the same target options 
together are easier to understand if each prototype description is represented 
on a single line.

2. Enclosing the group of 8 vec_xl prototypes and 8 vec_xst prototypes between 
@smallexample and @end smallexample allows these prototypes to be automatically 
parsed by a tool that validates consistency between implementation and 
documentation of built-in functions.
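
As a rough illustration of benefit 1 (not part of this patch), a line-oriented 
checker can treat a documentation line as a complete prototype only when its 
parentheses balance and it ends in a semicolon.  The helper name 
is_complete_prototype below is hypothetical and does not come from any 
existing tool:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if LINE holds a complete prototype on its own: the last
   non-blank character is ';' and the parentheses balance.  Hypothetical
   helper for illustration only; not part of GCC.  */
int
is_complete_prototype (const char *line)
{
  int depth = 0;
  size_t len = strlen (line);

  /* Ignore trailing white space, including any newline.  */
  while (len > 0 && isspace ((unsigned char) line[len - 1]))
    len--;
  if (len == 0 || line[len - 1] != ';')
    return 0;

  for (size_t i = 0; i < len; i++)
    {
      if (line[i] == '(')
        depth++;
      else if (line[i] == ')' && --depth < 0)
        return 0;  /* Closing parenthesis with no matching open.  */
    }
  return depth == 0;
}
```

With the old layout, both halves of a wrapped prototype fail this check, so a 
tool would first have to rejoin the lines.  With one prototype per line, each 
line can be checked independently.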

This patch has bootstrapped and tested without regressions on 
powerpc64le-unknown-linux (P8).  I have also built the gcc.pdf file and 
reviewed its contents.

Segher: if you prefer, I can break this into multiple smaller patches.  What 
would be the ideal size of each patch?

Is this ok for trunk?

gcc/ChangeLog:

2018-06-04  Kelvin Nilsen  

* doc/extend.texi (PowerPC AltiVec Built-in Functions): Adjust
indentation and line wrap for many prototypes.  Add missing
@smallexample directives around block of prototypes for vec_xl and
vec_xst.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 261067)
+++ gcc/doc/extend.texi (working copy)
@@ -16200,17 +16200,13 @@ vector signed char vec_add (vector signed char, ve
 vector signed char vec_add (vector signed char, vector signed char);
 vector unsigned char vec_add (vector bool char, vector unsigned char);
 vector unsigned char vec_add (vector unsigned char, vector bool char);
-vector unsigned char vec_add (vector unsigned char,
-  vector unsigned char);
+vector unsigned char vec_add (vector unsigned char, vector unsigned char);
 vector signed short vec_add (vector bool short, vector signed short);
 vector signed short vec_add (vector signed short, vector bool short);
 vector signed short vec_add (vector signed short, vector signed short);
-vector unsigned short vec_add (vector bool short,
-   vector unsigned short);
-vector unsigned short vec_add (vector unsigned short,
-   vector bool short);
-vector unsigned short vec_add (vector unsigned short,
-   vector unsigned short);
+vector unsigned short vec_add (vector bool short, vector unsigned short);
+vector unsigned short vec_add (vector unsigned short, vector bool short);
+vector unsigned short vec_add (vector unsigned short, vector unsigned short);
 vector signed int vec_add (vector bool int, vector signed int);
 vector signed int vec_add (vector signed int, vector bool int);
 vector signed int vec_add (vector signed int, vector signed int);
@@ -16226,47 +16222,33 @@ vector signed int vec_vadduwm (vector signed int,
 vector signed int vec_vadduwm (vector signed int, vector signed int);
 vector unsigned int vec_vadduwm (vector bool int, vector unsigned int);
 vector unsigned int vec_vadduwm (vector unsigned int, vector bool int);
-vector unsigned int vec_vadduwm (vector unsigned int,
- vector unsigned int);
+vector unsigned int vec_vadduwm (vector unsigned int, vector unsigned int);
 
-vector signed short vec_vadduhm (vector bool short,
- vector signed short);
-vector signed short vec_vadduhm (vector signed short,
- vector bool short);
-vector signed short vec_vadduhm (vector signed short,
- vector signed short);
-vector unsigned short vec_vadduhm (vector bool short,
-   vector unsigned short);
-vector unsigned short vec_vadduhm (vector unsigned short,
-   vector bool short);
-vector unsigned short vec_vadduhm (vector unsigned short,
-   vector unsigned short);
+vector signed short vec_vadduhm (vector bool short, vector signed short);
+vector signed short vec_vadduhm (vector signed short, vector bool short);
+vector signed short vec_vadduhm (vector signed short, vector signed short);
+vector unsigned short vec_vadduhm (vector bool short, vector unsigned short);
+vector unsigned short vec_vadduhm (vector unsigned short, vector bool short);
+vector unsigned short vec_vadduhm (vector unsigned short, vector unsigned 
short);
 
 vector signed char vec_vaddubm (vector bool char, vector signed char);
 vector signed char vec_vaddubm (vector signed char, vector bool char);
 vector signed char vec_vaddubm (vector signed char, vector signed char);
-vector unsigned char vec_vaddubm (vector bool char,
-  vector unsigned char);
-vector unsigned char vec_vaddu

[PATCH, rs6000] Correct documentation of vec_lvsl and vec_lvsr arguments

2018-06-04 Thread Kelvin Nilsen
The existing documentation incorrectly specifies that the second argument of 
the vec_lvsl and vec_lvsr functions is a pointer to a volatile-qualified type.  
This patch removes the volatile qualifier from the documentation of these 
arguments.

This patch has bootstrapped and tested without regressions on 
powerpc64le-unknown-linux (P8).  I have built the gcc.pdf file and reviewed its 
contents.

Is this ok for trunk?

 gcc/ChangeLog:

2018-06-04  Kelvin Nilsen  

* doc/extend.texi (PowerPC AltiVec Built-in Functions): Remove
volatile qualifier from vec_lvsl and vec_lvsr argument prototypes.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 261067)
+++ gcc/doc/extend.texi (working copy)
@@ -16662,25 +16662,25 @@ vector unsigned char vec_ldl (int, const unsigned
 
 vector float vec_loge (vector float);
 
-vector unsigned char vec_lvsl (int, const volatile unsigned char *);
-vector unsigned char vec_lvsl (int, const volatile signed char *);
-vector unsigned char vec_lvsl (int, const volatile unsigned short *);
-vector unsigned char vec_lvsl (int, const volatile short *);
-vector unsigned char vec_lvsl (int, const volatile unsigned int *);
-vector unsigned char vec_lvsl (int, const volatile int *);
-vector unsigned char vec_lvsl (int, const volatile unsigned long *);
-vector unsigned char vec_lvsl (int, const volatile long *);
-vector unsigned char vec_lvsl (int, const volatile float *);
+vector unsigned char vec_lvsl (int, const unsigned char *);
+vector unsigned char vec_lvsl (int, const signed char *);
+vector unsigned char vec_lvsl (int, const unsigned short *);
+vector unsigned char vec_lvsl (int, const short *);
+vector unsigned char vec_lvsl (int, const unsigned int *);
+vector unsigned char vec_lvsl (int, const int *);
+vector unsigned char vec_lvsl (int, const unsigned long *);
+vector unsigned char vec_lvsl (int, const long *);
+vector unsigned char vec_lvsl (int, const float *);
 
-vector unsigned char vec_lvsr (int, const volatile unsigned char *);
-vector unsigned char vec_lvsr (int, const volatile signed char *);
-vector unsigned char vec_lvsr (int, const volatile unsigned short *);
-vector unsigned char vec_lvsr (int, const volatile short *);
-vector unsigned char vec_lvsr (int, const volatile unsigned int *);
-vector unsigned char vec_lvsr (int, const volatile int *);
-vector unsigned char vec_lvsr (int, const volatile unsigned long *);
-vector unsigned char vec_lvsr (int, const volatile long *);
-vector unsigned char vec_lvsr (int, const volatile float *);
+vector unsigned char vec_lvsr (int, const unsigned char *);
+vector unsigned char vec_lvsr (int, const signed char *);
+vector unsigned char vec_lvsr (int, const unsigned short *);
+vector unsigned char vec_lvsr (int, const short *);
+vector unsigned char vec_lvsr (int, const unsigned int *);
+vector unsigned char vec_lvsr (int, const int *);
+vector unsigned char vec_lvsr (int, const unsigned long *);
+vector unsigned char vec_lvsr (int, const long *);
+vector unsigned char vec_lvsr (int, const float *);
 
 vector float vec_madd (vector float, vector float, vector float);
 
@@ -18210,8 +18210,8 @@ vector double vec_ld (int, const vector double *);
 vector double vec_ld (int, const double *);
 vector double vec_ldl (int, const vector double *);
 vector double vec_ldl (int, const double *);
-vector unsigned char vec_lvsl (int, const volatile double *);
-vector unsigned char vec_lvsr (int, const volatile double *);
+vector unsigned char vec_lvsl (int, const double *);
+vector unsigned char vec_lvsr (int, const double *);
 vector double vec_madd (vector double, vector double, vector double);
 vector double vec_max (vector double, vector double);
 vector signed long vec_mergeh (vector signed long, vector signed long);



[PATCH, rs6000] Clean up implementation of built-in functions

2018-06-01 Thread Kelvin Nilsen
This patch improves maintainability of the rs6000 built-in functions by adding 
a comment to describe the non-traditional implementation of the 
__builtin_vec_vsx_ld and __builtin_vec_vsx_st functions, and by removing eight 
redundant entries from the altivec_overloaded_builtins array.

Note, in the patch file, that the lines immediately preceding each of the 
deletions from altivec_overloaded_builtins exactly match the deleted lines.  
This redundancy may have been accidentally introduced by manual resolution of 
merge conflicts.  I did not investigate the origin of the redundancy.  The 
redundant entries cause trouble to tools that automate consistency checking 
between implementation and documentation of built-in functions.  Additionally, 
they are a likely cause of future bugs if any future efforts need to make 
corrections or changes to the associated functions.
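
For illustration only, the kind of check that flags such redundancy can be 
sketched as follows.  The struct overload_entry type and the function names 
are simplified, hypothetical stand-ins; they are not the actual 
altivec_builtin_types definition in rs6000-c.c:

```c
#include <stddef.h>

/* Simplified, hypothetical stand-in for one row of an overload table:
   the overloaded built-in, the specific instance it resolves to, the
   result type, and up to three argument types.  */
struct overload_entry
{
  int overloaded_code;
  int instance_code;
  int ret_type;
  int arg_types[3];
};

/* Return nonzero if A and B describe exactly the same overload.  */
static int
same_entry (const struct overload_entry *a, const struct overload_entry *b)
{
  return (a->overloaded_code == b->overloaded_code
          && a->instance_code == b->instance_code
          && a->ret_type == b->ret_type
          && a->arg_types[0] == b->arg_types[0]
          && a->arg_types[1] == b->arg_types[1]
          && a->arg_types[2] == b->arg_types[2]);
}

/* Count the entries that are identical to the entry immediately
   preceding them, which is the pattern removed by this patch.  */
size_t
count_adjacent_duplicates (const struct overload_entry *table, size_t n)
{
  size_t dups = 0;

  for (size_t i = 1; i < n; i++)
    if (same_entry (&table[i - 1], &table[i]))
      dups++;
  return dups;
}
```

This sketch only compares neighboring rows, which is sufficient for the 
redundancy described above.  Finding duplicates that are not adjacent would 
require comparing every pair of rows or sorting the table first.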

This patch has bootstrapped and tested without regressions on 
powerpc64le-unknown-linux (P8).

Is this ok for trunk?


gcc/ChangeLog:

2018-06-01  Kelvin Nilsen  

* config/rs6000/rs6000-builtin.def (VSX_BUILTIN_VEC_LD,
VSX_BUILTIN_VEC_ST): Add comment to explain non-traditional uses.
* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Remove
several redundant entries.

Index: gcc/config/rs6000/rs6000-builtin.def
===
--- gcc/config/rs6000/rs6000-builtin.def(revision 261067)
+++ gcc/config/rs6000/rs6000-builtin.def(working copy)
@@ -1811,6 +1811,15 @@ BU_VSX_OVERLOAD_1 (VUNSIGNEDE,  "vunsignede")
 BU_VSX_OVERLOAD_1 (VUNSIGNEDO,  "vunsignedo")
 
 /* VSX builtins that are handled as special cases.  */
+
+
+/* NON-TRADITIONAL BEHAVIOR HERE: Besides introducing the
+   __builtin_vec_ld and __builtin_vec_st built-in functions,
+   the VSX_BUILTIN_VEC_LD and VSX_BUILTIN_VEC_ST symbolic constants
+   introduced below are also affiliated with the __builtin_vec_vsx_ld
+   and __builtin_vec_vsx_st functions respectively.  This unnatural
+   binding is formed with explicit calls to the def_builtin function
+   found in rs6000.c.  */
 BU_VSX_OVERLOAD_X (LD,  "ld")
 BU_VSX_OVERLOAD_X (ST,  "st")
 BU_VSX_OVERLOAD_X (XL,  "xl")
Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c(revision 261067)
+++ gcc/config/rs6000/rs6000-c.c(working copy)
@@ -1375,28 +1375,16 @@ const struct altivec_builtin_types altivec_overloa
 RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_VCMPGTSW, ALTIVEC_BUILTIN_VCMPGTSW,
 RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
-  { ALTIVEC_BUILTIN_VEC_VCMPGTSW, ALTIVEC_BUILTIN_VCMPGTSW,
-RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VCMPGTUW, ALTIVEC_BUILTIN_VCMPGTUW,
 RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 
0 },
-  { ALTIVEC_BUILTIN_VEC_VCMPGTUW, ALTIVEC_BUILTIN_VCMPGTUW,
-RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 
0 },
   { ALTIVEC_BUILTIN_VEC_VCMPGTSH, ALTIVEC_BUILTIN_VCMPGTSH,
 RS6000_BTI_bool_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
-  { ALTIVEC_BUILTIN_VEC_VCMPGTSH, ALTIVEC_BUILTIN_VCMPGTSH,
-RS6000_BTI_bool_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
   { ALTIVEC_BUILTIN_VEC_VCMPGTUH, ALTIVEC_BUILTIN_VCMPGTUH,
 RS6000_BTI_bool_V8HI, RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 
0 },
-  { ALTIVEC_BUILTIN_VEC_VCMPGTUH, ALTIVEC_BUILTIN_VCMPGTUH,
-RS6000_BTI_bool_V8HI, RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 
0 },
   { ALTIVEC_BUILTIN_VEC_VCMPGTSB, ALTIVEC_BUILTIN_VCMPGTSB,
 RS6000_BTI_bool_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0 },
-  { ALTIVEC_BUILTIN_VEC_VCMPGTSB, ALTIVEC_BUILTIN_VCMPGTSB,
-RS6000_BTI_bool_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_VCMPGTUB, ALTIVEC_BUILTIN_VCMPGTUB,
 RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, 
RS6000_BTI_unsigned_V16QI, 0 },
-  { ALTIVEC_BUILTIN_VEC_VCMPGTUB, ALTIVEC_BUILTIN_VCMPGTUB,
-RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, 
RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPLE, ALTIVEC_BUILTIN_VCMPGEFP,
 RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPLE, VSX_BUILTIN_XVCMPGEDP,
@@ -4249,8 +4237,6 @@ const struct altivec_builtin_types altivec_overloa
   { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
 RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long, 0 },
   { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
-RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long, 0 },
-  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
 RS6000_BTI_unsigned_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTTI, 0 },
   { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
 RS6000_BTI_unsigned_V

[PATCH, rs6000] Remove incorrect built-in function documentation

2018-05-23 Thread Kelvin Nilsen

This patch removes several incorrectly documented functions from the "PowerPC
AltiVec Built-in Functions" section of the "Using the GNU Compiler Collection"
manual.

The following two functions are removed because they are not implemented:

  vector float vec_copysign (vector float);
  vector float vec_recip (vector float, vector float);

The following six functions are removed because, although they are implemented,
they are not specified in the AltiVec PIM document and the type of the result
vector does not match the type of the supplied pointer argument:

  vector signed int vec_lde (int, const long long *);
  vector unsigned int vec_lde (int, const unsigned long long *);

  vector int vec_ld (int, long *)
  vector unsigned int vec_ld (int, const unsigned long *);

  vector signed int vec_lvewx (int, long *);
  vector unsigned int vec_lvewx (int, unsigned long *);

The following two functions are removed because they are not implemented. 
Also, they are not specified in the AltiVec PIM document and the type of
the result vector does not match the type of the supplied pointer argument:

  vector signed int vec_ldl (int, const long *);
  vector unsigned int vec_ldl (int, const unsigned long *);

The following four functions are removed because they are not implemented. 
They do happen to be specified in the AltiVec PIM document. Until they are
implemented, they should not be documented:

  void vec_st (vector pixel, int, unsigned short *)
  void vec_st (vector pixel, int, short *)

  void vec_stl (vector pixel, int, unsigned short *);
  void vec_stl (vector pixel, int, short *);

The following two functions are removed because they are not implemented.
They are not specified in the AltiVec PIM or ABI v.2 documents:

  void vec_stvehx (vector pixel, int, short *);
  void vec_stvehx (vector pixel, int, unsigned short *);

The following function was incompletely documented.  The argument list lacked
a closing parenthesis.  There is no function by this name.

  test_vsi_packsu_vssi_vssi (vector signed short x,

This patch successfully builds on both powerpc64le-unknown-linux (P8) and on
powerpc-linux (P7 big-endian, with both -m32 and -m64 target options).


The patch affects only extend.texi.  The gcc.pdf file has been built
and reviewed.

Is this ok for the trunk?

gcc/ChangeLog:

2018-05-23  Kelvin Nilsen  <kel...@gcc.gnu.org>

* doc/extend.texi (PowerPC AltiVec Built-in Functions): Remove
descriptions of various incorrectly documented functions.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 260607)
+++ gcc/doc/extend.texi (working copy)
@@ -16354,8 +16354,6 @@ vector signed char vec_vavgsb (vector signed char,
 vector unsigned char vec_vavgub (vector unsigned char,
  vector unsigned char);
 
-vector float vec_copysign (vector float);
-
 vector float vec_ceil (vector float);
 
 vector signed int vec_cmpb (vector float, vector float);
@@ -16569,10 +16567,8 @@ vector float vec_ld (int, const float *);
 vector bool int vec_ld (int, const vector bool int *);
 vector signed int vec_ld (int, const vector signed int *);
 vector signed int vec_ld (int, const int *);
-vector signed int vec_ld (int, const long *);
 vector unsigned int vec_ld (int, const vector unsigned int *);
 vector unsigned int vec_ld (int, const unsigned int *);
-vector unsigned int vec_ld (int, const unsigned long *);
 vector bool short vec_ld (int, const vector bool short *);
 vector pixel vec_ld (int, const vector pixel *);
 vector signed short vec_ld (int, const vector signed short *);
@@ -16592,14 +16588,10 @@ vector unsigned short vec_lde (int, const unsigned
 vector float vec_lde (int, const float *);
 vector signed int vec_lde (int, const int *);
 vector unsigned int vec_lde (int, const unsigned int *);
-vector signed int vec_lde (int, const long *);
-vector unsigned int vec_lde (int, const unsigned long *);
 
 vector float vec_lvewx (int, float *);
 vector signed int vec_lvewx (int, int *);
 vector unsigned int vec_lvewx (int, unsigned int *);
-vector signed int vec_lvewx (int, long *);
-vector unsigned int vec_lvewx (int, unsigned long *);
 
 vector signed short vec_lvehx (int, short *);
 vector unsigned short vec_lvehx (int, unsigned short *);
@@ -16612,10 +16604,8 @@ vector float vec_ldl (int, const float *);
 vector bool int vec_ldl (int, const vector bool int *);
 vector signed int vec_ldl (int, const vector signed int *);
 vector signed int vec_ldl (int, const int *);
-vector signed int vec_ldl (int, const long *);
 vector unsigned int vec_ldl (int, const vector unsigned int *);
 vector unsigned int vec_ldl (int, const unsigned int *);
-vector unsigned int vec_ldl (int, const unsigned long *);
 vector bool short vec_ldl (int, const vector bool short *);
 vector pixel vec_ldl (int, const vector pixel *);
 vector signed short vec_ldl (int, const vector signed short *);
@@ -

[PATCH, rs6000] Improved Documentation of Built-in Functions Part 2

2018-05-14 Thread Kelvin Nilsen
The focus of this patch is to restructure the section headers within the PowerPC
portion of the extend.texi documentation file.  Restructuring section headers
prepares the foundation for subsequent documentation improvements, which will be
delivered in follow-on patches.

I have bootstrapped and regression tested without regressions on
powerpc64le-unknown-linux (P8).  I have also confirmed that this patch builds
on powerpc-linux (P7 big-endian, both -m32 and -m64 target options).

I have also built and reviewed the gcc.pdf file.

Is this ok for the trunk?

gcc/ChangeLog:

2018-05-14  Kelvin Nilsen  <kel...@gcc.gnu.org>

* doc/extend.texi (Basic PowerPC Built-in Functions): Rename this
subsection to be "PowerPC Built-in Functions".
(PowerPC Altivec/VSX Built-in Functions): Change this
subsection to subsubsection and rename as "PowerPC Altivec
Built-in Functions Available on ISA 2.05".
(PowerPC Built-in Functions Available on ISA 2.06): New
subsubsection.
(PowerPC Built-in Functions Available on ISA 2.07): Likewise.
(PowerPC Built-in Functions Available on ISA 3.0): Likewise.
(PowerPC Hardware Transactional Memory Built-in Functions): Split
this subsection into two subsubsections named "Basic PowerPC
Hardware Transactional Memory Built-in Functions" and "PowerPC
Hardware Transactional Memory Built-in Functions".  Move the basic
subsubsection forward to be next to other basic subsubsections.
(PowerPC Atomic Memory Operation Functions): Change this
subsection to subsubsection.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 260182)
+++ gcc/doc/extend.texi (working copy)
@@ -12477,10 +12477,7 @@ instructions, but allow the compiler to schedule t
 * MSP430 Built-in Functions::
 * NDS32 Built-in Functions::
 * picoChip Built-in Functions::
-* Basic PowerPC Built-in Functions::
-* PowerPC AltiVec/VSX Built-in Functions::
-* PowerPC Hardware Transactional Memory Built-in Functions::
-* PowerPC Atomic Memory Operation Functions::
+* PowerPC Built-in Functions::
 * RX Built-in Functions::
 * S/390 System z Built-in Functions::
 * SH Built-in Functions::
@@ -15536,25 +15533,35 @@ implementing assertions.
 
 @end table
 
-@node Basic PowerPC Built-in Functions
-@subsection Basic PowerPC Built-in Functions
+@node PowerPC Built-in Functions
+@subsection PowerPC Built-in Functions
 
+This section describes built-in functions that are supported for
+various configurations of the PowerPC processor.
+
 @menu
 * Basic PowerPC Built-in Functions Available on all Configurations::
 * Basic PowerPC Built-in Functions Available on ISA 2.05::
 * Basic PowerPC Built-in Functions Available on ISA 2.06::
 * Basic PowerPC Built-in Functions Available on ISA 2.07::
+* Basic PowerPC Hardware Transactional Memory Built-in Functions::
 * Basic PowerPC Built-in Functions Available on ISA 3.0::
+* PowerPC AltiVec Built-in Functions Available on ISA 2.05::
+* PowerPC AltiVec Built-in Functions Available on ISA 2.06::
+* PowerPC AltiVec Built-in Functions Available on ISA 2.07::
+* PowerPC AltiVec Built-in Functions Available on ISA 3.0::
+* PowerPC Hardware Transactional Memory Built-in Functions::
+* PowerPC Atomic Memory Operation Functions::
 @end menu
 
-This section describes PowerPC built-in functions that do not require
-the inclusion of any special header files to declare prototypes or
-provide macro definitions.  The sections that follow describe
-additional PowerPC built-in functions.
-
 @node Basic PowerPC Built-in Functions Available on all Configurations
 @subsubsection Basic PowerPC Built-in Functions Available on all Configurations
 
+This section describes PowerPC built-in functions that are supported
+on all configurations and do not require
+the inclusion of any special header files to declare prototypes or
+provide macro definitions.
+
 @deftypefn {Built-in Function} void __builtin_cpu_init (void)
 This function is a @code{nop} on the PowerPC platform and is included solely
 to maintain API compatibility with the x86 builtins.
@@ -15889,6 +15896,150 @@ addition to the @option{-mpower8-fusion}, @option{
 
 This section intentionally empty.
 
+@node Basic PowerPC Hardware Transactional Memory Built-in Functions
+@subsubsection Basic PowerPC Hardware Transactional Memory Built-in Functions
+
+The following basic built-in functions are available with
+@option{-mhtm} or @option{-mcpu=CPU} where CPU is `power8' or later.
+They all generate the machine instruction that is part of the name.
+
+The Hardware Transactional Memory (HTM) builtins (with the exception
+of @code{__builtin_tbegin}) return 
+the full 4-bit condition register value set by their associated hardware
+instruction.  The header file @code{htmintrin.h} defines some macros that can
+be used to decipher the retu

[PATCH v2, rs6000] Improve Documentation of Built-In Functions Part 1

2018-05-09 Thread Kelvin Nilsen

This is the first of several planned patches to address shortcomings in
existing documentation of PowerPC built-in functions.  The focus of this
particular patch is to improve documentation of basic built-in functions
that do not require inclusion of special header files.

A summary of this patch follows:

1. Change the name of the first PowerPC built-in section from 
   "PowerPC Built-in Functions" to "Basic PowerPC Built-in Functions".
   This section has never described all PowerPC built-in functions.

2. Introduce subsubsections within this section to independently describe
   built-in functions that target particular ISA levels.  Sort function
   descriptions into appropriate subsubsections.

3. Add descriptions of three new features that can be tested with the
   __builtin_cpu_supports function: darn, htm-no-suspend, and scv.

4. Correct the spellings of several built-in functions:
   __builtin_fmaf128_round_to_odd, __builtin_addg6s, __builtin_cbctdt,
   __builtin_cdtbcd.
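
As a minimal sketch of how one of the feature names from item 3 might be 
tested, assuming a GCC for PowerPC that includes this support, consider the 
function below.  The function name cpu_has_darn is hypothetical, and on any 
other compiler or target the sketch simply reports that the feature is 
absent:

```c
/* Return 1 if the running CPU reports the "darn" feature, 0 otherwise.
   Hypothetical helper for illustration.  The built-in calls are used
   only when compiling with GCC for a PowerPC target; elsewhere this
   sketch conservatively returns 0.  */
int
cpu_has_darn (void)
{
#if defined (__GNUC__) && !defined (__clang__) && defined (__powerpc__)
  /* Documented as a nop on PowerPC; kept for API compatibility with
     the x86 builtins.  */
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("darn") != 0;
#else
  return 0;
#endif
}
```

The other two feature names, htm-no-suspend and scv, would be tested the same 
way by passing a different string literal to __builtin_cpu_supports.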

This patch is limited in scope in order to manage complexity of the
diffs.  Subsequent patches will address different sections of the
documentation.  Subsequent patches will also add new function descriptions
into these sections.

This differs from the previous draft patch in the following regards:

1. This patch adds back in documentation of the __builtin_fabsq,
   __builtin_copysignq, __builtin_infq, __builtin_huge_valq, __builtin_nanq,
   __builtin_nansq, __builtin_sqrtf128, and __builtin_fmaf128 functions.

2. Consistently, changed subsubsection names from
   "Low-Level PowerPC Built-in ... " to "Basic PowerPC Built-in ... "

3. Changed subsubsection name from "... Available on All Targets" to
   "... Available on All Configurations".

4. Used @code{} font for darn and tsuspend. instruction names.

5. Removed unnecessary parentheses around many option descriptions.

6. Clarified that the result returned from the __builtin_darn_32 function is
   conditioned.

7. Enhanced the ChangeLog to call out each of the subsection names
   (within extend.texi) that is affected by this patch.

8. Changed the menu reference to the newly named "Basic PowerPC Built-in
   Functions"

9. Added a new sub-menu to identify the subsubsections of the "Basic PowerPC
   Built-in Functions" section.

I have bootstrapped and regression tested without regressions on both
powerpc64le-unknown-linux (P8) and on powerpc-linux (P8 big-endian,
with both -m32 and -m64 target options).  I have built and reviewed the
gcc.pdf on the little-endian test platform.  I did not build the gcc.pdf 
file on my big-endian test platform because it is missing relevant fonts.

Is this ok for the trunk?

2018-05-09  Kelvin Nilsen  <kel...@gcc.gnu.org>

* doc/extend.texi (PowerPC Built-in Functions): Rename this
subsection.
(Basic PowerPC Built-in Functions): The new name of the
subsection previously known as "PowerPC Built-in Functions".
(Basic PowerPC Built-in Functions Available on all Configurations):
New subsubsection.
(Basic PowerPC Built-in Functions Available on ISA 2.05): Likewise.
(Basic PowerPC Built-in Functions Available on ISA 2.06): Likewise.
(Basic PowerPC Built-in Functions Available on ISA 2.07): Likewise.
(Basic PowerPC Built-in Functions Available on ISA 3.0): Likewise.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 260073)
+++ gcc/doc/extend.texi (working copy)
@@ -12475,7 +12475,7 @@
 * MSP430 Built-in Functions::
 * NDS32 Built-in Functions::
 * picoChip Built-in Functions::
-* PowerPC Built-in Functions::
+* Basic PowerPC Built-in Functions::
 * PowerPC AltiVec/VSX Built-in Functions::
 * PowerPC Hardware Transactional Memory Built-in Functions::
 * PowerPC Atomic Memory Operation Functions::
@@ -15534,12 +15534,25 @@
 
 @end table
 
-@node PowerPC Built-in Functions
-@subsection PowerPC Built-in Functions
+@node Basic PowerPC Built-in Functions
+@subsection Basic PowerPC Built-in Functions
 
-The following built-in functions are always available and can be used to
-check the PowerPC target platform type:
+@menu
+* Basic PowerPC Built-in Functions Available on all Configurations::
+* Basic PowerPC Built-in Functions Available on ISA 2.05::
+* Basic PowerPC Built-in Functions Available on ISA 2.06::
+* Basic PowerPC Built-in Functions Available on ISA 2.07::
+* Basic PowerPC Built-in Functions Available on ISA 3.0::
+@end menu
 
+This section describes PowerPC built-in functions that do not require
+the inclusion of any special header files to declare prototypes or
+provide macro definitions.  The sections that follow describe
+additional PowerPC built-in functions.
+
+@node Basic PowerPC Built-in Functions Available on all Configurations
+@subsubsection Basic PowerPC Built-in Functions

Re: [PATCH, rs6000] Improve Documentation of Built-In Functions Part 1

2018-04-25 Thread Kelvin Nilsen
Thank you for the prompt review and careful feedback.  I didn't notice
your message until this morning.  At this point, I'll wait a few days before
committing these changes as I understand we are still in the "RC phase of GCC 8".


On 4/24/18 4:45 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Apr 24, 2018 at 02:25:58PM -0500, Kelvin Nilsen wrote:
>>> 4. Remove descriptions of built-in function that do not belong in this
>>> section because the
>>>built-in functions are generic (not specific to PowerPC):
>>> __builtin_fabsq,
>>>__builtin_copysignq, __builtin_infq, __builtin_huge_valq, __builtin_nanq,
>>>__builtin_nansq, __builtin_sqrtf128, __builtin_fmaf128.
> 
> Are these described in a generic place, then?  I don't see it?
> 
>> +@node Low-Level PowerPC Built-in Functions Available on all Targets
>> +@subsubsection Low-Level PowerPC Built-in Functions Available on all Targets

Regarding your question about "q functions", the existing gcc.pdf document
is a bit confusing.  Here's what I can figure out.

The following are mentioned only in "Section 6.59.33: x86 Built-in Functions"

  __float128 __builtin_fabsq (__float128)
  __float128 __builtin_copysignq (__float128, __float128)
  __float128 __builtin_infq (void)
  __float128 __builtin_huge_valq (void)
  __float128 __builtin_nanq (void)
  __float128 __builtin_nansq (void)

As far as I can tell, these should not be documented as specific to x86, but
should be documented as generic across all platforms.  This is an issue outside
the realm of PowerPC maintenance.

If we want to preserve mention of these "q" functions, I would recommend
changing the text that introduces them.  Currently, it says:

  "Previous versions of GCC supported some 'q' builtins for IEEE 128-bit
   floating point.  These functions are now mapped into the equivalent
   'f128' builtin functions."

If the description of these built-ins is not moved to a more generic context,
I would prefer to replace this section with something like:

The following functions, which are also supported on x86 targets, are supported
if the -mfloat128 option is specified:

  __float128 __builtin_fabsq (__float128)
  __float128 __builtin_copysignq (__float128, __float128)
  __float128 __builtin_infq (void)
  __float128 __builtin_huge_valq (void)
  __float128 __builtin_nanq (void)
  __float128 __builtin_nansq (void)

Regarding your question about f128 functions, these are "supposed to be"
documented in "Section 6.58: Other Built-in Functions Provided by GCC".
Search for the phrase "corresponding to the TS 18661-3 functions".  We
should add __builtin_sqrtf128 and __builtin_fmaf128 to the list of functions
described this way.  These may not be the only omissions.  Should we push
for fixing this documentation in Section 6.58 instead of keeping it in
the PowerPC section?

It is difficult to find the official TS 18661-3 document, and
I'm not sure where to look for a list of which of the functions are
currently implemented by gcc.  I found this "diff" document, which provides
some hints.  Given that this standard is not easily accessible, perhaps the
generic built-in documentation should provide a little more information?

See http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1945.pdf



Re: [PATCH, rs6000] Improve Documentation of Built-In Functions Part 1

2018-04-24 Thread Kelvin Nilsen
I'm updating this patch to make two improvements to what was submitted
earlier today:

1. Correct the description of the htm-no-suspend CPU feature.

2. Add a comment to clarify that the __builtin_divde and __builtin_divdeu
   built-in functions require 64-bit targets.

Everything else is the same as submitted previously.

On 4/24/18 9:12 AM, Kelvin Nilsen wrote:
> This is the first of several patches to address shortcomings in existing
> documentation of PowerPC built-in functions.  The focus of this
> particular patch is to improve documentation of low-level built-in
> functions that do not require special include headers.
> 
> A summary of this patch follows:
> 
> 1. Change the name of the first PowerPC built-in section from "PowerPC
>    Built-in Functions" to "Low-Level PowerPC Built-in Functions".  This
>    section has never described all PowerPC built-in functions.
> 
> 2. Introduce subsubsections within this section to independently
>    describe built-in functions that target particular ISA levels.  Sort
>    function descriptions into appropriate subsubsections.
> 
> 3. Add descriptions of three new features that can be tested with the
>    __builtin_cpu_supports function: darn, htm-no-suspend, and scv.
> 
> 4. Remove descriptions of built-in functions that do not belong in this
>    section because the built-in functions are generic (not specific to
>    PowerPC): __builtin_fabsq, __builtin_copysignq, __builtin_infq,
>    __builtin_huge_valq, __builtin_nanq, __builtin_nansq,
>    __builtin_sqrtf128, __builtin_fmaf128.
> 
> 5. Correct the spellings of several built-in functions:
>    __builtin_fmaf128_round_to_odd, __builtin_addg6s, __builtin_cbcdtd,
>    __builtin_cdtbcd.
> 
> This patch is limited in scope in order to manage complexity of the
> diffs.  Subsequent patches will address different sections of the
> documentation.  Subsequent patches will also add new function
> descriptions into these sections.
> 
> This patch affects only extend.texi.  The gcc.pdf file has been built
> and reviewed.
> 
> Is this ok for the trunk?

gcc/ChangeLog:

2018-04-24  Kelvin Nilsen  <kel...@gcc.gnu.org>

* doc/extend.texi: Tidy documentation of PowerPC built-in functions.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 259504)
+++ gcc/doc/extend.texi (working copy)
@@ -15524,12 +15524,17 @@ implementing assertions.
 
 @end table
 
-@node PowerPC Built-in Functions
-@subsection PowerPC Built-in Functions
+@node Low-Level PowerPC Built-in Functions
+@subsection Low-Level PowerPC Built-in Functions
 
-The following built-in functions are always available and can be used to
-check the PowerPC target platform type:
+This section describes PowerPC built-in functions that do not require
+the inclusion of any special header files to declare prototypes or
+provide macro definitions.  The sections that follow describe
+additional PowerPC built-in functions.
 
+@node Low-Level PowerPC Built-in Functions Available on all Targets
+@subsubsection Low-Level PowerPC Built-in Functions Available on all Targets
+
 @deftypefn {Built-in Function} void __builtin_cpu_init (void)
 This function is a @code{nop} on the PowerPC platform and is included solely
 to maintain API compatibility with the x86 builtins.
@@ -15633,6 +15638,8 @@ CPU supports the set of compatible performance mon
 CPU supports the Embedded ISA category.
 @item cellbe
 CPU has a CELL broadband engine.
+@item darn
+CPU supports the darn (deliver a random number) instruction.
 @item dfp
 CPU has a decimal floating point unit.
 @item dscr
@@ -15649,6 +15656,9 @@ CPU has a floating point unit.
 CPU has hardware transaction memory instructions.
 @item htm-nosc
 Kernel aborts hardware transactions when a syscall is made.
+@item htm-no-suspend
+CPU supports hardware transaction memory but does not support the
+tsuspend. instruction.
 @item ic_snoop
 CPU supports icache snooping capabilities.
 @item ieee128
@@ -15677,6 +15687,8 @@ CPU supports the old POWER ISA (eg, 601)
 CPU supports 64-bit mode execution.
 @item ppcle
 CPU supports a little-endian mode that uses address swizzling.
+@item scv
+Kernel supports system call vectored.
 @item smt
 CPU support simultaneous multi-threading.
 @item spe
@@ -15708,19 +15720,81 @@ Here is an example:
 @end smallexample
 @end deftypefn
 
-These built-in functions are available for the PowerPC family of
+The following built-in functions are also available on all PowerPC
 processors:
 @smallexample
-float __builtin_recipdivf (float, float);
-float __builtin_rsqrtf (float);
-double __builtin_recipdiv (double, double);
-double __builtin_rsqrt (double);
 uint64_t __builtin_ppc_get_timebase ();
 unsigned long __builtin_ppc_mftb ();
-double __builtin_unpack_longdouble (lon

[PATCH, rs6000] Improve Documentation of Built-In Functions Part 1

2018-04-24 Thread Kelvin Nilsen
This is the first of several patches to address shortcomings in existing
documentation of PowerPC built-in functions.  The focus of this
particular patch is to improve documentation of low-level built-in
functions that do not require special include headers.

A summary of this patch follows:

1. Change the name of the first PowerPC built-in section from "PowerPC
   Built-in Functions" to "Low-Level PowerPC Built-in Functions".  This
   section has never described all PowerPC built-in functions.

2. Introduce subsubsections within this section to independently
   describe built-in functions that target particular ISA levels.  Sort
   function descriptions into appropriate subsubsections.

3. Add descriptions of three new features that can be tested with the
   __builtin_cpu_supports function: darn, htm-no-suspend, and scv.

4. Remove descriptions of built-in functions that do not belong in this
   section because the built-in functions are generic (not specific to
   PowerPC): __builtin_fabsq, __builtin_copysignq, __builtin_infq,
   __builtin_huge_valq, __builtin_nanq, __builtin_nansq,
   __builtin_sqrtf128, __builtin_fmaf128.

5. Correct the spellings of several built-in functions:
   __builtin_fmaf128_round_to_odd, __builtin_addg6s, __builtin_cbcdtd,
   __builtin_cdtbcd.
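
As a rough illustration of item 3, the sketch below shows how a program
might test for the "darn" feature before relying on it.  The wrapper name
example_have_darn is hypothetical, and the preprocessor guard is an
assumption: it presumes a Linux PowerPC target and a GCC release recent
enough to recognize the "darn" feature string.  On any other target the
function simply reports that the feature is absent.

```c
/* Hypothetical wrapper: returns 1 when the running system reports the
   "darn" (deliver a random number) feature, and 0 otherwise.  */
int
example_have_darn (void)
{
#if defined(__powerpc__) && defined(__linux__) \
    && defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 8
  /* The argument must be a string literal naming a documented feature.  */
  return __builtin_cpu_supports ("darn") != 0;
#else
  /* Assumption: on other targets, treat the feature as unavailable.  */
  return 0;
#endif
}
```

The same pattern would apply to "htm-no-suspend" and "scv"; each feature
needs its own call because the feature name cannot be passed as a run-time
string.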

This patch is limited in scope in order to manage complexity of the
diffs.  Subsequent patches will address different sections of the
documentation.  Subsequent patches will also add new function
descriptions into these sections.

This patch affects only extend.texi.  The gcc.pdf file has been built
and reviewed.

Is this ok for the trunk?

gcc/ChangeLog:

2018-04-24  Kelvin Nilsen  <kel...@gcc.gnu.org>

    * doc/extend.texi: Tidy documentation of PowerPC built-in functions.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi    (revision 259504)
+++ gcc/doc/extend.texi    (working copy)
@@ -15524,12 +15524,17 @@ implementing assertions.
 
 @end table
 
-@node PowerPC Built-in Functions
-@subsection PowerPC Built-in Functions
+@node Low-Level PowerPC Built-in Functions
+@subsection Low-Level PowerPC Built-in Functions
 
-The following built-in functions are always available and can be used to
-check the PowerPC target platform type:
+This section describes PowerPC built-in functions that do not require
+the inclusion of any special header files to declare prototypes or
+provide macro definitions.  The sections that follow describe
+additional PowerPC built-in functions.
 
+@node Low-Level PowerPC Built-in Functions Available on all Targets
+@subsubsection Low-Level PowerPC Built-in Functions Available on all
Targets
+
 @deftypefn {Built-in Function} void __builtin_cpu_init (void)
 This function is a @code{nop} on the PowerPC platform and is included
solely
 to maintain API compatibility with the x86 builtins.
@@ -15633,6 +15638,8 @@ CPU supports the set of compatible performance mon
 CPU supports the Embedded ISA category.
 @item cellbe
 CPU has a CELL broadband engine.
+@item darn
+CPU supports the darn (deliver a random number) instruction.
 @item dfp
 CPU has a decimal floating point unit.
 @item dscr
@@ -15649,6 +15656,8 @@ CPU has a floating point unit.
 CPU has hardware transaction memory instructions.
 @item htm-nosc
 Kernel aborts hardware transactions when a syscall is made.
+@item htm-no-suspend
+Kernel aborts hardware transactions when the thread is suspended.
 @item ic_snoop
 CPU supports icache snooping capabilities.
 @item ieee128
@@ -15677,6 +15686,8 @@ CPU supports the old POWER ISA (eg, 601)
 CPU supports 64-bit mode execution.
 @item ppcle
 CPU supports a little-endian mode that uses address swizzling.
+@item scv
+Kernel supports system call vectored.
 @item smt
 CPU support simultaneous multi-threading.
 @item spe
@@ -15708,19 +15719,81 @@ Here is an example:
 @end smallexample
 @end deftypefn
 
-These built-in functions are available for the PowerPC family of
+The following built-in functions are also available on all PowerPC
 processors:
 @smallexample
-float __builtin_recipdivf (float, float);
-float __builtin_rsqrtf (float);
-double __builtin_recipdiv (double, double);
-double __builtin_rsqrt (double);
 uint64_t __builtin_ppc_get_timebase ();
 unsigned long __builtin_ppc_mftb ();
-double __builtin_unpack_longdouble (long double, int);
-long double __builtin_pack_longdouble (double, double);
 @end smallexample
 
+The @code{__builtin_ppc_get_timebase} and @code{__builtin_ppc_mftb}
+functions generate instructions to read the Time Base Register.  The
+@code{__builtin_ppc_get_timebase} function may generate multiple
+instructions and always returns the 64 bits of the Time Base Register.
+The @code{__builtin_ppc_mftb} function always generates one instruction and
+returns the Time Base Register value as an unsigned long, throwing away
+the most significant word on 32-bit environments.
+
+@node Low-Level PowerPC Built-in Funct
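
The difference between the two Time Base built-ins described in the
documentation text above can be sketched in plain C.  The helper names
below are hypothetical.  example_read_timebase falls back to returning 0
on non-PowerPC targets (an assumption made only so the sketch builds
everywhere), and example_low_word models what a 32-bit environment keeps
when the most significant word is thrown away.

```c
#include <stdint.h>

/* Hypothetical helper: read the full 64-bit Time Base Register.  */
uint64_t
example_read_timebase (void)
{
#if defined(__powerpc__) && defined(__GNUC__)
  return __builtin_ppc_get_timebase ();
#else
  /* Assumption: no Time Base Register on this target.  */
  return 0;
#endif
}

/* Model of the truncation that __builtin_ppc_mftb performs on 32-bit
   environments: only the least significant word survives.  */
uint32_t
example_low_word (uint64_t timebase)
{
  return (uint32_t) (timebase & 0xFFFFFFFFu);
}
```

Code that measures long intervals on a 32-bit target would therefore
prefer the 64-bit variant, since the truncated value wraps around.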

[PATCH, rs6000] Fix tests that are failing in gcc.target/powerpc/bfp with -m32

2018-04-13 Thread Kelvin Nilsen
Twelve failures have been occurring in the bfp test directory during -m32
regression testing.

The cause of these failures was two-fold:

1. Patches added subsequent to development of the tests caused new error
   messages to be emitted that differ from the error messages expected
   in the dejagnu patterns.  These new patches also changed which
   built-in functions are legal when compiling with the -m32
   command-line option.

2. The implementation of overloaded built-in functions maps overloaded
   function names to non-overloaded names.  Depending on the stage at
   which an error is recognized, error messages may refer either to the
   overloaded built-in function name or to the non-overloaded name.

This patch:

1. Changes the expected error messages in certain test programs.

2. Disables certain test programs from being exercised on 32-bit targets.

3. Adds a "note" error message to explain the mapping from overloaded
   built-in functions to non-overloaded built-in functions.
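
The mapping mentioned in item 3 can be pictured with a deliberately
simplified model.  Everything in the sketch below is hypothetical: the
type names, the table layout, and the two "__builtin_example_" names are
placeholders, not the identifiers used in rs6000-c.c.  Only the overloaded
name __builtin_vec_scalar_extract_sig comes from the test case in this
patch.  The idea is that one overloaded name resolves to a different
non-overloaded built-in depending on the argument type, which is why a
later diagnostic may mention a name the user never wrote.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical argument categories for the sketch.  */
enum example_arg_type { EXAMPLE_ARG_DOUBLE, EXAMPLE_ARG_IEEE128 };

/* One row per (overloaded name, argument type) pair.  */
struct example_overload
{
  const char *overloaded_name;
  enum example_arg_type arg;
  const char *specific_name;   /* placeholder, not a real GCC name */
};

static const struct example_overload example_overloads[] = {
  { "__builtin_vec_scalar_extract_sig", EXAMPLE_ARG_DOUBLE,
    "__builtin_example_extract_sig_dp" },
  { "__builtin_vec_scalar_extract_sig", EXAMPLE_ARG_IEEE128,
    "__builtin_example_extract_sig_qp" },
};

/* Return the non-overloaded name selected for NAME and ARG, or NULL
   when the table has no matching row.  */
const char *
example_resolve (const char *name, enum example_arg_type arg)
{
  size_t count = sizeof example_overloads / sizeof example_overloads[0];
  for (size_t i = 0; i < count; i++)
    if (example_overloads[i].arg == arg
        && strcmp (example_overloads[i].overloaded_name, name) == 0)
      return example_overloads[i].specific_name;
  return NULL;
}
```

A note that reports both the name the user wrote and the name returned by
the lookup lets the reader connect the two.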


This patch has bootstrapped and tested without regressions on both
powerpc64le-unknown-linux (P8) and on powerpc-linux (P7 big-endian, with
both -m32 and -m64 target options).

Is this ok for trunk?

gcc/ChangeLog:

2018-04-13  Kelvin Nilsen  <kel...@gcc.gnu.org>

    * config/rs6000/rs6000-protos.h (rs6000_builtin_is_supported_p):
    New prototype.
    * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
    Add note to error message to explain internal mapping of overloaded
    built-in function name to non-overloaded built-in function name.
    * config/rs6000/rs6000.c (rs6000_builtin_is_supported_p): New
    function.

gcc/testsuite/ChangeLog:

2018-04-13  Kelvin Nilsen  <kel...@gcc.gnu.org>

    * gcc.target/powerpc/bfp/scalar-extract-sig-5.c: Simplify to
    prevent cascading of errors and change expected error message.
    * gcc.target/powerpc/bfp/scalar-test-neg-4.c: Restrict this test
    to 64-bit targets.
    * gcc.target/powerpc/bfp/scalar-test-data-class-8.c: Likewise.
    * gcc.target/powerpc/bfp/scalar-test-data-class-9.c: Likewise.
    * gcc.target/powerpc/bfp/scalar-test-data-class-10.c: Likewise.
    * gcc.target/powerpc/bfp/scalar-insert-exp-11.c: Change expected
    error message.
    * gcc.target/powerpc/bfp/scalar-extract-exp-5.c: Likewise.

Index: gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-5.c
===
--- gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-5.c   
(revision 259316)
+++ gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-5.c   
(working copy)
@@ -8,10 +8,10 @@
    error because the builtin requires 64 bits.  */
 #include 
 
-unsigned __int128 /* { dg-error "'__int128' is not supported on this
target" } */
+unsigned long long int
 get_significand (__ieee128 *p)
 {
   __ieee128 source = *p;
 
-  return __builtin_vec_scalar_extract_sig (source); /* { dg-error
"builtin function '__builtin_vec_scalar_extract_sig' not supported in
this compiler configuration" } */
+  return (long long int) __builtin_vec_scalar_extract_sig (source); /*
{ dg-error "requires ISA 3.0 IEEE 128-bit floating point" } */
 }
Index: gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-4.c
===
--- gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-4.c   
(revision 259316)
+++ gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-4.c    (working
copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" }
{ "-mcpu=power9" } } */
+/* { dg-require-effective-target lp64 } */
 /* { dg-require-effective-target powerpc_p9vector_ok } */
 /* { dg-options "-mcpu=power9" } */
 
@@ -11,6 +12,8 @@
 {
   __ieee128 source = *p;
 
+  /* IEEE 128-bit floating point operations are only supported
+ on 64-bit targets.  */
   return scalar_test_neg (source);
 }
 
Index: gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-8.c
===
--- gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-8.c   
(revision 259316)
+++ gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-8.c   
(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" }
{ "-mcpu=power9" } } */
+/* { dg-require-effective-target lp64 } */
 /* { dg-require-effective-target powerpc_p9vector_ok } */
 /* { dg-options "-mcpu=power9" } */
 
@@ -11,6 +12,8 @@
 {
   __ieee128 source = *p;
 
+  /* IEEE 128-bit floating point operations are only supported
+ on 64-bit targets.  */
   return scalar_test_data_class (source, 3);
 }
 
Index: gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-11.c
=

[PATCH][OBVIOUS] PR85347: New testcase vec-ldl-1.c FAILs on powerpc64-linux

2018-04-12 Thread Kelvin Nilsen
This new test case required a dejagnu qualifier to restrict its
execution on big-endian platforms.

The patch bootstrapped and tested without regressions and was committed
as obvious.


gcc/testsuite/ChangeLog:

2018-04-12  Kelvin Nilsen  <kel...@gcc.gnu.org>

    PR target/85347
    * gcc.target/powerpc/vec-ldl-1.c: Change dejagnu directives to
    specify -mvsx on gcc command line.

Index: gcc/testsuite/gcc.target/powerpc/vec-ldl-1.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-ldl-1.c    (revision 259318)
+++ gcc/testsuite/gcc.target/powerpc/vec-ldl-1.c    (working copy)
@@ -1,6 +1,6 @@
 /* { dg-do run { target powerpc*-*-* } } */
-/* { dg-require-effective-target vmx_hw } */
-/* { dg-options "-maltivec -O0 -Wall" } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-mvsx -O0 -Wall" } */
 
 #include 
 #include 



[PATCH v2, rs6000] Tidy implementation of vec_ldl

2018-04-04 Thread Kelvin Nilsen
This is the second draft of a patch originally submitted on 3/29.
This patch corrects inconsistencies in the supported prototypes for the
vec_ldl built-in function.  Specifically, it removes support for:

  vector int vec_ldl (int, long int *)
  vector unsigned int vec_ldl (int, unsigned long int *)

and adds support for:

  vector bool char vec_ldl (int, bool char *)
  vector bool short vec_ldl (int, bool short *)
  vector bool int vec_ldl (int, bool int *)
  vector bool long long vec_ldl (int, bool long long *)
  vector long long vec_ldl (int, long long *)
  vector unsigned long long vec_ldl (int, unsigned long long *)
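
For readers unfamiliar with vec_ldl, here is a minimal usage sketch built
on the long-standing "vector int vec_ldl (int, int *)" prototype rather
than on the new ones.  The function name example_first_element is
hypothetical, the pointer is assumed to be 16-byte aligned, and the
non-AltiVec branch is a fallback added only so the sketch compiles on
other targets.

```c
#if defined(__ALTIVEC__)
#include <altivec.h>
#endif

/* Return the first element of the 16-byte block that P points to.
   P is assumed to be 16-byte aligned.  */
int
example_first_element (int *p)
{
#if defined(__ALTIVEC__)
  /* Load the whole vector with offset 0, then use the GCC vector
     subscript extension to pick element 0.  */
  vector int v = vec_ldl (0, p);
  return v[0];
#else
  /* Fallback for targets without AltiVec support.  */
  return p[0];
#endif
}
```

The prototypes added by this patch are used the same way, with the
pointer type selecting which vector type is returned.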

Thanks to Segher Boessenkool for his careful review and feedback on the
first draft of this patch.  This second revision differs from the first
in the following:

1. Removed support for the proposed new prototype:
   "vector pixel vec_ldl (int, pixel *)"

2. Removed an extraneous tab character in the ChangeLog.

3. Changed the mangling of the bool_long_long_type_node.

4. Removed leading * on comment continuation lines.

5. Added a comment to describe limitations on use of the pixel data type.

6. Removed requirement for lp64 on the new test program.


This patch has bootstrapped and tested without regressions on both 
powerpc64le-unknown-linux (P8) and on powerpc-linux (P7 big-endian, with 
both -m32 and -m64 target options).


Is this ok for trunk?

gcc/ChangeLog:

2018-04-03  Kelvin Nilsen  <kel...@gcc.gnu.org>

    * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Remove
    erroneous entries for
    "vector int vec_ldl (int, long int *)", and
    "vector unsigned int vec_ldl (int, unsigned long int *)".
    Add comments and entries for
    "vector bool char vec_ldl (int, bool char *)",
    "vector bool short vec_ldl (int, bool short *)",
    "vector bool int vec_ldl (int, bool int *)",
    "vector bool long long vec_ldl (int, bool long long *)",
    "vector pixel vec_ldl (int, pixel *)",
    "vector long long vec_ldl (int, long long *)",
    "vector unsigned long long vec_ldl (int, unsigned long long *)".
    * config/rs6000/rs6000.c (rs6000_init_builtins): Initialize new
    type tree bool_long_long_type_node and correct definition of
    bool_V2DI_type_node to make reference to this new type tree.
    (rs6000_mangle_type): Replace erroneous reference to
    bool_long_type_node with bool_long_long_type_node.
    * config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Add
    comments to emphasize sign distinctions for char and int types and
    replace RS6000_BTI_bool_long constant with
    RS6000_BTI_bool_long_long constant.  Also add comment to restrict
    use of RS6000_BTI_pixel.
    (bool_long_type_node): Remove this macro definition.
    (bool_long_long_type_node): New macro definition

gcc/testsuite/ChangeLog:

2018-04-03  Kelvin Nilsen  <kel...@gcc.gnu.org>

    * gcc.target/powerpc/vec-ldl-1.c: New test.
    * gcc.dg/vmx/ops-long-1.c: Correct test programs to reflect
    corrections to ABI implementation.

Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c    (revision 258800)
+++ gcc/config/rs6000/rs6000-c.c    (working copy)
@@ -1656,27 +1656,45 @@ const struct altivec_builtin_types altivec_overloa
 RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI, 0 },
   { ALTIVEC_BUILTIN_VEC_LVEBX, ALTIVEC_BUILTIN_LVEBX,
 RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI, 0 },
+
+  /* vector float vec_ldl (int, vector float *);
+ vector float vec_ldl (int, float *); */
   { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL_V4SF,
 RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL_V4SF,
 RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_float, 0 },
+
+  /* vector bool int vec_ldl (int, vector bool int *);
+ vector bool int vec_ldl (int, bool int *);
+  vector int vec_ldl (int, vector int *);
+  vector int vec_ldl (int, int *);
+ vector unsigned int vec_ldl (int, vector unsigned int *);
+ vector unsigned int vec_ldl (int, unsigned int *); */
   { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL_V4SI,
 RS6000_BTI_bool_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL_V4SI,
+    RS6000_BTI_bool_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_int, 0 },
+  { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL_V4SI,
 RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL_V4SI,
 RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_INTSI, 0 },
   { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL_V4SI,
-    RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_long, 0 },
-  { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL_V4SI,
 RS6000_BTI_unsigned_V

[PATCH, rs6000] Tidy implementation of vec_ldl

2018-03-29 Thread Kelvin Nilsen

During code review, an inconsistency was noticed in some
of the prototypes defined for the vec_ldl built-in function.
In particular, the vector fetched from an address declared
to be long long * was returned as "vector int".  In
addressing this problem, certain other inconsistencies
and omissions were discovered.  This patch tidies up
the implementation of this function.  A separate patch
is in preparation to address the documentation for this
and all other PowerPC built-in functions.

In summary, this patch removes two prototypes:

  vector int vec_ldl (int, long int *)
  vector unsigned int vec_ldl (int, unsigned long int *)

and adds seven:

  vector bool char vec_ldl (int, bool char *)
  vector bool short vec_ldl (int, bool short *)
  vector bool int vec_ldl (int, bool int *)
  vector bool long long vec_ldl (int, bool long long *)
  vector pixel vec_ldl (int, pixel *)
  vector long long vec_ldl (int, long long *)
  vector unsigned long long vec_ldl (int, unsigned long long *)

This patch has been bootstrapped and tested without
regressions on both powerpc64le-unknown-linux (P8)
and on powerpc-linux (P7 big-endian, with both -m32
and -m64 target options).

Is this ok for trunk?

gcc/ChangeLog:

2018-03-29  Kelvin Nilsen <kel...@gcc.gnu.org>

    * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Remove
    erroneous entries for
    "vector int vec_ldl (int, long int *)", and
    "vector    unsigned int vec_ldl (int, unsigned long int *)".
    Add comments and entries for
    "vector bool char vec_ldl (int, bool char *)",
    "vector bool short vec_ldl (int, bool short *)",
    "vector bool int vec_ldl (int, bool int *)",
    "vector bool long long vec_ldl (int, bool long long *)",
    "vector pixel vec_ldl (int, pixel *)",
    "vector long long vec_ldl (int, long long *)",
    "vector unsigned long long vec_ldl (int, unsigned long long *)".
    * config/rs6000/rs6000.c (rs6000_init_builtins): Initialize new
    type tree bool_long_long_type_node and correct definition of
    bool_V2DI_type_node to make reference to this new type tree.
    (rs6000_mangle_type): Replace erroneous reference to
    bool_long_type_node with bool_long_long_type_node.
    * config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Add
    comments to emphasize sign distinctions for char and int types and
    replace RS6000_BTI_bool_long constant with
    RS6000_BTI_bool_long_long constant.
    (bool_long_type_node): Remove this macro definition.
    (bool_long_long_type_node): New macro definition

gcc/testsuite/ChangeLog:

2018-03-29  Kelvin Nilsen <kel...@gcc.gnu.org>

    * gcc.target/powerpc/vec-ldl-1.c: New test.
    * gcc.dg/vmx/ops-long-1.c: Correct test programs to reflect
    corrections to ABI implementation.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c    (revision 258800)
+++ gcc/config/rs6000/rs6000.c    (working copy)
@@ -16947,7 +16947,7 @@ rs6000_init_builtins (void)
   bool_char_type_node = build_distinct_type_copy 
(unsigned_intQI_type_node);
   bool_short_type_node = build_distinct_type_copy 
(unsigned_intHI_type_node);
   bool_int_type_node = build_distinct_type_copy 
(unsigned_intSI_type_node);
-  bool_long_type_node = build_distinct_type_copy 
(unsigned_intDI_type_node);
+  bool_long_long_type_node = build_distinct_type_copy 
(unsigned_intDI_type_node);

   pixel_type_node = build_distinct_type_copy (unsigned_intHI_type_node);

   long_integer_type_internal_node = long_integer_type_node;
@@ -17064,7 +17064,7 @@ rs6000_init_builtins (void)
   bool_V2DI_type_node = rs6000_vector_type (TARGET_POWERPC64
                     ? "__vector __bool long"
                     : "__vector __bool long long",
-                        bool_long_type_node, 2);
+                        bool_long_long_type_node, 2);
   pixel_V8HI_type_node = rs6000_vector_type ("__vector __pixel",
                  pixel_type_node, 8);

@@ -32855,7 +32855,7 @@ rs6000_mangle_type (const_tree type)
   if (type == bool_short_type_node) return "U6__bools";
   if (type == pixel_type_node) return "u7__pixel";
   if (type == bool_int_type_node) return "U6__booli";
-  if (type == bool_long_type_node) return "U6__booll";
+  if (type == bool_long_long_type_node) return "U6__booll";

   /* Use a unique name for __float128 rather than trying to use "e" or 
"g". Use
  "g" for IBM extended double, no matter whether it is long double 
(using

Index: gcc/config/rs6000/rs6000.h
===
--- gcc/config/rs6000/rs6000.h    (revision 258800)
+++ gcc/config/rs6000/rs6000.h    (working copy)
@@ -2578,7 +2578,7 @@ enum rs6000_builtin_type_index
   RS6000_BTI_opaque_V2SF,
  

[PATCH, rs6000] Finish implementation of __builtin_altivec_lvx_v1ti

2018-03-14 Thread Kelvin Nilsen
During code review, it was discovered that the implementation of
__builtin_altivec_lvx_v1ti is not complete.  The constant
ALTIVEC_BUILTIN_LVX_V1TI is introduced and is bound to the function
__builtin_altivec_lvx_v1ti.  However, this function's implementation is
incomplete because there is no call to the def_builtin function for this
binding.

This patch provides the missing pieces to add support for this function.
Additionally, this patch introduces four new __int128-based prototypes
of the overloaded __builtin_vec_ld function.  This is the function that
implements the vec_ld () macro expansion.  A new test case has been
provided to exercise each of these prototypes.

This patch has been bootstrapped and tested without regressions on both
powerpc64le-unknown-linux (P8) and on powerpc-linux (P7 big-endian, with
both -m32 and -m64 target options).

Is this patch ok for trunk?

gcc/ChangeLog:

2018-03-14  Kelvin Nilsen  <kel...@gcc.gnu.org>

* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
entries for V1TI variants of __builtin_altivec_ld builtin.
* config/rs6000/rs6000.c (altivec_expand_lv_builtin): Add test and
handling of V1TI variant of LVX icode pattern.
(altivec_expand_builtin): Add case for ALTIVEC_BUILTIN_LVX_V1TI.
(rs6000_gimple_fold_builtin): Likewise.
(altivec_init_builtins): Add code to define
__builtin_altivec_lvx_v1ti function.
* doc/extend.texi: Add four new prototypes for vec_ld.

gcc/testsuite/ChangeLog:

2018-03-14  Kelvin Nilsen  <kel...@gcc.gnu.org>

* gcc.target/powerpc/altivec-ld-1.c: New test.

Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c(revision 258341)
+++ gcc/config/rs6000/rs6000-c.c(working copy)
@@ -1562,6 +1562,15 @@ const struct altivec_builtin_types altivec_overloa
   { VSX_BUILTIN_VEC_FLOATO, VSX_BUILTIN_UNS_FLOATO_V2DI,
 RS6000_BTI_V4SF, RS6000_BTI_unsigned_V2DI, 0, 0 },
 
+  { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX_V1TI,
+RS6000_BTI_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX_V1TI,
+RS6000_BTI_unsigned_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V1TI, 0 },
+  { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX_V1TI,
+RS6000_BTI_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_INTTI, 0 },
+  { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX_V1TI,
+RS6000_BTI_unsigned_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTTI, 0 },
+
   { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX_V2DF,
 RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF, 0 },
   { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX_V2DI,
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 258341)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -14452,6 +14452,7 @@ altivec_expand_lv_builtin (enum insn_code icode, t
  LVXL and LVE*X expand to use UNSPECs to hide their special behavior,
  so the raw address is fine.  */
   if (icode == CODE_FOR_altivec_lvx_v2df_2op
+  || icode == CODE_FOR_altivec_lvx_v1ti_2op
   || icode == CODE_FOR_altivec_lvx_v2di_2op
   || icode == CODE_FOR_altivec_lvx_v4sf_2op
   || icode == CODE_FOR_altivec_lvx_v4si_2op
@@ -15811,6 +15812,9 @@ altivec_expand_builtin (tree exp, rtx target, bool
 case ALTIVEC_BUILTIN_LVX_V2DI:
   return altivec_expand_lv_builtin (CODE_FOR_altivec_lvx_v2di_2op,
exp, target, false);
+case ALTIVEC_BUILTIN_LVX_V1TI:
+  return altivec_expand_lv_builtin (CODE_FOR_altivec_lvx_v1ti_2op,
+   exp, target, false);
 case ALTIVEC_BUILTIN_LVX_V4SF:
   return altivec_expand_lv_builtin (CODE_FOR_altivec_lvx_v4sf_2op,
exp, target, false);
@@ -16542,6 +16546,7 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *
 case ALTIVEC_BUILTIN_LVX_V4SF:
 case ALTIVEC_BUILTIN_LVX_V2DI:
 case ALTIVEC_BUILTIN_LVX_V2DF:
+case ALTIVEC_BUILTIN_LVX_V1TI:
   {
arg0 = gimple_call_arg (stmt, 0);  // offset
arg1 = gimple_call_arg (stmt, 1);  // address
@@ -17443,6 +17448,10 @@ altivec_init_builtins (void)
 = build_function_type_list (V2DI_type_node,
long_integer_type_node, pcvoid_type_node,
NULL_TREE);
+  tree v1ti_ftype_long_pcvoid
+= build_function_type_list (V1TI_type_node,
+   long_integer_type_node, pcvoid_type_node,
+   NULL_TREE);
 
   tree void_ftype_opaque_long_pvoid
 = build_function_type_list (void_type_node,
@@ -17540,6 +17549,8 @@ altivec_init_builtins (void)
   def_builtin ("__builtin_altivec_lvx", v4si_ftype_long_pcvoid, 
ALTIVEC_BUILTIN_LVX);
   def_builtin ("__builtin_altivec_lvx_v2

[PATCH v2, rs6000] Remove unused (and incorrect) code for internal store and load operations

2018-03-14 Thread Kelvin Nilsen
Thank you for feedback and discussion regarding my first draft of this
patch with Segher Boessenkool and Bill Schmidt.  This revision of the
patch differs from the first in the following regards:

 1. I have also removed the vector_altivec_load_ and
vector_altivec_store_ expansions from vector.md.

 2. I have removed the unused rs6000_address_for_altivec function
from rs6000.c.

I have once again bootstrapped and regression tested on both little-
endian and big-endian targets.  The remainder of this description is
borrowed from my initial submission of the patch.

While working to ensure that rs6000 documentation of built-in functions is
consistent with the implementation of built-in functions, I discovered
some apparent typographic errors in the definitions of the
ST_INTERNAL_4sf and ST_INTERNAL_2df built-in functions.  As I endeavored
to fix these definitions and write test cases to prove that I had
properly fixed them, I discovered that these functions are no longer in
use.
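
The kind of typographic error described above is easy to picture with a
small stand-alone model of a .def-style table.  The sketch below is not
the rs6000 code: the EXAMPLE_BUILTINS macro, the struct, and the helper
names are hypothetical, and only the six enumerator/name pairs are copied
from the BU_ALTIVEC_X lines that this patch removes.  It counts entries
whose string name does not match the enumerator it is bound to.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical miniature of a .def-style list; the pairs mirror the
   removed BU_ALTIVEC_X entries, including the two mismatched names.  */
#define EXAMPLE_BUILTINS(X)                     \
  X (ST_INTERNAL_16qi, "st_internal_16qi")      \
  X (LD_INTERNAL_16qi, "ld_internal_16qi")      \
  X (ST_INTERNAL_4sf,  "st_internal_16qi")      \
  X (LD_INTERNAL_4sf,  "ld_internal_4sf")       \
  X (ST_INTERNAL_2df,  "st_internal_4sf")       \
  X (LD_INTERNAL_2df,  "ld_internal_2df")

struct example_entry
{
  const char *enum_text;   /* enumerator spelled as a string */
  const char *name;        /* name the entry is bound to */
};

static const struct example_entry example_table[] = {
#define X(ENUM, NAME) { #ENUM, NAME },
  EXAMPLE_BUILTINS (X)
#undef X
};

/* Compare two strings, ignoring differences in letter case.  */
static int
example_same_ignoring_case (const char *a, const char *b)
{
  while (*a && *b)
    {
      if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
        return 0;
      a++;
      b++;
    }
  return *a == *b;
}

/* Count table entries whose name disagrees with the enumerator.  */
size_t
example_count_mismatched_names (void)
{
  size_t mismatches = 0;
  size_t count = sizeof example_table / sizeof example_table[0];
  for (size_t i = 0; i < count; i++)
    if (!example_same_ignoring_case (example_table[i].enum_text,
                                     example_table[i].name))
      mismatches++;
  return mismatches;
}
```

For this table the count is 2, corresponding to the ST_INTERNAL_4sf and
ST_INTERNAL_2df entries.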

This patch removes the unnecessary definitions and related back-end
functions.  This has bootstrapped and tested without regressions on both
powerpc64le-unknown-linux (P8) and on powerpc-linux (P7 big-endian, with
both -m32 and -m64 target options).

Is this patch ok for trunk?

gcc/ChangeLog:

2018-03-14  Kelvin Nilsen  <kel...@gcc.gnu.org>

* config/rs6000/rs6000-builtin.def: Remove various BU_ALTIVEC_X
macro expansions for definition of ST_INTERNAL_ and
LD_INTERNAL_ builtins.
* config/rs6000/rs6000-protos.h (rs6000_address_for_altivec):
Remove prototype.
* config/rs6000/rs6000.c (altivec_expand_ld_builtin): Delete this
function.
(altivec_expand_st_builtin): Likewise.
(altivec_expand_builtin): Remove calls to deleted functions.
(rs6000_address_for_altivec): Delete this function.
* config/rs6000/vector.md: Remove expands for
vector_altivec_load_ and vector_altivec_store_.

Index: gcc/config/rs6000/rs6000-builtin.def
===
--- gcc/config/rs6000/rs6000-builtin.def(revision 258338)
+++ gcc/config/rs6000/rs6000-builtin.def(working copy)
@@ -1210,20 +1210,6 @@ BU_ALTIVEC_P (VCMPGTSB_P, "vcmpgtsb_p",  CONST,
 BU_ALTIVEC_P (VCMPGTUB_P, "vcmpgtub_p",CONST,  vector_gtu_v16qi_p)
 
 /* AltiVec builtins that are handled as special cases.  */
-BU_ALTIVEC_X (ST_INTERNAL_4si,  "st_internal_4si",  MEM)
-BU_ALTIVEC_X (LD_INTERNAL_4si,  "ld_internal_4si",  MEM)
-BU_ALTIVEC_X (ST_INTERNAL_8hi, "st_internal_8hi",  MEM)
-BU_ALTIVEC_X (LD_INTERNAL_8hi, "ld_internal_8hi",  MEM)
-BU_ALTIVEC_X (ST_INTERNAL_16qi,"st_internal_16qi", MEM)
-BU_ALTIVEC_X (LD_INTERNAL_16qi,"ld_internal_16qi", MEM)
-BU_ALTIVEC_X (ST_INTERNAL_4sf, "st_internal_16qi", MEM)
-BU_ALTIVEC_X (LD_INTERNAL_4sf, "ld_internal_4sf",  MEM)
-BU_ALTIVEC_X (ST_INTERNAL_2df, "st_internal_4sf",  MEM)
-BU_ALTIVEC_X (LD_INTERNAL_2df, "ld_internal_2df",  MEM)
-BU_ALTIVEC_X (ST_INTERNAL_2di, "st_internal_2di",  MEM)
-BU_ALTIVEC_X (LD_INTERNAL_2di, "ld_internal_2di",  MEM)
-BU_ALTIVEC_X (ST_INTERNAL_1ti, "st_internal_1ti",  MEM)
-BU_ALTIVEC_X (LD_INTERNAL_1ti, "ld_internal_1ti",  MEM)
 BU_ALTIVEC_X (MTVSCR,  "mtvscr",   MISC)
 BU_ALTIVEC_X (MFVSCR,  "mfvscr",   MISC)
 BU_ALTIVEC_X (DSSALL,  "dssall",   MISC)
Index: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h   (revision 258338)
+++ gcc/config/rs6000/rs6000-protos.h   (working copy)
@@ -162,7 +162,6 @@ extern void rs6000_emit_parity (rtx, rtx);
 extern rtx rs6000_machopic_legitimize_pic_address (rtx, machine_mode,
   rtx);
 extern rtx rs6000_address_for_fpconvert (rtx);
-extern rtx rs6000_address_for_altivec (rtx);
 extern rtx rs6000_allocate_stack_temp (machine_mode, bool, bool);
 extern int rs6000_loop_align (rtx);
 extern void rs6000_split_logical (rtx [], enum rtx_code, bool, bool, bool);
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 258338)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -15183,127 +15183,7 @@ rs6000_expand_ternop_builtin (enum insn_code icode
   return target;
 }
 
-/* Expand the lvx builtins.  */
-static rtx
-altivec_expand_ld_builtin (tree exp, rtx target, bool *expandedp)
-{
-  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
-  unsigned int fcode = DECL_FUNCTION_CODE (fndecl);
-  tree arg0;
-  machine_mode tmode, mode0;
-  rtx pat, op0;
-  enum insn_code icode;
 
-  switch (fcode)
-{
-case ALTIVEC_BUILTIN_LD_

[PATCH, rs6000] Remove unused (and incorrect) code for internal store and load operations

2018-03-13 Thread Kelvin Nilsen
While working to assure rs6000 documentation of built-in functions is
consistent with the implementation of built-in functions, I discovered
some apparent typographic errors in the definitions of the
ST_INTERNAL_4sf and ST_INTERNAL_2df built-in functions.  As I endeavored
to fix these definitions and write test cases to prove that I had
properly fixed them, I discovered that these functions are no longer in
use.

This patch removes the unnecessary definitions and related back-end
functions.  This has bootstrapped and tested without regressions on both
powerpc64le-unknown-linux (P8) and on powerpc-linux (P7 big-endian, with
both -m32 and -m64 target options).

Is this patch ok for trunk?

gcc/ChangeLog:

2018-03-09  Kelvin Nilsen  <kel...@gcc.gnu.org>

* config/rs6000/rs6000-builtin.def: Remove various BU_ALTIVEC_X
macro expansions for definition of ST_INTERNAL_<mode> and
LD_INTERNAL_<mode> builtins.
* config/rs6000/rs6000.c (altivec_expand_ld_builtin): Delete this
function.
(altivec_expand_st_builtin): Likewise.
(altivec_expand_builtin): Remove calls to deleted functions.

Index: gcc/config/rs6000/rs6000-builtin.def
===
--- gcc/config/rs6000/rs6000-builtin.def(revision 258338)
+++ gcc/config/rs6000/rs6000-builtin.def(working copy)
@@ -1210,20 +1210,6 @@ BU_ALTIVEC_P (VCMPGTSB_P, "vcmpgtsb_p",  CONST,
 BU_ALTIVEC_P (VCMPGTUB_P, "vcmpgtub_p",CONST,  vector_gtu_v16qi_p)
 
 /* AltiVec builtins that are handled as special cases.  */
-BU_ALTIVEC_X (ST_INTERNAL_4si,  "st_internal_4si",  MEM)
-BU_ALTIVEC_X (LD_INTERNAL_4si,  "ld_internal_4si",  MEM)
-BU_ALTIVEC_X (ST_INTERNAL_8hi, "st_internal_8hi",  MEM)
-BU_ALTIVEC_X (LD_INTERNAL_8hi, "ld_internal_8hi",  MEM)
-BU_ALTIVEC_X (ST_INTERNAL_16qi,"st_internal_16qi", MEM)
-BU_ALTIVEC_X (LD_INTERNAL_16qi,"ld_internal_16qi", MEM)
-BU_ALTIVEC_X (ST_INTERNAL_4sf, "st_internal_16qi", MEM)
-BU_ALTIVEC_X (LD_INTERNAL_4sf, "ld_internal_4sf",  MEM)
-BU_ALTIVEC_X (ST_INTERNAL_2df, "st_internal_4sf",  MEM)
-BU_ALTIVEC_X (LD_INTERNAL_2df, "ld_internal_2df",  MEM)
-BU_ALTIVEC_X (ST_INTERNAL_2di, "st_internal_2di",  MEM)
-BU_ALTIVEC_X (LD_INTERNAL_2di, "ld_internal_2di",  MEM)
-BU_ALTIVEC_X (ST_INTERNAL_1ti, "st_internal_1ti",  MEM)
-BU_ALTIVEC_X (LD_INTERNAL_1ti, "ld_internal_1ti",  MEM)
 BU_ALTIVEC_X (MTVSCR,  "mtvscr",   MISC)
 BU_ALTIVEC_X (MFVSCR,  "mfvscr",   MISC)
 BU_ALTIVEC_X (DSSALL,  "dssall",   MISC)
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 258338)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -15183,127 +15183,7 @@ rs6000_expand_ternop_builtin (enum insn_code icode
   return target;
 }
 
-/* Expand the lvx builtins.  */
-static rtx
-altivec_expand_ld_builtin (tree exp, rtx target, bool *expandedp)
-{
-  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
-  unsigned int fcode = DECL_FUNCTION_CODE (fndecl);
-  tree arg0;
-  machine_mode tmode, mode0;
-  rtx pat, op0;
-  enum insn_code icode;
 
-  switch (fcode)
-{
-case ALTIVEC_BUILTIN_LD_INTERNAL_16qi:
-  icode = CODE_FOR_vector_altivec_load_v16qi;
-  break;
-case ALTIVEC_BUILTIN_LD_INTERNAL_8hi:
-  icode = CODE_FOR_vector_altivec_load_v8hi;
-  break;
-case ALTIVEC_BUILTIN_LD_INTERNAL_4si:
-  icode = CODE_FOR_vector_altivec_load_v4si;
-  break;
-case ALTIVEC_BUILTIN_LD_INTERNAL_4sf:
-  icode = CODE_FOR_vector_altivec_load_v4sf;
-  break;
-case ALTIVEC_BUILTIN_LD_INTERNAL_2df:
-  icode = CODE_FOR_vector_altivec_load_v2df;
-  break;
-case ALTIVEC_BUILTIN_LD_INTERNAL_2di:
-  icode = CODE_FOR_vector_altivec_load_v2di;
-  break;
-case ALTIVEC_BUILTIN_LD_INTERNAL_1ti:
-  icode = CODE_FOR_vector_altivec_load_v1ti;
-  break;
-default:
-  *expandedp = false;
-  return NULL_RTX;
-}
-
-  *expandedp = true;
-
-  arg0 = CALL_EXPR_ARG (exp, 0);
-  op0 = expand_normal (arg0);
-  tmode = insn_data[icode].operand[0].mode;
-  mode0 = insn_data[icode].operand[1].mode;
-
-  if (target == 0
-  || GET_MODE (target) != tmode
-  || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
-target = gen_reg_rtx (tmode);
-
-  if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
-op0 = gen_rtx_MEM (mode0, copy_to_mode_reg (Pmode, op0));
-
-  pat = GEN_FCN (icode) (target, op0);
-  if (! pat)
-return 0;
-  emit_insn (pat);
-  return target;
-}
-
-/* Expand the stvx builtins.  */
-static rtx
-altivec_expand_st_builtin (tree exp, rtx target ATTRIBUTE_UNUSED,
-  bool *expandedp)
-{
-  tree

wwwdocs: An additional release note for powerpc for GCC 8

2018-02-14 Thread Kelvin Nilsen

Is this revision to the existing draft GCC 8 release notes ok for
commit?  

Thanks

? cvs.diffs
Index: htdocs/gcc-8/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-8/changes.html,v
retrieving revision 1.36
diff -u -3 -p -r1.36 changes.html
--- htdocs/gcc-8/changes.html   12 Feb 2018 07:23:11 -  1.36
+++ htdocs/gcc-8/changes.html   14 Feb 2018 14:58:56 -
@@ -464,6 +464,11 @@ a work-in-progress.
 powerpc-xilinx-eabi*)
 is deprecated and will be removed in a future release.
   
+  
+Support for using big-endian AltiVec intrinsics on a little-endian target
+(-maltivec=be) is deprecated and will be removed in a
+future release.
+  
 
 
 PowerPC SPE



[PATCH, rs6000] Begin deprecation of -maltivec=be

2018-02-13 Thread Kelvin Nilsen

PR 78303 was recently marked RESOLVED, WONTFIX.  The resolution was to
deprecate the troublesome command-line option.

This patch begins the process of deprecation by issuing a warning
message when this command-line option is specified.  The patch has
bootstrapped and tested without regressions on
powerpc64le-unknown-linux.  Is this ok for trunk?

gcc/ChangeLog:

2018-02-13  Kelvin Nilsen  <kel...@gcc.gnu.org>

* config/rs6000/rs6000.c (rs6000_option_override_internal): Issue
warning message if user requests -maltivec=be.

gcc/testsuite/ChangeLog:

2018-02-13  Kelvin Nilsen  <kel...@gcc.gnu.org>

* gcc.dg/vmx/extract-be-order.c: Disable -maltivec=be warning so
this test case still works ok.
* gcc.dg/vmx/extract-vsx-be-order.c: Likewise.
* gcc.dg/vmx/insert-be-order.c: Likewise.
* gcc.dg/vmx/insert-vsx-be-order.c: Likewise.
* gcc.dg/vmx/ld-be-order.c: Likewise.
* gcc.dg/vmx/ld-vsx-be-order.c: Likewise.
* gcc.dg/vmx/lde-be-order.c: Likewise.
* gcc.dg/vmx/ldl-be-order.c: Likewise.
* gcc.dg/vmx/ldl-vsx-be-order.c: Likewise.
* gcc.dg/vmx/merge-be-order.c: Likewise.
* gcc.dg/vmx/merge-vsx-be-order.c: Likewise.
* gcc.dg/vmx/mult-even-odd-be-order.c: Likewise.
* gcc.dg/vmx/pack-be-order.c: Likewise.
* gcc.dg/vmx/perm-be-order.c: Likewise.
* gcc.dg/vmx/splat-be-order.c: Likewise.
* gcc.dg/vmx/splat-vsx-be-order.c: Likewise.
* gcc.dg/vmx/st-be-order.c: Likewise.
* gcc.dg/vmx/st-vsx-be-order.c: Likewise.
* gcc.dg/vmx/ste-be-order.c: Likewise.
* gcc.dg/vmx/stl-be-order.c: Likewise.
* gcc.dg/vmx/stl-vsx-be-order.c: Likewise.
* gcc.dg/vmx/sum2s-be-order.c: Likewise.
* gcc.dg/vmx/unpack-be-order.c: Likewise.
* gcc.dg/vmx/vsums-be-order.c: Likewise.
* gcc.target/powerpc/vec-setup-be-long.c: Likewise.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 257395)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -4028,6 +4028,13 @@ rs6000_option_override_internal (bool global_init_
   if (global_init_p)
 rs6000_isa_flags_explicit = global_options_set.x_rs6000_isa_flags;
 
+  /* We plan to deprecate the -maltivec=be option.  For now, just
+ issue a warning message.  */
+  if (global_init_p
+  && (rs6000_altivec_element_order == 2))
+warning (0, "%qs command-line option is deprecated",
+"-maltivec=be");
+
   /* On 64-bit Darwin, power alignment is ABI-incompatible with some C
  library functions, so warn about it. The flag may be useful for
  performance studies from time to time though, so don't disable it
Index: gcc/testsuite/gcc.dg/vmx/extract-be-order.c
===
--- gcc/testsuite/gcc.dg/vmx/extract-be-order.c (revision 257395)
+++ gcc/testsuite/gcc.dg/vmx/extract-be-order.c (working copy)
@@ -1,4 +1,4 @@
-/* { dg-options "-maltivec=be -mabi=altivec -std=gnu99 -mno-vsx" } */
+/* { dg-options "-maltivec=be -mabi=altivec -std=gnu99 -mno-vsx -w" } */
 
 #include "harness.h"
 
Index: gcc/testsuite/gcc.dg/vmx/extract-vsx-be-order.c
===
--- gcc/testsuite/gcc.dg/vmx/extract-vsx-be-order.c (revision 257395)
+++ gcc/testsuite/gcc.dg/vmx/extract-vsx-be-order.c (working copy)
@@ -1,6 +1,6 @@
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_vsx_ok } */
-/* { dg-options "-maltivec=be -mabi=altivec -std=gnu99 -mvsx" } */
+/* { dg-options "-maltivec=be -mabi=altivec -std=gnu99 -mvsx -w" } */
 
 #include "harness.h"
 
Index: gcc/testsuite/gcc.dg/vmx/insert-be-order.c
===
--- gcc/testsuite/gcc.dg/vmx/insert-be-order.c  (revision 257395)
+++ gcc/testsuite/gcc.dg/vmx/insert-be-order.c  (working copy)
@@ -1,4 +1,4 @@
-/* { dg-options "-maltivec=be -mabi=altivec -std=gnu99 -mno-vsx" } */
+/* { dg-options "-w -maltivec=be -mabi=altivec -std=gnu99 -mno-vsx" } */
 
 #include "harness.h"
 
Index: gcc/testsuite/gcc.dg/vmx/insert-vsx-be-order.c
===
--- gcc/testsuite/gcc.dg/vmx/insert-vsx-be-order.c  (revision 257395)
+++ gcc/testsuite/gcc.dg/vmx/insert-vsx-be-order.c  (working copy)
@@ -1,6 +1,6 @@
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_vsx_ok } */
-/* { dg-options "-maltivec=be -mabi=altivec -std=gnu99 -mvsx" } */
+/* { dg-options "-w -maltivec=be -mabi=altivec -std=gnu99 -mvsx" } */
 
 #include "harness.h"
 
Index: gcc/testsuite/gcc.dg/vmx/ld-be-order.c
===

[PATCH] PR 80867: ICE during -O3 compile of libgnat

2018-01-29 Thread Kelvin Nilsen
It was determined that the reported ICE occurs because a NULL value is
passed from vectorizable_call () to

   targetm.vectorize.builtin_md_vectorized_function (
callee, vectype_out, vectype_in).

This patch avoids making this call if callee equals NULL.
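
The shape of the guard can be sketched outside of GCC.  This is only a
minimal model: decl, generic_hook, md_hook and lookup_vectorized are
hypothetical stand-ins, not the real tree type or target hooks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a function declaration node.  */
typedef struct decl { int code; } decl;

/* Stand-in for the "no combined function code" sentinel.  */
enum { CFN_LAST = -1 };

/* Counts how often the target-specific hook is entered.  */
static int md_hook_calls;

/* Stand-in for the generic hook, keyed by a combined function code.  */
static decl *
generic_hook (int cfn)
{
  static decl result = { 1 };
  return cfn == 0 ? &result : NULL;
}

/* Stand-in for the target-specific hook.  It dereferences CALLEE,
   so passing NULL here is the failure being modelled.  */
static decl *
md_hook (const decl *callee)
{
  static decl result = { 2 };
  assert (callee != NULL);
  md_hook_calls++;
  return callee->code == 42 ? &result : NULL;
}

/* Guarded dispatch: the target-specific hook is only consulted
   when a callee is actually available.  */
static decl *
lookup_vectorized (int cfn, const decl *callee)
{
  decl *fndecl = NULL;
  if (cfn != CFN_LAST)
    fndecl = generic_hook (cfn);
  else if (callee)
    fndecl = md_hook (callee);
  return fndecl;
}
```

With a NULL callee and no combined function code, lookup_vectorized
returns NULL without ever entering md_hook.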

After successful bootstrap and regression testing, with preapproval,
this patch has been committed to the trunk.

Is it ok to backport to GCC 7 and GCC 6 (after testing on those
platforms)?

Thanks.


gcc/ChangeLog:

2018-01-29  Richard Biener <rguent...@suse.de>
    Kelvin Nilsen  <kel...@gcc.gnu.org>

PR bootstrap/80867
* tree-vect-stmts.c (vectorizable_call): Don't call
targetm.vectorize.builtin_md_vectorized_function if callee is
NULL.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 257105)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -3159,7 +3159,7 @@
   if (cfn != CFN_LAST)
fndecl = targetm.vectorize.builtin_vectorized_function
  (cfn, vectype_out, vectype_in);
-  else
+  else if (callee)
fndecl = targetm.vectorize.builtin_md_vectorized_function
  (callee, vectype_out, vectype_in);
 }



[PATCH, rs6000] Fix ICE caused by recent patch: Generate lvx and stvx without swaps for aligned vector loads and stores

2018-01-16 Thread Kelvin Nilsen

A patch committed on 2018-01-10 is causing an ICE with existing test
program $GCC_SRC/gcc/testsuite/gcc.target/powerpc/pr83399.c, when
compiled with the -m32 option.  At the time of the commit, it was
thought that this was a problem with the recent resolution of PR83399.
However, further investigation revealed a problem with the patch that
was just committed.  The generated code did not distinguish between 32-
and 64-bit targets.

This patch corrects that problem.

This has been bootstrapped and tested without regressions on
powerpc64le-unknown-linux (P8) and on powerpc64-unknown-linux (P7) with
both -m32 and -m64 target options.  Is this ok for trunk?


gcc/ChangeLog:

2018-01-16  Kelvin Nilsen  <kel...@gcc.gnu.org>

* config/rs6000/rs6000-p8swap.c (rs6000_gen_stvx): Generate
different rtl trees depending on TARGET_64BIT.
(rs6000_gen_lvx): Likewise.

Index: gcc/config/rs6000/rs6000-p8swap.c
===
--- gcc/config/rs6000/rs6000-p8swap.c   (revision 256710)
+++ gcc/config/rs6000/rs6000-p8swap.c   (working copy)
@@ -1554,23 +1554,31 @@ rs6000_gen_stvx (enum machine_mode mode, rtx dest_
   op1 = XEXP (memory_address, 0);
   op2 = XEXP (memory_address, 1);
   if (mode == V16QImode)
-   stvx = gen_altivec_stvx_v16qi_2op (src_exp, op1, op2);
+   stvx = TARGET_64BIT ? gen_altivec_stvx_v16qi_2op (src_exp, op1, op2)
+ : gen_altivec_stvx_v16qi_2op_si (src_exp, op1, op2);
   else if (mode == V8HImode)
-   stvx = gen_altivec_stvx_v8hi_2op (src_exp, op1, op2);
+   stvx = TARGET_64BIT ? gen_altivec_stvx_v8hi_2op (src_exp, op1, op2)
+ : gen_altivec_stvx_v8hi_2op_si (src_exp, op1, op2);
 #ifdef HAVE_V8HFmode
   else if (mode == V8HFmode)
-   stvx = gen_altivec_stvx_v8hf_2op (src_exp, op1, op2);
+   stvx = TARGET_64BIT ? gen_altivec_stvx_v8hf_2op (src_exp, op1, op2)
+ : gen_altivec_stvx_v8hf_2op_si (src_exp, op1, op2);
 #endif
   else if (mode == V4SImode)
-   stvx = gen_altivec_stvx_v4si_2op (src_exp, op1, op2);
+   stvx = TARGET_64BIT ? gen_altivec_stvx_v4si_2op (src_exp, op1, op2)
+ : gen_altivec_stvx_v4si_2op_si (src_exp, op1, op2);
   else if (mode == V4SFmode)
-   stvx = gen_altivec_stvx_v4sf_2op (src_exp, op1, op2);
+   stvx = TARGET_64BIT ? gen_altivec_stvx_v4sf_2op (src_exp, op1, op2)
+ : gen_altivec_stvx_v4sf_2op_si (src_exp, op1, op2);
   else if (mode == V2DImode)
-   stvx = gen_altivec_stvx_v2di_2op (src_exp, op1, op2);
+   stvx = TARGET_64BIT ? gen_altivec_stvx_v2di_2op (src_exp, op1, op2)
+ : gen_altivec_stvx_v2di_2op_si (src_exp, op1, op2);
   else if (mode == V2DFmode)
-   stvx = gen_altivec_stvx_v2df_2op (src_exp, op1, op2);
+   stvx = TARGET_64BIT ? gen_altivec_stvx_v2df_2op (src_exp, op1, op2)
+ : gen_altivec_stvx_v2df_2op_si (src_exp, op1, op2);
   else if (mode == V1TImode)
-   stvx = gen_altivec_stvx_v1ti_2op (src_exp, op1, op2);
+   stvx = TARGET_64BIT ? gen_altivec_stvx_v1ti_2op (src_exp, op1, op2)
+ : gen_altivec_stvx_v1ti_2op_si (src_exp, op1, op2);
   else
/* KFmode, TFmode, other modes not expected in this context.  */
gcc_unreachable ();
@@ -1578,23 +1586,39 @@ rs6000_gen_stvx (enum machine_mode mode, rtx dest_
   else /* REG_P (memory_address) */
 {
   if (mode == V16QImode)
-   stvx = gen_altivec_stvx_v16qi_1op (src_exp, memory_address);
+   stvx = TARGET_64BIT ?
+ gen_altivec_stvx_v16qi_1op (src_exp, memory_address)
+ : gen_altivec_stvx_v16qi_1op_si (src_exp, memory_address);
   else if (mode == V8HImode)
-   stvx = gen_altivec_stvx_v8hi_1op (src_exp, memory_address);
+   stvx = TARGET_64BIT ?
+ gen_altivec_stvx_v8hi_1op (src_exp, memory_address)
+ : gen_altivec_stvx_v8hi_1op_si (src_exp, memory_address);
 #ifdef HAVE_V8HFmode
   else if (mode == V8HFmode)
-   stvx = gen_altivec_stvx_v8hf_1op (src_exp, memory_address);
+   stvx = TARGET_64BIT ?
+ gen_altivec_stvx_v8hf_1op (src_exp, memory_address)
+ : gen_altivec_stvx_v8hf_1op_si (src_exp, memory_address);
 #endif
   else if (mode == V4SImode)
-   stvx = gen_altivec_stvx_v4si_1op (src_exp, memory_address);
+   stvx =TARGET_64BIT ?
+ gen_altivec_stvx_v4si_1op (src_exp, memory_address)
+ : gen_altivec_stvx_v4si_1op_si (src_exp, memory_address);
   else if (mode == V4SFmode)
-   stvx = gen_altivec_stvx_v4sf_1op (src_exp, memory_address);
+   stvx = TARGET_64BIT ?
+ gen_altivec_stvx_v4sf_1op (src_exp, memory_address)
+ : gen_altivec_stvx_v4sf_1op_si (src_exp, memory_address);
   else if (mode == V2DImode)
-   stvx = gen_altivec_stvx_v2di_1op (src_exp, memory_address);
+   stvx = TARGET_64BIT ?
+ gen_altivec_stvx_v2di_1op (src_exp, memory_address)
+ : gen_altivec_stvx_v2di_

[PATCH, rs6000] Generate lvx and stvx without swaps for aligned vector loads and stores

2018-01-12 Thread Kelvin Nilsen

On Power 7 and Power 8 little endian, the code generator has been
emitting two instructions for each vector load and each vector store. 
One instruction does a swapping load or store, and the second
instruction does an in-register swap.

This patch replaces the two-instruction sequences with a single lvx (for
loads) or stvx (for stores) instruction in the very common case that the
vector is known to reside at a quad-word aligned address in memory. 
This patch is most relevant to Power 7 and Power 8 targets because
Power 9 code generation uses new single-instruction encodings for both
aligned and unaligned vector loads and stores.
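
As a rough illustration of why the replacement is safe, here is a
simplified software model, not the compiler code.  The type vreg and
all function names are hypothetical, and a 128-bit register is
modelled as two 64-bit doublewords.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A 128-bit vector register modelled as two 64-bit doublewords.  */
typedef struct { uint64_t dw[2]; } vreg;

/* True if P is a multiple of 16, i.e. quad-word aligned.  */
static int
is_quad_aligned (const void *p)
{
  return ((uintptr_t) p & 15) == 0;
}

/* Model of the in-register swap of the two doublewords.  */
static vreg
swap_doublewords (vreg v)
{
  vreg r = { { v.dw[1], v.dw[0] } };
  return r;
}

/* Model of the swapping load: the doublewords arrive reversed.  */
static vreg
swapping_load (const void *p)
{
  vreg m;
  memcpy (&m, p, sizeof m);
  return swap_doublewords (m);
}

/* Model of an lvx-style load.  The real instruction ignores the low
   four address bits; this model insists on alignment instead.  */
static vreg
aligned_load (const void *p)
{
  vreg m;
  assert (is_quad_aligned (p));
  memcpy (&m, p, sizeof m);
  return m;
}

/* One load when the address is known to be aligned, otherwise the
   two-step sequence.  Both paths produce the same register value.  */
static vreg
load_vector (const void *p)
{
  if (is_quad_aligned (p))
    return aligned_load (p);
  return swap_doublewords (swapping_load (p));
}
```

In this model the single aligned load and the load-plus-swap pair
yield identical results, so the only thing the optimization has to
prove is the alignment of the address.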

This patch has been bootstrapped and tested without regressions on
powerpc64le-unknown-linux (P8).  It has also been bootstrapped and tested
on powerpc-linux (P7 and P8, big-endian, with both -m32 and -m64 target
options).

One regression was identified during big-endian regression testing:

> FAIL: gcc.target/powerpc/pr83399.c (internal compiler error)
> FAIL: gcc.target/powerpc/pr83399.c (test for excess errors)

The pr83399.c test and the ICE are related to a recently committed patch
that addresses a problem originally found and reported as part of the work
on this lvx/stvx optimization patch.  It appears that the PR83399 patch
may not have fully addressed the big-endian aspects of the original
problem report.

> The ICE occurs at
> ;; /home/kelvin/gcc/gcc-trunk4test99327/gcc/testsuite/gcc.target/powerpc/pr83399.c:15:1: internal compiler error: in plus_constant, at explow.c:103
> ;; 0x104af39f plus_constant(machine_mode, rtx_def*, poly_int<1u, long>, bool)
> ;;  /home/kelvin/gcc/gcc-trunk4test99327/gcc/explow.c:103
> ;; 0x112e2d97 record_store
> ;;  /home/kelvin/gcc/gcc-trunk4test99327/gcc/dse.c:1502
> ;; 0x112e525b scan_insn
> ;;  /home/kelvin/gcc/gcc-trunk4test99327/gcc/dse.c:2540
> ;; 0x112e525b dse_step1
> ;;  /home/kelvin/gcc/gcc-trunk4test99327/gcc/dse.c:2652
> ;; 0x112e525b rest_of_handle_dse
> ;;  /home/kelvin/gcc/gcc-trunk4test99327/gcc/dse.c:3569
> ;; 0x112e525b execute
> ;;  /home/kelvin/gcc/gcc-trunk4test99327/gcc/dse.c:3627
> ;; Please submit a full bug report,

Is this patch ok for trunk?

gcc/ChangeLog:

2018-01-10  Kelvin Nilsen  <kel...@gcc.gnu.org>

* config/rs6000/rs6000-p8swap.c (rs6000_sum_of_two_registers_p):
New function.
(rs6000_quadword_masked_address_p): Likewise.
(quad_aligned_load_p): Likewise.
(quad_aligned_store_p): Likewise.
(const_load_sequence_p): Add comment to describe the outer-most loop.
(mimic_memory_attributes_and_flags): New function.
(rs6000_gen_stvx): Likewise.
(replace_swapped_aligned_store): Likewise.
(rs6000_gen_lvx): Likewise.
(replace_swapped_aligned_load): Likewise.
(replace_swapped_load_constant): Capitalize argument name in
comment describing this function.
(rs6000_analyze_swaps): Add a third pass to search for vector loads
and stores that access quad-word aligned addresses and replace
with stvx or lvx instructions when appropriate.
* config/rs6000/rs6000-protos.h (rs6000_sum_of_two_registers_p):
New function prototype.
(rs6000_quadword_masked_address_p): Likewise.
(rs6000_gen_lvx): Likewise.
(rs6000_gen_stvx): Likewise.
* config/rs6000/vsx.md (*vsx_le_perm_load_<mode>): For modes
VSX_D (V2DF, V2DI), modify this split to select lvx instruction
when memory address is aligned.
(*vsx_le_perm_load_<mode>): For modes VSX_W (V4SF, V4SI), modify
this split to select lvx instruction when memory address is aligned.
(*vsx_le_perm_load_v8hi): Modify this split to select lvx
instruction when memory address is aligned.
(*vsx_le_perm_load_v16qi): Likewise.
(four unnamed splitters): Modify to select the stvx instruction
    when memory is aligned.

gcc/testsuite/ChangeLog:

2018-01-10  Kelvin Nilsen  <kel...@gcc.gnu.org>

* gcc.target/powerpc/pr48857.c: Modify dejagnu directives to look
for lvx and stvx instead of lxvd2x and stxvd2x and require
little-endian target.  Add comments.
* gcc.target/powerpc/swaps-p8-28.c: Add functions for more
comprehensive testing.
* gcc.target/powerpc/swaps-p8-29.c: Likewise.
* gcc.target/powerpc/swaps-p8-30.c: Likewise.
* gcc.target/powerpc/swaps-p8-31.c: Likewise.
* gcc.target/powerpc/swaps-p8-32.c: Likewise.
* gcc.target/powerpc/swaps-p8-33.c: Likewise.
* gcc.target/powerpc/swaps-p8-34.c: Likewise.
* gcc.target/powerpc/swaps-p8-35.c: Likewise.
* gcc.target/powerpc/swaps-p8-36.c: Likewise.
* gcc.target/powerpc/swaps-p8-37.c: Likewise.
* gcc.target/powerpc/swaps-p8-38.c: Likewise.
* gcc.target/powerpc

Backports to gcc 7.x

2017-12-06 Thread Kelvin Nilsen

I would like to backport the following patch to the GCC 7 branch.

PR80101: Fix ICE in store_data_bypass_p
  https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00953.html


This patch has been bootstrapped and regression tested on the
GCC 7 branch.

Is this ok for backporting to GCC 7?



[PATCH,rs6000] Correct dejagnu directives in several newly added tests

2017-09-29 Thread Kelvin Nilsen

This patch corrects an error in several newly added test programs that
was causing these programs to be SUPPORTED on platforms where they were
not supposed to be SUPPORTED, which was causing unexpected FAILS.

The patch has been preapproved by seg...@gcc.gnu.org.

gcc/testsuite/ChangeLog:

2017-09-29  Kelvin Nilsen  <kel...@gcc.gnu.org>

* gcc.target/powerpc/swaps-p8-30.c: Exchange the order of dg-do
and dg-require-effective-target directives to correct testing
behavior.
* gcc.target/powerpc/swaps-p8-32.c: Likewise.
* gcc.target/powerpc/swaps-p8-41.c: Likewise.
* gcc.target/powerpc/swaps-p8-34.c: Likewise.
* gcc.target/powerpc/swaps-p8-43.c: Likewise.
* gcc.target/powerpc/swaps-p8-36.c: Likewise.
* gcc.target/powerpc/swaps-p8-45.c: Likewise.
* gcc.target/powerpc/swaps-p8-29.c: Likewise.
* gcc.target/powerpc/swaps-p8-38.c: Likewise.
* gcc.target/powerpc/swaps-p8-31.c: Likewise.
* gcc.target/powerpc/swaps-p8-40.c: Likewise.
* gcc.target/powerpc/swaps-p8-33.c: Likewise.
* gcc.target/powerpc/swaps-p8-42.c: Likewise.
* gcc.target/powerpc/swaps-p8-35.c: Likewise.
* gcc.target/powerpc/swaps-p8-44.c: Likewise.
* gcc.target/powerpc/swaps-p8-28.c: Likewise.
* gcc.target/powerpc/swaps-p8-37.c: Likewise.
* gcc.target/powerpc/swaps-p8-39.c: Likewise.


Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-30.c
===
--- gcc/testsuite/gcc.target/powerpc/swaps-p8-30.c  (revision 253294)
+++ gcc/testsuite/gcc.target/powerpc/swaps-p8-30.c  (working copy)
@@ -1,5 +1,5 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
-/* { dg-do compile { target { powerpc64le-*-* } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
 /* { dg-options "-mcpu=power8 -O3 " } */
 /* { dg-final { scan-assembler-not "xxpermdi" } } */
Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-32.c
===
--- gcc/testsuite/gcc.target/powerpc/swaps-p8-32.c  (revision 253294)
+++ gcc/testsuite/gcc.target/powerpc/swaps-p8-32.c  (working copy)
@@ -1,5 +1,5 @@
+/* { dg-do run { target { powerpc*-*-* } } } */
 /* { dg-require-effective-target p8vector_hw } */
-/* { dg-do run { target { powerpc*-*-* } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
 /* { dg-options "-mcpu=power8 -O3 " } */
 
Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-41.c
===
--- gcc/testsuite/gcc.target/powerpc/swaps-p8-41.c  (revision 253294)
+++ gcc/testsuite/gcc.target/powerpc/swaps-p8-41.c  (working copy)
@@ -1,5 +1,5 @@
+/* { dg-do run { target { powerpc*-*-* } } } */
 /* { dg-require-effective-target p8vector_hw } */
-/* { dg-do run { target { powerpc*-*-* } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
 /* { dg-options "-mcpu=power8 -O3 " } */
 
Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-34.c
===
--- gcc/testsuite/gcc.target/powerpc/swaps-p8-34.c  (revision 253294)
+++ gcc/testsuite/gcc.target/powerpc/swaps-p8-34.c  (working copy)
@@ -1,5 +1,5 @@
+/* { dg-do run { target { powerpc*-*-* } } } */
 /* { dg-require-effective-target p8vector_hw } */
-/* { dg-do run { target { powerpc*-*-* } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
 /* { dg-options "-mcpu=power8 -O3 " } */
 
Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-43.c
===
--- gcc/testsuite/gcc.target/powerpc/swaps-p8-43.c  (revision 253294)
+++ gcc/testsuite/gcc.target/powerpc/swaps-p8-43.c  (working copy)
@@ -1,5 +1,5 @@
+/* { dg-do run { target { powerpc*-*-* } } } */
 /* { dg-require-effective-target p8vector_hw } */
-/* { dg-do run { target { powerpc*-*-* } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
 /* { dg-options "-mcpu=power8 -O3 " } */
 
Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-36.c
===
--- gcc/testsuite/gcc.target/powerpc/swaps-p8-36.c  (revision 253294)
+++ gcc/testsuite/gcc.target/powerpc/swaps-p8-36.c  (working copy)
@@ -1,5 +1,5 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
-/* { dg-do compile { ta

[PATCH v2,rs6000] Replace swap of a loaded vector constant with load of a swapped vector constant

2017-09-25 Thread Kelvin Nilsen

On Power8 little endian, two instructions are needed to load from the
natural in-memory representation of a vector into a vector register: a
load followed by a swap.  When the vector value to be loaded is a
constant, more efficient code can be achieved by swapping the
representation of the constant in memory so that only a load instruction
is required.
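
The idea can be illustrated with a small software model.  This is a
sketch only: v4si and the function names below are hypothetical, and
the swapping load is modelled as a plain exchange of the two 64-bit
halves of a four-word vector.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of a V4SI vector: four 32-bit words.  */
typedef struct { uint32_t w[4]; } v4si;

/* Exchange the two 64-bit halves: {a,b,c,d} becomes {c,d,a,b}.  */
static v4si
swap_halves (v4si v)
{
  v4si r = { { v.w[2], v.w[3], v.w[0], v.w[1] } };
  return r;
}

/* Model of the swapping load: memory contents arrive with the
   halves exchanged.  */
static v4si
swapping_load (const v4si *mem)
{
  assert (mem != NULL);
  return swap_halves (*mem);
}

/* Before: the constant sits in memory in its natural layout, so a
   load must be followed by a swap.  */
static v4si
load_then_swap (const v4si *natural)
{
  return swap_halves (swapping_load (natural));
}

/* After: the constant is stored pre-swapped, so the swapping load
   alone produces the intended register value.  */
static v4si
load_preswapped (const v4si *preswapped)
{
  return swapping_load (preswapped);
}
```

Because exchanging the halves twice is the identity, storing
swap_halves (c) in the constant pool and dropping the explicit swap
gives the same register contents as the original two-step sequence.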

This second version of the patch responds to feedback provided by Segher
Boessenkool, Bill Schmidt, and Pat Haugen. Thank you for the careful
reviews:

1. Revised comments in const_load_sequence_p function of rs6000-p8swap.c

2. Restructured nested if statements as a single if-statement with
   compound condition in const_load_sequence_p function of
   rs6000-p8swap.c

3. In replace_swapped_load_constant function of rs6000-p8swap.c,
   replaced two FOR_EACH_INSN_INFO_USE macro expansions with
   non-looping control structures.

4. Added comments and white space to replace_swapped_load_constant
   function of rs6000-p8swap.c to improve readability.

5. Reordered handling of cases in replace_swapped_load_constant
   function of rs6000-p8swap.c, moving V8HImode and V8HFmode
   handling above V4SImode handling.

6. Replaced gcc_assert (0) with gcc_unreachable () in
   replace_swapped_load_constant of rs6000-p8swap.c.

7. In rs6000_analyze_swaps function of rs6000-p8swap.c,
   added requirement that !pass2_insn_entry[i].is_store
   before calling const_load_sequence_p.

8. Removed unnecessary code blocks at end of
   rs6000_analyze_swaps function of rs6000-p8swap.c.

9. Added 15 new tests to exercise different vector element sizes.

This patch has been bootstrapped and tested without regressions on
powerpc64le-unknown-linux (P8) and on powerpc-unknown-linux (P8,
big-endian, with both -m32 and -m64 target options).

Is this ok for trunk?

gcc/ChangeLog:

2017-09-25  Kelvin Nilsen  <kel...@gcc.gnu.org>

* config/rs6000/rs6000-p8swap.c (const_load_sequence_p): Revise
this function to return false if the definition used by the swap
instruction is artificial, or if the memory address from which the
constant value is loaded is not represented by a base address held
in a register or if the base address register is a frame or stack
pointer.  Additionally, return false if the base address of the
loaded constant is a SYMBOL_REF but is not considered to be a
constant.
(replace_swapped_load_constant): New function.
(rs6000_analyze_swaps): Add a new pass to replace a swap of a
loaded constant vector with a load of a swapped constant vector.

gcc/testsuite/ChangeLog:

2017-09-25  Kelvin Nilsen  <kel...@gcc.gnu.org>

* gcc.target/powerpc/swaps-p8-28.c: New test.
* gcc.target/powerpc/swaps-p8-29.c: New test.
* gcc.target/powerpc/swaps-p8-31.c: New test.
* gcc.target/powerpc/swaps-p8-32.c: New test.
* gcc.target/powerpc/swaps-p8-34.c: New test.
* gcc.target/powerpc/swaps-p8-35.c: New test.
* gcc.target/powerpc/swaps-p8-37.c: New test.
* gcc.target/powerpc/swaps-p8-38.c: New test.
* gcc.target/powerpc/swaps-p8-40.c: New test.
* gcc.target/powerpc/swaps-p8-41.c: New test.
* gcc.target/powerpc/swaps-p8-43.c: New test.
* gcc.target/powerpc/swaps-p8-44.c: New test.
* gcc.target/powerpc/swaps-p8-30.c: New test.
* gcc.target/powerpc/swaps-p8-33.c: New test.
* gcc.target/powerpc/swaps-p8-36.c: New test.
* gcc.target/powerpc/swaps-p8-39.c: New test.
* gcc.target/powerpc/swaps-p8-42.c: New test.
* gcc.target/powerpc/swaps-p8-45.c: New test.
Index: gcc/config/rs6000/rs6000-p8swap.c
===
--- gcc/config/rs6000/rs6000-p8swap.c   (revision 252768)
+++ gcc/config/rs6000/rs6000-p8swap.c   (working copy)
@@ -335,21 +335,26 @@ const_load_sequence_p (swap_web_entry *insn_entry,
 
   const_rtx tocrel_base;
 
-  /* Find the unique use in the swap and locate its def.  If the def
- isn't unique, punt.  */
   struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
   df_ref use;
   FOR_EACH_INSN_INFO_USE (use, insn_info)
 {
   struct df_link *def_link = DF_REF_CHAIN (use);
-  if (!def_link || def_link->next)
+
+  /* If there is no def or the def is artificial or there are
+multiple defs, punt.  */
+  if (!def_link || !def_link->ref || DF_REF_IS_ARTIFICIAL (def_link->ref)
+ || def_link->next)
return false;
 
   rtx def_insn = DF_REF_INSN (def_link->ref);
   unsigned uid2 = INSN_UID (def_insn);
+  /* If this is not a load or is not a swap, return false */
   if (!insn_entry[uid2].is_load || !insn_entry[uid2].is_swap)
return false;
 
+  /* If the source of the rtl def is not a set from memory, return
+false.  */
   rtx body = PATTERN (def_insn);
   if (GET_CODE (body) != SET
  || GET_CODE (SE

[PATCH,rs6000] Replace swap of a loaded vector constant with load of a swapped vector constant

2017-09-15 Thread Kelvin Nilsen

On Power8 little endian, two instructions are needed to load from the
natural in-memory representation of a vector into a vector register: a
load followed by a swap.  When the vector value to be loaded is a
constant, more efficient code can be achieved by swapping the
representation of the constant in memory so that only a load instruction
is required.

This patch has been bootstrapped and tested without regressions on
powerpc64le-unknown-linux (P8) and on powerpc-unknown-linux (P8,
big-endian, with both -m32 and -m64 target options).

Is this ok for trunk?

gcc/ChangeLog:

2017-09-14  Kelvin Nilsen  <kel...@gcc.gnu.org>

* config/rs6000/rs6000-p8swap.c (const_load_sequence_p): Revise
this function to return false if the definition used by the swap
instruction is artificial, or if the memory address from which the
constant value is loaded is not represented by a base address held
in a register or if the base address register is a frame or stack
pointer.  Additionally, return false if the base address of the
loaded constant is a SYMBOL_REF but is not considered to be a
constant.
(replace_swapped_load_constant): New function.
(rs6000_analyze_swaps): Add a new pass to replace a swap of a
loaded constant vector with a load of a swapped constant vector.

gcc/testsuite/ChangeLog:

2017-09-14  Kelvin Nilsen  <kel...@gcc.gnu.org>

* gcc.target/powerpc/swaps-p8-28.c: New test.
* gcc.target/powerpc/swaps-p8-29.c: New test.
* gcc.target/powerpc/swaps-p8-30.c: New test.

Index: gcc/config/rs6000/rs6000-p8swap.c
===
--- gcc/config/rs6000/rs6000-p8swap.c   (revision 252768)
+++ gcc/config/rs6000/rs6000-p8swap.c   (working copy)
@@ -342,7 +342,8 @@ const_load_sequence_p (swap_web_entry *insn_entry,
   FOR_EACH_INSN_INFO_USE (use, insn_info)
 {
   struct df_link *def_link = DF_REF_CHAIN (use);
-  if (!def_link || def_link->next)
+  if (!def_link || !def_link->ref || DF_REF_IS_ARTIFICIAL (def_link->ref)
+ || def_link->next)
return false;
 
   rtx def_insn = DF_REF_INSN (def_link->ref);
@@ -358,6 +359,8 @@ const_load_sequence_p (swap_web_entry *insn_entry,
 
   rtx mem = XEXP (SET_SRC (body), 0);
   rtx base_reg = XEXP (mem, 0);
+  if (!REG_P (base_reg))
+   return false;
 
   df_ref base_use;
   insn_info = DF_INSN_INFO_GET (def_insn);
@@ -370,6 +373,14 @@ const_load_sequence_p (swap_web_entry *insn_entry,
  if (!base_def_link || base_def_link->next)
return false;
 
+ /* Constants held on the stack are not "true" constants
+  * because their values are not part of the static load
+  * image.  If this constant's base reference is a stack
+  * or frame pointer, it is seen as an artificial
+  * reference. */
+ if (DF_REF_IS_ARTIFICIAL (base_def_link->ref))
+   return false;
+
  rtx tocrel_insn = DF_REF_INSN (base_def_link->ref);
  rtx tocrel_body = PATTERN (tocrel_insn);
  rtx base, offset;
@@ -385,6 +396,25 @@ const_load_sequence_p (swap_web_entry *insn_entry,
  split_const (XVECEXP (tocrel_base, 0, 0), &base, &offset);
  if (GET_CODE (base) != SYMBOL_REF || !CONSTANT_POOL_ADDRESS_P (base))
return false;
+ else
+   {
+ /* FIXME: The conditions under which
+  *  ((GET_CODE (const_vector) == SYMBOL_REF) &&
+  *   !CONSTANT_POOL_ADDRESS_P (const_vector))
+  * are not well understood.  This code prevents
+  * an internal compiler error which will occur in
+  * replace_swapped_load_constant () if we were to return
+  * true.  Some day, we should figure out how to properly
+  * handle this condition in
+  * replace_swapped_load_constant () and then we can
+  * remove this special test.  */
+ rtx const_vector = get_pool_constant (base);
+ if (GET_CODE (const_vector) == SYMBOL_REF)
+   {
+ if (!CONSTANT_POOL_ADDRESS_P (const_vector))
+   return false;
+   }
+   }
}
 }
   return true;
@@ -1281,6 +1311,189 @@ replace_swap_with_copy (swap_web_entry *insn_entry
   insn->set_deleted ();
 }
 
+/* Given that swap_insn represents a swap of a load of a constant
+   vector value, replace with a single instruction that loads a
+   swapped variant of the original constant. 
+
+   The "natural" representation of a byte array in memory is the same
+   for big endian and little endian.
+
+   unsigned char byte_array[] =
+ { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f };
+
+   However, when loaded into a vector register, the representation
+   depends on endian con

Backports to gcc 6.x

2017-07-11 Thread Kelvin Nilsen

I would like to backport the following patches to the GCC 6 branch.

PR66669: Fix failure of gcc.dg/loop-8.c on Power
  https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01788.html

PR68972: g++.dg/cpp1y/vla-initlist1.C test case fails on power
  https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00541.html

Handle conflicting target options -mno-power9-vector and -mcpu=power9
  https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01192.html

PR80103: Fix ICE with cross compiler
  https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01335.html

PR80101: Fix ICE in store_data_bypass_p
  https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00953.html


Each of these patches has been bootstrapped and regression tested on the
GCC 6 branch.  In backport, patch PR80103 omits certain changes to
existing comments that are not present in GCC6.

Are these patches ok for backporting to GCC 6?



[PATCH,rs6000] PR80103: Fix typo in test case

2017-06-30 Thread Kelvin Nilsen

While reviewing regression test results for a back port of the PR80103
patch, I discovered a typographic error in the test case.  This patch
corrects the error.

I have tested this fix on powerpc64le-unknown-linux-gnu with no
regressions.  Is this ok for trunk?

gcc/testsuite/ChangeLog:

2017-06-30  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/80103
* gcc.target/powerpc/pr80103-1.c (b): Correct spelling of
__attribute__.


Index: gcc/testsuite/gcc.target/powerpc/pr80103-1.c
===
--- gcc/testsuite/gcc.target/powerpc/pr80103-1.c(revision 249798)
+++ gcc/testsuite/gcc.target/powerpc/pr80103-1.c(working copy)
@@ -12,5 +12,5 @@
 int a;
 void b (__attribute__ ((__vector_size__ (16))) char c)
 {
-  a = ((__attributes__ ((__vector_size__ (2 * sizeof (long long) c)[0];
+  a = ((__attribute__ ((__vector_size__ (2 * sizeof (long long) c)[0];
 }



Re: Backport [PATCH,rs6000] PR80103: Fix ICE with cross compiler

2017-06-28 Thread Kelvin Nilsen

Is the attached refinement of this patch previously applied to mainline
ok for backport to gcc 6?  I have bootstrapped and tested without
regressions on powerpc64le-unknown-linux-gnu.

This patch differs from the original mainline patch in the following
regards:

 1. Certain commentary changes are omitted because the context to which
they applied is missing from GCC 6.

 2. A typo in a test case has been corrected.  The typo was discovered
during scrutiny of the backport regression testing results.  I will
momentarily submit a patch to correct the same test case on mainline.

On 03/24/2017 04:14 PM, Segher Boessenkool wrote:
> On Fri, Mar 24, 2017 at 04:04:33PM -0600, Kelvin Nilsen wrote:
>> PR 80103 provides a test case which results in an internal
>> compiler error when invoked with the -mno-direct-move and
>> -mpower9-dform-vector target options.  The internal compiler error
>> results because these two target options are incompatible with each other.
>>
>> The enclosed patch simply disables this particular combination of
>> target options, terminating gcc with an error message instead of
>> producing an internal compiler error.  Additionally, this patch
>> includes new comments to address omissions from a patch committed
>> on 2017/03/23 which deals with conflicts between the 
>> -mno-power9-vector and -mcpu=power9 target options.
>>
>> This patch has been bootstrapped and tested with no regressions on
>> both powerpc64-unknown-linux-gnu and powerpc64le-unknown-linux-gnu.
>> Is this ok for the trunk?
> 
> This looks good, please apply.  Thanks,
> 
> 
> Segher
> 
> 

gcc/ChangeLog:

2017-06-28  Kelvin Nilsen  <kel...@gcc.gnu.org>

Backport from mainline
2017-03-27  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/80103
* config/rs6000/rs6000.c (rs6000_option_override_internal): Add
special handling for target option conflicts between dform options
(-mpower9-dform, -mpower9-dform-vector, -mpower9-dform-scalar) and
-mno-direct-move.

gcc/testsuite/ChangeLog:

2017-06-28  Kelvin Nilsen  <kel...@gcc.gnu.org>

Backport from mainline
2017-03-27  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/80103
* gcc.target/powerpc/pr80103-1.c: New test.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 249572)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -4295,6 +4295,33 @@ rs6000_option_override_internal (bool global_init_
| OPTION_MASK_P9_DFORM_VECTOR);
 }
 
+  if ((TARGET_P9_DFORM_SCALAR || TARGET_P9_DFORM_VECTOR)
+  && !TARGET_DIRECT_MOVE)
+{
+  /* We prefer to not mention undocumented options in
+error messages.  However, if users have managed to select
+power9-dform without selecting power9-vector, they
+already know about undocumented flags.  */
+  if ((rs6000_isa_flags_explicit & OPTION_MASK_DIRECT_MOVE)
+ && ((rs6000_isa_flags_explicit & OPTION_MASK_P9_DFORM_VECTOR) ||
+ (rs6000_isa_flags_explicit & OPTION_MASK_P9_DFORM_SCALAR) ||
+ (TARGET_P9_DFORM_BOTH == 1)))
+   error ("-mpower9-dform, -mpower9-dform-vector, -mpower9-dform-scalar"
+  " require -mdirect-move");
+  else if ((rs6000_isa_flags_explicit & OPTION_MASK_DIRECT_MOVE) == 0)
+   {
+ rs6000_isa_flags |= OPTION_MASK_DIRECT_MOVE;
+ rs6000_isa_flags_explicit |= OPTION_MASK_DIRECT_MOVE;
+   }
+  else
+   {
+ rs6000_isa_flags &=
+   ~(OPTION_MASK_P9_DFORM_SCALAR | OPTION_MASK_P9_DFORM_VECTOR);
+ rs6000_isa_flags_explicit |=
+   (OPTION_MASK_P9_DFORM_SCALAR | OPTION_MASK_P9_DFORM_VECTOR);
+   }
+}
+
   if (TARGET_P9_DFORM_SCALAR && !TARGET_UPPER_REGS_DF)
 {
   /* We prefer to not mention undocumented options in
Index: gcc/testsuite/gcc.target/powerpc/pr80103-1.c
===
--- gcc/testsuite/gcc.target/powerpc/pr80103-1.c(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr80103-1.c(working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mpower9-dform-vector -mno-direct-move" } */
+/* { dg-excess-errors "expect error due to conflicting target options" } */
+/* Since the error message is not associated with a particular line
+   number, we cannot use the dg-error directive and cannot specify a
+   regexp to describe the expected error message.  The expected er

Re: Backport [PATCH,rs6000] Handle conflicting target options -mno-power9-vector and -mcpu=power9

2017-06-28 Thread Kelvin Nilsen

I have bootstrapped and tested this patch on
powerpc64le-unknown-linux-gnu with no regressions.  Is this ok for
backporting to gcc 6?



On 03/22/2017 10:17 PM, Segher Boessenkool wrote:
> On Wed, Mar 22, 2017 at 05:55:53PM -0600, Kelvin Nilsen wrote:
>>> Or it could do -mpower9-dform-scalar but disable -mpower9-dform-vector?
>>> That seems more reasonable.
>>
>> The internal problem report sent to me said "-mno-power9-vector should
>> override power9-dform unless the latter has been deliberately specified
>> by the user."  I'm just following orders.
> 
> Heh :-)
> 
>> If you think it preferable to
>> only override -mpower9-dform-vector, I'll make that modification.
> 
> It is more logical.  Or so I thought.  But as it turns out,
> -mpower9-dform-scalar is about vector registers as well.
> 
> So the patch is approved for trunk as-is.  Thanks!
> 
>>>>* config/rs6000/rs6000.c (rs6000_option_override_internal): Change
>>>>handling of certain combinations of target options, including the
>>>>combinations -mpower8-vector vs. -mno-vsx, -mpower8-vector vs.
>>>>-mno-power8-vector, and -mpower9-dform vs. -mno-power9-vector.
>>>
>>> Those other changes are independent?
>>
>> Actually, these other changes are not independent.  My initial attempt
>> at a patch only changed the behavior of -mpower9-dform vs.
>> -mno-power9-vector.  But this actually resulted in a regression of an
>> existing test.  To "properly" handle the new case without impacting
>> existing "established" behavior (as represented in the existing dejagnu
>> testsuite), I had to make these other changes as well.
> 
> Too many options :-(
> 
> 
> Segher
> 
> 

-- 
Kelvin Nilsen, Ph.D.  kdnil...@linux.vnet.ibm.com
home office: 801-756-4821, cell: 520-991-6727
IBM Linux Technology Center - PPC Toolchain



Re: [PATCH] PR68972: g++.dg/cpp1y/vla-initlist1.C test case fails on power (backport)

2017-06-26 Thread Kelvin Nilsen

Is this ok for backport to GCC 6?

On 02/06/2017 03:20 PM, Kelvin Nilsen wrote:
> 
> The test g++.dg/cpp1y/vla-initlist1.C makes assumptions that the memory
> used to represent the private temporary variables of neighboring control
> blocks at the same control nesting level is:
> 
> 1. found at the same address, and
> 2. not overwritten between when the first block ends and the second
> block begins.
> 
> While these assumptions are valid with some optimization choices on some
> architectures, these assumptions do not hold universally.
> 
> With optimization disabled on the power architecture, the
> g++.dg/cpp1y/vla-initlist1.C test program runs initialization code to
> allocate the variable-length array a[] before entry into the second of
> two neighboring control blocks.  This initialization code overwrites the
> first two cells of the array i[] that were initialized by the first of
> the two neighboring control blocks.  Thus, the initialization value
> stored into i[1] is no longer present when this value is subsequently
> fetched as a[1].i from within the second control block.
> 
> This patch disables this particular test case on power hardware.
> 
> The patch has been bootstrapped and tested on
> powerpc64le-unknown-linux with no regressions.
> 
> Is this ok for trunk?
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-02-06  Kelvin Nilsen  <kel...@gcc.gnu.org>
> 
>   PR target/68972
>   * g++.dg/cpp1y/vla-initlist1.C: Add dg-skip-if directive to
>   disable this test on power architecture.
> 
> Index: gcc/testsuite/g++.dg/cpp1y/vla-initlist1.C
> ===
> --- gcc/testsuite/g++.dg/cpp1y/vla-initlist1.C(revision 245156)
> +++ gcc/testsuite/g++.dg/cpp1y/vla-initlist1.C(working copy)
> @@ -1,4 +1,5 @@
>  // { dg-do run { target c++11 } }
> +// { dg-skip-if "power overwrites two slots of array i" { "power*-*-*" } { "*" } { "" } }
>  // { dg-options "-Wno-vla" }
> 
>  #include 
> 
> 

-- 
Kelvin Nilsen, Ph.D.  kdnil...@linux.vnet.ibm.com
home office: 801-756-4821, cell: 520-991-6727
IBM Linux Technology Center - PPC Toolchain



Re: [PATCH] PR66669: Fix failure of gcc.dg/loop-8.c on Power (Backport)

2017-06-26 Thread Kelvin Nilsen

Is it ok to backport this patch to GCC-6?

On 01/23/2017 09:59 AM, Kelvin Nilsen wrote:
> 
> The test gcc.dg/loop-8.c makes assumptions that are not valid on Power
> architecture (and on certain other architectures for which this issue
> has already been addressed).  The test case assumes that a single
> loop-invariant statement will be moved outside the loop.  On Power, a
> constant is copy-propagated within the loop, and the subsequent
> loop-invariant code motion moves two loop-invariant statements out of
> the loop.
> 
> This patch simply disables this test case on Power architecture.
> 
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-01-23  Kelvin Nilsen  <kel...@gcc.gnu.org>
> 
>   PR target/66669
>   * gcc.dg/loop-8.c: Modify dg-skip-if directive to exclude this
>   test on powerpc targets.
> 
> Index: gcc/testsuite/gcc.dg/loop-8.c
> ===
> --- gcc/testsuite/gcc.dg/loop-8.c (revision 244730)
> +++ gcc/testsuite/gcc.dg/loop-8.c (working copy)
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O1 -fdump-rtl-loop2_invariant" } */
> -/* { dg-skip-if "unexpected IV" { "hppa*-*-* mips*-*-* visium-*-*" } { "*" } { "" } } */
> +/* { dg-skip-if "unexpected IV" { "hppa*-*-* mips*-*-* visium-*-* powerpc*-*-*" } { "*" } { "" } } */
> 
>  void
>  f (int *a, int *b)
> 
> 

-- 
Kelvin Nilsen, Ph.D.  kdnil...@linux.vnet.ibm.com
home office: 801-756-4821, cell: 520-991-6727
IBM Linux Technology Center - PPC Toolchain



[PATCH,rs6000] Add IEEE 128 support for several existing built-in functions

2017-06-21 Thread Kelvin Nilsen

This patch adds IEEE 128 support to the existing scalar_insert_exp,
scalar_extract_exp, scalar_extract_sig, scalar_test_data_class, and
scalar_test_neg rs6000 built-in functions.  Test programs are provided
to exercise the new IEEE 128 functionality and to validate forms of
these built-in functions that do not depend on IEEE 128 support.
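
To make the extract operations concrete, here is a minimal portable sketch of what extracting the exponent and significand means for a 64-bit double.  It assumes IEEE 754 binary64 encoding and is only an emulation: the function names are hypothetical, they are not the GCC built-ins, and the __ieee128 forms added by this patch operate on the wider quad-precision fields instead.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its 64-bit encoding (assumes IEEE 754 binary64).  */
static uint64_t
double_bits (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return bits;
}

/* Hypothetical emulation: return the biased 11-bit exponent field.  */
static unsigned int
emulated_extract_exp (double d)
{
  return (unsigned int) ((double_bits (d) >> 52) & 0x7ff);
}

/* Hypothetical emulation: return the 52-bit fraction, with the implicit
   leading bit made explicit when the value is in normalized form.  */
static uint64_t
emulated_extract_sig (double d)
{
  uint64_t bits = double_bits (d);
  uint64_t frac = bits & 0x000fffffffffffffULL;
  unsigned int exp = (unsigned int) ((bits >> 52) & 0x7ff);

  if (exp != 0 && exp != 0x7ff)
    frac |= 0x0010000000000000ULL;
  return frac;
}
```

Under these assumptions, a value such as 1.0 has the biased exponent 1023 and a significand consisting of the implicit bit alone.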

The patch has been bootstrapped and tested on powerpc64le-unknown-linux
(both P8 and P9 targets) and powerpc-unknown-linux (big-endian, with
both -m32 and -m64 target options) with no regressions.

Is this ok for the trunk?

gcc/ChangeLog:

2017-06-19  Kelvin Nilsen  <kel...@gcc.gnu.org>

* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
array entries to represent __ieee128 versions of the
scalar_test_data_class, scalar_test_neg, scalar_extract_exp,
scalar_extract_sig, and scalar_insert_exp built-in functions.
(altivec_resolve_overloaded_builtin): Add special case handling
for the __builtin_scalar_insert_exp function, as represented by
the P9V_BUILTIN_VEC_VSIEDP constant.
* config/rs6000/rs6000-builtin.def (VSEEQP): Add scalar extract
exponent support for __ieee128 argument.
(VSESQP): Add scalar extract significand support for __ieee128
argument.
(VSTDCNQP): Add scalar test negative support for __ieee128
argument.
(VSIEQP): Add scalar insert exponent support for __int128 argument
with __ieee128 result.
(VSIEQPF): Add scalar insert exponent support for __ieee128
argument with __ieee128 result.
(VSTDCQP): Add scalar test data class support for __ieee128
argument.
(VSTDCNQP): Add overload support for scalar test negative with
__ieee128 argument.
(VSTDCQP): Add overload support for scalar test data class with
__ieee128 argument.
* config/rs6000/vsx.md (UNSPEC_VSX_SIEXPQP): New constant.
(xsxexpqp): New insn for VSX scalar extract exponent quad
precision.
(xsxsigqp): New insn for VSX scalar extract significand quad
precision.
(xsiexpqpf): New insn for VSX scalar insert exponent quad
precision with floating point argument.
(xststdcqp): New expand for VSX scalar test data class quad
precision.
(xststdcnegqp): New expand for VSX scalar test negative quad
precision.
(xststdcqp): New insn to match expansions for VSX scalar test data
class quad precision and VSX scalar test negative quad precision.
* config/rs6000/rs6000.c (rs6000_expand_binop_builtin): Add
special case operand checking to enforce that second operand of
VSX scalar test data class with quad precision argument is a 7-bit
unsigned literal.
* doc/extend.texi (PowerPC AltiVec Built-in Functions): Add
prototypes and descriptions of __ieee128 versions of
scalar_extract_exp, scalar_extract_sig, scalar_insert_exp,
scalar_test_data_class, and scalar_test_neg built-in functions.

gcc/testsuite/ChangeLog:

2017-06-19  Kelvin Nilsen  <kel...@gcc.gnu.org>

* gcc.target/powerpc/bfp/scalar-cmp-exp-eq-3.c: New test.
* gcc.target/powerpc/bfp/scalar-cmp-exp-eq-4.c: New test.
* gcc.target/powerpc/bfp/scalar-cmp-exp-gt-3.c: New test.
* gcc.target/powerpc/bfp/scalar-cmp-exp-gt-4.c: New test.
* gcc.target/powerpc/bfp/scalar-cmp-exp-lt-3.c: New test.
* gcc.target/powerpc/bfp/scalar-cmp-exp-lt-4.c: New test.
* gcc.target/powerpc/bfp/scalar-cmp-exp-unordered-3.c: New test.
* gcc.target/powerpc/bfp/scalar-cmp-exp-unordered-4.c: New test.
* gcc.target/powerpc/bfp/scalar-extract-exp-3.c: New test.
* gcc.target/powerpc/bfp/scalar-extract-exp-4.c: New test.
* gcc.target/powerpc/bfp/scalar-extract-exp-5.c: New test.
* gcc.target/powerpc/bfp/scalar-extract-exp-6.c: New test.
* gcc.target/powerpc/bfp/scalar-extract-exp-7.c: New test.
* gcc.target/powerpc/bfp/scalar-extract-sig-3.c: New test.
* gcc.target/powerpc/bfp/scalar-extract-sig-4.c: New test.
* gcc.target/powerpc/bfp/scalar-extract-sig-5.c: New test.
* gcc.target/powerpc/bfp/scalar-extract-sig-6.c: New test.
* gcc.target/powerpc/bfp/scalar-extract-sig-7.c: New test.
* gcc.target/powerpc/bfp/scalar-insert-exp-10.c: New test.
* gcc.target/powerpc/bfp/scalar-insert-exp-11.c: New test.
* gcc.target/powerpc/bfp/scalar-insert-exp-12.c: New test.
* gcc.target/powerpc/bfp/scalar-insert-exp-13.c: New test.
* gcc.target/powerpc/bfp/scalar-insert-exp-14.c: New test.
* gcc.target/powerpc/bfp/scalar-insert-exp-15.c: New test.
* gcc.target/powerpc/bfp/scalar-insert-exp-6.c: New test.
* gcc.target/powerpc/bfp/scalar-insert-exp-7.c: New test.
* gcc.target/powerpc/bfp/scalar-insert-exp-8.c: New test.
* gcc.target/powerpc/bfp/s

[PATCH v2,rs6000] Add built-in function support for compare bytes instruction

2017-05-08 Thread Kelvin Nilsen

This patch adds support for the compare bytes instruction, which has
been available in the rs6000 architecture since Power6.  Thank you to
Segher Boessenkool for feedback on the original submission of this
patch.  The following refinements have been incorporated:

1. Changed the implementation and documentation to present a single
overloaded function that handles either 32-bit or 64-bit arguments.

2. Corrected the spelling of compare in the comment describing the
RS6000_BTM_CMPB macro.  In response to reviewer question of whether
this line is too long: it is not.  It only appears that way due to
alignment of tabs in the diff output.
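
For readers unfamiliar with the instruction, the following is a portable sketch of the compare bytes operation on 64-bit operands: each byte of the result is 0xff where the corresponding bytes of the two operands are equal and 0x00 otherwise.  The name emulated_cmpb is hypothetical; it is an emulation for illustration, not the built-in added by this patch.

```c
#include <stdint.h>

/* Hypothetical portable emulation of compare bytes on 64-bit operands.
   Each result byte is 0xff if the corresponding bytes of A and B are
   equal, and 0x00 otherwise.  */
static uint64_t
emulated_cmpb (uint64_t a, uint64_t b)
{
  uint64_t result = 0;

  for (int i = 0; i < 8; i++)
    {
      uint64_t mask = (uint64_t) 0xff << (8 * i);
      if ((a & mask) == (b & mask))
	result |= mask;
    }
  return result;
}
```

A 32-bit variant would follow the same pattern with four bytes, which is why a single overloaded function can cover both argument widths.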

The patch has been bootstrapped and tested on powerpc64le-unknown-linux
and powerpc-unknown-linux (big-endian, with both -m32 and -m64 target
options) with no regressions.

Is this ok for the trunk?

gcc/testsuite/ChangeLog:

2017-05-08  Kelvin Nilsen  <kel...@gcc.gnu.org>

* gcc.target/powerpc/cmpb-1.c: New test.
* gcc.target/powerpc/cmpb-2.c: New test.
* gcc.target/powerpc/cmpb-3.c: New test.
* gcc.target/powerpc/cmpb32-1.c: New test.
* gcc.target/powerpc/cmpb32-2.c: New test.

gcc/ChangeLog:

2017-05-08  Kelvin Nilsen  <kel...@gcc.gnu.org>

* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
array entries to represent two legal parameterizations of the
overloaded __builtin_cmpb function, as represented by the
P6_OV_BUILTIN_CMPB constant.
(altivec_resolve_overloaded_builtin): Add special case handling
for the __builtin_cmpb function, as represented by the
P6_OV_BUILTIN_CMPB constant.
* config/rs6000/rs6000-builtin.def (BU_P6_2): New macro.
(BU_P6_64BIT_2): New macro.
(BU_P6_OVERLOAD_2): New macro.
(CMPB_32): Add 32-bit compare-bytes support for 32-bit only targets.
(CMPB): Add 64-bit compare-bytes support for 32-bit and 64-bit targets.
(CMPB): Add overload support to represent both 32-bit and 64-bit
compare-bytes function.
* config/rs6000/rs6000.c (rs6000_builtin_mask_calculate): Add
support for TARGET_CMPB.
* config/rs6000/rs6000.h: Add support for RS6000_BTM_CMPB.
* doc/extend.texi (PowerPC AltiVec Built-in Functions): Add
documentation of the __builtin_cmpb overloaded built-in function.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 247069)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -3788,6 +3788,7 @@ HOST_WIDE_INT
 rs6000_builtin_mask_calculate (void)
 {
   return (((TARGET_ALTIVEC)? RS6000_BTM_ALTIVEC   : 0)
+ | ((TARGET_CMPB)  ? RS6000_BTM_CMPB  : 0)
  | ((TARGET_VSX)   ? RS6000_BTM_VSX   : 0)
  | ((TARGET_SPE)   ? RS6000_BTM_SPE   : 0)
  | ((TARGET_PAIRED_FLOAT)  ? RS6000_BTM_PAIRED: 0)
Index: gcc/config/rs6000/rs6000.h
===
--- gcc/config/rs6000/rs6000.h  (revision 247069)
+++ gcc/config/rs6000/rs6000.h  (working copy)
@@ -2717,6 +2717,7 @@ extern int frame_pointer_needed;
aren't in target_flags.  */
 #define RS6000_BTM_ALWAYS  0   /* Always enabled.  */
 #define RS6000_BTM_ALTIVEC MASK_ALTIVEC/* VMX/altivec vectors.  */
+#define RS6000_BTM_CMPB MASK_CMPB   /* ISA 2.05: compare bytes.  */
 #define RS6000_BTM_VSX MASK_VSX/* VSX (vector/scalar).  */
 #define RS6000_BTM_P8_VECTOR   MASK_P8_VECTOR  /* ISA 2.07 vector.  */
 #define RS6000_BTM_P9_VECTOR   MASK_P9_VECTOR  /* ISA 3.0 vector.  */
Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c(revision 247069)
+++ gcc/config/rs6000/rs6000-c.c(working copy)
@@ -5348,6 +5348,11 @@ const struct altivec_builtin_types altivec_overloa
 RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
 RS6000_BTI_unsigned_V1TI, 0 },
 
+  { P6_OV_BUILTIN_CMPB, P6_BUILTIN_CMPB_32,
+RS6000_BTI_UINTSI, RS6000_BTI_UINTSI, RS6000_BTI_UINTSI, 0 },
+  { P6_OV_BUILTIN_CMPB, P6_BUILTIN_CMPB,
+RS6000_BTI_UINTDI, RS6000_BTI_UINTDI, RS6000_BTI_UINTDI, 0 },
+
   { P8V_BUILTIN_VEC_VUPKHSW, P8V_BUILTIN_VUPKHSW,
 RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
   { P8V_BUILTIN_VEC_VUPKHSW, P8V_BUILTIN_VUPKHSW,
@@ -6409,25 +6414,76 @@ altivec_resolve_overloaded_builtin (location_t loc
 for (desc = altivec_overloaded_builtins;
 desc->code && desc->code != fcode; desc++)
   continue;
-
-/* For arguments after the last, we have RS6000_BTI_NOT_OPAQUE in
-   the opX fields.  */
-for (; desc->code == fcode; desc++)
+
+/* Need to special case __builtin_cmp because the overloaded forms
+   of this function take (unsigned int, unsigned int) o

[PATCH,rs6000] Add built-in function support for compare bytes instruction

2017-04-28 Thread Kelvin Nilsen

This patch adds support for the compare bytes instruction, which has
been available in the rs6000 architecture since Power6.

The patch has been bootstrapped and tested on powerpc64le-unknown-linux
and powerpc-unknown-linux (big-endian, with both -m32 and -m64 target
options) with no regressions.

Is this ok for the trunk?

gcc/ChangeLog:

2017-04-28  Kelvin Nilsen  <kel...@gcc.gnu.org>

* config/rs6000/rs6000.c (rs6000_builtin_mask_calculate): Add
support for TARGET_CMPB.
* config/rs6000/rs6000.h: Add support for RS6000_BTM_CMPB.
* config/rs6000/rs6000-builtin.def (BU_P6_CMPB_2): New macro.
(BU_P6_64BIT_CMPB_2): New macro.
(CMPB_32): Add compare-bytes support for 32-bit only targets.
(CMPB): Add compare-bytes support for 32-bit and 64-bit targets.
* doc/extend.texi (PowerPC AltiVec Built-in Functions): Add
documentation of __builtin_cmpb and __builtin_cmpb_32 built-in
functions.

gcc/testsuite/ChangeLog:

2017-04-28  Kelvin Nilsen  <kel...@gcc.gnu.org>

* gcc.target/powerpc/cmpb-1.c: New test.
* gcc.target/powerpc/cmpb-2.c: New test.
* gcc.target/powerpc/cmpb-3.c: New test.
* gcc.target/powerpc/cmpb32-1.c: New test.
* gcc.target/powerpc/cmpb32-2.c: New test.


Index: gcc/config/rs6000/rs6000-builtin.def
===
--- gcc/config/rs6000/rs6000-builtin.def(revision 247069)
+++ gcc/config/rs6000/rs6000-builtin.def(working copy)
@@ -339,6 +339,26 @@
 | RS6000_BTC_SPECIAL), \
CODE_FOR_nothing)   /* ICODE */
 
+/* ISA 2.05 (power6) convenience macros. */
+/* For functions that depend on the CMPB instruction */
+#define BU_P6_CMPB_2(ENUM, NAME, ATTR, ICODE)  \
+  RS6000_BUILTIN_2 (P6_BUILTIN_ ## ENUM,   /* ENUM */  \
+   "__builtin_" NAME,  /* NAME */  \
+   RS6000_BTM_CMPB,/* MASK */  \
+   (RS6000_BTC_ ## ATTR/* ATTR */  \
+| RS6000_BTC_BINARY),  \
+   CODE_FOR_ ## ICODE) /* ICODE */
+
+/* For functions that depend on 64-BIT support and on the CMPB instruction */
+#define BU_P6_64BIT_CMPB_2(ENUM, NAME, ATTR, ICODE)\
+  RS6000_BUILTIN_2 (P6_BUILTIN_ ## ENUM,   /* ENUM */  \
+   "__builtin_" NAME,  /* NAME */  \
+   RS6000_BTM_CMPB \
+ | RS6000_BTM_64BIT,   /* MASK */  \
+   (RS6000_BTC_ ## ATTR/* ATTR */  \
+| RS6000_BTC_BINARY),  \
+   CODE_FOR_ ## ICODE) /* ICODE */
+
 /* ISA 2.07 (power8) vector convenience macros.  */
 /* For the instructions that are encoded as altivec instructions use
__builtin_altivec_ as the builtin name.  */
@@ -1778,6 +1798,10 @@ BU_VSX_OVERLOAD_X (ST,"st")
 BU_VSX_OVERLOAD_X (XL,  "xl")
 BU_VSX_OVERLOAD_X (XST, "xst")
 
+/* 2 argument CMPB instructions added in ISA 2.05. */
+BU_P6_CMPB_2 (CMPB_32,"cmpb_32",   CONST,  cmpbsi3)
+BU_P6_64BIT_CMPB_2 (CMPB, "cmpb",  CONST,  cmpbdi3)
+
 /* 1 argument VSX instructions added in ISA 2.07.  */
 BU_P8V_VSX_1 (XSCVSPDPN,  "xscvspdpn", CONST,  vsx_xscvspdpn)
 BU_P8V_VSX_1 (XSCVDPSPN,  "xscvdpspn", CONST,  vsx_xscvdpspn)
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 247069)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -3788,6 +3788,7 @@ HOST_WIDE_INT
 rs6000_builtin_mask_calculate (void)
 {
   return (((TARGET_ALTIVEC)? RS6000_BTM_ALTIVEC   : 0)
+ | ((TARGET_CMPB)  ? RS6000_BTM_CMPB  : 0)
  | ((TARGET_VSX)   ? RS6000_BTM_VSX   : 0)
  | ((TARGET_SPE)   ? RS6000_BTM_SPE   : 0)
  | ((TARGET_PAIRED_FLOAT)  ? RS6000_BTM_PAIRED: 0)
Index: gcc/config/rs6000/rs6000.h
===
--- gcc/config/rs6000/rs6000.h  (revision 247069)
+++ gcc/config/rs6000/rs6000.h  (working copy)
@@ -2717,6 +2717,7 @@ extern int frame_pointer_needed;
aren't in target_flags.  */
 #define RS6000_BTM_ALWAYS  0   /* Always enabled.  */
 #define RS6000_BTM_ALTIVEC MASK_ALTIVEC/* VMX/altivec vectors.  */
+#define RS6000_BTM_CMPB MASK_CMPB   /* ISA 2.05: cmopare bytes.  */
 #define RS6000_BTM_VSX MASK_VSX

[PATCH v3,rs6000] PR80101: Fix ICE in store_data_bypass_p

2017-04-21 Thread Kelvin Nilsen

This PR reports an assertion error when certain rtl expressions
which are not eligible as producers or consumers of a store bypass
optimization are passed as arguments to the store_data_bypass_p
function.  Since the problem surfaced with tests targeting the rs6000
architecture, the proposed patch is integrated within the rs6000 back
end.  

A new rs6000_store_data_bypass_p function has been introduced and all
calls to store_data_bypass_p from within the rs6000 back end have been
replaced with calls to rs6000_store_data_bypass_p.  This new function
scans its arguments for patterns that are known to cause assertion
errors in store_data_bypass_p and returns false if any of those
patterns are encountered.  Otherwise, rs6000_store_data_bypass_p simply
returns the result produced when passing its arguments to a call of
store_data_bypass_p.
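
The wrapper approach described above can be sketched with toy types.  Everything in this example is a hypothetical stand-in: the enum, the struct, and the toy_ functions are not GCC data structures or functions, and the real patch inspects RTL patterns rather than a simple tag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for instruction shapes; none of these names exist in GCC.  */
enum toy_code { TOY_SET, TOY_PARALLEL_OF_SETS, TOY_OTHER };

struct toy_insn
{
  enum toy_code code;
  bool is_store;             /* consumer writes to memory */
  bool uses_dep_in_address;  /* consumer uses the producer's value in
				the store address */
};

/* Strict predicate: asserts on shapes it does not understand.  */
static bool
toy_store_data_bypass_p (const struct toy_insn *out,
			 const struct toy_insn *in)
{
  assert (out->code != TOY_OTHER && in->code != TOY_OTHER);
  return in->is_store && !in->uses_dep_in_address;
}

static bool
toy_shape_ok (const struct toy_insn *insn)
{
  return insn->code == TOY_SET || insn->code == TOY_PARALLEL_OF_SETS;
}

/* Guarded wrapper: return false for unsupported shapes instead of
   reaching the assertion, otherwise defer to the strict predicate.  */
static bool
toy_guarded_store_data_bypass_p (const struct toy_insn *out,
				 const struct toy_insn *in)
{
  if (!toy_shape_ok (out) || !toy_shape_ok (in))
    return false;
  return toy_store_data_bypass_p (out, in);
}
```

The design choice is that callers in the back end switch to the guarded entry point, so the shared strict predicate is left unchanged.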

Thank you for feedback and guidance from Eric Botcazou, Segher
Boessenkool, Richard Sandiford, and Pat Haugen which was offered in
response to my first two patch submissions and an RFC post on this
topic.  With all of your help, I now have a much better understanding
of the intended role of store_data_bypass_p.

The patch has been bootstrapped without regressions on
powerpc64le-unknown-linux-gnu.  Is this ok for the trunk?

gcc/testsuite/ChangeLog:

2017-04-20  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/80101
* gcc.target/powerpc/pr80101-1.c: New test.


gcc/ChangeLog:

2017-04-20  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/80101
* config/rs6000/power6.md: Replace store_data_bypass_p calls with
rs6000_store_data_bypass_p in seven define_bypass directives and
in several comments.
* config/rs6000/rs6000-protos.h: Add prototype for
rs6000_store_data_bypass_p function.
* config/rs6000/rs6000.c (rs6000_store_data_bypass_p): New
function implements slightly different (rs6000-specific) semantics
than store_data_bypass_p, returning false rather than aborting
with assertion error when arguments do not satisfy the
requirements of store data bypass.
(rs6000_adjust_cost): Replace six calls of store_data_bypass_p with
rs6000_store_data_bypass_p.

Index: gcc/config/rs6000/power6.md
===
--- gcc/config/rs6000/power6.md (revision 246469)
+++ gcc/config/rs6000/power6.md (working copy)
@@ -108,7 +108,7 @@
   power6-store-update-indexed,\
   power6-fpstore,\
   power6-fpstore-update"
-  "store_data_bypass_p")
+  "rs6000_store_data_bypass_p")
 
 (define_insn_reservation "power6-load-ext" 4 ; fx
   (and (eq_attr "type" "load")
@@ -128,7 +128,7 @@
   power6-store-update-indexed,\
   power6-fpstore,\
   power6-fpstore-update"
-  "store_data_bypass_p")
+  "rs6000_store_data_bypass_p")
 
 (define_insn_reservation "power6-load-update" 2 ; fx
   (and (eq_attr "type" "load")
@@ -276,7 +276,7 @@
   power6-store-update-indexed,\
   power6-fpstore,\
   power6-fpstore-update"
-  "store_data_bypass_p")
+  "rs6000_store_data_bypass_p")
 
 (define_insn_reservation "power6-cntlz" 2
   (and (eq_attr "type" "cntlz")
@@ -289,7 +289,7 @@
   power6-store-update-indexed,\
   power6-fpstore,\
   power6-fpstore-update"
-  "store_data_bypass_p")
+  "rs6000_store_data_bypass_p")
 
 (define_insn_reservation "power6-var-rotate" 4
   (and (eq_attr "type" "shift")
@@ -355,7 +355,7 @@
   power6-store-update-indexed,\
   power6-fpstore,\
   power6-fpstore-update"
-  "store_data_bypass_p")
+  "rs6000_store_data_bypass_p")
 
 (define_insn_reservation "power6-delayed-compare" 2 ; N/A
   (and (eq_attr "type" "shift")
@@ -420,7 +420,7 @@
   power6-store-update-indexed,\
   power6-fpstore,\
   power6-fpstore-update"
-  "store_data_bypass_p")
+  "rs6000_store_data_bypass_p")
 
 (define_insn_reservation "power6-idiv" 44
   (and (eq_attr "type" "div")
@@ -436,7 +436,7 @@
 ;  power6-store-update-indexed,\
 ;  power6-fpstore,\
 ;  power6-fpstore-update"
-;  "store_data_bypass_p")
+;  "rs6000_store_data_bypass_p")
 
 (define_insn_reservation "power6-ldiv" 56
   (and (eq_attr "type" "div")
@@ -452,7 +452,7 @@
 ;  power6-store-update-indexed,\
 ;  power6-fpstore,\
 ;  power6-fpstor

[PATCH v2] PR80101: Fix ICE in store_data_bypass_p

2017-04-14 Thread Kelvin Nilsen

This PR reports an assertion error when certain rtl expressions
which are not eligible as producers or consumers of a store bypass
optimization are passed as arguments to the store_data_bypass_p
function.  The proposed patch returns false from store_data_bypass_p
rather than terminating with an assertion error.  False indicates that
the passed arguments are not eligible for the store bypass scheduling
optimization.

Thank you for feedback and guidance received in response to my first
patch submission and the follow-on RFC post from Eric Botcazou, Segher
Boessenkool, Richard Sandiford, and Pat Haugen.  With all of your help,
I now have a much better understanding of the intended role of
store_data_bypass_p.  This new revision of the patch differs from the
original submission in the following ways:

1. I have modified the comment that describes this function to clarify
that this function is only called if it is already determined that
there exists at least one variable that is set by OUT_INSN and read by
IN_INSN. My modified comment also clarifies the function's new behavior,
as implemented with this patch. 

2. I have added comments to the body of the function to clarify some of
the rationale for the existing code and the newly inserted code,
especially where I was originally confused because I did not understand
the rationale.

3. I have added code to allow USE expressions beneath a PARALLEL node
without invalidating store data bypass (for consistency, for example,
with the implementation of single_set, and as mentioned in feedback
from Richard Sandiford).

I gather that it is extremely unlikely that in_insn would represent a
PARALLEL with multiple store operations beneath it, but this function,
as originally implemented, supports that possibility, and my changes to
the function do as well.
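
As an illustration of item 3, here is a minimal sketch of the per-element
check for a producer PARALLEL.  It is a toy model, not the recog.c code:
the enum, the struct, and the name toy_parallel_bypass_p are hypothetical
stand-ins for GCC's RTL codes and accessors, and a single integer stands
in for the register used as the store address.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for RTL codes; these are not GCC's types.  */
enum toy_code { TOY_SET, TOY_CLOBBER, TOY_USE, TOY_OTHER };

struct toy_exp
{
  enum toy_code code;
  int dest_reg;   /* Register written when code is TOY_SET.  */
};

/* Return true if no SET among the N elements of the producer writes
   ADDR_REG, the register the consumer uses as its store address.
   CLOBBER and USE elements are skipped.  Any other non-SET element
   makes the pair ineligible (return false) rather than asserting.  */
bool
toy_parallel_bypass_p (const struct toy_exp *out, size_t n, int addr_reg)
{
  for (size_t i = 0; i < n; i++)
    {
      if (out[i].code == TOY_CLOBBER || out[i].code == TOY_USE)
        continue;
      if (out[i].code != TOY_SET)
        return false;
      if (out[i].dest_reg == addr_reg)
        return false;
    }
  return true;
}
```

The point of the sketch is only the control flow: unexpected elements lead
to a false result instead of an assertion failure.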

The patch has been bootstrapped without regressions on
powerpc64le-unknown-linux-gnu.  Is this ok for the trunk?

gcc/testsuite/ChangeLog:

2017-04-14  Kelvin Nilsen  <kel...@gcc.gnu.org>

* gcc.target/powerpc/pr80101-1.c: New test.


gcc/ChangeLog:

2017-04-14  Kelvin Nilsen  <kel...@gcc.gnu.org>

* recog.c (store_data_bypass_p): Rather than terminate with
assertion error, return false if either of the function's
arguments is not a single_set or a PARALLEL with only SETs inside.
Allow USE subexpressions in addition to CLOBBER subexpressions
within a PARALLEL that represents either of the function's
arguments.  Add and modify comments to clarify behavior.

Index: gcc/recog.c
===
--- gcc/recog.c (revision 246469)
+++ gcc/recog.c (working copy)
@@ -3663,9 +3663,14 @@ peephole2_optimize (void)
 
 /* Common predicates for use with define_bypass.  */
 
-/* True if the dependency between OUT_INSN and IN_INSN is on the store
-   data not the address operand(s) of the store.  IN_INSN and OUT_INSN
-   must be either a single_set or a PARALLEL with SETs inside.  */
+/* Given that there exists at least one variable that is set (produced)
+   by OUT_INSN and read (consumed) by IN_INSN, return true iff
+   IN_INSN represents one or more memory store operations and none of
+   the variables set by OUT_INSN is used by IN_INSN as the address of a
+   store operation.  If either IN_INSN or OUT_INSN does not represent
+   a "single" RTL SET expression (as loosely defined by the
+   implementation of the single_set function) or a PARALLEL with only
+   SETs, CLOBBERs, and USEs inside, this function returns false.  */
 
 int
 store_data_bypass_p (rtx_insn *out_insn, rtx_insn *in_insn)
@@ -3678,6 +3683,8 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn
   in_set = single_set (in_insn);
   if (in_set)
 {
+  /* If in_set does not represent a store operation, this insn
+pair is not eligible for store data bypass.  */
   if (!MEM_P (SET_DEST (in_set)))
return false;
 
@@ -3684,6 +3691,9 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn
   out_set = single_set (out_insn);
   if (out_set)
 {
+ /* If the address stored by in_set is set by out_set, the
+dependency is on the address of the store operation, so
+this insn pair is not eligible for store data bypass.  */
   if (reg_mentioned_p (SET_DEST (out_set), SET_DEST (in_set)))
 return false;
 }
@@ -3698,11 +3708,15 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn
   {
 out_exp = XVECEXP (out_pat, 0, i);
 
-if (GET_CODE (out_exp) == CLOBBER)
-  continue;
+   if ((GET_CODE (out_exp) == CLOBBER) || (GET_CODE (out_exp) == USE))
+ continue;
+else if (GET_CODE (out_exp) != SET)
+  return false;
 
-gcc_assert (GET_CODE (out_exp) == SET);
-
+   /* If the address to which the in_set store operation
+  writes is set by any of 

[PATCH,rs6000] PR80315: Add test cases to confirm ICE has been fixed

2017-04-12 Thread Kelvin Nilsen

PR80315 reported an Internal Compiler Error when the third argument to
__builtin_crypto_vshasigmaw was an integer constant with a value
greater than 15.  The patch to correct this problem was committed
yesterday. This patch adds 4 new test cases to the regression suite.

Regression testing has confirmed that these test programs reproduce the
error reported with PR80315 before yesterday's patch was applied, and
that all test programs pass following application of yesterday's patch.
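
The behavior the new tests expect can be sketched as a plain range check
that yields a diagnostic rather than aborting.  This is not the rs6000
implementation; toy_check_shasigma_args is a hypothetical helper, and the
wording of the argument 2 message is an assumption (only the argument 3
message appears in the tests).

```c
#include <stddef.h>

/* Hypothetical helper, not GCC code.  Return NULL if both constant
   arguments are in range, otherwise a diagnostic string.  */
const char *
toy_check_shasigma_args (long arg2, long arg3)
{
  /* Argument 2 must be 0 or 1 (message wording assumed).  */
  if (arg2 < 0 || arg2 > 1)
    return "argument 2 must be 0 or 1";
  /* Argument 3 must be in range 0..15 (message taken from the tests).  */
  if (arg3 < 0 || arg3 > 15)
    return "argument 3 must be in the range 0..15";
  return NULL;
}
```

With the values used in the tests (1 and 0xff), the sketch reports the
argument 3 diagnostic.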

Is this ok for the trunk?


gcc/testsuite/ChangeLog:

2017-04-12  Kelvin Nilsen  <kel...@gcc.gnu.org>

* gcc.target/powerpc/pr80315-1.c: New test.
* gcc.target/powerpc/pr80315-2.c: New test.
* gcc.target/powerpc/pr80315-3.c: New test.
* gcc.target/powerpc/pr80315-4.c: New test.

Index: gcc/testsuite/gcc.target/powerpc/pr80315-1.c
===
--- gcc/testsuite/gcc.target/powerpc/pr80315-1.c(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr80315-1.c(working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8" } */
+
+int
+main()
+{
+  __attribute__((altivec(vector__))) unsigned int test, res;
+  const int s0 = 0;
+  int mask;
+
+  /* Argument 2 must be 0 or 1.  Argument 3 must be in range 0..15.  */
+  res = __builtin_crypto_vshasigmaw (test, 1, 0xff); /* { dg-error "argument 3 must be in the range 0..15" } */
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/pr80315-2.c
===
--- gcc/testsuite/gcc.target/powerpc/pr80315-2.c(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr80315-2.c(working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8" } */
+
+int
+main ()
+{
+  __attribute__((altivec(vector__))) unsigned long long test, res;
+  const int s0 = 0;
+  int mask;
+
+  /* Argument 2 must be 0 or 1.  Argument 3 must be in range 0..15.  */
+  res = __builtin_crypto_vshasigmad (test, 1, 0xff); /* { dg-error "argument 3 must be in the range 0..15" } */
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/pr80315-3.c
===
--- gcc/testsuite/gcc.target/powerpc/pr80315-3.c(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr80315-3.c(working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8" } */
+
+#include 
+
+vector unsigned int
+main ()
+{
+  vector unsigned int test, res;
+  const int s0 = 0;
+  int mask;
+
+  /* Argument 2 must be 0 or 1.  Argument 3 must be in range 0..15.  */
+  res = vec_shasigma_be (test, 1, 0xff); /* { dg-error "argument 3 must be in the range 0..15" } */
+  return res;
+}
Index: gcc/testsuite/gcc.target/powerpc/pr80315-4.c
===
--- gcc/testsuite/gcc.target/powerpc/pr80315-4.c(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr80315-4.c(working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8" } */
+
+#include 
+
+vector unsigned long long int
+main ()
+{
+  vector unsigned long long int test, res;
+  const int s0 = 0;
+  int mask;
+
+  /* Argument 2 must be 0 or 1.  Argument 3 must be in range 0..15.  */
+  res = vec_shasigma_be (test, 1, 0xff); /* { dg-error "argument 3 must be in the range 0..15" } */
+  return res;
+}



RFC: seeking insight on store_data_bypass_p (recog.c)

2017-04-12 Thread Kelvin Nilsen


My work on PR80101 is "motivating" me to modify the implementation of
store_data_bypass_p (in gcc/recog.c).

I have a patch that bootstraps with no regressions.  However, I think
"regression" testing may not be enough to prove I got this right.  If my
new patch returns the wrong value, the outcome will be poor instruction
scheduling decisions, which will impact performance, but probably not
"correctness".

So I'd like some help understanding the existing implementation of
store_data_bypass_p.  To establish some context, here is what I think I
understand about this function:

1. As input arguments, out_insn represents an rtl expression that
potentially "produces" a store to memory and in_insn represents an rtl
expression that potentially "consumes" a value recently stored to memory.

2. If the memory store produced matches the memory fetch consumed, this
function returns true to indicate that this sequence of two instructions
qualifies for a special "bypass" latency that represents the fact that
the fetch will obtain the value out of the write buffer.  So, whereas
the instruction scheduler might normally expect that this sequence of
two instructions would experience Load-Hit-Store penalties associated
with cache coherency hardware costs, since these two instructions qualify
for the store_data_bypass optimization, the instruction scheduler counts
the latency as only 1 or 2 cycles (potentially).  [This is what I
understand, but I may be wrong, so please correct me if so.]

3. Actually, what I described above is only the "simple" case.  It may
be that the rtl for either out_insn or in_insn is really a parallel
clause with multiple rtl trees beneath it.  In this case, we compare the
subtrees in a "similar" way to see if the compound expressions qualify
for the store_data_bypass_p "optimization".  (I have some questions
about how this is done below.)  As currently implemented, special
handling is given to a CLOBBER subtree as part of either PARALLEL
expression: we ignore it.  This is because CLOBBER does not represent
any real machine instructions.  It just represents semantic information
that might be used by the compiler.

In addition to seeking confirmation of my existing understanding of the
code as outlined above, the specific questions that I am seeking help
with are:

1. In the current implementation (as I understand it), near the top of
the function body, we handle the case that the consumer (in_insn) rtl is
a single SET expression and the producer (out_insn) rtl is a PARALLEL
expression containing multiple sets.  The way I read this code, we are
requiring that every one of the producer's parallel SET instructions
produce the same value that is to be consumed in order to qualify this
sequence as a "store data bypass".  That seems wrong to me.  I would
expect that we only need "one" of the produced values to match the
consumed value in order to qualify for the "store data bypass"
optimization.  Please explain.  (The same confusing behavior happens
below in the same function, in the case that the consumer rtl is a
PARALLEL expression of multiple SETs: we require that every producer's
stored value match every consumer's fetched value.)

2. A "bigger" concern is that any time any SETs are buried within a
PARALLEL tree, I'm not sure the answer produced by this function, as
currently implemented, is at all reliable:

 a) PARALLEL does not necessarily mean all of its subtrees happen in
parallel on hardware.  It just means that there is no sequencing imposed
by the source code, so the final order in which the multiple subtrees
beneath the PARALLEL node execute is not known at this stage of compilation.

 b) It seems to me that it doesn't really make sense to speak of whether
a whole bunch of producers combined with a whole bunch of consumers
qualify for an optimized store data bypass latency.  If we say that they
do qualify (as a group), which pair(s) of producer and consumer machine
instructions qualify?  It seems we need to know which producer matches
with which consumer in order to know where the bypass latencies "fit"
into the schedule.

 c) Furthermore, if it turns out that the "arbitrary" order in which the
producer instructions and consumer instructions are emitted places too
much "distance" between a producer and the matching consumer, then it is
possible that by the time the hardware executes the consumer, the stored
value is no longer in the write buffer, so even though we might have
"thought" two PARALLEL rtl expressions qualified for the store bypass
optimization, we really should have returned false.
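
To make question 1 concrete, here is one possible reading of the nested
loops, offered only as a sketch.  It assumes the reg_mentioned_p tests
mean "return false as soon as any producer destination appears in any
consumer store address", and it uses hypothetical types (struct toy_store,
plain integers for registers) rather than real RTL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one consumer SET: either a store to memory
   through ADDR_REG, or something that is not a store.  */
struct toy_store
{
  bool is_store;
  int addr_reg;
};

/* OUT_DEST lists the registers written by the producer's SETs.
   Return true only if every consumer SET is a store and no producer
   destination is used as the address of any of those stores.  */
bool
toy_all_pairs_bypass_p (const int *out_dest, size_t n_out,
                        const struct toy_store *in, size_t n_in)
{
  for (size_t i = 0; i < n_in; i++)
    {
      /* A consumer SET that is not a store disqualifies the pair.  */
      if (!in[i].is_store)
        return false;
      /* Check this store's address against every producer SET.  */
      for (size_t j = 0; j < n_out; j++)
        if (out_dest[j] == in[i].addr_reg)
          return false;
    }
  return true;
}
```

Under this reading the loops do not require every producer to match every
consumer; they only reject the pair when some producer feeds a store
address.  Whether that reading is the intended one is part of what I am
asking.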

Can someone help me understand this better?

Thanks much.


-- 
Kelvin Nilsen, Ph.D.  kdnil...@linux.vnet.ibm.com
home office: 801-756-4821, cell: 520-991-6727
IBM Linux Technology Center - PPC Toolchain



[PATCH] PR80101: Fix ICE in store_data_bypass_p

2017-04-06 Thread Kelvin Nilsen

[This is a repost of a patch previously posted on 3/29/2017.
Eric, I hope you might consider that this falls within your scope
of maintenance.  Thanks.]

This problem report describes an assertion error when certain rtl expressions
which are not eligible as producers or consumers of a store bypass
optimization are passed as arguments to the store_data_bypass_p
function.  The proposed patch returns false from store_data_bypass_p
rather than terminating with an assertion error.  False indicates that
the passed arguments are not eligible for the store bypass scheduling
optimization.

The patch has been bootstrapped without regressions on
powerpc64le-unknown-linux-gnu.  Is this ok for the trunk?

gcc/ChangeLog:

2017-03-29  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/80101
* recog.c (store_data_bypass_p): Rather than terminate with
assertion error, return false if either function argument is not a
single_set or a PARALLEL with SETs inside.

gcc/testsuite/ChangeLog:

2017-03-29  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/80101
* gcc.target/powerpc/pr80101-1.c: New test.


Index: gcc/recog.c
===
--- gcc/recog.c (revision 246469)
+++ gcc/recog.c (working copy)
@@ -3663,9 +3663,12 @@ peephole2_optimize (void)

 /* Common predicates for use with define_bypass.  */

-/* True if the dependency between OUT_INSN and IN_INSN is on the store
-   data not the address operand(s) of the store.  IN_INSN and OUT_INSN
-   must be either a single_set or a PARALLEL with SETs inside.  */
+/* Returns true if the dependency between OUT_INSN and IN_INSN is on
+   the stored data, false if there is no dependency.  Note that a
+   consumer instruction that loads only the address (rather than the
+   value) stored by a producer instruction does not represent a
+   dependency.  If IN_INSN or OUT_INSN are not a single_set or a
+   PARALLEL with SETs inside, this function returns false.  */

 int
 store_data_bypass_p (rtx_insn *out_insn, rtx_insn *in_insn)
@@ -3701,7 +3704,8 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn
 if (GET_CODE (out_exp) == CLOBBER)
   continue;

-gcc_assert (GET_CODE (out_exp) == SET);
+   if (GET_CODE (out_exp) != SET)
+ return false;

 if (reg_mentioned_p (SET_DEST (out_exp), SET_DEST (in_set)))
   return false;
@@ -3711,7 +3715,8 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn
   else
 {
   in_pat = PATTERN (in_insn);
-  gcc_assert (GET_CODE (in_pat) == PARALLEL);
+  if (GET_CODE (in_pat) != PARALLEL)
+   return false;

   for (i = 0; i < XVECLEN (in_pat, 0); i++)
{
@@ -3720,7 +3725,8 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn
  if (GET_CODE (in_exp) == CLOBBER)
continue;

- gcc_assert (GET_CODE (in_exp) == SET);
+ if (GET_CODE (in_exp) != SET)
+   return false;

  if (!MEM_P (SET_DEST (in_exp)))
return false;
@@ -3734,7 +3740,8 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn
   else
 {
   out_pat = PATTERN (out_insn);
-  gcc_assert (GET_CODE (out_pat) == PARALLEL);
+ if (GET_CODE (out_pat) != PARALLEL)
+   return false;

   for (j = 0; j < XVECLEN (out_pat, 0); j++)
 {
@@ -3743,7 +3750,8 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn
   if (GET_CODE (out_exp) == CLOBBER)
 continue;

-  gcc_assert (GET_CODE (out_exp) == SET);
+ if (GET_CODE (out_exp) != SET)
+   return false;

   if (reg_mentioned_p (SET_DEST (out_exp), SET_DEST (in_exp)))
 return false;
Index: gcc/testsuite/gcc.target/powerpc/pr80101-1.c
===
--- gcc/testsuite/gcc.target/powerpc/pr80101-1.c(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr80101-1.c(working copy)
@@ -0,0 +1,22 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power6" } } */
+/* { dg-require-effective-target dfp_hw } */
+/* { dg-options "-mcpu=power6 -mno-sched-epilog -Ofast" } */
+
+/* Prior to resolving PR 80101, this test case resulted in an internal
+   compiler error.  The role of this test program is to assure that
+   dejagnu's "test for excess errors" does not find any.  */
+
+int b;
+
+void e ();
+
+int c ()
+{
+  struct
+  {
+int a[b];
+  } d;
+  if (d.a[0])
+e ();
+}



[PATCH v2,rs6000] PR80108: Fix ICE with cross compiler

2017-04-06 Thread Kelvin Nilsen
I am reposting this patch, previously posted just moments ago, to
correct the subject so that it clarifies that this is a rs6000-specific
patch.  Thanks.

PR 80108 describes an ICE that occurs on an existing test program when
compiled with a particular combination of target options.

This patch fixes the compiler to reject that particular combination of
target options since it is not meaningful and duplicates the offending
test case with a dg-options directive to exercise the problematic
command-line options.

Thanks to feedback from Pat Haugen, Michael Meissner, and Segher
Boessenkool, version 2 of this proposed patch integrates the following
refinements:

1. Issue an error message when -mpower9-minmax is used in combination
   with -mcpu=power9 if specific prerequisite target options have been
   explicitly disabled.

2. Change the exclude-opts clause on the test case's dg-skip-if
   directive from -mcpu=power9 to -mcpu=405.  (This was a
   copy-and-paste error when this line was borrowed from a
   different test program.)

3. Remove -m32 from the dg-options directive.  Though this target
   option had been specified in the original problem report, subsequent
   testing confirmed that the original ICE occurs independent of this
   option.  Eliminating this option allows the regression test to be
   exercised in more contexts.

This patch has been bootstrapped and tested with no regressions on both
powerpc64-unknown-linux-gnu and powerpc64le-unknown-linux-gnu.  Is
this ok for the trunk?
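
For readers unfamiliar with the flag bookkeeping, the two mask
expressions used in the patch can be sketched in isolation.  This is a
minimal model with hypothetical function names and plain 64-bit integers;
it assumes, as the patch does, that a bit set in the "explicit" word means
the user mentioned that option on the command line.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if some flag in MASK was mentioned explicitly by the
   user (bit set in EXPLICIT_FLAGS) yet is off in FLAGS, which means
   it was explicitly disabled.  */
bool
toy_explicitly_disabled_p (uint64_t mask, uint64_t flags,
                           uint64_t explicit_flags)
{
  return (mask & explicit_flags) != (mask & flags & explicit_flags);
}

/* Turn on every flag in MASK that the user did not mention
   explicitly, leaving explicit choices untouched.  */
uint64_t
toy_enable_unless_explicit (uint64_t mask, uint64_t flags,
                            uint64_t explicit_flags)
{
  return flags | (mask & ~explicit_flags);
}
```

For example, with a mask of 0x7, flags of 0x5, and explicit flags of 0x2,
the first function reports an explicitly disabled option, because bit 0x2
was mentioned by the user but is not enabled.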

gcc/ChangeLog:

2017-04-06  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/80108
* config/rs6000/rs6000.c (rs6000_option_override_internal):
Enhance special handling given to the TARGET_P9_MINMAX option in
relation to certain other options.

gcc/testsuite/ChangeLog:

2017-04-06  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/80108
* gcc.target/powerpc/ppc-fortran/ppc-fortran.exp: New file.
* gcc.target/powerpc/ppc-fortran/pr80108-1.f90: New test.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 246573)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -4273,8 +4273,40 @@ rs6000_option_override_internal (bool global_init_
   /* For the newer switches (vsx, dfp, etc.) set some of the older options,
  unless the user explicitly used the -mno- to disable the code.  */
   if (TARGET_P9_VECTOR || TARGET_MODULO || TARGET_P9_DFORM_SCALAR
-  || TARGET_P9_DFORM_VECTOR || TARGET_P9_DFORM_BOTH > 0 || TARGET_P9_MINMAX)
+  || TARGET_P9_DFORM_VECTOR || TARGET_P9_DFORM_BOTH > 0)
 rs6000_isa_flags |= (ISA_3_0_MASKS_SERVER & ~rs6000_isa_flags_explicit);
+  else if (TARGET_P9_MINMAX)
+{
+  if (have_cpu)
+   {
+ if (cpu_index == PROCESSOR_POWER9)
+   {
+ /* legacy behavior: allow -mcpu=power9 with certain
+capabilities explicitly disabled.  */
+ rs6000_isa_flags |=
+   (ISA_3_0_MASKS_SERVER & ~rs6000_isa_flags_explicit);
+ /* However, reject this automatic fix if certain
+capabilities required for TARGET_P9_MINMAX support
+have been explicitly disabled.  */
+ if (((OPTION_MASK_VSX | OPTION_MASK_UPPER_REGS_SF
+   | OPTION_MASK_UPPER_REGS_DF) & rs6000_isa_flags)
+ != (OPTION_MASK_VSX | OPTION_MASK_UPPER_REGS_SF
+  | OPTION_MASK_UPPER_REGS_DF))
+   error ("-mpower9-minmax incompatible with explicitly disabled options");
+   }
+ else
+   error ("Power9 target option is incompatible with -mcpu= for "
+  " less than power9");
+   }
+  else if ((ISA_3_0_MASKS_SERVER & rs6000_isa_flags_explicit)
+  != (ISA_3_0_MASKS_SERVER & rs6000_isa_flags
+  & rs6000_isa_flags_explicit))
+   /* Enforce that none of the ISA_3_0_MASKS_SERVER flags
+  were explicitly cleared.  */
+   error ("-mpower9-minmax incompatible with explicitly disabled options");
+  else
+   rs6000_isa_flags |= ISA_3_0_MASKS_SERVER;
+}
   else if (TARGET_P8_VECTOR || TARGET_DIRECT_MOVE || TARGET_CRYPTO)
 rs6000_isa_flags |= (ISA_2_7_MASKS_SERVER & ~rs6000_isa_flags_explicit);
   else if (TARGET_VSX)
Index: gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp
===
--- gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp (revision 246624)
@@ -0,0 +1,65 @@
+#   Copyright (C) 2004-2017 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free 

[PATCH v2] PR80108: Fix ICE with cross compiler

2017-04-06 Thread Kelvin Nilsen

PR 80108 describes an ICE that occurs on an existing test program when
compiled with a particular combination of target options.

This patch fixes the compiler to reject that particular combination of
target options since it is not meaningful and duplicates the offending
test case with a dg-options directive to exercise the problematic
command-line options.

Thanks to feedback from Pat Haugen, Michael Meissner, and Segher
Boessenkool, version 2 of this proposed patch integrates the following
refinements:

1. Issue an error message when -mpower9-minmax is used in combination
   with -mcpu=power9 if specific prerequisite target options have been
   explicitly disabled.

2. Change the exclude-opts clause on the test case's dg-skip-if
   directive from -mcpu=power9 to -mcpu=405.  (This was a
   copy-and-paste error when this line was borrowed from a
   different test program.)

3. Remove -m32 from the dg-options directive.  Though this target
   option had been specified in the original problem report, subsequent
   testing confirmed that the original ICE occurs independent of this
   option.  Eliminating this option allows the regression test to be
   exercised in more contexts.

This patch has been bootstrapped and tested with no regressions on both
powerpc64-unknown-linux-gnu and powerpc64le-unknown-linux-gnu.  Is
this ok for the trunk?

gcc/ChangeLog:

2017-04-06  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/80108
* config/rs6000/rs6000.c (rs6000_option_override_internal):
Enhance special handling given to the TARGET_P9_MINMAX option in
relation to certain other options.

gcc/testsuite/ChangeLog:

2017-04-06  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/80108
* gcc.target/powerpc/ppc-fortran/ppc-fortran.exp: New file.
* gcc.target/powerpc/ppc-fortran/pr80108-1.f90: New test.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 246573)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -4273,8 +4273,40 @@ rs6000_option_override_internal (bool global_init_
   /* For the newer switches (vsx, dfp, etc.) set some of the older options,
  unless the user explicitly used the -mno- to disable the code.  */
   if (TARGET_P9_VECTOR || TARGET_MODULO || TARGET_P9_DFORM_SCALAR
-  || TARGET_P9_DFORM_VECTOR || TARGET_P9_DFORM_BOTH > 0 || TARGET_P9_MINMAX)
+  || TARGET_P9_DFORM_VECTOR || TARGET_P9_DFORM_BOTH > 0)
 rs6000_isa_flags |= (ISA_3_0_MASKS_SERVER & ~rs6000_isa_flags_explicit);
+  else if (TARGET_P9_MINMAX)
+{
+  if (have_cpu)
+   {
+ if (cpu_index == PROCESSOR_POWER9)
+   {
+ /* legacy behavior: allow -mcpu=power9 with certain
+capabilities explicitly disabled.  */
+ rs6000_isa_flags |=
+   (ISA_3_0_MASKS_SERVER & ~rs6000_isa_flags_explicit);
+ /* However, reject this automatic fix if certain
+capabilities required for TARGET_P9_MINMAX support
+have been explicitly disabled.  */
+ if (((OPTION_MASK_VSX | OPTION_MASK_UPPER_REGS_SF
+   | OPTION_MASK_UPPER_REGS_DF) & rs6000_isa_flags)
+ != (OPTION_MASK_VSX | OPTION_MASK_UPPER_REGS_SF
+  | OPTION_MASK_UPPER_REGS_DF))
+   error ("-mpower9-minmax incompatible with explicitly disabled options");
+   }
+ else
+   error ("Power9 target option is incompatible with -mcpu= for "
+  " less than power9");
+   }
+  else if ((ISA_3_0_MASKS_SERVER & rs6000_isa_flags_explicit)
+  != (ISA_3_0_MASKS_SERVER & rs6000_isa_flags
+  & rs6000_isa_flags_explicit))
+   /* Enforce that none of the ISA_3_0_MASKS_SERVER flags
+  were explicitly cleared.  */
+   error ("-mpower9-minmax incompatible with explicitly disabled options");
+  else
+   rs6000_isa_flags |= ISA_3_0_MASKS_SERVER;
+}
   else if (TARGET_P8_VECTOR || TARGET_DIRECT_MOVE || TARGET_CRYPTO)
 rs6000_isa_flags |= (ISA_2_7_MASKS_SERVER & ~rs6000_isa_flags_explicit);
   else if (TARGET_VSX)
Index: gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp
===
--- gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp (revision 246624)
@@ -0,0 +1,65 @@
+#   Copyright (C) 2004-2017 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope 

[PATCH,rs6000] PR80108: Fix ICE with cross compiler

2017-03-31 Thread Kelvin Nilsen

PR 80108 describes an ICE that occurs on an existing test program when
compiled with a particular combination of target options.  

This patch fixes the compiler to reject that particular combination of
target options since it is not meaningful and duplicates the offending
test case with a dg-options directive to exercise the problematic
command-line options.

This patch has been bootstrapped and tested with no regressions on both
powerpc64-unknown-linux-gnu and powerpc64le-unknown-linux-gnu.  Is
this ok for the trunk?


gcc/ChangeLog:

2017-03-31  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/80108
* config/rs6000/rs6000.c (rs6000_option_override_internal):
Enhance special handling given to the TARGET_P9_MINMAX option in
relation to certain other options.

gcc/testsuite/ChangeLog:

2017-03-31  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/80108
* gcc.target/powerpc/ppc-fortran/ppc-fortran.exp: New file.
* gcc.target/powerpc/ppc-fortran/pr80108-1.f90: New test.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 246573)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -4273,8 +4273,30 @@ rs6000_option_override_internal (bool global_init_
   /* For the newer switches (vsx, dfp, etc.) set some of the older options,
  unless the user explicitly used the -mno- to disable the code.  */
   if (TARGET_P9_VECTOR || TARGET_MODULO || TARGET_P9_DFORM_SCALAR
-  || TARGET_P9_DFORM_VECTOR || TARGET_P9_DFORM_BOTH > 0 || TARGET_P9_MINMAX)
+  || TARGET_P9_DFORM_VECTOR || TARGET_P9_DFORM_BOTH > 0)
 rs6000_isa_flags |= (ISA_3_0_MASKS_SERVER & ~rs6000_isa_flags_explicit);
+  else if (TARGET_P9_MINMAX)
+{
+  if (have_cpu)
+   {
+ if (cpu_index == PROCESSOR_POWER9)
+   /* legacy behavior: allow -mcpu=power9 with certain capabilities
+  (eg -mno-vsx) explicitly disabled.  */
+   rs6000_isa_flags |=
+ (ISA_3_0_MASKS_SERVER & ~rs6000_isa_flags_explicit);
+ else
+   error ("Power9 target option is incompatible with -mcpu= for "
+  " less than power9");
+   }
+  else if ((ISA_3_0_MASKS_SERVER & rs6000_isa_flags_explicit)
+  != (ISA_3_0_MASKS_SERVER & rs6000_isa_flags
+  & rs6000_isa_flags_explicit))
+   /* Enforce that none of the ISA_3_0_MASKS_SERVER flags
+  were explicitly cleared.  */
+   error ("-mpower9-minmax incompatible with explicitly disabled options");
+  else
+   rs6000_isa_flags |= ISA_3_0_MASKS_SERVER;
+}
   else if (TARGET_P8_VECTOR || TARGET_DIRECT_MOVE || TARGET_CRYPTO)
 rs6000_isa_flags |= (ISA_2_7_MASKS_SERVER & ~rs6000_isa_flags_explicit);
   else if (TARGET_VSX)
Index: gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp
===
--- gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp (revision 246624)
@@ -0,0 +1,65 @@
+#   Copyright (C) 2004-2017 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Load support procs.
+load_lib gfortran-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_FFLAGS
+if ![info exists DEFAULT_FFLAGS] then {
+set DEFAULT_FFLAGS " -pedantic-errors"
+}
+
+# Initialize `dg'.
+dg-init
+
+global gfortran_test_path
+global gfortran_aux_module_flags
+set gfortran_test_path $srcdir/$subdir
+set gfortran_aux_module_flags $DEFAULT_FFLAGS
+proc dg-compile-aux-modules { args } {
+global gfortran_test_path
+global gfortran_aux_module_flags
+if { [llength $args] != 2 } {
+   error "dg-set-target-env-var: needs one argument"
+   return
+}
+
+set level [info level]
+if { [info procs dg-save-unknown] != [list] } {
+   rename dg-save-unknown dg-save-unknown-level-$level
+}
+
+dg-test $gfortran_test_path/[lindex $args 1] "" $gfortran_aux_module_flags
+# cleanup-modules is intentionally not invoked here.
+
+if { [info p

[PATCH] PR80101: Fix ICE in store_data_bypass_p

2017-03-29 Thread Kelvin Nilsen

This problem report describes an assertion error when certain rtl expressions
which are not eligible as producers or consumers of a store bypass
optimization are passed as arguments to the store_data_bypass_p
function.  The proposed patch returns false from store_data_bypass_p
rather than terminating with an assertion error.  False indicates that
the passed arguments are not eligible for the store bypass scheduling
optimization.

The patch has been bootstrapped without regressions on
powerpc64le-unknown-linux-gnu.  Is this ok for the trunk?

gcc/ChangeLog:

2017-03-29  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/80101
* recog.c (store_data_bypass_p): Rather than terminate with
assertion error, return false if either function argument is not a
single_set or a PARALLEL with SETs inside.

gcc/testsuite/ChangeLog:

2017-03-29  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/80101
* gcc.target/powerpc/pr80101-1.c: New test.


Index: gcc/recog.c
===
--- gcc/recog.c (revision 246469)
+++ gcc/recog.c (working copy)
@@ -3663,9 +3663,12 @@ peephole2_optimize (void)
 
 /* Common predicates for use with define_bypass.  */
 
-/* True if the dependency between OUT_INSN and IN_INSN is on the store
-   data not the address operand(s) of the store.  IN_INSN and OUT_INSN
-   must be either a single_set or a PARALLEL with SETs inside.  */
+/* Returns true if the dependency between OUT_INSN and IN_INSN is on
+   the stored data, false if there is no dependency.  Note that a
+   consumer instruction that loads only the address (rather than the
+   value) stored by a producer instruction does not represent a
+   dependency.  If IN_INSN or OUT_INSN are not a single_set or a
+   PARALLEL with SETs inside, this function returns false.  */
 
 int
 store_data_bypass_p (rtx_insn *out_insn, rtx_insn *in_insn)
@@ -3701,7 +3704,8 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn
 if (GET_CODE (out_exp) == CLOBBER)
   continue;
 
-gcc_assert (GET_CODE (out_exp) == SET);
+   if (GET_CODE (out_exp) != SET)
+ return false;
 
 if (reg_mentioned_p (SET_DEST (out_exp), SET_DEST (in_set)))
   return false;
@@ -3711,7 +3715,8 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn
   else
 {
   in_pat = PATTERN (in_insn);
-  gcc_assert (GET_CODE (in_pat) == PARALLEL);
+  if (GET_CODE (in_pat) != PARALLEL)
+   return false;
 
   for (i = 0; i < XVECLEN (in_pat, 0); i++)
{
@@ -3720,7 +3725,8 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn
  if (GET_CODE (in_exp) == CLOBBER)
continue;
 
- gcc_assert (GET_CODE (in_exp) == SET);
+ if (GET_CODE (in_exp) != SET)
+   return false;
 
  if (!MEM_P (SET_DEST (in_exp)))
return false;
@@ -3734,7 +3740,8 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn
   else
 {
   out_pat = PATTERN (out_insn);
-  gcc_assert (GET_CODE (out_pat) == PARALLEL);
+ if (GET_CODE (out_pat) != PARALLEL)
+   return false;
 
   for (j = 0; j < XVECLEN (out_pat, 0); j++)
 {
@@ -3743,7 +3750,8 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn
   if (GET_CODE (out_exp) == CLOBBER)
 continue;
 
-  gcc_assert (GET_CODE (out_exp) == SET);
+ if (GET_CODE (out_exp) != SET)
+   return false;
 
   if (reg_mentioned_p (SET_DEST (out_exp), SET_DEST (in_exp)))
 return false;
Index: gcc/testsuite/gcc.target/powerpc/pr80101-1.c
===
--- gcc/testsuite/gcc.target/powerpc/pr80101-1.c(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr80101-1.c(working copy)
@@ -0,0 +1,22 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power6" } } */
+/* { dg-require-effective-target dfp_hw } */
+/* { dg-options "-mcpu=power6 -mno-sched-epilog -Ofast" } */
+
+/* Prior to resolving PR 80101, this test case resulted in an internal
+   compiler error.  The role of this test program is to assure that
+   dejagnu's "test for excess errors" does not find any.  */
+
+int b;
+
+void e ();
+
+int c ()
+{
+  struct
+  {
+int a[b];
+  } d;
+  if (d.a[0])
+e ();
+}



[PATCH,rs6000] PR80103: Fix ICE with cross compiler

2017-03-24 Thread Kelvin Nilsen

PR 80103 provides a test case which results in an internal
compiler error when invoked with the -mno-direct-move and
-mpower9-dform-vector target options.  The internal compiler error
results because these two target options are incompatible with each other.

The enclosed patch simply disables this particular combination of
target options, terminating gcc with an error message instead of
producing an internal compiler error.  Additionally, this patch
includes new comments to address omissions from a patch committed
on 2017/03/23 which deals with conflicts between the 
-mno-power9-vector and -mcpu=power9 target options.

This patch has been bootstrapped and tested with no regressions on
both powerpc64-unknown-linux-gnu and powerpc64le-unknown-linux-gnu.
Is this ok for the trunk?

gcc/testsuite/ChangeLog:

2017-03-24  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/80103
* gcc.target/powerpc/pr80103-1.c: New test.


gcc/ChangeLog:

2017-03-24  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/80103
* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Edit and
add comments.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Add
special handling for target option conflicts between dform
options (-mpower9-dform, -mpower9-dform-vector,
-mpower9-dform-scalar) and -mno-direct-move.

Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c(revision 246406)
+++ gcc/config/rs6000/rs6000-c.c(working copy)
@@ -429,6 +429,12 @@ rs6000_target_modify_macros (bool define_p, HOST_W
   if ((flags & OPTION_MASK_POPCNTD) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR7");
   /* Note that the OPTION_MASK_DIRECT_MOVE flag is automatically
+ turned on in the following condition:
+ 1. TARGET_P9_DFORM_SCALAR or TARGET_P9_DFORM_VECTOR are enabled
+and OPTION_MASK_DIRECT_MOVE is not explicitly disabled.
+Hereafter, the OPTION_MASK_DIRECT_MOVE flag is considered to
+have been turned on explicitly.
+ Note that the OPTION_MASK_DIRECT_MOVE flag is automatically
  turned off in any of the following conditions:
  1. TARGET_HARD_FLOAT, TARGET_ALTIVEC, or TARGET_VSX is explicitly
disabled and OPTION_MASK_DIRECT_MOVE was not explicitly
@@ -473,8 +479,13 @@ rs6000_target_modify_macros (bool define_p, HOST_W
   if (!flag_iso)
rs6000_define_or_undefine_macro (define_p, "__APPLE_ALTIVEC__");
 }
-  /* Note that the OPTION_MASK_VSX flag is automatically turned off in
+  /* Note that the OPTION_MASK_VSX flag is automatically turned on in
  the following conditions:
+ 1. TARGET_P8_VECTOR is explicitly turned on and the OPTION_MASK_VSX
+was not explicitly turned off.  Hereafter, the OPTION_MASK_VSX
+flag is considered to have been explicitly turned on.
+ Note that the OPTION_MASK_VSX flag is automatically turned off in
+ the following conditions:
  1. The operating system does not support saving of AltiVec
registers (OS_MISSING_ALTIVEC).
  2. If any of the options TARGET_HARD_FLOAT, TARGET_FPRS,
@@ -507,6 +518,12 @@ rs6000_target_modify_macros (bool define_p, HOST_W
   rs6000_define_or_undefine_macro (define_p, "__TM_FENCE__");
 }
   /* Note that the OPTION_MASK_P8_VECTOR flag is automatically turned
+ on in the following conditions:
+ 1. TARGET_P9_VECTOR is explicitly turned on and
+OPTION_MASK_P8_VECTOR is not explicitly turned off.
+Hereafter, the OPTION_MASK_P8_VECTOR flag is considered to
+have been turned off explicitly.
+ Note that the OPTION_MASK_P8_VECTOR flag is automatically turned
  off in the following conditions:
  1. If any of TARGET_HARD_FLOAT, TARGET_ALTIVEC, or TARGET_VSX
were turned off explicitly and OPTION_MASK_P8_VECTOR flag was
@@ -514,15 +531,24 @@ rs6000_target_modify_macros (bool define_p, HOST_W
  2. If TARGET_ALTIVEC is turned off.  Hereafter, the
OPTION_MASK_P8_VECTOR flag is considered to have been turned off
explicitly.
- 3. If TARGET_VSX is turned off.  Hereafter, the OPTION_MASK_P8_VECTOR
-   flag is considered to have been turned off explicitly.  */
+ 3. If TARGET_VSX is turned off and OPTION_MASK_P8_VECTOR was not
+explicitly enabled.  If TARGET_VSX is explicitly enabled, the
+OPTION_MASK_P8_VECTOR flag is hereafter also considered to
+   have been turned off explicitly.  */
   if ((flags & OPTION_MASK_P8_VECTOR) != 0)
 rs6000_define_or_undefine_macro (define_p, "__POWER8_VECTOR__");
   /* Note that the OPTION_MASK_P9_VECTOR flag is automatically turned
  off in the following conditions:
- 1. If TARGET_P8_VECTOR is turned off. Hereafter, the
-   OPTION_MASK_P9_VECTOR flag is considered to have been turned off
- 

Re: [PATCH,rs6000] Handle conflicting target options -mno-power9-vector and -mcpu=power9

2017-03-22 Thread Kelvin Nilsen


On 03/22/2017 05:35 PM, Segher Boessenkool wrote:
> On Wed, Mar 22, 2017 at 11:44:49AM -0600, Kelvin Nilsen wrote:
>> Internal testing recently revealed that use of the -mno-power9-vector
>> target option in combination with the -mcpu=power9 target option
>> results in termination of gcc with the error message:
>>
>>   power9-dform requires power9-vector
> 
>> In both cases, the preferred behavior is that the target option
>> -mno-power9-vector causes power9-dform to be automatically disabled.
>>  This patch implements the preferred behavior and adds a test case to
>> demonstrate the fix.
> 
> Or it could do -mpower9-dform-scalar but disable -mpower9-dform-vector?
> That seems more reasonable.

The internal problem report sent to me said "-mno-power9-vector should
override power9-dform unless the latter has been deliberately specified
by the user."  I'm just following orders.  If you think it preferable to
only override -mpower9-dform-vector, I'll make that modification.

> 
> Ideally none of the -mpower9-dform* or -mpower9-vector options would
> exist at all, of course.
> 
>> 2017-03-21  Kelvin Nilsen  <kel...@gcc.gnu.org>
>>
>>  * config/rs6000/rs6000.c (rs6000_option_override_internal): Change
>>  handling of certain combinations of target options, including the
>>  combinations -mpower8-vector vs. -mno-vsx, -mpower8-vector vs.
>>  -mno-power8-vector, and -mpower9-dform vs. -mno-power9-vector.
> 
> Those other changes are independent?

Actually, these other changes are not independent.  My initial attempt
at a patch only changed the behavior of -mpower9-dform vs.
-mno-power9-vector.  But this actually resulted in a regression of an
existing test.  To "properly" handle the new case without impacting
existing "established" behavior (as represented in the existing dejagnu
testsuite), I had to make these other changes as well.


> 
> 
> Segher
> 
> 

-- 
Kelvin Nilsen, Ph.D.  kdnil...@linux.vnet.ibm.com
home office: 801-756-4821, cell: 520-991-6727
IBM Linux Technology Center - PPC Toolchain



[PATCH,rs6000] Handle conflicting target options -mno-power9-vector and -mcpu=power9

2017-03-22 Thread Kelvin Nilsen

Internal testing recently revealed that use of the -mno-power9-vector
target option in combination with the -mcpu=power9 target option
results in termination of gcc with the error message:

  power9-dform requires power9-vector

This same problem is seen if the -mno-power9-vector target option is
specified to a gcc which was built using --with-cpu=power9 as an
argument to configure.

In both cases, the preferred behavior is that the target option
-mno-power9-vector causes power9-dform to be automatically disabled.
This patch implements the preferred behavior and adds a test case to
demonstrate the fix.
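
As a rough illustration of the intended resolution, here is a toy model.
The TOY_* masks, struct toy_flags and toy_resolve_dform are hypothetical
and do not match the real OPTION_MASK_* values or the code in rs6000.c.
The last branch in particular is an assumption about what happens when
power9-vector was not named explicitly.

```c
#include <stdbool.h>

/* Hypothetical bit assignments; the real OPTION_MASK_* values differ.  */
#define TOY_P9_VECTOR 0x1u
#define TOY_P9_DFORM  0x2u

struct toy_flags
{
  unsigned enabled;   /* flags currently in effect */
  unsigned user_set;  /* flags the user named on the command line */
};

/* Reconcile "dform enabled but power9-vector disabled".  Returns false
   when the user requested both contradictory settings explicitly.  */
bool
toy_resolve_dform (struct toy_flags *f)
{
  if (!(f->enabled & TOY_P9_DFORM) || (f->enabled & TOY_P9_VECTOR))
    return true;                      /* nothing to reconcile */

  bool vec_explicit = (f->user_set & TOY_P9_VECTOR) != 0;
  bool dform_explicit = (f->user_set & TOY_P9_DFORM) != 0;

  if (vec_explicit && dform_explicit)
    return false;                     /* genuine conflict: report an error */
  if (vec_explicit)
    f->enabled &= ~TOY_P9_DFORM;      /* -mno-power9-vector wins quietly */
  else
    f->enabled |= TOY_P9_VECTOR;      /* assumed: dform pulls in the vector flag */
  return true;
}
```

With -mcpu=power9 -mno-power9-vector the model starts with dform enabled
by the cpu default and power9-vector explicitly off, and ends with dform
disabled and no error.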

The patch has been bootstrapped and tested with no regressions on both
powerpc64-unknown-linux-gnu and powerpc64le-unknown-linux-gnu.  Is this
ok for the trunk?

gcc/testsuite/ChangeLog:

2017-03-21  Kelvin Nilsen  <kel...@gcc.gnu.org>

* gcc.target/powerpc/p9-options-1.c: New test.

gcc/ChangeLog:

2017-03-21  Kelvin Nilsen  <kel...@gcc.gnu.org>

* config/rs6000/rs6000.c (rs6000_option_override_internal): Change
handling of certain combinations of target options, including the
combinations -mpower8-vector vs. -mno-vsx, -mpower8-vector vs.
-mno-power8-vector, and -mpower9-dform vs. -mno-power9-vector.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 246212)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -4246,9 +4246,22 @@ rs6000_option_override_internal (bool global_init_
 
   if (TARGET_P8_VECTOR && !TARGET_VSX)
 {
-  if (rs6000_isa_flags_explicit & OPTION_MASK_P8_VECTOR)
+  if ((rs6000_isa_flags_explicit & OPTION_MASK_P8_VECTOR)
+ && (rs6000_isa_flags_explicit & OPTION_MASK_VSX))
error ("-mpower8-vector requires -mvsx");
-  rs6000_isa_flags &= ~OPTION_MASK_P8_VECTOR;
+  else if ((rs6000_isa_flags_explicit & OPTION_MASK_P8_VECTOR) == 0)
+   {
+ rs6000_isa_flags &= ~OPTION_MASK_P8_VECTOR;
+ if (rs6000_isa_flags_explicit & OPTION_MASK_VSX)
+   rs6000_isa_flags_explicit |= OPTION_MASK_P8_VECTOR;
+   }
+  else
+   {
+ /* OPTION_MASK_P8_VECTOR is explicit, and OPTION_MASK_VSX is
+not explicit.  */
+ rs6000_isa_flags |= OPTION_MASK_VSX;
+ rs6000_isa_flags_explicit |= OPTION_MASK_VSX;
+   }
 }
 
   if (TARGET_VSX_TIMODE && !TARGET_VSX)
@@ -4448,9 +4461,22 @@ rs6000_option_override_internal (bool global_init_
 error messages.  However, if users have managed to select
 power9-vector without selecting power8-vector, they
 already know about undocumented flags.  */
-  if (rs6000_isa_flags_explicit & OPTION_MASK_P8_VECTOR)
+  if ((rs6000_isa_flags_explicit & OPTION_MASK_P9_VECTOR) &&
+ (rs6000_isa_flags_explicit & OPTION_MASK_P8_VECTOR))
error ("-mpower9-vector requires -mpower8-vector");
-  rs6000_isa_flags &= ~OPTION_MASK_P9_VECTOR;
+  else if ((rs6000_isa_flags_explicit & OPTION_MASK_P9_VECTOR) == 0)
+   {
+ rs6000_isa_flags &= ~OPTION_MASK_P9_VECTOR;
+ if (rs6000_isa_flags_explicit & OPTION_MASK_P8_VECTOR)
+   rs6000_isa_flags_explicit |= OPTION_MASK_P9_VECTOR;
+   }
+  else
+   {
+ /* OPTION_MASK_P9_VECTOR is explicit and
+OPTION_MASK_P8_VECTOR is not explicit.  */
+ rs6000_isa_flags |= OPTION_MASK_P8_VECTOR;
+ rs6000_isa_flags_explicit |= OPTION_MASK_P8_VECTOR;
+   }
 }
 
   /* -mpower9-dform turns on both -mpower9-dform-scalar and
@@ -4479,10 +4505,25 @@ rs6000_option_override_internal (bool global_init_
 error messages.  However, if users have managed to select
 power9-dform without selecting power9-vector, they
 already know about undocumented flags.  */
-  if (rs6000_isa_flags_explicit & OPTION_MASK_P9_VECTOR)
+  if ((rs6000_isa_flags_explicit & OPTION_MASK_P9_VECTOR)
+ && (rs6000_isa_flags_explicit & (OPTION_MASK_P9_DFORM_SCALAR
+  | OPTION_MASK_P9_DFORM_VECTOR)))
error ("-mpower9-dform requires -mpower9-vector");
-  rs6000_isa_flags &= ~(OPTION_MASK_P9_DFORM_SCALAR
-   | OPTION_MASK_P9_DFORM_VECTOR);
+  else if (rs6000_isa_flags_explicit & OPTION_MASK_P9_VECTOR)
+   {
+ rs6000_isa_flags &=
+   ~(OPTION_MASK_P9_DFORM_SCALAR | OPTION_MASK_P9_DFORM_VECTOR);
+ rs6000_isa_flags_explicit |=
+   (OPTION_MASK_P9_DFORM_SCALAR | OPTION_MASK_P9_DFORM_VECTOR);
+   }
+  else
+   {
+ /* We know that OPTION_MASK_P9_VECTOR is not explicit and
+OPTION_MASK_P9_DFORM_SCALAR or OPTION_MASK_P9_DFORM_VECTOR
+may be explicit.  */
+ rs6000_isa_flags |= OPTION_MASK_P9_VEC

[PATCH,rs6000] Add documentation to describe implicit handling of command-line target options

2017-03-17 Thread Kelvin Nilsen

This patch adds comments to clarify the automatic setting and clearing
of target attribute flags in order to assure consistency between
configuration settings and between multiple interrelated compilation
target options.  Particular attention is given to the target options
that affect the C preprocessor macros that are automatically defined to
denote that support for particular target options is enabled.

This patch consists entirely of new comments.  Nevertheless, it has
been bootstrapped on powerpc64le-unknown-linux with no regressions.

Is this ok for trunk?

gcc/ChangeLog:

2017-03-17  Kelvin Nilsen  <kel...@gcc.gnu.org>

* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Add
comments.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Add
comments.

Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c(revision 246086)
+++ gcc/config/rs6000/rs6000-c.c(working copy)
@@ -343,6 +343,71 @@ rs6000_target_modify_macros (bool define_p, HOST_W
 (define_p) ? "define" : "undef",
 flags, bu_mask);
 
+  /* Each of the flags mentioned below controls whether certain
+ preprocessor macros will be automatically defined when
+ preprocessing source files for compilation by this compiler.
+ While most of these flags can be enabled or disabled
+ explicitly by specifying certain command-line options when
+ invoking the compiler, there are also many ways in which these
+ flags are enabled or disabled implicitly, based on compiler
+ defaults, configuration choices, and on the presence of certain
+ related command-line options.  Many, but not all, of these
+ implicit behaviors can be found in file "rs6000.c", the
+ rs6000_option_override_internal() function.
+
+ In general, each of the flags may be automatically enabled in
+ any of the following conditions:
+
+ 1. If no -mcpu target is specified on the command line and no
+   --with-cpu target is specified to the configure command line
+   and the TARGET_DEFAULT macro for this default cpu host
+   includes the flag, and the flag has not been explicitly disabled
+   by command-line options.
+
+ 2. If the target specified with -mcpu=target on the command line, or
+   in the absence of a -mcpu=target command-line option, if the
+   target specified using --with-cpu=target on the configure
+   command line, is disqualified because the associated binary
+   tools (e.g. the assembler) lack support for the requested cpu,
+   and the TARGET_DEFAULT macro for this default cpu host
+   includes the flag, and the flag has not been explicitly disabled
+   by command-line options.
+
+ 3. If either of the above two conditions apply except that the
+   TARGET_DEFAULT macro is defined to equal zero, and
+   TARGET_POWERPC64 and
+   a) BYTES_BIG_ENDIAN and the flag to be enabled is either
+  MASK_PPC_GVXOPT or MASK_POWERPC64 (flags for "powerpc64"
+  target), or
+   b) !BYTES_BIG_ENDIAN and the flag to be enabled is either
+  MASK_POWERPC64 or it is one of the flags included in
+  ISA_2_7_MASKS_SERVER (flags for "powerpc64le" target).
+
+ 4. If a cpu has been requested with a -mcpu=target command-line option
+   and this cpu has not been disqualified due to shortcomings of the
+   binary tools, and the set of flags associated with the requested cpu
+   include the flag to be enabled.  See rs6000-cpus.def for macro
+   definitions that represent various ABI standards
+   (e.g. ISA_2_1_MASKS, ISA_3_0_MASKS_SERVER) and for a list of
+   the specific flags that are associated with each of the cpu
+   choices that can be specified as the target of a -mcpu=target
+   compile option, or as the target of a --with-cpu=target
+   configure option.  Target flags that are specified in either
+   of these two ways are considered "implicit" since the flags
+   are not mentioned specifically by name.
+
+   Additional documentation describing behavior specific to
+   particular flags is provided below, immediately preceding the
+   use of each relevant flag.
+
+ 5. If there is no -mcpu=target command-line option, and the cpu
+   requested by a --with-cpu=target command-line option has not
+   been disqualified due to shortcomings of the binary tools, and
+   the set of flags associated with the specified target include
+   the flag to be enabled.  See the notes immediately above for a
+   summary of the flags associated with particular cpu
+   definitions.  */
+
   /* rs6000_isa_flags based options.  */
   rs6000_define_or_undefine_macro (define_p, "_ARCH_PPC");
   if ((flags & OPTION_MASK_PPC_GPOP

[PATCH,RS6000] PR79963: Correct which condition code bit represents result of vec_any_eq built-in function

2017-03-14 Thread Kelvin Nilsen

This patch corrects several errors in a patch that was submitted on
2017-03-01.  A copy-and-paste error in the previous patch resulted in
accidental use of the lt flag instead of the eq flag to represent the
outcome of the vec_any_eq built-in function.  Also, in reviewing the
code of the previous patch, it was discovered that changes to the C++
templates representing the vec_all_ne and vec_any_eq built-in functions
were incomplete.
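
To see why the choice of bit matters, consider a toy model of the two
CR6 bits set by a recording vector compare-equal.  The struct and
function names below are hypothetical and the model works on plain int
arrays, not vector registers.  It assumes the usual convention that the
"lt" bit means all elements compared equal and the "eq" bit means none
did.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the two CR6 bits a recording vector compare-equal sets:
   "lt" when every element pair compared equal, "eq" when none did.  */
struct toy_cr6
{
  bool lt;
  bool eq;
};

struct toy_cr6
toy_vcmpequ_record (const int *a, const int *b, size_t n)
{
  size_t equal = 0;
  for (size_t i = 0; i < n; i++)
    equal += (a[i] == b[i]);
  struct toy_cr6 cr = { equal == n, equal == 0 };
  return cr;
}

/* any_eq is the inverse of the "eq" (no element equal) bit.  */
bool
toy_any_eq (struct toy_cr6 cr)
{
  return !cr.eq;
}

/* The copy-and-paste mistake: inverting "lt" yields "not all equal".  */
bool
toy_any_eq_wrong (struct toy_cr6 cr)
{
  return !cr.lt;
}
```

When every element matches, toy_any_eq answers true while
toy_any_eq_wrong answers false.  When no element matches, the two
answers are reversed, which is the kind of wrong result the lt/eq
mix-up would produce.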

This patch has been bootstrapped and tested on
powerpc64le-unknown-linux with no regressions.

Is this ok for trunk?

gcc/ChangeLog:

2017-03-14  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/79963
* config/rs6000/altivec.h (vec_all_ne): Under __cplusplus and
__POWER9_VECTOR__ #ifdef control, change template definition to
use Power9-specific built-in function.
(vec_any_eq): Likewise.
* config/rs6000/vector.md (vector_ae_v2di_p): Change the flag used
to control outcomes from this test.
(vector_ae_p): For VEC_F modes, likewise.

Index: gcc/config/rs6000/altivec.h
===
--- gcc/config/rs6000/altivec.h (revision 246096)
+++ gcc/config/rs6000/altivec.h (working copy)
@@ -521,9 +521,9 @@ __altivec_scalar_pred(vec_all_nez,
 __altivec_scalar_pred(vec_any_eqz,
   __builtin_vec_vcmpnez_p (__CR6_LT_REV, a1, a2))
 __altivec_scalar_pred(vec_all_ne,
-  __builtin_vec_allne_p (a1, a2))
+  __builtin_vec_vcmpne_p (a1, a2))
 __altivec_scalar_pred(vec_any_eq,
-  __builtin_vec_anyeq_p (a1, a2))
+  __builtin_vec_vcmpae_p (a1, a2))
 #endif
 
 __altivec_scalar_pred(vec_any_ne,
Index: gcc/config/rs6000/vector.md
===
--- gcc/config/rs6000/vector.md (revision 246096)
+++ gcc/config/rs6000/vector.md (working copy)
@@ -790,7 +790,7 @@
  (eq:V2DI (match_dup 1)
   (match_dup 2)))])
(set (match_operand:SI 0 "register_operand" "=r")
-   (lt:SI (reg:CC CR6_REGNO)
+   (eq:SI (reg:CC CR6_REGNO)
   (const_int 0)))
(set (match_dup 0)
(xor:SI (match_dup 0)
@@ -837,7 +837,7 @@
  (eq:VEC_F (match_dup 1)
(match_dup 2)))])
(set (match_operand:SI 0 "register_operand" "=r")
-   (lt:SI (reg:CC CR6_REGNO)
+   (eq:SI (reg:CC CR6_REGNO)
   (const_int 0)))
(set (match_dup 0)
(xor:SI (match_dup 0)



[PATCH,rs6000] PR79395: Fix compile error with -mcpu=power9 and -mno-vsx and __builtin_vec_cmpne_p

2017-02-28 Thread Kelvin Nilsen

PR 79395 reports a problem that arises when the preprocessor believes
that the target supports Power9 but the gcc compiler believes that
Power9 is not supported.

This patch addresses this inconsistency by introducing a new
preprocessor macro named __POWER9_VECTOR__ which is automatically
defined if the current gcc configuration, as adjusted by gcc command
line options, supports Power9.  Previously, certain macro definitions
that were supplied in altivec.h were conditioned upon the _ARCH_PWR9
macro, which represents statically whether the compiler can support
Power9, but ignores any command-line options that might disable the
Power9 support in this run of the compiler.  Also addressed in this
patch is elimination of the xvcmpnesp and xvcmpnedp instructions, which
are not currently supported.  
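
A minimal sketch of how user code might key off the new macro instead of
_ARCH_PWR9 follows.  The function name toy_power9_vector_enabled is
hypothetical and exists only for this example.

```c
/* Select a code path from the macro that tracks the options in effect
   for this compilation, rather than from the architecture-level macro.
   toy_power9_vector_enabled is a hypothetical helper for this sketch.  */
int
toy_power9_vector_enabled (void)
{
#ifdef __POWER9_VECTOR__
  return 1;   /* Power9 vector support is active for this compilation */
#else
  return 0;   /* fall back to code that avoids Power9 vector built-ins */
#endif
}
```

On a compiler or target that never defines __POWER9_VECTOR__ the
function simply returns 0, so the same source builds everywhere.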

This patch has been demonstrated to fix the problems identified in the
test case mentioned in the PR 79395 report.

This patch has been bootstrapped and tested on
powerpc64le-unknown-linux with no regressions.

Is this ok for trunk?

gcc/ChangeLog:

2017-02-28  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/79395
* config/rs6000/altivec.h (vec_ctz and others): Change the
preprocessor macro that controls conditional compilation from
_ARCH_PWR9 to __POWER9_VECTOR__.
(vec_all_ne): Change parameterization of __altivec_scalar_pred
macro expansion under preprocessor #ifdef __POWER9_VECTOR__
control (instead of _ARCH_PWR9 control) so that template
definition uses power9-specific function.
(vec_any_eq): Likewise.
(vec_all_ne): Change macro definition to use a power9-specific
expansion under #ifdef __POWER9_VECTOR__ control (instead of
_ARCH_PWR9 control).
(vec_any_eq): Likewise.
* config/rs6000/rs6000-builtin.def (CMPNEF): Remove BU_P9V_AV_2
expansion for CMPNEF to remove support for xvcmpnesp instruction.
(CMPNED): Remove BU_P9V_AV_2 expansion for CMPNED to remove
support for xvcmpnedp instruction.
(VCMPNEB_P): Replace BU_P9V_AV_P macro expansion with BU_P9V_AV_2
macro expansion so that Power9 implementation of vec_all_ne does
not use the AltiVec predicate framework.
(VCMPNEH_P): Likewise.
(VCMPNEW_P): Likewise.
(VCMPNED_P): Likewise.
(VCMPNEFP_P): Likewise.
(VCMPNEDP_P): Likewise.
(VCMPAEB_P): Add BU_P9V_AV_2 macro expansion to change
implementation of vec_any_eq to not use AltiVec predicate
framework.
(VCMPAEH_P): Likewise.
(VCMPAEW_P): Likewise.
(VCMPAED_P): Likewise.
(VCMPAEFP_P): Likewise.
(VCMPAEDP_P): Likewise.
(VCMPNE_P): Replace BU_P9V_OVERLOAD_P macro expansion with
BU_P9V_OVERLOAD_2 so that Power9 implementation of vec_all_ne does
not use the AltiVec predicate framework.
(VCMPAE_P): Add BU_P9V_OVERLOAD_2 macro to change implementation
of vec_any_eq to not use AltiVec predicate framework.
* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Add
support for predefined __POWER9_VECTOR__ macro to indicate that
Power9 instruction selection is enabled.
(altivec_overloaded_builtins): Remove extraneous
ALTIVEC_BUILTIN_VEC_CMPNE entry for overloaded 
function argument types RS6000_BTI_bool_V16QI and
RS6000_BTI_bool_V16QI.  Remove erroneous ALTIVEC_BUILTIN_VEC_CMPNE
entry for overloaded function argument types RS6000_BTI_bool_V4SI
and RS6000_BTI_bool_V4SI, mapping to P9V_BUILTIN_CMPNEB.  Remove
two entries mapping to P9V_BUILTIN_CMPNED and one entry mapping to
P9V_BUILTIN_CMPNEF to force use of instructions not specific to
Power9 for implementations of vec_cmpne.  Change the signature for
all definitions of the overloaded P9V_BUILTIN_VEC_CMPNE_P function
(representing vec_all_ne) to remove the previously described first
argument of type RS6000_BTI_INTSI, as this was an artifact of
reliance on the AltiVec predicate framework, which is no longer
used in the implementation of these functions.  Add
P9V_BUILTIN_VEC_VCMPAE_P entries (representing the vec_any_eq
function) to match all of the P9V_BUILTIN_VEC_VCMPNE_P entries
since, unlike the AltiVec predicate framework implementation, we
do not share function descriptors between vec_all_ne and vec_any_eq.
(altivec_resolve_overloaded_builtin): Add SFmode and DFmode to the
set of modes that receive special treatment even when
TARGET_P9_VECTOR is true.  The special treatment emits code that
does not depend on Power9 instructions.
* config/rs6000/vector.md (vector_ne__p): Change this
define_expand to not rely on AltiVec predicate framework.
(vector_ae_p): New define_expand to represent vec_any_eq
function.
(vector_ne_v2di_p): Change this define_

[PATCH,rs6000] PR78056: Remove unreliable test case

2017-02-17 Thread Kelvin Nilsen

This patch amends a patch merged with the trunk on 2017-01-14.  One of
the new test cases added at that time has proven to be unreliable so
this patch removes it.

Is this patch ok for trunk?

gcc/testsuite/ChangeLog:

2017-02-17  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/78056
* gcc.target/powerpc/pr78056-8.c: Remove.


Index: gcc/testsuite/gcc.target/powerpc/pr78056-8.c
===
--- gcc/testsuite/gcc.target/powerpc/pr78056-8.c(revision 245539)
+++ gcc/testsuite/gcc.target/powerpc/pr78056-8.c(working copy)
@@ -1,26 +0,0 @@
-/* { dg-do compile { target { powerpc*-*-* } } } */
-/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power5" } } */
-
-/* powerpc_popcntb_ok represents support for power 5.  */
-/* { dg-require-effective-target powerpc_popcntb_ok } */
-/* dfp_hw represents support for power 6.  */
-/* { dg-skip-if "" { dfp_hw } } */
-/* { dg-skip-if "" { powerpc*-*-aix* } } */
-/* { dg-options "-mcpu=power5" } */
-
-/* This test follows the pattern of pr78056-2.c, which has been
- * exercised with binutils 2.25.  This test, however, has not
- * been exercised because the author of the test does not have access
- * to a development environment that succesfully bootstraps gcc
- * while at the same lacking assembler support for power 6.  */
-
-/* This test should succeed on both 32- and 64-bit configurations.  */
-/* Though the command line specifies power5 target, this function is
-   to support power6. Expect an error message here because this target
-   does not support power6.  */
-__attribute__((target("cpu=power6")))
-/* fabs/fnabs/fsel */
-double normal1 (double a, double b)
-{ /* { dg-warning "lacks power6 support" } */
-  return __builtin_copysign (a, b); /* { dg-warning "implicit declaration" } */
-}



[PATCH v2] PR68972: g++.dg/cpp1y/vla-initlist1.C test case fails on power

2017-02-07 Thread Kelvin Nilsen

This second version of the proposed patch removes redundant and unnecessary
default arguments to the dg-skip-if directive, as requested by Rainer Orth.
Thank you Rainer for your review and feedback.

The test g++.dg/cpp1y/vla-initlist1.C makes assumptions that the memory
used to represent the private temporary variables of neighboring control
blocks at the same control nesting level is:

1. found at the same address, and
2. not overwritten between when the first block ends and the second
block begins.

While these assumptions are valid with some optimization choices on some
architectures, these assumptions do not hold universally.

With optimization disabled on the power architecture, the
g++.dg/cpp1y/vla-initlist1.C test program runs initialization code to
allocate the variable-length array a[] before entry into the second of
two neighboring control blocks.  This initialization code overwrites the
first two cells of the array i[] that were initialized by the first of
the two neighboring control blocks.  Thus, the initialization value
stored into i[1] is no longer present when this value is subsequently
fetched as a[1].i from within the second control block.

This patch disables this particular test case on power hardware.

The patch has been bootstrapped and tested on
powerpc64le-unknown-linux with no regressions.

Is this ok for trunk?


gcc/testsuite/ChangeLog:

2017-02-07  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/68972
* g++.dg/cpp1y/vla-initlist1.C: Add dg-skip-if directive to
disable this test on power architecture.

Index: gcc/testsuite/g++.dg/cpp1y/vla-initlist1.C
===
--- gcc/testsuite/g++.dg/cpp1y/vla-initlist1.C  (revision 245156)
+++ gcc/testsuite/g++.dg/cpp1y/vla-initlist1.C  (working copy)
@@ -1,4 +1,5 @@
 // { dg-do run { target c++11 } }
+// { dg-skip-if "power overwrites two slots of array i" { "power*-*-*" } }
 // { dg-options "-Wno-vla" }
 
 #include 



[PATCH] PR68972: g++.dg/cpp1y/vla-initlist1.C test case fails on power

2017-02-06 Thread Kelvin Nilsen

The test g++.dg/cpp1y/vla-initlist1.C makes assumptions that the memory
used to represent the private temporary variables of neighboring control
blocks at the same control nesting level is:

1. found at the same address, and
2. not overwritten between when the first block ends and the second
block begins.

While these assumptions are valid with some optimization choices on some
architectures, these assumptions do not hold universally.

With optimization disabled on the power architecture, the
g++.dg/cpp1y/vla-initlist1.C test program runs initialization code to
allocate the variable-length array a[] before entry into the second of
two neighboring control blocks.  This initialization code overwrites the
first two cells of the array i[] that were initialized by the first of
the two neighboring control blocks.  Thus, the initialization value
stored into i[1] is no longer present when this value is subsequently
fetched as a[1].i from within the second control block.

This patch disables this particular test case on power hardware.

The patch has been bootstrapped and tested on
powerpc64le-unknown-linux with no regressions.

Is this ok for trunk?

gcc/testsuite/ChangeLog:

2017-02-06  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/68972
* g++.dg/cpp1y/vla-initlist1.C: Add dg-skip-if directive to
disable this test on power architecture.

Index: gcc/testsuite/g++.dg/cpp1y/vla-initlist1.C
===
--- gcc/testsuite/g++.dg/cpp1y/vla-initlist1.C  (revision 245156)
+++ gcc/testsuite/g++.dg/cpp1y/vla-initlist1.C  (working copy)
@@ -1,4 +1,5 @@
 // { dg-do run { target c++11 } }
+// { dg-skip-if "power overwrites two slots of array i" { "power*-*-*" } { "*" } { "" } }
 // { dg-options "-Wno-vla" }

 #include 



[PATCH] PR66669: Fix failure of gcc.dg/loop-8.c on Power

2017-01-23 Thread Kelvin Nilsen

The test gcc.dg/loop-8.c makes assumptions that are not valid on Power
architecture (and on certain other architectures for which this issue
has already been addressed).  The test case assumes that a single
loop-invariant statement will be moved outside the loop.  On Power, a
constant is copy-propagated within the loop, and the subsequent
loop-invariant code motion moves two loop-invariant statements out of
the loop.
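
For readers unfamiliar with the optimization being counted, here is a
small stand-alone illustration of loop-invariant code motion.  It is not
the content of gcc.dg/loop-8.c, and the function names are made up for
this example.

```c
#include <stddef.h>

/* Before: k * 4 does not change across iterations.  */
void
toy_fill (int *a, size_t n, int k)
{
  for (size_t i = 0; i < n; i++)
    a[i] = k * 4;
}

/* After hand-applied invariant motion: computed once, outside the loop.  */
void
toy_fill_hoisted (int *a, size_t n, int k)
{
  int t = k * 4;
  for (size_t i = 0; i < n; i++)
    a[i] = t;
}
```

Both functions store the same values.  A test that counts how many
statements the compiler moved depends on how earlier passes shaped the
loop body, which is why the expected count can differ between targets.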

This patch simply disables this test case on Power architecture.


gcc/testsuite/ChangeLog:

2017-01-23  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/66669
* gcc.dg/loop-8.c: Modify dg-skip-if directive to exclude this
test on powerpc targets.

Index: gcc/testsuite/gcc.dg/loop-8.c
===
--- gcc/testsuite/gcc.dg/loop-8.c   (revision 244730)
+++ gcc/testsuite/gcc.dg/loop-8.c   (working copy)
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O1 -fdump-rtl-loop2_invariant" } */
-/* { dg-skip-if "unexpected IV" { "hppa*-*-* mips*-*-* visium-*-*" } { "*" } { "" } } */
+/* { dg-skip-if "unexpected IV" { "hppa*-*-* mips*-*-* visium-*-* powerpc*-*-*" } { "*" } { "" } } */
 
 void
 f (int *a, int *b)



[PATCH,rs6000] Correct argument and result types for binary floating point built-in functions

2017-01-19 Thread Kelvin Nilsen

This patch corrects several errors in a patch originally committed on
2016-08-10.  The following corrections are required to maintain
compliance with "Power Architecture 64-Bit ELF V2 ABI Specification",
also known as "OpenPOWER ABI for Linux Supplement".

  vector unsigned long long vec_extract_exp (vector double);
(instead of vector long long vec_extract_exp (vector double))
  vector unsigned int vec_extract_exp (vector float);
(instead of vector int vec_extract_exp (vector float))
  vector unsigned long long vec_extract_sig (vector double);
(instead of vector long long vec_extract_sig (vector double))
  vector unsigned int vec_extract_sig (vector float);
(instead of vector int vec_extract_sig (vector float))
  vector double vec_insert_exp (vector double, vector unsigned long long);
  vector float vec_insert_exp (vector float, vector unsigned int);
(the above two are new forms, to complement the existing 
 forms which take matching integer arguments)
  vector bool int vec_test_data_class (vector float, const int);
(instead of vector int vec_test_data_class (vector float, unsigned int))
  vector bool long long vec_test_data_class (vector double, const int);
(instead of vector long long vec_test_data_class (vector double, unsigned int))

Though the following functions are not defined in the ABI
specification, they were also corrected to provide improved consistency
with the corresponding vector functions:

  double scalar_insert_exp (double, unsigned long long);
(The above was added to complement the existing form: 
 double scalar_insert_exp (unsigned long long int, unsigned long long int))
  bool scalar_test_data_class (double, const int);
(instead of int scalar_test_data_class (double, unsigned int))
  bool scalar_test_data_class (float, const int);
(instead of int scalar_test_data_class (float, unsigned int))
  bool scalar_test_neg (double);
(instead of int scalar_test_neg (double))
  bool scalar_test_neg (float);
(instead of int scalar_test_neg (float))
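
To make the intent of the unsigned and bool result types concrete, here
is a portable sketch that models what the double-precision scalar
functions compute.  It is not the built-in implementation: the model_*
names are invented for this example, it assumes double is IEEE 754
binary64, and the treatment of the implicit leading bit and of the sign
test reflects my understanding of the underlying instructions rather
than anything stated in this patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its 64-bit pattern and back (assumes binary64).  */
static uint64_t bits_of (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);
  return u;
}

static double double_of (uint64_t u)
{
  double x;
  memcpy (&x, &u, sizeof x);
  return x;
}

/* Model of exponent extraction: the biased 11-bit exponent field.
   The result is naturally unsigned.  */
unsigned long long model_extract_exp (double x)
{
  return (bits_of (x) >> 52) & 0x7ff;
}

/* Model of significand extraction: the 52 fraction bits, with the
   implicit leading bit made explicit for normal values (assumption).  */
unsigned long long model_extract_sig (double x)
{
  uint64_t u = bits_of (x);
  uint64_t frac = u & 0x000fffffffffffffULL;
  uint64_t exp = (u >> 52) & 0x7ff;
  if (exp != 0 && exp != 0x7ff)
    frac |= 1ULL << 52;
  return frac;
}

/* Model of the new (double, unsigned long long) insert form: keep the
   sign and fraction of the first argument, replace the exponent field.  */
double model_insert_exp (double significand, unsigned long long exponent)
{
  uint64_t u = bits_of (significand) & ~(0x7ffULL << 52);
  return double_of (u | ((exponent & 0x7ff) << 52));
}

/* Model of the negative test: a yes/no answer, hence bool (assumed to
   look at the sign bit).  */
bool model_test_neg (double x)
{
  return (bits_of (x) >> 63) != 0;
}
```

For example, model_extract_exp (1.0) yields the bias 1023, and
model_insert_exp (1.0, 1024) yields 2.0.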

This patch has been bootstrapped and tested on
powerpcle-unknown-linux (little-endian) and on
powerpc-unknown-linux (big-endian, with both -m32 and -m64 target
options) with no regressions.  Is this ok for the trunk?

gcc/testsuite/ChangeLog:

2017-01-19  Kelvin Nilsen  <kel...@gcc.gnu.org>

* gcc.target/powerpc/bfp/scalar-insert-exp-3.c: New test.
* gcc.target/powerpc/bfp/scalar-insert-exp-4.c: New test.
* gcc.target/powerpc/bfp/scalar-insert-exp-5.c: New test.
* gcc.target/powerpc/bfp/scalar-test-data-class-0.c: Adjust return
type of test function to reflect change in built-in function's
return type.
* gcc.target/powerpc/bfp/scalar-test-data-class-1.c: Likewise.
* gcc.target/powerpc/bfp/scalar-test-data-class-2.c: Likewise.
* gcc.target/powerpc/bfp/scalar-test-data-class-3.c: Likewise.
* gcc.target/powerpc/bfp/scalar-test-data-class-4.c: Adjust return
type and second argument type to reflect change in built-in
function's type signature.
* gcc.target/powerpc/bfp/scalar-test-data-class-5.c: Likewise.
* gcc.target/powerpc/bfp/scalar-test-data-class-6.c: Adjust return
type of test function to reflect change in built-in function's
return type.
* gcc.target/powerpc/bfp/scalar-test-data-class-7.c: Likewise.
* gcc.target/powerpc/bfp/scalar-test-neg-0.c: Likewise.
* gcc.target/powerpc/bfp/scalar-test-neg-1.c: Likewise.
* gcc.target/powerpc/bfp/scalar-test-neg-2.c: Likewise.
* gcc.target/powerpc/bfp/scalar-test-neg-3.c: Likewise.
* gcc.target/powerpc/bfp/vec-extract-exp-0.c: Likewise.
* gcc.target/powerpc/bfp/vec-extract-exp-1.c: Likewise.
* gcc.target/powerpc/bfp/vec-extract-exp-2.c: Likewise.
* gcc.target/powerpc/bfp/vec-extract-exp-3.c: Likewise.
* gcc.target/powerpc/bfp/vec-extract-sig-0.c: Likewise.
* gcc.target/powerpc/bfp/vec-extract-sig-1.c: Likewise.
* gcc.target/powerpc/bfp/vec-extract-sig-2.c: Likewise.
* gcc.target/powerpc/bfp/vec-extract-sig-3.c: Likewise.
* gcc.target/powerpc/bfp/vec-insert-exp-4.c: New test.
* gcc.target/powerpc/bfp/vec-insert-exp-5.c: New test.
* gcc.target/powerpc/bfp/vec-insert-exp-6.c: New test.
* gcc.target/powerpc/bfp/vec-insert-exp-7.c: New test.
* gcc.target/powerpc/bfp/vec-test-data-class-0.c: Adjust return
type of test function to reflect change in built-in function's
return type.
* gcc.target/powerpc/bfp/vec-test-data-class-1.c: Likewise.
* gcc.target/powerpc/bfp/vec-test-data-class-2.c: Likewise.
* gcc.target/powerpc/bfp/vec-test-data-class-3.c: Likewise.
* gcc.target/powerpc/bfp/vec-test-data-class-4.c: Likewise.
* gcc.target/powerpc/bfp/vec-test-data-class-5.c: Likewise.
* gcc.target/powerpc/bfp/vec-test-da

[PATCH v2,rs6000] PR78056: Finish fixing build failure on Power7

2016-12-16 Thread Kelvin Nilsen

This patch adds warning messages and test cases to an initial patch
already submitted and committed to the trunk on October 26, 2016.  The
earlier patch disables initialization of built-in functions which depend
on assembler capabilities that are not supported by the associated tool
chain.

The original patch was submitted before the work was considered
complete because it was desired to expedite a fix to allow builds on
Power7.  At the time the original patch was submitted for approval, the
following additional tasks were planned.

1. Fail with an assertion error instead of an internal compiler error
if built-in functions are ever defined for which the corresponding
instruction pattern is not supported by the current compiler
configuration.

2. Issue a warning message whenever a command-line -mcpu=XXX request
seeks to configure support for a CPU version which is not supported by
the accompanying assembler.

Besides addressing the above tasks, this new patch also adds a number
of tests to exercise different target configurations.

This second version of the patch differs from the first revision
(which had been sent for review on Dec. 9) in the following ways:

1. Removed #define directives from rs6000.c which were defining
HAVE_AS_POWER9, HAVE_AS_POWER8, HAVE_AS_POPCNTD, HAVE_AS_DFP, and
HAVE_AS_POPCNTB macros.  Rewrote the code that made use of these
macros to be conditioned on #ifdef.

2. Removed redundant parentheses in an expression that defines
the value of the default_cpu variable.

3. Replaced multiple occurrences of (d->icode > 0) with
(d->icode != CODE_FOR_nothing).

4. Replaced two comments which claimed that it is expected
that d->icode equals CODE_FOR_nothing to say instead that
d->icode may equal CODE_FOR_nothing.

5. Added a new effective-target named powerpc_popcntb_ok
and required this effective target in the pr78056-8.c test case.
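
The idea behind items 3 and 4 can be sketched in isolation.  The types
and names below are hypothetical stand-ins, not the rs6000.c
definitions: a table entry whose icode is CODE_FOR_nothing represents a
built-in that has been disabled, so the initialization loop skips it
explicitly instead of relying on an ordering comparison such as
(d->icode > 0).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of insn codes; in GCC these come from a generated
   enumeration, and the values here are for illustration only.  */
enum insn_code_model
{
  CODE_FOR_nothing = 0,
  CODE_FOR_popcntb,
  CODE_FOR_cmprb
};

/* Hypothetical model of one built-in description table entry.  */
struct builtin_desc_model
{
  enum insn_code_model icode;
  const char *name;
};

/* Count the entries that would actually be defined.  */
size_t count_defined (const struct builtin_desc_model *d, size_t n)
{
  size_t defined = 0;
  for (size_t i = 0; i < n; i++, d++)
    {
      /* d->icode may equal CODE_FOR_nothing: the built-in is disabled.  */
      if (d->icode == CODE_FOR_nothing)
        continue;
      /* Past this point an enabled entry must be well formed.  */
      assert (d->name != NULL);
      defined++;
    }
  return defined;
}
```

Comparing against the named constant states the intent directly and
does not depend on the numeric value that the constant happens to have.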

The patch has been tested with three different tool chains supporting
up to power7, power8, and power9 respectively.  It has been
bootstrapped and tested on powerpc64le-unknown-linux and
powerpc-unknown-linux (big-endian, with both -m32 and -m64 target
options) with no regressions.

Is this patch ok for trunk?

gcc/testsuite/ChangeLog:

2016-12-16  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/78056
* gcc.target/powerpc/pr78056-1.c: New test.
* gcc.target/powerpc/pr78056-2.c: New test.
* gcc.target/powerpc/pr78056-3.c: New test.
* gcc.target/powerpc/pr78056-4.c: New test.
* gcc.target/powerpc/pr78056-5.c: New test.
* gcc.target/powerpc/pr78056-6.c: New test.
* gcc.target/powerpc/pr78056-7.c: New test.
* gcc.target/powerpc/pr78056-8.c: New test.
* lib/target-supports.exp
(check_effective_target_powerpc_popcntb_ok): New procedure to test
whether the effective target supports the popcntb instruction.

gcc/ChangeLog:

2016-12-16  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/78056
* doc/sourcebuild.texi (PowerPC-specific attributes): Add
documentation of the powerpc_popcntb_ok attribute.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Add
code to issue warning messages if a requested CPU configuration is
not supported by the binary (assembler and loader) toolchain.
(spe_init_builtins): Add two assertions to prevent ICE if attempt is
made to define a built-in function that has been disabled.
(paired_init_builtins): Add assertion to prevent ICE if attempt is
made to define a built-in function that has been disabled.
(altivec_init_builtins): Add comment explaining why definition
of the DST built-in functions is not preceded by an assertion
check.  Add assertions to prevent ICE if attempts are made to
define an altivec predicate or an abs* built-in function that has
been disabled.
(htm_init_builtins): Add comment explaining why definition of the
htm built-in functions is not preceded by an assertion check.


Index: gcc/doc/sourcebuild.texi
===
--- gcc/doc/sourcebuild.texi  (revision 241606)
+++ gcc/doc/sourcebuild.texi  (working copy)
@@ -1763,6 +1763,10 @@ PowerPC target supports @code{-mhtm}
 @item powerpc_p8vector_ok
 PowerPC target supports @code{-mpower8-vector}
 
+@item powerpc_popcntb_ok
+PowerPC target supports the @code{popcntb} instruction, indicating
+that this target supports @code{-mcpu=power5}.
+
 @item powerpc_ppu_ok
 PowerPC target supports @code{-mcpu=cell}.
 
Index: gcc/testsuite/gcc.target/powerpc/pr78056-1.c
===
--- gcc/testsuite/gcc.target/powerpc/pr78056-1.c  (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr78056-1.c  (revision 241861)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { powerpc*-*-* } 

Re: [PATCH v3,rs6000] Add built-in function support for Power9 byte instructions

2016-12-13 Thread Kelvin Nilsen

Thanks for your quick feedback.

I'll update the comments regarding possible future enhancement to
support QImode for operands[1] as well.

Regarding the two test cases that are missing the scan-assembler
directive (byte-in-set-1.c and byte-in-set-2.c), those tests are both
expected to fail.  They are checking that the compiler rejects those
programs with appropriate error messages.

On 12/13/2016 03:14 PM, Segher Boessenkool wrote:
> Hi Kelvin,
> 
> On Mon, Dec 12, 2016 at 05:40:05PM -0700, Kelvin Nilsen wrote:
>> The patch has been bootstrapped and tested on
>> powerpc64le-unknown-linux and powerpc-unknown-linux (big-endian, with
>> both -m32 and -m64 target options) with no regressions.
>>
>> Is this ok for the trunk?
> 
> Yes it is, much better, thanks!  Two comments below, please fix the testcase
> one before commit if it is indeed a problem:
> 
>> +;; Though the instructions to which this expansion maps operate on
>> +;; 64-bit registers, the current implementation only operates on
>> +;; SI-mode operands as the high-order bits provide no information
>> +;; that is not already available in the low-order bits.  To avoid the
>> +;; costs of data widening operations, a future enhancement might add
>> +;; support for DI-mode operands.
> 
> And operands[1] could be QImode.
> 
>> +(define_expand "cmprb"
>> +  [(set (match_dup 3)
>> +(unspec:CC [(match_operand:SI 1 "gpc_reg_operand" "r")
>> +(match_operand:SI 2 "gpc_reg_operand" "r")]
>> + UNSPEC_CMPRB))
> 
> 
>> --- gcc/testsuite/gcc.target/powerpc/byte-in-set-1.c (revision 0)
>> +++ gcc/testsuite/gcc.target/powerpc/byte-in-set-1.c (working copy)
> 
> Did you forget the scan-assembler here and in the next one, or do you only
> want to test it does indeed compile?
> 
> 
> Segher
> 
> 

-- 
Kelvin Nilsen, Ph.D.  kdnil...@linux.vnet.ibm.com
home office: 801-756-4821, cell: 520-991-6727
IBM Linux Technology Center - PPC Toolchain



[PATCH v3,rs6000] Add built-in function support for Power9 byte instructions

2016-12-12 Thread Kelvin Nilsen


This patch adds built-in function support for the new setb, cmprb, and
cmpeqb Power9 instructions.  This third version of the patch differs
from the second in the following ways:

1. Changed the names of the new *cmprb, *setb, *cmprb2, and *cmpeqb
instruction patterns to *cmprb_internal, *setb_internal,
*cmprb2_internal, and *cmpeqb_internal respectively.

2. Added comments to the cmprb, setb, and cmprb2 instructions to
acknowledge that, as implemented, we do not currently support the use
of double-integer operands though support for this might be added in
the future.

3. Changed the names of the new non-overloaded builtin functions to 
be of the form __builtin_scalar_ instead of
__builtin_altivec_.  Changed the names of the new overloaded
functions to be of the form __builtin_ instead of
__builtin_scalar_.

4. Corrected the comments describing range encodings and simplified the
descriptions by speaking of individual bytes instead of bit numbers
(cmprb, cmprb2, cmpeqb define_expand patterns and *cmprb_internal,
*cmprb2_internal, *cmpeqb_internal define_insn patterns).

5. Updated documentation to use the new function names and to speak of
range encodings in terms of individual bytes instead of bit numbers.

6. Changed the test cases to use the new function names.

7. Corrected bit shifting of arguments in the byte-in-range-0.c and
byte-in-range-1.c test cases.
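
For readers who want the byte-oriented range encoding in executable
form, here is a portable model of the byte-in-range test.  It is a
sketch, not the built-in: model_byte_in_range is a made-up name, and
the layout assumed is the one given in the cmprb comments of this patch
(operand 1 is xx:xx:xx:vv, operand 2 is xx:xx:hi:lo, and the result is
1 when lo <= vv and vv <= hi).

```c
#include <stdint.h>

/* Portable model of the byte-in-range test.
   byte  is organized as xx:xx:xx:vv (only the low byte is examined).
   range is organized as xx:xx:hi:lo.
   Returns 1 if lo <= vv && vv <= hi, otherwise 0.  */
int model_byte_in_range (uint32_t byte, uint32_t range)
{
  uint8_t vv = byte & 0xff;
  uint8_t lo = range & 0xff;
  uint8_t hi = (range >> 8) & 0xff;
  return lo <= vv && vv <= hi;
}
```

With range 0x3930 (hi = 0x39, lo = 0x30) the model behaves like an
ASCII isdigit: 0x35 is accepted, while 0x2f and 0x3a are rejected.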


The patch has been bootstrapped and tested on
powerpc64le-unknown-linux and powerpc-unknown-linux (big-endian, with
both -m32 and -m64 target options) with no regressions.

Is this ok for the trunk?

gcc/testsuite/ChangeLog:

2016-12-12  Kelvin Nilsen  <kel...@gcc.gnu.org>

* gcc.target/powerpc/byte-in-either-range-0.c: New test.
* gcc.target/powerpc/byte-in-either-range-1.c: New test.
* gcc.target/powerpc/byte-in-range-0.c: New test.
* gcc.target/powerpc/byte-in-range-1.c: New test.
* gcc.target/powerpc/byte-in-set-0.c: New test.
* gcc.target/powerpc/byte-in-set-1.c: New test.
* gcc.target/powerpc/byte-in-set-2.c: New test.


gcc/ChangeLog:

2016-12-12  Kelvin Nilsen  <kel...@gcc.gnu.org>

* config/rs6000/altivec.md (UNSPEC_CMPRB): New unspec value.
(UNSPEC_CMPRB2): New unspec value.
(UNSPEC_CMPEQB): New unspec value.
(cmprb): New expansion.
(*cmprb_internal): New insn.
(*setb_internal): New insn.
(cmprb2): New expansion.
(*cmprb2_internal): New insn.
(cmpeqb): New expansion.
(*cmpeqb_internal): New insn.
* config/rs6000/rs6000-builtin.def (BU_P9_2): New macro.
(BU_P9_64BIT_2): Likewise.
(BU_P9_OVERLOAD_2): Likewise.
(CMPRB): Add byte-in-range built-in function.
(CMPRB2): Add byte-in-either-range built-in function.
(CMPEQB): Add byte-in-set built-in function.
(CMPRB): Add overload support for byte-in-range function.
(CMPRB2): Add overload support for byte-in-either-range function.
(CMPEQB): Add overload support for byte-in-set built-in function.
* config/rs6000/rs6000-c.c (P9_BUILTIN_CMPRB): Macro expansion to
define argument types for new builtin. 
(P9_BUILTIN_CMPRB2): Likewise.
(P9_BUILTIN_CMPEQB): Likewise.
* doc/extend.texi (PowerPC AltiVec Built-in Functions): Rearrange
the order of presentation for certain built-in functions
(scalar_extract_exp, scalar_extract_sig, scalar_insert_exp)
(scalar_cmp_exp_gt, scalar_cmp_exp_lt, scalar_cmp_exp_eq)
(scalar_cmp_exp_unordered, scalar_test_data_class)
(scalar_test_neg) to improve locality and flow.  Document
the new __builtin_scalar_byte_in_set,
__builtin_scalar_byte_in_range, and
__builtin_scalar_byte_in_either_range functions.

Index: gcc/config/rs6000/altivec.md
===
--- gcc/config/rs6000/altivec.md  (revision 241245)
+++ gcc/config/rs6000/altivec.md  (working copy)
@@ -153,6 +153,9 @@
UNSPEC_BCDADD
UNSPEC_BCDSUB
UNSPEC_BCD_OVERFLOW
+   UNSPEC_CMPRB
+   UNSPEC_CMPRB2
+   UNSPEC_CMPEQB
 ])
 
 (define_c_enum "unspecv"
@@ -3709,6 +3712,189 @@
   "darn %0,1"
   [(set_attr "type" "integer")])
 
+;; Test byte within range.
+;;
+;; The bytes of operand 1 are organized as xx:xx:xx:vv, where xx
+;; represents a byte whose value is ignored in this context and
+;; vv, the least significant byte, holds the byte value that is to
+;; be tested for membership within the range specified by operand 2.
+;; The bytes of operand 2 are organized as xx:xx:hi:lo.
+;;
+;; Return in target register operand 0 a value of 1 if lo <= vv and
+;; vv <= hi.  Otherwise, set register operand 0 to 0.
+;;
+;; Though the instructions to which this expansion maps operate on
+;; 64-bit registers, the current implementation only operates on
+;; SI-mode operands as the high-order bits provide no informati

[PATCH] PR78056: Finish fixing build failure on Power7

2016-12-09 Thread Kelvin Nilsen
This patch adds warning messages and test cases to an initial patch
already submitted and committed to the trunk on October 26, 2016.  The
earlier patch disables initialization of built-in functions which depend
on assembler capabilities that are not supported by the associated tool
chain.

The original patch was submitted before the work was considered
complete because it was desired to expedite a fix to allow builds on
Power7.  At the time the original patch was submitted for approval, the
following additional tasks were planned.

1. Fail with an assertion error instead of an internal compiler error
if built-in functions are ever defined for which the corresponding
instruction pattern is not supported by the current compiler
configuration.

2. Issue a warning message whenever a command-line -mcpu=XXX request
seeks to configure support for a CPU version which is not supported by
the accompanying assembler.
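
The check described in task 2 can be modeled in a few lines.  This is a
simplified sketch under stated assumptions, not the code in rs6000.c:
the enum, the struct, and the function name are hypothetical, and the
assembler capabilities are passed in as data here, whereas the patch
tests configure-time HAVE_AS_* macros.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the CPUs that a -mcpu= request may name.  */
enum cpu_model { CPU_MODEL_POWER7, CPU_MODEL_POWER8, CPU_MODEL_POWER9 };

/* Hypothetical model of what the accompanying assembler supports.  */
struct assembler_caps
{
  bool power7;   /* stands in for HAVE_AS_POPCNTD */
  bool power8;   /* stands in for HAVE_AS_POWER8 */
  bool power9;   /* stands in for HAVE_AS_POWER9 */
};

static const char *const cpu_names[] = { "power7", "power8", "power9" };

/* Return true if the requested CPU can be honored.  Otherwise print a
   warning and return false so that the caller can fall back to its
   default CPU selection.  */
bool cpu_request_honored (enum cpu_model cpu, const struct assembler_caps *as)
{
  bool supported = (cpu == CPU_MODEL_POWER9 ? as->power9
                    : cpu == CPU_MODEL_POWER8 ? as->power8
                    : as->power7);
  if (!supported)
    fprintf (stderr,
             "warning: will not generate %s instructions because "
             "assembler lacks %s support\n",
             cpu_names[cpu], cpu_names[cpu]);
  return supported;
}
```

With a tool chain that supports up to power8, a power9 request is
rejected with a warning while power7 and power8 requests are accepted.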

Besides addressing the above tasks, this new patch also adds a number
of tests to exercise different target configurations.

The patch has been tested with three different tool chains supporting
up to power7, power8, and power9 respectively.  It has been
bootstrapped and tested on powerpc64le-unknown-linux and
powerpc-unknown-linux (big-endian, with both -m32 and -m64 target
options) with no regressions.

Is this patch ok for trunk?

gcc/ChangeLog:

2016-12-08  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/78056
* config/rs6000/rs6000.c: Provide default macro definitions for
HAVE_AS_POPCNTB, HAVE_AS_DFP, HAVE_AS_POPCNTD, HAVE_AS_POWER8,
HAVE_AS_POWER9.
(rs6000_option_override_internal): Add code to issue warning
messages if a requested CPU configuration is not supported by the
binary (assembler and loader) toolchain.
(spe_init_builtins): Add two assertions to prevent ICE if attempt is
made to define a built-in function that has been disabled.
(paired_init_builtins): Add assertion to prevent ICE if attempt is
made to define a built-in function that has been disabled.
(altivec_init_builtins): Add comment explaining why definition
of the DST built-in functions is not preceded by an assertion
check.  Add assertions to prevent ICE if attempts are made to
define an altivec predicate or an abs* built-in function that has
been disabled.
(htm_init_builtins): Add comment explaining why definition of the
htm built-in functions is not preceded by an assertion check.


gcc/testsuite/ChangeLog:

2016-12-08  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/78056
* gcc.target/powerpc/pr78056-1.c: New test.
* gcc.target/powerpc/pr78056-2.c: New test.
* gcc.target/powerpc/pr78056-3.c: New test.
* gcc.target/powerpc/pr78056-4.c: New test.
* gcc.target/powerpc/pr78056-5.c: New test.
* gcc.target/powerpc/pr78056-6.c: New test.
* gcc.target/powerpc/pr78056-7.c: New test.
* gcc.target/powerpc/pr78056-8.c: New test.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 241606)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -84,6 +84,22 @@
 #define min(A,B)   ((A) < (B) ? (A) : (B))
 #define max(A,B)   ((A) > (B) ? (A) : (B))
 
+#ifndef HAVE_AS_POWER9
+#define HAVE_AS_POWER9 0
+#endif
+#ifndef HAVE_AS_POWER8
+#define HAVE_AS_POWER8 0
+#endif
+#ifndef HAVE_AS_POPCNTD
+#define HAVE_AS_POPCNTD 0
+#endif
+#ifndef HAVE_AS_DFP
+#define HAVE_AS_DFP 0
+#endif
+#ifndef HAVE_AS_POPCNTB
+#define HAVE_AS_POPCNTB 0
+#endif
+
 /* Structure used to define the rs6000 stack */
 typedef struct rs6000_stack {
   int reload_completed;        /* stack info won't change from here on */
@@ -3860,6 +3876,62 @@ rs6000_option_override_internal (bool global_init_
 
   gcc_assert (cpu_index >= 0);
 
+  if (have_cpu)
+{
+  if (!HAVE_AS_POWER9
+ && (processor_target_table[rs6000_cpu_index].processor
+ == PROCESSOR_POWER9))
+   {
+ have_cpu = false;
+ warning (0, "will not generate power9 instructions because "
+  "assembler lacks power9 support");
+   }
+  if (!HAVE_AS_POWER8
+ && (processor_target_table[rs6000_cpu_index].processor
+ == PROCESSOR_POWER8))
+   {
+ have_cpu = false;
+ warning (0, "will not generate power8 instructions because "
+  "assembler lacks power8 support");
+   }
+  if (!HAVE_AS_POPCNTD
+ && (processor_target_table[rs6000_cpu_index].processor
+ == PROCESSOR_POWER7))
+   {
+ have_cpu = false;
+ warning (0, "will not generate power7 instructions because "
+  "assembler lacks power7 support"

[PATCH v2,rs6000] Add built-in function support for Power9 byte instructions.

2016-12-05 Thread Kelvin Nilsen


This patch adds built-in function support for the new setb, cmprb, and
cmpeqb Power9 instructions.  This second version of the patch differs
from the first in the following ways:

1. Removed the UNSPEC_SETB new unspec value.  Rewrote these patterns to
describe semantics in terms of primitive RTL.

2. Changed the names of the cmprb_p, cmprb2_p, and cmpeqb_p define_insn
patterns to cmprb, cmprb2, and cmpeqb respectively.

3. Fixed two typos in the ChangeLog file.

4. Fixed comments that describe the cmprb and cmprb2 define_expand
patterns.

5. Fixed comments that describe the *cmprb, *setb, and *cmprb2
define_insn patterns.

6. Removed trailing space in description of the cmpeqb define_expand
pattern.

The patch has been bootstrapped and tested on
powerpc64le-unknown-linux and powerpc-unknown-linux (big-endian, with
both -m32 and -m64 target options) with no regressions.

Is this ok for the trunk?

gcc/testsuite/ChangeLog:

2016-12-05  Kelvin Nilsen  <kel...@gcc.gnu.org>

* gcc.target/powerpc/byte-in-either-range-0.c: New test.
* gcc.target/powerpc/byte-in-either-range-1.c: New test.
* gcc.target/powerpc/byte-in-range-0.c: New test.
* gcc.target/powerpc/byte-in-range-1.c: New test.
* gcc.target/powerpc/byte-in-set-0.c: New test.
* gcc.target/powerpc/byte-in-set-1.c: New test.
* gcc.target/powerpc/byte-in-set-2.c: New test.


gcc/ChangeLog:

2016-12-05  Kelvin Nilsen  <kel...@gcc.gnu.org>

* config/rs6000/altivec.md (UNSPEC_CMPRB): New unspec value.
(UNSPEC_CMPRB2): New unspec value.
(UNSPEC_CMPEQB): New unspec value.
(cmprb): New expansion.
(*cmprb): New insn.
(*setb): New insn.
(cmprb2): New expansion.
(*cmprb2): New insn.
(cmpeqb): New expansion.
(*cmpeqb): New insn.
* config/rs6000/rs6000-builtin.def (BU_P9V_64BIT_AV_2): New macro.
(BU_P9_OVERLOAD_2): Likewise.
(CMPRB): Add byte-in-range built-in function.
(CMPRB2): Add byte-in-either-range built-in function.
(CMPEQB): Add byte-in-set built-in function.
(CMPRB): Add overload support for byte-in-range function.
(CMPRB2): Add overload support for byte-in-either-range function.
(CMPEQB): Add overload support for byte-in-set built-in function.
* config/rs6000/rs6000-c.c (P9V_BUILTIN_SCALAR_CMPRB): Macro
expansion to define argument types for new builtin.
(P9V_BUILTIN_SCALAR_CMPRB2): Likewise.
(P9V_BUILTIN_SCALAR_CMPEQB): Likewise.
* doc/extend.texi (PowerPC AltiVec Built-in Functions): Rearrange
the order of presentation for certain built-in functions
(scalar_extract_exp, scalar_extract_sig, scalar_insert_exp)
(scalar_cmp_exp_gt, scalar_cmp_exp_lt, scalar_cmp_exp_eq)
(scalar_cmp_exp_unordered, scalar_test_data_class)
(scalar_test_neg) to improve locality and flow.  Document
the new __builtin_scalar_byte_in_set,
__builtin_scalar_byte_in_range, and
__builtin_scalar_byte_in_either_range functions.


Index: gcc/config/rs6000/altivec.md
===
--- gcc/config/rs6000/altivec.md  (revision 241245)
+++ gcc/config/rs6000/altivec.md  (working copy)
@@ -153,6 +153,9 @@
UNSPEC_BCDADD
UNSPEC_BCDSUB
UNSPEC_BCD_OVERFLOW
+   UNSPEC_CMPRB
+   UNSPEC_CMPRB2
+   UNSPEC_CMPEQB
 ])
 
 (define_c_enum "unspecv"
@@ -3709,6 +3712,138 @@
   "darn %0,1"
   [(set_attr "type" "integer")])
 
+;; Predicate: test byte within range.
+;; Return in target register operand 0 a value of 1 if the byte
+;; held in bits 24:31 of operand 1 is within the inclusive range
+;; bounded above by operand 2's bits 0:7 and below by operand 2's
+;; bits 8:15.  Otherwise, set register operand 0 to 0.
+(define_expand "cmprb"
+  [(set (match_dup 3)
+   (unspec:CC [(match_operand:SI 1 "gpc_reg_operand" "r")
+   (match_operand:SI 2 "gpc_reg_operand" "r")]
+UNSPEC_CMPRB))
+   (set (match_operand:SI 0 "gpc_reg_operand" "=r")
+   (if_then_else:SI (lt (match_dup 3)
+(const_int 0))
+(const_int -1)
+(if_then_else (gt (match_dup 3)
+  (const_int 0))
+  (const_int 1)
+  (const_int 0))))]
+  "TARGET_P9_MISC"
+{
+  operands[3] = gen_reg_rtx (CCmode);
+})
+
+;; Set bit 1 (the GT bit, 0x4) of CR register operand 0 to 1 iff the
+;; byte found in bits 24:31 of register operand 1 is within the
+;; inclusive range bounded above by operand 2's bits 0:7 and below by
+;; operand 2's bits 8:15.  The other 3 bits of the target CR register
+;; are set to 0.
+(define_insn "*cmprb"
+  [(set (ma

[PATCH,rs6000] Correct mode of operand 2 in vector extract half-word and word instruction patterns

2016-11-30 Thread Kelvin Nilsen

This patch corrects an error in a patch committed on 2016-10-18 to add
built-in function support for Power9 string operations.  In that
original patch, the mode for operand 2 of the newly added vector
extract half-word and full-word instruction patterns was described as
V16QI, even though those instruction patterns were conceptually
operating on V8HI and V4SI operands respectively.

This patch changes the modes of the operands for these instruction
patterns to better represent the intended types.  This patch improves
readability and maintainability of code.  It does not affect
correctness of generated code, since the existing implementation
implicitly coerces the operand types to the declared type.

The patch has been bootstrapped and tested on powerpc64le-unknown-linux
without regressions.

Is this ok for the trunk?

gcc/ChangeLog:

2016-11-30  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/78577
* config/rs6000/vsx.md (vextuhlx): Revise mode of operand 2.
(vextuhrx): Likewise.
(vextuwlx): Likewise.
(vextuwrx): Likewise.

Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md  (revision 242948)
+++ gcc/config/rs6000/vsx.md  (working copy)
@@ -3648,7 +3648,7 @@
   [(set (match_operand:SI 0 "register_operand" "=r")
(unspec:SI
 [(match_operand:SI 1 "register_operand" "r")
- (match_operand:V16QI 2 "altivec_register_operand" "v")]
+ (match_operand:V8HI 2 "altivec_register_operand" "v")]
 UNSPEC_VEXTUHLX))]
   "TARGET_P9_VECTOR"
   "vextuhlx %0,%1,%2"
@@ -3659,7 +3659,7 @@
   [(set (match_operand:SI 0 "register_operand" "=r")
(unspec:SI
 [(match_operand:SI 1 "register_operand" "r")
- (match_operand:V16QI 2 "altivec_register_operand" "v")]
+ (match_operand:V8HI 2 "altivec_register_operand" "v")]
 UNSPEC_VEXTUHRX))]
   "TARGET_P9_VECTOR"
   "vextuhrx %0,%1,%2"
@@ -3670,7 +3670,7 @@
   [(set (match_operand:SI 0 "register_operand" "=r")
(unspec:SI
 [(match_operand:SI 1 "register_operand" "r")
- (match_operand:V16QI 2 "altivec_register_operand" "v")]
+ (match_operand:V4SI 2 "altivec_register_operand" "v")]
 UNSPEC_VEXTUWLX))]
   "TARGET_P9_VECTOR"
   "vextuwlx %0,%1,%2"
@@ -3681,7 +3681,7 @@
   [(set (match_operand:SI 0 "register_operand" "=r")
(unspec:SI
 [(match_operand:SI 1 "register_operand" "r")
- (match_operand:V16QI 2 "altivec_register_operand" "v")]
+ (match_operand:V4SI 2 "altivec_register_operand" "v")]
 UNSPEC_VEXTUWRX))]
   "TARGET_P9_VECTOR"
   "vextuwrx %0,%1,%2"





Re: [PATCH,rs6000] Add built-in function support for Power9 byte instructions

2016-11-15 Thread Kelvin Nilsen

> 
>> Thanks for catching this.  I think I got endian confusion inside my head
>> while I was writing the above.  I will rewrite these comments, below also.
> 
> Note the ISA calls the bits in 32-bit registers 32..63, so that 63 is
> the rightmost bit in all registers.
> 

True, but the ISA only uses the lower half of the 64-bit register, so I
have described my patterns using SI mode instead of DI mode, which is
part of the reason I was numbering my bits differently than the ISA
document.

The reason I am using SI mode is so that I don't have to disqualify the
use of these functions on a 32-bit big-endian configuration.

Do you want me to switch to DI mode for all the operands?

>>> I wonder if we really need all these predicate expanders, if it wouldn't
>>> be easier if the builtin handling code did the setb itself?
>>>
>>
>> The reason it seems most "natural" to me to use the expanders is because I
>> need to introduce a temporary CR scratch register between expansion and
>> insn matching.  Also, it seems that the *setb pattern may be of more
>> general use in the future implementation of other built-in functions.
>> I'm inclined to keep this as is, but if you still feel otherwise, I'll
>> figure out how to avoid the expansion.
> 
> The code (in rs6000.c) expanding the builtin can create two insns directly,
> so that you do not need to repeat this over and over in define_expands?
> 

The pattern I'm familiar with is to allocate the temporary scratch
register during expansion, and to use the allocated temporary at insn
match time.  I'll have to teach myself a new pattern to do all of this
at insn match time.  Feel free to point me to an example of define_insn
code that does this.

Thanks again.





Re: [PATCH,rs6000] Add built-in function support for Power9 byte instructions

2016-11-15 Thread Kelvin Nilsen

Thank you very much for the prompt and thorough review.  There are a few
points below where I'd like to seek further clarification.

On 11/15/2016 04:19 AM, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Nov 14, 2016 at 04:43:35PM -0700, Kelvin Nilsen wrote:
>>  * config/rs6000/altivec.md (UNSPEC_CMPRB): New unspec value.
>>  (UNSPEC_CMPRB2): New unspec value.
> 
> I wonder if you really need both?  The number of arguments will tell
> which is which, anyway?

I appreciate your preference to avoid proliferation of special-case
unspec constants.  However, it is not so straightforward to combine
these two cases under the same constant value.  The issue is that though
the two encodings conceptually represent different "numbers of
arguments", the arguments are all packed inside of a 32-bit register.
At the RTL level, it looks like the two different forms have the same
number of arguments (the same number of register operands).  The
difference is which bits serve relevant purposes within the incoming
register operands.

So I'm inclined to keep this as is if that's ok with you.

> 
>>  (cmprb_p): New expansion.
> 
> Not such a great name (now you get a gen_cmprb_p function which isn't
> a predicate itself).

I'll change these names.

> 
>>  (CMPRB): Add byte-in-range built-in function.
>>  (CMBRB2): Add byte-in-either_range built-in function.
>>  (CMPEQB): Add byte-in-set builtin-in function.
> 
> "builtin-in", and you typoed an underscore?

Thanks.


> 
>> +;; Predicate: test byte within range.
>> +;; Return in target register operand 0 a non-zero value iff the byte
>> +;; held in bits 24:31 of operand 1 is within the inclusive range
>> +;; bounded below by operand 2's bits 0:7 and above by operand 2's
>> +;; bits 8:15.
>> +(define_expand "cmprb_p"
> 
> It seems you got the bit numbers mixed up.  Maybe just call it the low
> byte, and the byte just above?
> 
> (And it always sets 0 or 1 here, you might want to make that more explicit).
> 
>> +;; Set bit 1 (the GT bit, 0x2) of CR register operand 0 to 1 iff the
> 
> That's 4, i.e. 0b0100.
> 
>> +;; Set operand 0 register to non-zero value iff the CR register named
>> +;; by operand 1 has its GT bit (0x2) or its LT bit (0x1) set.
>> +(define_insn "*setb"
> 
> LT is 8, GT is 4.  If LT is set it returns -1, otherwise if GT is set it
> returns 1, otherwise it returns 0.
> 

Thanks for catching this.  I think I got endian confusion inside my head
while I was writing the above.  I will rewrite these comments, below also.
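
For reference, the corrected bit values and the setb result can be
written down as a small portable model.  This is only a sketch of the
semantics stated in the review comment above (LT is 8, GT is 4; the
result is -1 if LT is set, otherwise 1 if GT is set, otherwise 0); the
enum and function names are invented for this example.

```c
/* A 4-bit condition register field, most significant bit first:
   LT = 8, GT = 4, EQ = 2, SO = 1.  */
enum { CR_LT = 8, CR_GT = 4, CR_EQ = 2, CR_SO = 1 };

/* Model of setb applied to one CR field:
   -1 if LT is set, otherwise 1 if GT is set, otherwise 0.  */
int model_setb (unsigned cr_field)
{
  if (cr_field & CR_LT)
    return -1;
  if (cr_field & CR_GT)
    return 1;
  return 0;
}
```

Since cmprb sets only the GT bit on a match, applying this model to its
result gives the 0 or 1 value that the expanders return.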

>> +;; Predicate: test byte within two ranges.
>> +;; Return in target register operand 0 a non-zero value iff the byte
>> +;; held in bits 24:31 of operand 1 is within the inclusive range
>> +;; bounded below by operand 2's bits 0:7 and above by operand 2's
>> +;; bits 8:15 or if the byte is within the inclusive range bounded
>> +;; below by operand 2's bits 16:23 and above by operand 2's bits 24:31.
>> +(define_expand "cmprb2_p"
> 
> The high bound is higher in the reg than the low bound.  See the example
> where 0x3930 is used to do isdigit (and yes 0x3039 would be much more
> fun, but alas).
> 
>> +;; Predicate: test byte membership within set of 8 bytes.
>> +;; Return in target register operand 0 a non-zero value iff the byte
>> +;; held in bits 24:31 of operand 1 equals at least one of the eight
>> +;; byte values represented by the 64-bit register supplied as operand
>> +;; 2.  Note that the 8 byte values held within operand 2 need not be
>> +;; unique. 
> 
> (trailing space)
> 
> I wonder if we really need all these predicate expanders, if it wouldn't
> be easier if the builtin handling code did the setb itself?
> 

The reason it seems most "natural" to me to use the expanders is that I
need to introduce a temporary CR scratch register between expansion and
insn matching.  Also, it seems that the *setb pattern may be of more
general use in the future implementation of other built-in functions.
I'm inclined to keep this as is, but if you still feel otherwise, I'll
figure out how to avoid the expansion.
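
To make the operand layout from the review concrete (low bound in the low
byte, high bound in the byte just above, as in the 0x3930 isdigit example),
here is a rough C model of the three predicates.  It is a sketch of my
understanding of the semantics, not the instruction definition; the function
names cmprb_model, cmprb2_model and cmpeqb_model are hypothetical.

```c
#include <stdint.h>

/* Illustrative helper: 1 if lo <= b <= hi, else 0.  */
static int
byte_in_range (uint8_t b, uint8_t lo, uint8_t hi)
{
  return lo <= b && b <= hi;
}

/* Single range: low bound in the low byte of rb, high bound in the
   byte just above it.  */
static int
cmprb_model (uint32_t ra, uint32_t rb)
{
  uint8_t b = ra & 0xff;
  return byte_in_range (b, rb & 0xff, (rb >> 8) & 0xff);
}

/* Two ranges: one per halfword of rb, each laid out as above.  */
static int
cmprb2_model (uint32_t ra, uint32_t rb)
{
  uint8_t b = ra & 0xff;
  return byte_in_range (b, rb & 0xff, (rb >> 8) & 0xff)
	 || byte_in_range (b, (rb >> 16) & 0xff, (rb >> 24) & 0xff);
}

/* Set membership: does the low byte of ra equal any of the eight
   bytes of rb?  Duplicates in rb are harmless.  */
static int
cmpeqb_model (uint32_t ra, uint64_t rb)
{
  uint8_t b = ra & 0xff;
  for (int i = 0; i < 8; i++)
    if (((rb >> (8 * i)) & 0xff) == b)
      return 1;
  return 0;
}
```

With this model, cmprb_model (c, 0x3930) behaves like isdigit for ASCII, and
cmprb2_model (c, 0x7a615a41) accepts both 'a'..'z' and 'A'..'Z'.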



[PATCH,rs6000] Add built-in function support for Power9 byte instructions

2016-11-14 Thread Kelvin Nilsen


This patch adds built-in function support for the new setb, cmprb, and
cmpeqb Power9 instructions.

The patch has been bootstrapped and tested on
powerpc64le-unknown-linux and powerpc-unknown-linux (big-endian, with
both -m32 and -m64 target options) with no regressions.

Is this ok for the trunk?

gcc/testsuite/ChangeLog:

2016-11-14  Kelvin Nilsen  <kel...@gcc.gnu.org>

* gcc.target/powerpc/byte-in-either-range-0.c: New test.
* gcc.target/powerpc/byte-in-either-range-1.c: New test.
* gcc.target/powerpc/byte-in-range-0.c: New test.
* gcc.target/powerpc/byte-in-range-1.c: New test.
* gcc.target/powerpc/byte-in-set-0.c: New test.
* gcc.target/powerpc/byte-in-set-1.c: New test.
* gcc.target/powerpc/byte-in-set-2.c: New test.


gcc/ChangeLog:

2016-11-14  Kelvin Nilsen  <kel...@gcc.gnu.org>

* config/rs6000/altivec.md (UNSPEC_CMPRB): New unspec value.
(UNSPEC_CMPRB2): New unspec value.
(UNSPEC_CMPEQB): New unspec value.
(UNSPEC_SETB): New unspec value.
(cmprb_p): New expansion.
(*cmprb): New insn.
(*setb): New insn.
(cmprb2_p): New expansion.
(*cmprb2): New insn.
(cmpeqb_p): New expansion.
(*cmpeqb): New insn.
* config/rs6000/rs6000-builtin.def (BU_P9V_64BIT_AV_2): New macro.
(BU_P9_OVERLOAD_2): Likewise.
(CMPRB): Add byte-in-range built-in function.
(CMPRB2): Add byte-in-either-range built-in function.
(CMPEQB): Add byte-in-set built-in function.
(CMPRB): Add overload support for byte-in-range function.
(CMPRB2): Add overload support for byte-in-either-range function.
(CMPEQB): Add overload support for byte-in-set built-in function.
* config/rs6000/rs6000-c.c (P9V_BUILTIN_SCALAR_CMPRB): Macro
expansion to define argument types for new builtin.
(P9V_BUILTIN_SCALAR_CMPRB2): Macro expansion to define argument
types for new builtin.
(P9V_BUILTIN_SCALAR_CMPEQB): Macro expansion to define argument
types for new builtin.
* doc/extend.texi (PowerPC AltiVec Built-in Functions): Rearrange
the order of presentation for certain built-in functions
(scalar_extract_exp, scalar_extract_sig, scalar_insert_exp)
(scalar_cmp_exp_gt, scalar_cmp_exp_lt, scalar_cmp_exp_eq)
(scalar_cmp_exp_unordered, scalar_test_data_class)
(scalar_test_neg) to improve locality and flow.  Document
the new __builtin_scalar_byte_in_set,
__builtin_scalar_byte_in_range, and
__builtin_scalar_byte_in_either_range functions.

Index: gcc/config/rs6000/altivec.md
===
--- gcc/config/rs6000/altivec.md	(revision 241245)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -153,6 +153,10 @@
UNSPEC_BCDADD
UNSPEC_BCDSUB
UNSPEC_BCD_OVERFLOW
+   UNSPEC_CMPRB
+   UNSPEC_CMPRB2
+   UNSPEC_CMPEQB
+   UNSPEC_SETB
 ])
 
 (define_c_enum "unspecv"
@@ -3709,6 +3713,116 @@
   "darn %0,1"
   [(set_attr "type" "integer")])
 
+;; Predicate: test byte within range.
+;; Return in target register operand 0 a non-zero value iff the byte
+;; held in bits 24:31 of operand 1 is within the inclusive range
+;; bounded below by operand 2's bits 0:7 and above by operand 2's
+;; bits 8:15.
+(define_expand "cmprb_p"
+  [(set (match_dup 3)
+   (unspec:CC [(match_operand:SI 1 "gpc_reg_operand" "r")
+   (match_operand:SI 2 "gpc_reg_operand" "r")]
+UNSPEC_CMPRB))
+   (set (match_operand:SI 0 "gpc_reg_operand" "=r")
+(unspec:SI [(match_dup 3)]
+ UNSPEC_SETB))
+  ]
+  "TARGET_P9_MISC"
+{
+  operands[3] = gen_reg_rtx (CCmode);
+})
+
+;; Set bit 1 (the GT bit, 0x2) of CR register operand 0 to 1 iff the
+;; byte found in bits 24:31 of register operand 1 is within the
+;; inclusive range bounded below by operand 2's bits 0:7 and above by
+;; operand 2's bits 8:15.  The other 3 bits of the target CR register
+;; are set to 0.
+(define_insn "*cmprb"
+  [(set (match_operand:CC 0 "cc_reg_operand" "=y")
+   (unspec:CC [(match_operand:SI 1 "gpc_reg_operand" "r")
+   (match_operand:SI 2 "gpc_reg_operand" "r")]
+UNSPEC_CMPRB))]
+  "TARGET_P9_MISC"
+  "cmprb %0,0,%1,%2"
+  [(set_attr "type" "logical")])
+
+;; Set operand 0 register to non-zero value iff the CR register named
+;; by operand 1 has its GT bit (0x2) or its LT bit (0x1) set.
+(define_insn "*setb"
+   [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
+(unspec:SI [(match_operand:CC 1 "cc_reg_operand" "y")]
+ UNSPEC_SETB))]
+  "TARGET_P9_MISC"
+  "s

[PATCH] PR78056: Fix build failure on Power7

2016-10-25 Thread Kelvin Nilsen

This patch corrects an error introduced with commit 241314.  That patch
introduced several new built-in functions to support Power9 string
instructions.  The error that was found subsequent to the trunk commit
is that initialization of the built-in function tables encounters an
internal compiler error if the assembler that is used with gcc lacks
support for Power9 instructions.

This patch disables initialization of built-in functions which depend
on assembler capabilities that are not supported by the associated tool
chain.
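
The check the patch adds to each initialization loop is a subset test on
feature masks: a built-in is skipped unless every bit it requires is present
in the enabled mask.  A standalone sketch of that test follows.  The FEAT_*
bits, struct builtin_desc, demo_table and both functions are invented for
illustration and are not the rs6000 definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits; the real masks are defined by the
   rs6000 back end.  */
#define FEAT_ALTIVEC 0x1u
#define FEAT_P9      0x2u

/* Illustrative stand-in for a built-in table entry.  */
struct builtin_desc
{
  uint64_t mask;      /* Features this built-in requires.  */
  const char *name;
};

static const struct builtin_desc demo_table[] = {
  { FEAT_ALTIVEC,           "demo_altivec_only" },
  { FEAT_ALTIVEC | FEAT_P9, "demo_needs_p9" },
  { 0,                      "demo_always" },
};

/* 1 when every required bit is enabled; this mirrors the patch's
   "(mask & builtin_mask) != mask" test with the sense inverted.  */
static int
builtin_is_enabled (uint64_t required, uint64_t enabled)
{
  return (required & enabled) == required;
}

/* Count the entries that would be defined, skipping the rest.  */
static size_t
count_enabled (const struct builtin_desc *d, size_t n, uint64_t enabled)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (!builtin_is_enabled (d[i].mask, enabled))
	continue;	/* Not supported in this configuration.  */
      count++;
    }
  return count;
}
```

With only FEAT_ALTIVEC enabled, count_enabled (demo_table, 3, FEAT_ALTIVEC)
skips the entry that also needs FEAT_P9 and counts the other two.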

This patch has been bootstrapped and regression tested on
powerpcle-unknown-linux, trunk revision 241406.  (I was not able to
regression test on the most current trunk because that trunk does not
bootstrap.)  I have also successfully bootstrapped this patch on a Power7
system for which the assembler lacks support for Power9.  (I could not
regression test on that platform because that platform could not
bootstrap without this patch.)

It is planned that a subsequent enhancement to this patch will make the
following improvements:

1. Fail with an assertion error instead of an internal compiler error
if built-in functions are ever defined for which the corresponding
instruction pattern is not supported by the current compiler
configuration.

2. Issue a warning message whenever a command-line -mcpu=XXX request
seeks to configure support for a CPU version which is not supported by
the accompanying assembler.

I am submitting the patch as is in order to expedite integration since
the error has broken the trunk for certain system configurations.

Is this patch ok for trunk?

gcc/ChangeLog:

2016-10-25  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/78056
* config/rs6000/rs6000.c (spe_init_builtins): Modify loops to not
define builtin functions from the bdesc_spe_predicates or
bdesc_spe_evsel arrays if the builtin mask is not compatible with
the current compiler configuration.
(paired_init_builtins): Modify loop to not define builtin
functions from the bdesc_paired_preds array if the builtin mask is
not compatible with the current compiler configuration.
(altivec_init_builtins): Modify loops to not define the
__builtin_altivec_stxvl function nor the builtin functions from
the bdesc_dst or bdesc_altivec_preds, bdesc_abs

gcc/testsuite/ChangeLog:

2016-10-25  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/78056
* gcc.target/powerpc/vsu/vec-any-eqz-7.c (test_any_equal): Change
expected error message.
* gcc.target/powerpc/vsu/vec-xst-len-12.c (store_data): Change
expected error message.
* gcc.target/powerpc/vsu/vec-all-nez-7.c
(test_all_not_equal_and_not_zero): Change expected error message.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 241406)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -16923,6 +16923,7 @@ spe_init_builtins (void)
   tree pushort_type_node = build_pointer_type (short_unsigned_type_node);
   const struct builtin_description *d;
   size_t i;
+  HOST_WIDE_INT builtin_mask = rs6000_builtin_mask;
 
   tree v2si_ftype_4_v2si
 = build_function_type_list (opaque_V2SI_type_node,
@@ -17063,7 +17064,16 @@ spe_init_builtins (void)
   for (i = 0; i < ARRAY_SIZE (bdesc_spe_predicates); ++i, d++)
 {
   tree type;
+  HOST_WIDE_INT mask = d->mask;
 
+  if ((mask & builtin_mask) != mask)
+   {
+ if (TARGET_DEBUG_BUILTIN)
+   fprintf (stderr, "spe_init_builtins, skip predicate %s\n",
+d->name);
+ continue;
+   }
+
   switch (insn_data[d->icode].operand[1].mode)
{
case V2SImode:
@@ -17084,7 +17094,16 @@ spe_init_builtins (void)
   for (i = 0; i < ARRAY_SIZE (bdesc_spe_evsel); ++i, d++)
 {
   tree type;
+  HOST_WIDE_INT mask = d->mask;
 
+  if ((mask & builtin_mask) != mask)
+   {
+ if (TARGET_DEBUG_BUILTIN)
+   fprintf (stderr, "spe_init_builtins, skip evsel %s\n",
+d->name);
+ continue;
+   }
+
   switch (insn_data[d->icode].operand[1].mode)
{
case V2SImode:
@@ -17106,6 +17125,7 @@ paired_init_builtins (void)
 {
   const struct builtin_description *d;
   size_t i;
+  HOST_WIDE_INT builtin_mask = rs6000_builtin_mask;
 
tree int_ftype_int_v2sf_v2sf
 = build_function_type_list (integer_type_node,
@@ -17141,7 +17161,16 @@ paired_init_builtins (void)
   for (i = 0; i < ARRAY_SIZE (bdesc_paired_preds); ++i, d++)
 {
   tree type;
+  HOST_WIDE_INT mask = d->mask;
 
+  if ((mask & builtin_mask) != mask)
+   {
+ if (TARGET_DEBUG_BUILTIN)
+   fprintf (stderr, "paired_init_builtins, skip predicate %s\n",
+d->name);
+ continue;
+

[PATCH,committed] PR77847: Add FALLTHROUGH comment to fix build error

2016-10-05 Thread Kelvin Nilsen

This trivial/obvious patch was committed without review as svn revision
240783.  The patch fixes a compile-time error that recently surfaced
with big-endian Power architecture builds.

libcpp/ChangeLog:

2016-10-04  Kelvin Nilsen  <kel...@gcc.gnu.org>

PR target/77847
* lex.c (search_line_fast): Add a FALLTHROUGH comment to correct
compiler error in the version of this function that is
conditionally compiled when GCC_VERSION >= 4005 and both
__ALTIVEC__ and __BIG_ENDIAN__ symbols are defined.

Index: libcpp/lex.c
===
--- libcpp/lex.c	(revision 240755)
+++ libcpp/lex.c	(working copy)
@@ -733,6 +733,7 @@
if (l != 0)
  break;
s += sizeof(unsigned long);
+   /* FALLTHROUGH */
   case 2:
l = u.l[i++];
if (l != 0)


