[PATCH] MAINTAINERS: Change to my personal email address
I'm leaving IBM and am changing my email to my personal address. ChangeLog: 2019-11-15 Kelvin Nilsen * MAINTAINERS: Change my email address as maintainer. Index: MAINTAINERS === --- MAINTAINERS (revision 278306) +++ MAINTAINERS (working copy) @@ -524,7 +524,7 @@ Quentin Neill Adam Nemet Thomas Neumann Dan Nicolaescu -Kelvin Nilsen +Kelvin Nilsen James Norris Diego Novillo Dorit Nuzman
Re: Ping: [PATCH v4, rs6000] Replace X-form addressing with D-form addressing in new pass for Power9
On 10/25/19 8:30 PM, Kelvin Nilsen wrote: > > This patch adds a new optimization pass for rs6000 targets. > > This new pass scans existing rtl expressions and replaces X-form loads and > stores with rtl expressions that favor selection of the D-form instructions > in contexts for which the D-form instructions are preferred. The new pass > runs after the RTL loop optimizations since loop unrolling often introduces > opportunities for beneficial replacements of X-form addressing instructions. > > For each of the new tests, multiple X-form instructions are replaced with > D-form instructions, some addi instructions are replaced with add > instructions, and some addi instructions are eliminated. The typical > improvement for the included tests is a decrease of 4.28% to 12.12% in the > number of instructions executed on each iteration of the loop. The > optimization has not shown measurable improvement on specmark tests, > presumably because the typical loops that are benefited by this optimization > are memory bounded and this optimization does not eliminate memory loads or > stores. However, it is anticipated that multi-threaded workloads and > measurements of total power and cooling costs for heavy server workloads > would benefit. > > This version 4 patch responds to feedback and numerous suggestions by Segher: > > 1. Further improvements to comments and discussion of computational > complexity. > > 2. Changed the name of insn_sequence_no to luid. > > 3. Fixed some typos in comments. > > 4. Added macro-defined constants to enforce upper bounds on the sizes (and > number of required iterations) for certain data structures. The intent is to > bound compile time for programs that represent large numbers of opportunities > for D-form replacements. This optimization pass ignores parts of a source > program that exceed these macro-defined size limits. 
> > In a separate mail, I have sent discussion regarding the behavior of > preceding passes and how this behavior relates to this new pass. > > I have built and regression tested this patch on powerpc64le-unknown-linux > target with no regressions. > > Is this ok for trunk? > > gcc/ChangeLog: > > 2019-10-25 Kelvin Nilsen > > * config/rs6000/rs6000-p9dform.c: New file. > * config/rs6000/rs6000-passes.def: Add pass_insert_dform. > * config/rs6000/rs6000-protos.h > (rs6000_target_supports_dform_offset_p): New function prototype. > (make_pass_insert_dform): Likewise. > * config/rs6000/rs6000.c (rs6000_target_supports_dform_offset_p): > New function. > * config/rs6000/t-rs6000 (rs6000-p9dform.o): New build target. > * config.gcc: Add rs6000-p9dform.o object file. > > gcc/testsuite/ChangeLog: > > 2019-10-25 Kelvin Nilsen > > * gcc.target/powerpc/p9-dform-0.c: New test. > * gcc.target/powerpc/p9-dform-1.c: New test. > * gcc.target/powerpc/p9-dform-10.c: New test. > * gcc.target/powerpc/p9-dform-11.c: New test. > * gcc.target/powerpc/p9-dform-12.c: New test. > * gcc.target/powerpc/p9-dform-13.c: New test. > * gcc.target/powerpc/p9-dform-14.c: New test. > * gcc.target/powerpc/p9-dform-15.c: New test. > * gcc.target/powerpc/p9-dform-2.c: New test. > * gcc.target/powerpc/p9-dform-3.c: New test. > * gcc.target/powerpc/p9-dform-4.c: New test. > * gcc.target/powerpc/p9-dform-5.c: New test. > * gcc.target/powerpc/p9-dform-6.c: New test. > * gcc.target/powerpc/p9-dform-7.c: New test. > * gcc.target/powerpc/p9-dform-8.c: New test. > * gcc.target/powerpc/p9-dform-9.c: New test. > * gcc.target/powerpc/p9-dform-generic.h: New test. > > Index: gcc/config/rs6000/rs6000-p9dform.c > === > --- gcc/config/rs6000/rs6000-p9dform.c(nonexistent) > +++ gcc/config/rs6000/rs6000-p9dform.c(working copy) > @@ -0,0 +1,1763 @@ > +/* Subroutines used to transform array subscripting expressions into > + forms that are more amenable to d-form instruction selection for p9 > + little-endian VSX code. 
> + Copyright (C) 1991-2019 Free Software Foundation, Inc. > + > + This file is part of GCC. > + > + GCC is free software; you can redistribute it and/or modify it > + under the terms of the GNU General Public License as published > + by the Free Software Foundation; either version 3, or (at your > + option) any later version. > + > + GCC is distributed in the hope that it will be useful, but WITHOUT > + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY > + or FITNESS FOR A PAR
[PATCH, rs6000] Add xxswapd support for V2DF and V2DI modes
It was recently discovered that the existing xxswapd instruction patterns lack support for the V2DF and V2DI modes. Support for these modes is required for certain new instruction patterns that are being implemented. This patch adds the desired support. The patch has been bootstrapped and tested without regressions on powerpc64le-unknown-linux. Is this ok for trunk? gcc/ChangeLog: 2019-11-06 Kelvin Nilsen * config/rs6000/vsx.md (xxswapd_<mode>): Add support for V2DF and V2DI modes. Index: gcc/config/rs6000/vsx.md === --- gcc/config/rs6000/vsx.md(revision 277861) +++ gcc/config/rs6000/vsx.md(working copy) @@ -2987,6 +2987,17 @@ "xxpermdi %x0,%x1,%x1,2" [(set_attr "type" "vecperm")]) +(define_insn "xxswapd_<mode>" + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") + (vec_select:VSX_D + (match_operand:VSX_D 1 "vsx_register_operand" "wa") + (parallel [(const_int 1) (const_int 0)])))] + "TARGET_VSX" +;; AIX does not support extended mnemonic xxswapd. Use the basic +;; mnemonic xxpermdi instead. + "xxpermdi %x0,%x1,%x1,2" + [(set_attr "type" "vecperm")]) + ;; lxvd2x for little endian loads. We need several of ;; these since the form of the PARALLEL differs by mode. (define_insn "*vsx_lxvd2x2_le_<mode>"
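For readers unfamiliar with the RTL in the xxswapd pattern, the effect of the vec_select with (parallel [(const_int 1) (const_int 0)]) can be modelled in plain C. This is only an illustrative sketch: the type v2di_model and the function swap_doublewords are hypothetical names, not part of GCC or of this patch, and the model ignores register classes and endianness.

```c
#include <stdint.h>

/* Hypothetical model of a two-doubleword vector (V2DI-like).  */
typedef struct { uint64_t dw[2]; } v2di_model;

/* Model of (vec_select x (parallel [1 0])): result element 0 is taken
   from source element 1 and result element 1 from source element 0,
   i.e. the two doublewords trade places.  */
static v2di_model
swap_doublewords (v2di_model x)
{
  v2di_model r;
  r.dw[0] = x.dw[1];
  r.dw[1] = x.dw[0];
  return r;
}
```

Applying the swap twice returns the original value, which is a quick sanity check on the model.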
[PATCH v4, rs6000] Replace X-form addressing with D-form addressing in new pass for Power9
This patch adds a new optimization pass for rs6000 targets. This new pass scans existing rtl expressions and replaces X-form loads and stores with rtl expressions that favor selection of the D-form instructions in contexts for which the D-form instructions are preferred. The new pass runs after the RTL loop optimizations since loop unrolling often introduces opportunities for beneficial replacements of X-form addressing instructions. For each of the new tests, multiple X-form instructions are replaced with D-form instructions, some addi instructions are replaced with add instructions, and some addi instructions are eliminated. The typical improvement for the included tests is a decrease of 4.28% to 12.12% in the number of instructions executed on each iteration of the loop. The optimization has not shown measurable improvement on specmark tests, presumably because the typical loops that are benefited by this optimization are memory bounded and this optimization does not eliminate memory loads or stores. However, it is anticipated that multi-threaded workloads and measurements of total power and cooling costs for heavy server workloads would benefit. This version 4 patch responds to feedback and numerous suggestions by Segher: 1. Further improvements to comments and discussion of computational complexity. 2. Changed the name of insn_sequence_no to luid. 3. Fixed some typos in comments. 4. Added macro-defined constants to enforce upper bounds on the sizes (and number of required iterations) for certain data structures. The intent is to bound compile time for programs that represent large numbers of opportunities for D-form replacements. This optimization pass ignores parts of a source program that exceed these macro-defined size limits. In a separate mail, I have sent discussion regarding the behavior of preceding passes and how this behavior relates to this new pass. 
I have built and regression tested this patch on powerpc64le-unknown-linux target with no regressions. Is this ok for trunk? gcc/ChangeLog: 2019-10-25 Kelvin Nilsen * config/rs6000/rs6000-p9dform.c: New file. * config/rs6000/rs6000-passes.def: Add pass_insert_dform. * config/rs6000/rs6000-protos.h (rs6000_target_supports_dform_offset_p): New function prototype. (make_pass_insert_dform): Likewise. * config/rs6000/rs6000.c (rs6000_target_supports_dform_offset_p): New function. * config/rs6000/t-rs6000 (rs6000-p9dform.o): New build target. * config.gcc: Add rs6000-p9dform.o object file. gcc/testsuite/ChangeLog: 2019-10-25 Kelvin Nilsen * gcc.target/powerpc/p9-dform-0.c: New test. * gcc.target/powerpc/p9-dform-1.c: New test. * gcc.target/powerpc/p9-dform-10.c: New test. * gcc.target/powerpc/p9-dform-11.c: New test. * gcc.target/powerpc/p9-dform-12.c: New test. * gcc.target/powerpc/p9-dform-13.c: New test. * gcc.target/powerpc/p9-dform-14.c: New test. * gcc.target/powerpc/p9-dform-15.c: New test. * gcc.target/powerpc/p9-dform-2.c: New test. * gcc.target/powerpc/p9-dform-3.c: New test. * gcc.target/powerpc/p9-dform-4.c: New test. * gcc.target/powerpc/p9-dform-5.c: New test. * gcc.target/powerpc/p9-dform-6.c: New test. * gcc.target/powerpc/p9-dform-7.c: New test. * gcc.target/powerpc/p9-dform-8.c: New test. * gcc.target/powerpc/p9-dform-9.c: New test. * gcc.target/powerpc/p9-dform-generic.h: New test. Index: gcc/config/rs6000/rs6000-p9dform.c === --- gcc/config/rs6000/rs6000-p9dform.c (nonexistent) +++ gcc/config/rs6000/rs6000-p9dform.c (working copy) @@ -0,0 +1,1763 @@ +/* Subroutines used to transform array subscripting expressions into + forms that are more amenable to d-form instruction selection for p9 + little-endian VSX code. + Copyright (C) 1991-2019 Free Software Foundation, Inc. + + This file is part of GCC. 
+ + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published + by the Free Software Foundation; either version 3, or (at your + option) any later version. + + GCC is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public + License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + <http://www.gnu.org/licenses/>. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "backend.h" +#include "rtl.h" +#include "tree.h" +#include "memmodel.h" +#include "df.h" +#include "
Re: [PATCH v3, rs6000] Replace X-form addressing with D-form addressing in new pass for Power9
On 10/17/19 5:57 PM, Segher Boessenkool wrote: > Hi Kelvin, > > On Wed, Oct 09, 2019 at 03:28:45PM -0500, Kelvin Nilsen wrote: >> This new pass scans existing rtl expressions and replaces them with rtl >> expressions that favor selection of the D-form instructions in contexts for >> which the D-form instructions are preferred. The new pass runs after the >> RTL loop optimizations since loop unrolling often introduces opportunities >> for beneficial replacements of X-form addressing instructions. >> >> For each of the new tests, multiple X-form instructions are replaced with >> D-form instructions, some addi instructions are replaced with add >> instructions, and some addi instructions are eliminated. The typical >> improvement for the included tests is a decrease of 4.28% to 12.12% in the >> number of instructions executed on each iteration of the loop. The >> optimization has not shown measurable improvement on specmark tests, >> presumably because the typical loops that are benefited by this optimization >> are memory bounded and this optimization does not eliminate memory loads or >> stores. However, it is anticipated that multi-threaded workloads and >> measurements of total power and cooling costs for heavy server workloads >> would benefit. > > My first question is, why did ivopts choose the suboptimal solution? > _Did_ it, or did something later mess things up? > > This new pass can help us investigate that. It certainly sounds like we > could do better earlier already. > > I think it is a good design to make fixes late in the pass pipeline, *but* > we should try to make good choices earlier, too -- the "late tweaks" should > be just that, tweaks; 4%-12% is a bit much. > > (It's not that super late here; but still, why does it help so much?) > Thanks Segher for looking over my draft patch and providing your comments. 
When I first began work on this reported performance problem, I did look at the earlier passes in hopes of identifying a better place to address the poor instruction selection. It is difficult to know exactly where we want to accomplish the improved code generation. Some of the "earlier" candidate passes are disadvantaged because they are "blind" to instruction costs and do not even have an awareness of which addressing modes are supported by which instructions. Below, I'm providing some of the earlier pass information for one of the sample programs that motivates this patch. Please feel free to comment. I welcome suggestions as to alternative ways to attack this. Thanks. Consider the following program: extern float opt_value; extern char *opt_desc; #define M 128 #define N 512 double x [N]; double y [N]; int main (int argc, char *argv []) { double sacc; first_dummy (); for (int j = 0; j < M; j++) { sacc = 0.00; for (unsigned long long int i = 0; i < N; i++) sacc += x[i] * y[i]; dummy (sacc, N); } opt_value = ((float) N) * 2 * ((float) M); opt_desc = "flops"; other_dummy (); } Compile this with the following command-line options on a Power target: xgcc p9-dform-0.c -da -m64 -fdump-tree-all -fno-diagnostics-show-caret \ -fno-diagnostics-show-line-numbers -fdiagnostics-color=never -O3 \ -mcpu=power9 -mtune=power9 -funroll-loops -ffat-lto-objects -fno-ident * * Auto-vectorization transforms this program into approximately the * following C code * int main (int argc, char *argv []) { double sacc; vector double x_values, y_values, xy_product; vector double *vectp_x, *vectp_y; first_dummy (); for (int j = 0; j < M; j++) { sacc = 0.00; vectp_x = x; vectp_y = y; for (unsigned int ivtmp_31 = 0; ivtmp_31 != N / 2; ivtmp_31++) { x_values = *vectp_x; y_values = *vectp_y; xy_product = x_values * y_values; sacc += xy_product[0]; sacc += xy_product[1]; vectp_x++; vectp_y++; } dummy (sacc, N); } opt_value = ((float) N) * 2 * ((float) M); opt_desc = "flops"; other_dummy (); } * * 
Induction variable optimization transforms this program into approximately * the following C code * int main (int argc, char *argv []) { double sacc; vector double x_values, y_values, xy_product; first_dummy (); for (int j = 0; j < M; j++) { sacc = 0.00; for (unsigned int ivtmp_14 = 0; ivtmp_14 != 4096; ivtmp_14 += 16) { x_values = x [ivtmp_14]; y_values = y [ivtmp_14]; xy_product = x_values * y_values; sacc += xy_product[0]; sacc += xy_product[1]; /* Note: induction variable optimization has removed 2 p
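To make the addressing question in this discussion concrete, here is a rough scalar C sketch of the two shapes of the inner loop. It is an illustration only, not output from any GCC pass: the function names are hypothetical, the unroll factor of four is an arbitrary choice, and n is assumed to be a multiple of four. The first form computes every address as base plus a varying index (the shape that tends to select X-form instructions); the second advances one pointer per array once per unrolled iteration and reaches the individual elements through small constant offsets (the shape that favors D-form instructions).

```c
#include <stddef.h>

/* Indexed formulation: each access is base + variable index.  */
static double
dot_indexed (const double *x, const double *y, size_t n)
{
  double sacc = 0.0;
  for (size_t i = 0; i < n; i++)
    sacc += x[i] * y[i];
  return sacc;
}

/* Unrolled formulation: one pointer bump per array per iteration,
   with constant displacements 0..3 from the running pointers.
   Assumes n is a multiple of 4.  */
static double
dot_const_offsets (const double *x, const double *y, size_t n)
{
  double sacc = 0.0;
  const double *xp = x;
  const double *yp = y;
  for (size_t i = 0; i < n; i += 4)
    {
      sacc += xp[0] * yp[0];
      sacc += xp[1] * yp[1];
      sacc += xp[2] * yp[2];
      sacc += xp[3] * yp[3];
      xp += 4;
      yp += 4;
    }
  return sacc;
}
```

Both functions accumulate the products in the same order, so they return identical results; only the way the addresses are formed differs.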
[PATCH v3, rs6000] Replace X-form addressing with D-form addressing in new pass for Power9
This patch is a refinement of a patch first submitted to this list on Nov. 10, 2018, with revisions submitted to this list on Dec. 13, 2018 and Sep. 3, 2019. This new pass scans existing rtl expressions and replaces them with rtl expressions that favor selection of the D-form instructions in contexts for which the D-form instructions are preferred. The new pass runs after the RTL loop optimizations since loop unrolling often introduces opportunities for beneficial replacements of X-form addressing instructions. For each of the new tests, multiple X-form instructions are replaced with D-form instructions, some addi instructions are replaced with add instructions, and some addi instructions are eliminated. The typical improvement for the included tests is a decrease of 4.28% to 12.12% in the number of instructions executed on each iteration of the loop. The optimization has not shown measurable improvement on specmark tests, presumably because the typical loops that are benefited by this optimization are memory bounded and this optimization does not eliminate memory loads or stores. However, it is anticipated that multi-threaded workloads and measurements of total power and cooling costs for heavy server workloads would benefit. This version 3 patch responds to feedback and numerous suggestions by Segher: 1. Fixed multiple typos. 2. Improved comments and added discussion of computational complexity. 3. Added a field to the indexing_web_entry class, allowing constant-time test for dominance of instructions within a common basic block. 4. Improved implementation of the equivalence hash function. 5. Refactored the code to divide into smaller functions and provide more descriptive commentary. 6. Improved indentation. 7. Corrected definition of max_16bit_signed value. 8. Added To-do comment in rs6000_target_supports_dform_offset_p, to alert maintainers that adding support for future hardware architectures will require code to be added to this function. 9. 
Simplified the dg directives in the new test cases. I have built and regression tested this patch on powerpc64le-unknown-linux target with no regressions. Is this ok for trunk? gcc/ChangeLog: 2019-10-09 Kelvin Nilsen * config/rs6000/rs6000-p9dform.c: New file. * config/rs6000/rs6000-passes.def: Add pass_insert_dform. * config/rs6000/rs6000-protos.h (rs6000_target_supports_dform_offset_p): New function prototype. (make_pass_insert_dform): Likewise. * config/rs6000/rs6000.c (rs6000_target_supports_dform_offset_p): New function. * config/rs6000/t-rs6000 (rs6000-p9dform.o): New build target. * config.gcc: Add rs6000-p9dform.o object file. gcc/testsuite/ChangeLog: 2019-10-09 Kelvin Nilsen * gcc.target/powerpc/p9-dform-0.c: New test. * gcc.target/powerpc/p9-dform-1.c: New test. * gcc.target/powerpc/p9-dform-10.c: New test. * gcc.target/powerpc/p9-dform-11.c: New test. * gcc.target/powerpc/p9-dform-12.c: New test. * gcc.target/powerpc/p9-dform-13.c: New test. * gcc.target/powerpc/p9-dform-14.c: New test. * gcc.target/powerpc/p9-dform-15.c: New test. * gcc.target/powerpc/p9-dform-2.c: New test. * gcc.target/powerpc/p9-dform-3.c: New test. * gcc.target/powerpc/p9-dform-4.c: New test. * gcc.target/powerpc/p9-dform-5.c: New test. * gcc.target/powerpc/p9-dform-6.c: New test. * gcc.target/powerpc/p9-dform-7.c: New test. * gcc.target/powerpc/p9-dform-8.c: New test. * gcc.target/powerpc/p9-dform-9.c: New test. * gcc.target/powerpc/p9-dform-generic.h: New test. Index: gcc/config/rs6000/rs6000-p9dform.c === --- gcc/config/rs6000/rs6000-p9dform.c (nonexistent) +++ gcc/config/rs6000/rs6000-p9dform.c (working copy) @@ -0,0 +1,1623 @@ +/* Subroutines used to transform array subscripting expressions into + forms that are more amenable to d-form instruction selection for p9 + little-endian VSX code. + Copyright (C) 1991-2019 Free Software Foundation, Inc. + + This file is part of GCC. 
+ + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published + by the Free Software Foundation; either version 3, or (at your + option) any later version. + + GCC is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public + License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + <http://www.gnu.org/licenses/>. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "backend.h"
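As background for the max_16bit_signed and rs6000_target_supports_dform_offset_p items above, the following C sketch shows the kind of range and alignment check that a displacement-form instruction imposes. It is not the code from the patch, whose body is not shown in this message; the enum, its members, and offset_fits_displacement are hypothetical names. The sketch assumes the Power ISA encodings in which D-form instructions (for example lfd, lwz) take any signed 16-bit displacement, DS-form instructions (for example ld, std) require a multiple of 4, and DQ-form instructions (for example lxv, stxv) require a multiple of 16.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of displacement encodings; these names
   are illustrative and do not come from the patch.  */
enum disp_form
{
  DISP_D,   /* Any signed 16-bit displacement.  */
  DISP_DS,  /* Signed 16-bit displacement, multiple of 4.  */
  DISP_DQ   /* Signed 16-bit displacement, multiple of 16.  */
};

/* Return true if OFFSET can be encoded as the constant displacement
   of an instruction using encoding FORM.  */
static bool
offset_fits_displacement (int64_t offset, enum disp_form form)
{
  /* All three encodings share the signed 16-bit range.  */
  if (offset < -32768 || offset > 32767)
    return false;

  switch (form)
    {
    case DISP_D:
      return true;
    case DISP_DS:
      return offset % 4 == 0;
    case DISP_DQ:
      return offset % 16 == 0;
    }
  return false;
}
```

For example, an offset of 24 bytes is acceptable for a DS-form access but not for a DQ-form vector access, which would need 16 or 32.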
[PATCH v2, rs6000] Replace X-form addressing with D-form addressing in new pass for Power 9
This patch is a refinement of a patch first submitted to this list on Nov. 10, 2018, with a revision submitted to this list on Dec. 13, 2018. At the time of the last submission, it was deemed too close to the close of GCC 9, so was not considered at that time. This new pass scans existing rtl expressions and replaces them with rtl expressions that favor selection of the D-form instructions in contexts for which the D-form instructions are preferred. The new pass runs after the RTL loop optimizations since loop unrolling often introduces opportunities for beneficial replacements of X-form addressing instructions. This version 2 of the patch includes new tests representing additional applications for which the existing code generator produces sub-optimal code. For each of the sample tests, multiple X-form instructions are replaced with D-form instructions, some addi instructions are replaced with add instructions, and some addi instructions are eliminated. The typical improvement for the included tests is a decrease of 4.28% to 12.12% in the number of instructions executed on each iteration of the loop. The optimization has not shown measurable improvement on, for example, specmark tests, presumably because the typical loops that are benefited by this optimization are memory bounded and this optimization does not eliminate memory loads or stores. However, it is anticipated that multi-threaded workloads and, for example, measurements of total power and cooling costs for heavy server workloads would benefit. I have built and regression tested this patch on powerpc64le-unknown-linux target with no regressions. Is this ok for trunk? gcc/ChangeLog: 2019-09-03 Kelvin Nilsen * config/rs6000/rs6000-p9dform.c: New file. * config/rs6000/rs6000-passes.def: Add pass_insert_dform. * config/rs6000/rs6000-protos.h (rs6000_target_supports_dform_offset_p): New function prototype. (make_pass_insert_dform): Likewise. 
* config/rs6000/rs6000.c (rs6000_target_supports_dform_offset_p): New function. * config/rs6000/t-rs6000 (rs6000-p9dform.o): New build target. * config.gcc: Add rs6000-p9dform.o object file. gcc/testsuite/ChangeLog: 2019-09-03 Kelvin Nilsen * gcc.target/powerpc/p9-dform-0.c: New test. * gcc.target/powerpc/p9-dform-1.c: New test. * gcc.target/powerpc/p9-dform-10.c: New test. * gcc.target/powerpc/p9-dform-11.c: New test. * gcc.target/powerpc/p9-dform-12.c: New test. * gcc.target/powerpc/p9-dform-13.c: New test. * gcc.target/powerpc/p9-dform-14.c: New test. * gcc.target/powerpc/p9-dform-15.c: New test. * gcc.target/powerpc/p9-dform-2.c: New test. * gcc.target/powerpc/p9-dform-3.c: New test. * gcc.target/powerpc/p9-dform-4.c: New test. * gcc.target/powerpc/p9-dform-5.c: New test. * gcc.target/powerpc/p9-dform-6.c: New test. * gcc.target/powerpc/p9-dform-7.c: New test. * gcc.target/powerpc/p9-dform-8.c: New test. * gcc.target/powerpc/p9-dform-9.c: New test. * gcc.target/powerpc/p9-dform-generic.h: New header. Index: gcc/config/rs6000/rs6000-p9dform.c === --- gcc/config/rs6000/rs6000-p9dform.c (nonexistent) +++ gcc/config/rs6000/rs6000-p9dform.c (working copy) @@ -0,0 +1,1487 @@ +/* Subroutines used to transform array subscripting expressions into + forms that are more amenable to d-form instruction selection for p9 + little-endian VSX code. + Copyright (C) 1991-2018 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published + by the Free Software Foundation; either version 3, or (at your + option) any later version. + + GCC is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public + License for more details. 
+ + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + <http://www.gnu.org/licenses/>. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "backend.h" +#include "rtl.h" +#include "tree.h" +#include "memmodel.h" +#include "df.h" +#include "tm_p.h" +#include "ira.h" +#include "print-tree.h" +#include "varasm.h" +#include "explow.h" +#include "expr.h" +#include "output.h" +#include "tree-pass.h" +#include "rtx-vector-builder.h" +#include "cfgloop.h" + +#include "insn-config.h" +#include "recog.h" + +#include &
[PATCH, rs6000] PR89765: Multiple problems with vec-insert implementation on PowerPC
In combination with a related recently committed patch (https://gcc.gnu.org/ml/gcc-patches/2019-04/msg00989.html), the attached patch resolves the issues described in this problem report. This patch also includes tests to exercise the previously committed patch. This patch includes redundant content from patch PR89424 (https://gcc.gnu.org/ml/gcc-patches/2019-04/msg00994.html), which has already been approved by Segher for trunk and backports to GCC 7 and 8 but is awaiting GCC 9 release. The patch has been bootstrapped and tested without regressions on powerpc64le-unknown-linux-gnu (both P8 and P9) and on powerpc64-unknown-linux-gnu (P7 and P8, both -m32 and -m64). Segher: After GCC 9 release, is this ok for trunk and backports to GCC 7 and GCC 8? Jakub or Richi: Is this patch and the redundant PR89424 patch ok for backports to GCC 9? gcc/ChangeLog: 2019-04-30 Kelvin Nilsen PR target/89765 * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): In handling of ALTIVEC_BUILTIN_VEC_INSERT, use modular arithmetic to compute vector element selector for both constant and variable operands. * config/rs6000/rs6000.c (rs6000_expand_vector_extract): Add case to handle V1TImode vectors. gcc/testsuite/ChangeLog: 2019-04-30 Kelvin Nilsen PR target/89765 * gcc.target/powerpc/pr89765-mc.c: New test. * gcc.target/powerpc/vsx-builtin-10c.c: New test. * gcc.target/powerpc/vsx-builtin-10d.c: New test. * gcc.target/powerpc/vsx-builtin-11c.c: New test. * gcc.target/powerpc/vsx-builtin-11d.c: New test. * gcc.target/powerpc/vsx-builtin-12c.c: New test. * gcc.target/powerpc/vsx-builtin-12d.c: New test. * gcc.target/powerpc/vsx-builtin-13c.c: New test. * gcc.target/powerpc/vsx-builtin-13d.c: New test. * gcc.target/powerpc/vsx-builtin-14c.c: New test. * gcc.target/powerpc/vsx-builtin-14d.c: New test. * gcc.target/powerpc/vsx-builtin-15c.c: New test. * gcc.target/powerpc/vsx-builtin-15d.c: New test. * gcc.target/powerpc/vsx-builtin-16c.c: New test. 
* gcc.target/powerpc/vsx-builtin-16d.c: New test. * gcc.target/powerpc/vsx-builtin-17c.c: New test. * gcc.target/powerpc/vsx-builtin-17d.c: New test. * gcc.target/powerpc/vsx-builtin-18c.c: New test. * gcc.target/powerpc/vsx-builtin-18d.c: New test. * gcc.target/powerpc/vsx-builtin-19c.c: New test. * gcc.target/powerpc/vsx-builtin-19d.c: New test. * gcc.target/powerpc/vsx-builtin-20c.c: New test. * gcc.target/powerpc/vsx-builtin-20d.c: New test. * gcc.target/powerpc/vsx-builtin-9c.c: New test. * gcc.target/powerpc/vsx-builtin-9d.c: New test. * gcc.target/powerpc/vsx-builtin-13a.c (PR89424): Define this macro to increase coverage of test. * gcc.target/powerpc/vsx-builtin-13b.c (PR89424): Likewise. * gcc.target/powerpc/vsx-builtin-20a.c (PR89424): Likewise. * gcc.target/powerpc/vsx-builtin-20b.c (PR89424): Likewise. Index: gcc/config/rs6000/rs6000-c.c === --- gcc/config/rs6000/rs6000-c.c(revision 270584) +++ gcc/config/rs6000/rs6000-c.c(working copy) @@ -6736,11 +6736,13 @@ altivec_resolve_overloaded_builtin (location_t loc /* If we can use the VSX xxpermdi instruction, use that for insert. 
*/ mode = TYPE_MODE (arg1_type); if ((mode == V2DFmode || mode == V2DImode) && VECTOR_UNIT_VSX_P (mode) - && TREE_CODE (arg2) == INTEGER_CST - && wi::ltu_p (wi::to_wide (arg2), 2)) + && TREE_CODE (arg2) == INTEGER_CST) { + wide_int selector = wi::to_wide (arg2); + selector = wi::umod_trunc (selector, 2); tree call = NULL_TREE; + arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector); if (mode == V2DFmode) call = rs6000_builtin_decls[VSX_BUILTIN_VEC_SET_V2DF]; else if (mode == V2DImode) @@ -6752,11 +6754,12 @@ altivec_resolve_overloaded_builtin (location_t loc return build_call_expr (call, 3, arg1, arg0, arg2); } else if (mode == V1TImode && VECTOR_UNIT_VSX_P (mode) - && TREE_CODE (arg2) == INTEGER_CST - && wi::eq_p (wi::to_wide (arg2), 0)) + && TREE_CODE (arg2) == INTEGER_CST) { tree call = rs6000_builtin_decls[VSX_BUILTIN_VEC_SET_V1TI]; + wide_int selector = wi::zero(32); + arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector); /* Note, __builtin_vec_insert_ has vector and scalar types reversed. */ return build_call_expr (call, 3, arg1, arg0, arg2); @@ -6764,10 +6767,13 @@ altivec_resolve_overloaded_builtin (location_t loc /* Build *(((arg1_inner_type*)&(vec
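The behavior this patch aims for, reducing the element selector modulo the number of vector elements before the insert, can be sketched in portable C as follows. This models the semantics only; it is not the builtin expansion from rs6000-c.c. The names vec2_model, NELTS, and insert_modulo are hypothetical, and the argument order (scalar, vector, selector) simply mirrors vec_insert.

```c
#include <stdint.h>

/* Hypothetical model of a two-element vector (V2DI-like).  */
#define NELTS 2

typedef struct { int64_t elt[NELTS]; } vec2_model;

/* Return a copy of V with element (SELECTOR mod NELTS) replaced by
   SCALAR.  The selector is treated as unsigned, matching the unsigned
   modulus (wi::umod_trunc) used for constant selectors in the diff
   above.  */
static vec2_model
insert_modulo (int64_t scalar, vec2_model v, unsigned long long selector)
{
  v.elt[selector % NELTS] = scalar;
  return v;
}
```

With two elements, selectors 1, 3, and 5 all address element 1, while 0, 2, and 4 address element 0; an out-of-range selector therefore never touches memory outside the vector in this model.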
[PATCH, rs6000] PR89424: __builtin_vec_ext_v1ti (v, i) results in ICE with variable i (RS6000)
The attached patch resolves the issue described in this problem report. The patch has been bootstrapped and tested without regressions on powerpc64le-unknown-linux-gnu (both P8 and P9) and on powerpc64-unknown-linux-gnu (P7 and P8, both -m32 and -m64). Is this ok for trunk and backports? Thanks. gcc/ChangeLog: 2019-04-25 Kelvin Nilsen PR target/89424 * config/rs6000/rs6000.c (rs6000_expand_vector_extract): Add handling of V1TImode. gcc/testsuite/ChangeLog: 2019-04-25 Kelvin Nilsen PR target/89424 * gcc.target/powerpc/pr89424-0.c: New test. * gcc.target/powerpc/vsx-builtin-13a.c: Define macro PR89424 to enable testing of newly patched capability. * gcc.target/powerpc/vsx-builtin-13b.c: Likewise. * gcc.target/powerpc/vsx-builtin-20a.c: Likewise. * gcc.target/powerpc/vsx-builtin-20b.c: Likewise. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 270513) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -6944,6 +6944,10 @@ rs6000_expand_vector_extract (rtx target, rtx vec, switch (mode) { + case E_V1TImode: + emit_move_insn (target, gen_lowpart (TImode, vec)); + return; + case E_V2DFmode: emit_insn (gen_vsx_extract_v2df_var (target, vec, elt)); return; Index: gcc/testsuite/gcc.target/powerpc/pr89424-0.c === --- gcc/testsuite/gcc.target/powerpc/pr89424-0.c(nonexistent) +++ gcc/testsuite/gcc.target/powerpc/pr89424-0.c(working copy) @@ -0,0 +1,78 @@ +/* { dg-do run { target { powerpc*-*-* && lp64 } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ +/* { dg-require-effective-target vsx_hw } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { } } */ +/* { dg-options "-mvsx" } */ + +/* This test should run the same on any target that supports vsx + instructions. Intentionally not specifying cpu in order to test + all code generation paths. */ + +#include + +extern void abort (void); + +/* Define PR89626 after that pr is addressed. 
*/ +#ifdef PR89626 +#define SIGNED +#else +#define SIGNED signed +#endif + +#define CONST0 (((__int128) 31415926539) << 60) + +/* Test that indices > length of vector are applied modulo the vector + length. */ + + +/* Test for variable selector and vector residing in register. */ +__attribute__((noinline)) +__int128 ei (vector SIGNED __int128 v, int i) +{ + return __builtin_vec_ext_v1ti (v, i); +} + +/* Test for variable selector and vector residing in memory. */ +__int128 mei (vector SIGNED __int128 *vp, int i) +{ + return __builtin_vec_ext_v1ti (*vp, i); +} + +int main (int argc, char *argv[]) { + vector SIGNED __int128 dv = { CONST0 }; + __int128 d; + + d = ei (dv, 0); + if (d != CONST0) +abort (); + + d = ei (dv, 1); + if (d != CONST0) +abort (); + + d = ei (dv, 2); + if (d != CONST0) +abort (); + + d = ei (dv, 3); + if (d != CONST0) +abort (); + + d = mei (&dv, 0); + if (d != CONST0) +abort (); + + d = mei (&dv, 1); + if (d != CONST0) +abort (); + + d = mei (&dv, 2); + if (d != CONST0) +abort (); + + d = mei (&dv, 3); + if (d != CONST0) +abort (); + + return 0; +} Index: gcc/testsuite/gcc.target/powerpc/vsx-builtin-13a.c === --- gcc/testsuite/gcc.target/powerpc/vsx-builtin-13a.c (revision 270513) +++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-13a.c (working copy) @@ -9,7 +9,7 @@ #include /* Define this after PR89424 is addressed. */ -#undef PR89424 +#define PR89424 /* Define this after PR89626 is addressed. */ #undef PR89626 Index: gcc/testsuite/gcc.target/powerpc/vsx-builtin-13b.c === --- gcc/testsuite/gcc.target/powerpc/vsx-builtin-13b.c (revision 270513) +++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-13b.c (working copy) @@ -9,7 +9,7 @@ #include /* Define this after PR89424 is addressed. */ -#undef PR89424 +#define PR89424 /* Define this after PR89626 is addressed. 
*/ #undef PR89626 Index: gcc/testsuite/gcc.target/powerpc/vsx-builtin-20a.c === --- gcc/testsuite/gcc.target/powerpc/vsx-builtin-20a.c (revision 270513) +++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-20a.c (working copy) @@ -9,7 +9,7 @@ #include /* Define this after PR89424 is addressed. */ -#undef PR89424 +#define PR89424 extern void abort (void); Index: gcc/testsuite/gcc.target/powerpc/vsx-builtin-20b.c === --- gcc/testsuite/gcc.target/powerpc/vsx-builtin-20b.c (revision 270513) +++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-20b.
[PATCH, rs6000] PR89424: __builtin_vec_ext_v1ti (v, i) results in ICE with variable i (RS6000)
The attached patch resolves the issue described in this problem report. The patch has been bootstrapped and tested without regressions on powerpc64le-unknown-linux-gnu (both P8 and P9) and on powerpc64-linux (P7 and P8, both -m32 and -m64). Is this ok for trunk and backports? Thanks. gcc/ChangeLog: 2019-04-25 Kelvin Nilsen PR target/89424 * config/rs6000/rs6000.c (rs6000_expand_vector_extract): Add handling of V1TImode. gcc/testsuite/ChangeLog: 2019-04-25 Kelvin Nilsen PR target/89424 * gcc.target/powerpc/pr89424-0.c: New test. * gcc.target/powerpc/vsx-builtin-13a.c: Define macro PR89424 to enable testing of newly patched capability. * gcc.target/powerpc/vsx-builtin-13b.c: Likewise. * gcc.target/powerpc/vsx-builtin-20a.c: Likewise. * gcc.target/powerpc/vsx-builtin-20b.c: Likewise. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 270513) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -6944,6 +6944,10 @@ rs6000_expand_vector_extract (rtx target, rtx vec, switch (mode) { + case E_V1TImode: + emit_move_insn (target, gen_lowpart (TImode, vec)); + return; + case E_V2DFmode: emit_insn (gen_vsx_extract_v2df_var (target, vec, elt)); return; Index: gcc/testsuite/gcc.target/powerpc/pr89424-0.c === --- gcc/testsuite/gcc.target/powerpc/pr89424-0.c(nonexistent) +++ gcc/testsuite/gcc.target/powerpc/pr89424-0.c(working copy) @@ -0,0 +1,78 @@ +/* { dg-do run { target { powerpc*-*-* && lp64 } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ +/* { dg-require-effective-target vsx_hw } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { } } */ +/* { dg-options "-mvsx" } */ + +/* This test should run the same on any target that supports vsx + instructions. Intentionally not specifying cpu in order to test + all code generation paths. */ + +#include + +extern void abort (void); + +/* Define PR89626 after that pr is addressed. 
*/ +#ifdef PR89626 +#define SIGNED +#else +#define SIGNED signed +#endif + +#define CONST0 (((__int128) 31415926539) << 60) + +/* Test that indices > length of vector are applied modulo the vector + length. */ + + +/* Test for variable selector and vector residing in register. */ +__attribute__((noinline)) +__int128 ei (vector SIGNED __int128 v, int i) +{ + return __builtin_vec_ext_v1ti (v, i); +} + +/* Test for variable selector and vector residing in memory. */ +__int128 mei (vector SIGNED __int128 *vp, int i) +{ + return __builtin_vec_ext_v1ti (*vp, i); +} + +int main (int argc, char *argv[]) { + vector SIGNED __int128 dv = { CONST0 }; + __int128 d; + + d = ei (dv, 0); + if (d != CONST0) +abort (); + + d = ei (dv, 1); + if (d != CONST0) +abort (); + + d = ei (dv, 2); + if (d != CONST0) +abort (); + + d = ei (dv, 3); + if (d != CONST0) +abort (); + + d = mei (, 0); + if (d != CONST0) +abort (); + + d = mei (, 1); + if (d != CONST0) +abort (); + + d = mei (, 2); + if (d != CONST0) +abort (); + + d = mei (, 3); + if (d != CONST0) +abort (); + + return 0; +} Index: gcc/testsuite/gcc.target/powerpc/vsx-builtin-13a.c === --- gcc/testsuite/gcc.target/powerpc/vsx-builtin-13a.c (revision 270513) +++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-13a.c (working copy) @@ -9,7 +9,7 @@ #include /* Define this after PR89424 is addressed. */ -#undef PR89424 +#define PR89424 /* Define this after PR89626 is addressed. */ #undef PR89626 Index: gcc/testsuite/gcc.target/powerpc/vsx-builtin-13b.c === --- gcc/testsuite/gcc.target/powerpc/vsx-builtin-13b.c (revision 270513) +++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-13b.c (working copy) @@ -9,7 +9,7 @@ #include /* Define this after PR89424 is addressed. */ -#undef PR89424 +#define PR89424 /* Define this after PR89626 is addressed. 
*/ #undef PR89626 Index: gcc/testsuite/gcc.target/powerpc/vsx-builtin-20a.c === --- gcc/testsuite/gcc.target/powerpc/vsx-builtin-20a.c (revision 270513) +++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-20a.c (working copy) @@ -9,7 +9,7 @@ #include /* Define this after PR89424 is addressed. */ -#undef PR89424 +#define PR89424 extern void abort (void); Index: gcc/testsuite/gcc.target/powerpc/vsx-builtin-20b.c === --- gcc/testsuite/gcc.target/powerpc/vsx-builtin-20b.c (revision 270513) +++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-20b.c (working copy) @
[PATCH, rs6000] PR87532: Bad results from vec_extract (unsigned char, foo) dependent upon function inline
A patch to address this problem report was committed on 3/15/2019. Some of the new regression tests submitted with that initial patch failed on P8 big-endian and on P9 little-endian. This new patch addresses the code generation problems that were uncovered by these failing tests. Additionally, this new patch corrects some of the expected instruction counts for certain previously existing regression tests on certain targets to adjust for changes in the generated code.

This new patch has been bootstrapped and tested without regressions on powerpcle-unknown-linux (both P8 and P9) and on powerpc-linux (P7 and P8, both -m32 and -m64). Is this ok for trunk and backports?

Thanks.

gcc/ChangeLog:

2019-04-09  Kelvin Nilsen

        PR target/87532
        * config/rs6000/rs6000.c (rs6000_split_vec_extract_var): Use inner
        mode of vector rather than mode of destination for move instruction.
        * config/rs6000/vsx.md (*vsx_extract__mode_var): Use QI inner mode
        with V16QI vector mode.

gcc/testsuite/ChangeLog:

2019-04-09  Kelvin Nilsen

        PR target/87532
        * gcc.target/powerpc/fold-vec-extract-char.p8.c: Adjust expected
        instruction counts.
        * gcc.target/powerpc/fold-vec-extract-int.p8.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-short.p8.c: Likewise.: Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 270127) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -7167,7 +7167,7 @@ rtx tmp_altivec) { machine_mode mode = GET_MODE (src); - machine_mode scalar_mode = GET_MODE (dest); + machine_mode scalar_mode = GET_MODE_INNER (GET_MODE (src)); unsigned scalar_size = GET_MODE_SIZE (scalar_mode); int byte_shift = exact_log2 (scalar_size); Index: gcc/config/rs6000/vsx.md === --- gcc/config/rs6000/vsx.md(revision 270127) +++ gcc/config/rs6000/vsx.md(working copy) @@ -3739,9 +3739,9 @@ DONE; }) -(define_insn_and_split "*vsx_extract___var" - [(set (match_operand:SDI 0 "gpc_reg_operand" "=r,r,r") - (zero_extend:SDI +(define_insn_and_split "*vsx_extract__mode_var" + [(set (match_operand: 0 "gpc_reg_operand" "=r,r,r") + (zero_extend: (unspec: [(match_operand:VSX_EXTRACT_I 1 "input_operand" "wK,v,m") (match_operand:DI 2 "gpc_reg_operand" "r,r,r")] @@ -3753,7 +3753,7 @@ "&& reload_completed" [(const_int 0)] { - machine_mode smode = mode; + machine_mode smode = mode; rs6000_split_vec_extract_var (gen_rtx_REG (smode, REGNO (operands[0])), operands[1], operands[2], operands[3], operands[4]); Index: gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p8.c === --- gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p8.c (revision 270127) +++ gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p8.c (working copy) @@ -6,9 +6,9 @@ /* { dg-options "-mdejagnu-cpu=power8 -O2" } */ // six tests total. Targeting P8LE / P8BE. 
-// P8 LE variable offset: rldicl, subfic, sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, (extsb) +// P8 LE variable offset: rldicl, subfic, sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, rlwin, (extsb) // P8 LE constant offset: vspltb, mfvsrd, rlwinm, (extsb) -// P8 BE variable offset: sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, (extsb) +// P8 BE variable offset: sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, rlwinm, (extsb) // P8 BE constant offset: vspltb, mfvsrd, rlwinm, (extsb) /* { dg-final { scan-assembler-times {\mrldicl\M} 3 { target { le } } } } */ @@ -21,7 +21,7 @@ /* { dg-final { scan-assembler-times {\msrdi\M} 3 { target lp64 } } } */ /* { dg-final { scan-assembler-times "extsb" 2 } } */ /* { dg-final { scan-assembler-times {\mvspltb\M} 3 { target lp64 } } } */ -/* { dg-final { scan-assembler-times {\mrlwinm\M} 2 { target lp64} } } */ +/* { dg-final { scan-assembler-times {\mrlwinm\M} 4 { target lp64} } } */ /* multiple codegen variations for -m32. */ /* { dg-final { scan-assembler-times {\mrlwinm\M} 3 { target ilp32} } } */ Index: gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c === --- gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c (revision 270127) +++ gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c (working copy) @@ -7,14 +7,14 @@ // Targeting P8 (LE) and (BE). 6 tests total. // P8 LE constant: vspltw, mfvsrwz, (1:extsw/2:rldicl) -// P8 LE variables: rldicl, subfic, sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, (1:extsw) +// P8 LE variables:
[PATCH, rs6000] PR89736: New test pr87532-mc.c fails on compiler not defaulting to VSX
A recently added test was observed to fail when compiled without the -mvsx option. This patch adds -mvsx to the dg-options directive.

Was bootstrapped and regression tested on powerpc-linux (P7 big-endian, both -m32 and -m64). Was preapproved by seg...@gcc.gnu.org and has been merged with trunk.

gcc/testsuite/ChangeLog:

2019-03-19  Kelvin Nilsen

        PR target/89736
        * gcc.target/powerpc/pr87532-mc.c: Modify dejagnu directives to
        restrict this test to vsx targets.

Index: gcc/testsuite/gcc.target/powerpc/pr87532-mc.c
===
--- gcc/testsuite/gcc.target/powerpc/pr87532-mc.c (revision 269782)
+++ gcc/testsuite/gcc.target/powerpc/pr87532-mc.c (working copy)
@@ -1,8 +1,8 @@
 /* { dg-do run { target int128 } } */
-/* { dg-require-effective-target vmx_hw } */
-/* { dg-options "-maltivec -O2" } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-mvsx -O2" } */
 
-/* This test should run the same on any target that supports altivec/dfp
+/* This test should run the same on any target that supports vsx
    instructions.  Intentionally not specifying cpu in order to test
    all code generation paths.  */
[PATCH v2, rs6000] PR87532: Bad Results from vec_extract(unsigned char, foo) dependent upon function inline
An initial draft patch was distributed on 3/8/19. Thanks Segher for careful review and detailed feedback. This second draft patch differs from the first in the following regards:

1. Simplified dg directives in the new test cases:
   a) Removed { target { powerpc*-*-* } } from dg-do run directives because this is redundant with powerpc.exp
   b) Removed { dg-skip-if "" { powerpc*-*-darwin* } } directives because this is redundant with requiring vsx or altivec
   c) Changed effective target requirement from dfp_hw to vmx_hw
   d) Required { target { int128 } } instead of lp64 for tests that require int128 support
   e) Removed dg-skip-if for -mcpu= overrides, because these tests are not setting cpu
   f) Removed "-save-temps -dp -g" from the dg-options directive on certain tests

2. Corrected certain __asm__ statements to require the "v" output constraint rather than the "wa" output constraint (when compiling with -maltivec)

3. In rs6000-c.c, made modular computation of constant selector expression unconditional

4. In rs6000.c:
   a) Changed computation of bits_in_element to use GET_MODE_INNER macro.
   b) Changed error message in case of selector expression overflow to not make reference to HOST_WIDE_INT

This problem report, though initially motivated by differences in behavior between constant and non-constant selector arguments, uncovered a number of inconsistencies in the implementation of vec_extract. This patch provides several fixes to make handling of constant selector expressions the same as the handling of non-constant selector expressions.

In the process of testing, it was observed that certain existing regression tests were looking for the wrong instructions to be emitted and those tests have been updated.

This has bootstrapped and tested without regressions on powerpc64le-unknown-linux (both P8 and P9) and on powerpc-linux (P7 big-endian, with both -m32 and -m64 target options).

Is this ok for trunk?
gcc/ChangeLog:

2019-03-13  Kelvin Nilsen

        PR target/87532
        * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
        When handling vec_extract, use modular arithmetic to allow
        constant selectors greater than vector length.
        * config/rs6000/rs6000.c (rs6000_expand_vector_extract): Allow
        V1TImode vectors to have constant selector values greater than 0.
        Use modular arithmetic to compute vector index.
        (rs6000_split_vec_extract_var): Use modular arithmetic to compute
        index for in-memory vectors.  Correct code generation for
        in-register vectors.
        (altivec_expand_vec_ext_builtin): Use modular arithmetic to
        compute index.

gcc/testsuite/ChangeLog:

2019-03-13  Kelvin Nilsen

        PR target/87532
        * gcc.target/powerpc/fold-vec-extract-char.p8.c: Modify expected
        instruction selection.
        * gcc.target/powerpc/fold-vec-extract-int.p8.c: Likewise.
        * gcc.target/powerpc/fold-vec-extract-short.p8.c: Likewise.
        * gcc.target/powerpc/pr87532-mc.c: New test.
        * gcc.target/powerpc/pr87532.c: New test.
        * gcc.target/powerpc/vec-extract-v16qiu-v2.h: New test.
        * gcc.target/powerpc/vec-extract-v16qiu-v2a.c: New test.
        * gcc.target/powerpc/vec-extract-v16qiu-v2b.c: New test.
        * gcc.target/powerpc/vsx-builtin-10a.c: New test.
        * gcc.target/powerpc/vsx-builtin-10b.c: New test.
        * gcc.target/powerpc/vsx-builtin-11a.c: New test.
        * gcc.target/powerpc/vsx-builtin-11b.c: New test.
        * gcc.target/powerpc/vsx-builtin-12a.c: New test.
        * gcc.target/powerpc/vsx-builtin-12b.c: New test.
        * gcc.target/powerpc/vsx-builtin-13a.c: New test.
        * gcc.target/powerpc/vsx-builtin-13b.c: New test.
        * gcc.target/powerpc/vsx-builtin-14a.c: New test.
        * gcc.target/powerpc/vsx-builtin-14b.c: New test.
        * gcc.target/powerpc/vsx-builtin-15a.c: New test.
        * gcc.target/powerpc/vsx-builtin-15b.c: New test.
        * gcc.target/powerpc/vsx-builtin-16a.c: New test.
        * gcc.target/powerpc/vsx-builtin-16b.c: New test.
        * gcc.target/powerpc/vsx-builtin-17a.c: New test.
        * gcc.target/powerpc/vsx-builtin-17b.c: New test.
* gcc.target/powerpc/vsx-builtin-18a.c: New test. * gcc.target/powerpc/vsx-builtin-18b.c: New test. * gcc.target/powerpc/vsx-builtin-19a.c: New test. * gcc.target/powerpc/vsx-builtin-19b.c: New test. * gcc.target/powerpc/vsx-builtin-20a.c: New test. * gcc.target/powerpc/vsx-builtin-20b.c: New test. * gcc.target/powerpc/vsx-builtin-9a.c: New test. * gcc.target/powerpc/vsx-builtin-9b.c: New test. Index: gcc/config/rs6000/rs6000-c.c === --- gcc/config/rs6000/rs6000-c.c(revision 269492) +++ gcc/config/rs6000/rs6000-c.c
[PATCH, rs6000] PR87532: Bad Results from vec_extract(unsigned char, foo) dependent upon function inline
This problem report, though initially motivated by differences in behavior between constant and non-constant selector arguments, uncovered a number of inconsistencies in the implementation of vec_extract. This patch provides several fixes to make handling of constant selector expressions the same as the handling of non-constant selector expressions.

In the process of testing, it was observed that certain existing regression tests were looking for the wrong instructions to be emitted and those tests have been updated.

This has bootstrapped and tested without regressions on powerpc64le-unknown-linux (both P8 and P9) and on powerpc-linux (P7 big-endian, with both -m32 and -m64 target options).

Is this ok for trunk?

gcc/ChangeLog:

2019-03-08  Kelvin Nilsen

        PR target/87532
        * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
        When handling vec_extract, use modular arithmetic to allow
        constant selectors greater than vector length.
        * config/rs6000/rs6000.c (rs6000_expand_vector_extract): Allow
        V1TImode vectors to have constant selector values greater than 0.
        Use modular arithmetic to compute vector index.
        (rs6000_split_vec_extract_var): Use modular arithmetic to compute
        index for in-memory vectors.  Correct code generation for
        in-register vectors.
        (altivec_expand_vec_ext_builtin): Use modular arithmetic to
        compute index.

gcc/testsuite/ChangeLog:

2019-03-08  Kelvin Nilsen

        PR target/87532
        * gcc.target/powerpc/vsx-builtin-10a.c: New test.
        * gcc.target/powerpc/vsx-builtin-20a.c: New test.
        * gcc.target/powerpc/vsx-builtin-11b.c: New test.
        * gcc.target/powerpc/vsx-builtin-9b.c: New test.
        * gcc.target/powerpc/vsx-builtin-12a.c: New test.
        * gcc.target/powerpc/vsx-builtin-13b.c: New test.
        * gcc.target/powerpc/vsx-builtin-14a.c: New test.
        * gcc.target/powerpc/vec-extract-v16qiu-v2a.c: New test.
        * gcc.target/powerpc/vsx-builtin-15b.c: New test.
        * gcc.target/powerpc/vsx-builtin-16a.c: New test.
        * gcc.target/powerpc/vsx-builtin-17b.c: New test.
* gcc.target/powerpc/vsx-builtin-18a.c: New test. * gcc.target/powerpc/pr87532-mc.c: New test. * gcc.target/powerpc/vsx-builtin-19b.c: New test. * gcc.target/powerpc/vsx-builtin-10b.c: New test. * gcc.target/powerpc/vsx-builtin-11a.c: New test. * gcc.target/powerpc/vsx-builtin-9a.c: New test. * gcc.target/powerpc/vsx-builtin-20b.c: New test. * gcc.target/powerpc/vsx-builtin-12b.c: New test. * gcc.target/powerpc/vsx-builtin-13a.c: New test. * gcc.target/powerpc/vsx-builtin-14b.c: New test. * gcc.target/powerpc/vsx-builtin-15a.c: New test. * gcc.target/powerpc/vec-extract-v16qiu-v2b.c: New test. * gcc.target/powerpc/pr87532.c: New test. * gcc.target/powerpc/vsx-builtin-16b.c: New test. * gcc.target/powerpc/vec-extract-v16qiu-v2.h: New test. * gcc.target/powerpc/vsx-builtin-17a.c: New test. * gcc.target/powerpc/vsx-builtin-18b.c: New test. * gcc.target/powerpc/vsx-builtin-19a.c: New test. * gcc.target/powerpc/fold-vec-extract-char.p8.c: Modify expected instruction selection. * gcc.target/powerpc/fold-vec-extract-short.p8.c: Likewise. * gcc.target/powerpc/fold-vec-extract-int.p8.c: Likewise. Index: gcc/testsuite/gcc.target/powerpc/vsx-builtin-10a.c === --- gcc/testsuite/gcc.target/powerpc/vsx-builtin-10a.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/vsx-builtin-10a.c (revision 0) @@ -0,0 +1,157 @@ +/* { dg-do run { target { powerpc*-*-* } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ +/* { dg-require-effective-target dfp_hw } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { } } */ +/* { dg-options "-maltivec" } */ + +/* This test should run the same on any target that supports altivec/dfp + instructions. Intentionally not specifying cpu in order to test + all code generation paths. 
*/ + +#include + +extern void abort (void); + +#define CONST0 (0) +#define CONST1 (1) +#define CONST2 (2) +#define CONST3 (3) +#define CONST4 (4) +#define CONST5 (5) +#define CONST6 (6) +#define CONST7 (7) + + +/* Test that indices > length of vector are applied modulo the vector + length. */ + +/* Test for vector residing in register. */ +short s3 (vector short v) +{ + return __builtin_vec_ext_v8hi (v, 3); +} + +short s7 (vector short v) +{ + return __builtin_vec_ext_v8hi (v, 7); +} + +short s21 (vector short v) +{ + return __builtin_vec_ext_v8hi (v, 21); +} + +short s30 (vector short v) +{ + return __builtin_vec_ext_v8hi (v, 30); +} + +/* Test for vector residing in memory. */
[PATCH, rs6000] Correct dg directives on recently added vec-extract tests
Overnight regression testing revealed a portability problem with several recently installed tests. The tests were observed to fail on a power7 test platform. The tests, which are intended to execute, are compiled with -mcpu=power8. Thus, they require power 8 hardware. I have regression tested this on powerpc64-linux (P7 big-endian, both -m32 and -m64), both 32-bit and 64-bit. Is this ok for trunk and for various backports to which the original patch is to be directed? gcc/testsuite/ChangeLog: 2019-02-01 Kelvin Nilsen * gcc.target/powerpc/vec-extract-slong-1.c: Require p8 execution hardware. * gcc.target/powerpc/vec-extract-schar-1.c: Likewise. * gcc.target/powerpc/vec-extract-sint128-1.c: Likewise. * gcc.target/powerpc/vec-extract-sshort-1.c: Likewise. * gcc.target/powerpc/vec-extract-ulong-1.c: Likewise. * gcc.target/powerpc/vec-extract-uchar-1.c: Likewise. * gcc.target/powerpc/vec-extract-sint-1.c: Likewise. * gcc.target/powerpc/vec-extract-uint128-1.c: Likewise. * gcc.target/powerpc/vec-extract-ushort-1.c: Likewise. * gcc.target/powerpc/vec-extract-uint-1.c: Likewise. Index: gcc/testsuite/gcc.target/powerpc/vec-extract-slong-1.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-slong-1.c (revision 268424) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-slong-1.c (working copy) @@ -2,7 +2,7 @@ signed longs remains signed. */ /* { dg-do run } */ /* { dg-options "-ansi -mcpu=power8 " } */ -/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-require-effective-target p8vector_hw } */ /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ #include Index: gcc/testsuite/gcc.target/powerpc/vec-extract-schar-1.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-schar-1.c (revision 268424) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-schar-1.c (working copy) @@ -2,7 +2,7 @@ signed chars remains signed. 
*/ /* { dg-do run } */ /* { dg-options "-ansi -mcpu=power8 " } */ -/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-require-effective-target p8vector_hw } */ /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ #include Index: gcc/testsuite/gcc.target/powerpc/vec-extract-sint128-1.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-sint128-1.c(revision 268424) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-sint128-1.c(working copy) @@ -2,7 +2,7 @@ signed __int128s remains signed. */ /* { dg-do run } */ /* { dg-options "-ansi -mcpu=power8 " } */ -/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-require-effective-target p8vector_hw } */ /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ #include Index: gcc/testsuite/gcc.target/powerpc/vec-extract-sshort-1.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-sshort-1.c (revision 268424) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-sshort-1.c (working copy) @@ -2,7 +2,7 @@ signed shorts remains signed. */ /* { dg-do run } */ /* { dg-options "-ansi -mcpu=power8 " } */ -/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-require-effective-target p8vector_hw } */ /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ #include Index: gcc/testsuite/gcc.target/powerpc/vec-extract-ulong-1.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-ulong-1.c (revision 268424) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-ulong-1.c (working copy) @@ -2,7 +2,7 @@ unsigned longs remains unsigned. 
*/ /* { dg-do run } */ /* { dg-options "-ansi -mcpu=power8 " } */ -/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-require-effective-target p8vector_hw } */ /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ #include Index: gcc/testsuite/gcc.target/powerpc/vec-extract-uchar-1.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-uchar-1.c (revision 268424) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-uchar-1.c (working copy) @@ -2,7 +2,7 @@ unsigned chars remains unsigned. */ /* { dg-do run } */ /* { dg-options "-ansi -mcpu=power8 " } */ -/* { dg-requir
[PATCH, rs6000] Fix invalid type returned from builtin vec_extract
An error in the type returned from the built-in vec_extract function was recently reported, as represented in the following sample program:

#include <stdio.h>
#include <altivec.h>

int main()
{
  unsigned char uc = 0xf6;
  printf("explicit cast: %x\n", (int)uc);
  vector unsigned char v = vec_splats((unsigned char)0xf6);
  printf("cast from vec_extract(): %x\n", (int)vec_extract(v, 0));
  return 0;
}

When compiled with the current trunk, the output of running this program is:

$ ./a.out
explicit cast: f6
cast from vec_extract(): fff6

The fix is to coerce the result of vec_extract so that it matches the type of the array element supplied as its first argument.

I have built and regression tested this patch on powerpc64le-unknown-linux with no regressions. Is this ok for trunk?

gcc/ChangeLog:

2019-01-28  Kelvin Nilsen

        * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
        Change handling of ALTIVEC_BUILTIN_VEC_EXTRACT.  Coerce result to
        type of vector element when vec_extract is implemented by direct
        move.

gcc/testsuite/ChangeLog:

2019-01-28  Kelvin Nilsen

        * gcc.target/powerpc/vec-extract-schar-1.c: New test.
        * gcc.target/powerpc/vec-extract-sint-1.c: New test.
        * gcc.target/powerpc/vec-extract-sint128-1.c: New test.
        * gcc.target/powerpc/vec-extract-slong-1.c: New test.
        * gcc.target/powerpc/vec-extract-sshort-1.c: New test.
        * gcc.target/powerpc/vec-extract-uchar-1.c: New test.
        * gcc.target/powerpc/vec-extract-uint-1.c: New test.
        * gcc.target/powerpc/vec-extract-uint128-1.c: New test.
        * gcc.target/powerpc/vec-extract-ulong-1.c: New test.
        * gcc.target/powerpc/vec-extract-ushort-1.c: New test.

Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c (revision 268196)
+++ gcc/config/rs6000/rs6000-c.c (working copy)
@@ -6645,7 +6645,13 @@
 	}
 
       if (call)
-	return build_call_expr (call, 2, arg1, arg2);
+	{
+	  tree result = build_call_expr (call, 2, arg1, arg2);
+	  /* Coerce the result to vector element type.  May be no-op.
*/ + arg1_inner_type = TREE_TYPE (arg1_type); + result = fold_convert (arg1_inner_type, result); + return result; + } } /* Build *(((arg1_inner_type*)&(vector type){arg1})+arg2). */ Index: gcc/testsuite/gcc.target/powerpc/vec-extract-schar-1.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-schar-1.c (nonexistent) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-schar-1.c (working copy) @@ -0,0 +1,27 @@ +/* Test to verify that the vec_extract from a vector of + signed chars remains signed. */ +/* { dg-do run } */ +/* { dg-options "-ansi -mcpu=power8 " } */ + +#include +#include +#include + +int test1(signed char sc) { + int sce; + + vector signed char v = vec_splats(sc); + sce = vec_extract(v,0); + + if (sce != sc) +abort(); + return 0; +} + +int main() +{ + test1 (0xf6); + test1 (0x76); + test1 (0x06); + return 0; +} Index: gcc/testsuite/gcc.target/powerpc/vec-extract-sint-1.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-sint-1.c (nonexistent) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-sint-1.c (working copy) @@ -0,0 +1,27 @@ +/* Test to verify that the vec_extract from a vector of + signed ints remains signed. */ +/* { dg-do run } */ +/* { dg-options "-ansi -mcpu=power8 " } */ + +#include +#include +#include + +int test1(signed int si) { + long long int sie; + + vector signed int v = vec_splats(si); + sie = vec_extract(v,0); + + if (sie != si) +abort(); + return 0; +} + +int main() +{ + test1 (0xf600); + test1 (0x7600); + test1 (0x0600); + return 0; +} Index: gcc/testsuite/gcc.target/powerpc/vec-extract-sint128-1.c === --- gcc/testsuite/gcc.target/powerpc/vec-extract-sint128-1.c(nonexistent) +++ gcc/testsuite/gcc.target/powerpc/vec-extract-sint128-1.c(working copy) @@ -0,0 +1,25 @@ +/* Test to verify that the vec_extract from a vector of + signed __int128s remains signed. 
*/ +/* { dg-do run } */ +/* { dg-options "-ansi -mcpu=power8 " } */ + +#include +#include +#include + +int test1(signed __int128 st) { + + vector signed long long int v = vec_splats(st); + + if (vec_extract (v, 0) > st) +abort(); + return 0; +} + +int main() +{ + test1 (((__int128) 0xf600LL) << 64); + test1 (((__int128) 0x7600LL) << 64); + test1 (((__int128) 0x0600LL) << 64); + return 0; +} Index: gcc/testsuite/gcc.target/powerpc/vec-extract-slong-1.c ==
[PATCH, rs6000] Replace X-form addressing with D-form addressing in new pass for Power9
This patch is a refinement of a patch first submitted to this list on Nov. 10, 2018. This new patch incorporates improvements suggested by seg...@gcc.gnu.org. Two regressions observed at the time this patch was previously distributed have been resolved as described here:

https://sourceware.org/bugzilla/show_bug.cgi?id=23937

New D-form instructions available on Power9 introduce new code generation options that result in more efficient execution. This new pass scans existing rtl expressions and replaces them with rtl expressions that favor selection of the D-form instructions in contexts for which the D-form instructions are preferred. The new pass runs after the RTL loop optimizations since loop unrolling often introduces opportunities for beneficial replacements of X-form addressing instructions.

I have built and regression tested this patch on powerpc64le-unknown-linux (Power9) target with no regressions. Is this ok for trunk?

gcc/ChangeLog:

2018-12-13  Kelvin Nilsen

        * config/rs6000/rs6000-p9dform.c: New file.
        * config/rs6000/rs6000-passes.def: Add pass_insert_dform after
        pass_loop2.
        * config/rs6000/rs6000-protos.h
        (rs6000_target_supports_dform_offset_p): New prototype.
        (make_pass_insert_dform): New prototype.
        * config/rs6000/rs6000.c (rs6000_target_supports_dform_offset_p):
        New function.
        * config/rs6000/t-rs6000: Add entry to compile rs6000-p9dform.c.
        * config.gcc: Add entry to link new object file rs6000-p9dform.o.

gcc/testsuite/ChangeLog:

2018-12-13  Kelvin Nilsen

        * gcc.target/powerpc/p9-dform-0.c: New test.
        * gcc.target/powerpc/p9-dform-1.c: New test.

Index: gcc/config/rs6000/rs6000-p9dform.c
===
--- gcc/config/rs6000/rs6000-p9dform.c (nonexistent)
+++ gcc/config/rs6000/rs6000-p9dform.c (working copy)
@@ -0,0 +1,1487 @@
+/* Subroutines used to transform array subscripting expressions into
+   forms that are more amenable to d-form instruction selection for p9
+   little-endian VSX code.
+   Copyright (C) 1991-2018 Free Software Foundation, Inc.
+ + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published + by the Free Software Foundation; either version 3, or (at your + option) any later version. + + GCC is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public + License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + <http://www.gnu.org/licenses/>. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "backend.h" +#include "rtl.h" +#include "tree.h" +#include "memmodel.h" +#include "df.h" +#include "tm_p.h" +#include "ira.h" +#include "print-tree.h" +#include "varasm.h" +#include "explow.h" +#include "expr.h" +#include "output.h" +#include "tree-pass.h" +#include "rtx-vector-builder.h" +#include "cfgloop.h" + +#include "insn-config.h" +#include "recog.h" + +#include "print-rtl.h" +#include "tree-pretty-print.h" + +#include "genrtl.h" + +/* This pass transforms array indexing expressions from a form that + favors selection of X-form instructions into a form that favors + selection of D-form instructions. + + Showing favor for D-form instructions is especially important when + targeting Power9, as the Power9 architecture added a number of new + D-form instruction capabilities. 
+ + Consider, for example, the following loop, excerpted from an actual + program: + +double sacc, x[], y[], z[]; +sacc = 0.00; +for (unsigned long long int i = 0; i < N; i++) { + z[i] = x[i] * y[i]; + sacc += z[i]; +} + + Compile this program with the following gcc options which enable both + vectorization and loop unrolling: +-m64 -fdump-rtl-all-details -mcpu=power9 -mtune=power9 -funroll-loops -O3 + + Without this pass, this loop is represented by the following: + + lxvx: 16 + addi: 8 + xvmuldp:8 + stxvx: 8 + fmr:8 + xxpermdi: 8 + fadd: 16 + bdnz: 1 + ___ + total: 73 instructions + +.L3: + lxvx 0,29,11 + lxvx 12,30,11 + addi 12,11,16 + addi 0,11,48 + addi 5,11,64 + addi 9,11,32 + addi 6,11,80 + addi 7,11,96 + addi 8,11,112 + lxvx 2,29,12 + lxvx 3,30,12 +
[RFC][PATCH, rs6000] Replace X-form addressing with D-form addressing in new pass for Power9
New D-form instructions available on Power9 introduce new code generation options that result in more efficient execution. This new pass scans existing rtl expressions and replaces them with rtl expressions that favor selection of the D-form instructions in contexts for which the D-form instructions are preferred.

I have built and regression tested this patch on a powerpc64le-unknown-linux (Power9) target with only two regressions. Both regressions relate to resolution of ifuncs, and I have determined that the toc pointer upon entry into the resolver functions is not valid. I have not yet determined why this is happening, though I have observed that the same problem seems to occur with certain other versions of the compiler that predate my patched trunk.

The two failures are:

FAIL: gcc.dg/attr-ifunc-4.c execution test
FAIL: gcc.dg/ipa/ipa-pta-19.c execution test

I invite comments and suggestions regarding this draft patch at this time.

gcc/ChangeLog:

2018-11-10  Kelvin Nilsen

    * config.gcc: Add entry to compile new object rs6000-p9indexing.o.
    * config/rs6000/rs6000-passes.def: Add pass_fix_indexing after pass_loop2.
    * config/rs6000/t-rs6000: Add entry to compile rs6000-p9indexing.c.
    * config/rs6000/rs6000.c (rs6000_target_supports_dform_offset_p): New function.
    * config/rs6000/rs6000-protos.h (rs6000_target_supports_dform_offset_p): New prototype.
    (make_pass_fix_indexing): New prototype.
    * config/rs6000/rs6000-p9indexing.c: New file.
Index: gcc/config/rs6000/t-rs6000 === --- gcc/config/rs6000/t-rs6000 (revision 263589) +++ gcc/config/rs6000/t-rs6000 (working copy) @@ -35,6 +35,10 @@ $(COMPILE) $< $(POSTCOMPILE) +rs6000-p9indexing.o: $(srcdir)/config/rs6000/rs6000-p9indexing.c + $(COMPILE) $< + $(POSTCOMPILE) + $(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \ $(srcdir)/config/rs6000/rs6000-cpus.def $(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \ Index: gcc/config/rs6000/rs6000-protos.h === --- gcc/config/rs6000/rs6000-protos.h (revision 263589) +++ gcc/config/rs6000/rs6000-protos.h (working copy) @@ -47,6 +47,8 @@ extern bool legitimate_indirect_address_p (rtx, int); extern bool legitimate_indexed_address_p (rtx, int); extern bool avoiding_indexed_address_p (machine_mode); +extern bool rs6000_target_supports_dform_offset_p (bool, machine_mode, + HOST_WIDE_INT); extern rtx rs6000_got_register (rtx); extern rtx find_addr_reg (rtx); @@ -244,6 +246,8 @@ class rtl_opt_pass; extern rtl_opt_pass *make_pass_analyze_swaps (gcc::context *); +extern rtl_opt_pass *make_pass_fix_indexing (gcc::context *); + extern bool rs6000_sum_of_two_registers_p (const_rtx expr); extern bool rs6000_quadword_masked_address_p (const_rtx exp); extern rtx rs6000_gen_lvx (enum machine_mode, rtx, rtx); Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 263589) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -9263,6 +9263,169 @@ return ret; } +/* This function provides an approximation of which d-form addressing + expressions are valid on any given target configuration. This + approximation guides optimization choices. Secondary validation + of the addressing mode is performed before code generation. + + Return true iff target has instructions to perform a memory + operation at the specified BYTE_OFFSET from an address held + in a general purpose register. if IS_STORE is true, test for + availability of a store instruction. 
Otherwise, test for + availability of a load instruction. */ +bool +rs6000_target_supports_dform_offset_p (bool is_store __attribute__((unused)), + machine_mode mode, + HOST_WIDE_INT byte_offset) +{ + const int max_16bit_signed = (0x7fff); + const int min_16bit_signed = -1 - max_16bit_signed; + + /* available d-form instructions with P1 (the original Power architecture): + + lbz RT,D(RA) - load byte and zero d-form + lhz RT,D(RA) - load half word and zero d-form + lha RT,D(RA) - load half word algebraic d-form + lwz RT,D(RA) - load word and zero d-form + lfs FRT,D(RA) - load floating-point single d-form + lfd FRT,D(RA) - load floating-point double d-form + + stb RS,D(RA) - store byte d-form + sth RS,D(RA) - store half word d-form + stfs FRS,D(RA) - store floating point single d-form + stfd FRS,D(RA) - store floating point double d-form + */ + + /* available d-form instructions with PPC (prior to v2.00): + (option mpowerpc "existed in the past" but is now "always
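The two constants defined at the top of rs6000_target_supports_dform_offset_p suggest a signed 16-bit range test on the displacement. The following is only a minimal standalone sketch of such a test, not the code from the patch. The helper name dform_offset_fits and its required_alignment parameter are hypothetical; the alignment values reflect the Power ISA rule that DS-form displacements are multiples of 4 and DQ-form displacements are multiples of 16.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: return true if BYTE_OFFSET
   can be encoded as the displacement of a D-form style instruction.
   REQUIRED_ALIGNMENT is an illustrative parameter: 1 for plain D-form
   (e.g. lbz, lfd), 4 for DS-form (e.g. ld, std), 16 for DQ-form
   (e.g. lxv, stxv).  */
static bool
dform_offset_fits (int64_t byte_offset, int64_t required_alignment)
{
  const int64_t max_16bit_signed = 0x7fff;
  const int64_t min_16bit_signed = -1 - max_16bit_signed;

  /* The displacement field is a signed 16-bit quantity.  */
  if (byte_offset < min_16bit_signed || byte_offset > max_16bit_signed)
    return false;

  /* DS-form and DQ-form encodings drop the low-order bits, so the
     offset must be a multiple of the required alignment.  */
  return byte_offset % required_alignment == 0;
}
```

Under these assumptions, the offsets 16 through 112 that the addi instructions compute in the example loop would all be acceptable as DQ-form displacements, which is what allows the separate address computations to be folded into the memory instructions.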
Re: [PATCH, rs6000] Correct descriptions of __builtin_bcdadd* and __builtin_bcdsub* functions
My "consistency" check was against the implementation. On 8/2/18 11:38 AM, Segher Boessenkool wrote: > Hi Kelvin, > > On Wed, Aug 01, 2018 at 02:55:22PM -0500, Kelvin Nilsen wrote: >> Several errors were discovered in the descriptions of the __builtin_bcdadd, >> __builtin_bcdadd_lt, __builtin_bcdadd_eq, __builtin_bcdadd_gt, >> __builtin_bcdadd_ov, __builtin_bcdsub, __builtin_bcdsub_lt, >> __builtin_bcdsub_eq, __builtin_bcdsub_gt, and __builtin_bcdsub_ov functions. >> This patch corrects these documentation errors. > > What did you check this against? The ABI doc, or what is currently > implemented? Neither is very clear to me :-/ > > > Segher > >
[PATCH, rs6000] Correct descriptions of __builtin_bcdadd* and __builtin_bcdsub* functions
Several errors were discovered in the descriptions of the __builtin_bcdadd, __builtin_bcdadd_lt, __builtin_bcdadd_eq, __builtin_bcdadd_gt, __builtin_bcdadd_ov, __builtin_bcdsub, __builtin_bcdsub_lt, __builtin_bcdsub_eq, __builtin_bcdsub_gt, and __builtin_bcdsub_ov functions. This patch corrects these documentation errors. I have built the gcc.pdf file and reviewed the formatting, and all looks good. Is this ok for trunk? gcc/ChangeLog: 2018-08-01 Kelvin Nilsen * doc/extend.texi (PowerPC AltiVec Built-in Functions Available on ISA 2.07): Correct spelling of bcdsub to be __builtin_bcdsub. Add third argument of type "const signed char" to descriptions of __builtin_bcdadd, __builtin_bcdadd_lt, __builtin_bcdadd_eq, __builtin_bcdadd_gt, __builtin_bcdadd_ov, __builtin_bcdsub, __builtin_bcdsub_lt, __builtin_bcdsub_eq, __builtin_bcdsub_gt, __builtin_bcdsub_ov functions. Index: gcc/doc/extend.texi === --- gcc/doc/extend.texi (revision 263068) +++ gcc/doc/extend.texi (working copy) @@ -18383,16 +18383,16 @@ vector __uint128 vec_vsubcuq (vector __uint128, ve __int128 vec_vsubuqm (__int128, __int128); __uint128 vec_vsubuqm (__uint128, __uint128); -vector __int128 __builtin_bcdadd (vector __int128, vector __int128); -int __builtin_bcdadd_lt (vector __int128, vector __int128); -int __builtin_bcdadd_eq (vector __int128, vector __int128); -int __builtin_bcdadd_gt (vector __int128, vector __int128); -int __builtin_bcdadd_ov (vector __int128, vector __int128); -vector __int128 bcdsub (vector __int128, vector __int128); -int __builtin_bcdsub_lt (vector __int128, vector __int128); -int __builtin_bcdsub_eq (vector __int128, vector __int128); -int __builtin_bcdsub_gt (vector __int128, vector __int128); -int __builtin_bcdsub_ov (vector __int128, vector __int128); +vector __int128 __builtin_bcdadd (vector __int128, vector __int128, const signed char); +int __builtin_bcdadd_lt (vector __int128, vector __int128, const signed char); +int __builtin_bcdadd_eq (vector __int128, vector __int128, 
const signed char); +int __builtin_bcdadd_gt (vector __int128, vector __int128, const signed char); +int __builtin_bcdadd_ov (vector __int128, vector __int128, const signed char); +vector __int128 __builtin_bcdsub (vector __int128, vector __int128, const signed char); +int __builtin_bcdsub_lt (vector __int128, vector __int128, const signed char); +int __builtin_bcdsub_eq (vector __int128, vector __int128, const signed char); +int __builtin_bcdsub_gt (vector __int128, vector __int128, const signed char); +int __builtin_bcdsub_ov (vector __int128, vector __int128, const signed char); @end smallexample @node PowerPC AltiVec Built-in Functions Available on ISA 3.0
Re: Fwd: [PATCH, rs6000] Replace __uint128_t and __int128_t with __uint128 and __int128 in Power PC built-in documentation
Thanks for review and approval. To respond to your question about error messages: > > microdoc3.c:22:3: error: invalid parameter combination for AltiVec intrinsic > ‘__builtin_vec_vaddcuq’ >u1 = vec_vaddcuq (d2, d3); >^~ On 7/26/18 9:54 AM, Segher Boessenkool wrote: > On Thu, Jul 26, 2018 at 08:40:01AM -0500, Kelvin Nilsen wrote: >> To improve internal consistency and to improve consistency with published >> ABI documents, this patch replaces the __uint128_t type with __uint128 and >> replaces __int128_t with __int128. > >> Is this ok for trunk? > > Looks good, thanks! Most (all?) of these functions are not documented > in the ABI, but this is a step forward anyway. Okay for trunk. > > What do things like error messages involving these functions look like? > What types do those say? > > > Segher > >
[PATCH, rs6000] Replace __uint128_t and __int128_t with __uint128 and __int128 in Power PC built-in documentation
To improve internal consistency and to improve consistency with published ABI documents, this patch replaces the __uint128_t type with __uint128 and replaces __int128_t with __int128. I have built and regression tested this patch on powerpc64le-unknown-linux with no regressions. I have also built and reviewed the gcc.pdf file. Is this ok for trunk? gcc/ChangeLog: 2018-07-25 Kelvin Nilsen * doc/extend.texi (Basic PowerPC Built-in Functions Available on ISA 2.05): Replace __uint128_t with __uint128 and __int128_t with __int128 in built-in function prototypes. (PowerPC AltiVec Built-in Functions on ISA 2.07): Likewise. (PowerPC AltiVec Built-in Functions on ISA 3.0): Likewise. Index: gcc/doc/extend.texi === --- gcc/doc/extend.texi (revision 262977) +++ gcc/doc/extend.texi (working copy) @@ -15762,9 +15762,9 @@ long long __builtin_divde (long long, long long); unsigned long long __builtin_divdeu (unsigned long long, unsigned long long); int __builtin_divwe (int, int); unsigned int __builtin_divweu (unsigned int, unsigned int); -vector __int128_t __builtin_pack_vector_int128 (long long, long long); +vector __int128 __builtin_pack_vector_int128 (long long, long long); void __builtin_rs6000_speculation_barrier (void); -long long __builtin_unpack_vector_int128 (vector __int128_t, signed char); +long long __builtin_unpack_vector_int128 (vector __int128, signed char); @end smallexample Of these, the @code{__builtin_divde} and @code{__builtin_divdeu} functions @@ -18331,57 +18331,57 @@ vector unsigned long long vec_vupklsw (vector int) If the ISA 2.07 additions to the vector/scalar (power8-vector) instruction set are available, the following additional functions are available for 64-bit targets. 
New vector types -(@var{vector __int128_t} and @var{vector __uint128_t}) are available -to hold the @var{__int128_t} and @var{__uint128_t} types to use these +(@var{vector __int128} and @var{vector __uint128}) are available +to hold the @var{__int128} and @var{__uint128} types to use these builtins. The normal vector extract, and set operations work on -@var{vector __int128_t} and @var{vector __uint128_t} types, +@var{vector __int128} and @var{vector __uint128} types, but the index value must be 0. @smallexample -vector __int128_t vec_vaddcuq (vector __int128_t, vector __int128_t); -vector __uint128_t vec_vaddcuq (vector __uint128_t, vector __uint128_t); +vector __int128 vec_vaddcuq (vector __int128, vector __int128); +vector __uint128 vec_vaddcuq (vector __uint128, vector __uint128); -vector __int128_t vec_vadduqm (vector __int128_t, vector __int128_t); -vector __uint128_t vec_vadduqm (vector __uint128_t, vector __uint128_t); +vector __int128 vec_vadduqm (vector __int128, vector __int128); +vector __uint128 vec_vadduqm (vector __uint128, vector __uint128); -vector __int128_t vec_vaddecuq (vector __int128_t, vector __int128_t, -vector __int128_t); -vector __uint128_t vec_vaddecuq (vector __uint128_t, vector __uint128_t, - vector __uint128_t); +vector __int128 vec_vaddecuq (vector __int128, vector __int128, +vector __int128); +vector __uint128 vec_vaddecuq (vector __uint128, vector __uint128, + vector __uint128); -vector __int128_t vec_vaddeuqm (vector __int128_t, vector __int128_t, -vector __int128_t); -vector __uint128_t vec_vaddeuqm (vector __uint128_t, vector __uint128_t, - vector __uint128_t); +vector __int128 vec_vaddeuqm (vector __int128, vector __int128, +vector __int128); +vector __uint128 vec_vaddeuqm (vector __uint128, vector __uint128, + vector __uint128); -vector __int128_t vec_vsubecuq (vector __int128_t, vector __int128_t, -vector __int128_t); -vector __uint128_t vec_vsubecuq (vector __uint128_t, vector __uint128_t, - vector __uint128_t); +vector 
__int128 vec_vsubecuq (vector __int128, vector __int128, +vector __int128); +vector __uint128 vec_vsubecuq (vector __uint128, vector __uint128, + vector __uint128); -vector __int128_t vec_vsubeuqm (vector __int128_t, vector __int128_t, -vector __int128_t); -vector __uint128_t vec_vsubeuqm (vector __uint128_t, vector __uint128_t, - vector __uint128_t); +vector __int128 vec_vsubeuqm (vector __int128, vector __int128, +vector __int128); +vector __uint128 vec_vsubeuqm (vector __uint128, vector __uint128, + vector __uint128); -vector __int128_t vec_vsubcuq (vector __int128_t, vector __int128_t); -vector __uint128_t
[PATCH, rs6000] Sort Altivec/VSX built-in functions into subsubsections according to configuration requirements
The many PowerPC built-in functions (intrinsics) that are enabled by including each have different configuration requirements. To simplify the description of the requirements, this patch sorts these functions into different subsubsections. A subsequent patch will add and remove various functions from each section to correct incompatibilities between what is implemented and what is documented. I have built and regression tested this patch on powerpc64le-unknown-linux and on powerpc-linux (P8 big-endian) with no regressions. I have also built and reviewed the gcc.pdf file. Is this ok for trunk? gcc/ChangeLog: 2018-07-17 Kelvin Nilsen * doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions): Corrected spelling of this subsection. Moved some material to new subsubsections "PowerPC AltiVec Built-in Functions on ISA 2.06" and "PowerPC AltiVec Built-in Functions on ISA 2.07". (PowerPC Altivec Built-in Functions on ISA 2.05): New subsubsection. (PowerPC Altivec Built-in Functions on ISA 2.06): Likewise. (PowerPC Altivec Built-in Functions on ISA 2.07): Likewise. (PowerPC Altivec Built-in Functions on ISA 3.0): Likewise. Index: gcc/doc/extend.texi === --- gcc/doc/extend.texi (revision 262747) +++ gcc/doc/extend.texi (working copy) @@ -15941,10 +15941,8 @@ The @code{__builtin_dfp_dtstsfi_ov_dd} and require that the type of the @code{value} argument be @code{__Decimal64} and @code{__Decimal128} respectively. - - @node PowerPC AltiVec/VSX Built-in Functions -@subsection PowerPC AltiVec Built-in Functions +@subsection PowerPC AltiVec/VSX Built-in Functions GCC provides an interface for the PowerPC family of processors to access the AltiVec operations described in Motorola's AltiVec Programming @@ -15969,19 +15967,6 @@ vector bool int vector float @end smallexample -If @option{-mvsx} is used the following additional vector types are -implemented. 
- -@smallexample -vector unsigned long -vector signed long -vector double -@end smallexample - -The long types are only implemented for 64-bit code generation, and -the long type is only used in the floating point/integer conversion -instructions. - GCC's implementation of the high-level language interface available from C and C++ code differs from Motorola's documentation in several ways. @@ -16039,6 +16024,16 @@ the interfaces described therein. However, histor additional interfaces for access to vector instructions. These are briefly described below. +@menu +* PowerPC AltiVec Built-in Functions on ISA 2.05:: +* PowerPC AltiVec Built-in Functions Available on ISA 2.06:: +* PowerPC AltiVec Built-in Functions Available on ISA 2.07:: +* PowerPC AltiVec Built-in Functions Available on ISA 3.0:: +@end menu + +@node PowerPC AltiVec Built-in Functions on ISA 2.05 +@subsubsection PowerPC AltiVec Built-in Functions on ISA 2.05 + The following interfaces are supported for the generic and specific AltiVec operations and the AltiVec predicates. In cases where there is a direct mapping between generic and specific operations, only the @@ -17581,132 +17576,152 @@ vector unsigned char vec_xor (vector unsigned char vector unsigned char vec_xor (vector unsigned char, vector unsigned char); @end smallexample -The following built-in functions which are currently documented in -this section are not alphabetized with other built-in functions of -this section because they belong in different sections. +@node PowerPC AltiVec Built-in Functions Available on ISA 2.06 +@subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.06 +The AltiVec built-in functions described in this section are +available on the PowerPC family of processors starting with ISA 2.06 +or later. These are normally enabled by adding @option{-mvsx} to the +command line. + +When @option{-mvsx} is used, the following additional vector types are +implemented. 
+ @smallexample -/* __int128, long long, and double arguments and results require -mvsx. */ +vector unsigned __int128 +vector signed __int128 +vector unsigned long long int +vector signed long long int +vector double +@end smallexample + +The long long types are only implemented for 64-bit code generation. + +@smallexample + vector bool long long vec_and (vector bool long long int, vector bool long long); + vector double vec_ctf (vector unsigned long, const int); vector double vec_ctf (vector signed long, const int); + vector signed long vec_cts (vector double, const int); + vector unsigned long vec_ctu (vector double, const int); + void vec_dst (const unsigned long *, int, const int); void vec_dst (const long *, int, const int); + void vec_dststt (const unsigned long *, int, const int); void vec_dststt (const long *, int, const int); + void vec_dstt (const unsigned long *, int, const int); void vec_dstt (const long *, int, const int); + vector un
Re: [RFC] Induction variable candidates not sufficiently general
Thanks for looking at this for me. In simplifying the test case for a bug report, I've narrowed the "problem" to integer overflow considerations. My len variable is declared int, and the target has 64-bit pointers. I'm gathering that the "manual transformation" I quoted below is not considered "equivalent" to the original source code due to different integer overflow behaviors. If I redeclare len to be unsigned long long, then I automatically get the optimizations that I was originally expecting. I suppose this is really NOT a bug? Is there a compiler optimization flag that allows the optimizer to ignore array index integer overflow in considering legal optimizations? On 7/13/18 9:14 PM, Bin.Cheng wrote: > On Fri, Jul 13, 2018 at 6:04 AM, Kelvin Nilsen wrote: >> A somewhat old "issue report" pointed me to the code generated for a 4-fold >> manually unrolled version of the following loop: >> >>> while (++len != len_limit) /* this is loop */ >>> if (pb[len] != cur[len]) >>> break; >> >> As unrolled, the loop appears as: >> >>> while (++len != len_limit) /* this is loop */ { >>> if (pb[len] != cur[len]) >>> break; >>> if (++len == len_limit) /* unrolled 2nd iteration */ >>> break; >>> if (pb[len] != cur[len]) >>> break; >>> if (++len == len_limit) /* unrolled 3rd iteration */ >>> break; >>> if (pb[len] != cur[len]) >>> break; >>> if (++len == len_limit) /* unrolled 4th iteration */ >>> break; >>> if (pb[len] != cur[len]) >>> break; >>> } >> >> In examining the behavior of tree-ssa-loop-ivopts.c, I've discovered the >> only induction variable candidates that are being considered are all forms >> of the len variable. We are not considering any induction variables to >> represent the address expressions &pb[len] and &cur[len]. 
>> >> I rewrote the source code for this loop to make the addressing expressions >> more explicit, as in the following: >> >>> cur++; >>> while (++pb != last_pb) /* this is loop */ { >>> if (*pb != *cur) >>> break; >>> ++cur; >>> if (++pb == last_pb) /* unrolled 2nd iteration */ >>> break; >>> if (*pb != *cur) >>> break; >>> ++cur; >>> if (++pb == last_pb) /* unrolled 3rd iteration */ >>> break; >>> if (*pb != *cur) >>> break; >>> ++cur; >>> if (++pb == last_pb) /* unrolled 4th iteration */ >>> break; >>> if (*pb != *cur) >>> break; >>> ++cur; >>> } >> >> Now, gcc does a better job of identifying the "address expression induction >> variables". This version of the loop runs about 10% faster than the >> original on my target architecture. >> >> This would seem to be a textbook pattern for the induction variable >> analysis. Does anyone have any thoughts on the best way to add these >> candidates to the set of induction variables that are considered by >> tree-ssa-loop-ivopts.c? >> >> Thanks in advance for any suggestions. >> > Hi, > Could you please file a bug with your original slow test code > attached? I tried to construct meaningful test case from your code > snippet but not successful. There is difference in generated > assembly, but it's not that fundamental. So a bug with preprocessed > test would be high appreciated. > I think there are two potential issues in cost computation for such > case: invariant expression and iv uses outside of loop handled as > inside uses. > > Thanks, > bin > > #include #include int bt_skip_func(const __uint64_t len_limit, const __uint8_t *cur, long long int delta, __uint64_t len) { const __uint8_t *pb = cur - delta; while (++len != len_limit) { if (pb[len] != cur[len]) break; if (++len == len_limit) break; if (pb[len] != cur[len]) break; if (++len == len_limit) break; if (pb[len] != cur[len]) break; if (++len == len_limit) break; if (pb[len] != cur[len]) break; } return len; } int main (int argc,
[RFC] Induction variable candidates not sufficiently general
A somewhat old "issue report" pointed me to the code generated for a 4-fold manually unrolled version of the following loop:

> while (++len != len_limit) /* this is loop */
>   if (pb[len] != cur[len])
>     break;

As unrolled, the loop appears as:

> while (++len != len_limit) /* this is loop */ {
>   if (pb[len] != cur[len])
>     break;
>   if (++len == len_limit) /* unrolled 2nd iteration */
>     break;
>   if (pb[len] != cur[len])
>     break;
>   if (++len == len_limit) /* unrolled 3rd iteration */
>     break;
>   if (pb[len] != cur[len])
>     break;
>   if (++len == len_limit) /* unrolled 4th iteration */
>     break;
>   if (pb[len] != cur[len])
>     break;
> }

In examining the behavior of tree-ssa-loop-ivopts.c, I've discovered that the only induction variable candidates being considered are all forms of the len variable. We are not considering any induction variables to represent the address expressions &pb[len] and &cur[len].

I rewrote the source code for this loop to make the addressing expressions more explicit, as in the following:

> cur++;
> while (++pb != last_pb) /* this is loop */ {
>   if (*pb != *cur)
>     break;
>   ++cur;
>   if (++pb == last_pb) /* unrolled 2nd iteration */
>     break;
>   if (*pb != *cur)
>     break;
>   ++cur;
>   if (++pb == last_pb) /* unrolled 3rd iteration */
>     break;
>   if (*pb != *cur)
>     break;
>   ++cur;
>   if (++pb == last_pb) /* unrolled 4th iteration */
>     break;
>   if (*pb != *cur)
>     break;
>   ++cur;
> }

Now, gcc does a better job of identifying the "address expression induction variables". This version of the loop runs about 10% faster than the original on my target architecture.

This would seem to be a textbook pattern for the induction variable analysis. Does anyone have any thoughts on the best way to add these candidates to the set of induction variables that are considered by tree-ssa-loop-ivopts.c?

Thanks in advance for any suggestions.
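For readers who want to experiment with the two shapes of the loop, here is a minimal self-contained sketch of the non-unrolled form written both ways. It is not the code from the issue report: the function names match_len_indexed and match_len_pointers are made up for illustration, the index is declared size_t so that it has the same width as a pointer, and both functions assume len < len_limit on entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Index form: len is the only induction variable visible in the source.
   Assumes len < len_limit on entry.  */
static size_t
match_len_indexed (const uint8_t *pb, const uint8_t *cur,
                   size_t len, size_t len_limit)
{
  while (++len != len_limit)
    if (pb[len] != cur[len])
      break;
  return len;
}

/* Pointer form: the address expressions &pb[len] and &cur[len] are made
   explicit as the induction variables p and c.  Same precondition.  */
static size_t
match_len_pointers (const uint8_t *pb, const uint8_t *cur,
                    size_t len, size_t len_limit)
{
  const uint8_t *p = pb + len;
  const uint8_t *c = cur + len;
  const uint8_t *last_p = pb + len_limit;

  while (++p != last_p)
    {
      ++c;
      if (*p != *c)
        break;
    }
  return (size_t) (p - pb);
}
```

Both functions should return the same value for the same arguments, so comparing their generated code is one way to see which induction variables the optimizer selected for each form.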
[PATCH, rs6000] Alphabetize prototypes of AltiVec built-in functions in extend.texi
This patch alphabetizes the list of AltiVec built-in function prototypes that consume about 15 pages of the gcc.pdf file. As part of the alphabetization effort, certain functions that should not be documented in this section of the manual are separated from the others and moved to the end of the section with comments to explain their role. This patch prepares the way for future patches that will remove certain prototypes from this section and will insert certain prototypes that are currently missing from this section. It also improves readability and maintainability of the section. This patch has bootstrapped and tested without regressions on powerpc64le-unknown-linux (P8). I have also built the gcc.pdf file and reviewed its contents. In total, the diffs may appear daunting. A condensation of the diffs is obtained by separating out the insertions (+ in the first column) from the deletions (- in the first column), sorting the respective files, and performing a diff. This condensed diff reveals that the entirety of this patch results only in the following "net changes", all of which are (temporary) additions to the extend.texi file: < < < < < < @end smallexample < /* __int128, long long, and double arguments and results require -mvsx. */ < @smallexample < The following built-in functions which are currently documented in < this section are not alphabetized with other built-in functions of < this section because they belong in different sections. < /* vec_doublee requires -mvsx. */ < /* vec_doubleh requires -mvsx. */ < /* vec_doublel requires -mvsx. */ < /* vec_doubleo requires -mvsx. */ < /* vec_float2 requires -mvsx. */ < /* vec_floate requires -mvsx. */ < /* vec_floato requires -mvsx. */ < /* vec_float requires -mvsx. */ < /* vec_neg requires P8_vector */ < /* vec_signed2 requires -mcpu=power8. */ < /* vec_signede requires -mvsx. */ < /* vec_signedo requires -mvsx. */ < /* vec_signed requires -mvsx. */ < /* vec_sldw requires -mvsx. 
*/ < /* vec_unsignede requires -mcpu=power8. */ < /* vec_unsignede requires -mvsx. */ < /* vec_unsignedo requires -mvsx. */ < /* vec_unsigned requires -mvsx. */ Is this patch ok for trunk? gcc/ChangeLog: 2018-07-10 Kelvin Nilsen * doc/extend.texi (PowerPC AltiVec Built-in Functions): Alphabetize prototypes of built-in functions, separating out built-in functions that are listed in this section but should be described elsewhere. Index: gcc/doc/extend.texi === --- gcc/doc/extend.texi (revision 262542) +++ gcc/doc/extend.texi (working copy) @@ -16065,29 +16065,6 @@ vector unsigned int vec_add (vector unsigned int, vector unsigned int vec_add (vector unsigned int, vector unsigned int); vector float vec_add (vector float, vector float); -vector float vec_vaddfp (vector float, vector float); - -vector signed int vec_vadduwm (vector bool int, vector signed int); -vector signed int vec_vadduwm (vector signed int, vector bool int); -vector signed int vec_vadduwm (vector signed int, vector signed int); -vector unsigned int vec_vadduwm (vector bool int, vector unsigned int); -vector unsigned int vec_vadduwm (vector unsigned int, vector bool int); -vector unsigned int vec_vadduwm (vector unsigned int, vector unsigned int); - -vector signed short vec_vadduhm (vector bool short, vector signed short); -vector signed short vec_vadduhm (vector signed short, vector bool short); -vector signed short vec_vadduhm (vector signed short, vector signed short); -vector unsigned short vec_vadduhm (vector bool short, vector unsigned short); -vector unsigned short vec_vadduhm (vector unsigned short, vector bool short); -vector unsigned short vec_vadduhm (vector unsigned short, vector unsigned short); - -vector signed char vec_vaddubm (vector bool char, vector signed char); -vector signed char vec_vaddubm (vector signed char, vector bool char); -vector signed char vec_vaddubm (vector signed char, vector signed char); -vector unsigned char vec_vaddubm (vector bool char, vector unsigned char); 
-vector unsigned char vec_vaddubm (vector unsigned char, vector bool char); -vector unsigned char vec_vaddubm (vector unsigned char, vector unsigned char); - vector unsigned int vec_addc (vector unsigned int, vector unsigned int); vector unsigned char vec_adds (vector bool char, vector unsigned char); @@ -16109,34 +16086,151 @@ vector signed int vec_adds (vector bool int, vecto vector signed int vec_adds (vector signed int, vector bool int); vector signed int vec_adds (vector signed int, vector signed int); -vector signed int vec_vaddsws (vector bool int, vector signed int); -vector signed int vec_vaddsws (vector signed int, vector bool int); -vector signed int vec_vaddsws (vector signed int, vector signed int); +int vec_all_eq (vector signed char, vector bool
[PATCH] Backport testsuite: Introduce be/le selectors
Hi Jeff, Is it ok to backport this patch to gcc 8? There are other backports of test programs that would like to use the new selector options. Thanks. On 5/23/18 12:31 PM, Segher Boessenkool wrote: > On Tue, May 22, 2018 at 03:21:30PM -0600, Jeff Law wrote: >> On 05/21/2018 03:46 PM, Segher Boessenkool wrote: >>> This patch creates "be" and "le" selectors, which can be used by all >>> architectures, similar to ilp32 and lp64. >> >> I think this is fine. "be" "le" are used all over the place in gcc and >> the kernel to denote big/little endian. > > Thanks. This is what I checked in (to trunk): > > > 2017-05-23 Segher Boessenkool > > * doc/sourcebuild.texi (Endianness): New subsubsection. > > gcc/testsuite/ > * lib/target-supports.exp (check_effective_target_be): New. > (check_effective_target_le): New. > > > diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi > index dfb0578..596007d 100644 > --- a/gcc/doc/sourcebuild.texi > +++ b/gcc/doc/sourcebuild.texi > @@ -1313,6 +1313,16 @@ By convention, keywords ending in @code{_nocache} can > also include options > specified for the particular test in an earlier @code{dg-options} or > @code{dg-add-options} directive. > > +@subsubsection Endianness > + > +@table @code > +@item be > +Target uses big-endian memory order for multi-byte and multi-word data. > + > +@item le > +Target uses little-endian memory order for multi-byte and multi-word data. > +@end table > + > @subsubsection Data type sizes > > @table @code > diff --git a/gcc/testsuite/lib/target-supports.exp > b/gcc/testsuite/lib/target-supports.exp > index aa1296e6..0a53d7b 100644 > --- a/gcc/testsuite/lib/target-supports.exp > +++ b/gcc/testsuite/lib/target-supports.exp > @@ -2523,6 +2523,22 @@ proc check_effective_target_next_runtime { } { > }] > } > > +# Return 1 if we're generating code for big-endian memory order. 
> + > +proc check_effective_target_be { } { > +return [check_no_compiler_messages be object { > + int dummy[__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ ? 1 : -1]; > +}] > +} > + > +# Return 1 if we're generating code for little-endian memory order. > + > +proc check_effective_target_le { } { > +return [check_no_compiler_messages le object { > + int dummy[__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ ? 1 : -1]; > +}] > +} > + > # Return 1 if we're generating 32-bit code using default options, 0 > # otherwise. >
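The two procs above rely on a compile-time trick: an array whose size is -1 is a hard error, so the snippet compiles without messages only when the byte-order condition holds. The sketch below shows the same predefined macros in ordinary C. The function names are hypothetical, and the typedef uses a deliberately weaker condition (big or little endian, either one) so that the file builds on both kinds of target; it would fail to compile on a target with any other byte order.

```c
#include <stdint.h>
#include <string.h>

/* Same negative-array-size trick as in the selectors: the array size
   becomes -1, a compile-time error, if the byte order is neither big
   nor little endian.  */
typedef char byte_order_is_be_or_le[
  (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
   || __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__) ? 1 : -1];

/* What the compiler's predefined macros report.  */
static int
macro_says_little_endian (void)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return 1;
#else
  return 0;
#endif
}

/* What the running program observes: on a little-endian target the
   least significant byte of PROBE is stored first in memory.  */
static int
probe_says_little_endian (void)
{
  uint32_t probe = 1;
  unsigned char first_byte;

  memcpy (&first_byte, &probe, 1);
  return first_byte == 1;
}
```

A test would then be expected to select on the new keywords with a directive along the lines of /* { dg-do run { target le } } */, in the same way that ilp32 and lp64 are used today.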
Re: [PATCH, rs6000] Backport Fix tests that are failing in gcc.target/powerpc/bfp with -m32
Hi Segher, This patch, as revised in response to your suggestions, was committed to trunk on 4/17/2018. Is this ok for backporting to gcc8, gcc7, and gcc6? Thanks. On 4/13/18 3:15 PM, Kelvin Nilsen wrote: > Twelve failures have been occuring in the bfp test directory during -m32 > regression testing. > > The cause of these failures was two-fold: > > 1. Patches added subsequent to development of the tests caused new error > messages > to be emitted that are different than the error messages expected in the > dejagnu patterns. > These new patches also changed which built-in functions are legal when > compiling with the > -m32 command-line option. > > 2. The implementation of overloaded built-in functions maps overloaded > function names to > non-overloaded names. Depending on the stage at which an error is > recognized, error > messages may refer either to the overloaded built-in function name or > the non-overloaded > name. > > This patch: > > 1. Changes the expected error messages in certain test programs. > > 2. Disables certain test programs from being exercised on 32-bit targets. > > 3. Adds a "note" error message to explain the mapping from overloaded > built-in functions > to non-overloaded built-in functions. > > > This patch has bootstrapped and tested without regressions on both > powerpc64le-unknown-linux (P8) and on powerpc-linux (P7 big-endian, with > both -m32 > and -m64 target options). > > Is this ok for trunk? > > gcc/ChangeLog: > > 2018-04-13 Kelvin Nilsen > >    * config/rs6000/rs6000-protos.h (rs6000_builtin_is_supported_p): >    New prototype. >    * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): >    Add note to error message to explain internal mapping of overloaded >    built-in function name to non-overloaded built-in function name. >    * config/rs6000/rs6000.c (rs6000_builtin_is_supported_p): New >    function. 
> > gcc/testsuite/ChangeLog: > > 2018-04-13 Kelvin Nilsen > >    * gcc.target/powerpc/bfp/scalar-extract-sig-5.c: Simplify to >    prevent cascading of errors and change expected error message. >    * gcc.target/powerpc/bfp/scalar-test-neg-4.c: Restrict this test >    to 64-bit targets. >    * gcc.target/powerpc/bfp/scalar-test-data-class-8.c: Likewise. >    * gcc.target/powerpc/bfp/scalar-test-data-class-9.c: Likewise. >    * gcc.target/powerpc/bfp/scalar-test-data-class-10.c: Likewise. >    * gcc.target/powerpc/bfp/scalar-insert-exp-11.c: Change expected >    error message. >    * gcc.target/powerpc/bfp/scalar-extract-exp-5.c: Likewise. > > Index: gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-5.c > === > --- gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-5.c   > (revision 259316) > +++ gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-5.c   > (working copy) > @@ -8,10 +8,10 @@ >    error because the builtin requires 64 bits. */ >  #include >  > -unsigned __int128 /* { dg-error "'__int128' is not supported on this > target" } */ > +unsigned long long int >  get_significand (__ieee128 *p) >  { >   __ieee128 source = *p; >  > - return __builtin_vec_scalar_extract_sig (source); /* { dg-error > "builtin function '__builtin_vec_scalar_extract_sig' not supported in > this compiler configuration" } */ > + return (long long int) __builtin_vec_scalar_extract_sig (source); /* > { dg-error "requires ISA 3.0 IEEE 128-bit floating point" } */ >  } > Index: gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-4.c > === > --- gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-4.c   > (revision 259316) > +++ gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-4.c   (working > copy) > @@ -1,5 +1,6 @@ >  /* { dg-do compile { target { powerpc*-*-* } } } */ >  /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } > { "-mcpu=power9" } } */ > +/* { dg-require-effective-target lp64 } */ >  /* { dg-require-effective-target powerpc_p9vector_ok 
} */ >  /* { dg-options "-mcpu=power9" } */ >  > @@ -11,6 +12,8 @@ >  { >   __ieee128 source = *p; >  > + /* IEEE 128-bit floating point operations are only supported > + on 64-bit targets. */ >   return scalar_test_neg (source); >  } >  > Index: gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-8.c > === > --- gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-8.c   > (revision 259316) &
[PATCH, rs6000] Obvious patch to fix erroneous comment
In a recently committed patch to correct code generation for the vec_packsu (vector unsigned long long, vector unsigned long long) built-in function, I accidentally left a comment in place that was not relevant to the final patch that was committed. This patch fixes that comment.

After regression testing, I have committed this patch as obvious.

gcc/testsuite/ChangeLog:

2018-06-26 Kelvin Nilsen

    * gcc.target/powerpc/builtins-1.c: Correct a comment.

Index: gcc/testsuite/gcc.target/powerpc/builtins-1.c
===
--- gcc/testsuite/gcc.target/powerpc/builtins-1.c (revision 262149)
+++ gcc/testsuite/gcc.target/powerpc/builtins-1.c (working copy)
@@ -288,7 +288,7 @@ int main ()
  vec_mul mulld | mullw, mulhwu
  vec_nor xxlnor
  vec_or xxlor
- vec_packsu vpkudus (matches twice due to -dp option)
+ vec_packsu vpkudus
  vec_perm vperm
  vec_round xvrdpi
  vec_sel xxsel
Re: [PATCH v2, rs6000] Backport Fix implementation of vec_pack (vector double, vector double) built-in function
Hi Segher, After waiting a few days for this newly committed patch to settle, is it ok to backport to gcc 6, gcc 7, and gcc 8? Thanks. On 6/22/18 5:34 PM, Kelvin Nilsen wrote: > Thanks for feedback. It turns out that the vmrgew and vmrgow instructions > require power 8. > > After coordinating with Segher on minor refinements to the test cases, I have > committed the patch as quoted below to the trunk. > > On 6/19/18 5:37 PM, Segher Boessenkool wrote: >> Hi! >> >> On Tue, Jun 19, 2018 at 01:37:51PM -0500, Kelvin Nilsen wrote: >>> --- gcc/testsuite/gcc.target/powerpc/builtins-9.c (nonexistent) >>> +++ gcc/testsuite/gcc.target/powerpc/builtins-9.c (working copy) >>> @@ -0,0 +1,21 @@ >>> +/* { dg-do compile } */ >>> +/* { dg-require-effective-target powerpc_p8vector_ok } */ >>> +/* Expect same instruction selecton on p8 and above. Fix if future >>> + targets behave differently. */ >>> +/* { dg-options "-O3 -maltivec" } */ >> >> But this doesn't use -mcpu=power8 or similar. Does it need it anyway? >> Both xxpermdi and xvcvdpsp are Power7 (ISA 2.06) and the rest is AltiVec? >> So maybe just powerpc_vsx_ok? >> >>> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } } */ >> >> You do not use -mcpu= so you don't need this. >> >> Same issues in the next test. Rest looks good though :-) >> >> >> Segher >> >> > > gcc/ChangeLog: > > 2018-06-22 Kelvin Nilsen > > * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Change > behavior of vec_pack (vector double, vector double) to match > behavior of vec_float2 (vector double, vector double). > > gcc/testsuite/ChangeLog: > > 2018-06-22 Kelvin Nilsen > > * gcc.target/powerpc/builtins-3-p8.c (test_pack_float): Remove > this test. > * gcc.target/powerpc/builtins-9.c: New test. > * gcc.target/powerpc/fold-vec-pack-double.c: Modify dg directives > to expect different code generation on big-endian vs. > little-endian targets. 
> > Index: gcc/config/rs6000/rs6000-c.c > === > --- gcc/config/rs6000/rs6000-c.c (revision 261775) > +++ gcc/config/rs6000/rs6000-c.c (working copy) > @@ -2425,7 +2425,7 @@ const struct altivec_builtin_types altivec_overloa > RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V2DI, > RS6000_BTI_unsigned_V2DI, 0 }, >{ ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_VPKUDUM, > RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, 0 }, > - { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_VPKUDUM, > + { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_FLOAT2_V2DF, > RS6000_BTI_V4SF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 }, > >{ P8V_BUILTIN_VEC_NEG, P8V_BUILTIN_NEG_V16QI, > Index: gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c > === > --- gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c (revision 261775) > +++ gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c (working copy) > @@ -11,12 +11,6 @@ test_eq_long_long (vector bool long long x, vector > return vec_cmpeq (x, y); > } > > -vector float > -test_pack_float (vector double x, vector double y) > -{ > - return vec_pack (x, y); > -} > - > vector unsigned char > test_vsi_packs_vusi_vusi (vector unsigned short x, >vector unsigned short y) > @@ -214,7 +208,6 @@ test_neg_double (vector double x) > /* Expected test results: > > test_eq_long_long 1 vcmpequd inst > - test_pack_float 1 vpkudum inst > test_vsi_packs_vsll_vsll 1 vpksdss > test_vui_packs_vull_vull 1 vpkudus > test_vui_packs_vssi_vssi 1 vpkshss > @@ -239,7 +232,6 @@ test_neg_double (vector double x) > */ > > /* { dg-final { scan-assembler-times "vcmpequd" 1 } } */ > -/* { dg-final { scan-assembler-times "vpkudum" 1 } } */ > /* { dg-final { scan-assembler-times "vpksdss" 1 } } */ > /* { dg-final { scan-assembler-times "vpkudus" 1 } } */ > /* { dg-final { scan-assembler-times "vpkuhus" 2 } } */ > Index: gcc/testsuite/gcc.target/powerpc/builtins-9.c > === > --- gcc/testsuite/gcc.target/powerpc/builtins-9.c (nonexistent) > +++ gcc/testsuite/gcc.target/powerpc/builtins-9.c (working copy) > @@ 
-0,0 +1,19 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-ta
Re: [PATCH v2, rs6000] Fix implementation of vec_pack (vector double, vector double) built-in function
Thanks for feedback. It turns out that the vmrgew and vmrgow instructions require power 8. After coordinating with Segher on minor refinements to the test cases, I have committed the patch as quoted below to the trunk. On 6/19/18 5:37 PM, Segher Boessenkool wrote: > Hi! > > On Tue, Jun 19, 2018 at 01:37:51PM -0500, Kelvin Nilsen wrote: >> --- gcc/testsuite/gcc.target/powerpc/builtins-9.c(nonexistent) >> +++ gcc/testsuite/gcc.target/powerpc/builtins-9.c(working copy) >> @@ -0,0 +1,21 @@ >> +/* { dg-do compile } */ >> +/* { dg-require-effective-target powerpc_p8vector_ok } */ >> +/* Expect same instruction selecton on p8 and above. Fix if future >> + targets behave differently. */ >> +/* { dg-options "-O3 -maltivec" } */ > > But this doesn't use -mcpu=power8 or similar. Does it need it anyway? > Both xxpermdi and xvcvdpsp are Power7 (ISA 2.06) and the rest is AltiVec? > So maybe just powerpc_vsx_ok? > >> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } } */ > > You do not use -mcpu= so you don't need this. > > Same issues in the next test. Rest looks good though :-) > > > Segher > > gcc/ChangeLog: 2018-06-22 Kelvin Nilsen * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Change behavior of vec_pack (vector double, vector double) to match behavior of vec_float2 (vector double, vector double). gcc/testsuite/ChangeLog: 2018-06-22 Kelvin Nilsen * gcc.target/powerpc/builtins-3-p8.c (test_pack_float): Remove this test. * gcc.target/powerpc/builtins-9.c: New test. * gcc.target/powerpc/fold-vec-pack-double.c: Modify dg directives to expect different code generation on big-endian vs. little-endian targets. 
Index: gcc/config/rs6000/rs6000-c.c === --- gcc/config/rs6000/rs6000-c.c(revision 261775) +++ gcc/config/rs6000/rs6000-c.c(working copy) @@ -2425,7 +2425,7 @@ const struct altivec_builtin_types altivec_overloa RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 }, { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_VPKUDUM, RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, 0 }, - { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_VPKUDUM, + { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_FLOAT2_V2DF, RS6000_BTI_V4SF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 }, { P8V_BUILTIN_VEC_NEG, P8V_BUILTIN_NEG_V16QI, Index: gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c === --- gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c(revision 261775) +++ gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c(working copy) @@ -11,12 +11,6 @@ test_eq_long_long (vector bool long long x, vector return vec_cmpeq (x, y); } -vector float -test_pack_float (vector double x, vector double y) -{ - return vec_pack (x, y); -} - vector unsigned char test_vsi_packs_vusi_vusi (vector unsigned short x, vector unsigned short y) @@ -214,7 +208,6 @@ test_neg_double (vector double x) /* Expected test results: test_eq_long_long 1 vcmpequd inst - test_pack_float 1 vpkudum inst test_vsi_packs_vsll_vsll 1 vpksdss test_vui_packs_vull_vull 1 vpkudus test_vui_packs_vssi_vssi 1 vpkshss @@ -239,7 +232,6 @@ test_neg_double (vector double x) */ /* { dg-final { scan-assembler-times "vcmpequd" 1 } } */ -/* { dg-final { scan-assembler-times "vpkudum" 1 } } */ /* { dg-final { scan-assembler-times "vpksdss" 1 } } */ /* { dg-final { scan-assembler-times "vpkudus" 1 } } */ /* { dg-final { scan-assembler-times "vpkuhus" 2 } } */ Index: gcc/testsuite/gcc.target/powerpc/builtins-9.c === --- gcc/testsuite/gcc.target/powerpc/builtins-9.c (nonexistent) +++ gcc/testsuite/gcc.target/powerpc/builtins-9.c (working copy) @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { 
dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ +/* { dg-options "-maltivec -mcpu=power8 -O3" } */ + +#include + +vector float +test_pack_float (vector double x, vector double y) +{ + return vec_pack (x, y); +} + +/* { dg-final { scan-assembler-times "vmrgew" 1 { target be } } } */ +/* { dg-final { scan-assembler-times "vmrgow" 1 { target le } } } */ + +/* { dg-final { scan-assembler-times "xvcvdpsp" 2 } } */ +/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */ + Index: g
Re: [PATCH, rs6000] Backport Fix implementation of vec_packsu (vector unsigned long long, vector unsigned long long) built-in function
This has been committed to trunk. Is this ok to backport to gcc6, gcc7, and gcc8?

Thanks.

On 6/19/18 2:30 PM, Segher Boessenkool wrote:
> Hi!
>
> On Mon, Jun 18, 2018 at 11:29:55AM -0500, Kelvin Nilsen wrote:
>> +/* A single vpkudus matches twice because this is compiled with -dp,
>> + causing diagnostic comments to appear in the resulting .s file, one
>> + of which matches vpkudus. */
>
> -dp prints the name of the instruction pattern, which is altivec_vpkudus.
> So if you look for the full word instead, this problem isn't there I
> think?
>
>> +/* { dg-final { scan-assembler-times "vpkudus" 2 } } */
>
> /* { dg-final { scan-assembler-times {\mvpkudus\M} 1 } } */
>
> Okay with that change (and comment changes). Thanks!
>
>
> Segher
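The difference between the plain "vpkudus" pattern and the word-bounded {\mvpkudus\M} pattern discussed above can be illustrated outside of dejagnu. The following is only a sketch in portable C: the helper names and the sample assembler line are hypothetical (the exact layout of -dp annotations is an assumption), and the word-boundary test is a hand-written approximation that treats letters, digits and underscore as word characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Treat letters, digits and '_' as word characters.  */
int
is_word_char (char c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* Count every occurrence of NEEDLE in TEXT, the way a plain
   substring pattern such as "vpkudus" would.  NEEDLE must be
   non-empty.  */
int
count_substring (const char *text, const char *needle)
{
  int count = 0;
  size_t len = strlen (needle);
  for (const char *p = strstr (text, needle); p; p = strstr (p + len, needle))
    count++;
  return count;
}

/* Count only the occurrences of NEEDLE that start and end on a
   word boundary, approximating a \m...\M pattern.  */
int
count_whole_word (const char *text, const char *needle)
{
  int count = 0;
  size_t len = strlen (needle);
  for (const char *p = strstr (text, needle); p; p = strstr (p + 1, needle))
    {
      int starts_word = (p == text) || !is_word_char (p[-1]);
      int ends_word = !is_word_char (p[len]);
      if (starts_word && ends_word)
	count++;
    }
  return count;
}

/* Hypothetical demonstration; the sample line is made up.  */
void
demo_scan (void)
{
  const char *line = "\tvpkudus 2,2,3\t # 12 altivec_vpkudus\n";
  assert (count_substring (line, "vpkudus") == 2);
  assert (count_whole_word (line, "vpkudus") == 1);
}
```

In this model the pattern name altivec_vpkudus is rejected because the character before "vpkudus" is an underscore, which is why the word-bounded directive can expect a count of 1.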
[PATCH v2, rs6000] Fix implementation of vec_pack (vector double, vector double) built-in function
This patch fixes an error in the code generation for vec_pack (vector double, vector double). As previously implemented, this built-in function translates to the vpkudum instruction. This patch causes vec_pack (vector double, vector double) to behave the same as vec_float2 for the same type signature, producing the vmrgow instruction on little-endian targets and the vmrgew instruction on big-endian targets.

This revision differs from the initial patch submission in that it combines all of the new testing into two test programs, using target qualifiers on the dg scan-assembler-times directives.

This patch has bootstrapped and tested without regressions on powerpc64le-unknown-linux (both P8 and P9) and on powerpc-linux (P8 big-endian, both -m32 and -m64).

Is this ok for the trunk?

gcc/ChangeLog:

2018-06-19 Kelvin Nilsen

    * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Change behavior of vec_pack (vector double, vector double) to match behavior of vec_float2 (vector double, vector double).

gcc/testsuite/ChangeLog:

2018-06-19 Kelvin Nilsen

    * gcc.target/powerpc/builtins-3-p8.c (test_pack_float): Remove this test.
    * gcc.target/powerpc/builtins-9.c: New test.
    * gcc.target/powerpc/fold-vec-pack-double.c: Modify dg directives to expect different code generation on big-endian vs. little-endian targets.
Index: gcc/config/rs6000/rs6000-c.c === --- gcc/config/rs6000/rs6000-c.c(revision 261341) +++ gcc/config/rs6000/rs6000-c.c(working copy) @@ -2425,7 +2425,7 @@ const struct altivec_builtin_types altivec_overloa RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 }, { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_VPKUDUM, RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, 0 }, - { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_VPKUDUM, + { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_FLOAT2_V2DF, RS6000_BTI_V4SF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 }, { P8V_BUILTIN_VEC_NEG, P8V_BUILTIN_NEG_V16QI, Index: gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c === --- gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c(revision 261341) +++ gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c(working copy) @@ -11,12 +11,6 @@ test_eq_long_long (vector bool long long x, vector return vec_cmpeq (x, y); } -vector float -test_pack_float (vector double x, vector double y) -{ - return vec_pack (x, y); -} - vector unsigned char test_vsi_packs_vusi_vusi (vector unsigned short x, vector unsigned short y) @@ -214,7 +208,6 @@ test_neg_double (vector double x) /* Expected test results: test_eq_long_long 1 vcmpequd inst - test_pack_float 1 vpkudum inst test_vsi_packs_vsll_vsll 1 vpksdss test_vui_packs_vull_vull 1 vpkudus test_vui_packs_vssi_vssi 1 vpkshss @@ -239,7 +232,6 @@ test_neg_double (vector double x) */ /* { dg-final { scan-assembler-times "vcmpequd" 1 } } */ -/* { dg-final { scan-assembler-times "vpkudum" 1 } } */ /* { dg-final { scan-assembler-times "vpksdss" 1 } } */ /* { dg-final { scan-assembler-times "vpkudus" 1 } } */ /* { dg-final { scan-assembler-times "vpkuhus" 2 } } */ Index: gcc/testsuite/gcc.target/powerpc/builtins-9.c === --- gcc/testsuite/gcc.target/powerpc/builtins-9.c (nonexistent) +++ gcc/testsuite/gcc.target/powerpc/builtins-9.c (working copy) @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* 
Expect same instruction selecton on p8 and above. Fix if future + targets behave differently. */ +/* { dg-options "-O3 -maltivec" } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } } */ + +#include + +vector float +test_pack_float (vector double x, vector double y) +{ + return vec_pack (x, y); +} + +/* { dg-final { scan-assembler-times "vmrgew" 1 { target be } } } */ +/* { dg-final { scan-assembler-times "vmrgow" 1 { target le } } } */ + +/* { dg-final { scan-assembler-times "xvcvdpsp" 2 } } */ +/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */ + Index: gcc/testsuite/gcc.target/powerpc/fold-vec-pack-double.c === --- gcc/testsuite/gcc.target/powerpc/fold-vec-pack-double.c (revision 261341) +++ gcc/testsuite/gcc.target/powerpc/fold-vec-pack-double.c (working copy) @@ -3,7 +3,10 @@ /* { dg-do compile } */ /* { dg-require-effective-target powerpc_p8vector_ok } */ -/* { dg-options "-mvsx
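To illustrate why an integer pack instruction is the wrong choice for vec_pack (vector double, vector double), here is a rough scalar model in portable C. It is not the GCC implementation: the function names are hypothetical, it handles one element at a time and ignores element ordering, and it assumes IEEE 754 binary64 doubles and binary32 floats. The first helper mimics a modulo pack by keeping only the low-order 32 bits of the 64-bit pattern; the second performs the numeric conversion that a float result requires.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of an integer modulo pack applied to one element: keep the
   low-order 32 bits of the 64-bit pattern and reinterpret them as a
   float.  Assumes IEEE 754 representations.  */
float
pack_bits_modulo (double value)
{
  uint64_t bits;
  uint32_t low;
  float result;

  memcpy (&bits, &value, sizeof bits);
  low = (uint32_t) bits;
  memcpy (&result, &low, sizeof result);
  return result;
}

/* Model of a numeric double-to-float conversion of one element.  */
float
convert_to_float (double value)
{
  return (float) value;
}

/* Hypothetical demonstration: 1.0 has the bit pattern
   0x3FF0000000000000, so its low 32 bits are all zero.  */
void
demo_pack (void)
{
  assert (pack_bits_modulo (1.0) == 0.0f);
  assert (convert_to_float (1.0) == 1.0f);
}
```

Under these assumptions the truncating model turns 1.0 into 0.0f, while the conversion preserves the value, which matches the motivation given for routing vec_pack to the same expansion as vec_float2.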
[PATCH, rs6000] Fix implementation of vec_packsu (vector unsigned long long, vector unsigned long long) built-in function
This patch fixes an error in the code generation for vec_packsu (vector unsigned long long, vector unsigned long long). As previously implemented, this built-in function translates to the vpksdus instruction. This patch causes vec_packsu (vector unsigned long long, vector unsigned long long) to behave the same as vec_packs (vector unsigned long long, vector unsigned long long) for the same type signature, producing the vpkudus instruction. This patch has bootstrapped and tested without regressions on powerpc64le-unknown-linux (both P8 and P9) and on powerpc-linux (P8 big-endian, both -m32 and -m64). Is this ok for the trunk? gcc/ChangeLog: 2018-06-18 Kelvin Nilsen * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Change behavior of vec_packsu (vector unsigned long long, vector unsigned long long) to match behavior of vec_packs with same signature. gcc/testsuite/ChangeLog: 2018-06-18 Kelvin Nilsen * gcc.target/powerpc/builtins-1.c: Adjust dg directives to scan for vpkudus in place of vpksdus. * gcc.target/powerpc/builtins-3-p8.c: Likewise. 
Index: gcc/config/rs6000/rs6000-c.c === --- gcc/config/rs6000/rs6000-c.c(revision 261599) +++ gcc/config/rs6000/rs6000-c.c(working copy) @@ -2544,7 +2544,7 @@ const struct altivec_builtin_types altivec_overloa RS6000_BTI_unsigned_V8HI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 }, { ALTIVEC_BUILTIN_VEC_PACKSU, P8V_BUILTIN_VPKSDUS, RS6000_BTI_unsigned_V4SI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 }, - { ALTIVEC_BUILTIN_VEC_PACKSU, P8V_BUILTIN_VPKSDUS, + { ALTIVEC_BUILTIN_VEC_PACKSU, P8V_BUILTIN_VPKUDUS, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 }, { ALTIVEC_BUILTIN_VEC_VPKSWUS, ALTIVEC_BUILTIN_VPKSWUS, RS6000_BTI_unsigned_V8HI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 }, Index: gcc/testsuite/gcc.target/powerpc/builtins-1.c === --- gcc/testsuite/gcc.target/powerpc/builtins-1.c (revision 261599) +++ gcc/testsuite/gcc.target/powerpc/builtins-1.c (working copy) @@ -297,7 +297,7 @@ int main () vec_mul mulld | mullw, mulhwu vec_nor xxlnor vec_or xxlor - vec_packsu vpksdus + vec_packsu vpkudus (matches twice due to -dp option) vec_perm vperm vec_round xvrdpi vec_sel xxsel @@ -335,7 +335,11 @@ int main () /* { dg-final { scan-assembler-times "xxlnor" 6 } } */ /* { dg-final { scan-assembler-times "xxlor" 11 { target { ilp32 } } } } */ /* { dg-final { scan-assembler-times "xxlor" 7 { target { lp64 } } } } */ -/* { dg-final { scan-assembler-times "vpksdus" 2 } } */ + +/* A single vpkudus matches twice because this is compiled with -dp, + causing diagnostic comments to appear in the resulting .s file, one + of which matches vpkudus. 
*/ +/* { dg-final { scan-assembler-times "vpkudus" 2 } } */ /* { dg-final { scan-assembler-times "vperm" 4 } } */ /* { dg-final { scan-assembler-times "xvrdpi" 2 } } */ /* { dg-final { scan-assembler-times "xxsel" 10 } } */ Index: gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c === --- gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c(revision 261599) +++ gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c(working copy) @@ -219,6 +219,8 @@ test_neg_double (vector double x) test_vui_packs_vull_vull 1 vpkudus test_vui_packs_vssi_vssi 1 vpkshss test_vsi_packsu_vssi_vssi 1 vpkshus + test_vsi_packsu_vsll_vsll 1 vpksdus + test_vsi_packsu_vull_vull 1 vpkudus test_unsigned_char_popcnt_signed_char 1 vpopcntb test_unsigned_char_popcnt_unsigned_char 1 vpopcntb test_unsigned_short_popcnt_signed_short 1 vpopcnth @@ -241,11 +243,11 @@ test_neg_double (vector double x) /* { dg-final { scan-assembler-times "vcmpequd" 1 } } */ /* { dg-final { scan-assembler-times "vpkudum" 1 } } */ /* { dg-final { scan-assembler-times "vpksdss" 1 } } */ -/* { dg-final { scan-assembler-times "vpkudus" 1 } } */ +/* { dg-final { scan-assembler-times "vpkudus" 2 } } */ /* { dg-final { scan-assembler-times "vpkuhus" 2 } } */ /* { dg-final { scan-assembler-times "vpkshss" 1 } } */ /* { dg-final { scan-assembler-times "vpkshus" 1 } } */ -/* { dg-final { scan-assembler-times "vpksdus" 2 } } */ +/* { dg-final { scan-assembler-times "vpksdus" 1 } } */ /* { dg-final { scan-assembler-times "vpkuwus" 2 } } */ /* { dg-final { scan-assembler-times "vpopcntb" 2 } } */ /* { dg-final { scan-assembler-times "vpopcnth" 2 } } */
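The effect of selecting vpksdus instead of vpkudus for unsigned inputs can be sketched with a scalar model in portable C. This is an illustration only, based on my reading of the instruction names (signed-to-unsigned saturation versus unsigned-to-unsigned saturation); the function names are hypothetical and each call models a single 64-bit element being packed into 32 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of a signed-to-unsigned saturating pack of one element:
   the 64-bit pattern is read as signed, then clamped to the range
   [0, UINT32_MAX].  */
uint32_t
pack_signed_to_unsigned_sat (uint64_t raw)
{
  int64_t value;

  memcpy (&value, &raw, sizeof value);	/* Reinterpret the bits as signed.  */
  if (value < 0)
    return 0;
  if (value > (int64_t) UINT32_MAX)
    return UINT32_MAX;
  return (uint32_t) value;
}

/* Model of an unsigned-to-unsigned saturating pack of one element.  */
uint32_t
pack_unsigned_to_unsigned_sat (uint64_t raw)
{
  return raw > UINT32_MAX ? UINT32_MAX : (uint32_t) raw;
}

/* Hypothetical demonstration: a large unsigned input looks negative
   when read as signed, so the two models disagree.  */
void
demo_packsu (void)
{
  assert (pack_signed_to_unsigned_sat (UINT64_MAX) == 0);
  assert (pack_unsigned_to_unsigned_sat (UINT64_MAX) == UINT32_MAX);
  assert (pack_signed_to_unsigned_sat (5) == 5);
  assert (pack_unsigned_to_unsigned_sat (5) == 5);
}
```

The two models agree for small values and diverge once the top bit of the input is set, which is the case the corrected table entry for vector unsigned long long arguments is meant to handle.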
[PATCH, rs6000] Fix implementation of vec_pack (vector double, vector double) built-in function
This patch fixes an error in the code generation for vec_pack (vector double, vector double). As previously implemented, this built-in function translates to the vpkudum instruction. This patch causes vec_pack (vector double, vector double) to behave the same as vec_float2 for the same type signature, producing the vmrgow instruction on little-endian targets and the vmrgew instruction on big-endian targets. This patch has bootstrapped and tested without regressions on powerpc64le-unknown-linux (both P8 and P9) and on powerpc-linux (P8 big-endian, both -m32 and -m64). Is this ok for the trunk? gcc/ChangeLog: 2018-06-14 Kelvin Nilsen * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Change behavior of vec_pack (double, double) to match behavior of vec_float2 (double, double). gcc/testsuite/ChangeLog: 2018-06-14 Kelvin Nilsen * gcc.target/powerpc/builtins-3-p8.c (test_pack_float): Remove this test. * gcc.target/powerpc/builtins-9-p8-be.c: New test. * gcc.target/powerpc/builtins-9-p8-le.c: New test. * gcc.target/powerpc/builtins-9-p9-le.c: New test. * gcc.target/powerpc/fold-vec-pack-double-p8-be.c: New test. * gcc.target/powerpc/fold-vec-pack-double-p8-le.c: New test. * gcc.target/powerpc/fold-vec-pack-double.c: Specialize this test for p9 little-endian. 
Index: gcc/config/rs6000/rs6000-c.c === --- gcc/config/rs6000/rs6000-c.c(revision 261341) +++ gcc/config/rs6000/rs6000-c.c(working copy) @@ -2425,7 +2425,7 @@ const struct altivec_builtin_types altivec_overloa RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 }, { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_VPKUDUM, RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, 0 }, - { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_VPKUDUM, + { ALTIVEC_BUILTIN_VEC_PACK, P8V_BUILTIN_FLOAT2_V2DF, RS6000_BTI_V4SF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 }, { P8V_BUILTIN_VEC_NEG, P8V_BUILTIN_NEG_V16QI, Index: gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c === --- gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c(revision 261341) +++ gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c(working copy) @@ -11,12 +11,6 @@ test_eq_long_long (vector bool long long x, vector return vec_cmpeq (x, y); } -vector float -test_pack_float (vector double x, vector double y) -{ - return vec_pack (x, y); -} - vector unsigned char test_vsi_packs_vusi_vusi (vector unsigned short x, vector unsigned short y) @@ -214,7 +208,6 @@ test_neg_double (vector double x) /* Expected test results: test_eq_long_long 1 vcmpequd inst - test_pack_float 1 vpkudum inst test_vsi_packs_vsll_vsll 1 vpksdss test_vui_packs_vull_vull 1 vpkudus test_vui_packs_vssi_vssi 1 vpkshss @@ -239,7 +232,6 @@ test_neg_double (vector double x) */ /* { dg-final { scan-assembler-times "vcmpequd" 1 } } */ -/* { dg-final { scan-assembler-times "vpkudum" 1 } } */ /* { dg-final { scan-assembler-times "vpksdss" 1 } } */ /* { dg-final { scan-assembler-times "vpkudus" 1 } } */ /* { dg-final { scan-assembler-times "vpkuhus" 2 } } */ Index: gcc/testsuite/gcc.target/powerpc/builtins-9-p8-be.c === --- gcc/testsuite/gcc.target/powerpc/builtins-9-p8-be.c (nonexistent) +++ gcc/testsuite/gcc.target/powerpc/builtins-9-p8-be.c (working copy) @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target 
powerpc_p8vector_ok } */ +/* { dg-require-effective-target be } */ /* Require big-endian. */ +/* { dg-options "-O3 -maltivec -mcpu=power8" } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ + +#include + +vector float +test_pack_float (vector double x, vector double y) +{ + return vec_pack (x, y); +} + +/* { dg-final { scan-assembler-times "vmrgew" 1 } } */ +/* { dg-final { scan-assembler-times "xvcvdpsp" 2 } } */ +/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */ + Index: gcc/testsuite/gcc.target/powerpc/builtins-9-p8-le.c === --- gcc/testsuite/gcc.target/powerpc/builtins-9-p8-le.c (nonexistent) +++ gcc/testsuite/gcc.target/powerpc/builtins-9-p8-le.c (working copy) @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-require-effective-target le } */ /* Require little-endian. */ +/* { dg-options "-O3 -maltivec -mcpu=power8" } */ +/* { dg-skip-if "do not
[PATCH, rs6000] Improve indentation of prototype documentation
This patch removes extraneous line breaks, shortening the "PowerPC AltiVec Built-in Functions" section of the gcc.pdf manual by about 7 pages. Besides improving the appearance of this documentation, there are two additional benefits:

1. Subsequent patches that move prototype definitions in order to alphabetize definitions or in order to group definitions requiring the same target options together are easier to understand if each prototype description is represented on a single line.

2. Enclosing the group of 8 vec_xl prototypes and 8 vec_xst prototypes between @smallexample and @end smallexample allows these prototypes to be automatically parsed by a tool that validates consistency between implementation and documentation of built-in functions.

This patch has bootstrapped and tested without regressions on powerpc64le-unknown-linux (P8). I have also built the gcc.pdf file and reviewed its contents.

Segher: if you prefer, I can break this into multiple smaller patches. What would be the ideal size of each patch?

Is this ok for trunk?

gcc/ChangeLog:

2018-06-04 Kelvin Nilsen

    * doc/extend.texi (PowerPC AltiVec Built-in Functions): Adjust indentation and line wrap for many prototypes. Add missing @smallexample directives around block of prototypes for vec_xl and vec_xst.
Index: gcc/doc/extend.texi === --- gcc/doc/extend.texi (revision 261067) +++ gcc/doc/extend.texi (working copy) @@ -16200,17 +16200,13 @@ vector signed char vec_add (vector signed char, ve vector signed char vec_add (vector signed char, vector signed char); vector unsigned char vec_add (vector bool char, vector unsigned char); vector unsigned char vec_add (vector unsigned char, vector bool char); -vector unsigned char vec_add (vector unsigned char, - vector unsigned char); +vector unsigned char vec_add (vector unsigned char, vector unsigned char); vector signed short vec_add (vector bool short, vector signed short); vector signed short vec_add (vector signed short, vector bool short); vector signed short vec_add (vector signed short, vector signed short); -vector unsigned short vec_add (vector bool short, - vector unsigned short); -vector unsigned short vec_add (vector unsigned short, - vector bool short); -vector unsigned short vec_add (vector unsigned short, - vector unsigned short); +vector unsigned short vec_add (vector bool short, vector unsigned short); +vector unsigned short vec_add (vector unsigned short, vector bool short); +vector unsigned short vec_add (vector unsigned short, vector unsigned short); vector signed int vec_add (vector bool int, vector signed int); vector signed int vec_add (vector signed int, vector bool int); vector signed int vec_add (vector signed int, vector signed int); @@ -16226,47 +16222,33 @@ vector signed int vec_vadduwm (vector signed int, vector signed int vec_vadduwm (vector signed int, vector signed int); vector unsigned int vec_vadduwm (vector bool int, vector unsigned int); vector unsigned int vec_vadduwm (vector unsigned int, vector bool int); -vector unsigned int vec_vadduwm (vector unsigned int, - vector unsigned int); +vector unsigned int vec_vadduwm (vector unsigned int, vector unsigned int); -vector signed short vec_vadduhm (vector bool short, - vector signed short); -vector signed short vec_vadduhm (vector signed 
short, - vector bool short); -vector signed short vec_vadduhm (vector signed short, - vector signed short); -vector unsigned short vec_vadduhm (vector bool short, - vector unsigned short); -vector unsigned short vec_vadduhm (vector unsigned short, - vector bool short); -vector unsigned short vec_vadduhm (vector unsigned short, - vector unsigned short); +vector signed short vec_vadduhm (vector bool short, vector signed short); +vector signed short vec_vadduhm (vector signed short, vector bool short); +vector signed short vec_vadduhm (vector signed short, vector signed short); +vector unsigned short vec_vadduhm (vector bool short, vector unsigned short); +vector unsigned short vec_vadduhm (vector unsigned short, vector bool short); +vector unsigned short vec_vadduhm (vector unsigned short, vector unsigned short); vector signed char vec_vaddubm (vector bool char, vector signed char); vector signed char vec_vaddubm (vector signed char, vector bool char); vector signed char vec_vaddubm (vector signed char, vector signed char); -vector unsigned char vec_vaddubm (vector bool char, - vector unsigned char); -vector unsigned char vec_vaddu
[PATCH, rs6000] Correct documentation of vec_lvsl and vec_lvsr arguments
The existing documentation incorrectly specifies that the second argument of vec_lvsl and vec_lvsr instructions are volatile *. This patch removes the volatile qualifier from the documentation of these arguments. his patch has bootstrapped and tested without regressions on powerpc64le-unknown-linux (P8). I have built the gcc.pdf file and reviewed its contents. Is this ok for trunk? gcc/ChangeLog: 2018-06-04 Kelvin Nilsen * doc/extend.texi (PowerPC AltiVec Built-in Functions): Remove volatile qualifier from vec_lvsl and vec_lvsr argument prototypes. Index: gcc/doc/extend.texi === --- gcc/doc/extend.texi (revision 261067) +++ gcc/doc/extend.texi (working copy) @@ -16662,25 +16662,25 @@ vector unsigned char vec_ldl (int, const unsigned vector float vec_loge (vector float); -vector unsigned char vec_lvsl (int, const volatile unsigned char *); -vector unsigned char vec_lvsl (int, const volatile signed char *); -vector unsigned char vec_lvsl (int, const volatile unsigned short *); -vector unsigned char vec_lvsl (int, const volatile short *); -vector unsigned char vec_lvsl (int, const volatile unsigned int *); -vector unsigned char vec_lvsl (int, const volatile int *); -vector unsigned char vec_lvsl (int, const volatile unsigned long *); -vector unsigned char vec_lvsl (int, const volatile long *); -vector unsigned char vec_lvsl (int, const volatile float *); +vector unsigned char vec_lvsl (int, const unsigned char *); +vector unsigned char vec_lvsl (int, const signed char *); +vector unsigned char vec_lvsl (int, const unsigned short *); +vector unsigned char vec_lvsl (int, const short *); +vector unsigned char vec_lvsl (int, const unsigned int *); +vector unsigned char vec_lvsl (int, const int *); +vector unsigned char vec_lvsl (int, const unsigned long *); +vector unsigned char vec_lvsl (int, const long *); +vector unsigned char vec_lvsl (int, const float *); -vector unsigned char vec_lvsr (int, const volatile unsigned char *); -vector unsigned char vec_lvsr (int, const 
volatile signed char *); -vector unsigned char vec_lvsr (int, const volatile unsigned short *); -vector unsigned char vec_lvsr (int, const volatile short *); -vector unsigned char vec_lvsr (int, const volatile unsigned int *); -vector unsigned char vec_lvsr (int, const volatile int *); -vector unsigned char vec_lvsr (int, const volatile unsigned long *); -vector unsigned char vec_lvsr (int, const volatile long *); -vector unsigned char vec_lvsr (int, const volatile float *); +vector unsigned char vec_lvsr (int, const unsigned char *); +vector unsigned char vec_lvsr (int, const signed char *); +vector unsigned char vec_lvsr (int, const unsigned short *); +vector unsigned char vec_lvsr (int, const short *); +vector unsigned char vec_lvsr (int, const unsigned int *); +vector unsigned char vec_lvsr (int, const int *); +vector unsigned char vec_lvsr (int, const unsigned long *); +vector unsigned char vec_lvsr (int, const long *); +vector unsigned char vec_lvsr (int, const float *); vector float vec_madd (vector float, vector float, vector float); @@ -18210,8 +18210,8 @@ vector double vec_ld (int, const vector double *); vector double vec_ld (int, const double *); vector double vec_ldl (int, const vector double *); vector double vec_ldl (int, const double *); -vector unsigned char vec_lvsl (int, const volatile double *); -vector unsigned char vec_lvsr (int, const volatile double *); +vector unsigned char vec_lvsl (int, const double *); +vector unsigned char vec_lvsr (int, const double *); vector double vec_madd (vector double, vector double, vector double); vector double vec_max (vector double, vector double); vector signed long vec_mergeh (vector signed long, vector signed long);
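For readers unfamiliar with these two functions: as far as I understand the ISA, the underlying lvsl and lvsr instructions only derive a permute control vector from the low four bits of the effective address and do not read the addressed memory. The following is a minimal portable sketch of that address arithmetic. The names emulate_lvsl and emulate_lvsr are hypothetical helpers invented for illustration; they are not GCC built-ins and they do not use the AltiVec vector types.

```c
#include <stdint.h>

/* Hypothetical portable model of lvsl.  Element i of the result is
   sh + i, where sh is the low four bits of the effective address
   (base plus offset).  The pointed-to memory is never read.  */
void
emulate_lvsl (int offset, const void *base, unsigned char out[16])
{
  uintptr_t ea = (uintptr_t) base + (uintptr_t) (intptr_t) offset;
  unsigned int sh = (unsigned int) (ea & 0xF);
  unsigned int i;

  for (i = 0; i < 16; i++)
    out[i] = (unsigned char) (sh + i);
}

/* Hypothetical portable model of lvsr.  Element i of the result is
   16 - sh + i, using the same sh as above.  */
void
emulate_lvsr (int offset, const void *base, unsigned char out[16])
{
  uintptr_t ea = (uintptr_t) base + (uintptr_t) (intptr_t) offset;
  unsigned int sh = (unsigned int) (ea & 0xF);
  unsigned int i;

  for (i = 0; i < 16; i++)
    out[i] = (unsigned char) (16 - sh + i);
}
```

Because only the address value matters in this model, the sketch accepts any const pointer, which matches the prototypes as documented after this patch.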
[PATCH, rs6000] Clean up implementation of built-in functions
This patch improves maintainability of the rs6000 built-in functions by adding a comment to describe the non-traditional implementation of the __builtin_vec_vsx_ld and __builtin_vec_vsx_st functions, and by removing eight redundant entries from the altivec_overloaded_builtins array. Note, in the patch file, that the lines immediately preceding each of the deletions from altivec_overloaded_builtins exactly match the deleted lines. This redundancy may have been accidentally introduced by manual resolution of merge conflicts. I did not investigate the origin of the redundancy. The redundant entries cause trouble for tools that automate consistency checking between implementation and documentation of built-in functions. Additionally, they are a likely cause of future bugs if any future efforts need to make corrections or changes to the associated functions. This patch has bootstrapped and tested without regressions on powerpc64le-unknown-linux (P8). Is this ok for trunk? gcc/ChangeLog: 2018-06-01 Kelvin Nilsen * config/rs6000/rs6000-builtin.def (VSX_BUILTIN_VEC_LD, VSX_BUILTIN_VEC_ST): Add comment to explain non-traditional uses. * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Remove several redundant entries. Index: gcc/config/rs6000/rs6000-builtin.def === --- gcc/config/rs6000/rs6000-builtin.def(revision 261067) +++ gcc/config/rs6000/rs6000-builtin.def(working copy) @@ -1811,6 +1811,15 @@ BU_VSX_OVERLOAD_1 (VUNSIGNEDE, "vunsignede") BU_VSX_OVERLOAD_1 (VUNSIGNEDO, "vunsignedo") /* VSX builtins that are handled as special cases. */ + + +/* NON-TRADITIONAL BEHAVIOR HERE: Besides introducing the + __builtin_vec_ld and __builtin_vec_st built-in functions, + the VSX_BUILTIN_VEC_LD and VSX_BUILTIN_VEC_ST symbolic constants + introduced below are also affiliated with the __builtin_vec_vsx_ld + and __builtin_vec_vsx_st functions respectively. This unnatural + binding is formed with explicit calls to the def_builtin function + found in rs6000.c. 
*/ BU_VSX_OVERLOAD_X (LD, "ld") BU_VSX_OVERLOAD_X (ST, "st") BU_VSX_OVERLOAD_X (XL, "xl") Index: gcc/config/rs6000/rs6000-c.c === --- gcc/config/rs6000/rs6000-c.c(revision 261067) +++ gcc/config/rs6000/rs6000-c.c(working copy) @@ -1375,28 +1375,16 @@ const struct altivec_builtin_types altivec_overloa RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 }, { ALTIVEC_BUILTIN_VEC_VCMPGTSW, ALTIVEC_BUILTIN_VCMPGTSW, RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 }, - { ALTIVEC_BUILTIN_VEC_VCMPGTSW, ALTIVEC_BUILTIN_VCMPGTSW, -RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 }, { ALTIVEC_BUILTIN_VEC_VCMPGTUW, ALTIVEC_BUILTIN_VCMPGTUW, RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0 }, - { ALTIVEC_BUILTIN_VEC_VCMPGTUW, ALTIVEC_BUILTIN_VCMPGTUW, -RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0 }, { ALTIVEC_BUILTIN_VEC_VCMPGTSH, ALTIVEC_BUILTIN_VCMPGTSH, RS6000_BTI_bool_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 }, - { ALTIVEC_BUILTIN_VEC_VCMPGTSH, ALTIVEC_BUILTIN_VCMPGTSH, -RS6000_BTI_bool_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 }, { ALTIVEC_BUILTIN_VEC_VCMPGTUH, ALTIVEC_BUILTIN_VCMPGTUH, RS6000_BTI_bool_V8HI, RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0 }, - { ALTIVEC_BUILTIN_VEC_VCMPGTUH, ALTIVEC_BUILTIN_VCMPGTUH, -RS6000_BTI_bool_V8HI, RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0 }, { ALTIVEC_BUILTIN_VEC_VCMPGTSB, ALTIVEC_BUILTIN_VCMPGTSB, RS6000_BTI_bool_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0 }, - { ALTIVEC_BUILTIN_VEC_VCMPGTSB, ALTIVEC_BUILTIN_VCMPGTSB, -RS6000_BTI_bool_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0 }, { ALTIVEC_BUILTIN_VEC_VCMPGTUB, ALTIVEC_BUILTIN_VCMPGTUB, RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 }, - { ALTIVEC_BUILTIN_VEC_VCMPGTUB, ALTIVEC_BUILTIN_VCMPGTUB, -RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 }, { ALTIVEC_BUILTIN_VEC_CMPLE, ALTIVEC_BUILTIN_VCMPGEFP, 
RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 }, { ALTIVEC_BUILTIN_VEC_CMPLE, VSX_BUILTIN_XVCMPGEDP, @@ -4249,8 +4237,6 @@ const struct altivec_builtin_types altivec_overloa { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI, RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long, 0 }, { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI, -RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long, 0 }, - { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTTI, 0 }, { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI, RS6000_BTI_unsigned_V
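The kind of redundancy removed here, where an entry exactly repeats the entry immediately before it, is easy to detect mechanically. The following is only a sketch under stated assumptions: struct overload_entry is a hypothetical stand-in with plain int fields, not the real altivec_builtin_types layout, and count_adjacent_duplicates is an invented helper name rather than anything in the GCC sources.

```c
#include <stddef.h>

/* Hypothetical stand-in for one row of an overload table: the
   overloaded code, the specific code it resolves to, and three type
   codes (result, first argument, second argument).  */
struct overload_entry
{
  int overloaded_code;
  int specific_code;
  int ret_type;
  int arg1_type;
  int arg2_type;
};

/* Return nonzero if every field of A equals the same field of B.  */
static int
same_entry (const struct overload_entry *a, const struct overload_entry *b)
{
  return (a->overloaded_code == b->overloaded_code
	  && a->specific_code == b->specific_code
	  && a->ret_type == b->ret_type
	  && a->arg1_type == b->arg1_type
	  && a->arg2_type == b->arg2_type);
}

/* Count the rows that are identical to the row immediately before
   them.  Non-adjacent repeats are deliberately not counted.  */
size_t
count_adjacent_duplicates (const struct overload_entry *table, size_t n)
{
  size_t dups = 0;
  size_t i;

  for (i = 1; i < n; i++)
    if (same_entry (&table[i - 1], &table[i]))
      dups++;
  return dups;
}
```

A check of this shape only catches the adjacent case described in this patch; a tool that needs to find repeats anywhere in the table would have to compare all pairs or sort first.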
[PATCH, rs6000] Remove incorrect built-in function documentation
This patch removes several incorrectly documented functions from the "PowerPC AltiVec Built-in Functions" section of the "Using the GNU Compiler Collection" manual. The following two functions are removed because they are not implemented: vector float vec_copysign (vector float); vector float vec_recip (vector float, vector float); The following six functions are removed because though they are implemented, they are not specified in the AltiVec PIM document and the type of the result vector does not match the type of the supplied pointer argument: vector signed int vec_lde (int, const long long *); vector unsigned int vec_lde (int, const unsigned long long *); vector int vec_ld (int, long *) vector unsigned int vec_ld (int, const unsigned long *); vector signed int vec_lvewx (int, long *); vector unsigned int vec_lvewx (int, unsigned long *); The following two functions are removed because they are not implemented. Also, they are not specified in the AltiVec PIM document and the type of the result vector does not match the type of the supplied pointer argument: vector signed int vec_ldl (int, const long *); vector unsigned int vec_ldl (int, const unsigned long *); The following four functions are removed because they are not implemented. They do happen to be specified in the AltiVec PIM document. Until they are implemented, they should not be documented: void vec_st (vector pixel, int, unsigned short *) void vec_st (vector pixel, int, short *) void vec_stl (vector pixel, int, unsigned short *); void vec_stl (vector pixel, int, short *); The following two functions are removed because they are not implemented. They are not specified in the AltiVec PIM or ABI v.2 documents: void vec_stvehx (vector pixel, int, short *); void vec_stvehx (vector pixel, int, unsigned short *); The following function was incompletely documented. The argument list lacked a closing parenthesis. There is no function by this name. 
test_vsi_packsu_vssi_vssi (vector signed short x, This patch successfully builds on both powerpc64le-unknown-linux (P8) and on powerpc-linux (P7 big-endian, with both -m32 and -m64 target options). The patch affects only extend.texi. The gcc.pdf file has been built and reviewed. Is this ok for the trunk? gcc/ChangeLog: 2018-05-23 Kelvin Nilsen <kel...@gcc.gnu.org> * doc/extend.texi (PowerPC AltiVec Built-in Functions): Remove descriptions of various incorrectly documented functions. Index: gcc/doc/extend.texi === --- gcc/doc/extend.texi (revision 260607) +++ gcc/doc/extend.texi (working copy) @@ -16354,8 +16354,6 @@ vector signed char vec_vavgsb (vector signed char, vector unsigned char vec_vavgub (vector unsigned char, vector unsigned char); -vector float vec_copysign (vector float); - vector float vec_ceil (vector float); vector signed int vec_cmpb (vector float, vector float); @@ -16569,10 +16567,8 @@ vector float vec_ld (int, const float *); vector bool int vec_ld (int, const vector bool int *); vector signed int vec_ld (int, const vector signed int *); vector signed int vec_ld (int, const int *); -vector signed int vec_ld (int, const long *); vector unsigned int vec_ld (int, const vector unsigned int *); vector unsigned int vec_ld (int, const unsigned int *); -vector unsigned int vec_ld (int, const unsigned long *); vector bool short vec_ld (int, const vector bool short *); vector pixel vec_ld (int, const vector pixel *); vector signed short vec_ld (int, const vector signed short *); @@ -16592,14 +16588,10 @@ vector unsigned short vec_lde (int, const unsigned vector float vec_lde (int, const float *); vector signed int vec_lde (int, const int *); vector unsigned int vec_lde (int, const unsigned int *); -vector signed int vec_lde (int, const long *); -vector unsigned int vec_lde (int, const unsigned long *); vector float vec_lvewx (int, float *); vector signed int vec_lvewx (int, int *); vector unsigned int vec_lvewx (int, unsigned int *); -vector signed int 
vec_lvewx (int, long *); -vector unsigned int vec_lvewx (int, unsigned long *); vector signed short vec_lvehx (int, short *); vector unsigned short vec_lvehx (int, unsigned short *); @@ -16612,10 +16604,8 @@ vector float vec_ldl (int, const float *); vector bool int vec_ldl (int, const vector bool int *); vector signed int vec_ldl (int, const vector signed int *); vector signed int vec_ldl (int, const int *); -vector signed int vec_ldl (int, const long *); vector unsigned int vec_ldl (int, const vector unsigned int *); vector unsigned int vec_ldl (int, const unsigned int *); -vector unsigned int vec_ldl (int, const unsigned long *); vector bool short vec_ldl (int, const vector bool short *); vector pixel vec_ldl (int, const vector pixel *); vector signed short vec_ldl (int, const vector signed short *); @@ -
[PATCH, rs6000] Improved Documentation of Built-in Functions Part 2
The focus of this patch is to restructure the section headers within the PowerPC portion of the extend.texi documentation file. Restructuring section headers prepares the foundation for subsequent documentation improvements which will be delivered in follow-on patches. I have bootstrapped and regression tested without regressions on powerpc64le-unknown-linux (P8). I have also confirmed that this patch builds on powerpc-linux (P7 big-endian, both -m32 and -m64 target options). I have also built and reviewed the gcc.pdf file. Is this ok for the trunk? gcc/ChangeLog: 2018-05-14 Kelvin Nilsen <kel...@gcc.gnu.org> * doc/extend.texi (Basic PowerPC Built-in Functions): Rename this subsection to be "PowerPC Built-in Functions". (PowerPC Altivec/VSX Built-in Functions): Change this subsection to subsubsection and rename as "PowerPC Altivec Built-in Functions Available on ISA 2.05". (PowerPC Built-in Functions Available on ISA 2.06): New subsubsection. (PowerPC Built-in Functions Available on ISA 2.07): Likewise. (PowerPC Built-in Functions Available on ISA 3.0): Likewise. (PowerPC Hardware Transactional Memory Built-in Functions): Split this subsection into two subsubsections named "Basic PowerPC Hardware Transactional Memory Built-in Functions" and "PowerPC Hardware Transactional Memory Built-in Functions". Move the basic subsubsection forward to be next to other basic subsubsections. (PowerPC Atomic Memory Operation Functions): Change this subsection to subsubsection. 
Index: gcc/doc/extend.texi === --- gcc/doc/extend.texi (revision 260182) +++ gcc/doc/extend.texi (working copy) @@ -12477,10 +12477,7 @@ instructions, but allow the compiler to schedule t * MSP430 Built-in Functions:: * NDS32 Built-in Functions:: * picoChip Built-in Functions:: -* Basic PowerPC Built-in Functions:: -* PowerPC AltiVec/VSX Built-in Functions:: -* PowerPC Hardware Transactional Memory Built-in Functions:: -* PowerPC Atomic Memory Operation Functions:: +* PowerPC Built-in Functions:: * RX Built-in Functions:: * S/390 System z Built-in Functions:: * SH Built-in Functions:: @@ -15536,25 +15533,35 @@ implementing assertions. @end table -@node Basic PowerPC Built-in Functions -@subsection Basic PowerPC Built-in Functions +@node PowerPC Built-in Functions +@subsection PowerPC Built-in Functions +This section describes built-in functions that are supported for +various configurations of the PowerPC processor. + @menu * Basic PowerPC Built-in Functions Available on all Configurations:: * Basic PowerPC Built-in Functions Available on ISA 2.05:: * Basic PowerPC Built-in Functions Available on ISA 2.06:: * Basic PowerPC Built-in Functions Available on ISA 2.07:: +* Basic PowerPC Hardware Transactional Memory Built-in Functions:: * Basic PowerPC Built-in Functions Available on ISA 3.0:: +* PowerPC AltiVec Built-in Functions Available on ISA 2.05:: +* PowerPC AltiVec Built-in Functions Available on ISA 2.06:: +* PowerPC AltiVec Built-in Functions Available on ISA 2.07:: +* PowerPC AltiVec Built-in Functions Available on ISA 3.0:: +* PowerPC Hardware Transactional Memory Built-in Functions:: +* PowerPC Atomic Memory Operation Functions:: @end menu -This section describes PowerPC built-in functions that do not require -the inclusion of any special header files to declare prototypes or -provide macro definitions. The sections that follow describe -additional PowerPC built-in functions. 
- @node Basic PowerPC Built-in Functions Available on all Configurations @subsubsection Basic PowerPC Built-in Functions Available on all Configurations +This section describes PowerPC built-in functions that are supported +on all configurations and do not require +the inclusion of any special header files to declare prototypes or +provide macro definitions. + @deftypefn {Built-in Function} void __builtin_cpu_init (void) This function is a @code{nop} on the PowerPC platform and is included solely to maintain API compatibility with the x86 builtins. @@ -15889,6 +15896,150 @@ addition to the @option{-mpower8-fusion}, @option{ This section intentionally empty. +@node Basic PowerPC Hardware Transactional Memory Built-in Functions +@subsubsection Basic PowerPC Hardware Transactional Memory Built-in Functions + +The following basic built-in functions are available with +@option{-mhtm} or @option{-mcpu=CPU} where CPU is `power8' or later. +They all generate the machine instruction that is part of the name. + +The Hardware Transactional Memory (HTM) builtins (with the exception +of @code{__builtin_tbegin}) return +the full 4-bit condition register value set by their associated hardware +instruction. The header file @code{htmintrin.h} defines some macros that can +be used to decipher the retu
[PATCH v2, rs6000] Improve Documentation of Built-In Functions Part 1
This is the first of several planned patches to address shortcomings in existing documentation of PowerPC built-in functions. The focus of this particular patch is to improve documentation of basic built-in functions that do not require inclusion of special header files. A summary of this patch follows: 1. Change the name of the first PowerPC built-in section from "PowerPC Built-in Functions" to "Basic PowerPC Built-in Functions". This section has never described all PowerPC built-in functions. 2. Introduce subsubsections within this section to independently describe built-in functions that target particular ISA levels. Sort function descriptions into appropriate subsubsections. 3. Add descriptions of three new features that can be tested with the __builtin_cpu_supports function: darn, htm-no-suspend, and scv. 4. Correct the spellings of several built-in functions: __builtin_fmaf128_round_to_odd, __builtin_addg6s, __builtin_cbctdt, __builtin_cdtbcd. This patch is limited in scope in order to manage complexity of the diffs. Subsequent patches will address different sections of the documentation. Subsequent patches will also add new function descriptions into these sections. This differs from the previous draft patch in the following regards: 1. This patch adds back in documentation of the __builtin_fabsq, __builtin_copysignq, __builtin_infq, __builtin_huge_valq, __builtin_nanq, __builtin_nansq, __builtin_sqrtf128, and __builtin_fmaf128 functions. 2. Consistently changed subsubsection names from "Low-Level PowerPC Built-in ... " to "Basic PowerPC Built-in ... " 3. Changed subsubsection name from "... Available on All Targets" to "... Available on All Configurations". 4. Used @code{} font for the darn and tsuspend. instruction names. 5. Removed unnecessary parentheses around many option descriptions. 6. Clarified that the result returned from the __builtin_darn_32 function is conditioned. 7. 
Enhanced the ChangeLog to call out each of the subsection names (within extend.texi) that is affected by this patch. 8. Changed the menu reference to the newly named "Basic PowerPC Built-in Functions" 9. Added a new sub-menu to identify the subsubsections of the "Basic PowerPC Built-in Functions" section. I have bootstrapped and regression tested without regressions on both powerpc64le-unknown-linux (P8) and on powerpc-linux (P8 big-endian, with both -m32 and -m64 target options). I have built and reviewed the gcc.pdf on the little-endian test platform. I did not build the gcc.pdf file on my big-endian test platform because it is missing relevant fonts. Is this ok for the trunk? 2018-05-09 Kelvin Nilsen <kel...@gcc.gnu.org> * doc/extend.texi (PowerPC Built-in Functions): Rename this subsection. (Basic PowerPC Built-in Functions): The new name of the subsection previously known as "PowerPC Built-in Functions". (Basic PowerPC Built-in Functions Available on all Configurations): New subsubsection. (Basic PowerPC Built-in Functions Available on ISA 2.05): Likewise. (Basic PowerPC Built-in Functions Available on ISA 2.06): Likewise. (Basic PowerPC Built-in Functions Available on ISA 2.07): Likewise. (Basic PowerPC Built-in Functions Available on ISA 3.0): Likewise. 
Index: gcc/doc/extend.texi === --- gcc/doc/extend.texi (revision 260073) +++ gcc/doc/extend.texi (working copy) @@ -12475,7 +12475,7 @@ * MSP430 Built-in Functions:: * NDS32 Built-in Functions:: * picoChip Built-in Functions:: -* PowerPC Built-in Functions:: +* Basic PowerPC Built-in Functions:: * PowerPC AltiVec/VSX Built-in Functions:: * PowerPC Hardware Transactional Memory Built-in Functions:: * PowerPC Atomic Memory Operation Functions:: @@ -15534,12 +15534,25 @@ @end table -@node PowerPC Built-in Functions -@subsection PowerPC Built-in Functions +@node Basic PowerPC Built-in Functions +@subsection Basic PowerPC Built-in Functions -The following built-in functions are always available and can be used to -check the PowerPC target platform type: +@menu +* Basic PowerPC Built-in Functions Available on all Configurations:: +* Basic PowerPC Built-in Functions Available on ISA 2.05:: +* Basic PowerPC Built-in Functions Available on ISA 2.06:: +* Basic PowerPC Built-in Functions Available on ISA 2.07:: +* Basic PowerPC Built-in Functions Available on ISA 3.0:: +@end menu +This section describes PowerPC built-in functions that do not require +the inclusion of any special header files to declare prototypes or +provide macro definitions. The sections that follow describe +additional PowerPC built-in functions. + +@node Basic PowerPC Built-in Functions Available on all Configurations +@subsubsection Basic PowerPC Built-in Functions
Re: [PATCH, rs6000] Improve Documentation of Built-In Functions Part 1
Thank you for the prompt review and careful feedback. I didn't notice your message until this morning. At this point, I'll wait a few days before committing these changes as I understand we are still in the "RC phase of GCC 8". On 4/24/18 4:45 PM, Segher Boessenkool wrote: > Hi! > > On Tue, Apr 24, 2018 at 02:25:58PM -0500, Kelvin Nilsen wrote: >>> 4. Remove descriptions of built-in function that do not belong in this >>> section because the >>>built-in functions are generic (not specific to PowerPC): >>> __builtin_fabsq, >>>__builtin_copysignq, __builtin_infq, __builtin_huge_valq, __builtin_nanq, >>>__builtin_nansq, __builtin_sqrtf128, __builtin_fmaf128. > > Are these described in a generic place, then? I don't see it? > >> +@node Low-Level PowerPC Built-in Functions Available on all Targets >> +@subsubsection Low-Level PowerPC Built-in Functions Available on all Targets Regarding your question about "q functions", the existing gcc.pdf document is a bit confusing. Here's what I can figure out. The following are mentioned only in "Section 6.59.33: x86 Built-in Functions" __float128 __builtin_fabsq (__float128) __float128 __builtin_copysignq (__float128, __float128) __float128 __builtin_infq (void) __float128 __builtin_huge_valq (void) __float128 __builtin_nanq (void) __float128 __builtin_nansq (void) As far as I can tell, these should not be documented as specific to x86, but should be documented as generic across all platforms. This is an issue outside the realm of PowerPC maintenance. If we want to preserve mention of these "q" functions, I would recommend changing the text that introduces them. Currently, it says: "Previous versions of GCC supported some 'q' builtins for IEEE 128-bit floating point. These functions are now mapped into the equivalent 'f128' builtin functions." 
If the description of these built-ins is not moved to a more generic context, I would prefer to replace this section with something like: The following functions, which are also supported on x86 targets, are supported if the -mfloat128 option is specified: __float128 __builtin_fabsq (__float128) __float128 __builtin_copysignq (__float128, __float128) __float128 __builtin_infq (void) __float128 __builtin_huge_valq (void) __float128 __builtin_nanq (void) __float128 __builtin_nansq (void) Regarding your question about f128 functions, these are "supposed to be" documented in "Section 6.58: Other Built-in Functions Provided by GCC". Search for the phrase "corresponding to the TS 18661-3 functions". We should add __builtin_sqrtf128 and __builtin_fmaf128 to the list of functions described this way. These may not be the only omissions. Should we push for fixing this documentation in Section 6.58 instead of keeping it in the PowerPC section? It is difficult to find the official TS 18661-3 document, and I'm not sure where to look for a list of which of the functions are currently implemented by gcc. I found this "diff" document, which provides some hints. Given that this standard is not easily accessible, perhaps the generic built-in documentation should provide a little more information? See http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1945.pdf
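To make the discussion of the "q" functions concrete, here is a small sketch that uses three of them where they are available. The preprocessor guard is my own assumption about where __float128 and the "q" built-ins can be relied upon (GCC on x86-64); elsewhere the sketch falls back to long double. The names wide_float, WIDE_FABS, WIDE_COPYSIGN, WIDE_INF, magnitude_with_sign, and is_positive_infinity are hypothetical and exist only for this illustration.

```c
/* Assumption: the "q" built-ins are usable with GCC on x86-64.  Any
   other compiler or target takes the long double fallback below.  */
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__)
typedef __float128 wide_float;
#define WIDE_FABS(x) __builtin_fabsq (x)
#define WIDE_COPYSIGN(x, y) __builtin_copysignq ((x), (y))
#define WIDE_INF() __builtin_infq ()
#else
typedef long double wide_float;
#define WIDE_FABS(x) __builtin_fabsl (x)
#define WIDE_COPYSIGN(x, y) __builtin_copysignl ((x), (y))
#define WIDE_INF() __builtin_infl ()
#endif

/* Return the absolute value of MAG carrying the sign of SIGN.  */
wide_float
magnitude_with_sign (wide_float mag, wide_float sign)
{
  return WIDE_COPYSIGN (WIDE_FABS (mag), sign);
}

/* Return nonzero if X compares equal to positive infinity.  */
int
is_positive_infinity (wide_float x)
{
  return x == WIDE_INF ();
}
```

On PowerPC the same calls would, per the discussion above, depend on the -mfloat128 option; the sketch does not attempt to cover that configuration.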
Re: [PATCH, rs6000] Improve Documentation of Built-In Functions Part 1
I'm updating this patch to make two improvements to what was submitted earlier today: 1. Correct the description of the htm-no-suspend CPU feature. 2. Add a comment to clarify that the __builtin_divde and __builtin_divdeu built-in functions require 64-bit targets. Everything else is the same as submitted previously. On 4/24/18 9:12 AM, Kelvin Nilsen wrote: > This is the first of several patches to address shortcomings in existing > documentation of > PowerPC built-in functions. The focus of this particular patch is to > improve documentation > of low-level built-in functions that do not require special include headers. > > A summary of this patch follows: > > 1. Change the name of the first PowerPC built-in section from "PowerPC > Built-in Functions" > to "Low-Level PowerPC Built-in Functions". This section has never > described all PowerPC > built-in functions. > > 2. Introduce subsubsections within this section to independently > describe built-in functions > that target particular ISA levels. Sort function descriptions into > appropriate > subsubsections. > > 3. Add descriptions of three new features that can be tested with the > __builtin_cpu_supports > function: darn, htm-no-suspend, and scv. > > 4. Remove descriptions of built-in function that do not belong in this > section because the > built-in functions are generic (not specific to PowerPC): > __builtin_fabsq, > __builtin_copysignq, __builtin_infq, __builtin_huge_valq, __builtin_nanq, > __builtin_nansq, __builtin_sqrtf128, __builtin_fmaf128. > > 5. Corrected the spellings of several built-in functions: > __builtin_fmaf128_round_to_odd, > __builtin_addg6s, __builtin_cbctdt, __builtin_cdtbcd. > > This patch is limited in scope in order to manage complexity of the > diffs. Subsequent patches > will address different sections of the documentation. Subsequent > patches will also add > new function descriptions into these sections. > > This patch affects only extend.texi. 
The gcc.pdf file has been built > and reviewed. > > Is this ok for the trunk? gcc/ChangeLog: 2018-04-24 Kelvin Nilsen <kel...@gcc.gnu.org> * doc/extend.texi: Tidy documentation of PowerPC built-in functions. Index: gcc/doc/extend.texi === --- gcc/doc/extend.texi (revision 259504) +++ gcc/doc/extend.texi (working copy) @@ -15524,12 +15524,17 @@ implementing assertions. @end table -@node PowerPC Built-in Functions -@subsection PowerPC Built-in Functions +@node Low-Level PowerPC Built-in Functions +@subsection Low-Level PowerPC Built-in Functions -The following built-in functions are always available and can be used to -check the PowerPC target platform type: +This section describes PowerPC built-in functions that do not require +the inclusion of any special header files to declare prototypes or +provide macro definitions. The sections that follow describe +additional PowerPC built-in functions. +@node Low-Level PowerPC Built-in Functions Available on all Targets +@subsubsection Low-Level PowerPC Built-in Functions Available on all Targets + @deftypefn {Built-in Function} void __builtin_cpu_init (void) This function is a @code{nop} on the PowerPC platform and is included solely to maintain API compatibility with the x86 builtins. @@ -15633,6 +15638,8 @@ CPU supports the set of compatible performance mon CPU supports the Embedded ISA category. @item cellbe CPU has a CELL broadband engine. +@item darn +CPU supports the darn (deliver a random number) instruction. @item dfp CPU has a decimal floating point unit. @item dscr @@ -15649,6 +15656,9 @@ CPU has a floating point unit. CPU has hardware transaction memory instructions. @item htm-nosc Kernel aborts hardware transactions when a syscall is made. +@item htm-no-suspend +CPU supports hardware transaction memory but does not support the +tsuspend. instruction. @item ic_snoop CPU supports icache snooping capabilities. 
@item ieee128 @@ -15677,6 +15687,8 @@ CPU supports the old POWER ISA (eg, 601) CPU supports 64-bit mode execution. @item ppcle CPU supports a little-endian mode that uses address swizzling. +@item scv +Kernel supports system call vectored. @item smt CPU support simultaneous multi-threading. @item spe @@ -15708,19 +15720,81 @@ Here is an example: @end smallexample @end deftypefn -These built-in functions are available for the PowerPC family of +The following built-in functions are also available on all PowerPC processors: @smallexample -float __builtin_recipdivf (float, float); -float __builtin_rsqrtf (float); -double __builtin_recipdiv (double, double); -double __builtin_rsqrt (double); uint64_t __builtin_ppc_get_timebase (); unsigned long __builtin_ppc_mftb (); -double __builtin_unpack_longdouble (lon
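As an illustration of how one of the newly documented feature strings might be tested, here is a hedged sketch. The helper name cpu_has_darn is hypothetical. The preprocessor guard, including the assumption that the "darn" string is accepted starting with GCC 8, is my own; on any other compiler or target the helper simply reports the feature as absent.

```c
/* Return 1 if the running CPU reports the darn feature, else 0.
   Assumption: the "darn" feature string is accepted by GCC 8 and
   later on PowerPC Linux.  Everywhere else this returns 0.  */
int
cpu_has_darn (void)
{
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ >= 8) \
    && defined(__powerpc__) && defined(__linux__)
  return __builtin_cpu_supports ("darn") != 0;
#else
  return 0;
#endif
}
```

A caller would typically use such a test to choose between a code path that relies on the instruction and a fallback that does not.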
[PATCH, rs6000] Improve Documentation of Built-In Functions Part 1
This is the first of several patches to address shortcomings in existing documentation of PowerPC built-in functions. The focus of this particular patch is to improve documentation of low-level built-in functions that do not require special include headers. A summary of this patch follows: 1. Change the name of the first PowerPC built-in section from "PowerPC Built-in Functions" to "Low-Level PowerPC Built-in Functions". This section has never described all PowerPC built-in functions. 2. Introduce subsubsections within this section to independently describe built-in functions that target particular ISA levels. Sort function descriptions into appropriate subsubsections. 3. Add descriptions of three new features that can be tested with the __builtin_cpu_supports function: darn, htm-no-suspend, and scv. 4. Remove descriptions of built-in functions that do not belong in this section because the built-in functions are generic (not specific to PowerPC): __builtin_fabsq, __builtin_copysignq, __builtin_infq, __builtin_huge_valq, __builtin_nanq, __builtin_nansq, __builtin_sqrtf128, __builtin_fmaf128. 5. Correct the spellings of several built-in functions: __builtin_fmaf128_round_to_odd, __builtin_addg6s, __builtin_cbctdt, __builtin_cdtbcd. This patch is limited in scope in order to manage complexity of the diffs. Subsequent patches will address different sections of the documentation. Subsequent patches will also add new function descriptions into these sections. This patch affects only extend.texi. The gcc.pdf file has been built and reviewed. Is this ok for the trunk? gcc/ChangeLog: 2018-04-24 Kelvin Nilsen <kel...@gcc.gnu.org> * doc/extend.texi: Tidy documentation of PowerPC built-in functions. Index: gcc/doc/extend.texi === --- gcc/doc/extend.texi (revision 259504) +++ gcc/doc/extend.texi (working copy) @@ -15524,12 +15524,17 @@ implementing assertions. 
@end table  -@node PowerPC Built-in Functions -@subsection PowerPC Built-in Functions +@node Low-Level PowerPC Built-in Functions +@subsection Low-Level PowerPC Built-in Functions  -The following built-in functions are always available and can be used to -check the PowerPC target platform type: +This section describes PowerPC built-in functions that do not require +the inclusion of any special header files to declare prototypes or +provide macro definitions. The sections that follow describe +additional PowerPC built-in functions.  +@node Low-Level PowerPC Built-in Functions Available on all Targets +@subsubsection Low-Level PowerPC Built-in Functions Available on all Targets +  @deftypefn {Built-in Function} void __builtin_cpu_init (void)  This function is a @code{nop} on the PowerPC platform and is included solely  to maintain API compatibility with the x86 builtins. @@ -15633,6 +15638,8 @@ CPU supports the set of compatible performance mon  CPU supports the Embedded ISA category.  @item cellbe  CPU has a CELL broadband engine. +@item darn +CPU supports the darn (deliver a random number) instruction.  @item dfp  CPU has a decimal floating point unit.  @item dscr @@ -15649,6 +15656,8 @@ CPU has a floating point unit.  CPU has hardware transaction memory instructions.  @item htm-nosc  Kernel aborts hardware transactions when a syscall is made. +@item htm-no-suspend +Kernel aborts hardware transactions when the thread is suspended.  @item ic_snoop  CPU supports icache snooping capabilities.  @item ieee128 @@ -15677,6 +15686,8 @@ CPU supports the old POWER ISA (eg, 601)  CPU supports 64-bit mode execution.  @item ppcle  CPU supports a little-endian mode that uses address swizzling. +@item scv +Kernel supports system call vectored.  @item smt  CPU support simultaneous multi-threading.  
@item spe @@ -15708,19 +15719,81 @@ Here is an example:  @end smallexample  @end deftypefn  -These built-in functions are available for the PowerPC family of +The following built-in functions are also available on all PowerPC  processors:  @smallexample -float __builtin_recipdivf (float, float); -float __builtin_rsqrtf (float); -double __builtin_recipdiv (double, double); -double __builtin_rsqrt (double);  uint64_t __builtin_ppc_get_timebase ();  unsigned long __builtin_ppc_mftb (); -double __builtin_unpack_longdouble (long double, int); -long double __builtin_pack_longdouble (double, double);  @end smallexample  +The @code{__builtin_ppc_get_timebase} and @code{__builtin_ppc_mftb} +functions generate instructions to read the Time Base Register. The +@code{__builtin_ppc_get_timebase} function may generate multiple +instructions and always returns the 64 bits of the Time Base Register. +The @code{__builtin_ppc_mftb} function always generates one instruction and +returns the Time Base Register value as an unsigned long, throwing away +the most significant word on 32-bit environments. + +@node Low-Level PowerPC Built-in Funct
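For readers who want to try the built-ins documented by this patch, here is a minimal sketch of how the new "darn" feature test and the Time Base read might be used from C. The wrapper names cpu_has_darn and read_timebase, and the HAVE_PPC_BUILTINS macro, are made up for this illustration and do not appear in the patch. The sketch assumes GCC 8 or later when building for PowerPC (earlier releases may not accept "darn" as an argument); on any other target it falls back to stubs that return 0 so the file still compiles.

```c
#include <stdint.h>

/* Hypothetical helper macro: 1 only when building with GCC for PowerPC.  */
#if defined(__GNUC__) && !defined(__clang__) \
    && (defined(__powerpc__) || defined(__powerpc64__))
#define HAVE_PPC_BUILTINS 1
#else
#define HAVE_PPC_BUILTINS 0
#endif

/* Return 1 if the running CPU reports the "darn" feature, else 0.
   On non-PowerPC builds this is a stub that always returns 0.  */
int cpu_has_darn(void)
{
#if HAVE_PPC_BUILTINS
  __builtin_cpu_init();   /* documented as a nop on PowerPC */
  return __builtin_cpu_supports("darn") != 0;
#else
  return 0;
#endif
}

/* Return the full 64-bit Time Base Register value.
   On non-PowerPC builds this is a stub that always returns 0.  */
uint64_t read_timebase(void)
{
#if HAVE_PPC_BUILTINS
  return __builtin_ppc_get_timebase();
#else
  return 0;
#endif
}
```

The sketch uses __builtin_ppc_get_timebase rather than __builtin_ppc_mftb because, per the description above, the former always returns all 64 bits, while the latter drops the most significant word in 32-bit environments.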
[PATCH, rs6000] Fix tests that are failing in gcc.target/powerpc/bfp with -m32
Twelve failures have been occurring in the bfp test directory during -m32 regression testing. The cause of these failures was two-fold: 1. Patches added subsequent to development of the tests caused new error messages to be emitted that differ from the error messages expected in the dejagnu patterns. These new patches also changed which built-in functions are legal when compiling with the -m32 command-line option. 2. The implementation of overloaded built-in functions maps overloaded function names to non-overloaded names. Depending on the stage at which an error is recognized, error messages may refer either to the overloaded built-in function name or the non-overloaded name. This patch: 1. Changes the expected error messages in certain test programs. 2. Disables certain test programs from being exercised on 32-bit targets. 3. Adds a "note" error message to explain the mapping from overloaded built-in functions to non-overloaded built-in functions. This patch has been bootstrapped and tested without regressions on both powerpc64le-unknown-linux (P8) and on powerpc-linux (P7 big-endian, with both -m32 and -m64 target options). Is this ok for trunk? gcc/ChangeLog: 2018-04-13 Kelvin Nilsen <kel...@gcc.gnu.org>    * config/rs6000/rs6000-protos.h (rs6000_builtin_is_supported_p):    New prototype.    * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):    Add note to error message to explain internal mapping of overloaded    built-in function name to non-overloaded built-in function name.    * config/rs6000/rs6000.c (rs6000_builtin_is_supported_p): New    function. gcc/testsuite/ChangeLog: 2018-04-13 Kelvin Nilsen <kel...@gcc.gnu.org>    * gcc.target/powerpc/bfp/scalar-extract-sig-5.c: Simplify to    prevent cascading of errors and change expected error message.    * gcc.target/powerpc/bfp/scalar-test-neg-4.c: Restrict this test    to 64-bit targets.    * gcc.target/powerpc/bfp/scalar-test-data-class-8.c: Likewise.    
* gcc.target/powerpc/bfp/scalar-test-data-class-9.c: Likewise.    * gcc.target/powerpc/bfp/scalar-test-data-class-10.c: Likewise.    * gcc.target/powerpc/bfp/scalar-insert-exp-11.c: Change expected    error message.    * gcc.target/powerpc/bfp/scalar-extract-exp-5.c: Likewise. Index: gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-5.c === --- gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-5.c   (revision 259316) +++ gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-5.c   (working copy) @@ -8,10 +8,10 @@    error because the builtin requires 64 bits. */  #include  -unsigned __int128 /* { dg-error "'__int128' is not supported on this target" } */ +unsigned long long int  get_significand (__ieee128 *p)  {   __ieee128 source = *p;  - return __builtin_vec_scalar_extract_sig (source); /* { dg-error "builtin function '__builtin_vec_scalar_extract_sig' not supported in this compiler configuration" } */ + return (long long int) __builtin_vec_scalar_extract_sig (source); /* { dg-error "requires ISA 3.0 IEEE 128-bit floating point" } */  } Index: gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-4.c === --- gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-4.c   (revision 259316) +++ gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-4.c   (working copy) @@ -1,5 +1,6 @@  /* { dg-do compile { target { powerpc*-*-* } } } */  /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */ +/* { dg-require-effective-target lp64 } */  /* { dg-require-effective-target powerpc_p9vector_ok } */  /* { dg-options "-mcpu=power9" } */  @@ -11,6 +12,8 @@  {   __ieee128 source = *p;  + /* IEEE 128-bit floating point operations are only supported + on 64-bit targets. 
*/   return scalar_test_neg (source);  }  Index: gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-8.c === --- gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-8.c   (revision 259316) +++ gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-8.c   (working copy) @@ -1,5 +1,6 @@  /* { dg-do compile { target { powerpc*-*-* } } } */  /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */ +/* { dg-require-effective-target lp64 } */  /* { dg-require-effective-target powerpc_p9vector_ok } */  /* { dg-options "-mcpu=power9" } */  @@ -11,6 +12,8 @@  {   __ieee128 source = *p;  + /* IEEE 128-bit floating point operations are only supported + on 64-bit targets. */   return scalar_test_data_class (source, 3);  }  Index: gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-11.c =
[PATCH][OBVIOUS] PR85347: New testcase vec-ldl-1.c FAILs on powerpc64-linux
This new test case required a dejagnu qualifier to restrict its execution on big-endian platforms. The patch bootstrapped and tested without regressions. Was committed as obvious. gcc/testsuite/ChangeLog: 2018-04-12 Kelvin Nilsen <kel...@gcc.gnu.org>    PR target/85347    * gcc.target/powerpc/vec-ldl-1.c: Change dejagnu directives to    specify -mvsx on gcc command line. Index: gcc/testsuite/gcc.target/powerpc/vec-ldl-1.c === --- gcc/testsuite/gcc.target/powerpc/vec-ldl-1.c   (revision 259318) +++ gcc/testsuite/gcc.target/powerpc/vec-ldl-1.c   (working copy) @@ -1,6 +1,6 @@  /* { dg-do run { target powerpc*-*-* } } */ -/* { dg-require-effective-target vmx_hw } */ -/* { dg-options "-maltivec -O0 -Wall" } */ +/* { dg-require-effective-target vsx_hw } */ +/* { dg-options "-mvsx -O0 -Wall" } */   #include  #include
[PATCH v2, rs6000] Tidy implementation of vec_ldl
This is the second draft of a patch originally submitted on 3/29. This patch corrects inconsistencies in the supported prototypes for the vec_ldl built-in function. Specifically, it removes support for:  vector int vec_ldl (int, long int *)  vector unsigned int vec_ldl (int, unsigned long int *) and adds support for:  vector bool char vec_ldl (int, bool char *)  vector bool short vec_ldl (int, bool short *)  vector bool int vec_ldl (int, bool int *)  vector bool long long vec_ldl (int, bool long long *)  vector long long vec_ldl (int, long long *)  vector unsigned long long vec_ldl (int, unsigned long long *) Thanks to Segher Boessenkool for his careful review and feedback on the first draft of this patch. This second revision differs from the first in the following: 1. Removed support for the proposed new prototype: "vector pixel vec_ldl (int, pixel *)" 2. Removed an extraneous tab character in the ChangeLog. 3. Changed the mangling of the bool_long_long_type_node. 4. Removed leading * on comment continuation lines. 5. Added a comment to describe limitations on use of the pixel data type. 6. Removed requirement for lp64 on the new test program. This patch has been bootstrapped and tested without regressions on both powerpc64le-unknown-linux (P8) and on powerpc-linux (P7 big-endian, with both -m32 and -m64 target options). Is this ok for trunk? gcc/ChangeLog: 2018-04-03 Kelvin Nilsen <kel...@gcc.gnu.org>    * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Remove    erroneous entries for    "vector int vec_ldl (int, long int *)", and    "vector unsigned int vec_ldl (int, unsigned long int *)".    
Add comments and entries for    "vector bool char vec_ldl (int, bool char *)",    "vector bool short vec_ldl (int, bool short *)",    "vector bool int vec_ldl (int, bool int *)",    "vector bool long long vec_ldl (int, bool long long *)",    "vector pixel vec_ldl (int, pixel *)",    "vector long long vec_ldl (int, long long *)",    "vector unsigned long long vec_ldl (int, unsigned long long *)".    * config/rs6000/rs6000.c (rs6000_init_builtins): Initialize new    type tree bool_long_long_type_node and correct definition of    bool_V2DI_type_node to make reference to this new type tree.    (rs6000_mangle_type): Replace erroneous reference to    bool_long_type_node with bool_long_long_type_node.    * config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Add    comments to emphasize sign distinctions for char and int types and    replace RS6000_BTI_bool_long constant with    RS6000_BTI_bool_long_long constant. Also add comment to restrict    use of RS6000_BTI_pixel.    (bool_long_type_node): Remove this macro definition.    (bool_long_long_type_node): New macro definition gcc/testsuite/ChangeLog: 2018-04-03 Kelvin Nilsen <kel...@gcc.gnu.org>    * gcc.target/powerpc/vec-ldl-1.c: New test.    * gcc.dg/vmx/ops-long-1.c: Correct test programs to reflect    corrections to ABI implementation. 
Index: gcc/config/rs6000/rs6000-c.c === --- gcc/config/rs6000/rs6000-c.c   (revision 258800) +++ gcc/config/rs6000/rs6000-c.c   (working copy) @@ -1656,27 +1656,45 @@ const struct altivec_builtin_types altivec_overloa RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI, 0 },   { ALTIVEC_BUILTIN_VEC_LVEBX, ALTIVEC_BUILTIN_LVEBX, RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI, 0 }, + + /* vector float vec_ldl (int, vector float *); + vector float vec_ldl (int, float *); */   { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL_V4SF, RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF, 0 },   { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL_V4SF, RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_float, 0 }, + + /* vector bool int vec_ldl (int, vector bool int *); + vector bool int vec_ldl (int, bool int *); + vector int vec_ldl (int, vector int *); + vector int vec_ldl (int, int *); + vector unsigned int vec_ldl (int, vector unsigned int *); + vector unsigned int vec_ldl (int, unsigned int *); */   { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL_V4SI, RS6000_BTI_bool_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V4SI, 0 },   { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL_V4SI, +   RS6000_BTI_bool_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_int, 0 }, + { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL_V4SI, RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_V4SI, 0 },   { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL_V4SI, RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_INTSI, 0 },   { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL_V4SI, -   RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_long, 0 }, - { ALTIVEC_BUILTIN_VEC_LDL, ALTIVEC_BUILTIN_LVXL_V4SI, RS6000_BTI_unsigned_V
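To make the prototype change concrete, here is a minimal sketch of a caller that relies on the newly added "vector long long vec_ldl (int, long long *)" form. The function name load_pair is hypothetical and chosen only for this illustration. The sketch assumes a compiler that includes this patch and a VSX-enabled PowerPC target (for example -mvsx); on any other configuration it falls back to a plain memcpy so that the file still builds. The source pointer is assumed to be 16-byte aligned, because the underlying load instruction ignores the low-order address bits.

```c
#include <string.h>

#if defined(__VSX__)
#include <altivec.h>
#endif

/* Copy the two 64-bit elements at SRC into OUT.
   SRC is assumed to be 16-byte aligned.  */
void load_pair(long long *src, long long out[2])
{
#if defined(__VSX__)
  /* With the corrected prototypes, a long long * argument yields a
     vector of long long rather than a vector of int.  */
  __vector long long v = vec_ldl(0, src);
  memcpy(out, &v, sizeof v);
#else
  /* Portable fallback for targets without VSX.  */
  memcpy(out, src, 2 * sizeof(long long));
#endif
}
```

Before this patch, per the description above, the only way to pass a 64-bit pointer was through the "long int *" form, which returned "vector int"; that is the inconsistency being removed.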
[PATCH, rs6000] Tidy implementation of vec_ldl
During code review, an inconsistency was noticed in some of the prototypes defined for the vec_ldl built-in function. In particular, the vector fetched from an address declared to be long long * was returned as "vector int". In addressing this problem, certain other inconsistencies and omissions were discovered. This patch tidies up the implementation of this function. A separate patch is in preparation to address the documentation for this and all other PowerPC built-in functions. In summary, this patch removes two prototypes:  vector int vec_ldl (int, long int *)  vector unsigned int vec_ldl (int, unsigned long int *) and adds eight:  vector bool char vec_ldl (int, bool char *)  vector bool short vec_ldl (int, bool short *)  vector bool int vec_ldl (int, bool int *)  vector bool long long vec_ldl (int, bool long long *)  vector pixel vec_ldl (int, pixel *)  vector long long vec_ldl (int, long long *)  vector unsigned long long vec_ldl (int, unsigned long long *) This patch has been bootstrapped and tested without regressions on both powerpc64le-unknown-linux (P8) and on powerpc-linux (P7 big-endian, with both -m32 and -m64 target options). Is this ok for trunk? gcc/ChangeLog: 2018-03-29 Kelvin Nilsen <kel...@gcc.gnu.org>    * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Remove    erroneous entries for    "vector int vec_ldl (int, long int *)", and    "vector unsigned int vec_ldl (int, unsigned long int *)".    Add comments and entries for    "vector bool char vec_ldl (int, bool char *)",    "vector bool short vec_ldl (int, bool short *)",    "vector bool int vec_ldl (int, bool int *)",    "vector bool long long vec_ldl (int, bool long long *)",    "vector pixel vec_ldl (int, pixel *)",    "vector long long vec_ldl (int, long long *)",    "vector unsigned long long vec_ldl (int, unsigned long long *)".    
* config/rs6000/rs6000.c (rs6000_init_builtins): Initialize new    type tree bool_long_long_type_node and correct definition of    bool_V2DI_type_node to make reference to this new type tree.    (rs6000_mangle_type): Replace erroneous reference to    bool_long_type_node with bool_long_long_type_node.    * config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Add    comments to emphasize sign distinctions for char and int types and    replace RS6000_BTI_bool_long constant with    RS6000_BTI_bool_long_long constant.    (bool_long_type_node): Remove this macro definition.    (bool_long_long_type_node): New macro definition. gcc/testsuite/ChangeLog: 2018-03-29 Kelvin Nilsen <kel...@gcc.gnu.org>    * gcc.target/powerpc/vec-ldl-1.c: New test.    * gcc.dg/vmx/ops-long-1.c: Correct test programs to reflect    corrections to ABI implementation. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c   (revision 258800) +++ gcc/config/rs6000/rs6000.c   (working copy) @@ -16947,7 +16947,7 @@ rs6000_init_builtins (void)   bool_char_type_node = build_distinct_type_copy (unsigned_intQI_type_node);   bool_short_type_node = build_distinct_type_copy (unsigned_intHI_type_node);   bool_int_type_node = build_distinct_type_copy (unsigned_intSI_type_node); - bool_long_type_node = build_distinct_type_copy (unsigned_intDI_type_node); + bool_long_long_type_node = build_distinct_type_copy (unsigned_intDI_type_node);   pixel_type_node = build_distinct_type_copy (unsigned_intHI_type_node);   long_integer_type_internal_node = long_integer_type_node; @@ -17064,7 +17064,7 @@ rs6000_init_builtins (void)   bool_V2DI_type_node = rs6000_vector_type (TARGET_POWERPC64                ? 
"__vector __bool long"                : "__vector __bool long long", -                  bool_long_type_node, 2); +                  bool_long_long_type_node, 2);   pixel_V8HI_type_node = rs6000_vector_type ("__vector __pixel",             pixel_type_node, 8); @@ -32855,7 +32855,7 @@ rs6000_mangle_type (const_tree type)   if (type == bool_short_type_node) return "U6__bools";   if (type == pixel_type_node) return "u7__pixel";   if (type == bool_int_type_node) return "U6__booli"; - if (type == bool_long_type_node) return "U6__booll"; + if (type == bool_long_long_type_node) return "U6__booll";   /* Use a unique name for __float128 rather than trying to use "e" or "g". Use  "g" for IBM extended double, no matter whether it is long double (using Index: gcc/config/rs6000/rs6000.h === --- gcc/config/rs6000/rs6000.h   (revision 258800) +++ gcc/config/rs6000/rs6000.h   (working copy) @@ -2578,7 +2578,7 @@ enum rs6000_builtin_type_index   RS6000_BTI_opaque_V2SF,
[PATCH, rs6000] Finish implementation of __builtin_altivec_lvx_v1ti
During code review, it was discovered that the implementation of __builtin_altivec_lvx_v1ti is not complete. The constant ALTIVEC_BUILTIN_LVX_V1TI is introduced and is bound to the function __builtin_altivec_lvx_v1ti. However, this function's implementation is incomplete because there is no call to the def_builtin function for this binding. This patch provides the missing pieces to add support for this function. Additionally, this patch introduces four new __int128-based prototypes of the overloaded __builtin_vec_ld function. This is the function that implements the vec_ld () macro expansion. A new test case has been provided to exercise each of these prototypes. This patch has been bootstrapped and tested without regressions on both powerpc64le-unknown-linux (P8) and on powerpc-linux (P7 big-endian, with both -m32 and -m64 target options). Is this patch ok for trunk? gcc/ChangeLog: 2018-03-14 Kelvin Nilsen <kel...@gcc.gnu.org> * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add entries for V1TI variants of __builtin_altivec_ld builtin. * config/rs6000/rs6000.c (altivec_expand_lv_builtin): Add test and handling of V1TI variant of LVX icode pattern. (altivec_expand_builtin): Add case for ALTIVEC_BUILTIN_LVX_V1TI. (rs6000_gimple_fold_builtin): Likewise. (altivec_init_builtins): Add code to define __builtin_altivec_lvx_v1ti function. * doc/extend.texi: Add four new prototypes for vec_ld. gcc/testsuite/ChangeLog: 2018-03-14 Kelvin Nilsen <kel...@gcc.gnu.org> * gcc.target/powerpc/altivec-ld-1.c: New test. 
Index: gcc/config/rs6000/rs6000-c.c === --- gcc/config/rs6000/rs6000-c.c(revision 258341) +++ gcc/config/rs6000/rs6000-c.c(working copy) @@ -1562,6 +1562,15 @@ const struct altivec_builtin_types altivec_overloa { VSX_BUILTIN_VEC_FLOATO, VSX_BUILTIN_UNS_FLOATO_V2DI, RS6000_BTI_V4SF, RS6000_BTI_unsigned_V2DI, 0, 0 }, + { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX_V1TI, +RS6000_BTI_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_V1TI, 0 }, + { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX_V1TI, +RS6000_BTI_unsigned_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V1TI, 0 }, + { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX_V1TI, +RS6000_BTI_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_INTTI, 0 }, + { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX_V1TI, +RS6000_BTI_unsigned_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTTI, 0 }, + { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX_V2DF, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF, 0 }, { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX_V2DI, Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 258341) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -14452,6 +14452,7 @@ altivec_expand_lv_builtin (enum insn_code icode, t LVXL and LVE*X expand to use UNSPECs to hide their special behavior, so the raw address is fine. 
*/ if (icode == CODE_FOR_altivec_lvx_v2df_2op + || icode == CODE_FOR_altivec_lvx_v1ti_2op || icode == CODE_FOR_altivec_lvx_v2di_2op || icode == CODE_FOR_altivec_lvx_v4sf_2op || icode == CODE_FOR_altivec_lvx_v4si_2op @@ -15811,6 +15812,9 @@ altivec_expand_builtin (tree exp, rtx target, bool case ALTIVEC_BUILTIN_LVX_V2DI: return altivec_expand_lv_builtin (CODE_FOR_altivec_lvx_v2di_2op, exp, target, false); +case ALTIVEC_BUILTIN_LVX_V1TI: + return altivec_expand_lv_builtin (CODE_FOR_altivec_lvx_v1ti_2op, + exp, target, false); case ALTIVEC_BUILTIN_LVX_V4SF: return altivec_expand_lv_builtin (CODE_FOR_altivec_lvx_v4sf_2op, exp, target, false); @@ -16542,6 +16546,7 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator * case ALTIVEC_BUILTIN_LVX_V4SF: case ALTIVEC_BUILTIN_LVX_V2DI: case ALTIVEC_BUILTIN_LVX_V2DF: +case ALTIVEC_BUILTIN_LVX_V1TI: { arg0 = gimple_call_arg (stmt, 0); // offset arg1 = gimple_call_arg (stmt, 1); // address @@ -17443,6 +17448,10 @@ altivec_init_builtins (void) = build_function_type_list (V2DI_type_node, long_integer_type_node, pcvoid_type_node, NULL_TREE); + tree v1ti_ftype_long_pcvoid += build_function_type_list (V1TI_type_node, + long_integer_type_node, pcvoid_type_node, + NULL_TREE); tree void_ftype_opaque_long_pvoid = build_function_type_list (void_type_node, @@ -17540,6 +17549,8 @@ altivec_init_builtins (void) def_builtin ("__builtin_altivec_lvx", v4si_ftype_long_pcvoid, ALTIVEC_BUILTIN_LVX); def_builtin ("__builtin_altivec_lvx_v2
[PATCH v2, rs6000] Remove unused (and incorrect) code for internal store and load operations
Thank you for feedback and discussion regarding my first draft of this patch with Segher Boessenkool and Bill Schmidt. This revision of the patch differs from the first in the following regards: 1. I have also removed the vector_altivec_load_ and vector_altivec_store_ expansions from vector.md. 2. I have removed the unused rs6000_address_for_altivec function from rs6000.c. I have once again bootstrapped and regression tested on both little-endian and big-endian targets. The remainder of this description is borrowed from my initial submission of the patch. While working to assure rs6000 documentation of built-in functions is consistent with the implementation of built-in functions, I discovered some apparent typographic errors in the definitions of the ST_INTERNAL_4sf and ST_INTERNAL_2df built-in functions. As I endeavored to fix these definitions and write test cases to prove that I had properly fixed them, I discovered that these functions are no longer in use. This patch removes the unnecessary definitions and related back-end functions. This has bootstrapped and tested without regressions on both powerpc64le-unknown-linux (P8) and on powerpc-linux (P7 big-endian, with both -m32 and -m64 target options). Is this patch ok for trunk? gcc/ChangeLog: 2018-03-14 Kelvin Nilsen <kel...@gcc.gnu.org> * config/rs6000/rs6000-builtin.def: Remove various BU_ALTIVEC_X macro expansions for definition of ST_INTERNAL_ and LD_INTERNAL_ builtins. * config/rs6000/rs6000-protos.h (rs6000_address_for_altivec): Remove prototype. * config/rs6000/rs6000.c (altivec_expand_ld_builtin): Delete this function. (altivec_expand_st_builtin): Likewise. (altivec_expand_builtin): Remove calls to deleted functions. (rs6000_address_for_altivec): Delete this function. * config/rs6000/vector.md: Remove expands for vector_altivec_load_ and vector_altivec_store_. 
Index: gcc/config/rs6000/rs6000-builtin.def === --- gcc/config/rs6000/rs6000-builtin.def(revision 258338) +++ gcc/config/rs6000/rs6000-builtin.def(working copy) @@ -1210,20 +1210,6 @@ BU_ALTIVEC_P (VCMPGTSB_P, "vcmpgtsb_p", CONST, BU_ALTIVEC_P (VCMPGTUB_P, "vcmpgtub_p",CONST, vector_gtu_v16qi_p) /* AltiVec builtins that are handled as special cases. */ -BU_ALTIVEC_X (ST_INTERNAL_4si, "st_internal_4si", MEM) -BU_ALTIVEC_X (LD_INTERNAL_4si, "ld_internal_4si", MEM) -BU_ALTIVEC_X (ST_INTERNAL_8hi, "st_internal_8hi", MEM) -BU_ALTIVEC_X (LD_INTERNAL_8hi, "ld_internal_8hi", MEM) -BU_ALTIVEC_X (ST_INTERNAL_16qi,"st_internal_16qi", MEM) -BU_ALTIVEC_X (LD_INTERNAL_16qi,"ld_internal_16qi", MEM) -BU_ALTIVEC_X (ST_INTERNAL_4sf, "st_internal_16qi", MEM) -BU_ALTIVEC_X (LD_INTERNAL_4sf, "ld_internal_4sf", MEM) -BU_ALTIVEC_X (ST_INTERNAL_2df, "st_internal_4sf", MEM) -BU_ALTIVEC_X (LD_INTERNAL_2df, "ld_internal_2df", MEM) -BU_ALTIVEC_X (ST_INTERNAL_2di, "st_internal_2di", MEM) -BU_ALTIVEC_X (LD_INTERNAL_2di, "ld_internal_2di", MEM) -BU_ALTIVEC_X (ST_INTERNAL_1ti, "st_internal_1ti", MEM) -BU_ALTIVEC_X (LD_INTERNAL_1ti, "ld_internal_1ti", MEM) BU_ALTIVEC_X (MTVSCR, "mtvscr", MISC) BU_ALTIVEC_X (MFVSCR, "mfvscr", MISC) BU_ALTIVEC_X (DSSALL, "dssall", MISC) Index: gcc/config/rs6000/rs6000-protos.h === --- gcc/config/rs6000/rs6000-protos.h (revision 258338) +++ gcc/config/rs6000/rs6000-protos.h (working copy) @@ -162,7 +162,6 @@ extern void rs6000_emit_parity (rtx, rtx); extern rtx rs6000_machopic_legitimize_pic_address (rtx, machine_mode, rtx); extern rtx rs6000_address_for_fpconvert (rtx); -extern rtx rs6000_address_for_altivec (rtx); extern rtx rs6000_allocate_stack_temp (machine_mode, bool, bool); extern int rs6000_loop_align (rtx); extern void rs6000_split_logical (rtx [], enum rtx_code, bool, bool, bool); Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 258338) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -15183,127 +15183,7 @@ 
rs6000_expand_ternop_builtin (enum insn_code icode return target; } -/* Expand the lvx builtins. */ -static rtx -altivec_expand_ld_builtin (tree exp, rtx target, bool *expandedp) -{ - tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0); - unsigned int fcode = DECL_FUNCTION_CODE (fndecl); - tree arg0; - machine_mode tmode, mode0; - rtx pat, op0; - enum insn_code icode; - switch (fcode) -{ -case ALTIVEC_BUILTIN_LD_
[PATCH, rs6000] Remove unused (and incorrect) code for internal store and load operations
While working to assure rs6000 documentation of built-in functions is consistent with the implementation of built-in functions, I discovered some apparent typographic errors in the definitions of the ST_INTERNAL_4sf and ST_INTERNAL_2df built-in functions. As I endeavored to fix these definitions and write test cases to prove that I had properly fixed them, I discovered that these functions are no longer in use. This patch removes the unnecessary definitions and related back-end functions. This has bootstrapped and tested without regressions on both powerpc64le-unknown-linux (P8) and on powerpc-linux (P7 big-endian, with both -m32 and -m64 target options). Is this patch ok for trunk? gcc/ChangeLog: 2018-03-09 Kelvin Nilsen <kel...@gcc.gnu.org> * config/rs6000/rs6000-builtin.def: Remove various BU_ALTIVEC_X macro expansions for definition of ST_INTERNAL_ and LD_INTERNAL_ builtins. * config/rs6000/rs6000.c (altivec_expand_ld_builtin): Delete this function. (altivec_expand_st_builtin): Likewise. (altivec_expand_builtin): Remove calls to deleted functions. Index: gcc/config/rs6000/rs6000-builtin.def === --- gcc/config/rs6000/rs6000-builtin.def(revision 258338) +++ gcc/config/rs6000/rs6000-builtin.def(working copy) @@ -1210,20 +1210,6 @@ BU_ALTIVEC_P (VCMPGTSB_P, "vcmpgtsb_p", CONST, BU_ALTIVEC_P (VCMPGTUB_P, "vcmpgtub_p",CONST, vector_gtu_v16qi_p) /* AltiVec builtins that are handled as special cases. 
*/ -BU_ALTIVEC_X (ST_INTERNAL_4si, "st_internal_4si", MEM) -BU_ALTIVEC_X (LD_INTERNAL_4si, "ld_internal_4si", MEM) -BU_ALTIVEC_X (ST_INTERNAL_8hi, "st_internal_8hi", MEM) -BU_ALTIVEC_X (LD_INTERNAL_8hi, "ld_internal_8hi", MEM) -BU_ALTIVEC_X (ST_INTERNAL_16qi,"st_internal_16qi", MEM) -BU_ALTIVEC_X (LD_INTERNAL_16qi,"ld_internal_16qi", MEM) -BU_ALTIVEC_X (ST_INTERNAL_4sf, "st_internal_16qi", MEM) -BU_ALTIVEC_X (LD_INTERNAL_4sf, "ld_internal_4sf", MEM) -BU_ALTIVEC_X (ST_INTERNAL_2df, "st_internal_4sf", MEM) -BU_ALTIVEC_X (LD_INTERNAL_2df, "ld_internal_2df", MEM) -BU_ALTIVEC_X (ST_INTERNAL_2di, "st_internal_2di", MEM) -BU_ALTIVEC_X (LD_INTERNAL_2di, "ld_internal_2di", MEM) -BU_ALTIVEC_X (ST_INTERNAL_1ti, "st_internal_1ti", MEM) -BU_ALTIVEC_X (LD_INTERNAL_1ti, "ld_internal_1ti", MEM) BU_ALTIVEC_X (MTVSCR, "mtvscr", MISC) BU_ALTIVEC_X (MFVSCR, "mfvscr", MISC) BU_ALTIVEC_X (DSSALL, "dssall", MISC) Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 258338) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -15183,127 +15183,7 @@ rs6000_expand_ternop_builtin (enum insn_code icode return target; } -/* Expand the lvx builtins. 
*/ -static rtx -altivec_expand_ld_builtin (tree exp, rtx target, bool *expandedp) -{ - tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0); - unsigned int fcode = DECL_FUNCTION_CODE (fndecl); - tree arg0; - machine_mode tmode, mode0; - rtx pat, op0; - enum insn_code icode; - switch (fcode) -{ -case ALTIVEC_BUILTIN_LD_INTERNAL_16qi: - icode = CODE_FOR_vector_altivec_load_v16qi; - break; -case ALTIVEC_BUILTIN_LD_INTERNAL_8hi: - icode = CODE_FOR_vector_altivec_load_v8hi; - break; -case ALTIVEC_BUILTIN_LD_INTERNAL_4si: - icode = CODE_FOR_vector_altivec_load_v4si; - break; -case ALTIVEC_BUILTIN_LD_INTERNAL_4sf: - icode = CODE_FOR_vector_altivec_load_v4sf; - break; -case ALTIVEC_BUILTIN_LD_INTERNAL_2df: - icode = CODE_FOR_vector_altivec_load_v2df; - break; -case ALTIVEC_BUILTIN_LD_INTERNAL_2di: - icode = CODE_FOR_vector_altivec_load_v2di; - break; -case ALTIVEC_BUILTIN_LD_INTERNAL_1ti: - icode = CODE_FOR_vector_altivec_load_v1ti; - break; -default: - *expandedp = false; - return NULL_RTX; -} - - *expandedp = true; - - arg0 = CALL_EXPR_ARG (exp, 0); - op0 = expand_normal (arg0); - tmode = insn_data[icode].operand[0].mode; - mode0 = insn_data[icode].operand[1].mode; - - if (target == 0 - || GET_MODE (target) != tmode - || ! (*insn_data[icode].operand[0].predicate) (target, tmode)) -target = gen_reg_rtx (tmode); - - if (! (*insn_data[icode].operand[1].predicate) (op0, mode0)) -op0 = gen_rtx_MEM (mode0, copy_to_mode_reg (Pmode, op0)); - - pat = GEN_FCN (icode) (target, op0); - if (! pat) -return 0; - emit_insn (pat); - return target; -} - -/* Expand the stvx builtins. */ -static rtx -altivec_expand_st_builtin (tree exp, rtx target ATTRIBUTE_UNUSED, - bool *expandedp) -{ - tree
wwwdocs: An additional release note for powerpc for GCC 8
Is this revision to the existing draft GCC 8 release notes ok for commit? Thanks. Index: htdocs/gcc-8/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-8/changes.html,v retrieving revision 1.36 diff -u -3 -p -r1.36 changes.html --- htdocs/gcc-8/changes.html 12 Feb 2018 07:23:11 - 1.36 +++ htdocs/gcc-8/changes.html 14 Feb 2018 14:58:56 - @@ -464,6 +464,11 @@ a work-in-progress. powerpc-xilinx-eabi*) is deprecated and will be removed in a future release. + +Support for using big-endian AltiVec intrinsics on a little-endian target +(-maltivec=be) is deprecated and will be removed in a +future release. + PowerPC SPE
[PATCH, rs6000] Begin deprecation of -maltivec=be
PR 78303 was recently marked RESOLVED, WONTFIX. The resolution was to deprecate the troublesome command-line option. This patch begins the process of deprecation by issuing a warning message when this command-line option is specified. The patch has bootstrapped and tested without regressions on powerpc64le-unknown-linux. Is this ok for trunk? gcc/ChangeLog: 2018-02-13 Kelvin Nilsen <kel...@gcc.gnu.org> * config/rs6000/rs6000.c (rs6000_option_override_internal): Issue warning message if user requests -maltivec=be. gcc/testsuite/ChangeLog: 2018-02-13 Kelvin Nilsen <kel...@gcc.gnu.org> * gcc.dg/vmx/extract-be-order.c: Disable -maltivec=be warning so this test case still works ok. * gcc.dg/vmx/extract-vsx-be-order.c: Likewise. * gcc.dg/vmx/insert-be-order.c: Likewise. * gcc.dg/vmx/insert-vsx-be-order.c: Likewise. * gcc.dg/vmx/ld-be-order.c: Likewise. * gcc.dg/vmx/ld-vsx-be-order.c: Likewise. * gcc.dg/vmx/lde-be-order.c: Likewise. * gcc.dg/vmx/ldl-be-order.c: Likewise. * gcc.dg/vmx/ldl-vsx-be-order.c: Likewise. * gcc.dg/vmx/merge-be-order.c: Likewise. * gcc.dg/vmx/merge-vsx-be-order.c: Likewise. * gcc.dg/vmx/mult-even-odd-be-order.c: Likewise. * gcc.dg/vmx/pack-be-order.c: Likewise. * gcc.dg/vmx/perm-be-order.c: Likewise. * gcc.dg/vmx/splat-be-order.c: Likewise. * gcc.dg/vmx/splat-vsx-be-order.c: Likewise. * gcc.dg/vmx/st-be-order.c: Likewise. * gcc.dg/vmx/st-vsx-be-order.c: Likewise. * gcc.dg/vmx/ste-be-order.c: Likewise. * gcc.dg/vmx/stl-be-order.c: Likewise. * gcc.dg/vmx/stl-vsx-be-order.c: Likewise. * gcc.dg/vmx/sum2s-be-order.c: Likewise. * gcc.dg/vmx/unpack-be-order.c: Likewise. * gcc.dg/vmx/vsums-be-order.c: Likewise. * gcc.target/powerpc/vec-setup-be-long.c: Likewise. 
Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 257395) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -4028,6 +4028,13 @@ rs6000_option_override_internal (bool global_init_ if (global_init_p) rs6000_isa_flags_explicit = global_options_set.x_rs6000_isa_flags; + /* We plan to deprecate the -maltivec=be option. For now, just + issue a warning message. */ + if (global_init_p + && (rs6000_altivec_element_order == 2)) +warning (0, "%qs command-line option is deprecated", +"-maltivec=be"); + /* On 64-bit Darwin, power alignment is ABI-incompatible with some C library functions, so warn about it. The flag may be useful for performance studies from time to time though, so don't disable it Index: gcc/testsuite/gcc.dg/vmx/extract-be-order.c === --- gcc/testsuite/gcc.dg/vmx/extract-be-order.c (revision 257395) +++ gcc/testsuite/gcc.dg/vmx/extract-be-order.c (working copy) @@ -1,4 +1,4 @@ -/* { dg-options "-maltivec=be -mabi=altivec -std=gnu99 -mno-vsx" } */ +/* { dg-options "-maltivec=be -mabi=altivec -std=gnu99 -mno-vsx -w" } */ #include "harness.h" Index: gcc/testsuite/gcc.dg/vmx/extract-vsx-be-order.c === --- gcc/testsuite/gcc.dg/vmx/extract-vsx-be-order.c (revision 257395) +++ gcc/testsuite/gcc.dg/vmx/extract-vsx-be-order.c (working copy) @@ -1,6 +1,6 @@ /* { dg-skip-if "" { powerpc*-*-darwin* } } */ /* { dg-require-effective-target powerpc_vsx_ok } */ -/* { dg-options "-maltivec=be -mabi=altivec -std=gnu99 -mvsx" } */ +/* { dg-options "-maltivec=be -mabi=altivec -std=gnu99 -mvsx -w" } */ #include "harness.h" Index: gcc/testsuite/gcc.dg/vmx/insert-be-order.c === --- gcc/testsuite/gcc.dg/vmx/insert-be-order.c (revision 257395) +++ gcc/testsuite/gcc.dg/vmx/insert-be-order.c (working copy) @@ -1,4 +1,4 @@ -/* { dg-options "-maltivec=be -mabi=altivec -std=gnu99 -mno-vsx" } */ +/* { dg-options "-w -maltivec=be -mabi=altivec -std=gnu99 -mno-vsx" } */ #include "harness.h" Index: gcc/testsuite/gcc.dg/vmx/insert-vsx-be-order.c === --- 
gcc/testsuite/gcc.dg/vmx/insert-vsx-be-order.c (revision 257395) +++ gcc/testsuite/gcc.dg/vmx/insert-vsx-be-order.c (working copy) @@ -1,6 +1,6 @@ /* { dg-skip-if "" { powerpc*-*-darwin* } } */ /* { dg-require-effective-target powerpc_vsx_ok } */ -/* { dg-options "-maltivec=be -mabi=altivec -std=gnu99 -mvsx" } */ +/* { dg-options "-w -maltivec=be -mabi=altivec -std=gnu99 -mvsx" } */ #include "harness.h" Index: gcc/testsuite/gcc.dg/vmx/ld-be-order.c ===
[PATCH] PR 80867: ICE during -O3 compile of libgnat
It was determined that the reported ICE occurs because a NULL value is passed from vectorizable_call () to targetm.vectorize.builtin_md_vectorized_function (callee, vectype_out, vectype_in). This patch avoids making this call if callee equals NULL.

After successful bootstrap and regression testing, with preapproval, this patch has been committed to the trunk. Is it ok to backport to GCC 7 and GCC 6 (after testing on those platforms)?

Thanks.

gcc/ChangeLog:

2018-01-29  Richard Biener  <rguent...@suse.de>
            Kelvin Nilsen  <kel...@gcc.gnu.org>

    PR bootstrap/80867
    * tree-vect-stmts.c (vectorizable_call): Don't call
    targetm.vectorize.builtin_md_vectorized_function if callee is NULL.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c (revision 257105)
+++ gcc/tree-vect-stmts.c (working copy)
@@ -3159,7 +3159,7 @@
       if (cfn != CFN_LAST)
         fndecl = targetm.vectorize.builtin_vectorized_function
           (cfn, vectype_out, vectype_in);
-      else
+      else if (callee)
         fndecl = targetm.vectorize.builtin_md_vectorized_function
           (callee, vectype_out, vectype_in);
     }
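The shape of the fix can be illustrated outside the compiler. The following is only a minimal sketch: `decl_t`, `md_vectorized_function`, and `pick_vectorized_decl` are hypothetical stand-ins, not GCC's `tree` type or the real target hook, and the first branch is simplified to a pointer test instead of the `cfn != CFN_LAST` comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a function declaration node.  */
typedef struct decl { const char *name; } decl_t;

/* Hypothetical stand-in for the target hook.  It dereferences its
   argument, so it must never be handed NULL.  */
static const decl_t *
md_vectorized_function (const decl_t *callee)
{
  assert (callee != NULL);
  return callee->name[0] == 'v' ? callee : NULL;
}

/* Mirrors the patched control flow: prefer the first result when there
   is one, and consult the hook only when callee is non-NULL.  */
static const decl_t *
pick_vectorized_decl (const decl_t *from_cfn, const decl_t *callee)
{
  const decl_t *fndecl = NULL;

  if (from_cfn != NULL)
    fndecl = from_cfn;
  else if (callee != NULL)      /* The guard added by the patch.  */
    fndecl = md_vectorized_function (callee);

  return fndecl;
}
```

With the guard in place, a call that has no callee simply yields no vectorized declaration instead of reaching the hook.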
[PATCH, rs6000] Fix ICE caused by recent patch: Generate lvx and stvx without swaps for aligned vector loads and stores
A patch committed on 2018-01-10 is causing an ICE with existing test program $GCC_SRC/gcc/testsuite/gcc.target/powerpc/pr83399.c, when compiled with the -m32 option. At the time of the commit, it was thought that this was a problem with the recent resolution of PR83399. However, further investigation revealed a problem with the patch that was just committed. The generated code did not distinguish between 32- and 64-bit targets. This patch corrects that problem. This has been bootstrapped and tested without regressions on powerpc64le-unknown-linux (P8) and on powerpc64-unknown-linux (P7) with both -m32 and -m64 target options. Is this ok for trunk? gcc/ChangeLog: 2018-01-16 Kelvin Nilsen <kel...@gcc.gnu.org> * config/rs6000/rs6000-p8swap.c (rs6000_gen_stvx): Generate different rtl trees depending on TARGET_64BIT. (rs6000_gen_lvx): Likewise. Index: gcc/config/rs6000/rs6000-p8swap.c === --- gcc/config/rs6000/rs6000-p8swap.c (revision 256710) +++ gcc/config/rs6000/rs6000-p8swap.c (working copy) @@ -1554,23 +1554,31 @@ rs6000_gen_stvx (enum machine_mode mode, rtx dest_ op1 = XEXP (memory_address, 0); op2 = XEXP (memory_address, 1); if (mode == V16QImode) - stvx = gen_altivec_stvx_v16qi_2op (src_exp, op1, op2); + stvx = TARGET_64BIT ? gen_altivec_stvx_v16qi_2op (src_exp, op1, op2) + : gen_altivec_stvx_v16qi_2op_si (src_exp, op1, op2); else if (mode == V8HImode) - stvx = gen_altivec_stvx_v8hi_2op (src_exp, op1, op2); + stvx = TARGET_64BIT ? gen_altivec_stvx_v8hi_2op (src_exp, op1, op2) + : gen_altivec_stvx_v8hi_2op_si (src_exp, op1, op2); #ifdef HAVE_V8HFmode else if (mode == V8HFmode) - stvx = gen_altivec_stvx_v8hf_2op (src_exp, op1, op2); + stvx = TARGET_64BIT ? gen_altivec_stvx_v8hf_2op (src_exp, op1, op2) + : gen_altivec_stvx_v8hf_2op_si (src_exp, op1, op2); #endif else if (mode == V4SImode) - stvx = gen_altivec_stvx_v4si_2op (src_exp, op1, op2); + stvx = TARGET_64BIT ? 
gen_altivec_stvx_v4si_2op (src_exp, op1, op2) + : gen_altivec_stvx_v4si_2op_si (src_exp, op1, op2); else if (mode == V4SFmode) - stvx = gen_altivec_stvx_v4sf_2op (src_exp, op1, op2); + stvx = TARGET_64BIT ? gen_altivec_stvx_v4sf_2op (src_exp, op1, op2) + : gen_altivec_stvx_v4sf_2op_si (src_exp, op1, op2); else if (mode == V2DImode) - stvx = gen_altivec_stvx_v2di_2op (src_exp, op1, op2); + stvx = TARGET_64BIT ? gen_altivec_stvx_v2di_2op (src_exp, op1, op2) + : gen_altivec_stvx_v2di_2op_si (src_exp, op1, op2); else if (mode == V2DFmode) - stvx = gen_altivec_stvx_v2df_2op (src_exp, op1, op2); + stvx = TARGET_64BIT ? gen_altivec_stvx_v2df_2op (src_exp, op1, op2) + : gen_altivec_stvx_v2df_2op_si (src_exp, op1, op2); else if (mode == V1TImode) - stvx = gen_altivec_stvx_v1ti_2op (src_exp, op1, op2); + stvx = TARGET_64BIT ? gen_altivec_stvx_v1ti_2op (src_exp, op1, op2) + : gen_altivec_stvx_v1ti_2op_si (src_exp, op1, op2); else /* KFmode, TFmode, other modes not expected in this context. */ gcc_unreachable (); @@ -1578,23 +1586,39 @@ rs6000_gen_stvx (enum machine_mode mode, rtx dest_ else /* REG_P (memory_address) */ { if (mode == V16QImode) - stvx = gen_altivec_stvx_v16qi_1op (src_exp, memory_address); + stvx = TARGET_64BIT ? + gen_altivec_stvx_v16qi_1op (src_exp, memory_address) + : gen_altivec_stvx_v16qi_1op_si (src_exp, memory_address); else if (mode == V8HImode) - stvx = gen_altivec_stvx_v8hi_1op (src_exp, memory_address); + stvx = TARGET_64BIT ? + gen_altivec_stvx_v8hi_1op (src_exp, memory_address) + : gen_altivec_stvx_v8hi_1op_si (src_exp, memory_address); #ifdef HAVE_V8HFmode else if (mode == V8HFmode) - stvx = gen_altivec_stvx_v8hf_1op (src_exp, memory_address); + stvx = TARGET_64BIT ? + gen_altivec_stvx_v8hf_1op (src_exp, memory_address) + : gen_altivec_stvx_v8hf_1op_si (src_exp, memory_address); #endif else if (mode == V4SImode) - stvx = gen_altivec_stvx_v4si_1op (src_exp, memory_address); + stvx =TARGET_64BIT ? 
+ gen_altivec_stvx_v4si_1op (src_exp, memory_address) + : gen_altivec_stvx_v4si_1op_si (src_exp, memory_address); else if (mode == V4SFmode) - stvx = gen_altivec_stvx_v4sf_1op (src_exp, memory_address); + stvx = TARGET_64BIT ? + gen_altivec_stvx_v4sf_1op (src_exp, memory_address) + : gen_altivec_stvx_v4sf_1op_si (src_exp, memory_address); else if (mode == V2DImode) - stvx = gen_altivec_stvx_v2di_1op (src_exp, memory_address); + stvx = TARGET_64BIT ? + gen_altivec_stvx_v2di_1op (src_exp, memory_address) + : gen_altivec_stvx_v2di_
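The selection made in this patch follows a regular naming scheme: each 64-bit generator has a 32-bit counterpart with an `_si` suffix. As a rough illustration only, the sketch below builds the pattern name from the mode, the operand count, and the target width. The enum, the table, and `stvx_pattern_name` are hypothetical; the real code calls the generated `gen_altivec_stvx_*` functions directly rather than composing strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical subset of the vector modes handled by the patch.  */
enum vec_mode { V16QI, V8HI, V4SI, V4SF, V2DI, V2DF, V1TI, VEC_MODE_COUNT };

static const char *const mode_suffix[VEC_MODE_COUNT] = {
  "v16qi", "v8hi", "v4si", "v4sf", "v2di", "v2df", "v1ti"
};

/* Write the name of the stvx pattern into BUF and return the length
   reported by snprintf.  32-bit targets get the "_si" variant.  */
static int
stvx_pattern_name (char *buf, size_t len, enum vec_mode mode,
                   int two_operand, int target_64bit)
{
  assert ((unsigned) mode < VEC_MODE_COUNT);
  return snprintf (buf, len, "altivec_stvx_%s_%s%s",
                   mode_suffix[mode],
                   two_operand ? "2op" : "1op",
                   target_64bit ? "" : "_si");
}
```

For example, `stvx_pattern_name (buf, sizeof buf, V16QI, 1, 0)` writes `altivec_stvx_v16qi_2op_si` into `buf`.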
[PATCH, rs6000] Generate lvx and stvx without swaps for aligned vector loads and stores
On Power 7 and Power 8 little endian, the code generator has been emitting two instructions for each vector load and each vector store. One instruction does a swapping load or store, and the second instruction does an in-register swap.

This patch replaces the two-instruction sequences with a single lvx (for loads) or stvx (for stores) instruction in the very common case that the vector is known to reside at a quad-word aligned address in memory. This patch is most relevant to Power 7 and Power 8 targets because Power 9 code generation uses new single-instruction encodings for both aligned and unaligned vector loads and stores.

This patch has been bootstrapped and tested without regressions on powerpc64le-unknown-linux (P8). It has also been bootstrapped and tested on powerpc-linux (P7 and P8, big-endian, with both -m32 and -m64 target options). One regression was identified during big-endian regression testing:

> FAIL: gcc.target/powerpc/pr83399.c (internal compiler error)
> FAIL: gcc.target/powerpc/pr83399.c (test for excess errors)

The pr83399.c test and the ICE are related to a recently committed patch that addresses a problem originally found and reported as part of the work on this lvx/stvx optimization patch. It appears that the PR83399 patch may not have fully addressed the big-endian aspects of the original problem report.

> The ICE occurs at
> ;; /home/kelvin/gcc/gcc-trunk4test99327/gcc/testsuite/gcc.target/powerpc/pr83399.c:15:1: internal compiler error: in plus_constant, at explow.c:103
> ;; 0x104af39f plus_constant(machine_mode, rtx_def*, poly_int<1u, long>, bool)
> ;;   /home/kelvin/gcc/gcc-trunk4test99327/gcc/explow.c:103
> ;; 0x112e2d97 record_store
> ;;   /home/kelvin/gcc/gcc-trunk4test99327/gcc/dse.c:1502
> ;; 0x112e525b scan_insn
> ;;   /home/kelvin/gcc/gcc-trunk4test99327/gcc/dse.c:2540
> ;; 0x112e525b dse_step1
> ;;   /home/kelvin/gcc/gcc-trunk4test99327/gcc/dse.c:2652
> ;; 0x112e525b rest_of_handle_dse
> ;;   /home/kelvin/gcc/gcc-trunk4test99327/gcc/dse.c:3569
> ;; 0x112e525b execute
> ;;   /home/kelvin/gcc/gcc-trunk4test99327/gcc/dse.c:3627
> ;; Please submit a full bug report,

Is this patch ok for trunk?

gcc/ChangeLog:

2018-01-10  Kelvin Nilsen  <kel...@gcc.gnu.org>

    * config/rs6000/rs6000-p8swap.c (rs6000_sum_of_two_registers_p): New function.
    (rs6000_quadword_masked_address_p): Likewise.
    (quad_aligned_load_p): Likewise.
    (quad_aligned_store_p): Likewise.
    (const_load_sequence_p): Add comment to describe the outer-most loop.
    (mimic_memory_attributes_and_flags): New function.
    (rs6000_gen_stvx): Likewise.
    (replace_swapped_aligned_store): Likewise.
    (rs6000_gen_lvx): Likewise.
    (replace_swapped_aligned_load): Likewise.
    (replace_swapped_load_constant): Capitalize argument name in comment describing this function.
    (rs6000_analyze_swaps): Add a third pass to search for vector loads and stores that access quad-word aligned addresses and replace with stvx or lvx instructions when appropriate.
    * config/rs6000/rs6000-protos.h (rs6000_sum_of_two_registers_p): New function prototype.
    (rs6000_quadword_masked_address_p): Likewise.
    (rs6000_gen_lvx): Likewise.
    (rs6000_gen_stvx): Likewise.
    * config/rs6000/vsx.md (*vsx_le_perm_load_): For modes VSX_D (V2DF, V2DI), modify this split to select lvx instruction when memory address is aligned.
(*vsx_le_perm_load_): For modes VSX_W (V4SF, V4SI), modify this split to select lvx instruction when memory address is aligned. (*vsx_le_perm_load_v8hi): Modify this split to select lvx instruction when memory address is aligned. (*vsx_le_perm_load_v16qi): Likewise. (four unnamed splitters): Modify to select the stvx instruction when memory is aligned. gcc/testsuite/ChangeLog: 2018-01-10 Kelvin Nilsen <kel...@gcc.gnu.org> * gcc.target/powerpc/pr48857.c: Modify dejagnu directives to look for lvx and stvx instead of lxvd2x and stxvd2x and require little-endian target. Add comments. * gcc.target/powerpc/swaps-p8-28.c: Add functions for more comprehensive testing. * gcc.target/powerpc/swaps-p8-29.c: Likewise. * gcc.target/powerpc/swaps-p8-30.c: Likewise. * gcc.target/powerpc/swaps-p8-31.c: Likewise. * gcc.target/powerpc/swaps-p8-32.c: Likewise. * gcc.target/powerpc/swaps-p8-33.c: Likewise. * gcc.target/powerpc/swaps-p8-34.c: Likewise. * gcc.target/powerpc/swaps-p8-35.c: Likewise. * gcc.target/powerpc/swaps-p8-36.c: Likewise. * gcc.target/powerpc/swaps-p8-37.c: Likewise. * gcc.target/powerpc/swaps-p8-38.c: Likewise. * gcc.target/powerpc
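The optimization described in this message depends on knowing that an address is a multiple of 16 bytes, which is typically established by masking the address with -16. The following is a minimal sketch of that arithmetic on plain integer addresses. The helper names are hypothetical and are not the functions added by the patch, which operate on RTL expressions.

```c
#include <assert.h>
#include <stdint.h>

/* Clear the low four bits of ADDR, which is the effect of AND-ing an
   address with -16.  The result is always quad-word aligned.  */
static uintptr_t
quadword_mask (uintptr_t addr)
{
  uintptr_t masked = addr & ~(uintptr_t) 15;
  assert ((masked & 15) == 0);
  return masked;
}

/* Return nonzero if ADDR is already a multiple of 16 bytes.  */
static int
quadword_aligned_p (uintptr_t addr)
{
  return (addr & 15) == 0;
}
```

An address that has passed through `quadword_mask` always satisfies `quadword_aligned_p`. That is the property that makes a single lvx or stvx a candidate replacement for the load-plus-swap or swap-plus-store pair.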
Backports to gcc 7.x
I would like to backport the following patch to the GCC 7 branch. PR80101: Fix ICE in store_data_bypass_p https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00953.html This patch has been bootstrapped and regression tested on the GCC 7 branch. Is this ok for backporting to GCC 7?
[PATCH,rs6000] Correct dejagnu directives in several newly added tests
This patch corrects an error in several newly added test programs that was causing these programs to be SUPPORTED on platforms where they were not supposed to be SUPPORTED, which was causing unexpected FAILS. The patch has been preapproved by seg...@gcc.gnu.org. gcc/testsuite/ChangeLog: 2017-09-29 Kelvin Nilsen <kel...@gcc.gnu.org> * gcc.target/powerpc/swaps-p8-30.c: Exchange the order of dg-do and dg-require-effective-target directives to correct testing behavior. * gcc.target/powerpc/swaps-p8-32.c: Likewise. * gcc.target/powerpc/swaps-p8-41.c: Likewise. * gcc.target/powerpc/swaps-p8-34.c: Likewise. * gcc.target/powerpc/swaps-p8-43.c: Likewise. * gcc.target/powerpc/swaps-p8-36.c: Likewise. * gcc.target/powerpc/swaps-p8-45.c: Likewise. * gcc.target/powerpc/swaps-p8-29.c: Likewise. * gcc.target/powerpc/swaps-p8-38.c: Likewise. * gcc.target/powerpc/swaps-p8-31.c: Likewise. * gcc.target/powerpc/swaps-p8-40.c: Likewise. * gcc.target/powerpc/swaps-p8-33.c: Likewise. * gcc.target/powerpc/swaps-p8-42.c: Likewise. * gcc.target/powerpc/swaps-p8-35.c: Likewise. * gcc.target/powerpc/swaps-p8-44.c: Likewise. * gcc.target/powerpc/swaps-p8-28.c: Likewise. * gcc.target/powerpc/swaps-p8-37.c: Likewise. * gcc.target/powerpc/swaps-p8-39.c: Likewise. 
Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-30.c === --- gcc/testsuite/gcc.target/powerpc/swaps-p8-30.c (revision 253294) +++ gcc/testsuite/gcc.target/powerpc/swaps-p8-30.c (working copy) @@ -1,5 +1,5 @@ +/* { dg-do compile { target { powerpc64le-*-* } } } */ /* { dg-require-effective-target powerpc_p8vector_ok } */ -/* { dg-do compile { target { powerpc64le-*-* } } } */ /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ /* { dg-options "-mcpu=power8 -O3 " } */ /* { dg-final { scan-assembler-not "xxpermdi" } } */ Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-32.c === --- gcc/testsuite/gcc.target/powerpc/swaps-p8-32.c (revision 253294) +++ gcc/testsuite/gcc.target/powerpc/swaps-p8-32.c (working copy) @@ -1,5 +1,5 @@ +/* { dg-do run { target { powerpc*-*-* } } } */ /* { dg-require-effective-target p8vector_hw } */ -/* { dg-do run { target { powerpc*-*-* } } } */ /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ /* { dg-options "-mcpu=power8 -O3 " } */ Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-41.c === --- gcc/testsuite/gcc.target/powerpc/swaps-p8-41.c (revision 253294) +++ gcc/testsuite/gcc.target/powerpc/swaps-p8-41.c (working copy) @@ -1,5 +1,5 @@ +/* { dg-do run { target { powerpc*-*-* } } } */ /* { dg-require-effective-target p8vector_hw } */ -/* { dg-do run { target { powerpc*-*-* } } } */ /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ /* { dg-options "-mcpu=power8 -O3 " } */ Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-34.c === --- gcc/testsuite/gcc.target/powerpc/swaps-p8-34.c (revision 253294) +++ gcc/testsuite/gcc.target/powerpc/swaps-p8-34.c (working copy) @@ -1,5 +1,5 @@ +/* { dg-do run { target { powerpc*-*-* } } } */ /* { dg-require-effective-target p8vector_hw } */ -/* { dg-do run { target { powerpc*-*-* } } } */ /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" 
} } */ /* { dg-options "-mcpu=power8 -O3 " } */ Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-43.c === --- gcc/testsuite/gcc.target/powerpc/swaps-p8-43.c (revision 253294) +++ gcc/testsuite/gcc.target/powerpc/swaps-p8-43.c (working copy) @@ -1,5 +1,5 @@ +/* { dg-do run { target { powerpc*-*-* } } } */ /* { dg-require-effective-target p8vector_hw } */ -/* { dg-do run { target { powerpc*-*-* } } } */ /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ /* { dg-options "-mcpu=power8 -O3 " } */ Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-36.c === --- gcc/testsuite/gcc.target/powerpc/swaps-p8-36.c (revision 253294) +++ gcc/testsuite/gcc.target/powerpc/swaps-p8-36.c (working copy) @@ -1,5 +1,5 @@ +/* { dg-do compile { target { powerpc64le-*-* } } } */ /* { dg-require-effective-target powerpc_p8vector_ok } */ -/* { dg-do compile { ta
[PATCH v2,rs6000] Replace swap of a loaded vector constant with load of a swapped vector constant
On Power8 little endian, two instructions are needed to load from the natural in-memory representation of a vector into a vector register: a load followed by a swap. When the vector value to be loaded is a constant, more efficient code can be achieved by swapping the representation of the constant in memory so that only a load instruction is required.

This second version of the patch responds to feedback provided by Segher Boessenkool, Bill Schmidt, and Pat Haugen. Thank you for the careful reviews:

1. Revised comments in const_load_sequence_p function of rs6000-p8swap.c

2. Restructured nested if statements as a single if-statement with compound condition in const_load_sequence_p function of rs6000-p8swap.c

3. In replace_swapped_load_constant function of rs6000-p8swap.c, replaced two FOR_EACH_INSN_INFO_USE macro expansions with non-looping control structures.

4. Added comments and white space to replace_swapped_load_constant function of rs6000-p8swap.c to improve readability.

5. Reordered handling of cases in replace_swapped_load_constant function of rs6000-p8swap.c, moving V8HImode and V8HFmode handling above V4SImode handling.

6. Replaced gcc_assert (0) with gcc_unreachable () in replace_swapped_load_constant of rs6000-p8swap.c.

7. In rs6000_analyze_swaps function of rs6000-p8swap.c, added requirement that !pass2_insn_entry[i].is_store before calling const_load_sequence_p.

8. Removed unnecessary code blocks at end of rs6000_analyze_swaps function of rs6000-p8swap.c.

9. Added 15 new tests to exercise different vector element sizes.

This patch has been bootstrapped and tested without regressions on powerpc64le-unknown-linux (P8) and on powerpc-unknown-linux (P8, big-endian, with both -m32 and -m64 target options).

Is this ok for trunk?

gcc/ChangeLog:

2017-09-25  Kelvin Nilsen  <kel...@gcc.gnu.org>

    * config/rs6000/rs6000-p8swap.c (const_load_sequence_p): Revise this function to return false if the definition used by the swap instruction is artificial, or if the memory address from which the constant value is loaded is not represented by a base address held in a register or if the base address register is a frame or stack pointer. Additionally, return false if the base address of the loaded constant is a SYMBOL_REF but is not considered to be a constant.
    (replace_swapped_load_constant): New function.
    (rs6000_analyze_swaps): Add a new pass to replace a swap of a loaded constant vector with a load of a swapped constant vector.

gcc/testsuite/ChangeLog:

2017-09-25  Kelvin Nilsen  <kel...@gcc.gnu.org>

    * gcc.target/powerpc/swaps-p8-28.c: New test.
    * gcc.target/powerpc/swaps-p8-29.c: New test.
    * gcc.target/powerpc/swaps-p8-31.c: New test.
    * gcc.target/powerpc/swaps-p8-32.c: New test.
    * gcc.target/powerpc/swaps-p8-34.c: New test.
    * gcc.target/powerpc/swaps-p8-35.c: New test.
    * gcc.target/powerpc/swaps-p8-37.c: New test.
    * gcc.target/powerpc/swaps-p8-38.c: New test.
    * gcc.target/powerpc/swaps-p8-40.c: New test.
    * gcc.target/powerpc/swaps-p8-41.c: New test.
    * gcc.target/powerpc/swaps-p8-43.c: New test.
    * gcc.target/powerpc/swaps-p8-44.c: New test.
    * gcc.target/powerpc/swaps-p8-30.c: New test.
    * gcc.target/powerpc/swaps-p8-33.c: New test.
    * gcc.target/powerpc/swaps-p8-36.c: New test.
    * gcc.target/powerpc/swaps-p8-39.c: New test.
    * gcc.target/powerpc/swaps-p8-42.c: New test.
    * gcc.target/powerpc/swaps-p8-45.c: New test.

Index: gcc/config/rs6000/rs6000-p8swap.c
===
--- gcc/config/rs6000/rs6000-p8swap.c (revision 252768)
+++ gcc/config/rs6000/rs6000-p8swap.c (working copy)
@@ -335,21 +335,26 @@ const_load_sequence_p (swap_web_entry *insn_entry,
 const_rtx tocrel_base;
- /* Find the unique use in the swap and locate its def. If the def
- isn't unique, punt.
*/ struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn); df_ref use; FOR_EACH_INSN_INFO_USE (use, insn_info) { struct df_link *def_link = DF_REF_CHAIN (use); - if (!def_link || def_link->next) + + /* If there is no def or the def is artificial or there are +multiple defs, punt. */ + if (!def_link || !def_link->ref || DF_REF_IS_ARTIFICIAL (def_link->ref) + || def_link->next) return false; rtx def_insn = DF_REF_INSN (def_link->ref); unsigned uid2 = INSN_UID (def_insn); + /* If this is not a load or is not a swap, return false */ if (!insn_entry[uid2].is_load || !insn_entry[uid2].is_swap) return false; + /* If the source of the rtl def is not a set from memory, return +false. */ rtx body = PATTERN (def_insn); if (GET_CODE (body) != SET || GET_CODE (SE
[PATCH,rs6000] Replace swap of a loaded vector constant with load of a swapped vector constant
On Power8 little endian, two instructions are needed to load from the natural in-memory representation of a vector into a vector register: a load followed by a swap. When the vector value to be loaded is a constant, more efficient code can be achieved by swapping the representation of the constant in memory so that only a load instruction is required.

This patch has been bootstrapped and tested without regressions on powerpc64le-unknown-linux (P8) and on powerpc-unknown-linux (P8, big-endian, with both -m32 and -m64 target options).

Is this ok for trunk?

gcc/ChangeLog:

2017-09-14  Kelvin Nilsen  <kel...@gcc.gnu.org>

    * config/rs6000/rs6000-p8swap.c (const_load_sequence_p): Revise this function to return false if the definition used by the swap instruction is artificial, or if the memory address from which the constant value is loaded is not represented by a base address held in a register or if the base address register is a frame or stack pointer. Additionally, return false if the base address of the loaded constant is a SYMBOL_REF but is not considered to be a constant.
    (replace_swapped_load_constant): New function.
    (rs6000_analyze_swaps): Add a new pass to replace a swap of a loaded constant vector with a load of a swapped constant vector.

gcc/testsuite/ChangeLog:

2017-09-14  Kelvin Nilsen  <kel...@gcc.gnu.org>

    * gcc.target/powerpc/swaps-p8-28.c: New test.
    * gcc.target/powerpc/swaps-p8-29.c: New test.
    * gcc.target/powerpc/swaps-p8-30.c: New test.
Index: gcc/config/rs6000/rs6000-p8swap.c === --- gcc/config/rs6000/rs6000-p8swap.c (revision 252768) +++ gcc/config/rs6000/rs6000-p8swap.c (working copy) @@ -342,7 +342,8 @@ const_load_sequence_p (swap_web_entry *insn_entry, FOR_EACH_INSN_INFO_USE (use, insn_info) { struct df_link *def_link = DF_REF_CHAIN (use); - if (!def_link || def_link->next) + if (!def_link || !def_link->ref || DF_REF_IS_ARTIFICIAL (def_link->ref) + || def_link->next) return false; rtx def_insn = DF_REF_INSN (def_link->ref); @@ -358,6 +359,8 @@ const_load_sequence_p (swap_web_entry *insn_entry, rtx mem = XEXP (SET_SRC (body), 0); rtx base_reg = XEXP (mem, 0); + if (!REG_P (base_reg)) + return false; df_ref base_use; insn_info = DF_INSN_INFO_GET (def_insn); @@ -370,6 +373,14 @@ const_load_sequence_p (swap_web_entry *insn_entry, if (!base_def_link || base_def_link->next) return false; + /* Constants held on the stack are not "true" constants + * because their values are not part of the static load + * image. If this constant's base reference is a stack + * or frame pointer, it is seen as an artificial + * reference. */ + if (DF_REF_IS_ARTIFICIAL (base_def_link->ref)) + return false; + rtx tocrel_insn = DF_REF_INSN (base_def_link->ref); rtx tocrel_body = PATTERN (tocrel_insn); rtx base, offset; @@ -385,6 +396,25 @@ const_load_sequence_p (swap_web_entry *insn_entry, split_const (XVECEXP (tocrel_base, 0, 0), , ); if (GET_CODE (base) != SYMBOL_REF || !CONSTANT_POOL_ADDRESS_P (base)) return false; + else + { + /* FIXME: The conditions under which + * ((GET_CODE (const_vector) == SYMBOL_REF) && + * !CONSTANT_POOL_ADDRESS_P (const_vector)) + * are not well understood. This code prevents + * an internal compiler error which will occur in + * replace_swapped_load_constant () if we were to return + * true. Some day, we should figure out how to properly + * handle this condition in + * replace_swapped_load_constant () and then we can + * remove this special test. 
*/ + rtx const_vector = get_pool_constant (base); + if (GET_CODE (const_vector) == SYMBOL_REF) + { + if (!CONSTANT_POOL_ADDRESS_P (const_vector)) + return false; + } + } } } return true; @@ -1281,6 +1311,189 @@ replace_swap_with_copy (swap_web_entry *insn_entry insn->set_deleted (); } +/* Given that swap_insn represents a swap of a load of a constant + vector value, replace with a single instruction that loads a + swapped variant of the original constant. + + The "natural" representation of a byte array in memory is the same + for big endian and little endian. + + unsigned char byte_array[] = + { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f }; + + However, when loaded into a vector register, the representation + depends on endian con
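The idea behind this patch can be modeled with plain byte arrays. The sketch below assumes a simplified model in which the Power8 little-endian load leaves the two 8-byte halves of the vector exchanged, so a second swap is needed to restore the natural order. The type `vec128` and the three helpers are hypothetical names for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } vec128;

/* Exchange the two 8-byte halves of V.  */
static vec128
swap_doublewords (vec128 v)
{
  vec128 r;
  memcpy (r.b, v.b + 8, 8);
  memcpy (r.b + 8, v.b, 8);
  return r;
}

/* Model of the swapping load: read 16 bytes from MEM and deliver them
   with the halves exchanged.  */
static vec128
swapping_load (const uint8_t *mem)
{
  vec128 raw;
  assert (mem != NULL);
  memcpy (raw.b, mem, 16);
  return swap_doublewords (raw);
}

/* Original two-instruction sequence: swapping load, then a swap.  */
static vec128
load_then_swap (const uint8_t *mem)
{
  return swap_doublewords (swapping_load (mem));
}
```

In this model, storing the constant with its halves already exchanged lets a single `swapping_load` produce the same register contents as `load_then_swap` applied to the original constant.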
Backports to gcc 6.x
I would like to backport the following patches to the GCC 6 branch.

PR66669: Fix failure of gcc.dg/loop-8.c on Power
https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01788.html

PR68972: g++.dg/cpp1y/vla-initlist1.C test case fails on power
https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00541.html

Handle conflicting target options -mno-power9-vector and -mcpu=power9
https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01192.html

PR80103: Fix ICE with cross compiler
https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01335.html

PR80101: Fix ICE in store_data_bypass_p
https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00953.html

Each of these patches has been bootstrapped and regression tested on the GCC 6 branch. In backport, patch PR80103 omits certain changes to existing comments that are not present in GCC 6.

Are these patches ok for backporting to GCC 6?
[PATCH,rs6000] PR80103: Fix typo in test case
While reviewing regression test results for a back port of the PR80103 patch, I discovered a typographic error in the test case. This patch corrects the error. I have tested this fix on powerpc64le-unknown-linux-gnu with no regressions. Is this ok for trunk? gcc/testsuite/ChangeLog: 2017-06-30 Kelvin Nilsen <kel...@gcc.gnu.org> PR target/80103 * gcc.target/powerpc/pr80103-1.c (b): Correct spelling of __attribute__. Index: gcc/testsuite/gcc.target/powerpc/pr80103-1.c === --- gcc/testsuite/gcc.target/powerpc/pr80103-1.c(revision 249798) +++ gcc/testsuite/gcc.target/powerpc/pr80103-1.c(working copy) @@ -12,5 +12,5 @@ int a; void b (__attribute__ ((__vector_size__ (16))) char c) { - a = ((__attributes__ ((__vector_size__ (2 * sizeof (long long) c)[0]; + a = ((__attribute__ ((__vector_size__ (2 * sizeof (long long) c)[0]; }
Re: Backport [PATCH,rs6000] PR80103: Fix ICE with cross compiler
Is the attached refinement of this patch previously applied to mainline ok for backport to gcc 6? I have bootstrapped and tested without regressions on powerpc64le-unknown-linux-gnu. This patch differs from the original mainline patch in the following regards: 1. Certain commentary changes are omitted because the context to which they applied is missing from GCC 6. 2. A typo in a test case has been corrected. The typo was discovered during scrutiny of the backport regression testing results. I will momentarily submit a patch to correct the same test case on main line. On 03/24/2017 04:14 PM, Segher Boessenkool wrote: > On Fri, Mar 24, 2017 at 04:04:33PM -0600, Kelvin Nilsen wrote: >> PR 80103 provides a test case which results in an internal >> compiler error when invoked with -mno-direct-move -mpower9-dform- >> vector target options. The internal compiler error results because >> these two target options are incompatible with each other. >> >> The enclosed patch simply disables this particular combination of >> target options, terminating gcc with an error message instead of >> producing an internal compiler error. Additionally, this patch >> includes new comments to address omissions from a patch committed >> on 2017/03/23 which deals with conflicts between the >> -mno-power9-vector and -mcpu=power9 target options. >> >> This patch has been bootstrapped and tested with no regressions on >> both powerpc64-unknown-linux-gnu and powerpc64le-unknown-linux-gnu. >> Is this ok for the trunk? > > This looks good, please apply. Thanks, > > > Segher > > gcc/ChangeLog: 2017-06-28 Kelvin Nilsen <kel...@gcc.gnu.org> Backport from mainline 2017-03-27 Kelvin Nilsen <kel...@gcc.gnu.org> PR target/80103 * config/rs6000/rs6000.c (rs6000_option_override_internal): Add special handling for target option conflicts between dform options (-mpower9-dform, -mpower9-dform-vector, -mpower9-dform-scalar) and -mno-direct-move. 
gcc/testsuite/ChangeLog: 2017-06-28 Kelvin Nilsen <kel...@gcc.gnu.org> Backport from mainline 2017-03-27 Kelvin Nilsen <kel...@gcc.gnu.org> PR target/80103 * gcc.target/powerpc/pr80103-1.c: New test. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 249572) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -4295,6 +4295,33 @@ rs6000_option_override_internal (bool global_init_ | OPTION_MASK_P9_DFORM_VECTOR); } + if ((TARGET_P9_DFORM_SCALAR || TARGET_P9_DFORM_VECTOR) + && !TARGET_DIRECT_MOVE) +{ + /* We prefer to not mention undocumented options in +error messages. However, if users have managed to select +power9-dform without selecting power9-vector, they +already know about undocumented flags. */ + if ((rs6000_isa_flags_explicit & OPTION_MASK_DIRECT_MOVE) + && ((rs6000_isa_flags_explicit & OPTION_MASK_P9_DFORM_VECTOR) || + (rs6000_isa_flags_explicit & OPTION_MASK_P9_DFORM_SCALAR) || + (TARGET_P9_DFORM_BOTH == 1))) + error ("-mpower9-dform, -mpower9-dform-vector, -mpower9-dform-scalar" + " require -mdirect-move"); + else if ((rs6000_isa_flags_explicit & OPTION_MASK_DIRECT_MOVE) == 0) + { + rs6000_isa_flags |= OPTION_MASK_DIRECT_MOVE; + rs6000_isa_flags_explicit |= OPTION_MASK_DIRECT_MOVE; + } + else + { + rs6000_isa_flags &= + ~(OPTION_MASK_P9_DFORM_SCALAR | OPTION_MASK_P9_DFORM_VECTOR); + rs6000_isa_flags_explicit |= + (OPTION_MASK_P9_DFORM_SCALAR | OPTION_MASK_P9_DFORM_VECTOR); + } +} + if (TARGET_P9_DFORM_SCALAR && !TARGET_UPPER_REGS_DF) { /* We prefer to not mention undocumented options in Index: gcc/testsuite/gcc.target/powerpc/pr80103-1.c === --- gcc/testsuite/gcc.target/powerpc/pr80103-1.c(revision 0) +++ gcc/testsuite/gcc.target/powerpc/pr80103-1.c(working copy) @@ -0,0 +1,16 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */ +/* { dg-require-effective-target powerpc_p9vector_ok } */ +/* { dg-options "-mpower9-dform-vector 
-mno-direct-move" } */ +/* { dg-excess-errors "expect error due to conflicting target options" } */ +/* Since the error message is not associated with a particular line + number, we cannot use the dg-error directive and cannot specify a + regexp to describe the expected error message. The expected er
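The three-way resolution in the rs6000.c hunk of this backport can be summarized in a small model. This is a simplified sketch only: the flag macros, `struct opts`, and `resolve_dform_direct_move` are hypothetical, and the extra `TARGET_P9_DFORM_BOTH` test from the real code is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for the OPTION_MASK_* values.  */
#define OPT_DIRECT_MOVE  0x1u
#define OPT_DFORM_SCALAR 0x2u
#define OPT_DFORM_VECTOR 0x4u
#define OPT_DFORM (OPT_DFORM_SCALAR | OPT_DFORM_VECTOR)

struct opts
{
  unsigned flags;           /* Options currently in effect.  */
  unsigned explicit_flags;  /* Options the user set explicitly.  */
};

/* Return 0 if the options are consistent or were repaired, and -1 if
   the user explicitly asked for an impossible combination.  */
static int
resolve_dform_direct_move (struct opts *o)
{
  assert (o != NULL);

  /* Nothing to do unless dform is on while direct move is off.  */
  if (!(o->flags & OPT_DFORM) || (o->flags & OPT_DIRECT_MOVE))
    return 0;

  /* Both sides were explicit: report a conflict.  */
  if ((o->explicit_flags & OPT_DIRECT_MOVE)
      && (o->explicit_flags & OPT_DFORM))
    return -1;

  if (!(o->explicit_flags & OPT_DIRECT_MOVE))
    {
      /* Direct move was only off by default: turn it on.  */
      o->flags |= OPT_DIRECT_MOVE;
      o->explicit_flags |= OPT_DIRECT_MOVE;
    }
  else
    {
      /* Direct move was explicitly off: drop the implied dform.  */
      o->flags &= ~OPT_DFORM;
      o->explicit_flags |= OPT_DFORM;
    }
  return 0;
}
```

The model shows why the test case expects an error: both -mpower9-dform-vector and -mno-direct-move are given explicitly, so neither side can be adjusted silently.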
Re: Backport [PATCH,rs6000] Handle conflicting target options -mno-power9-vector and -mcpu=power9
I have bootstrapped and tested this patch on powerpc64le-unknown-linux-gnu with no regressions. Is this ok for backporting to gcc 6? On 03/22/2017 10:17 PM, Segher Boessenkool wrote: > On Wed, Mar 22, 2017 at 05:55:53PM -0600, Kelvin Nilsen wrote: >>> Or it could do -mpower9-dform-scalar but disable -mpower9-dform-vector? >>> That seems more reasonable. >> >> The internal problem report sent to me said "-mno-power9-vector should >> override power9-dform unless the latter has been deliberately specified >> by the user." I'm just following orders. > > Heh :-) > >> If you think it preferable to >> only override -mpower-dform-vector, I'll make that modification. > > It is more logical. Or so I though. But as it turns out, > -mpower9-dform-scalar is about vector registers as well. > > So the patch is approved for trunk as-is. Thanks! > >>>>* config/rs6000/rs6000.c (rs6000_option_override_internal): Change >>>>handling of certain combinations of target options, including the >>>>combinations -mpower8-vector vs. -mno-vsx, -mpower8-vector vs. >>>>-mno-power8-vector, and -mpower9_dform vs. -mno-power9-vector. >>> >>> Those other changes are independent? >> >> Actually, these other changes are not independent. My initial attempt >> at a patch only changed the behavior of -mpower9_dform vs. >> -mno-power9-vector. But this actually resulted in a regression of an >> existing test. To "properly" handle the new case without impacting >> existing "established" behavior (as represented in the existing dejagnu >> testsuite), I had to make these other changes as well. > > Too many options :-( > > > Segher > > -- Kelvin Nilsen, Ph.D. kdnil...@linux.vnet.ibm.com home office: 801-756-4821, cell: 520-991-6727 IBM Linux Technology Center - PPC Toolchain
Re: [PATCH] PR68972: g++.dg/cpp1y/vla-initlist1.C test case fails on power (backport)
Is this ok for backport to GCC 6? On 02/06/2017 03:20 PM, Kelvin Nilsen wrote: > > The test g++.dg/cpp1y/vla-initlist1.C makes assumptions that the memory > used to represent the private temporary variables of neighboring control > blocks at the same control nesting level is: > > 1. found at the same address, and > 2. not overwritten between when the first block ends and the second > block begins. > > While these assumptions are valid with some optimization choices on some > architectures, these assumptions do not hold universally. > > With optimization disabled on the power architecture, the > g++.dg/cpp1y/vla-initlist1.C test program runs initialization code to > allocate the variable-length array a[] before entry into the second of > two neighboring control blocks. This initialization code overwrites the > first two cells of the array i[] that were initialized by the first of > the two neighboring control blocks. Thus, the initialization value > stored into i[1] is no longer present when this value is subsequently > fetched as a[1].i from within the second control block. > > This patch disables this particular test case on power hardware. > > The patch has been bootstrapped and tested on > powerpc64le-unknown-linux with no regressions. > > Is this ok for trunk? > > gcc/testsuite/ChangeLog: > > 2017-02-06 Kelvin Nilsen <kel...@gcc.gnu.org> > > PR target/68972 > * g++.dg/cpp1y/vla-initlist1.C: Add dg-skip-if directive to > disable this test on power architecture. > > Index: gcc/testsuite/g++.dg/cpp1y/vla-initlist1.C > === > --- gcc/testsuite/g++.dg/cpp1y/vla-initlist1.C(revision 245156) > +++ gcc/testsuite/g++.dg/cpp1y/vla-initlist1.C(working copy) > @@ -1,4 +1,5 @@ > // { dg-do run { target c++11 } } > +// { dg-skip-if "power overwrites two slots of array i" { "power*-*-*" > } { "*" } { "" } } > // { dg-options "-Wno-vla" } > > #include > > -- Kelvin Nilsen, Ph.D. 
kdnil...@linux.vnet.ibm.com home office: 801-756-4821, cell: 520-991-6727 IBM Linux Technology Center - PPC Toolchain
Re: [PATCH] PR66669: Fix failure of gcc.dg/loop-8.c on Power (Backport)
Is it ok to backport this patch to GCC-6? On 01/23/2017 09:59 AM, Kelvin Nilsen wrote: > > The test gcc.dg/loop-8.c makes assumptions that are not valid on Power > architecture (and on certain other architectures for which this issue > has already been addressed). The test case assumes that a single > loop-invariant statement will be moved outside the loop. On Power, a > constant is copy-propagated within the loop, and the subsequent > loop-invariant code motion moves two loop-invariant statements out of > the loop. > > This patch simply disables this test case on Power architecture. > > > gcc/testsuite/ChangeLog: > > 2017-01-23 Kelvin Nilsen <kel...@gcc.gnu.org> > > PR target/9 > * gcc.dg/loop-8.c: Modify dg-skip-if directive to exclude this > test on powerpc targets. > > Index: gcc/testsuite/gcc.dg/loop-8.c > === > --- gcc/testsuite/gcc.dg/loop-8.c (revision 244730) > +++ gcc/testsuite/gcc.dg/loop-8.c (working copy) > @@ -1,6 +1,6 @@ > /* { dg-do compile } */ > /* { dg-options "-O1 -fdump-rtl-loop2_invariant" } */ > -/* { dg-skip-if "unexpected IV" { "hppa*-*-* mips*-*-* visium-*-*" } { "*" } > { "" } } */ > +/* { dg-skip-if "unexpected IV" { "hppa*-*-* mips*-*-* visium-*-* > powerpc*-*-*" } { "*" } { "" } } */ > > void > f (int *a, int *b) > > -- Kelvin Nilsen, Ph.D. kdnil...@linux.vnet.ibm.com home office: 801-756-4821, cell: 520-991-6727 IBM Linux Technology Center - PPC Toolchain
[PATCH,rs6000] Add IEEE 128 support for several existing built-in functions
This patch adds IEEE 128 support to the existing scalar_insert_exp, scalar_extract_exp, scalar_extract_sig, scalar_test_data_class, and scalar_test_neg rs6000 built-in functions. Test programs are provided to exercise the new IEEE 128 functionality and to validate forms of these built-in functions that do not depend on IEEE 128 support. The patch has been bootstrapped and tested on powerpc64le-unknown-linux (both P8 and P9 targets) and powerpc-unknown-linux (big-endian, with both -m32 and -m64 target options) with no regressions. Is this ok for the trunk? gcc/ChangeLog: 2017-06-19 Kelvin Nilsen <kel...@gcc.gnu.org> * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add array entries to represent __ieee128 versions of the scalar_test_data_class, scalar_test_neg, scalar_extract_exp, scalar_extract_sig, and scalar_insert_exp built-in functions. (altivec_resolve_overloaded_builtin): Add special case handling for the __builtin_scalar_insert_exp function, as represented by the P9V_BUILTIN_VEC_VSIEDP constant. * config/rs6000/rs6000-builtin.def (VSEEQP): Add scalar extract exponent support for __ieee128 argument. (VSESQP): Add scalar extract significand support for __ieee128 argument. (VSTDCNQP): Add scalar test negative support for __ieee128 argument. (VSIEQP): Add scalar insert exponent support for __int128 argument with __ieee128 result. (VSIEQPF): Add scalar insert exponent support for __ieee128 argument with __ieee128 result. (VSTDCQP): Add scalar test data class support for __ieee128 argument. (VSTDCNQP): Add overload support for scalar test negative with __ieee128 argument. (VSTDCQP): Add overload support for scalar test data class __ieee128 argument. * config/rs6000/vsx.md (UNSPEC_VSX_SIEXPQP): New constant. (xsxexpqp): New insn for VSX scalar extract exponent quad precision. (xsxsigqp): New insn for VSX scalar extract significand quad precision. (xsiexpqpf): New insn for VSX scalar insert exponent quad precision with floating point argument. 
(xststdcqp): New expand for VSX scalar test data class quad precision. (xststdcnegqp): New expand for VSX scalar test negative quad precision. (xststdcqp): New insn to match expansions for VSX scalar test data class quad precision and VSX scalar test negative quad precision. * config/rs6000/rs6000.c (rs6000_expand_binop_builtin): Add special case operand checking to enforce that second operand of VSX scalar test data class with quad precision argument is a 7-bit unsigned literal. * doc/extend.texi (PowerPC AltiVec Built-in Functions): Add prototypes and descriptions of __ieee128 versions of scalar_extract_exp, scalar_extract_sig, scalar_insert_exp, scalar_test_data_class, and scalar_test_neg built-in functions. gcc/testsuite/ChangeLog: 2017-06-19 Kelvin Nilsen <kel...@gcc.gnu.org> * gcc.target/powerpc/bfp/scalar-cmp-exp-eq-3.c: New test. * gcc.target/powerpc/bfp/scalar-cmp-exp-eq-4.c: New test. * gcc.target/powerpc/bfp/scalar-cmp-exp-gt-3.c: New test. * gcc.target/powerpc/bfp/scalar-cmp-exp-gt-4.c: New test. * gcc.target/powerpc/bfp/scalar-cmp-exp-lt-3.c: New test. * gcc.target/powerpc/bfp/scalar-cmp-exp-lt-4.c: New test. * gcc.target/powerpc/bfp/scalar-cmp-exp-unordered-3.c: New test. * gcc.target/powerpc/bfp/scalar-cmp-exp-unordered-4.c: New test. * gcc.target/powerpc/bfp/scalar-extract-exp-3.c: New test. * gcc.target/powerpc/bfp/scalar-extract-exp-4.c: New test. * gcc.target/powerpc/bfp/scalar-extract-exp-5.c: New test. * gcc.target/powerpc/bfp/scalar-extract-exp-6.c: New test. * gcc.target/powerpc/bfp/scalar-extract-exp-7.c: New test. * gcc.target/powerpc/bfp/scalar-extract-sig-3.c: New test. * gcc.target/powerpc/bfp/scalar-extract-sig-4.c: New test. * gcc.target/powerpc/bfp/scalar-extract-sig-5.c: New test. * gcc.target/powerpc/bfp/scalar-extract-sig-6.c: New test. * gcc.target/powerpc/bfp/scalar-extract-sig-7.c: New test. * gcc.target/powerpc/bfp/scalar-insert-exp-10.c: New test. * gcc.target/powerpc/bfp/scalar-insert-exp-11.c: New test. 
* gcc.target/powerpc/bfp/scalar-insert-exp-12.c: New test. * gcc.target/powerpc/bfp/scalar-insert-exp-13.c: New test. * gcc.target/powerpc/bfp/scalar-insert-exp-14.c: New test. * gcc.target/powerpc/bfp/scalar-insert-exp-15.c: New test. * gcc.target/powerpc/bfp/scalar-insert-exp-6.c: New test. * gcc.target/powerpc/bfp/scalar-insert-exp-7.c: New test. * gcc.target/powerpc/bfp/scalar-insert-exp-8.c: New test. * gcc.target/powerpc/bfp/s
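For readers unfamiliar with the IEEE 128-bit (binary128) layout that these built-ins operate on, here is a minimal portable sketch of the field arithmetic involved. It is only an illustration of the format (1 sign bit, 15 exponent bits, 112 fraction bits), not the implementation in this patch and not a substitute for the built-ins. The type quad_bits and the helper names are hypothetical, and the value is assumed to be supplied as two 64-bit halves with the most significant half in hi.

```c
#include <stdint.h>

/* Hypothetical container for the raw bits of a binary128 value.
   hi holds the sign, the 15-bit exponent and the top 48 fraction bits;
   lo holds the low 64 fraction bits.  Not a GCC type. */
struct quad_bits {
    uint64_t hi;
    uint64_t lo;
};

/* Return the 15-bit biased exponent field (bias is 16383, i.e. 0x3FFF). */
static unsigned quad_biased_exponent(struct quad_bits q)
{
    return (unsigned)((q.hi >> 48) & 0x7FFF);
}

/* Return 1 when the sign bit is set, 0 otherwise. */
static int quad_is_negative(struct quad_bits q)
{
    return (int)(q.hi >> 63);
}

/* Replace the exponent field, keeping the sign and fraction bits. */
static struct quad_bits quad_insert_exponent(struct quad_bits q, unsigned exp)
{
    const uint64_t exp_mask = (uint64_t)0x7FFF << 48;
    q.hi = (q.hi & ~exp_mask) | ((uint64_t)(exp & 0x7FFF) << 48);
    return q;
}
```

With this layout, 1.0 is represented by hi = 0x3FFF000000000000 and lo = 0, so inserting the exponent 0x4000 into it yields the bit pattern of 2.0.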
[PATCH v2,rs6000] Add built-in function support for compare bytes instruction
This patch adds support for the compare bytes instruction, which has been available in the rs6000 architecture since Power6. Thank you to Segher Boessenkool for feedback on the original submission of this patch. The following refinements have been incorporated: 1. Changed the implementation and documentation to present a single overloaded function that handles either 32-bit or 64-bit arguments. 2. Corrected the spelling of compare in the comment describing the RS6000_BTM_CMPB macro. In response to reviewer question of whether this line is too long: it is not. It only appears that way due to alignment of tabs in the diff output. The patch has been bootstrapped and tested on powerpc64le-unknown-linux and powerpc-unknown-linux (big-endian, with both -m32 and -m64 target options) with no regressions. Is this ok for the trunk? gcc/testsuite/ChangeLog: 2017-05-08 Kelvin Nilsen <kel...@gcc.gnu.org> * gcc.target/powerpc/cmpb-1.c: New test. * gcc.target/powerpc/cmpb-2.c: New test. * gcc.target/powerpc/cmpb-3.c: New test. * gcc.target/powerpc/cmpb32-1.c: New test. * gcc.target/powerpc/cmpb32-2.c: New test. gcc/ChangeLog: 2017-05-08 Kelvin Nilsen <kel...@gcc.gnu.org> * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add array entries to represent two legal parameterizations of the overloaded __builtin_cmpb function, as represented by the P6_OV_BUILTIN_CMPB constant. (altivec_resolve_overloaded_builtin): Add special case handling for the __builtin_cmpb function, as represented by the P6_OV_BUILTIN_CMPB constant. * config/rs6000/rs6000-builtin.def (BU_P6_2): New macro. (BU_P6_64BIT_2): New macro. (BU_P6_OVERLOAD_2): New macro (CMPB_32): Add 32-bit compare-bytes support for 32-bit only targets. (CMPB): Add 64-bit compare-bytes support for 32-bit and 64-bit targets. (CMPB): Add overload support to represent both 32-bit and 64-bit compare-bytes function. * config/rs6000/rs6000.c (rs6000_builtin_mask_calculate): Add support for TARGET_CMPB. 
* config/rs6000/rs6000.h: Add support for RS6000_BTM_CMPB. * doc/extend.texi (PowerPC AltiVec Built-in Functions): Add documentation of the __builtin_cmpb overloaded built-in function. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 247069) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -3788,6 +3788,7 @@ HOST_WIDE_INT rs6000_builtin_mask_calculate (void) { return (((TARGET_ALTIVEC)? RS6000_BTM_ALTIVEC : 0) + | ((TARGET_CMPB) ? RS6000_BTM_CMPB : 0) | ((TARGET_VSX) ? RS6000_BTM_VSX : 0) | ((TARGET_SPE) ? RS6000_BTM_SPE : 0) | ((TARGET_PAIRED_FLOAT) ? RS6000_BTM_PAIRED: 0) Index: gcc/config/rs6000/rs6000.h === --- gcc/config/rs6000/rs6000.h (revision 247069) +++ gcc/config/rs6000/rs6000.h (working copy) @@ -2717,6 +2717,7 @@ extern int frame_pointer_needed; aren't in target_flags. */ #define RS6000_BTM_ALWAYS 0 /* Always enabled. */ #define RS6000_BTM_ALTIVEC MASK_ALTIVEC/* VMX/altivec vectors. */ +#define RS6000_BTM_CMPBMASK_CMPB /* ISA 2.05: compare bytes. */ #define RS6000_BTM_VSX MASK_VSX/* VSX (vector/scalar). */ #define RS6000_BTM_P8_VECTOR MASK_P8_VECTOR /* ISA 2.07 vector. */ #define RS6000_BTM_P9_VECTOR MASK_P9_VECTOR /* ISA 3.0 vector. 
*/ Index: gcc/config/rs6000/rs6000-c.c === --- gcc/config/rs6000/rs6000-c.c(revision 247069) +++ gcc/config/rs6000/rs6000-c.c(working copy) @@ -5348,6 +5348,11 @@ const struct altivec_builtin_types altivec_overloa RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0 }, + { P6_OV_BUILTIN_CMPB, P6_BUILTIN_CMPB_32, +RS6000_BTI_UINTSI, RS6000_BTI_UINTSI, RS6000_BTI_UINTSI, 0 }, + { P6_OV_BUILTIN_CMPB, P6_BUILTIN_CMPB, +RS6000_BTI_UINTDI, RS6000_BTI_UINTDI, RS6000_BTI_UINTDI, 0 }, + { P8V_BUILTIN_VEC_VUPKHSW, P8V_BUILTIN_VUPKHSW, RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 }, { P8V_BUILTIN_VEC_VUPKHSW, P8V_BUILTIN_VUPKHSW, @@ -6409,25 +6414,76 @@ altivec_resolve_overloaded_builtin (location_t loc for (desc = altivec_overloaded_builtins; desc->code && desc->code != fcode; desc++) continue; - -/* For arguments after the last, we have RS6000_BTI_NOT_OPAQUE in - the opX fields. */ -for (; desc->code == fcode; desc++) + +/* Need to special case __builtin_cmp because the overloaded forms + of this function take (unsigned int, unsigned int) o
[PATCH,rs6000] Add built-in function support for compare bytes instruction
This patch adds support for the compare bytes instruction, which has been available in the rs6000 architecture since Power6. The patch has been bootstrapped and tested on powerpc64le-unknown-linux and powerpc-unknown-linux (big-endian, with both -m32 and -m64 target options) with no regressions. Is this ok for the trunk? gcc/ChangeLog: 2017-04-28 Kelvin Nilsen <kel...@gcc.gnu.org> * config/rs6000/rs6000.c (rs6000_builtin_mask_calculate): Add support for TARGET_CMPB. * config/rs6000/rs6000.h: Add support for RS6000_BTM_CMPB. * config/rs6000/rs6000-builtin.def (BU_P6_CMPB_2): New macro. (BU_P6_64BIT_CMPB_2): New macro. (CMPB_32): Add compare-bytes support for 32-bit only targets. (CMPB): Add compare-bytes support for 32-bit and 64-bit targets. * doc/extend.texi (PowerPC AltiVec Built-in Functions): Add documentation of __builtin_cmpb and __builtin_cmpb_32 built-in functions. gcc/testsuite/ChangeLog: 2017-04-28 Kelvin Nilsen <kel...@gcc.gnu.org> * gcc.target/powerpc/cmpb-1.c: New test. * gcc.target/powerpc/cmpb-2.c: New test. * gcc.target/powerpc/cmpb-3.c: New test. * gcc.target/powerpc/cmpb32-1.c: New test. * gcc.target/powerpc/cmpb32-2.c: New test. Index: gcc/config/rs6000/rs6000-builtin.def === --- gcc/config/rs6000/rs6000-builtin.def(revision 247069) +++ gcc/config/rs6000/rs6000-builtin.def(working copy) @@ -339,6 +339,26 @@ | RS6000_BTC_SPECIAL), \ CODE_FOR_nothing) /* ICODE */ +/* ISA 2.05 (power6) convenience macros. 
*/ +/* For functions that depend on the CMPB instruction */ +#define BU_P6_CMPB_2(ENUM, NAME, ATTR, ICODE) \ + RS6000_BUILTIN_2 (P6_BUILTIN_ ## ENUM, /* ENUM */ \ + "__builtin_" NAME, /* NAME */ \ + RS6000_BTM_CMPB,/* MASK */ \ + (RS6000_BTC_ ## ATTR/* ATTR */ \ +| RS6000_BTC_BINARY), \ + CODE_FOR_ ## ICODE) /* ICODE */ + +/* For functions that depend on 64-BIT support and on the CMPB instruction */ +#define BU_P6_64BIT_CMPB_2(ENUM, NAME, ATTR, ICODE)\ + RS6000_BUILTIN_2 (P6_BUILTIN_ ## ENUM, /* ENUM */ \ + "__builtin_" NAME, /* NAME */ \ + RS6000_BTM_CMPB \ + | RS6000_BTM_64BIT, /* MASK */ \ + (RS6000_BTC_ ## ATTR/* ATTR */ \ +| RS6000_BTC_BINARY), \ + CODE_FOR_ ## ICODE) /* ICODE */ + /* ISA 2.07 (power8) vector convenience macros. */ /* For the instructions that are encoded as altivec instructions use __builtin_altivec_ as the builtin name. */ @@ -1778,6 +1798,10 @@ BU_VSX_OVERLOAD_X (ST,"st") BU_VSX_OVERLOAD_X (XL, "xl") BU_VSX_OVERLOAD_X (XST, "xst") +/* 2 argument CMPB instructions added in ISA 2.05. */ +BU_P6_CMPB_2 (CMPB_32,"cmpb_32", CONST, cmpbsi3) +BU_P6_64BIT_CMPB_2 (CMPB, "cmpb", CONST, cmpbdi3) + /* 1 argument VSX instructions added in ISA 2.07. */ BU_P8V_VSX_1 (XSCVSPDPN, "xscvspdpn", CONST, vsx_xscvspdpn) BU_P8V_VSX_1 (XSCVDPSPN, "xscvdpspn", CONST, vsx_xscvdpspn) Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 247069) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -3788,6 +3788,7 @@ HOST_WIDE_INT rs6000_builtin_mask_calculate (void) { return (((TARGET_ALTIVEC)? RS6000_BTM_ALTIVEC : 0) + | ((TARGET_CMPB) ? RS6000_BTM_CMPB : 0) | ((TARGET_VSX) ? RS6000_BTM_VSX : 0) | ((TARGET_SPE) ? RS6000_BTM_SPE : 0) | ((TARGET_PAIRED_FLOAT) ? RS6000_BTM_PAIRED: 0) Index: gcc/config/rs6000/rs6000.h === --- gcc/config/rs6000/rs6000.h (revision 247069) +++ gcc/config/rs6000/rs6000.h (working copy) @@ -2717,6 +2717,7 @@ extern int frame_pointer_needed; aren't in target_flags. */ #define RS6000_BTM_ALWAYS 0 /* Always enabled. 
*/ #define RS6000_BTM_ALTIVEC MASK_ALTIVEC/* VMX/altivec vectors. */ +#define RS6000_BTM_CMPBMASK_CMPB /* ISA 2.05: cmopare bytes. */ #define RS6000_BTM_VSX MASK_VSX
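As background for the compare bytes posts above, the following is a small portable model of what the instruction computes, as I understand the Power ISA description: each byte of the result is 0xFF where the corresponding bytes of the two operands are equal and 0x00 where they differ. The function names cmpb_model64 and cmpb_model32 are hypothetical and are not the built-ins added by the patch; this is a sketch for reasoning about expected results, not the rs6000 implementation.

```c
#include <stdint.h>

/* Portable model of a 64-bit compare bytes operation: result byte i is
   0xFF when byte i of a equals byte i of b, and 0x00 otherwise. */
static uint64_t cmpb_model64(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t mask = (uint64_t)0xFF << (8 * i);
        if ((a & mask) == (b & mask))
            result |= mask;
    }
    return result;
}

/* 32-bit variant: the upper four bytes of the zero-extended operands
   always match, so they are simply discarded by the narrowing cast. */
static uint32_t cmpb_model32(uint32_t a, uint32_t b)
{
    return (uint32_t)cmpb_model64(a, b);
}
```

A typical use of such an operation is string scanning: comparing a word against zero gives a non-zero result exactly when the word contains a zero byte.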
[PATCH v3,rs6000] PR80101: Fix ICE in store_data_bypass_p
This problem report describes an assertion error when certain rtl expressions which are not eligible as producers or consumers of a store bypass optimization are passed as arguments to the store_data_bypass_p function. Since the problem surfaced with tests targeting the rs6000 architecture, the proposed patch is integrated within the rs6000 back end. A new rs6000_store_data_bypass_p function has been introduced and all calls to store_data_bypass_p from within the rs6000 back end have been replaced with calls to rs6000_store_data_bypass_p. This new function scans its arguments for patterns that are known to cause assertion errors in store_data_bypass_p and returns false if any of those patterns are encountered. Otherwise, rs6000_store_data_bypass_p simply returns the result produced when passing its arguments to a call of store_data_bypass_p. Thank you for feedback and guidance from Eric Botcazou, Segher Boessenkool, Richard Sandiford, and Pat Haugen which was offered in response to my first two patch submissions and an RFC post on this topic. With all of your help, I now have a much better understanding of the intended role of store_data_bypass_p. The patch has been bootstrapped without regressions on powerpc64le-unknown-linux-gnu. Is this ok for the trunk? gcc/testsuite/ChangeLog: 2017-04-20 Kelvin Nilsen <kel...@gcc.gnu.org> PR target/80101 * gcc.target/powerpc/pr80101-1.c: New test. gcc/ChangeLog: 2017-04-20 Kelvin Nilsen <kel...@gcc.gnu.org> PR target/80101 * config/rs6000/power6.md: Replace store_data_bypass_p calls with rs6000_store_data_bypass_p in seven define_bypass directives and in several comments. * config/rs6000/rs6000-protos.h: Add prototype for rs6000_store_data_bypass_p function. * config/rs6000/rs6000.c (rs6000_store_data_bypass_p): New function implements slightly different (rs6000-specific) semantics than store_data_bypass_p, returning false rather than aborting with assertion error when arguments do not satisfy the requirements of store data bypass. 
(rs6000_adjust_cost): Replace six calls of store_data_bypass_p with rs6000_store_data_bypass_p. Index: gcc/config/rs6000/power6.md === --- gcc/config/rs6000/power6.md (revision 246469) +++ gcc/config/rs6000/power6.md (working copy) @@ -108,7 +108,7 @@ power6-store-update-indexed,\ power6-fpstore,\ power6-fpstore-update" - "store_data_bypass_p") + "rs6000_store_data_bypass_p") (define_insn_reservation "power6-load-ext" 4 ; fx (and (eq_attr "type" "load") @@ -128,7 +128,7 @@ power6-store-update-indexed,\ power6-fpstore,\ power6-fpstore-update" - "store_data_bypass_p") + "rs6000_store_data_bypass_p") (define_insn_reservation "power6-load-update" 2 ; fx (and (eq_attr "type" "load") @@ -276,7 +276,7 @@ power6-store-update-indexed,\ power6-fpstore,\ power6-fpstore-update" - "store_data_bypass_p") + "rs6000_store_data_bypass_p") (define_insn_reservation "power6-cntlz" 2 (and (eq_attr "type" "cntlz") @@ -289,7 +289,7 @@ power6-store-update-indexed,\ power6-fpstore,\ power6-fpstore-update" - "store_data_bypass_p") + "rs6000_store_data_bypass_p") (define_insn_reservation "power6-var-rotate" 4 (and (eq_attr "type" "shift") @@ -355,7 +355,7 @@ power6-store-update-indexed,\ power6-fpstore,\ power6-fpstore-update" - "store_data_bypass_p") + "rs6000_store_data_bypass_p") (define_insn_reservation "power6-delayed-compare" 2 ; N/A (and (eq_attr "type" "shift") @@ -420,7 +420,7 @@ power6-store-update-indexed,\ power6-fpstore,\ power6-fpstore-update" - "store_data_bypass_p") + "rs6000_store_data_bypass_p") (define_insn_reservation "power6-idiv" 44 (and (eq_attr "type" "div") @@ -436,7 +436,7 @@ ; power6-store-update-indexed,\ ; power6-fpstore,\ ; power6-fpstore-update" -; "store_data_bypass_p") +; "rs6000_store_data_bypass_p") (define_insn_reservation "power6-ldiv" 56 (and (eq_attr "type" "div") @@ -452,7 +452,7 @@ ; power6-store-update-indexed,\ ; power6-fpstore,\ ; power6-fpstor
[PATCH v2] PR80101: Fix ICE in store_data_bypass_p
This problem report describes an assertion error when certain rtl expressions which are not eligible as producers or consumers of a store bypass optimization are passed as arguments to the store_data_bypass_p function. The proposed patch returns false from store_data_bypass_p rather than terminating with an assertion error. False indicates that the passed arguments are not eligible for the store bypass scheduling optimization. Thank you for feedback and guidance received in response to my first patch submission and the follow-on RFC post from Eric Botcazou, Segher Boessenkool, Richard Sandiford, and Pat Haugen. With all of your help, I now have a much better understanding of the intended role of store_data_bypass_p. This new revision of the patch differs from the original submission in the following ways: 1. I have modified the comment that describes this function to clarify that this function is only called if it is already determined that there exists at least one variable that is set by OUT_INSN and read by IN_INSN. My modified comment also clarifies the function's new behavior, as implemented with this patch. 2. I have added comments to the body of the function to clarify some of the rationale for the existing code and the newly inserted code, especially where I was originally confused because I did not understand the rationale. 3. I have added code to allow USE expressions beneath a PARALLEL node without invalidating store data bypass (for consistency, for example, with the implementation of single_set, and as mentioned in feedback from Richard Sandiford). I gather that it is extremely unlikely that in_insn would represent a PARALLEL with multiple store operations beneath it, but this function, as originally implemented, supports that possibility, and my changes to the function do as well. The patch has been bootstrapped without regressions on powerpc64le-unknown-linux-gnu. Is this ok for the trunk? 
gcc/testsuite/ChangeLog: 2017-04-14 Kelvin Nilsen <kel...@gcc.gnu.org> * gcc.target/powerpc/pr80101-1.c: New test. gcc/ChangeLog: 2017-04-14 Kelvin Nilsen <kel...@gcc.gnu.org> * recog.c (store_data_bypass_p): Rather than terminate with assertion error, return false if either of the function's arguments is not a single_set or a PARALLEL with only SETS inside. Allow USE subexpressions in addition to CLOBBER subexpressions within a PARALLEL that represents either of the function's arguments. Add and modify comments to clarify behavior. Index: gcc/recog.c === --- gcc/recog.c (revision 246469) +++ gcc/recog.c (working copy) @@ -3663,9 +3663,14 @@ peephole2_optimize (void) /* Common predicates for use with define_bypass. */ -/* True if the dependency between OUT_INSN and IN_INSN is on the store - data not the address operand(s) of the store. IN_INSN and OUT_INSN - must be either a single_set or a PARALLEL with SETs inside. */ +/* Given that there exists at least one variable that is set (produced) + by OUT_INSN and read (consumed) by IN_INSN, return true iff + IN_INSN represents one or more memory store operations and none of + the variables set by OUT_INSN is used by IN_INSN as the address of a + store operation. If either IN_INSN or OUT_INSN does not represent + a "single" RTL SET expression (as loosely defined by the + implementation of the single_set function) or a PARALLEL with only + SETs, CLOBBERs, and USEs inside, this function returns false. */ int store_data_bypass_p (rtx_insn *out_insn, rtx_insn *in_insn) @@ -3678,6 +3683,8 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn in_set = single_set (in_insn); if (in_set) { + /* If in_set does not represent a store operation, this insn +pair is not eligible for store data bypass. 
*/ if (!MEM_P (SET_DEST (in_set))) return false; @@ -3684,6 +3691,9 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn out_set = single_set (out_insn); if (out_set) { + /* If the address stored by in_set is set by out_set, the +dependency is on the address of the store operation, so +this insn pair is not eligible for store data bypass. */ if (reg_mentioned_p (SET_DEST (out_set), SET_DEST (in_set))) return false; } @@ -3698,11 +3708,15 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn { out_exp = XVECEXP (out_pat, 0, i); -if (GET_CODE (out_exp) == CLOBBER) - continue; + if ((GET_CODE (out_exp) == CLOBBER) || (GET_CODE (out_exp) == USE)) + continue; +else if (GET_CODE (out_exp) != SET) + return false; -gcc_assert (GET_CODE (out_exp) == SET); - + /* If the address to which the in_set store operation + writes is set by any of
[PATCH,rs6000] PR80315: Add test cases to confirm ICE has been fixed
PR80315 reported an Internal Compiler Error when the third argument to __builtin_crypto_vshasigmaw was an integer constant with a value greater than 15. The patch to correct this problem was committed yesterday. This patch adds 4 new test cases to the regression suite. Regression testing has confirmed that these test programs reproduce the error reported with PR80315 before yesterday's patch was applied, and that all test programs pass following application of yesterday's patch. Is this ok for the trunk? gcc/testsuite/ChangeLog: 2017-04-12 Kelvin Nilsen <kel...@gcc.gnu.org> * gcc.target/powerpc/pr80315-1.c: New test. * gcc.target/powerpc/pr80315-2.c: New test. * gcc.target/powerpc/pr80315-3.c: New test. * gcc.target/powerpc/pr80315-4.c: New test. Index: gcc/testsuite/gcc.target/powerpc/pr80315-1.c === --- gcc/testsuite/gcc.target/powerpc/pr80315-1.c(revision 0) +++ gcc/testsuite/gcc.target/powerpc/pr80315-1.c(working copy) @@ -0,0 +1,16 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-options "-mcpu=power8" } */ + +int +main() +{ + __attribute__((altivec(vector__))) unsigned int test, res; + const int s0 = 0; + int mask; + + /* Argument 2 must be 0 or 1. Argument 3 must be in range 0..15. 
*/ + res = __builtin_crypto_vshasigmaw (test, 1, 0xff); /* { dg-error "argument 3 must be in the range 0..15" } */ + return 0; +} Index: gcc/testsuite/gcc.target/powerpc/pr80315-2.c === --- gcc/testsuite/gcc.target/powerpc/pr80315-2.c(revision 0) +++ gcc/testsuite/gcc.target/powerpc/pr80315-2.c(working copy) @@ -0,0 +1,16 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-options "-mcpu=power8" } */ + +int +main () +{ + __attribute__((altivec(vector__))) unsigned long long test, res; + const int s0 = 0; + int mask; + + /* Argument 2 must be 0 or 1. Argument 3 must be in range 0..15. */ + res = __builtin_crypto_vshasigmad (test, 1, 0xff); /* { dg-error "argument 3 must be in the range 0..15" } */ + return 0; +} Index: gcc/testsuite/gcc.target/powerpc/pr80315-3.c === --- gcc/testsuite/gcc.target/powerpc/pr80315-3.c(revision 0) +++ gcc/testsuite/gcc.target/powerpc/pr80315-3.c(working copy) @@ -0,0 +1,18 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-options "-mcpu=power8" } */ + +#include + +vector unsigned int +main () +{ + vector unsigned int test, res; + const int s0 = 0; + int mask; + + /* Argument 2 must be 0 or 1. Argument 3 must be in range 0..15. 
*/ + res = vec_shasigma_be (test, 1, 0xff); /* { dg-error "argument 3 must be in the range 0..15" } */ + return res; +} Index: gcc/testsuite/gcc.target/powerpc/pr80315-4.c === --- gcc/testsuite/gcc.target/powerpc/pr80315-4.c(revision 0) +++ gcc/testsuite/gcc.target/powerpc/pr80315-4.c(working copy) @@ -0,0 +1,18 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-options "-mcpu=power8" } */ + +#include + +vector unsigned long long int +main () +{ + vector unsigned long long int test, res; + const int s0 = 0; + int mask; + + /* Argument 2 must be 0 or 1. Argument 3 must be in range 0..15. */ + res = vec_shasigma_be (test, 1, 0xff); /* { dg-error "argument 3 must be in the range 0..15" } */ + return res; +}
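The operand constraints exercised by these four tests can be summarized with a tiny model. This is only a sketch of the rule quoted in the test comments (argument 2 must be 0 or 1, argument 3 must be in the range 0..15); the helper name shasigma_literals_ok is hypothetical and does not exist in GCC, where the check is performed on literal operands at compile time.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of GCC: returns true when the two
   literal operands fall within the ranges stated in the test comments. */
static bool shasigma_literals_ok(int arg2, int arg3)
{
    return (arg2 == 0 || arg2 == 1) && (arg3 >= 0 && arg3 <= 15);
}
```

Under this model the call shape used in the tests, with arguments (1, 0xff), is rejected, which corresponds to the dg-error each test expects in place of the former ICE.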
RFC: seeking insight on store_data_bypass_p (recog.c)
My work on PR80101 is "motivating" me to modify the implementation of store_data_bypass_p (in gcc/recog.c). I have a patch that bootstraps with no regressions. However, I think "regression" testing may not be enough to prove I got this right. If my new patch returns the wrong value, the outcome will be poor instruction scheduling decisions, which will impact performance, but probably not "correctness". So I'd like some help understanding the existing implementation of store_data_bypass_p. To establish some context, here is what I think I understand about this function: 1. As input arguments, out_insn represents an rtl expression that potentially "produces" a store to memory and in_insn represents an rtl expression that potentially "consumes" a value recently stored to memory. 2. If the memory store produced matches the memory fetch consumed, this function returns true to indicate that this sequence of two instructions qualifies for a special "bypass" latency that represents the fact that the fetch will obtain the value out of the write buffer. So, whereas the instruction scheduler might normally expect that this sequence of two instructions would experience Load-Hit-Store penalties associated with cache coherency hardware costs, since these two instructions qualify for the store_data_bypass optimization, the instruction scheduler counts the latency as only 1 or 2 cycles (potentially). [This is what I understand, but I may be wrong, so please correct me if so.] 3. Actually, what I described above is only the "simple" case. It may be that the rtl for either out_insn or in_insn is really a parallel clause with multiple rtl trees beneath it. In this case, we compare the subtrees in a "similar" way to see if the compound expressions qualify for the store_data_bypass_p "optimization". (I've got some questions about how this is done below) As currently implemented, special handling is given to a CLOBBER subtree as part of either PARALLEL expression: we ignore it. 
This is because CLOBBER does not represent any real machine instructions. It just represents semantic information that might be used by the compiler. In addition to seeking confirmation of my existing understanding of the code as outlined above, the specific questions that I am seeking help with are: 1. In the current implementation (as I understand it), near the top of the function body, we handle the case that the consumer (in_insn) rtl is a single SET expression and the producer (out_insn) rtl is a PARALLEL expression containing multiple sets. The way I read this code, we are requiring that every one of the producer's parallel SET instructions produce the same value that is to be consumed in order to qualify this sequence as a "store data bypass". That seems wrong to me. I would expect that we only need "one" of the produced values to match the consumed value in order to qualify for the "store data bypass" optimization. Please explain. (The same confusing behavior happens below in the same function, in the case that the consumer rtl is a PARALLEL expression of multiple SETs: we require that every producer's stored value match every consumer's fetched value.) 2. A "bigger" concern is that any time any SETs are buried within a PARALLEL tree, I'm not sure the answer produced by this function, as currently implemented, is at all reliable: a) PARALLEL does not necessarily mean all of its subtrees happen in parallel on hardware. It just means that there is no sequencing imposed by the source code, so the final order in which the multiple subtrees beneath the PARALLEL node execute is not known at this stage of compilation. b) It seems to me that it doesn't really make sense to speak of whether a whole bunch of producers combined with a whole bunch of consumers qualify for an optimized store data bypass latency. If we say that they do qualify (as a group), which pair(s) of producer and consumer machine instructions qualify? 
It seems we need to know which producer matches with which consumer in order to know where the bypass latencies "fit" into the schedule. c) Furthermore, if it turns out that the "arbitrary" order in which the producer instructions and consumer instructions are emitted places too much "distance" between a producer and the matching consumer, then it is possible that by the time the hardware executes the consumer, the stored value is no longer in the write buffer, so even though we might have "thought" two PARALLEL rtl expressions qualified for the store bypass optimization, we really should have returned false. Can someone help me understand this better? Thanks much. -- Kelvin Nilsen, Ph.D. kdnil...@linux.vnet.ibm.com home office: 801-756-4821, cell: 520-991-6727 IBM Linux Technology Center - PPC Toolchain
[PATCH] PR80101: Fix ICE in store_data_bypass_p
[This is a repost of a patch previously posted on 3/29/2017. Eric, I hope you might consider that this falls within your scope of maintenance. Thanks.] This problem report describes an assertion error when certain rtl expressions which are not eligible as producers or consumers of a store bypass optimization are passed as arguments to the store_data_bypass_p function. The proposed patch returns false from store_data_bypass_p rather than terminating with an assertion error. False indicates that the passed arguments are not eligible for the store bypass scheduling optimization. The patch has been bootstrapped without regressions on powerpc64le-unknown-linux-gnu. Is this ok for the trunk? gcc/ChangeLog: 2017-03-29 Kelvin Nilsen <kel...@gcc.gnu.org> PR target/80101 * recog.c (store_data_bypass_p): Rather than terminate with assertion error, return false if either function argument is not a single_set or a PARALLEL with SETs inside. gcc/testsuite/ChangeLog: 2017-03-29 Kelvin Nilsen <kel...@gcc.gnu.org> PR target/80101 * gcc.target/powerpc/pr80101-1.c: New test. Index: gcc/recog.c === --- gcc/recog.c (revision 246469) +++ gcc/recog.c (working copy) @@ -3663,9 +3663,12 @@ peephole2_optimize (void) /* Common predicates for use with define_bypass. */ -/* True if the dependency between OUT_INSN and IN_INSN is on the store - data not the address operand(s) of the store. IN_INSN and OUT_INSN - must be either a single_set or a PARALLEL with SETs inside. */ +/* Returns true if the dependency between OUT_INSN and IN_INSN is on + the stored data, false if there is no dependency. Note that a + consumer instruction that loads only the address (rather than the + value) stored by a producer instruction does not represent a + dependency. If IN_INSN or OUT_INSN are not a single_set or a + PARALLEL with SETs inside, this function returns false. 
*/ int store_data_bypass_p (rtx_insn *out_insn, rtx_insn *in_insn) @@ -3701,7 +3704,8 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn if (GET_CODE (out_exp) == CLOBBER) continue; -gcc_assert (GET_CODE (out_exp) == SET); + if (GET_CODE (out_exp) != SET) + return false; if (reg_mentioned_p (SET_DEST (out_exp), SET_DEST (in_set))) return false; @@ -3711,7 +3715,8 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn else { in_pat = PATTERN (in_insn); - gcc_assert (GET_CODE (in_pat) == PARALLEL); + if (GET_CODE (in_pat) != PARALLEL) + return false; for (i = 0; i < XVECLEN (in_pat, 0); i++) { @@ -3720,7 +3725,8 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn if (GET_CODE (in_exp) == CLOBBER) continue; - gcc_assert (GET_CODE (in_exp) == SET); + if (GET_CODE (in_exp) != SET) + return false; if (!MEM_P (SET_DEST (in_exp))) return false; @@ -3734,7 +3740,8 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn else { out_pat = PATTERN (out_insn); - gcc_assert (GET_CODE (out_pat) == PARALLEL); + if (GET_CODE (out_pat) != PARALLEL) + return false; for (j = 0; j < XVECLEN (out_pat, 0); j++) { @@ -3743,7 +3750,8 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn if (GET_CODE (out_exp) == CLOBBER) continue; - gcc_assert (GET_CODE (out_exp) == SET); + if (GET_CODE (out_exp) != SET) + return false; if (reg_mentioned_p (SET_DEST (out_exp), SET_DEST (in_exp))) return false; Index: gcc/testsuite/gcc.target/powerpc/pr80101-1.c === --- gcc/testsuite/gcc.target/powerpc/pr80101-1.c(revision 0) +++ gcc/testsuite/gcc.target/powerpc/pr80101-1.c(working copy) @@ -0,0 +1,22 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power6" } } */ +/* { dg-require-effective-target dfp_hw } */ +/* { dg-options "-mcpu=power6 -mno-sched-epilog -Ofast" } */ + +/* Prior to resolving PR 80101, this test case resulted in an internal + compiler error. 
The role of this test program is to assure that + dejagnu's "test for excess errors" does not find any. */ + +int b; + +void e (); + +int c () +{ + struct + { +int a[b]; + } d; + if (d.a[0]) +e (); +}
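The change in behavior described in this message can be illustrated outside of GCC with a minimal sketch. The names below (toy_code, toy_eligible_p) are hypothetical stand-ins and are not GCC's rtx API; the sketch only models the idea that an unexpected element makes the predicate answer false instead of aborting the compiler.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for rtx codes; these are NOT GCC's definitions.  */
enum toy_code { TOY_SET, TOY_CLOBBER, TOY_USE };

/* Model of the patched checks: walk the members of a PARALLEL-like
   vector, skip CLOBBERs, and report "not eligible" (false) as soon as
   a member is something other than a SET.  Before the patch, the
   analogous situation tripped an assertion instead.  */
bool
toy_eligible_p (const enum toy_code *elems, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      if (elems[i] == TOY_CLOBBER)
        continue;
      if (elems[i] != TOY_SET)
        return false;
    }
  return true;
}
```

A caller that treats false as "do not apply the bypass latency" keeps scheduling conservatively for patterns the predicate does not understand.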
[PATCH v2,rs6000] PR80108: Fix ICE with cross compiler
I am reposting this patch, previously posted just moments ago, to correct the subject so that it clarifies that this is an rs6000-specific patch. Thanks.

PR 80108 describes an ICE that occurs on an existing test program when compiled with a particular combination of target options. This patch fixes the compiler to reject that particular combination of target options since it is not meaningful and duplicates the offending test case with a dg-options directive to exercise the problematic command-line options.

Thanks to feedback from Pat Haugen, Michael Meissner, and Segher Boessenkool, version 2 of this proposed patch integrates the following refinements:

1. Issue an error message when -mpower9-minmax is used in combination with -mcpu=power9 if specific prerequisite target options have been explicitly disabled.

2. Change the exclude-opts clause on the test case's dg-skip-if directive from -mcpu=power9 to -mcpu=405. (This was a copy-and-paste error when this line was borrowed from a different test program.)

3. Remove -m32 from the dg-options directive. Though this target option had been specified in the original problem report, subsequent testing confirmed that the original ICE occurs independent of this option. Eliminating this option allows the regression test to be exercised in more contexts.

This patch has been bootstrapped and tested with no regressions on both powerpc64-unknown-linux-gnu and powerpc64le-unknown-linux-gnu. Is this ok for the trunk?

gcc/ChangeLog:

2017-04-06  Kelvin Nilsen  <kel...@gcc.gnu.org>

    PR target/80108
    * config/rs6000/rs6000.c (rs6000_option_override_internal): Enhance special handling given to the TARGET_P9_MINMAX option in relation to certain other options.

gcc/testsuite/ChangeLog:

2017-04-06  Kelvin Nilsen  <kel...@gcc.gnu.org>

    PR target/80108
    * gcc.target/powerpc/ppc-fortran/ppc-fortran.exp: New file.
    * gcc.target/powerpc/ppc-fortran/pr80108-1.f90: New test.
Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 246573) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -4273,8 +4273,40 @@ rs6000_option_override_internal (bool global_init_ /* For the newer switches (vsx, dfp, etc.) set some of the older options, unless the user explicitly used the -mno- to disable the code. */ if (TARGET_P9_VECTOR || TARGET_MODULO || TARGET_P9_DFORM_SCALAR - || TARGET_P9_DFORM_VECTOR || TARGET_P9_DFORM_BOTH > 0 || TARGET_P9_MINMAX) + || TARGET_P9_DFORM_VECTOR || TARGET_P9_DFORM_BOTH > 0) rs6000_isa_flags |= (ISA_3_0_MASKS_SERVER & ~rs6000_isa_flags_explicit); + else if (TARGET_P9_MINMAX) +{ + if (have_cpu) + { + if (cpu_index == PROCESSOR_POWER9) + { + /* legacy behavior: allow -mcpu-power9 with certain +capabilities explicitly disabled. */ + rs6000_isa_flags |= + (ISA_3_0_MASKS_SERVER & ~rs6000_isa_flags_explicit); + /* However, reject this automatic fix if certain +capabilities required for TARGET_P9_MINMAX support +have been explicitly disabled. */ + if (((OPTION_MASK_VSX | OPTION_MASK_UPPER_REGS_SF + | OPTION_MASK_UPPER_REGS_DF) & rs6000_isa_flags) + != (OPTION_MASK_VSX | OPTION_MASK_UPPER_REGS_SF + | OPTION_MASK_UPPER_REGS_DF)) + error ("-mpower9-minmax incompatible with explicitly disabled options"); + } + else + error ("Power9 target option is incompatible with -mcpu= for " + " less than power9"); + } + else if ((ISA_3_0_MASKS_SERVER & rs6000_isa_flags_explicit) + != (ISA_3_0_MASKS_SERVER & rs6000_isa_flags + & rs6000_isa_flags_explicit)) + /* Enforce that none of the ISA_3_0_MASKS_SERVER flags + were explicitly cleared. 
*/ + error ("-mpower9-minmax incompatible with explicitly disabled options"); + else + rs6000_isa_flags |= ISA_3_0_MASKS_SERVER; +} else if (TARGET_P8_VECTOR || TARGET_DIRECT_MOVE || TARGET_CRYPTO) rs6000_isa_flags |= (ISA_2_7_MASKS_SERVER & ~rs6000_isa_flags_explicit); else if (TARGET_VSX) Index: gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp === --- gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp (revision 0) +++ gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp (revision 246624) @@ -0,0 +1,65 @@ +# Copyright (C) 2004-2017 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free
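The mask tests in the rs6000.c hunk above rely on two bit-set idioms: enabling only the flags the user did not mention, and detecting a flag that the user mentioned and left off. The following is a minimal sketch of those idioms; the TOY_* flag values and function names are hypothetical and chosen only for illustration, and the real masks live in GCC's rs6000 back end.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only.  */
#define TOY_VSX        (UINT64_C (1) << 0)
#define TOY_P8_VECTOR  (UINT64_C (1) << 1)
#define TOY_P9_VECTOR  (UINT64_C (1) << 2)
#define TOY_ISA_3_0    (TOY_VSX | TOY_P8_VECTOR | TOY_P9_VECTOR)

/* True if some flag in GROUP was set explicitly by the user (its bit is
   in EXPLICIT_MASK) and is currently off in FLAGS.  This mirrors the
   shape of the comparison
   (GROUP & explicit) != (GROUP & flags & explicit).  */
bool
toy_explicitly_disabled_p (uint64_t group, uint64_t flags,
                           uint64_t explicit_mask)
{
  return (group & explicit_mask) != (group & flags & explicit_mask);
}

/* Turn on every flag in GROUP that the user did not mention, leaving
   explicitly chosen flags untouched.  This mirrors
   flags |= (GROUP & ~explicit).  */
uint64_t
toy_enable_implicit (uint64_t group, uint64_t flags,
                     uint64_t explicit_mask)
{
  return flags | (group & ~explicit_mask);
}
```

With an explicit "no VSX" request modeled as explicit_mask = TOY_VSX and flags = 0, the first function reports a conflict, while the second would enable only the two vector flags the user never mentioned.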
[PATCH v2] PR80108: Fix ICE with cross compiler
PR 80108 describes an ICE that occurs on an existing test program when compiled with a particular combination of target options. This patch fixes the compiler to reject that particular combination of target options since it is not meaningful and duplicates the offending test case with a dg-options directive to exercise the problematic command-line options.

Thanks to feedback from Pat Haugen, Michael Meissner, and Segher Boessenkool, version 2 of this proposed patch integrates the following refinements:

1. Issue an error message when -mpower9-minmax is used in combination with -mcpu=power9 if specific prerequisite target options have been explicitly disabled.

2. Change the exclude-opts clause on the test case's dg-skip-if directive from -mcpu=power9 to -mcpu=405. (This was a copy-and-paste error when this line was borrowed from a different test program.)

3. Remove -m32 from the dg-options directive. Though this target option had been specified in the original problem report, subsequent testing confirmed that the original ICE occurs independent of this option. Eliminating this option allows the regression test to be exercised in more contexts.

This patch has been bootstrapped and tested with no regressions on both powerpc64-unknown-linux-gnu and powerpc64le-unknown-linux-gnu. Is this ok for the trunk?

gcc/ChangeLog:

2017-04-06  Kelvin Nilsen  <kel...@gcc.gnu.org>

    PR target/80108
    * config/rs6000/rs6000.c (rs6000_option_override_internal): Enhance special handling given to the TARGET_P9_MINMAX option in relation to certain other options.

gcc/testsuite/ChangeLog:

2017-04-06  Kelvin Nilsen  <kel...@gcc.gnu.org>

    PR target/80108
    * gcc.target/powerpc/ppc-fortran/ppc-fortran.exp: New file.
    * gcc.target/powerpc/ppc-fortran/pr80108-1.f90: New test.
Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 246573) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -4273,8 +4273,40 @@ rs6000_option_override_internal (bool global_init_ /* For the newer switches (vsx, dfp, etc.) set some of the older options, unless the user explicitly used the -mno- to disable the code. */ if (TARGET_P9_VECTOR || TARGET_MODULO || TARGET_P9_DFORM_SCALAR - || TARGET_P9_DFORM_VECTOR || TARGET_P9_DFORM_BOTH > 0 || TARGET_P9_MINMAX) + || TARGET_P9_DFORM_VECTOR || TARGET_P9_DFORM_BOTH > 0) rs6000_isa_flags |= (ISA_3_0_MASKS_SERVER & ~rs6000_isa_flags_explicit); + else if (TARGET_P9_MINMAX) +{ + if (have_cpu) + { + if (cpu_index == PROCESSOR_POWER9) + { + /* legacy behavior: allow -mcpu-power9 with certain +capabilities explicitly disabled. */ + rs6000_isa_flags |= + (ISA_3_0_MASKS_SERVER & ~rs6000_isa_flags_explicit); + /* However, reject this automatic fix if certain +capabilities required for TARGET_P9_MINMAX support +have been explicitly disabled. */ + if (((OPTION_MASK_VSX | OPTION_MASK_UPPER_REGS_SF + | OPTION_MASK_UPPER_REGS_DF) & rs6000_isa_flags) + != (OPTION_MASK_VSX | OPTION_MASK_UPPER_REGS_SF + | OPTION_MASK_UPPER_REGS_DF)) + error ("-mpower9-minmax incompatible with explicitly disabled options"); + } + else + error ("Power9 target option is incompatible with -mcpu= for " + " less than power9"); + } + else if ((ISA_3_0_MASKS_SERVER & rs6000_isa_flags_explicit) + != (ISA_3_0_MASKS_SERVER & rs6000_isa_flags + & rs6000_isa_flags_explicit)) + /* Enforce that none of the ISA_3_0_MASKS_SERVER flags + were explicitly cleared. 
*/ + error ("-mpower9-minmax incompatible with explicitly disabled options"); + else + rs6000_isa_flags |= ISA_3_0_MASKS_SERVER; +} else if (TARGET_P8_VECTOR || TARGET_DIRECT_MOVE || TARGET_CRYPTO) rs6000_isa_flags |= (ISA_2_7_MASKS_SERVER & ~rs6000_isa_flags_explicit); else if (TARGET_VSX) Index: gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp === --- gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp (revision 0) +++ gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp (revision 246624) @@ -0,0 +1,65 @@ +# Copyright (C) 2004-2017 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope
[PATCH,rs6000] PR80108: Fix ICE with cross compiler
PR 80108 describes an ICE that occurs on an existing test program when compiled with a particular combination of target options. This patch fixes the compiler to reject that particular combination of target options since it is not meaningful and duplicates the offending test case with a dg-options directive to exercise the problematic command-line options. This patch has been bootstrapped and tested with no regressions on both powerpc64-unknown-linux-gnu and powerpc64le-unknown-linux-gnu. Is this ok for the trunk? gcc/ChangeLog: 2017-03-31 Kelvin Nilsen <kel...@gcc.gnu.org> PR target/80108 * config/rs6000/rs6000.c (rs6000_option_override_internal): Enhance special handling given to the TARGET_P9_MINMAX option in relation to certain other options. gcc/testsuite/ChangeLog: 2017-03-31 Kelvin Nilsen <kel...@gcc.gnu.org> PR target/80108 * gcc.target/powerpc/ppc-fortran/ppc-fortran.exp: New file. * gcc.target/powerpc/ppc-fortran/pr80108-1.f90: New test. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 246573) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -4273,8 +4273,30 @@ rs6000_option_override_internal (bool global_init_ /* For the newer switches (vsx, dfp, etc.) set some of the older options, unless the user explicitly used the -mno- to disable the code. */ if (TARGET_P9_VECTOR || TARGET_MODULO || TARGET_P9_DFORM_SCALAR - || TARGET_P9_DFORM_VECTOR || TARGET_P9_DFORM_BOTH > 0 || TARGET_P9_MINMAX) + || TARGET_P9_DFORM_VECTOR || TARGET_P9_DFORM_BOTH > 0) rs6000_isa_flags |= (ISA_3_0_MASKS_SERVER & ~rs6000_isa_flags_explicit); + else if (TARGET_P9_MINMAX) +{ + if (have_cpu) + { + if (cpu_index == PROCESSOR_POWER9) + /* legacy behavior: allow -mcpu-power9 with certain capabilities + (eg -mno-vsx) explicitly disabled. 
*/ + rs6000_isa_flags |= + (ISA_3_0_MASKS_SERVER & ~rs6000_isa_flags_explicit); + else + error ("Power9 target option is incompatible with -mcpu= for " + " less than power9"); + } + else if ((ISA_3_0_MASKS_SERVER & rs6000_isa_flags_explicit) + != (ISA_3_0_MASKS_SERVER & rs6000_isa_flags + & rs6000_isa_flags_explicit)) + /* Enforce that none of the ISA_3_0_MASKS_SERVER flags + were explicitly cleared. */ + error ("-mpower9-minmax incompatible with explicitly disabled options"); + else + rs6000_isa_flags |= ISA_3_0_MASKS_SERVER; +} else if (TARGET_P8_VECTOR || TARGET_DIRECT_MOVE || TARGET_CRYPTO) rs6000_isa_flags |= (ISA_2_7_MASKS_SERVER & ~rs6000_isa_flags_explicit); else if (TARGET_VSX) Index: gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp === --- gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp (revision 0) +++ gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp (revision 246624) @@ -0,0 +1,65 @@ +# Copyright (C) 2004-2017 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# <http://www.gnu.org/licenses/>. + +# GCC testsuite that uses the `dg.exp' driver. + +# Load support procs. +load_lib gfortran-dg.exp + +# If a testcase doesn't have special options, use these. +global DEFAULT_FFLAGS +if ![info exists DEFAULT_FFLAGS] then { +set DEFAULT_FFLAGS " -pedantic-errors" +} + +# Initialize `dg'. 
+dg-init + +global gfortran_test_path +global gfortran_aux_module_flags +set gfortran_test_path $srcdir/$subdir +set gfortran_aux_module_flags $DEFAULT_FFLAGS +proc dg-compile-aux-modules { args } { +global gfortran_test_path +global gfortran_aux_module_flags +if { [llength $args] != 2 } { + error "dg-set-target-env-var: needs one argument" + return +} + +set level [info level] +if { [info procs dg-save-unknown] != [list] } { + rename dg-save-unknown dg-save-unknown-level-$level +} + +dg-test $gfortran_test_path/[lindex $args 1] "" $gfortran_aux_module_flags +# cleanup-modules is intentionally not invoked here. + +if { [info p
[PATCH] PR80101: Fix ICE in store_data_bypass_p
This problem report describes an assertion error that occurs when certain rtl expressions which are not eligible as producers or consumers of a store bypass optimization are passed as arguments to the store_data_bypass_p function.

The proposed patch returns false from store_data_bypass_p rather than terminating with an assertion error. False indicates that the passed arguments are not eligible for the store bypass scheduling optimization.

The patch has been bootstrapped without regressions on powerpc64le-unknown-linux-gnu. Is this ok for the trunk?

gcc/ChangeLog:

2017-03-29  Kelvin Nilsen  <kel...@gcc.gnu.org>

    PR target/80101
    * recog.c (store_data_bypass_p): Rather than terminate with assertion error, return false if either function argument is not a single_set or a PARALLEL with SETs inside.

gcc/testsuite/ChangeLog:

2017-03-29  Kelvin Nilsen  <kel...@gcc.gnu.org>

    PR target/80101
    * gcc.target/powerpc/pr80101-1.c: New test.

Index: gcc/recog.c === --- gcc/recog.c (revision 246469) +++ gcc/recog.c (working copy) @@ -3663,9 +3663,12 @@ peephole2_optimize (void) /* Common predicates for use with define_bypass. */ -/* True if the dependency between OUT_INSN and IN_INSN is on the store - data not the address operand(s) of the store. IN_INSN and OUT_INSN - must be either a single_set or a PARALLEL with SETs inside. */ +/* Returns true if the dependency between OUT_INSN and IN_INSN is on + the stored data, false if there is no dependency. Note that a + consumer instruction that loads only the address (rather than the + value) stored by a producer instruction does not represent a + dependency. If IN_INSN or OUT_INSN are not a single_set or a + PARALLEL with SETs inside, this function returns false.
*/ int store_data_bypass_p (rtx_insn *out_insn, rtx_insn *in_insn) @@ -3701,7 +3704,8 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn if (GET_CODE (out_exp) == CLOBBER) continue; -gcc_assert (GET_CODE (out_exp) == SET); + if (GET_CODE (out_exp) != SET) + return false; if (reg_mentioned_p (SET_DEST (out_exp), SET_DEST (in_set))) return false; @@ -3711,7 +3715,8 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn else { in_pat = PATTERN (in_insn); - gcc_assert (GET_CODE (in_pat) == PARALLEL); + if (GET_CODE (in_pat) != PARALLEL) + return false; for (i = 0; i < XVECLEN (in_pat, 0); i++) { @@ -3720,7 +3725,8 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn if (GET_CODE (in_exp) == CLOBBER) continue; - gcc_assert (GET_CODE (in_exp) == SET); + if (GET_CODE (in_exp) != SET) + return false; if (!MEM_P (SET_DEST (in_exp))) return false; @@ -3734,7 +3740,8 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn else { out_pat = PATTERN (out_insn); - gcc_assert (GET_CODE (out_pat) == PARALLEL); + if (GET_CODE (out_pat) != PARALLEL) + return false; for (j = 0; j < XVECLEN (out_pat, 0); j++) { @@ -3743,7 +3750,8 @@ store_data_bypass_p (rtx_insn *out_insn, rtx_insn if (GET_CODE (out_exp) == CLOBBER) continue; - gcc_assert (GET_CODE (out_exp) == SET); + if (GET_CODE (out_exp) != SET) + return false; if (reg_mentioned_p (SET_DEST (out_exp), SET_DEST (in_exp))) return false; Index: gcc/testsuite/gcc.target/powerpc/pr80101-1.c === --- gcc/testsuite/gcc.target/powerpc/pr80101-1.c(revision 0) +++ gcc/testsuite/gcc.target/powerpc/pr80101-1.c(working copy) @@ -0,0 +1,22 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power6" } } */ +/* { dg-require-effective-target dfp_hw } */ +/* { dg-options "-mcpu=power6 -mno-sched-epilog -Ofast" } */ + +/* Prior to resolving PR 80101, this test case resulted in an internal + compiler error. 
The role of this test program is to assure that + dejagnu's "test for excess errors" does not find any. */ + +int b; + +void e (); + +int c () +{ + struct + { +int a[b]; + } d; + if (d.a[0]) +e (); +}
[PATCH,rs6000] PR80103: Fix ICE with cross compiler
PR 80103 provides a test case which results in an internal compiler error when invoked with the -mno-direct-move -mpower9-dform-vector target options. The internal compiler error results because these two target options are incompatible with each other. The enclosed patch simply disables this particular combination of target options, terminating gcc with an error message instead of producing an internal compiler error.

Additionally, this patch includes new comments to address omissions from a patch committed on 2017/03/23 which deals with conflicts between the -mno-power9-vector and -mcpu=power9 target options.

This patch has been bootstrapped and tested with no regressions on both powerpc64-unknown-linux-gnu and powerpc64le-unknown-linux-gnu. Is this ok for the trunk?

gcc/testsuite/ChangeLog:

2017-03-24  Kelvin Nilsen  <kel...@gcc.gnu.org>

    PR target/80103
    * gcc.target/powerpc/pr80103-1.c: New test.

gcc/ChangeLog:

2017-03-24  Kelvin Nilsen  <kel...@gcc.gnu.org>

    PR target/80103
    * config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Edit and add comments.
    * config/rs6000/rs6000.c (rs6000_option_override_internal): Add special handling for target option conflicts between dform options (-mpower9-dform, -mpower9-dform-vector, -mpower9-dform-scalar) and -mno-direct-move.

Index: gcc/config/rs6000/rs6000-c.c === --- gcc/config/rs6000/rs6000-c.c(revision 246406) +++ gcc/config/rs6000/rs6000-c.c(working copy) @@ -429,6 +429,12 @@ rs6000_target_modify_macros (bool define_p, HOST_W if ((flags & OPTION_MASK_POPCNTD) != 0) rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR7"); /* Note that the OPTION_MASK_DIRECT_MOVE flag is automatically + turned on in the following condition: + 1. TARGET_P9_DFORM_SCALAR or TARGET_P9_DFORM_VECTOR are enabled +and OPTION_MASK_DIRECT_MOVE is not explicitly disabled. +Hereafter, the OPTION_MASK_DIRECT_MOVE flag is considered to +have been turned on explicitly.
+ Note that the OPTION_MASK_DIRECT_MOVE flag is automatically turned off in any of the following conditions: 1. TARGET_HARD_FLOAT, TARGET_ALTIVEC, or TARGET_VSX is explicitly disabled and OPTION_MASK_DIRECT_MOVE was not explicitly @@ -473,8 +479,13 @@ rs6000_target_modify_macros (bool define_p, HOST_W if (!flag_iso) rs6000_define_or_undefine_macro (define_p, "__APPLE_ALTIVEC__"); } - /* Note that the OPTION_MASK_VSX flag is automatically turned off in + /* Note that the OPTION_MASK_VSX flag is automatically turned on in the following conditions: + 1. TARGET_P8_VECTOR is explicitly turned on and the OPTION_MASK_VSX +was not explicitly turned off. Hereafter, the OPTION_MASK_VSX +flag is considered to have been explicitly turned on. + Note that the OPTION_MASK_VSX flag is automatically turned off in + the following conditions: 1. The operating system does not support saving of AltiVec registers (OS_MISSING_ALTIVEC). 2. If any of the options TARGET_HARD_FLOAT, TARGET_FPRS, @@ -507,6 +518,12 @@ rs6000_target_modify_macros (bool define_p, HOST_W rs6000_define_or_undefine_macro (define_p, "__TM_FENCE__"); } /* Note that the OPTION_MASK_P8_VECTOR flag is automatically turned + on in the following conditions: + 1. TARGET_P9_VECTOR is explicitly turned on and +OPTION_MASK_P8_VECTOR is not explicitly turned off. +Hereafter, the OPTION_MASK_P8_VECTOR flag is considered to +have been turned off explicitly. + Note that the OPTION_MASK_P8_VECTOR flag is automatically turned off in the following conditions: 1. If any of TARGET_HARD_FLOAT, TARGET_ALTIVEC, or TARGET_VSX were turned off explicitly and OPTION_MASK_P8_VECTOR flag was @@ -514,15 +531,24 @@ rs6000_target_modify_macros (bool define_p, HOST_W 2. If TARGET_ALTIVEC is turned off. Hereafter, the OPTION_MASK_P8_VECTOR flag is considered to have been turned off explicitly. - 3. If TARGET_VSX is turned off. Hereafter, the OPTION_MASK_P8_VECTOR - flag is considered to have been turned off explicitly. */ + 3. 
If TARGET_VSX is turned off and OPTION_MASK_P8_VECTOR was not +explicitly enabled. If TARGET_VSX is explicitly enabled, the +OPTION_MASK_P8_VECTOR flag is hereafter also considered to + have been turned off explicitly. */ if ((flags & OPTION_MASK_P8_VECTOR) != 0) rs6000_define_or_undefine_macro (define_p, "__POWER8_VECTOR__"); /* Note that the OPTION_MASK_P9_VECTOR flag is automatically turned off in the following conditions: - 1. If TARGET_P8_VECTOR is turned off. Hereafter, the - OPTION_MASK_P9_VECTOR flag is considered to have been turned off -
Re: [PATCH,rs6000] Handle conflicting target options -mno-power9-vector and -mcpu=power9
On 03/22/2017 05:35 PM, Segher Boessenkool wrote:
> On Wed, Mar 22, 2017 at 11:44:49AM -0600, Kelvin Nilsen wrote:
>> Internal testing recently revealed that use of the -mno-power9-vector
>> target option in combination with the -mcpu=power9 target option
>> results in termination of gcc with the error message:
>>
>> power9-dform requires power9-vector
>
>> In both cases, the preferred behavior is that the target option
>> -mno-power9-vector causes power9-dform to be automatically disabled.
>> This patch implements the preferred behavior and adds a test case to
>> demonstrate the fix.
>
> Or it could do -mpower9-dform-scalar but disable -mpower9-dform-vector?
> That seems more reasonable.

The internal problem report sent to me said "-mno-power9-vector should override power9-dform unless the latter has been deliberately specified by the user." I'm just following orders. If you think it preferable to only override -mpower9-dform-vector, I'll make that modification.

>
> Ideally none of the -mpower9-dform* or -mpower9-vector options would
> exist at all, of course.
>
>> 2017-03-21 Kelvin Nilsen <kel...@gcc.gnu.org>
>>
>> * config/rs6000/rs6000.c (rs6000_option_override_internal): Change
>> handling of certain combinations of target options, including the
>> combinations -mpower8-vector vs. -mno-vsx, -mpower8-vector vs.
>> -mno-power8-vector, and -mpower9_dform vs. -mno-power9-vector.
>
> Those other changes are independent?

Actually, these other changes are not independent. My initial attempt at a patch only changed the behavior of -mpower9_dform vs. -mno-power9-vector. But this actually resulted in a regression of an existing test. To "properly" handle the new case without impacting existing "established" behavior (as represented in the existing dejagnu testsuite), I had to make these other changes as well.

>
>
> Segher
>
>

--
Kelvin Nilsen, Ph.D. kdnil...@linux.vnet.ibm.com
home office: 801-756-4821, cell: 520-991-6727
IBM Linux Technology Center - PPC Toolchain
[PATCH,rs6000] Handle conflicting target options -mno-power9-vector and -mcpu=power9
Internal testing recently revealed that use of the -mno-power9-vector target option in combination with the -mcpu=power9 target option results in termination of gcc with the error message: power9-dform requires power9-vector This same problem is seen if the -mno-power9-vector target option is specified to a gcc which was built using --with-cpu=power9 as an argument to configure. In both cases, the preferred behavior is that the target option -mno-power9-vector causes power9-dform to be automatically disabled. This patch implements the preferred behavior and adds a test case to demonstrate the fix. The patch has been bootstrapped and tested with no regressions on both powerpc64-unknown-linux-gnu and powerpc64le-unknown-linux-gnu. Is this ok for the trunk? gcc/testsuite/ChangeLog: 2017-03-21 Kelvin Nilsen <kel...@gcc.gnu.org> * gcc.target/powerpc/p9-options-1.c: New test. gcc/ChangeLog: 2017-03-21 Kelvin Nilsen <kel...@gcc.gnu.org> * config/rs6000/rs6000.c (rs6000_option_override_internal): Change handling of certain combinations of target options, including the combinations -mpower8-vector vs. -mno-vsx, -mpower8-vector vs. -mno-power8-vector, and -mpower9_dform vs. -mno-power9-vector. 
Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 246212) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -4246,9 +4246,22 @@ rs6000_option_override_internal (bool global_init_ if (TARGET_P8_VECTOR && !TARGET_VSX) { - if (rs6000_isa_flags_explicit & OPTION_MASK_P8_VECTOR) + if ((rs6000_isa_flags_explicit & OPTION_MASK_P8_VECTOR) + && (rs6000_isa_flags_explicit & OPTION_MASK_VSX)) error ("-mpower8-vector requires -mvsx"); - rs6000_isa_flags &= ~OPTION_MASK_P8_VECTOR; + else if ((rs6000_isa_flags_explicit & OPTION_MASK_P8_VECTOR) == 0) + { + rs6000_isa_flags &= ~OPTION_MASK_P8_VECTOR; + if (rs6000_isa_flags_explicit & OPTION_MASK_VSX) + rs6000_isa_flags_explicit |= OPTION_MASK_P8_VECTOR; + } + else + { + /* OPTION_MASK_P8_VECTOR is explicit, and OPTION_MASK_VSX is +not explicit. */ + rs6000_isa_flags |= OPTION_MASK_VSX; + rs6000_isa_flags_explicit |= OPTION_MASK_VSX; + } } if (TARGET_VSX_TIMODE && !TARGET_VSX) @@ -4448,9 +4461,22 @@ rs6000_option_override_internal (bool global_init_ error messages. However, if users have managed to select power9-vector without selecting power8-vector, they already know about undocumented flags. */ - if (rs6000_isa_flags_explicit & OPTION_MASK_P8_VECTOR) + if ((rs6000_isa_flags_explicit & OPTION_MASK_P9_VECTOR) && + (rs6000_isa_flags_explicit & OPTION_MASK_P8_VECTOR)) error ("-mpower9-vector requires -mpower8-vector"); - rs6000_isa_flags &= ~OPTION_MASK_P9_VECTOR; + else if ((rs6000_isa_flags_explicit & OPTION_MASK_P9_VECTOR) == 0) + { + rs6000_isa_flags &= ~OPTION_MASK_P9_VECTOR; + if (rs6000_isa_flags_explicit & OPTION_MASK_P8_VECTOR) + rs6000_isa_flags_explicit |= OPTION_MASK_P9_VECTOR; + } + else + { + /* OPTION_MASK_P9_VECTOR is explicit and +OPTION_MASK_P8_VECTOR is not explicit. 
*/ + rs6000_isa_flags |= OPTION_MASK_P8_VECTOR; + rs6000_isa_flags_explicit |= OPTION_MASK_P8_VECTOR; + } } /* -mpower9-dform turns on both -mpower9-dform-scalar and @@ -4479,10 +4505,25 @@ rs6000_option_override_internal (bool global_init_ error messages. However, if users have managed to select power9-dform without selecting power9-vector, they already know about undocumented flags. */ - if (rs6000_isa_flags_explicit & OPTION_MASK_P9_VECTOR) + if ((rs6000_isa_flags_explicit & OPTION_MASK_P9_VECTOR) + && (rs6000_isa_flags_explicit & (OPTION_MASK_P9_DFORM_SCALAR + | OPTION_MASK_P9_DFORM_VECTOR))) error ("-mpower9-dform requires -mpower9-vector"); - rs6000_isa_flags &= ~(OPTION_MASK_P9_DFORM_SCALAR - | OPTION_MASK_P9_DFORM_VECTOR); + else if (rs6000_isa_flags_explicit & OPTION_MASK_P9_VECTOR) + { + rs6000_isa_flags &= + ~(OPTION_MASK_P9_DFORM_SCALAR | OPTION_MASK_P9_DFORM_VECTOR); + rs6000_isa_flags_explicit |= + (OPTION_MASK_P9_DFORM_SCALAR | OPTION_MASK_P9_DFORM_VECTOR); + } + else + { + /* We know that OPTION_MASK_P9_VECTOR is not explicit and +OPTION_MASK_P9_DFORM_SCALAR or OPTION_MASK_P9_DORM_VECTOR +may be explicit. */ + rs6000_isa_flags |= OPTION_MASK_P9_VEC
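The first two hunks of this patch (-mpower8-vector vs. -mvsx, and -mpower9-vector vs. -mpower8-vector) share one three-way decision for the case where a dependent option is on while its prerequisite is off. The sketch below models that decision only; the enum and function names are hypothetical, and the real code additionally updates rs6000_isa_flags and rs6000_isa_flags_explicit.

```c
#include <stdbool.h>

/* Hypothetical outcomes, for illustration only.  */
enum toy_outcome
{
  TOY_ERROR,           /* both choices were explicit: report a conflict  */
  TOY_DROP_DEPENDENT,  /* dependent option was only implied: turn it off  */
  TOY_ENABLE_PREREQ    /* dependent option was explicit: turn prereq on  */
};

/* Called when the dependent option is enabled but its prerequisite is
   disabled.  Each argument says whether the user set that option
   explicitly on the command line.  */
enum toy_outcome
toy_resolve (bool dependent_explicit, bool prereq_explicit)
{
  if (dependent_explicit && prereq_explicit)
    return TOY_ERROR;
  if (!dependent_explicit)
    return TOY_DROP_DEPENDENT;
  /* Dependent option is explicit; prerequisite is not.  */
  return TOY_ENABLE_PREREQ;
}
```

The design choice is that an explicit request always wins over an implied setting, and an error is raised only when two explicit requests contradict each other.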
[PATCH,rs6000] Add documentation to describe implicit handling of command-line target options
This patch adds comments that clarify how target attribute flags are automatically set and cleared in order to assure consistency between configuration settings and between multiple interrelated compilation target options. Particular attention is given to the target options that affect the C preprocessor macros that are automatically defined to indicate that support for particular target options is enabled.

This patch consists entirely of new comments. Nevertheless, it has been bootstrapped on powerpc64le-unknown-linux with no regressions. Is this ok for trunk?

gcc/ChangeLog:

2017-03-17  Kelvin Nilsen  <kel...@gcc.gnu.org>

    * config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Add comments.
    * config/rs6000/rs6000.c (rs6000_option_override_internal): Add comments.

Index: gcc/config/rs6000/rs6000-c.c === --- gcc/config/rs6000/rs6000-c.c(revision 246086) +++ gcc/config/rs6000/rs6000-c.c(working copy) @@ -343,6 +343,71 @@ rs6000_target_modify_macros (bool define_p, HOST_W (define_p) ? "define" : "undef", flags, bu_mask); + /* Each of the flags mentioned below controls whether certain + preprocessor macros will be automatically defined when + preprocessing source files for compilation by this compiler. + While most of these flags can be enabled or disabled + explicitly by specifying certain command-line options when + invoking the compiler, there are also many ways in which these + flags are enabled or disabled implicitly, based on compiler + defaults, configuration choices, and on the presence of certain + related command-line options. Many, but not all, of these + implicit behaviors can be found in file "rs6000.c", the + rs6000_option_override_internal() function. + + In general, each of the flags may be automatically enabled in + any of the following conditions: + + 1.
If no -mcpu target is specified on the command line and no + --with-cpu target is specified to the configure command line + and the TARGET_DEFAULT macro for this default cpu host + includes the flag, and the flag has not been explicitly disabled + by command-line options. + + 2. If the target specified with -mcpu=target on the command line, or + in the absence of a -mcpu=target command-line option, if the + target specified using --with-cpu=target on the configure + command line, is disqualified because the associated binary + tools (e.g. the assembler) lack support for the requested cpu, + and the TARGET_DEFAULT macro for this default cpu host + includes the flag, and the flag has not been explicitly disabled + by command-line options. + + 3. If either of the above two conditions apply except that the + TARGET_DEFAULT macro is defined to equal zero, and + TARGET_POWERPC64 and + a) BYTES_BIG_ENDIAN and the flag to be enabled is either + MASK_PPC_GVXOPT or MASK_POWERPC64 (flags for "powerpc64" + target), or + b) !BYTES_BIG_ENDIAN and the flag to be enabled is either + MASK_POWERPC64 or it is one of the flags included in + ISA_2_7_MASKS_SERVER (flags for "powerpc64le" target). + + 4. If a cpu has been requested with a -mcpu=target command-line option + and this cpu has not been disqualified due to shortcomings of the + binary tools, and the set of flags associated with the requested cpu + include the flag to be enabled. See rs6000-cpus.def for macro + definitions that represent various ABI standards + (e.g. ISA_2_1_MASKS, ISA_3_0_MASKS_SERVER) and for a list of + the specific flags that are associated with each of the cpu + choices that can be specified as the target of a -mcpu=target + compile option, or as the the target of a --with-cpu=target + configure option. Target flags that are specified in either + of these two ways are considered "implicit" since the flags + are not mentioned specifically by name. 
+ + Additional documentation describing behavior specific to + particular flags is provided below, immediately preceding the + use of each relevant flag. + + 5. If there is no -mcpu=target command-line option, and the cpu + requested by a --with-cpu=target command-line option has not + been disqualified due to shortcomings of the binary tools, and + the set of flags associated with the specified target include + the flag to be enabled. See the notes immediately above for a + summary of the flags associated with particular cpu + definitions. */ + /* rs6000_isa_flags based options. */ rs6000_define_or_undefine_macro (define_p, "_ARCH_PPC"); if ((flags & OPTION_MASK_PPC_GPOP
[PATCH,RS6000] PR79963: Correct which condition code bit represents result of vec_any_eq built-in function
This patch corrects several errors in a patch that was submitted on 2017-03-01. A copy-and-paste error in the previous patch resulted in accidental use of the lt flag instead of the eq flag to represent the outcome of the vec_any_eq built-in function. Also, in reviewing the code of the previous patch, it was discovered that changes to the C++ templates representing the vec_all_ne and vec_any_eq built-in functions were incomplete. This patch has been bootstrapped and tested on powerpc64le-unknown-linux with no regressions. Is this ok for trunk? gcc/ChangeLog: 2017-03-14 Kelvin Nilsen <kel...@gcc.gnu.org> PR target/79963 * config/rs6000/altivec.h (vec_all_ne): Under __cplusplus and __POWER9_VECTOR__ #ifdef control, change template definition to use Power9-specific built-in function. (vec_any_eq): Likewise. * config/rs6000/vector.md (vector_ae_v2di_p): Change the flag used to control outcomes from this test. (vector_ae_p): For VEC_F modes, likewise. Index: gcc/config/rs6000/altivec.h === --- gcc/config/rs6000/altivec.h (revision 246096) +++ gcc/config/rs6000/altivec.h (working copy) @@ -521,9 +521,9 @@ __altivec_scalar_pred(vec_all_nez, __altivec_scalar_pred(vec_any_eqz, __builtin_vec_vcmpnez_p (__CR6_LT_REV, a1, a2)) __altivec_scalar_pred(vec_all_ne, - __builtin_vec_allne_p (a1, a2)) + __builtin_vec_vcmpne_p (a1, a2)) __altivec_scalar_pred(vec_any_eq, - __builtin_vec_anyeq_p (a1, a2)) + __builtin_vec_vcmpae_p (a1, a2)) #endif __altivec_scalar_pred(vec_any_ne, Index: gcc/config/rs6000/vector.md === --- gcc/config/rs6000/vector.md (revision 246096) +++ gcc/config/rs6000/vector.md (working copy) @@ -790,7 +790,7 @@ (eq:V2DI (match_dup 1) (match_dup 2)))]) (set (match_operand:SI 0 "register_operand" "=r") - (lt:SI (reg:CC CR6_REGNO) + (eq:SI (reg:CC CR6_REGNO) (const_int 0))) (set (match_dup 0) (xor:SI (match_dup 0) @@ -837,7 +837,7 @@ (eq:VEC_F (match_dup 1) (match_dup 2)))]) (set (match_operand:SI 0 "register_operand" "=r") - (lt:SI (reg:CC CR6_REGNO) + (eq:SI 
(reg:CC CR6_REGNO) (const_int 0))) (set (match_dup 0) (xor:SI (match_dup 0)
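As background for why the choice of condition bit matters, here is a minimal portable sketch of the two CR6 summary bits that a record-form vector equality compare produces. It assumes the Power ISA convention that the first bit of the CR6 field is set when every element compares true and the third bit is set when every element compares false. The function names and the 4-bit encoding are hypothetical modeling choices, not GCC internals, and the sketch does not claim to mirror how the RTL patterns above consume the bits.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the 4-bit CR6 field, most significant bit first.
   These constants are assumptions of this sketch.  */
enum { MODEL_CR6_ALL_TRUE = 0x8, MODEL_CR6_ALL_FALSE = 0x2 };

/* Model an element-wise equality compare of two 4 x 32-bit vectors and
   return the CR6 summary bits it would leave behind.  */
unsigned model_cr6_after_cmpequ (const uint32_t a[4], const uint32_t b[4])
{
  int all_true = 1, all_false = 1;
  for (int i = 0; i < 4; i++)
    {
      if (a[i] == b[i])
        all_false = 0;   /* at least one element compared equal */
      else
        all_true = 0;    /* at least one element compared unequal */
    }
  return (all_true ? MODEL_CR6_ALL_TRUE : 0)
         | (all_false ? MODEL_CR6_ALL_FALSE : 0);
}

/* "Any element equal" holds exactly when the all-false bit is clear.  */
int model_vec_any_eq (const uint32_t a[4], const uint32_t b[4])
{
  return (model_cr6_after_cmpequ (a, b) & MODEL_CR6_ALL_FALSE) == 0;
}

/* "All elements unequal" holds exactly when the all-false bit is set.  */
int model_vec_all_ne (const uint32_t a[4], const uint32_t b[4])
{
  return (model_cr6_after_cmpequ (a, b) & MODEL_CR6_ALL_FALSE) != 0;
}
```

Testing the wrong summary bit in this model would turn "any equal" into "all equal", which is the kind of silent misbehavior a copy-and-paste slip between the two flags can cause.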
[PATCH,rs6000] PR79395: Fix compile error with -mcpu=power9 and -mno-vsx and __builtin_vec_cmpne_p
PR 79395 reports a problem that arises when the preprocessor believes that the target supports Power9 but the gcc compiler believes that Power9 is not supported. This patch addresses this inconsistency by introducing a new preprocessor macro named __POWER9_VECTOR__ which is automatically defined if the current gcc configuration, as adjusted by gcc command line options, supports Power9. Previously, certain macro definitions that were supplied in altivec.h were conditioned upon the _ARCH_PWR9 macro, which represents statically whether the compiler can support Power9, but ignores any command-line options that might disable the Power9 support in this run of the compiler. Also addressed in this patch is elimination of the xvcmpnesp and xvcmpnedp instructions, which are not currently supported. This patch has been demonstrated to fix the problems identified in the test case mentioned in the PR 79395 report. This patch has been bootstrapped and tested on powerpc64le-unknown-linux with no regressions. Is this ok for trunk? gcc/ChangeLog: 2017-02-28 Kelvin Nilsen <kel...@gcc.gnu.org> PR target/79395 * config/rs6000/altivec.h (vec_ctz and others): Change the preprocessor macro that controls conditional compilation from _ARCH_PWR9 to __POWER9_VECTOR__. (vec_all_ne): Change parameterization of __altivec_scalar_pred macro expansion under preprocessor #ifdef __POWER9_VECTOR__ control (instead of _ARCH_PWR9 control) so that template definition uses power9-specific function. (vec_any_eq): Likewise. (vec_all_ne): Change macro definition to use a power9-specific expansion under #ifdef __POWER9_VECTOR__ control (instead of _ARCH_PWR9 control). (vec_any_eq): Likewise. * config/rs6000/rs6000-builtin.def (CMPNEF): Remove BU_P9V_AV_2 expansion for CMPNEF to remove support for xvcmpnesp instruction. (CMPNED): Remove BU_P9V_AV_2 expansion for CMPNED to remove support for xvcmpnedp instruction. 
(VCMPNEB_P): Replace BU_P9V_AV_P macro expansion with BU_P9V_AV_2 macro expansion so that Power9 implementation of vec_all_ne does not use the AltiVec predicate framework. (VCMPNEH_P): Likewise. (VCMPNEW_P): Likewise. (VCMPNED_P): Likewise. (VCMPNEFP_P): Likewise. (VCMPNEDP_P): Likewise. (VCMPAEB_P): Add BU_P9V_AV_2 macro expansion to change implementation of vec_any_eq to not use AltiVec predicate framework. (VCMPAEH_P): Likewise. (VCMPAEW_P): Likewise. (VCMPAED_P): Likewise. (VCMPAEFP_P): Likewise. (VCMPAEDP_P): Likewise. (VCMPNE_P): Replace BU_P9V_OVERLOAD_P macro expansion with BU_P9V_OVERLOAD_2 so that Power9 implementation of vec_all_ne does not use the AltiVec predicate framework. (VCMPAE_P): Add BU_P9V_OVERLOAD_2 macro to change implementation of vec_any_eq to not use AltiVec predicate framework. * config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Add support for predefined __POWER9_VECTOR__ macro to indicate that Power9 instruction selection is enabled. (altivec_overloaded_builtins): Remove extraneous ALTIVEC_BUILTIN_VEC_CMPNE entry for overloaded function argument types RS6000_BTI_bool_V16QI and RS6000_BTI_bool_V16QI. Remove erroneous ALTIVEC_BUILTIN_VEC_CMPNE entry for overloaded function argument types RS6000_BTI_bool_V4SI and RS6000_BTI_bool_V4SI, mapping to P9V_BUILTIN_CMPNEB. Remove two entries mapping to P9V_BUILTIN_CMPNED and one entry mapping to P9V_BUILTIN_CMPNEF to force use of instructions not specific to Power9 for implementations of vec_cmpne. Change the signature for all definitions of the overloaded P9V_BUILTIN_VEC_CMPNE_P function (representing vec_all_ne) to remove the previously described first argument of type RS6000_BTI_INTSI, as this was an artifact of reliance on the AltiVec predicate framework, which is no longer used in the implementation of these functions. 
Add P9V_BUILTIN_VEC_VCMPAE_P entries (representing the vec_any_eq function) to match all of the P9V_BUILTIN_VEC_VCMPNE_P entries since, unlike the AltiVec predicate framework implementation, we do not share function descriptors between vec_all_ne and vec_any_eq. (altivec_resolve_overloaded_builtin): Add SFmode and DFmode to the set of modes that receive special treatment even when TARGET_P9_VECTOR is true. The special treatment emits code that does not depend on Power9 instructions. * config/rs6000/vector.md (vector_ne_<mode>_p): Change this define_expand to not rely on AltiVec predicate framework. (vector_ae_p): New define_expand to represent vec_any_eq function. (vector_ne_v2di_p): Change this define_
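To illustrate the distinction the message draws between _ARCH_PWR9 and the new __POWER9_VECTOR__ macro, here is a minimal sketch of user code that keys a vector path off __POWER9_VECTOR__ and otherwise falls back to scalar code. The helper name all_lanes_differ is hypothetical, and the AltiVec branch is an untested assumption about how such a guard might be used; only the scalar branch is exercised on non-Power hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__POWER9_VECTOR__)
#include <altivec.h>
#endif

/* Hypothetical helper: returns 1 when no lane of a equals the
   corresponding lane of b, and 0 otherwise.  */
int all_lanes_differ (const uint32_t a[4], const uint32_t b[4])
{
#if defined(__POWER9_VECTOR__)
  /* Taken only when Power9 vector support is enabled for this
     compilation, including the effect of command-line options.  */
  __vector unsigned int va, vb;
  memcpy (&va, a, sizeof va);
  memcpy (&vb, b, sizeof vb);
  return vec_all_ne (va, vb);
#else
  /* Portable fallback with the same result.  */
  for (int i = 0; i < 4; i++)
    if (a[i] == b[i])
      return 0;
  return 1;
#endif
}
```

Guarding on a macro that tracks the effective options means that a build with -mcpu=power9 -mno-vsx would take the fallback branch rather than reference built-ins the compiler has disabled.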
[PATCH,rs6000] PR78056: Remove unreliable test case
This patch amends a patch merged with the trunk on 2017-01-14. One of the new test cases added at that time has proven to be unreliable so this patch removes it. Is this patch ok for trunk? gcc/testsuite/ChangeLog: 2017-02-17 Kelvin Nilsen <kel...@gcc.gnu.org> PR target/78056 * gcc.target/powerpc/pr78056-8.c: Remove. Index: gcc/testsuite/gcc.target/powerpc/pr78056-8.c === --- gcc/testsuite/gcc.target/powerpc/pr78056-8.c(revision 245539) +++ gcc/testsuite/gcc.target/powerpc/pr78056-8.c(working copy) @@ -1,26 +0,0 @@ -/* { dg-do compile { target { powerpc*-*-* } } } */ -/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power5" } } */ - -/* powerpc_popcntb_ok represents support for power 5. */ -/* { dg-require-effective-target powerpc_popcntb_ok } */ -/* dfp_hw represents support for power 6. */ -/* { dg-skip-if "" { dfp_hw } } */ -/* { dg-skip-if "" { powerpc*-*-aix* } } */ -/* { dg-options "-mcpu=power5" } */ - -/* This test follows the pattern of pr78056-2.c, which has been - * exercised with binutils 2.25. This test, however, has not - * been exercised because the author of the test does not have access - * to a development environment that succesfully bootstraps gcc - * while at the same lacking assembler support for power 6. */ - -/* This test should succeed on both 32- and 64-bit configurations. */ -/* Though the command line specifies power5 target, this function is - to support power6. Expect an error message here because this target - does not support power6. */ -__attribute__((target("cpu=power6"))) -/* fabs/fnabs/fsel */ -double normal1 (double a, double b) -{ /* { dg-warning "lacks power6 support" } */ - return __builtin_copysign (a, b); /* { dg-warning "implicit declaration" } */ -}
[PATCH v2] PR68972: g++.dg/cpp1y/vla-initlist1.C test case fails on power
This second version of the proposed patch removes redundant and unnecessary default arguments to the dg-skip-if directive, as requested by Rainer Orth. Thank you Rainer for your review and feedback. The test g++.dg/cpp1y/vla-initlist1.C makes assumptions that the memory used to represent the private temporary variables of neighboring control blocks at the same control nesting level is: 1. found at the same address, and 2. not overwritten between when the first block ends and the second block begins. While these assumptions are valid with some optimization choices on some architectures, these assumptions do not hold universally. With optimization disabled on the power architecture, the g++.dg/cpp1y/vla-initlist1.C test program runs initialization code to allocate the variable-length array a[] before entry into the second of two neighboring control blocks. This initialization code overwrites the first two cells of the array i[] that were initialized by the first of the two neighboring control blocks. Thus, the initialization value stored into i[1] is no longer present when this value is subsequently fetched as a[1].i from within the second control block. This patch disables this particular test case on power hardware. The patch has been bootstrapped and tested on powerpc64le-unknown-linux with no regressions. Is this ok for trunk? gcc/testsuite/ChangeLog: 2017-02-07 Kelvin Nilsen <kel...@gcc.gnu.org> PR target/68972 * g++.dg/cpp1y/vla-initlist1.C: Add dg-skip-if directive to disable this test on power architecture. Index: gcc/testsuite/g++.dg/cpp1y/vla-initlist1.C === --- gcc/testsuite/g++.dg/cpp1y/vla-initlist1.C (revision 245156) +++ gcc/testsuite/g++.dg/cpp1y/vla-initlist1.C (working copy) @@ -1,4 +1,5 @@ // { dg-do run { target c++11 } } +// { dg-skip-if "power overwrites two slots of array i" { "power*-*-*" } } // { dg-options "-Wno-vla" } #include
[PATCH] PR68972: g++.dg/cpp1y/vla-initlist1.C test case fails on power
The test g++.dg/cpp1y/vla-initlist1.C makes assumptions that the memory used to represent the private temporary variables of neighboring control blocks at the same control nesting level is: 1. found at the same address, and 2. not overwritten between when the first block ends and the second block begins. While these assumptions are valid with some optimization choices on some architectures, these assumptions do not hold universally. With optimization disabled on the power architecture, the g++.dg/cpp1y/vla-initlist1.C test program runs initialization code to allocate the variable-length array a[] before entry into the second of two neighboring control blocks. This initialization code overwrites the first two cells of the array i[] that were initialized by the first of the two neighboring control blocks. Thus, the initialization value stored into i[1] is no longer present when this value is subsequently fetched as a[1].i from within the second control block. This patch disables this particular test case on power hardware. The patch has been bootstrapped and tested on powerpc64le-unknown-linux with no regressions. Is this ok for trunk? gcc/testsuite/ChangeLog: 2017-02-06 Kelvin Nilsen <kel...@gcc.gnu.org> PR target/68972 * g++.dg/cpp1y/vla-initlist1.C: Add dg-skip-if directive to disable this test on power architecture. Index: gcc/testsuite/g++.dg/cpp1y/vla-initlist1.C === --- gcc/testsuite/g++.dg/cpp1y/vla-initlist1.C (revision 245156) +++ gcc/testsuite/g++.dg/cpp1y/vla-initlist1.C (working copy) @@ -1,4 +1,5 @@ // { dg-do run { target c++11 } } +// { dg-skip-if "power overwrites two slots of array i" { "power*-*-*" } { "*" } { "" } } // { dg-options "-Wno-vla" } #include
[PATCH] PR66669: Fix failure of gcc.dg/loop-8.c on Power
The test gcc.dg/loop-8.c makes assumptions that are not valid on the Power architecture (and on certain other architectures for which this issue has already been addressed). The test case assumes that a single loop-invariant statement will be moved outside the loop. On Power, a constant is copy-propagated within the loop, and the subsequent loop-invariant code motion moves two loop-invariant statements out of the loop. This patch simply disables this test case on the Power architecture. gcc/testsuite/ChangeLog: 2017-01-23 Kelvin Nilsen <kel...@gcc.gnu.org> PR target/66669 * gcc.dg/loop-8.c: Modify dg-skip-if directive to exclude this test on powerpc targets. Index: gcc/testsuite/gcc.dg/loop-8.c === --- gcc/testsuite/gcc.dg/loop-8.c (revision 244730) +++ gcc/testsuite/gcc.dg/loop-8.c (working copy) @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-O1 -fdump-rtl-loop2_invariant" } */ -/* { dg-skip-if "unexpected IV" { "hppa*-*-* mips*-*-* visium-*-*" } { "*" } { "" } } */ +/* { dg-skip-if "unexpected IV" { "hppa*-*-* mips*-*-* visium-*-* powerpc*-*-*" } { "*" } { "" } } */ void f (int *a, int *b)
[PATCH,rs6000] Correct argument and result types for binary floating point built-in functions
This patch corrects several errors in a patch originally committed on 2016-08-10. The following corrections are required to maintain compliance with "Power Architecture 64-Bit ELF V2 ABI Specification", also known as "OpenPOWER ABI for Linux Supplement". vector unsigned long long vec_extract_exp (vector double); (instead of vector long long vec_extract_exp (vector double)) vector unsigned int vec_extract_exp (vector float); (instead of vector int vec_extract_exp (vector float)) vector unsigned long long vec_extract_sig (vector double); (instead of vector long long vec_extract_sig (vector double)) vector unsigned int vec_extract_sig (vector float); (instead of vector int vec_extract_sig (vector float)) vector double vec_insert_exp (vector double, vector unsigned long long); vector float vec_insert_exp (vector float, vector unsigned int); (the above two are new forms, to complement the existing forms which take matching integer arguments) vector bool int vec_test_data_class (vector float, const int); (instead of vector int vec_test_class (vector float, unsigned int)) vector bool long long vec_test_data_class (vector double, const int); (instead of vector long long vec_test_data_class (vector double, unsigned int)) Though the following functions are not defined in the ABI specification, they were also corrected to provide improved consistency with the corresponding vector functions: double scalar_insert_exp (double, unsigned long long); (The above was added to complement the existing form: double scalar_insert_exp (unsigned long long int, unsigned long long int)) bool scalar_test_data_class (double, const int); (instead of int scalar_test_data_class (double, unsigned int)) bool scalar_test_data_class (float, const int); (instead of int scalar_test_data_class (float, unsigned int)) bool scalar_test_neg (double); (instead of int scalar_test_neg (double)) bool scalar_test_neg (float); (instead of int scalar_test_neg (float)) This patch has bootstrapped and tested on 
powerpcle-unknown-linux (little-endian) and on powerpc-unknown-linux (big-endian, with both -m32 and -m64 target option) with no regressions. Is this ok for the trunk? gcc/testsuite/ChangeLog: 2017-01-19 Kelvin Nilsen <kel...@gcc.gnu.org> * gcc.target/powerpc/bfp/scalar-insert-exp-3.c: New test. * gcc.target/powerpc/bfp/scalar-insert-exp-4.c: New test. * gcc.target/powerpc/bfp/scalar-insert-exp-5.c: New test. * gcc.target/powerpc/bfp/scalar-test-data-class-0.c: Adjust return type of test function to reflect change in built-in function's return type. * gcc.target/powerpc/bfp/scalar-test-data-class-1.c: Likewise. * gcc.target/powerpc/bfp/scalar-test-data-class-2.c: Likewise. * gcc.target/powerpc/bfp/scalar-test-data-class-3.c: Likewise. * gcc.target/powerpc/bfp/scalar-test-data-class-4.c: Adjust return type and second argument type to reflect change in built-in function's type signature. * gcc.target/powerpc/bfp/scalar-test-data-class-5.c: Likewise. * gcc.target/powerpc/bfp/scalar-test-data-class-6.c: Adjust return type of test function to reflect change in built-in function's return type. * gcc.target/powerpc/bfp/scalar-test-data-class-7.c: Likewise. * gcc.target/powerpc/bfp/scalar-test-neg-0.c: Likewise. * gcc.target/powerpc/bfp/scalar-test-neg-1.c: Likewise. * gcc.target/powerpc/bfp/scalar-test-neg-2.c: Likewise. * gcc.target/powerpc/bfp/scalar-test-neg-3.c: Likewise. * gcc.target/powerpc/bfp/vec-extract-exp-0.c: Likewise. * gcc.target/powerpc/bfp/vec-extract-exp-1.c: Likewise. * gcc.target/powerpc/bfp/vec-extract-exp-2.c: Likewise. * gcc.target/powerpc/bfp/vec-extract-exp-3.c: Likewise. * gcc.target/powerpc/bfp/vec-extract-sig-0.c: Likewise. * gcc.target/powerpc/bfp/vec-extract-sig-1.c: Likewise. * gcc.target/powerpc/bfp/vec-extract-sig-2.c: Likewise. * gcc.target/powerpc/bfp/vec-extract-sig-3.c: Likewise. * gcc.target/powerpc/bfp/vec-insert-exp-4.c: New test. * gcc.target/powerpc/bfp/vec-insert-exp-5.c: New test. 
* gcc.target/powerpc/bfp/vec-insert-exp-6.c: New test. * gcc.target/powerpc/bfp/vec-insert-exp-7.c: New test. * gcc.target/powerpc/bfp/vec-test-data-class-0.c: Adjust return type of test function to reflect change in built-in function's return type. * gcc.target/powerpc/bfp/vec-test-data-class-1.c: Likewise. * gcc.target/powerpc/bfp/vec-test-data-class-2.c: Likewise. * gcc.target/powerpc/bfp/vec-test-data-class-3.c: Likewise. * gcc.target/powerpc/bfp/vec-test-data-class-4.c: Likewise. * gcc.target/powerpc/bfp/vec-test-data-class-5.c: Likewise. * gcc.target/powerpc/bfp/vec-test-da
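The motivation for the unsigned and bool result types listed in this message can be seen in a small portable model of what the extract and test functions compute for a single double. This is a sketch under stated assumptions: it assumes double uses the IEEE 754 binary64 layout, it assumes the significand result includes the implicit leading bit only for normal values, and the model_ function names are hypothetical stand-ins rather than the real built-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Copy the bit pattern of a double into an integer without aliasing
   problems.  */
static uint64_t model_bits_of (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  return bits;
}

/* Biased exponent field: an 11-bit quantity, so never negative.  */
uint64_t model_extract_exp (double x)
{
  return (model_bits_of (x) >> 52) & 0x7ff;
}

/* Significand field; the implicit leading one is added for normal
   values only (an assumption of this sketch).  */
uint64_t model_extract_sig (double x)
{
  uint64_t bits = model_bits_of (x);
  uint64_t exp = (bits >> 52) & 0x7ff;
  uint64_t frac = bits & ((UINT64_C (1) << 52) - 1);
  if (exp != 0 && exp != 0x7ff)
    frac |= UINT64_C (1) << 52;
  return frac;
}

/* Sign test: a yes/no answer, hence a bool result.  */
bool model_test_neg (double x)
{
  return (model_bits_of (x) >> 63) != 0;
}
```

Because both extracted fields are raw bit fields, an unsigned element type describes them more naturally than a signed one, and a predicate such as the sign test reads more clearly with a bool result than with int.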
[PATCH v2,rs6000] PR78056: Finish fixing build failure on Power7
This patch adds warning messages and test cases to an initial patch already submitted and committed to the trunk on October 26, 2016. The earlier patch disables initialization of built-in functions which depend on assembler capabilities that are not supported by the associated tool chain. The original patch was submitted before the work was considered complete because it was desired to expedite a fix to allow builds on Power7. At the time the original patch was submitted for approval, the following additional tasks were planned. 1. Fail with an assertion error instead of an internal compiler error if built-in functions are ever defined for which the corresponding instruction pattern is not supported by the current compiler configuration. 2. Issue a warning message whenever a command-line -mcpu=XXX request seeks to configure support for a CPU version which is not supported by the accompanying assembler. Besides addressing the above tasks, this new patch also adds a number of tests to exercise different target configurations. This second version of the patch differs from the first revision (which had been sent for review on Dec. 9) in the following ways: 1. Removed #define directives from rs6000.c which were defining HAVE_AS_POWER9, HAVE_AS_POWER8, HAVE_AS_POPCNTD, HAVE_AS_DFP, and HAVE_AS_POPCNTB macros. Rewrote the code that made use of these macros to be conditioned on #ifdef. 2. Removed redundant parentheses in an expression that defines the value of the default_cpu variable. 3. Replaced multiple occurrences of (d->icode > 0) with (d->icode != CODE_FOR_nothing). 4. Replaced two comments which claimed that it is expected that d->icode equals CODE_FOR_nothing to say instead that d->icode may equal CODE_FOR_nothing. 5. Added a new effective-target named powerpc_popcntb_ok and required this effective target in the pr78056-8.c test case. The patch has been tested with three different tool chains supporting up to power7, power8, and power9 respectively. 
It has successfully bootstrapped and tested without regressions on powerpc64le-unknown-linux and powerpc-unknown-linux (big-endian, with both -m32 and -m64 target options) with no regressions. Is this patch ok for trunk? gcc/testsuite/ChangeLog: 2016-12-16 Kelvin Nilsen <kel...@gcc.gnu.org> PR target/78056 * gcc.target/powerpc/pr78056-1.c: New test. * gcc.target/powerpc/pr78056-2.c: New test. * gcc.target/powerpc/pr78056-3.c: New test. * gcc.target/powerpc/pr78056-4.c: New test. * gcc.target/powerpc/pr78056-5.c: New test. * gcc.target/powerpc/pr78056-6.c: New test. * gcc.target/powerpc/pr78056-7.c: New test. * gcc.target/powerpc/pr78056-8.c: New test. * lib/target-supports.exp (check_effective_target_powerpc_popcntb_ok): New procedure to test whether the effective target supports the popcntb instruction. gcc/ChangeLog: 2016-12-16 Kelvin Nilsen <kel...@gcc.gnu.org> PR target/78056 * doc/sourcebuild.texi (PowerPC-specific attributes): Add documentation of the powerpc_popcntb_ok attribute. * config/rs6000/rs6000.c (rs6000_option_override_internal): Add code to issue warning messages if a requested CPU configuration is not supported by the binary (assembler and loader) toolchain. (spe_init_builtins): Add two assertions to prevent ICE if attempt is made to define a built-in function that has been disabled. (paired_init_builtins): Add assertion to prevent ICE if attempt is made to define a built-in function that has been disabled. (altivec_init_builtins): Add comment explaining why definition of the DST built-in functions is not preceded by an assertion check. Add assertions to prevent ICE if attempts are made to define an altivec predicate or an abs* built-in function that has been disabled. (htm_init_builtins): Add comment explaining why definition of the htm built-in functions is not preceded by an assertion check. 
Index: gcc/doc/sourcebuild.texi === --- gcc/doc/sourcebuild.texi(revision 241606) +++ gcc/doc/sourcebuild.texi(working copy) @@ -1763,6 +1763,10 @@ PowerPC target supports @code{-mhtm} @item powerpc_p8vector_ok PowerPC target supports @code{-mpower8-vector} +@item powerpc_popcntb_ok +PowerPC target supports the @code{popcntb} instruction, indicating +that this target supports @code{-mcpu=power5}. + @item powerpc_ppu_ok PowerPC target supports @code{-mcpu=cell}. Index: gcc/testsuite/gcc.target/powerpc/pr78056-1.c === --- gcc/testsuite/gcc.target/powerpc/pr78056-1.c(revision 0) +++ gcc/testsuite/gcc.target/powerpc/pr78056-1.c(revision 241861) @@ -0,0 +1,18 @@ +/* { dg-do compile { target { powerpc*-*-* }
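Item 1 in the list of changes for this second version describes conditioning the assembler-support checks on #ifdef rather than on macros defaulted to zero. A minimal standalone sketch of that style follows. The enum, the function name, and the #define lines at the top are hypothetical and exist only so the example compiles on its own; in a real build the HAVE_AS_* macros would come from the generated configuration header. The pairing of power5, power6, and power7 with HAVE_AS_POPCNTB, HAVE_AS_DFP, and HAVE_AS_POPCNTD follows the descriptions elsewhere in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption for this sketch only: pretend configure found an
   assembler that supports everything up to power8.  */
#define HAVE_AS_POPCNTB 1
#define HAVE_AS_DFP 1
#define HAVE_AS_POPCNTD 1
#define HAVE_AS_POWER8 1
/* HAVE_AS_POWER9 is deliberately left undefined.  */

enum model_cpu
{
  MODEL_POWER5,
  MODEL_POWER6,
  MODEL_POWER7,
  MODEL_POWER8,
  MODEL_POWER9
};

/* Return true when the (modeled) assembler can handle the requested
   cpu.  Each case is decided at preprocessing time by whether the
   corresponding macro is defined at all.  */
bool model_assembler_supports (enum model_cpu cpu)
{
  switch (cpu)
    {
    case MODEL_POWER9:
#ifdef HAVE_AS_POWER9
      return true;
#else
      return false;
#endif
    case MODEL_POWER8:
#ifdef HAVE_AS_POWER8
      return true;
#else
      return false;
#endif
    case MODEL_POWER7:
#ifdef HAVE_AS_POPCNTD
      return true;
#else
      return false;
#endif
    case MODEL_POWER6:
#ifdef HAVE_AS_DFP
      return true;
#else
      return false;
#endif
    case MODEL_POWER5:
#ifdef HAVE_AS_POPCNTB
      return true;
#else
      return false;
#endif
    }
  return false;
}
```

A caller modeled on the option-override logic would clear its have_cpu flag and issue a warning whenever this function returns false for the requested cpu.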
Re: [PATCH v3,rs6000] Add built-in function support for Power9 byte instructions
Thanks for your quick feedback. I'll update the comments regarding possible future enhancement to support QImode for operands[1] as well. Regarding the two test cases that are missing the scan-assembler directive (byte-in-set-1.c and byte-in-set-2.c), those tests are both expected to fail. They are checking that the compiler rejects those programs with appropriate error messages. On 12/13/2016 03:14 PM, Segher Boessenkool wrote: > Hi Kelvin, > > On Mon, Dec 12, 2016 at 05:40:05PM -0700, Kelvin Nilsen wrote: >> The patch has been bootstrapped and tested on >> powerpc64le-unknown-linux and powerpc-unknown-linux (big-endian, with >> both -m32 and -m64 target options) with no regressions. >> >> Is this ok for the trunk? > > Yes it is, much better, thanks! Two comments below, please fix the testcase > one before commit if it is indeed a problem: > >> +;; Though the instructions to which this expansion maps operate on >> +;; 64-bit registers, the current implementation only operates on >> +;; SI-mode operands as the high-order bits provide no information >> +;; that is not already available in the low-order bits. To avoid the >> +;; costs of data widening operations, a future enhancement might add >> +;; support for DI-mode operands. > > And operands[1] could be QImode. > >> +(define_expand "cmprb" >> + [(set (match_dup 3) >> +(unspec:CC [(match_operand:SI 1 "gpc_reg_operand" "r") >> +(match_operand:SI 2 "gpc_reg_operand" "r")] >> + UNSPEC_CMPRB)) > > >> --- gcc/testsuite/gcc.target/powerpc/byte-in-set-1.c (revision 0) >> +++ gcc/testsuite/gcc.target/powerpc/byte-in-set-1.c (working copy) > > Did you forget the scan-assembler here and in the next one, or do you only > want to test it does indeed compile? > > > Segher
[PATCH v3,rs6000] Add built-in function support for Power9 byte instructions
This patch adds built-in function support for the new setb, cmprb, and cmpeqb Power9 instructions. This third version of the patch differs from the second in the following ways: 1. Changed the name of the *cmprb, *setb, *cmprb2, and *cmpeqb new instructions to *cmprb_internal, *setb_internal, *cmprb2_internal, and *cmpeqb_internal respectively. 2. Added comments to the cmprb, setb, and cmprb2 instructions to acknowledge that, as implemented, we do not currently support the use of double-integer operands though support for this might be added in the future. 3. Changed the names of the new non-overloaded builtin functions to be of the form __builtin_scalar_ instead of __builtin_altivec_. Changed the names of the new overloaded functions to be of the form __builtin_ instead of __builtin_scalar_. 4. Corrected the comments describing range encodings and simplified the descriptions by speaking of individual bytes instead of bit numbers (cmprb, cmprb2, cmpeqb define_expand patterns and *cmprb_internal, *cmprb2_internal, *cmpeqb_internal define_insn patterns). 5. Updated documentation to use the new function names and to speak of range encodings in terms of individual bytes instead of bit numbers. 6. Changed the test cases to use the new function names. 7. Corrected bit shifting of arguments in the byte-in-range-0.c and byte-in-range-1.c test cases. The patch has been bootstrapped and tested on powerpc64le-unknown-linux and powerpc-unknown-linux (big-endian, with both -m32 and -m64 target options) with no regressions. Is this ok for the trunk? gcc/testsuite/ChangeLog: 2016-12-12 Kelvin Nilsen <kel...@gcc.gnu.org> * gcc.target/powerpc/byte-in-either-range-0.c: New test. * gcc.target/powerpc/byte-in-either-range-1.c: New test. * gcc.target/powerpc/byte-in-range-0.c: New test. * gcc.target/powerpc/byte-in-range-1.c: New test. * gcc.target/powerpc/byte-in-set-0.c: New test. * gcc.target/powerpc/byte-in-set-1.c: New test. * gcc.target/powerpc/byte-in-set-2.c: New test. 
gcc/ChangeLog: 2016-12-12 Kelvin Nilsen <kel...@gcc.gnu.org> * config/rs6000/altivec.md (UNSPEC_CMPRB): New unspec value. (UNSPEC_CMPRB2): New unspec value. (UNSPEC_CMPEQB): New unspec value. (cmprb): New expansion. (*cmprb_internal): New insn. (*setb_internal): New insn. (cmprb2): New expansion. (*cmprb2_internal): New insn. (cmpeqb): New expansion. (*cmpeqb_internal): New insn. * config/rs6000/rs6000-builtin.def (BU_P9_2): New macro. (BU_P9_64BIT_2): Likewise. (BU_P9_OVERLOAD_2): Likewise. (CMPRB): Add byte-in-range built-in function. (CMBRB2): Add byte-in-either-range built-in function. (CMPEQB): Add byte-in-set built-in function. (CMPRB): Add overload support for byte-in-range function. (CMPRB2): Add overload support for byte-in-either-range function. (CMPEQB): Add overload support for byte-in-set built-in function. * config/rs6000/rs6000-c.c (P9_BUILTIN_CMPRB): Macro expansion to define argument types for new builtin. (P9_BUILTIN_CMPRB2): Likewise. (P9_BUILTIN_CMPEQB): Likewise. * doc/extend.texi (PowerPC AltiVec Built-in Functions): Rearrange the order of presentation for certain built-in functions (scalar_extract_exp, scalar_extract_sig, scalar_insert_exp) (scalar_cmp_exp_gt, scalar_cmp_exp_lt, scalar_cmp_exp_eq) (scalar_cmp_exp_unordered, scalar_test_data_class) (scalar_test_neg) to improve locality and flow. Document the new __builtin_scalar_byte_in_set, __builtin_scalar_byte_in_range, and __builtin_scalar_byte_in_either_range functions. Index: gcc/config/rs6000/altivec.md === --- gcc/config/rs6000/altivec.md(revision 241245) +++ gcc/config/rs6000/altivec.md(working copy) @@ -153,6 +153,9 @@ UNSPEC_BCDADD UNSPEC_BCDSUB UNSPEC_BCD_OVERFLOW + UNSPEC_CMPRB + UNSPEC_CMPRB2 + UNSPEC_CMPEQB ]) (define_c_enum "unspecv" @@ -3709,6 +3712,189 @@ "darn %0,1" [(set_attr "type" "integer")]) +;; Test byte within range. 
+;; +;; The bytes of operand 1 are organized as xx:xx:xx:vv, where xx +;; represents a byte whose value is ignored in this context and +;; vv, the least significant byte, holds the byte value that is to +;; be tested for membership within the range specified by operand 2. +;; The bytes of operand 2 are organized as xx:xx:hi:lo. +;; +;; Return in target register operand 0 a value of 1 if lo <= vv and +;; vv <= hi. Otherwise, set register operand 0 to 0. +;; +;; Though the instructions to which this expansion maps operate on +;; 64-bit registers, the current implementation only operates on +;; SI-mode operands as the high-order bits provide no informati
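The byte layout described in the comment above can be modeled in portable C. The sketch below follows the stated organization for the single-range test (operand 1 as xx:xx:xx:vv, operand 2 as xx:xx:hi:lo). The two-range and set variants rest on assumptions that are not spelled out in the visible part of the patch: that the two-range operand is laid out as hi_1:lo_1:hi_2:lo_2, and that the set operand holds eight candidate bytes in a 64-bit value. All model_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Single range: returns 1 if lo <= vv && vv <= hi, else 0.  */
int model_byte_in_range (uint32_t op1, uint32_t op2)
{
  uint32_t vv = op1 & 0xff;          /* xx:xx:xx:vv */
  uint32_t lo = op2 & 0xff;          /* xx:xx:hi:lo */
  uint32_t hi = (op2 >> 8) & 0xff;
  return lo <= vv && vv <= hi;
}

/* Two ranges, assumed layout hi_1:lo_1:hi_2:lo_2: returns 1 if vv
   falls within either range.  */
int model_byte_in_either_range (uint32_t op1, uint32_t op2)
{
  uint32_t vv = op1 & 0xff;
  uint32_t lo_2 = op2 & 0xff;
  uint32_t hi_2 = (op2 >> 8) & 0xff;
  uint32_t lo_1 = (op2 >> 16) & 0xff;
  uint32_t hi_1 = (op2 >> 24) & 0xff;
  return (lo_1 <= vv && vv <= hi_1) || (lo_2 <= vv && vv <= hi_2);
}

/* Set membership, assuming eight candidate bytes packed into a 64-bit
   operand: returns 1 if vv equals any of them.  */
int model_byte_in_set (uint32_t op1, uint64_t set)
{
  uint32_t vv = op1 & 0xff;
  for (int i = 0; i < 8; i++)
    if (((set >> (8 * i)) & 0xff) == vv)
      return 1;
  return 0;
}
```

For example, testing whether a character is a lowercase letter amounts to model_byte_in_range (c, ('z' << 8) | 'a'). Note that in the set model any unused byte positions hold zero and therefore match a zero byte.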
[PATCH] PR78056: Finish fixing build failure on Power7
This patch adds warning messages and test cases to an initial patch already submitted and committed to the trunk on October 26, 2016. The earlier patch disables initialization of built-in functions which depend on assembler capabilities that are not supported by the associated tool chain. The original patch was submitted before the work was considered complete because it was desired to expedite a fix to allow builds on Power7. At the time the original patch was submitted for approval, the following additional tasks were planned. 1. Fail with an assertion error instead of an internal compiler error if built-in functions are ever defined for which the corresponding instruction pattern is not supported by the current compiler configuration. 2. Issue a warning message whenever a command-line -mcpu=XXX request seeks to configure support for a CPU version which is not supported by the accompanying assembler. Besides addressing the above tasks, this new patch also adds a number of tests to exercise different target configurations. The patch has been tested with three different tool chains supporting up to power7, power8, and power9 respectively. It has successfully bootstrapped and tested without regressions on powerpc64le-unknown-linux and powerpc-unknown-linux (big-endian, with both -m32 and -m64 target options) with no regressions. Is this patch ok for trunk? gcc/ChangeLog: 2016-12-08 Kelvin Nilsen <kel...@gcc.gnu.org> PR target/78056 * config/rs6000/rs6000.c: Provide default macro definitions for HAVE_AS_POPCNB, HAVE_AS_DFP, HAVE_AS_POPCNTD, HAVE_AS_POWER8, HAVE_AS_POWER9. (rs6000_option_override_internal): Add code to issue warning messages if a requested CPU configuration is not supported by the binary (assembler and loader) toolchain. (spe_init_builtins): Add two assertions to prevent ICE if attempt is made to define a built-in function that has been disabled. 
(paired_init_builtins): Add assertion to prevent ICE if attempt is made to define a built-in function that has been disabled. (altivec_init_builtins): Add comment explaining why definition of the DST built-in functions is not preceded by an assertion check. Add assertions to prevent ICE if attempts are made to define an altivec predicate or an abs* built-in function that has been disabled. (htm_init_builtins): Add comment explaining why definition of the htm built-in functions is not preceded by an assertion check. gcc/testsuite/ChangeLog: 2016-12-08 Kelvin Nilsen <kel...@gcc.gnu.org> PR target/78056 * gcc.target/powerpc/pr78056-1.c: New test. * gcc.target/powerpc/pr78056-2.c: New test. * gcc.target/powerpc/pr78056-3.c: New test. * gcc.target/powerpc/pr78056-4.c: New test. * gcc.target/powerpc/pr78056-5.c: New test. * gcc.target/powerpc/pr78056-6.c: New test. * gcc.target/powerpc/pr78056-7.c: New test. * gcc.target/powerpc/pr78056-8.c: New test. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 241606) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -84,6 +84,22 @@ #define min(A,B) ((A) < (B) ? (A) : (B)) #define max(A,B) ((A) > (B) ? 
(A) : (B)) +#ifndef HAVE_AS_POWER9 +#define HAVE_AS_POWER9 0 +#endif +#ifndef HAVE_AS_POWER8 +#define HAVE_AS_POWER8 0 +#endif +#ifndef HAVE_AS_POPCNTD +#define HAVE_AS_POPCNTD 0 +#endif +#ifndef HAVE_AS_DFP +#define HAVE_AS_DFP 0 +#endif +#ifndef HAVE_AS_POPCNTB +#define HAVE_AS_POPCNTB 0 +#endif + /* Structure used to define the rs6000 stack */ typedef struct rs6000_stack { int reload_completed;/* stack info won't change from here on */ @@ -3860,6 +3876,62 @@ rs6000_option_override_internal (bool global_init_ gcc_assert (cpu_index >= 0); + if (have_cpu) +{ + if (!HAVE_AS_POWER9 + && (processor_target_table[rs6000_cpu_index].processor + == PROCESSOR_POWER9)) + { + have_cpu = false; + warning (0, "will not generate power9 instructions because " + "assembler lacks power9 support"); + } + if (!HAVE_AS_POWER8 + && (processor_target_table[rs6000_cpu_index].processor + == PROCESSOR_POWER8)) + { + have_cpu = false; + warning (0, "will not generate power8 instructions because " + "assembler lacks power8 support"); + } + if (!HAVE_AS_POPCNTD + && (processor_target_table[rs6000_cpu_index].processor + == PROCESSOR_POWER7)) + { + have_cpu = false; + warning (0, "will not generate power7 instructions because " + "assembler lacks power7 support&q
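As an illustration of the default-to-zero macro idiom used in the hunk above, here is a minimal standalone sketch in C. It is not the rs6000.c code: the enum demo_processor and the function demo_check_cpu are hypothetical names introduced only for this example, and the warning is printed with fprintf rather than GCC's warning () routine. The point it shows is that defaulting each HAVE_AS_* macro to 0 allows ordinary `if` tests where `#ifdef` blocks would otherwise be needed.

```c
#include <stdbool.h>
#include <stdio.h>

/* A configure-generated header would normally define these to 1 when the
   assembler supports the feature.  Defaulting them to 0 keeps the tests
   below valid C even when the header leaves them undefined.  */
#ifndef HAVE_AS_POWER9
#define HAVE_AS_POWER9 0
#endif
#ifndef HAVE_AS_POWER8
#define HAVE_AS_POWER8 0
#endif

/* Hypothetical stand-in for the compiler's processor enumeration.  */
enum demo_processor { DEMO_POWER7, DEMO_POWER8, DEMO_POWER9 };

/* Return true if the requested CPU can be honored.  Otherwise print a
   warning and return false, mirroring the "have_cpu = false" step in
   the patch.  */
static bool
demo_check_cpu (enum demo_processor requested)
{
  if (!HAVE_AS_POWER9 && requested == DEMO_POWER9)
    {
      fprintf (stderr, "warning: will not generate power9 instructions "
               "because assembler lacks power9 support\n");
      return false;
    }
  if (!HAVE_AS_POWER8 && requested == DEMO_POWER8)
    {
      fprintf (stderr, "warning: will not generate power8 instructions "
               "because assembler lacks power8 support\n");
      return false;
    }
  return true;
}
```

When neither macro is predefined, demo_check_cpu (DEMO_POWER9) warns and returns false, while demo_check_cpu (DEMO_POWER7) returns true. Building with -DHAVE_AS_POWER9=1 removes the first warning without touching the function body.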
[PATCH v2,rs6000] Add built-in function support for Power9 byte instructions.
This patch adds built-in function support for the new setb, cmprb, and cmpeqb Power9 instructions. This second version of the patch differs from the first in the following ways: 1. Removed the new UNSPEC_SETB unspec value. Rewrote these patterns to describe semantics in terms of primitive RTL. 2. Changed the names of the cmprb_p, cmprb2_p, and cmpeqb_p define_insn patterns to cmprb, cmprb2, and cmpeqb respectively. 3. Fixed two typos in the ChangeLog file. 4. Fixed comments that describe the cmprb and cmprb2 define_expand patterns. 5. Fixed comments that describe the *cmprb, *setb, and *cmprb2 define_insn patterns. 6. Removed trailing space in description of the cmpeqb define_expand pattern. The patch has been bootstrapped and tested on powerpc64le-unknown-linux and powerpc-unknown-linux (big-endian, with both -m32 and -m64 target options) with no regressions. Is this ok for the trunk? gcc/testsuite/ChangeLog: 2016-12-05 Kelvin Nilsen <kel...@gcc.gnu.org> * gcc.target/powerpc/byte-in-either-range-0.c: New test. * gcc.target/powerpc/byte-in-either-range-1.c: New test. * gcc.target/powerpc/byte-in-range-0.c: New test. * gcc.target/powerpc/byte-in-range-1.c: New test. * gcc.target/powerpc/byte-in-set-0.c: New test. * gcc.target/powerpc/byte-in-set-1.c: New test. * gcc.target/powerpc/byte-in-set-2.c: New test. gcc/ChangeLog: 2016-12-05 Kelvin Nilsen <kel...@gcc.gnu.org> * config/rs6000/altivec.md (UNSPEC_CMPRB): New unspec value. (UNSPEC_CMPRB2): New unspec value. (UNSPEC_CMPEQB): New unspec value. (cmprb): New expansion. (*cmprb): New insn. (*setb): New insn. (cmprb2): New expansion. (*cmprb2): New insn. (cmpeqb): New expansion. (*cmpeqb): New insn. * config/rs6000/rs6000-builtin.def (BU_P9V_64BIT_AV_2): New macro. (BU_P9_OVERLOAD_2): Likewise. (CMPRB): Add byte-in-range built-in function. (CMPRB2): Add byte-in-either-range built-in function. (CMPEQB): Add byte-in-set built-in function. (CMPRB): Add overload support for byte-in-range function. 
(CMPRB2): Add overload support for byte-in-either-range function. (CMPEQB): Add overload support for byte-in-set built-in function. * config/rs6000/rs6000-c.c (P9V_BUILTIN_SCALAR_CMPRB): Macro expansion to define argument types for new builtin. (P9V_BUILTIN_SCALAR_CMPRB2): Likewise. (P9V_BUILTIN_SCALAR_CMPEQB): Likewise. * doc/extend.texi (PowerPC AltiVec Built-in Functions): Rearrange the order of presentation for certain built-in functions (scalar_extract_exp, scalar_extract_sig, scalar_insert_exp) (scalar_cmp_exp_gt, scalar_cmp_exp_lt, scalar_cmp_exp_eq) (scalar_cmp_exp_unordered, scalar_test_data_class) (scalar_test_neg) to improve locality and flow. Document the new __builtin_scalar_byte_in_set, __builtin_scalar_byte_in_range, and __builtin_scalar_byte_in_either_range functions. Index: gcc/config/rs6000/altivec.md === --- gcc/config/rs6000/altivec.md(revision 241245) +++ gcc/config/rs6000/altivec.md(working copy) @@ -153,6 +153,9 @@ UNSPEC_BCDADD UNSPEC_BCDSUB UNSPEC_BCD_OVERFLOW + UNSPEC_CMPRB + UNSPEC_CMPRB2 + UNSPEC_CMPEQB ]) (define_c_enum "unspecv" @@ -3709,6 +3712,138 @@ "darn %0,1" [(set_attr "type" "integer")]) +;; Predicate: test byte within range. +;; Return in target register operand 0 a value of 1 if the byte +;; held in bits 24:31 of operand 1 is within the inclusive range +;; bounded above by operand 2's bits 0:7 and below by operand 2's +;; bits 8:15. Otherwise, set register operand 0 to 0. 
+(define_expand "cmprb" + [(set (match_dup 3) + (unspec:CC [(match_operand:SI 1 "gpc_reg_operand" "r") + (match_operand:SI 2 "gpc_reg_operand" "r")] +UNSPEC_CMPRB)) + (set (match_operand:SI 0 "gpc_reg_operand" "=r") + (if_then_else:SI (lt (match_dup 3) +(const_int 0)) +(const_int -1) +(if_then_else (gt (match_dup 3) + (const_int 0)) + (const_int 1) + (const_int 0))))] + "TARGET_P9_MISC" +{ + operands[3] = gen_reg_rtx (CCmode); +}) + +;; Set bit 1 (the GT bit, 0x4) of CR register operand 0 to 1 iff the +;; byte found in bits 24:31 of register operand 1 is within the +;; inclusive range bounded above by operand 2's bits 0:7 and below by +;; operand 2's bits 8:15. The other 3 bits of the target CR register +;; are set to 0. +(define_insn "*cmprb" + [(set (ma
[PATCH,rs6000] Correct mode of operand 2 in vector extract half-word and word instruction patterns
This patch corrects an error in a patch committed on 2016-10-18 to add built-in function support for Power9 string operations. In that original patch, the mode for operand 2 of the newly added vector extract half-word and full-word instruction patterns was described as V16QI, even though those instruction patterns were conceptually operating on V8HI and V4SI operands respectively. This patch changes the modes of the operands for these instruction patterns to better represent the intended types. This patch improves readability and maintainability of code. It does not affect correctness of generated code, since the existing implementation implicitly coerces the operand types to the declared type. The patch has been bootstrapped and tested on powerpc64le-unknown-linux without regressions. Is this ok for the trunk? gcc/ChangeLog: 2016-11-30 Kelvin Nilsen <kel...@gcc.gnu.org> PR target/78577 * config/rs6000/vsx.md (vextuhlx): Revise mode of operand 2. (vextuhrx): Likewise. (vextuwlx): Likewise. (vextuwrx): Likewise. 
Index: gcc/config/rs6000/vsx.md === --- gcc/config/rs6000/vsx.md(revision 242948) +++ gcc/config/rs6000/vsx.md(working copy) @@ -3648,7 +3648,7 @@ [(set (match_operand:SI 0 "register_operand" "=r") (unspec:SI [(match_operand:SI 1 "register_operand" "r") - (match_operand:V16QI 2 "altivec_register_operand" "v")] + (match_operand:V8HI 2 "altivec_register_operand" "v")] UNSPEC_VEXTUHLX))] "TARGET_P9_VECTOR" "vextuhlx %0,%1,%2" @@ -3659,7 +3659,7 @@ [(set (match_operand:SI 0 "register_operand" "=r") (unspec:SI [(match_operand:SI 1 "register_operand" "r") - (match_operand:V16QI 2 "altivec_register_operand" "v")] + (match_operand:V8HI 2 "altivec_register_operand" "v")] UNSPEC_VEXTUHRX))] "TARGET_P9_VECTOR" "vextuhrx %0,%1,%2" @@ -3670,7 +3670,7 @@ [(set (match_operand:SI 0 "register_operand" "=r") (unspec:SI [(match_operand:SI 1 "register_operand" "r") - (match_operand:V16QI 2 "altivec_register_operand" "v")] + (match_operand:V4SI 2 "altivec_register_operand" "v")] UNSPEC_VEXTUWLX))] "TARGET_P9_VECTOR" "vextuwlx %0,%1,%2" @@ -3681,7 +3681,7 @@ [(set (match_operand:SI 0 "register_operand" "=r") (unspec:SI [(match_operand:SI 1 "register_operand" "r") - (match_operand:V16QI 2 "altivec_register_operand" "v")] + (match_operand:V4SI 2 "altivec_register_operand" "v")] UNSPEC_VEXTUWRX))] "TARGET_P9_VECTOR" "vextuwrx %0,%1,%2" -- Kelvin Nilsen, Ph.D. kdnil...@linux.vnet.ibm.com home office: 801-756-4821, cell: 520-991-6727 IBM Linux Technology Center - PPC Toolchain
Re: [PATCH,rs6000] Add built-in function support for Power9 byte instructions
> >> Thanks for catching this. I think I got endian confusion inside my head >> while I was writing the above. I will rewrite these comments, below also. > > Note the ISA calls the bits in 32-bit registers 32..63, so that 63 is > the rightmost bit in all registers. > True, but the ISA only uses the lower half of the 64-bit register, so I have described my patterns using SI mode instead of DI mode, which is part of the reason I was numbering my bits differently than the ISA document. The reason I am using SI mode is so that I don't have to disqualify the use of these functions on a 32-bit big-endian configuration. Do you want me to switch to DI mode for all the operands? >>> I wonder if we really need all these predicate expanders, if it wouldn't >>> be easier if the builtin handling code did the setb itself? >>> >> >> The reason it seems most "natural" to me to use the expanders is because I >> need to introduce a temporary CR scratch register between expansion and >> insn matching. Also, it seems that the *setb pattern may be of more >> general use in the future implementation of other built-in functions. >> I'm inclined to keep this as is, but if you still feel otherwise, I'll >> figure out how to avoid the expansion. > > The code (in rs6000.c) expanding the builtin can create two insns directly, > so that you do not need to repeat this over and over in define_expands? > The pattern I'm familiar with is to allocate the temporary scratch register during expansion, and to use the allocated temporary at insn match time. I'll have to teach myself a new pattern to do all of this at insn match time. Feel free to point me to an example of define_insn code that does this. Thanks again. -- Kelvin Nilsen, Ph.D. kdnil...@linux.vnet.ibm.com home office: 801-756-4821, cell: 520-991-6727 IBM Linux Technology Center - PPC Toolchain
Re: [PATCH,rs6000] Add built-in function support for Power9 byte instructions
Thank you very much for the prompt and thorough review. There are a few points below where I'd like to seek further clarification. On 11/15/2016 04:19 AM, Segher Boessenkool wrote: > Hi! > > On Mon, Nov 14, 2016 at 04:43:35PM -0700, Kelvin Nilsen wrote: >> * config/rs6000/altivec.md (UNSPEC_CMPRB): New unspec value. >> (UNSPEC_CMPRB2): New unspec value. > > I wonder if you really need both? The number of arguments will tell > which is which, anyway? I appreciate your preference to avoid proliferation of special-case unspec constants. However, it is not so straightforward to combine these two cases under the same constant value. The issue is that though the two encodings conceptually represent different "numbers of arguments", the arguments are all packed inside of a 32-bit register. At the RTL level, it looks like the two different forms have the same number of arguments (the same number of register operands). The difference is which bits serve relevant purposes within the incoming register operands. So I'm inclined to keep this as is if that's ok with you. > >> (cmprb_p): New expansion. > > Not such a great name (now you get a gen_cmprb_p function which isn't > a predicate itself). I'll change these names. > >> (CMPRB): Add byte-in-range built-in function. >> (CMBRB2): Add byte-in-either_range built-in function. >> (CMPEQB): Add byte-in-set builtin-in function. > > "builtin-in", and you typoed an underscore? Thanks. > >> +;; Predicate: test byte within range. >> +;; Return in target register operand 0 a non-zero value iff the byte >> +;; held in bits 24:31 of operand 1 is within the inclusive range >> +;; bounded below by operand 2's bits 0:7 and above by operand 2's >> +;; bits 8:15. >> +(define_expand "cmprb_p" > > It seems you got the bit numbers mixed up. Maybe just call it the low > byte, and the byte just above? > > (And it always sets 0 or 1 here, you might want to make that more explicit). 
> >> +;; Set bit 1 (the GT bit, 0x2) of CR register operand 0 to 1 iff the > > That's 4, i.e. 0b0100. > >> +;; Set operand 0 register to non-zero value iff the CR register named >> +;; by operand 1 has its GT bit (0x2) or its LT bit (0x1) set. >> +(define_insn "*setb" > > LT is 8, GT is 4. If LT is set it returns -1, otherwise if GT is set it > returns 1, otherwise it returns 0. > Thanks for catching this. I think I got endian confusion inside my head while I was writing the above. I will rewrite these comments, below also. >> +;; Predicate: test byte within two ranges. >> +;; Return in target register operand 0 a non-zero value iff the byte >> +;; held in bits 24:31 of operand 1 is within the inclusive range >> +;; bounded below by operand 2's bits 0:7 and above by operand 2's >> +;; bits 8:15 or if the byte is within the inclusive range bounded >> +;; below by operand 2's bits 16:23 and above by operand 2's bits 24:31. >> +(define_expand "cmprb2_p" > > The high bound is higher in the reg than the low bound. See the example > where 0x3930 is used to do isdigit (and yes 0x3039 would be much more > fun, but alas). > >> +;; Predicate: test byte membership within set of 8 bytes. >> +;; Return in target register operand 0 a non-zero value iff the byte >> +;; held in bits 24:31 of operand 1 equals at least one of the eight >> +;; byte values represented by the 64-bit register supplied as operand >> +;; 2. Note that the 8 byte values held within operand 2 need not be >> +;; unique. > > (trailing space) > > I wonder if we really need all these predicate expanders, if it wouldn't > be easier if the builtin handling code did the setb itself? > The reason it seems most "natural" to me to use the expanders is because I need to introduce a temporary CR scratch register between expansion and insn matching. Also, it seems that the *setb pattern may be of more general use in the future implementation of other built-in functions. 
I'm inclined to keep this as is, but if you still feel otherwise, I'll figure out how to avoid the expansion.
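To make the semantics discussed in this exchange concrete, the following is a rough software model in C, not the machine description itself. It follows the review comments: LT is 8 and GT is 4 within a 4-bit CR field, setb yields -1 if LT is set, otherwise 1 if GT is set, otherwise 0, and for the single-range compare the low byte of the second operand is the lower bound while the byte just above it is the upper bound (so 0x3930 tests for an ASCII digit). All demo_* names are hypothetical, and the assumption that cmpeqb also reports its result through the GT bit is mine rather than something stated in the thread.

```c
#include <stdint.h>

/* CR field bit values as given in the review: LT is 8, GT is 4.  */
#define DEMO_CR_LT 0x8u
#define DEMO_CR_GT 0x4u

/* Model of the single-range compare.  The low byte of RANGES is the
   lower bound and the next byte up is the upper bound, both inclusive.
   Returns a CR field with only GT set on a match, else 0.  */
static unsigned
demo_cmprb (uint32_t value, uint32_t ranges)
{
  uint8_t b = value & 0xff;
  uint8_t lo = ranges & 0xff;
  uint8_t hi = (ranges >> 8) & 0xff;
  return (lo <= b && b <= hi) ? DEMO_CR_GT : 0;
}

/* Model of the set-membership compare: match if the low byte of VALUE
   equals any of the eight bytes in SET (duplicates are harmless).
   Assumption: the result is reported in the GT bit, as for cmprb.  */
static unsigned
demo_cmpeqb (uint32_t value, uint64_t set)
{
  uint8_t b = value & 0xff;
  for (int i = 0; i < 8; i++)
    if (((set >> (8 * i)) & 0xff) == b)
      return DEMO_CR_GT;
  return 0;
}

/* Model of setb: -1 if LT is set, else 1 if GT is set, else 0.  */
static int
demo_setb (unsigned cr_field)
{
  if (cr_field & DEMO_CR_LT)
    return -1;
  if (cr_field & DEMO_CR_GT)
    return 1;
  return 0;
}

/* The two-step sequence an expander would emit: compare, then setb.  */
static int
demo_byte_in_range (uint32_t value, uint32_t ranges)
{
  return demo_setb (demo_cmprb (value, ranges));
}
```

With this model, demo_byte_in_range ('5', 0x3930) yields 1 and demo_byte_in_range ('a', 0x3930) yields 0. Because the compare models never set LT, the -1 result of demo_setb is unreachable through them, which matches the remark that the predicate always produces 0 or 1.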
[PATCH,rs6000] Add built-in function support for Power9 byte instructions
This patch adds built-in function support for the new setb, cmprb, and cmpeqb Power9 instructions. The patch has been bootstrapped and tested on powerpc64le-unknown-linux and powerpc-unknown-linux (big-endian, with both -m32 and -m64 target options) with no regressions. Is this ok for the trunk? gcc/testsuite/ChangeLog: 2016-11-14 Kelvin Nilsen <kel...@gcc.gnu.org> * gcc.target/powerpc/byte-in-either-range-0.c: New test. * gcc.target/powerpc/byte-in-either-range-1.c: New test. * gcc.target/powerpc/byte-in-range-0.c: New test. * gcc.target/powerpc/byte-in-range-1.c: New test. * gcc.target/powerpc/byte-in-set-0.c: New test. * gcc.target/powerpc/byte-in-set-1.c: New test. * gcc.target/powerpc/byte-in-set-2.c: New test. gcc/ChangeLog: 2016-11-14 Kelvin Nilsen <kel...@gcc.gnu.org> * config/rs6000/altivec.md (UNSPEC_CMPRB): New unspec value. (UNSPEC_CMPRB2): New unspec value. (UNSPEC_CMPEQB): New unspec value. (UNSPEC_SETB): New unspec value. (cmprb_p): New expansion. (*cmprb): New insn. (*setb): New insn. (cmprb2_p): New expansion. (*cmprb2): New insn. (cmpeqb_p): New expansion. (*cmpeqb): New insn. * config/rs6000/rs6000-builtin.def (BU_P9V_64BIT_AV_2): New macro. (BU_P9_OVERLOAD_2): Likewise. (CMPRB): Add byte-in-range built-in function. (CMBRB2): Add byte-in-either_range built-in function. (CMPEQB): Add byte-in-set builtin-in function. (CMPRB): Add overload support for byte-in-range function. (CMPRB2): Add overload support for byte-in-either-range function. (CMPEQB): Add overload support for byte-in-set built-in function. * config/rs6000/rs6000-c.c (P9V_BUILTIN_SCALAR_CMPRB): Macro expansion to define argument types for new builtin. (P9V_BUILTIN_SCALAR_CMPRB2): Macro expansion to define argument types for new builtin. (P9V_BUILTIN_SCALAR_CMPEQB): Macro expansion to define argument types for new builtin. 
* doc/extend.texi (PowerPC AltiVec Built-in Functions): Rearrange the order of presentation for certain built-in functions (scalar_extract_exp, scalar_extract_sig, scalar_insert_exp) (scalar_cmp_exp_gt, scalar_cmp_exp_lt, scalar_cmp_exp_eq) (scalar_cmp_exp_unordered, scalar_test_data_class) (scalar_test_neg) to improve locality and flow. Document the new __builtin_scalar_byte_in_set, __builtin_scalar_byte_in_range, and __builtin_scalar_byte_in_either_range functions. Index: gcc/config/rs6000/altivec.md === --- gcc/config/rs6000/altivec.md(revision 241245) +++ gcc/config/rs6000/altivec.md(working copy) @@ -153,6 +153,10 @@ UNSPEC_BCDADD UNSPEC_BCDSUB UNSPEC_BCD_OVERFLOW + UNSPEC_CMPRB + UNSPEC_CMPRB2 + UNSPEC_CMPEQB + UNSPEC_SETB ]) (define_c_enum "unspecv" @@ -3709,6 +3713,116 @@ "darn %0,1" [(set_attr "type" "integer")]) +;; Predicate: test byte within range. +;; Return in target register operand 0 a non-zero value iff the byte +;; held in bits 24:31 of operand 1 is within the inclusive range +;; bounded below by operand 2's bits 0:7 and above by operand 2's +;; bits 8:15. +(define_expand "cmprb_p" + [(set (match_dup 3) + (unspec:CC [(match_operand:SI 1 "gpc_reg_operand" "r") + (match_operand:SI 2 "gpc_reg_operand" "r")] +UNSPEC_CMPRB)) + (set (match_operand:SI 0 "gpc_reg_operand" "=r") +(unspec:SI [(match_dup 3)] + UNSPEC_SETB)) + ] + "TARGET_P9_MISC" +{ + operands[3] = gen_reg_rtx (CCmode); +}) + +;; Set bit 1 (the GT bit, 0x2) of CR register operand 0 to 1 iff the +;; byte found in bits 24:31 of register operand 1 is within the +;; inclusive range bounded below by operand 2's bits 0:7 and above by +;; operand 2's bits 8:15. The other 3 bits of the target CR register +;; are set to 0. 
+(define_insn "*cmprb" + [(set (match_operand:CC 0 "cc_reg_operand" "=y") + (unspec:CC [(match_operand:SI 1 "gpc_reg_operand" "r") + (match_operand:SI 2 "gpc_reg_operand" "r")] +UNSPEC_CMPRB))] + "TARGET_P9_MISC" + "cmprb %0,0,%1,%2" + [(set_attr "type" "logical")]) + +;; Set operand 0 register to non-zero value iff the CR register named +;; by operand 1 has its GT bit (0x2) or its LT bit (0x1) set. +(define_insn "*setb" + [(set (match_operand:SI 0 "gpc_reg_operand" "=r") +(unspec:SI [(match_operand:CC 1 "cc_reg_operand" "y")] + UNSPEC_SETB))] + "TARGET_P9_MISC" + "s
[PATCH] PR78056: Fix build failure on Power7
This patch corrects an error introduced with commit 241314. That patch introduced several new built-in functions to support Power9 string instructions. The error, found after the trunk commit, is that initialization of the built-in function tables encounters an internal compiler error if the assembler that is used with gcc lacks support for Power9 instructions. This patch disables initialization of built-in functions which depend on assembler capabilities that are not supported by the associated tool chain. This patch has been bootstrapped and regression tested on powerpcle-unknown-linux, trunk revision 241406. (I was not able to regression test on the most current trunk because that trunk does not bootstrap.) I have also successfully bootstrapped this patch on a Power7 system for which the assembler lacks support for Power9. (I could not regression test on that platform because that platform could not bootstrap without this patch.) It is planned that a subsequent enhancement to this patch will make the following improvements: 1. Fail with an assertion error instead of an internal compiler error if built-in functions are ever defined for which the corresponding instruction pattern is not supported by the current compiler configuration. 2. Issue a warning message whenever a command-line -mcpu=XXX request seeks to configure support for a CPU version which is not supported by the accompanying assembler. I am submitting the patch as is in order to expedite integration since the error has broken the trunk for certain system configurations. Is this patch ok for trunk? gcc/ChangeLog: 2016-10-25 Kelvin Nilsen <kel...@gcc.gnu.org> PR target/78056 * config/rs6000/rs6000.c (spe_init_builtins): Modify loops to not define builtin functions from the bdesc_spe_predicates or bdesc_spe_evsel arrays if the builtin mask is not compatible with the current compiler configuration. 
(paired_init_builtins): Modify loop to not define builtin functions from the bdesc_paired_preds array if the builtin mask is not compatible with the current compiler configuration. (altivec_init_builtins): Modify loops to not define the __builtin_altivec_stxvl function nor the builtin functions from the bdesc_dst or bdesc_altivec_preds, bdesc_abs gcc/testsuite/ChangeLog: 2016-10-25 Kelvin Nilsen <kel...@gcc.gnu.org> PR target/78056 * gcc.target/powerpc/vsu/vec-any-eqz-7.c (test_any_equal): Change expected error message. * gcc.target/powerpc/vsu/vec-xst-len-12.c (store_data): Change expected error message. * gcc.target/powerpc/vsu/vec-all-nez-7.c (test_all_not_equal_and_not_zero): Change expected error message. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 241406) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -16923,6 +16923,7 @@ spe_init_builtins (void) tree pushort_type_node = build_pointer_type (short_unsigned_type_node); const struct builtin_description *d; size_t i; + HOST_WIDE_INT builtin_mask = rs6000_builtin_mask; tree v2si_ftype_4_v2si = build_function_type_list (opaque_V2SI_type_node, @@ -17063,7 +17064,16 @@ spe_init_builtins (void) for (i = 0; i < ARRAY_SIZE (bdesc_spe_predicates); ++i, d++) { tree type; + HOST_WIDE_INT mask = d->mask; + if ((mask & builtin_mask) != mask) + { + if (TARGET_DEBUG_BUILTIN) + fprintf (stderr, "spe_init_builtins, skip predicate %s\n", +d->name); + continue; + } + switch (insn_data[d->icode].operand[1].mode) { case V2SImode: @@ -17084,7 +17094,16 @@ spe_init_builtins (void) for (i = 0; i < ARRAY_SIZE (bdesc_spe_evsel); ++i, d++) { tree type; + HOST_WIDE_INT mask = d->mask; + if ((mask & builtin_mask) != mask) + { + if (TARGET_DEBUG_BUILTIN) + fprintf (stderr, "spe_init_builtins, skip evsel %s\n", +d->name); + continue; + } + switch (insn_data[d->icode].operand[1].mode) { case V2SImode: @@ -17106,6 +17125,7 @@ paired_init_builtins (void) { const struct builtin_description *d; size_t i; 
+ HOST_WIDE_INT builtin_mask = rs6000_builtin_mask; tree int_ftype_int_v2sf_v2sf = build_function_type_list (integer_type_node, @@ -17141,7 +17161,16 @@ paired_init_builtins (void) for (i = 0; i < ARRAY_SIZE (bdesc_paired_preds); ++i, d++) { tree type; + HOST_WIDE_INT mask = d->mask; + if ((mask & builtin_mask) != mask) + { + if (TARGET_DEBUG_BUILTIN) + fprintf (stderr, "paired_init_builtins, skip predicate %s\n", +d->name); + continue; +
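The skip test that this patch adds to each initialization loop, `(mask & builtin_mask) != mask`, is a subset check: a built-in is defined only when every feature bit it requires is present in the enabled set. The following standalone C sketch illustrates that check in isolation. The struct demo_builtin, the function names, and the use of uint64_t in place of HOST_WIDE_INT are all assumptions made for this example, not the rs6000.c definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a builtin table entry: a name plus the set
   of feature bits the builtin requires.  */
struct demo_builtin
{
  const char *name;
  uint64_t mask;
};

/* A builtin is usable only if every bit it requires is also present in
   the enabled set, i.e. MASK is a subset of BUILTIN_MASK.  */
static bool
demo_is_enabled (uint64_t mask, uint64_t builtin_mask)
{
  return (mask & builtin_mask) == mask;
}

/* Walk a table the way the patched loops do, skipping entries whose
   requirements are not met, and return how many would be defined.  */
static size_t
demo_count_defined (const struct demo_builtin *table, size_t n,
                    uint64_t builtin_mask)
{
  size_t defined = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (!demo_is_enabled (table[i].mask, builtin_mask))
        continue;   /* Requirement not satisfied: skip, do not define.  */
      defined++;
    }
  return defined;
}
```

Note that a plain `(mask & builtin_mask) != 0` test would be too weak here: an entry requiring bits 0x3 would pass with only 0x1 enabled, whereas the subset form rejects it.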
[PATCH,committed] PR77847: Add FALLTHROUGH comment to fix build error
This trivial/obvious patch was committed without review as svn revision 240783. The patch fixes a compile-time error that recently surfaced with big-endian Power architecture builds. libcpp/ChangeLog: 2016-10-04 Kelvin Nilsen <kel...@gcc.gnu.org> PR target/77847 * lex.c (search_line_fast): Add a FALLTHROUGH comment to correct compiler error in the version of this function that is conditionally compiled when GCC_VERSION >= 4005 and both __ALTIVEC__ and __BIG_ENDIAN__ symbols are defined. Index: libcpp/lex.c === --- libcpp/lex.c(revision 240755) +++ libcpp/lex.c(working copy) @@ -733,6 +733,7 @@ if (l != 0) break; s += sizeof(unsigned long); + /* FALLTHROUGH */ case 2: l = u.l[i++]; if (l != 0)
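For readers unfamiliar with the idiom, here is a small self-contained C sketch of an intentional switch fall-through marked with a comment, in the same style as the one-line fix above. It is not the libcpp code: demo_sum_tail is a hypothetical function written only for illustration. GCC's -Wimplicit-fallthrough warning accepts a comment of this form as a marker that the fall-through is deliberate, and presumably the build error arose because warnings are treated as errors in that build stage.

```c
/* Sum the first REMAINING bytes at P, for REMAINING in 0..3, using an
   unrolled switch in which each case deliberately falls into the next.  */
static int
demo_sum_tail (const unsigned char *p, int remaining)
{
  int sum = 0;

  switch (remaining)
    {
    case 3:
      sum += p[2];
      /* FALLTHROUGH */
    case 2:
      sum += p[1];
      /* FALLTHROUGH */
    case 1:
      sum += p[0];
      break;
    default:
      break;
    }
  return sum;
}
```

The comment must sit directly before the following case label, with no intervening statement, for the compiler to associate it with that fall-through.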