[PATCH][PR middle-end/79521] Refine condition for calling ira_init_register_move_cost_if_necessary

2017-02-15 Thread Jeff Law


ira_init_register_move_cost_if_necessary asserts have_regs_of_mode[MODE] 
is true.  We need to make sure not to call 
ira_init_register_move_cost_if_necessary when have_regs_of_mode[MODE] is 
false.
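
For anyone who wants to see the shape of the guard outside of GCC, here is a minimal standalone sketch.  It is only a model, not the ira-costs.c code: the `enum mode` values, `init_move_cost_if_necessary`, `maybe_init_move_cost`, and `init_calls` are hypothetical stand-ins, and the table contents are made up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for machine modes; not GCC's real enum.  */
enum mode { MODE_QI, MODE_SI, MODE_TF, NUM_MODES };

/* Models have_regs_of_mode[]: true if the target has registers that
   can hold a value of that mode.  The contents are invented.  */
static bool have_regs_of_mode[NUM_MODES] = { true, true, false };

/* Counts how often the initializer ran; for illustration only.  */
static int init_calls;

/* Models the callee: it asserts that the mode has registers.  */
void
init_move_cost_if_necessary (enum mode m)
{
  assert (have_regs_of_mode[m]);
  init_calls++;
}

/* Models the fixed caller: check the precondition before calling,
   so the assertion in the callee can never fire.  */
bool
maybe_init_move_cost (enum mode m)
{
  if (!have_regs_of_mode[m])
    return false;
  init_move_cost_if_necessary (m);
  return true;
}
```

In this model, calling `maybe_init_move_cost (MODE_TF)` simply returns false, whereas calling the initializer directly for that mode would trip the assertion.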


Verified the H8 port still builds libgcc, also bootstrapped and 
regression tested on x86_64-linux-gnu and i686-linux-gnu.


Installing on the trunk.

Sorry for the breakage,
Jeff
commit 8f78832934c603a04db68329c195184a142f04a6
Author: Jeff Law 
Date:   Wed Feb 15 23:35:08 2017 -0700

PR middle-end/79521
* ira-costs.c (scan_one_insn): Check have_regs_of_mode before calling
ira_init_register_move_cost_if_necessary.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d99f444..24d9c15 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2017-02-15 Jeff Law  
+
+   PR middle-end/79521
+   * ira-costs.c (scan_one_insn): Check have_regs_of_mode before calling
+   ira_init_register_move_cost_if_necessary.
+
 2017-02-15  Martin Sebor  
 
PR middle-end/32003
diff --git a/gcc/ira-costs.c b/gcc/ira-costs.c
index 1737430..9cf0119 100644
--- a/gcc/ira-costs.c
+++ b/gcc/ira-costs.c
@@ -1452,7 +1452,8 @@ scan_one_insn (rtx_insn *insn)
 {
   rtx x = XEXP (PATTERN (insn), 0);
   if (GET_CODE (x) == REG
- && REGNO (x) >= FIRST_PSEUDO_REGISTER)
+ && REGNO (x) >= FIRST_PSEUDO_REGISTER
+ && have_regs_of_mode[GET_MODE (x)])
 ira_init_register_move_cost_if_necessary (GET_MODE (x));
   return insn;
 }


[patch, contrib] Add support to install libcaf-mpi for multi-image coarray Fortran

2017-02-15 Thread Jerry DeLisle
The attached patch adds a new subdirectory, mk-libcaf-multi, under contrib. 
It contains scripts that download OpenCoarrays, build libcaf-mpi.a, and 
install it in the user-provided --install-prefix.


As submitted, the script is run only manually, by a user who chooses to do so.

Eventually we would like to fully integrate the build of libcaf-mpi into gcc to 
provide full multi-image support for gfortran.  These scripts provide an 
intermediate means of doing so, bringing gfortran pretty close to full Fortran 
2008 and 2015 standards.


Providing this will greatly expand user testing and development of gfortran-based 
Coarray Fortran (CAF) and make it simpler for users to enable this modern feature.


Tested on x86-64 Linux (Fedora 25) and Mac (Darwin). The user must already 
have mpich installed as the MPI library. The build uses cmake and 
bash 3 scripts to enable a lot of useful argument checking and diagnostics.


For those who are not familiar with coarrays in Fortran and want or need to explore 
these advanced features, using this script is a very helpful way to get started.


Others may chime in with comments or questions.

OK for trunk?

Regards,

Jerry

diff --git a/contrib/mk-libcaf-multi/mk-libcaf-multi.sh b/contrib/mk-libcaf-multi/mk-libcaf-multi.sh
new file mode 100755
index 000..753b035
--- /dev/null
+++ b/contrib/mk-libcaf-multi/mk-libcaf-multi.sh
@@ -0,0 +1,269 @@
+#!/usr/bin/env bash
+
+#  Copyright (C) 2017 Free Software Foundation, Inc.
+#  Contributed by Jerry DeLisle in collaboration with Damian Rouson.
+#
+# This file is part of the GNU Fortran runtime library (libgfortran).
+#
+# Libgfortran is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+
+# Libgfortran is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# Under Section 7 of GPL version 3, you are granted additional
+# permissions described in the GCC Runtime Library Exception, version
+# 3.1, as published by the Free Software Foundation.
+
+# You should have received a copy of the GNU General Public License and
+# a copy of the GCC Runtime Library Exception along with this program;
+# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+# .
+
+# mk-multi-image.sh
+#
+# --- This script downloads and installs OpenCoarrays to directly support multi-image 
+# execution in libgfortran. Execute this script as the last step of the libgfortran 
+# make install.
+
+# Portions of this script derive from or call sub-scripts of BASH3 Boilerplate. See
+# the B3B_USE_CASE subdirectory for the substantial portions of the Software and the
+# required permission notices of the MIT License (MIT).
+
+
+export LIBGFORTRAN_SRC_DIR="${LIBGFORTRAN_SRC_DIR:-${PWD%/}}"
+if [[ ! -d "${LIBGFORTRAN_SRC_DIR}/caf" ]]; then
+  echo "File not found: ${LIBGFORTRAN_SRC_DIR}/caf"
+  echo "Please run this script inside the libgfortran source directory or "
+  echo "set LIBGFORTRAN_SRC_DIR to the libgfortran source directory path."
+  exit 1
+fi
+export B3B_USE_CASE="${B3B_USE_CASE:-${LIBGFORTRAN_SRC_DIR}/../contrib/mk-libcaf-multi/utils}"
+if [[ ! -f "${B3B_USE_CASE:-}/bootstrap.sh" ]]; then
+  echo "Please set B3B_USE_CASE to the bash3boilerplate utils directory path."
+  exit 2
+else
+  source "${B3B_USE_CASE}/bootstrap.sh" "$@"
+fi
+
+# Set expected value of present flags that take no arguments
+export __flag_present=1
+
+if [[ "${arg_D}" == "${__flag_present}" || "${arg_L}" == "${__flag_present}" || "${arg_U}" == "${__flag_present}" || "${arg_V}" == "${__flag_present}" ]]; then
+   print_debug_only=7
+   if [ "$(( LOG_LEVEL < print_debug_only ))" -ne 0 ]; then
+     debug "Suppressing info and debug messages: -v present."
+     suppress_info_debug_messages
+#export LOG_LEVEL=5
+   fi
+fi
+
+# If one of the --print-* arguments is present (or its single-letter equivalent), we 
+# print its value and exit normally.  Here we print with echo instead of a B3B function
+# because the output might be used in an assignment to a variable in another script, e.g.,
+# version=`mk-multi-image.sh -V`
+
+if [[ "${arg_L}" == "${__flag_present}" ]]; then
+  echo "mpich"
+  exit 0
+fi
+
+# Set the variable 'fetch' to invoke an available downloader utility.
+source "${B3B_USE_CASE}/set_or_print_downloader.sh"
+set_or_print_downloader
+if [[ "${arg_D}" == "${__flag_present}" ]]; then
+  echo "${fetch}"
+  exit 0
+fi
+
+opencoarrays_version=${arg_v}
+if [[ "${arg_V}" == "${__flag_present}" ]]; then
+  echo "${opencoarrays_version}"
+  exit 0
+fi
+
+# Set to true just before releasing (false avoids inflating
+# OpenCoarrays download statistics)
+tracked_download="true"

Re: [Patch, fortran] PR79434 - [submodules] separate module procedure breaks encapsulation

2017-02-15 Thread Jerry DeLisle
On 02/15/2017 10:12 AM, Paul Richard Thomas wrote:
> Dear All,
> 
> Another straightforward patch, although it took some head scratching
> to realize how simple the fix could be :-)
> 
> Bootstraps and regtests on FC23/x86_64 - OK for trunk and 6-branch?
> 

Yes, OK

Jerry


Re: [Patch, fortran] PR79447 - [F08] gfortran rejects valid & accepts invalid internal subprogram in a submodule

2017-02-15 Thread Jerry DeLisle
On 02/15/2017 10:01 AM, Paul Richard Thomas wrote:
> Dear All,
> 
> This patch is straightforward, verging on 'obvious'.
> 
> Bootstraps and regtests on FC23/x86_64 - OK for trunk and 6 branch?
> 
Yes, OK

Jerry


Re: [gomp4] adjust num_gangs and add a diagnostic for unsupported num_workers

2017-02-15 Thread Thomas Schwinge
Hi Cesar!

On Mon, 13 Feb 2017 08:58:39 -0800, Cesar Philippidis  
wrote:
> This patch does the following:
> 
>  * Adjusts the default num_gangs to utilize more of the GPU hardware.
>  * Teaches libgomp to emit a diagnostic when num_workers isn't supported.
> 
> [...]

Thanks!

> This patch has been applied to gomp-4_0-branch.

For easier review, I'm quoting here your r245393 commit with whitespace
changes ignored:

> --- libgomp/plugin/plugin-nvptx.c
> +++ libgomp/plugin/plugin-nvptx.c
> @@ -917,10 +918,15 @@ nvptx_exec (void (*fn), size_t mapnum, void 
> **hostaddrs, void **devaddrs,
>   seen_zero = 1;
>  }
>  
> -  if (seen_zero)
> -{
> -  /* See if the user provided GOMP_OPENACC_DIM environment
> -  variable to specify runtime defaults. */
> +  /* Both reg_granularity and warp_granularity were extracted from
> + the "Register Allocation Granularity" in Nvidia's CUDA Occupancy
> + Calculator spreadsheet.  Specifically, this required SM_30+
> + targets.  */
> +  const int reg_granularity = 256;

That is per warp, so a granularity of 256 / 32 = 8 registers per thread.
(Would be strange otherwise.)
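
To spell out that arithmetic, here is a small sketch.  The two constants are the ones from the discussion (256 registers per warp, 32 threads per warp); `round_up_regs` is a hypothetical helper, and the idea that a per-thread register count gets rounded up to a multiple of the granularity is my assumption about how such a granularity would be applied, not something taken from the patch.

```c
#include <assert.h>

/* Values from the discussion above.  */
enum { REG_GRANULARITY_PER_WARP = 256, WARP_SIZE = 32 };

/* Per-thread allocation granularity: 256 / 32 = 8 registers.  */
int
regs_per_thread_granularity (void)
{
  return REG_GRANULARITY_PER_WARP / WARP_SIZE;
}

/* Hypothetical helper: round a per-thread register count up to the
   next multiple of the per-thread granularity.  */
int
round_up_regs (int regs_per_thread)
{
  int g = regs_per_thread_granularity ();
  assert (regs_per_thread >= 0);
  return (regs_per_thread + g - 1) / g * g;
}
```

Under that assumption, a kernel using 21 registers per thread would be accounted as using 24.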

> +  const int warp_granularity = 4;
> +
> +  /* See if the user provided GOMP_OPENACC_DIM environment variable to
> + specify runtime defaults. */
>static int default_dims[GOMP_DIM_MAX];
>  
>pthread_mutex_lock (_dev_lock);
> @@ -952,25 +958,30 @@ nvptx_exec (void (*fn), size_t mapnum, void 
> **hostaddrs, void **devaddrs,
>CUdevice dev = nvptx_thread()->ptx_dev->dev;
>/* 32 is the default for known hardware.  */
>int gang = 0, worker = 32, vector = 32;
> -   CUdevice_attribute cu_tpb, cu_ws, cu_mpc, cu_tpm;
> +  CUdevice_attribute cu_tpb, cu_ws, cu_mpc, cu_tpm, cu_rf, cu_sm;
>  
>cu_tpb = CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK;
>cu_ws = CU_DEVICE_ATTRIBUTE_WARP_SIZE;
>cu_mpc = CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT;
>cu_tpm  = CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR;
> +  cu_rf  = CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR;
> +  cu_sm  = CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_MULTIPROCESSOR;
>  
>if (cuDeviceGetAttribute (_size, cu_tpb, dev) == CUDA_SUCCESS
> && cuDeviceGetAttribute (_size, cu_ws, dev) == CUDA_SUCCESS
> && cuDeviceGetAttribute (_size, cu_mpc, dev) == CUDA_SUCCESS
> -   && cuDeviceGetAttribute (_size, cu_tpm, dev)  == CUDA_SUCCESS)
> +   && cuDeviceGetAttribute (_size, cu_tpm, dev) == CUDA_SUCCESS
> +   && cuDeviceGetAttribute (_size, cu_rf, dev)  == CUDA_SUCCESS
> +   && cuDeviceGetAttribute (_size, cu_sm, dev)  == CUDA_SUCCESS)

Trying to compile this on CUDA 5.5/331.113, I run into:

[...]/source-gcc/libgomp/plugin/plugin-nvptx.c: In function 'nvptx_exec':
[...]/source-gcc/libgomp/plugin/plugin-nvptx.c:970:16: error: 
'CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR' undeclared (first use in 
this function)
   cu_rf  = CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR;
^~~~
[...]/source-gcc/libgomp/plugin/plugin-nvptx.c:970:16: note: each 
undeclared identifier is reported only once for each function it appears in
[...]/source-gcc/libgomp/plugin/plugin-nvptx.c:971:16: error: 
'CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_MULTIPROCESSOR' undeclared (first 
use in this function)
   cu_sm  = CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_MULTIPROCESSOR;
^~~~

For reference, please see the code handling
CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR in the trunk version
of the nvptx_open_device function.

And then, I don't specifically have a problem with discontinuing CUDA 5.5
support and requiring 6.5, for example, but that should be a conscious
decision.

> @@ -980,8 +991,6 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, 
> void **devaddrs,
>matches the hardware configuration.  Logical gangs are
>scheduled onto physical hardware.  To maximize usage, we
>should guess a large number.  */
> -   if (default_dims[GOMP_DIM_GANG] < 1)
> - default_dims[GOMP_DIM_GANG] = gang ? gang : 1024;

That's "bad", because a non-zero "default_dims[GOMP_DIM_GANG]" (also
known as "default_dims[0]") is used to decide whether to enter this whole
code block, and with that assignment removed, every call of the
nvptx_exec function will now re-do all this GOMP_OPENACC_DIM parsing,
cuDeviceGetAttribute calls, computations, and so on.  (See "GOMP_DEBUG=1"
output.)
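
To illustrate why the removed assignment matters, here is a reduced model of the pattern.  It is a sketch, not the plugin code: `probe_defaults`, `ensure_defaults`, `probe_count`, and the `DIM_*` names are hypothetical, and the probe merely stands in for the GOMP_OPENACC_DIM parsing and the cuDeviceGetAttribute calls.

```c
#include <assert.h>

enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };

/* Zero in the gang slot doubles as the "not yet initialized" marker.  */
static int default_dims[DIM_MAX];

/* Counts how often the expensive probing ran.  */
static int probe_count;

/* Hypothetical stand-in for the environment parsing and device
   attribute queries; it leaves the gang slot untouched.  */
void
probe_defaults (int *dims)
{
  probe_count++;
  dims[DIM_WORKER] = 32;
  dims[DIM_VECTOR] = 32;
}

/* Called on every launch.  With KEEP_GANG_ASSIGNMENT zero this models
   the code after the patch: the marker never becomes non-zero, so the
   probe is repeated on each call.  */
void
ensure_defaults (int keep_gang_assignment)
{
  if (default_dims[DIM_GANG] != 0)
    return;			/* Already initialized.  */
  probe_defaults (default_dims);
  if (keep_gang_assignment && default_dims[DIM_GANG] < 1)
    default_dims[DIM_GANG] = 1024;
}
```

Calling `ensure_defaults (0)` twice probes twice; once the gang slot has been assigned, later calls return immediately.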

I think this whole code block should be moved into the nvptx_open_device
function, to have it executed once when the device is opened -- after
all, all these are per-device attributes.  (So, it's actually
conceptually incorrect to have this done only once in the nvptx_exec
function, given that this data then is used in the same 

C++ PATCH for c++/79464 (ICE in IPA with inherited ctor)

2017-02-15 Thread Jason Merrill
In the new inherited constructor framework, base inheriting
constructor variants don't retain the parameters inherited from a
virtual base constructor, since that constructor won't actually be
called from there; it will be called directly by the complete object
constructor.

Previously, I had cut off DECL_ARGUMENTS and the argument list, but
left these omitted parameters in TYPE_ARG_TYPES.  This constituted
lying to the compiler, and it got its revenge.

This patch also removes the omitted parameters from TYPE_ARG_TYPES,
and then adjusts overloading and mangling to refer to the parameters
from the maybe-in-charge variant instead.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit d133cc9358be1b81147584b37a85ba58a5027d24
Author: Jason Merrill 
Date:   Tue Feb 14 12:25:06 2017 -0500

PR c++/79464 - ICE in IPA with omitted constructor parms

* class.c (build_clone): Also omit parms from TYPE_ARG_TYPES.
(adjust_clone_args): Adjust.
(add_method): Remember omitted parms.
* call.c (add_function_candidate): Likewise.
* mangle.c (write_method_parms): Likewise.
* method.c (ctor_omit_inherited_parms): Return false if there are no
parms to omit.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 718438c..154509b 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -2005,7 +2005,11 @@ add_function_candidate (struct z_candidate **candidates,
  considered in overload resolution.  */
   if (DECL_CONSTRUCTOR_P (fn))
 {
-  parmlist = skip_artificial_parms_for (fn, parmlist);
+  if (ctor_omit_inherited_parms (fn))
+   /* Bring back parameters omitted from an inherited ctor.  */
+   parmlist = FUNCTION_FIRST_USER_PARMTYPE (DECL_ORIGIN (fn));
+  else
+   parmlist = skip_artificial_parms_for (fn, parmlist);
   skip = num_artificial_parms_for (fn);
   if (skip > 0 && first_arg != NULL_TREE)
{
diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index 7ec07c9..1442b55 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -1149,6 +1149,12 @@ add_method (tree type, tree method, tree using_decl)
   if (! DECL_STATIC_FUNCTION_P (method))
parms2 = TREE_CHAIN (parms2);
 
+  /* Bring back parameters omitted from an inherited ctor.  */
+  if (ctor_omit_inherited_parms (fn))
+   parms1 = FUNCTION_FIRST_USER_PARMTYPE (DECL_ORIGIN (fn));
+  if (ctor_omit_inherited_parms (method))
+   parms2 = FUNCTION_FIRST_USER_PARMTYPE (DECL_ORIGIN (method));
+
   if (compparms (parms1, parms2)
  && (!DECL_CONV_FN_P (fn)
  || same_type_p (TREE_TYPE (fn_type),
@@ -4761,6 +4767,10 @@ build_clone (tree fn, tree name)
DECL_VINDEX (clone) = NULL_TREE;
 }
 
+  bool ctor_omit_inherited_parms_p = ctor_omit_inherited_parms (clone);
+  if (ctor_omit_inherited_parms_p)
+gcc_assert (DECL_HAS_IN_CHARGE_PARM_P (clone));
+
   /* If there was an in-charge parameter, drop it from the function
  type.  */
   if (DECL_HAS_IN_CHARGE_PARM_P (clone))
@@ -4780,8 +4790,12 @@ build_clone (tree fn, tree name)
   if (DECL_HAS_VTT_PARM_P (fn)
  && ! DECL_NEEDS_VTT_PARM_P (clone))
parmtypes = TREE_CHAIN (parmtypes);
-   /* If this is subobject constructor or destructor, add the vtt
-parameter.  */
+  if (ctor_omit_inherited_parms_p)
+   {
+ /* If we're omitting inherited parms, that just leaves the VTT.  */
+ gcc_assert (DECL_NEEDS_VTT_PARM_P (clone));
+ parmtypes = tree_cons (NULL_TREE, vtt_parm_type, void_list_node);
+   }
   TREE_TYPE (clone)
= build_method_type_directly (basetype,
  TREE_TYPE (TREE_TYPE (clone)),
@@ -4818,7 +4832,7 @@ build_clone (tree fn, tree name)
 
   /* A base constructor inheriting from a virtual base doesn't get the
  arguments.  */
-  if (ctor_omit_inherited_parms (clone))
+  if (ctor_omit_inherited_parms_p)
 DECL_CHAIN (DECL_CHAIN (DECL_ARGUMENTS (clone))) = NULL_TREE;
 
   for (parms = DECL_ARGUMENTS (clone); parms; parms = DECL_CHAIN (parms))
@@ -4965,6 +4979,13 @@ adjust_clone_args (tree decl)
   decl_parms = TREE_CHAIN (decl_parms),
 clone_parms = TREE_CHAIN (clone_parms))
{
+ if (clone_parms == void_list_node)
+   {
+ gcc_assert (decl_parms == clone_parms
+ || ctor_omit_inherited_parms (clone));
+ break;
+   }
+
  gcc_assert (same_type_p (TREE_TYPE (decl_parms),
   TREE_TYPE (clone_parms)));
 
@@ -4999,7 +5020,7 @@ adjust_clone_args (tree decl)
  break;
}
}
-  gcc_assert (!clone_parms);
+  gcc_assert (!clone_parms || clone_parms == void_list_node);
 }
 }
 
diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 3ead33e..8b30f42 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -2740,6 +2740,10 @@ write_method_parms 

[PATCH v4] aarch64: Add split-stack initial support

2017-02-15 Thread Adhemerval Zanella
This is an updated version of my previous patch (v3), based mainly on
glibc comments on the exported loader symbol [1].  The changes from the
previous version are:

   - Remove the __tcb_private_ss call in libgcc and emit a data reference to
     a glibc symbol when split-stack is used.  This removes the runtime
     errors when running on older glibc and instead makes the loader
     fail with a missing symbol.

   - Add a glibc version check to the split-stack support check.

   - Some comment fixes in morestack.S.

   - Remove some compile warnings in morestack-c.c.

--

This patch adds split-stack support on aarch64 (PR #67877).  As for
other ports, this patch should be used along with glibc and gold support.

The support is done similarly to other architectures: a split-stack field
is allocated before the TCB by glibc, a target-specific __morestack
implementation and helper functions are added in libgcc, and compiler
support is adjusted (split-stack prologue, va_start for argument handling).
I also plan to send the gold support to adjust stack allocation across
split-stack and default code calls.

The current approach is to set the final stack adjustment using at most two
instructions (mov/movk), which limits stack allocation to an upper limit of 4GB.
The morestack call is non-standard, with x10 holding the requested stack
pointer, x11 the argument pointer (if required), and x12 the continuation
address to return to.  Unwinding is handled by a personality routine that
knows how to find stack segments.
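
As a rough illustration of why two instructions imply the 4GB limit, the sketch below splits a frame size into the two 16-bit immediates that a mov/movk pair can carry.  It is not code from the patch; `struct mov_movk`, `split_frame_size`, and `join_frame_size` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The two 16-bit immediates: lo16 for the mov, hi16 for the
   "movk ..., lsl #16".  */
struct mov_movk
{
  uint16_t lo16;
  uint16_t hi16;
};

/* Split SIZE into the two immediates.  Returns false if SIZE does not
   fit in 32 bits, i.e. if two instructions are not enough.  */
bool
split_frame_size (uint64_t size, struct mov_movk *out)
{
  if (size > UINT32_MAX)
    return false;
  out->lo16 = (uint16_t) (size & 0xffff);
  out->hi16 = (uint16_t) ((size >> 16) & 0xffff);
  return true;
}

/* Rebuild the value the instruction pair would leave in the register.  */
uint64_t
join_frame_size (struct mov_movk imm)
{
  return (uint64_t) imm.lo16 | ((uint64_t) imm.hi16 << 16);
}
```

Any size of 4GB or more is rejected by `split_frame_size`, which corresponds to the limit described above.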

The split-stack prologue on function entry is as follows (this goes before the
usual function prologue):

function:
        mrs    x9, tpidr_el0
        mov    x10, 
        movk   x10, #0x0, lsl #16
        sub    x10, sp, x10
        mov    x11, sp          # if function has stacked arguments
        adrp   x12, main_fn_entry
        add    x12, x12, :lo12:.L2
        cmp    x9, x10
        b.lt   
        b      __morestack
main_fn_entry:
        [function prologue]
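
The decision the prologue makes can be modelled in a few lines of C.  This is a simplified sketch under assumptions: it takes x9 to hold the stack limit read from the split-stack field and x10 the stack pointer the function wants, and `needs_morestack` is a hypothetical name, not a function in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the prologue would fall through to the
   "b __morestack" instruction.  */
bool
needs_morestack (int64_t stack_limit, int64_t sp, int64_t frame_size)
{
  /* Models "sub x10, sp, x10": the stack pointer after allocation.  */
  int64_t wanted_sp = sp - frame_size;

  /* Models "cmp x9, x10" followed by b.lt, a signed less-than: the
     normal prologue is entered only when the limit is below the
     wanted stack pointer.  */
  return !(stack_limit < wanted_sp);
}
```

For example, with a limit of 0x1000 and sp at 0x2000, a 0x800-byte frame proceeds normally, while a 0x1800-byte frame goes to __morestack.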

Notes:

1. Even if a function does not allocate a stack frame, a split-stack prologue
   is created.  It is to avoid issues with tail call for external symbols
   which might require linker adjustment (libgo/runtime/go-varargs.c).

2. Basic-block reordering (enabled with -O2) will move split-stack TCB ldr
   to after the required stack calculation.

3. Similar to powerpc, when the linker detects a call from split-stack to
   non-split-stack code, it adds 16k (or more) to the value found in "allocate"
   instructions (so non-split-stack code gets a larger stack).  The amount is
   tunable by a linker option.  The edit means aarch64 does not need to
   implement __morestack_non_split, necessary on x86 because insufficient
   space is available there to edit the stack comparison code.  This feature
   is only implemented in the GNU gold linker.

4. AArch64 does not handle >4G stack initially and, although it is possible
   to implement it, limiting to 4G allows the allocation to be materialized
   with only 2 instructions (mov + movk), thus simplifying the linker
   adjustments required.  Supporting multiple threads each requiring more
   than 4G of stack is probably not that important, and likely to OOM at
   run time.

5. The TCB support on GLIBC is meant to be included in version 2.26.

6. The continuation address materialized in x12 is built using 'adrp'
   plus add and a static relocation.  Current code uses the
   aarch64_expand_mov_immediate function; since a better alternative
   would be 'adr', it could be a future optimization (not implemented
   in this patch).

[1] https://sourceware.org/ml/libc-alpha/2017-02/msg00272.html

libgcc/ChangeLog:

* libgcc/config.host: Use t-stack and t-stack-aarch64 for
aarch64*-*-linux.
* libgcc/config/aarch64/morestack-c.c: New file.
* libgcc/config/aarch64/morestack.S: Likewise.
* libgcc/config/aarch64/t-stack-aarch64: Likewise.
* libgcc/generic-morestack.c (__splitstack_find): Add aarch64-specific
code.

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.c
(aarch64_supports_split_stack): New function.
(TARGET_SUPPORTS_SPLIT_STACK): New macro.
* gcc/config/aarch64/aarch64-linux.h (TARGET_ASM_FILE_END): Remove
macro.
* gcc/config/aarch64/aarch64-protos.h: Add
aarch64_expand_split_stack_prologue and
aarch64_split_stack_space_check.
* gcc/config/aarch64/aarch64.c (aarch64_gen_far_branch): Add support
to emit 'b' instruction to rtx different than LABEL_REF.
(aarch64_expand_builtin_va_start): Use internal argument pointer
instead of virtual_incoming_args_rtx.
(morestack_ref): New symbol.
(aarch64_load_split_stack_value): New function.
(aarch64_expand_split_stack_prologue): Likewise.
(aarch64_internal_arg_pointer): Likewise.
(aarch64_split_stack_space_check): Likewise.
(aarch64_file_end): Emit the split-stack note sections.

Re: [C++ RFC] Fix up attribute handling in templates (PR c++/79502)

2017-02-15 Thread Jason Merrill
OK.

On Wed, Feb 15, 2017 at 12:30 PM, Jakub Jelinek  wrote:
> On Wed, Feb 15, 2017 at 11:56:30AM -0500, Jason Merrill wrote:
>> On Tue, Feb 14, 2017 at 3:13 PM, Jakub Jelinek  wrote:
>> > The following testcase fails, because while we have the nodiscard
>> > attribute on the template, we actually never propagate it to the
>> > instantiation, which is where it is checked (I'm really surprised about
>> > this).
>>
>> Normally we propagate attributes when instantiating the class; see the
>> call to apply_late_template_attributes in
>> instantiate_class_template_1.  I'm not sure why that wouldn't be
>
> Yes, instantiate_class_template_1 calls apply_late_template_attributes,
> but that actually does nothing if there are no dependent attributes.
> If there are any, it sets TYPE_ATTRIBUTES (or DECL_ATTRIBUTES) to
> a copy of the attributes list, removes all the dependent attributes
> from there and calls cplus_decl_attributes on the late attrs (after
> tsubst_attribute them).  So setting {TYPE,DECL}_ATTRIBUTES to the
> attributes list unmodified if there are no dependent ones matches
> the behavior for non-dependent ones if there is at least one dependent.
>
> So, does the following patch look better?
>
> 2017-02-15  Jakub Jelinek  
>
> PR c++/79502
> * pt.c (apply_late_template_attributes): If there are
> no dependent attributes, set *p to attributes.
>
> * g++.dg/cpp1z/nodiscard4.C: New test.
>
> --- gcc/cp/pt.c.jj  2017-02-14 09:23:49.0 +0100
> +++ gcc/cp/pt.c 2017-02-15 18:21:45.581055915 +0100
> @@ -10113,6 +10113,8 @@ apply_late_template_attributes (tree *de
>
>cplus_decl_attributes (decl_p, late_attrs, attr_flags);
>  }
> +  else
> +*p = attributes;
>  }
>
>  /* Perform (or defer) access check for typedefs that were referenced
> --- gcc/testsuite/g++.dg/cpp1z/nodiscard4.C.jj  2017-02-15 18:11:33.281135469 
> +0100
> +++ gcc/testsuite/g++.dg/cpp1z/nodiscard4.C 2017-02-15 18:11:33.281135469 
> +0100
> @@ -0,0 +1,14 @@
> +// PR c++/79502
> +// { dg-do compile { target c++11 } }
> +
> +template
> +struct [[nodiscard]] missiles {};
> +
> +missiles make() { return {}; }
> +missiles (*fnptr)() = make;
> +
> +int main()
> +{
> +  make();  // { dg-warning "ignoring returned value of type" }
> +  fnptr(); // { dg-warning "ignoring returned value of type" }
> +}
>
>
> Jakub


Re: [PATCH doc] clean up -fdump-tree- options (PR 32003)

2017-02-15 Thread Martin Sebor

On 02/15/2017 05:51 AM, Thomas Schwinge wrote:

Hi!

On Wed, 1 Feb 2017 20:26:24 -0700, Martin Sebor  wrote:

On 02/01/2017 08:06 PM, Sandra Loosemore wrote:

On 02/01/2017 06:57 PM, Martin Sebor wrote:

PR middle-end/32003
* doc/invoke.texi (-fdump-rtl-): Remove pass-specific options from
index.


"rtl" vs. "tree" typo.  ;-)


--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -544,29 +544,9 @@ Objective-C and Objective-C++ Dialects}.
 -fdump-rtl-@var{pass}  -fdump-rtl-@var{pass}=@var{filename} @gol
 -fdump-statistics @gol
 -fdump-tree-all @gol
--fdump-tree-original@r{[}-@var{n}@r{]}  @gol
-[...]
--fdump-tree-storeccp@r{[}-@var{n}@r{]} @gol
--fdump-final-insns=@var{file} @gol


Is it intentional that you've also removed "-fdump-final-insns" here?
(It remains documented further down the file.)


No, I meant to only remove the -fdump-tree-xxx options in this pass.
I committed r245493 to restore it and fix the ChangeLog.  Thanks for
the review and for pointing it out!

Martin


Re: [PATCH][RFA][target/79404] Fix uninitialized reference to ira_register_move_cost[mode]

2017-02-15 Thread Jeff Law

On 02/15/2017 12:42 PM, Gerald Pfeifer wrote:

Hi Jeff,

On Mon, 13 Feb 2017, Jeff Law wrote:

I've verified this allows libgcc to build on the H8 target and
bootstrapped/regression tested the change on x86_64-unknown-linux-gnu
as well.


I need to reproduce this, but my latest daily bootstrap on
i386-unknown-freebsd10.3 failed with...

.../gerald/gcc-HEAD/libquadmath/math/remainderq.c: In function
'remainderq':
.../gerald/gcc-HEAD/libquadmath/math/remainderq.c:67:1: internal
compiler error: in ira_init_register_move_cost, at ira.c:1580

...and your message was the only one on gcc-patches the last couple
of days that contains the string "ira_init_register_move_cost".

Not sure yet that it's your patch, but this looks a little bit like a
smoking gun... :-)

Already testing a fix.

jeff


Re: [gomp4] Async related additions to OpenACC runtime library

2017-02-15 Thread Thomas Schwinge
Hi!

On Tue, 14 Feb 2017 19:58:11 +0800, Chung-Lin Tang  
wrote:
> On 2017/2/14 07:25 PM, Thomas Schwinge wrote:
> > Testing [...], I saw a lot of regressions,
> > and in r245427 just committed [...] to address
> > these.  Did you simply forget to commit your changes to
> > libgomp/libgomp.map, or why did this work for you?  Please verify:
> 
> Weird, I did not see any regressions

We figured it out; I just filed 
'Inappropriate "ld --version" checking'.


Grüße
 Thomas


Re: [PATCH][RFA][target/79404] Fix uninitialized reference to ira_register_move_cost[mode]

2017-02-15 Thread Gerald Pfeifer

Hi Jeff,

On Mon, 13 Feb 2017, Jeff Law wrote:

I've verified this allows libgcc to build on the H8 target and
bootstrapped/regression tested the change on x86_64-unknown-linux-gnu as well.


I need to reproduce this, but my latest daily bootstrap on
i386-unknown-freebsd10.3 failed with...

.../gerald/gcc-HEAD/libquadmath/math/remainderq.c: In function 'remainderq':
.../gerald/gcc-HEAD/libquadmath/math/remainderq.c:67:1: internal compiler 
error: in ira_init_register_move_cost, at ira.c:1580

...and your message was the only one on gcc-patches the last couple
of days that contains the string "ira_init_register_move_cost".

Not sure yet that it's your patch, but this looks a little bit like 
a smoking gun... :-)


Gerald


New Spanish PO file for 'gcc' (version 7.1-b20170101)

2017-02-15 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Spanish team of translators.  The file is available at:

http://translationproject.org/latest/gcc/es.po

(This file, 'gcc-7.1-b20170101.es.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




[Patch, fortran] PR79434 - [submodules] separate module procedure breaks encapsulation

2017-02-15 Thread Paul Richard Thomas
Dear All,

Another straightforward patch, although it took some head scratching
to realize how simple the fix could be :-)

Bootstraps and regtests on FC23/x86_64 - OK for trunk and 6-branch?

Cheers

Paul

2017-02-15  Paul Thomas  

PR fortran/79434
* parse.c (check_component, parse_union): Whitespace.
(set_syms_host_assoc): For a derived type, check if the module
in which it was declared is one of the submodule ancestors. If
it is, make the components public. Otherwise, reset attribute
'host_assoc' and set 'use-assoc' so that encapsulation is
preserved.

2017-02-15  Paul Thomas  

PR fortran/79434
* gfortran.dg/submodule_25.f08 : New test.
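
The ancestor check described in the ChangeLog can be sketched on its own as below.  This is an illustration rather than the patch: the patch copies both names and uses strtok, whereas this sketch uses strcspn to avoid modifying the strings (unlike strtok, it does not skip leading dots).  The helper names are hypothetical, and it assumes names of the form "ancestor.submodule", as the patch's use of "." as the delimiter suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Length of the part of NAME before the first '.', or of the whole
   name if it contains no '.'.  */
size_t
ancestor_len (const char *name)
{
  return strcspn (name, ".");
}

/* True if the leading components of BLOCK_NAME and SYM_MODULE match,
   i.e. the symbol's module is the same ancestor module.  */
bool
same_ancestor_module (const char *block_name, const char *sym_module)
{
  size_t n1 = ancestor_len (block_name);
  size_t n2 = ancestor_len (sym_module);
  return n1 == n2 && strncmp (block_name, sym_module, n1) == 0;
}
```

With the names from the new test case, "mod_encap_u.s_u" matches "mod_encap_u" but not "mod_encap_t", so the components of 't' would stay private in the submodule.
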
Index: gcc/fortran/parse.c
===
*** gcc/fortran/parse.c (revision 245196)
--- gcc/fortran/parse.c (working copy)
*** check_component (gfc_symbol *sym, gfc_co
*** 2917,2923 
coarray = true;
sym->attr.coarray_comp = 1;
  }
!  
if (c->ts.type == BT_DERIVED && c->ts.u.derived->attr.coarray_comp
&& !c->attr.pointer)
  {
--- 2917,2923 
coarray = true;
sym->attr.coarray_comp = 1;
  }
! 
if (c->ts.type == BT_DERIVED && c->ts.u.derived->attr.coarray_comp
&& !c->attr.pointer)
  {
*** parse_union (void)
*** 3081,3087 
/* Add a component to the union for each map. */
if (!gfc_add_component (un, gfc_new_block->name, ))
  {
!   gfc_internal_error ("failed to create map component '%s'", 
gfc_new_block->name);
reject_statement ();
return;
--- 3081,3087 
/* Add a component to the union for each map. */
if (!gfc_add_component (un, gfc_new_block->name, ))
  {
!   gfc_internal_error ("failed to create map component '%s'",
gfc_new_block->name);
reject_statement ();
return;
*** static void
*** 5809,5814 
--- 5809,5817 
  set_syms_host_assoc (gfc_symbol *sym)
  {
gfc_component *c;
+   const char dot[2] = ".";
+   char parent1[GFC_MAX_SYMBOL_LEN + 1];
+   char parent2[GFC_MAX_SYMBOL_LEN + 1];
  
if (sym == NULL)
  return;
*** set_syms_host_assoc (gfc_symbol *sym)
*** 5816,5831 
if (sym->attr.module_procedure)
  sym->attr.external = 0;
  
- /*  sym->attr.access = ACCESS_PUBLIC;  */
- 
sym->attr.use_assoc = 0;
sym->attr.host_assoc = 1;
sym->attr.used_in_submodule =1;
  
if (sym->attr.flavor == FL_DERIVED)
  {
!   for (c = sym->components; c; c = c->next)
!   c->attr.access = ACCESS_PUBLIC;
  }
  }
  
--- 5819,5850 
if (sym->attr.module_procedure)
  sym->attr.external = 0;
  
sym->attr.use_assoc = 0;
sym->attr.host_assoc = 1;
sym->attr.used_in_submodule =1;
  
if (sym->attr.flavor == FL_DERIVED)
  {
!   /* Derived types with PRIVATE components that are declared in
!modules other than the parent module must not be changed to be
!PUBLIC. The 'use-assoc' attribute must be reset so that the
!test in symbol.c(gfc_find_component) works correctly. This is
!not necessary for PRIVATE symbols since they are not read from
!the module.  */
!   memset(parent1, '\0', sizeof(parent1));
!   memset(parent2, '\0', sizeof(parent2));
!   strcpy (parent1, gfc_new_block->name);
!   strcpy (parent2, sym->module);
!   if (strcmp (strtok (parent1, dot), strtok (parent2, dot)) == 0)
!   {
! for (c = sym->components; c; c = c->next)
!   c->attr.access = ACCESS_PUBLIC;
!   }
!   else
!   {
! sym->attr.use_assoc = 1;
! sym->attr.host_assoc = 0;
!   }
  }
  }
  
Index: gcc/testsuite/gfortran.dg/submodule_25.f08
===
*** gcc/testsuite/gfortran.dg/submodule_25.f08  (nonexistent)
--- gcc/testsuite/gfortran.dg/submodule_25.f08  (working copy)
***
*** 0 
--- 1,43 
+ ! { dg-do compile }
+ ! Test the fix for PR79434 in which the PRIVATE attribute of the
+ ! component 'i' of the derived type 't' was not respected in the
+ ! submodule 's_u'.
+ !
+ ! Contributed by Reinhold Bader  
+ !
+ module mod_encap_t
+   implicit none
+   type, public :: t
+ private
+ integer :: i
+   end type
+ end module
+ module mod_encap_u
+   use mod_encap_t
+   type, public, extends(t) :: u
+ private
+ integer :: j
+   end type
+   interface
+ module subroutine fu(this)
+   type(u), intent(inout) :: this
+ end subroutine
+   end interface
+ end module
+ submodule (mod_encap_u) s_u
+ contains
+   module procedure fu
+ !   the following statement should cause the compiler to
+ !   abort, pointing out a private component defined in
+ !   a USED module is 

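The parent-module test that the patch adds to set_syms_host_assoc (the strtok
calls on copies of gfc_new_block->name and sym->module) can be sketched outside
the compiler as follows.  This is only an illustration under assumptions: the
helper names top_level_module and same_parent_module are hypothetical, and it
assumes, as the patch does, that a submodule name has the form "parent.child".
Unlike strtok, this version does not skip leading dots.

```cpp
#include <string>

// Hypothetical helper: return the part of NAME before the first '.',
// or the whole name when it contains no dot.
static std::string top_level_module(const std::string &name)
{
  // find() returns npos when there is no dot; substr(0, npos) is the
  // whole string.
  return name.substr(0, name.find('.'));
}

// True when both dotted names share the same top-level (ancestor) module.
static bool same_parent_module(const std::string &a, const std::string &b)
{
  return top_level_module(a) == top_level_module(b);
}
```

With the names from the new test, the submodule "mod_encap_u.s_u" and the
module "mod_encap_u" compare equal, while "mod_encap_t" does not, which is the
case where the patch leaves the components PRIVATE.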
[Patch, fortran] PR79447 - [F08] gfortran rejects valid & accepts invalid internal subprogram in a submodule

2017-02-15 Thread Paul Richard Thomas
Dear All,

This patch is straightforward, verging on 'obvious'.

Bootstraps and regtests on FC23/x86_64 - OK for trunk and 6 branch?

Cheers

Paul

2017-02-15  Paul Thomas  

PR fortran/79447
* decl.c (gfc_set_constant_character_len): Whitespace.
(gfc_match_end): Catch case where a procedure is contained in
a module procedure and ensure that 'end procedure' is the
correct termination.

2017-02-15  Paul Thomas  

PR fortran/79447
* gfortran.dg/submodule_24.f08 : New test.
Index: gcc/fortran/decl.c
===
*** gcc/fortran/decl.c  (revision 245196)
--- gcc/fortran/decl.c  (working copy)
*** gfc_set_constant_character_len (int len,
*** 1499,1505 
  
if (expr->ts.type != BT_CHARACTER)
  return;
!  
if (expr->expr_type != EXPR_CONSTANT)
  {
gfc_error_now ("CHARACTER length must be a constant at %L", 
&expr->where);
--- 1499,1505 
  
if (expr->ts.type != BT_CHARACTER)
  return;
! 
if (expr->expr_type != EXPR_CONSTANT)
  {
gfc_error_now ("CHARACTER length must be a constant at %L", 
&expr->where);
*** gfc_match_end (gfc_statement *st)
*** 6756,6762 
match m;
gfc_namespace *parent_ns, *ns, *prev_ns;
gfc_namespace **nsp;
!   bool abreviated_modproc_decl;
bool got_matching_end = false;
  
old_loc = gfc_current_locus;
--- 6756,6762 
match m;
gfc_namespace *parent_ns, *ns, *prev_ns;
gfc_namespace **nsp;
!   bool abreviated_modproc_decl = false;
bool got_matching_end = false;
  
old_loc = gfc_current_locus;
*** gfc_match_end (gfc_statement *st)
*** 6780,6794 
state = gfc_state_stack->previous->state;
block_name = gfc_state_stack->previous->sym == NULL
 ? NULL : gfc_state_stack->previous->sym->name;
break;
  
  default:
break;
  }
  
!   abreviated_modproc_decl
!   = gfc_current_block ()
! && gfc_current_block ()->abr_modproc_decl;
  
switch (state)
  {
--- 6780,6796 
state = gfc_state_stack->previous->state;
block_name = gfc_state_stack->previous->sym == NULL
 ? NULL : gfc_state_stack->previous->sym->name;
+   abreviated_modproc_decl = gfc_state_stack->previous->sym
+   && gfc_state_stack->previous->sym->abr_modproc_decl;
break;
  
  default:
break;
  }
  
!   if (!abreviated_modproc_decl)
! abreviated_modproc_decl = gfc_current_block ()
! && gfc_current_block ()->abr_modproc_decl;
  
switch (state)
  {
Index: gcc/testsuite/gfortran.dg/submodule_24.f08
===
*** gcc/testsuite/gfortran.dg/submodule_24.f08  (nonexistent)
--- gcc/testsuite/gfortran.dg/submodule_24.f08  (working copy)
***
*** 0 
--- 1,23 
+ ! { dg-do compile }
+ ! Test the fix for PR79447, in which the END PROCEDURE statement
+ ! for MODULE PROCEDURE foo was not accepted.
+ !
+ ! Contributed by Damian Rouson  
+ !
+ module foo_interface
+   implicit none
+   interface
+ module subroutine foo()
+ end subroutine
+   end interface
+ end module foo_interface
+ 
+ submodule(foo_interface) foo_implementation
+ contains
+ module procedure foo
+ contains
+   module subroutine bar()
+   end subroutine
+ end procedure
+!end subroutine ! gfortran accepted this invalid workaround
+ end submodule


Re: [C++ RFC] Fix up attribute handling in templates (PR c++/79502)

2017-02-15 Thread Jakub Jelinek
On Wed, Feb 15, 2017 at 11:56:30AM -0500, Jason Merrill wrote:
> On Tue, Feb 14, 2017 at 3:13 PM, Jakub Jelinek  wrote:
> > The following testcase fails, because while we have the nodiscard
> > attribute on the template, we actually never propagate it to the
> > instantiation, which is where it is checked (I'm really surprised about
> > this).
> 
> Normally we propagate attributes when instantiating the class; see the
> call to apply_late_template_attributes in
> instantiate_class_template_1.  I'm not sure why that wouldn't be

Yes, instantiate_class_template_1 calls apply_late_template_attributes,
but that actually does nothing if there are no dependent attributes.
If there are any, it sets TYPE_ATTRIBUTES (or DECL_ATTRIBUTES) to
a copy of the attributes list, removes all the dependent attributes
from there and calls cplus_decl_attributes on the late attrs (after
tsubst_attribute them).  So setting {TYPE,DECL}_ATTRIBUTES to the
attributes list unmodified if there are no dependent ones matches
the behavior for non-dependent ones if there is at least one dependent.

So, does the following patch look better?

2017-02-15  Jakub Jelinek  

PR c++/79502
* pt.c (apply_late_template_attributes): If there are
no dependent attributes, set *p to attributes.

* g++.dg/cpp1z/nodiscard4.C: New test.

--- gcc/cp/pt.c.jj  2017-02-14 09:23:49.0 +0100
+++ gcc/cp/pt.c 2017-02-15 18:21:45.581055915 +0100
@@ -10113,6 +10113,8 @@ apply_late_template_attributes (tree *de
 
   cplus_decl_attributes (decl_p, late_attrs, attr_flags);
 }
+  else
+*p = attributes;
 }
 
 /* Perform (or defer) access check for typedefs that were referenced
--- gcc/testsuite/g++.dg/cpp1z/nodiscard4.C.jj  2017-02-15 18:11:33.281135469 +0100
+++ gcc/testsuite/g++.dg/cpp1z/nodiscard4.C 2017-02-15 18:11:33.281135469 +0100
@@ -0,0 +1,14 @@
+// PR c++/79502
+// { dg-do compile { target c++11 } }
+
+template
+struct [[nodiscard]] missiles {};
+
+missiles make() { return {}; }
+missiles (*fnptr)() = make;
+
+int main()
+{
+  make();  // { dg-warning "ignoring returned value of type" }
+  fnptr(); // { dg-warning "ignoring returned value of type" }
+}


Jakub
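
The behaviour described above for apply_late_template_attributes can be modelled
with a small stand-alone sketch.  Everything here is a toy: Attr, AttrList and
apply_late_attrs are hypothetical names, a std::vector stands in for the tree
attribute list, and the substitution step (tsubst_attribute followed by
cplus_decl_attributes) is not modelled at all.  The sketch only shows the effect
of the added else branch.

```cpp
#include <string>
#include <vector>

// Toy stand-in for an attribute; `dependent` marks attributes whose
// arguments depend on template parameters.
struct Attr
{
  std::string name;
  bool dependent;
};

using AttrList = std::vector<Attr>;

// Simplified model of the patched logic.  `target` plays the role of
// TYPE_ATTRIBUTES/DECL_ATTRIBUTES of the instantiation; `late` receives
// the dependent attributes that would still need substitution.
static void apply_late_attrs(AttrList &target, const AttrList &attributes,
                             AttrList &late)
{
  late.clear();
  for (const Attr &a : attributes)
    if (a.dependent)
      late.push_back(a);

  if (!late.empty())
    {
      // Keep a copy of the list with the dependent entries removed.
      target.clear();
      for (const Attr &a : attributes)
        if (!a.dependent)
          target.push_back(a);
    }
  else
    // The added else branch: nothing is dependent, so the instantiation
    // gets the attribute list unmodified.
    target = attributes;
}
```

Without the else branch, a list containing only a non-dependent attribute such
as nodiscard would leave target untouched, which corresponds to the missing
warning in the test case.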


Re: [AArch64] Accelerate -fstack-protector through pointer authentication extension

2017-02-15 Thread Jiong Wang



On 15/02/17 15:45, Richard Earnshaw (lists) wrote:

On 18/01/17 17:10, Jiong Wang wrote:

NOTE, this approach however requires a DWARF change, as the original LR is
signed; the binary needs a new libgcc to make sure C++ EH works correctly.
Given that this acceleration already requires the user to specify
-mstack-protector-dialect=pauth, the target platform should largely have the
new libgcc installed; otherwise you can't utilize the new pointer
authentication features.

gcc/
2016-11-11  Jiong Wang  

 * config/aarch64/aarch64-opts.h
(aarch64_stack_protector_type): New
 enum.
 (aarch64_layout_frame): Swap callees and locals when
 -mstack-protector-dialect=pauth specified.
 (aarch64_expand_prologue): Use AARCH64_PAUTH_SSP_OR_RA_SIGN
instead
 of AARCH64_ENABLE_RETURN_ADDRESS_SIGN.
 (aarch64_expand_epilogue): Likewise.
 * config/aarch64/aarch64.md (*do_return): Likewise.
 (aarch64_override_options): Sanity check for ILP32 and
TARGET_PAUTH.
 * config/aarch64/aarch64.h (AARCH64_PAUTH_SSP_OPTION,
AARCH64_PAUTH_SSP,
 AARCH64_PAUTH_SSP_OR_RA_SIGN, LINK_SSP_SPEC): New defines.
 * config/aarch64/aarch64.opt (-mstack-protector-dialect=): New
option.
 * doc/invoke.texi (AArch64 Options): Documents
 -mstack-protector-dialect=.


  Patch updated
to migrate to TARGET_STACK_PROTECT_RUNTIME_ENABLED_P.

aarch64 cross check OK with the following options enabled on all testcases.
 -fstack-protector-all -mstack-protector-pauth

OK for trunk?
 gcc/
2017-01-18  Jiong Wang  
* config/aarch64/aarch64-protos.h
 (aarch64_pauth_stack_protector_enabled): New declaration.
 * config/aarch64/aarch64.c (aarch64_layout_frame): Swap
callee-save area
 and locals area when aarch64_pauth_stack_protector_enabled
returns true.
 (aarch64_stack_protect_runtime_enabled): New function.
 (aarch64_pauth_stack_protector_enabled): New function.
 (aarch64_return_address_signing_enabled): Enabled by
 aarch64_pauth_stack_protector_enabled.
 (aarch64_override_options): Sanity check for
-mstack-protector-pauth.
 (TARGET_STACK_PROTECT_RUNTIME_ENABLED_P): Define.
 * config/aarch64/aarch64.h (LINK_SSP_SPEC): Likewise.
 * config/aarch64/aarch64.opt (-mstack-protector-pauth): New option.
 * doc/invoke.texi (AArch64 Options): Documents
-mstack-protector-pauth.

gcc/testsuite/
 * gcc.target/aarch64/stack_protector_1.c: New test.


1.patch


diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
632dd4768d82c340ae4e9b4a93206743756c06e7..a3ad623eef498d00b52d24bf02a5748fad576c3d
 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -383,6 +383,7 @@ void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, 
const_tree, rtx,
  void aarch64_init_expanders (void);
  void aarch64_init_simd_builtins (void);
  void aarch64_emit_call_insn (rtx);
+bool aarch64_pauth_stack_protector_enabled (void);
  void aarch64_register_pragmas (void);
  void aarch64_relayout_simd_types (void);
  void aarch64_reset_previous_fndecl (void);
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
3718ad1b3bf27c6bdb9e74831fd660e617cccbde..dd742d37ab6fc6fb5085e1c6b5d86d5ce1ce5f8a
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -958,4 +958,11 @@ extern const char *host_detect_local_cpu (int argc, const 
char **argv);
  extern tree aarch64_fp16_type_node;
  extern tree aarch64_fp16_ptr_type_node;
  
+#ifndef TARGET_LIBC_PROVIDES_SSP

+#define LINK_SSP_SPEC "%{!mstack-protector-pauth:\
+%{fstack-protector|fstack-protector-all\
+  |fstack-protector-strong|fstack-protector-explicit:\
+  -lssp_nonshared -lssp}}"
+#endif
+

I don't think we want to suppress this.  PAUTH-based stack protection
isn't an all-or-nothing solution.  What if some object files are built
with traditional -fstack-protector code?


I described this in the ping email (the changed subject line may have caused
trouble for email clients)

--
Code compiled with "-mstack-protector-pauth" can be linked with code compiled
without "-mstack-protector-pauth".  The only problem is that when
"-mstack-protector-pauth" is specified, "-lssp/-lssp_nonshared" is no longer
implied, because the software runtime support is not required any more.  So if
the user has some object files compiled with the default stack protector and
wants to link them with object files compiled with "-mstack-protector-pauth",
and "-mstack-protector-pauth" appears on the final command line with "gcc" used
as the linker driver, then "-lssp/-lssp_nonshared" needs to be specified
explicitly.
--

Generally, after 

Re: [PATCH PR79347]Maintain profile counter information in vect_do_peeling

2017-02-15 Thread Jan Hubicka
> On Tue, Feb 14, 2017 at 2:13 PM, Bin.Cheng  wrote:
> > On Tue, Feb 14, 2017 at 1:57 PM, Jan Hubicka  wrote:
> >>> Thanks,
> >>> bin
> >>> 2017-02-13  Bin Cheng  
> >>>
> >>>   PR tree-optimization/79347
> >>>   * tree-vect-loop-manip.c (apply_probability_for_bb): New function.
> >>>   (vect_do_peeling): Maintain profile counters during peeling.
> >>>
> >>> gcc/testsuite/ChangeLog
> >>> 2017-02-13  Bin Cheng  
> >>>
> >>>   PR tree-optimization/79347
> >>>   * gcc.dg/vect/pr79347.c: New test.
> >>
> >>> diff --git a/gcc/testsuite/gcc.dg/vect/pr79347.c 
> >>> b/gcc/testsuite/gcc.dg/vect/pr79347.c
> >>> new file mode 100644
> >>> index 000..586c638
> >>> --- /dev/null
> >>> +++ b/gcc/testsuite/gcc.dg/vect/pr79347.c
> >>> @@ -0,0 +1,13 @@
> >>> +/* { dg-do compile } */
> >>> +/* { dg-require-effective-target vect_int } */
> >>> +/* { dg-additional-options "-fdump-tree-vect-all" } */
> >>> +
> >>> +short *a;
> >>> +int c;
> >>> +void n(void)
> >>> +{
> >>> +  for (int i = 0; i<c;i++)
> >>> +    a[i]++;
> >>> +}
> >>
> >> Thanks for fixing the prologue.  I think there is still one extra problem 
> >> in the vectorizer.
> >> With the internal vectorized loop I now see:
> >>
> >> ;;   basic block 9, loop depth 1, count 0, freq 956, maybe hot
> >> ;;   Invalid sum of incoming frequencies 1961, should be 956
> >> ;;prev block 8, next block 10, flags: (NEW, REACHABLE, VISITED)
> >> ;;pred:   10 [100.0%]  (FALLTHRU,DFS_BACK,EXECUTABLE)
> >> ;;8 [100.0%]  (FALLTHRU)
> >>   # i_18 = PHI 
> >>   # vectp_a.13_66 = PHI 
> >>   # vectp_a.19_75 = PHI 
> >>   # ivtmp_78 = PHI 
> >>   _2 = (long unsigned int) i_18;
> >>   _3 = _2 * 2;
> >>   _4 = a.0_1 + _3;
> >>   vect__5.15_68 = MEM[(short int *)vectp_a.13_66];
> >>   _5 = *_4;
> >>   vect__6.16_69 = VIEW_CONVERT_EXPR >> short>(vect__5.15_68);
> >>   _6 = (unsigned short) _5;
> >>   vect__7.17_71 = vect__6.16_69 + vect_cst__70;
> >>   _7 = _6 + 1;
> >>   vect__8.18_72 = VIEW_CONVERT_EXPR(vect__7.17_71);
> >>   _8 = (short int) _7;
> >>   MEM[(short int *)vectp_a.19_75] = vect__8.18_72;
> >>   i_14 = i_18 + 1;
> >>   vectp_a.13_67 = vectp_a.13_66 + 16;
> >>   vectp_a.19_76 = vectp_a.19_75 + 16;
> >>   ivtmp_79 = ivtmp_78 + 1;
> >>   if (ivtmp_79 < bnd.10_59)
> >> goto ; [85.00%]
> >>   else
> >> goto ; [15.00%]
> >>
> >> So it seems that the frequency of the loop itself is unrealistically
> >> scaled down.
> >> Before vectorizing the frequency is 8500 and the predicted number of
> >> iterations is 6.6.  Now the loop is entered via BB 8 with frequency 1148,
> >> so the loop, by exit probability, exits with 15% probability and thus
> >> still has 6.6 iterations, but by BB frequencies its body executes fewer
> >> times than the preheader.
> >>
> >> Now this is a fragile area: vectorizing the loop should scale the number
> >> of iterations down 8 times.  However, guessed CFG profiles are always very
> >> "flat".  Of course, if the loop iterated 6.6 times on average, vectorizing
> >> would not make any sense.
> >> Making guessed profiles less flat is unrealistic, because the average loop
> >> iterates few times, but of course while vectorizing we make the additional
> >> guess that the vectorizable loops matter and the guessed profile is
> >> probably unrealistic.
> > That's what I mentioned in the original patch.  The vectorizer calls
> > scale_loop_profile in function vect_transform_loop to scale down the
> > loop's frequency regardless of the mismatch between the loop and the
> > preheader/exit basic blocks.  In fact, after this patch all mismatches
> > in the vectorizer are introduced by this.  I don't see any way to keep
> > consistency between the vectorized loop and the rest of the program
> > without visiting the whole CFG.  So shall we skip scaling down profile
> > counters for the vectorized loop?
> >
> >>
> >> GCC 6 seems however a bit more consistent.
> >>> +/* Apply probability PROB to basic block BB and its single succ edge.  */
> >>> +
> >>> +static void
> >>> +apply_probability_for_bb (basic_block bb, int prob)
> >>> +{
> >>> +  bb->frequency = apply_probability (bb->frequency, prob);
> >>> +  bb->count = apply_probability (bb->count, prob);
> >>> +  gcc_assert (single_succ_p (bb));
> >>> +  single_succ_edge (bb)->count = bb->count;
> >>> +}
> >>> +
> >>>  /* Function vect_do_peeling.
> >>>
> >>> Input:
> >>> @@ -1690,7 +1701,18 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree 
> >>> niters, tree nitersm1,
> >>>   may be preferred.  */
> >>>basic_block anchor = loop_preheader_edge (loop)->src;
> >>>if (skip_vector)
> >>> -split_edge (loop_preheader_edge (loop));
> >>> +{
> >>> +  split_edge (loop_preheader_edge (loop));
> >>> +
> >>> +  /* Due to the order in which we peel 

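The profile arithmetic discussed above (a 15% exit probability giving about 6.6
iterations, and apply_probability_for_bb scaling a block's frequency and count)
can be checked with a short sketch.  The names scale_by_probability and
expected_iterations are hypothetical; the base of 10000 mirrors GCC's
REG_BR_PROB_BASE and the round-to-nearest division is assumed to match what
apply_probability does, so treat both as illustration rather than as the
compiler's exact code.

```cpp
#include <cstdint>

// Assumed fixed-point base for probabilities (GCC's REG_BR_PROB_BASE is
// 10000; the value is used here for illustration).
constexpr std::int64_t kProbBase = 10000;

// Scale a frequency or count by a probability expressed in units of
// 1/kProbBase, rounding to nearest.
constexpr std::int64_t scale_by_probability(std::int64_t value,
                                            std::int64_t prob)
{
  return (value * prob + kProbBase / 2) / kProbBase;
}

// Expected number of iterations of a loop that is left with probability
// exit_prob on each iteration: 1 / exit_prob.
constexpr double expected_iterations(double exit_prob)
{
  return 1.0 / exit_prob;
}
```

For example, expected_iterations(0.15) is roughly 6.67, in line with the 6.6
iterations quoted in the dump discussion, and scaling a frequency of 8500 by a
probability of 1500 (15%) gives 1275.
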
[PATCH] PR 79356: Do not xfail attr-alloc_size-11.c on some targets.

2017-02-15 Thread Dominik Vogt
The attached patch removes the xfail on two subtests of
attr-alloc_size-11.c for the targets that were reported to produce
an XPASS:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79356

Tested on s390x biarch and s390.  Please check if the strings for
the other targets are correct.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/testsuite/ChangeLog-pr79356

PR 79356
* gcc.dg/attr-alloc_size-11.c: Remove xfail for aarch64, ia64*,
powerpc*, sparc* and s390*.
From 8486df212e3284e5fbdfb3f47bff59652e1e55a7 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Wed, 15 Feb 2017 17:39:07 +0100
Subject: [PATCH] PR 79356: Do not xfail attr-alloc_size-11.c on some
 targets.

---
 gcc/testsuite/gcc.dg/attr-alloc_size-11.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/attr-alloc_size-11.c 
b/gcc/testsuite/gcc.dg/attr-alloc_size-11.c
index fac8b18..077b725 100644
--- a/gcc/testsuite/gcc.dg/attr-alloc_size-11.c
+++ b/gcc/testsuite/gcc.dg/attr-alloc_size-11.c
@@ -45,9 +45,10 @@ typedef __SIZE_TYPE__size_t;
 return CAT (alloc_anti_range_, __LINE__)(n);   \
   } typedef void dummy   /* Require a semicolon.  */
 
-/* The following tests fail because of missing range information.  */
-TEST (signed char, SCHAR_MIN + 2, ALLOC_MAX);   /* { dg-warning "argument 1 
range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" "missing range info 
for signed char" { xfail *-*-* } } */
-TEST (short, SHRT_MIN + 2, ALLOC_MAX); /* { dg-warning "argument 1 range 
\\\[13, \[0-9\]+\\\] exceeds maximum object size 12" "missing range info for 
short" { xfail *-*-* } } */
+/* The following tests fail on some targets because of missing range
+   information.  */
+TEST (signed char, SCHAR_MIN + 2, ALLOC_MAX);   /* { dg-warning "argument 1 
range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" "missing range info 
for signed char" { xfail { ! "aarch64-*-* ia64*-*-* powerpc*-*-* s390*-*-* 
sparc*-*-*" } } } */
+TEST (short, SHRT_MIN + 2, ALLOC_MAX); /* { dg-warning "argument 1 range 
\\\[13, \[0-9\]+\\\] exceeds maximum object size 12" "missing range info for 
short" { xfail { ! "aarch64-*-* ia64*-*-* powerpc*-*-* s390*-*-* sparc*-*-*" } 
} } */
 
 TEST (int, INT_MIN + 2, ALLOC_MAX);/* { dg-warning "argument 1 range 
\\\[13, \[0-9\]+\\\] exceeds maximum object size 12" } */
 TEST (int, -3, ALLOC_MAX); /* { dg-warning "argument 1 range 
\\\[13, \[0-9\]+\\\] exceeds maximum object size 12" } */
-- 
2.3.0



Re: [C++ RFC] Fix up attribute handling in templates (PR c++/79502)

2017-02-15 Thread Jason Merrill
On Tue, Feb 14, 2017 at 3:13 PM, Jakub Jelinek  wrote:
> The following testcase fails, because while we have the nodiscard
> attribute on the template, we actually never propagate it to the
> instantiation, which is where it is checked (I'm really surprised about
> this).

Normally we propagate attributes when instantiating the class; see the
call to apply_late_template_attributes in
instantiate_class_template_1.  I'm not sure why that wouldn't be
happening here; are we calling maybe_warn_nodiscard before
instantiating the class?

> Unfortunately, this patch regresses
> FAIL: g++.dg/ext/visibility/template8.C  -std=gnu++{11,14,98}  scan-hidden 
> hidden[ \\t_]*_Z1gI1AI1BEEvT_
> It expects that the visibility attribute from the template never
> makes it to the implementation or something, is that correct?  Or do
> we need to handle visibility in some special way?

Visibility is handled specially; determine_visibility looks up the
visibility of the template if the specialization doesn't specify its
own visibility.

> Regarding the first hunk, it is just a wild guess, I couldn't trigger
> that code by make check-c++-all.  Is there a way to get it through
> some partial instantiation of scoped enum with/without attributes or
> something similar?

Hmm, not sure.

> Anyway, except for that template8.C the patch passed bootstrap/regtest
> on x86_64-linux and i686-linux.  But it really puzzles me that the
> attributes aren't instantiated, what happens e.g. with abi_tag
> attribute?

abi_tag and may_alias are applied specifically in lookup_template_class_1.

Jason


Re: [PATCH] avoid ICE when attempting to init a flexible array member (PR c++/79363)

2017-02-15 Thread Jason Merrill
OK.

On Mon, Feb 6, 2017 at 9:04 PM, Martin Sebor  wrote:
> The attached patch avoids another ICE (in addition to the already
> fixed bug 72775) in flexible array member NSDMI.  To avoid code
> duplication and for consistency I factored the diagnostic code
> out of perform_member_init and into a new helper.
>
> Martin


Re: [libgomp, testsuite] Enable libgomp.c/pr48591.c on __float128 targets

2017-02-15 Thread Jakub Jelinek
On Wed, Feb 15, 2017 at 05:28:48PM +0100, Rainer Orth wrote:
> When comparing x86_64-pc-linux-gnu and i386-pc-solaris2.12 testsuite
> results for PR rtl-optimization/79532, I noticed that
> libgomp.c/pr48591.c wasn't run on Solaris/x86.  The testcase contains a
> seemingly random list of targets to run on, but ISTM that it should
> really check for __float128 support instead.  This is what the following
> patch does.  Tested with the appropriate runtest invocations on
> i386-pc-solaris2.12 and x86_64-pc-linux-gnu (and currently showing the
> 32-bit ICEs on both).
> 
> Ok for mainline?
> 
>   Rainer
> 
> -- 
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
> 
> 
> 2017-02-15  Rainer Orth  
> 
>   * testsuite/libgomp.c/pr48591.c: Enable on all __float128
>   targets.
>   Add __float128 options.

Ok, thanks.

Jakub


[libgomp, testsuite] Enable libgomp.c/pr48591.c on __float128 targets

2017-02-15 Thread Rainer Orth
When comparing x86_64-pc-linux-gnu and i386-pc-solaris2.12 testsuite
results for PR rtl-optimization/79532, I noticed that
libgomp.c/pr48591.c wasn't run on Solaris/x86.  The testcase contains a
seemingly random list of targets to run on, but ISTM that it should
really check for __float128 support instead.  This is what the following
patch does.  Tested with the appropriate runtest invocations on
i386-pc-solaris2.12 and x86_64-pc-linux-gnu (and currently showing the
32-bit ICEs on both).

Ok for mainline?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2017-02-15  Rainer Orth  

* testsuite/libgomp.c/pr48591.c: Enable on all __float128
targets.
Add __float128 options.

# HG changeset patch
# Parent  dc8478f495c9b58e11289f1702244226513af059
Enable libgomp.c/pr48591.c on __float128 targets

diff --git a/libgomp/testsuite/libgomp.c/pr48591.c b/libgomp/testsuite/libgomp.c/pr48591.c
--- a/libgomp/testsuite/libgomp.c/pr48591.c
+++ b/libgomp/testsuite/libgomp.c/pr48591.c
@@ -1,6 +1,7 @@
 /* PR middle-end/48591 */
-/* { dg-do run { target i?86-*-linux* i?86-*-gnu* x86_64-*-linux* ia64-*-linux* x86_64-*-freebsd* } } */
+/* { dg-do run { target __float128 } } */
 /* { dg-options "-O0" } */
+/* { dg-add-options __float128 } */
 
 extern void abort (void);
 


Re: Patch ping^2

2017-02-15 Thread Jakub Jelinek
On Tue, Feb 14, 2017 at 10:05:04AM -0500, Nathan Sidwell wrote:
> On 02/13/2017 10:46 AM, Jakub Jelinek wrote:
> > Hi!
> > 
> > I'd like to ping a couple of patches:
> 
> > - C++ P1 PR79288 - wrong default TLS model for __thread static data members
> >   http://gcc.gnu.org/ml/gcc-patches/2017-01/msg02349.html
> 
> This is ok, but don't you think the changelog is misleading?  In your
> description you say it needs DECL_EXTERNAL set, but the changelog says
> 'inline', which isn't something static member vars have (although I can see
> how it's involved in DECL_EXTERNAL setting).

It is something static member vars can have (explicitly or implicitly).

In 6.x we had:
/* Even if there is an in-class initialization, DECL
   is considered undefined until an out-of-class
   definition is provided.  */
DECL_EXTERNAL (decl) = 1;

if (thread_p)
  {
CP_DECL_THREAD_LOCAL_P (decl) = true;
if (!processing_template_decl)
  set_decl_tls_model (decl, decl_default_tls_model (decl));
if (declspecs->gnu_thread_keyword_p)
  SET_DECL_GNU_TLS_P (decl);
  }
...
With the addition of C++17 inline vars I changed that to:
if (thread_p)
  {
CP_DECL_THREAD_LOCAL_P (decl) = true;
if (!processing_template_decl)
  set_decl_tls_model (decl, decl_default_tls_model (decl));
if (declspecs->gnu_thread_keyword_p)
  SET_DECL_GNU_TLS_P (decl);
  }
...

if (inlinep)
  mark_inline_variable (decl);

if (!DECL_VAR_DECLARED_INLINE_P (decl)
&& !(cxx_dialect >= cxx1z && constexpr_p))
  /* Even if there is an in-class initialization, DECL
 is considered undefined until an out-of-class
 definition is provided, unless this is an inline
 variable.  */
  DECL_EXTERNAL (decl) = 1;
because inline static data members, explicit or implicit, really shouldn't
be marked DECL_EXTERNAL, they have in-class definition rather than a mere
declaration.
The patch just changes that back to:
...
if (inlinep)
  mark_inline_variable (decl);

if (!DECL_VAR_DECLARED_INLINE_P (decl)
&& !(cxx_dialect >= cxx1z && constexpr_p))
  /* Even if there is an in-class initialization, DECL
 is considered undefined until an out-of-class
 definition is provided, unless this is an inline
 variable.  */
  DECL_EXTERNAL (decl) = 1;

if (thread_p)
  {
CP_DECL_THREAD_LOCAL_P (decl) = true;
if (!processing_template_decl)
  set_decl_tls_model (decl, decl_default_tls_model (decl));
if (declspecs->gnu_thread_keyword_p)
  SET_DECL_GNU_TLS_P (decl);
  }

Jakub
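
The ordering issue described above can be illustrated with a toy model.  This
is a sketch under loud assumptions: Decl, TlsModel and default_tls_model are
hypothetical stand-ins, and the rule "external means initial-exec, otherwise
local-exec" is a drastic simplification of what decl_default_tls_model really
considers.  The point is only that the default depends on a flag, so it has to
be computed after the flag is set.

```cpp
// Two TLS models are enough for the illustration.
enum class TlsModel { InitialExec, LocalExec };

// Toy declaration record; the field names are illustrative only.
struct Decl
{
  bool external = false;                    // stands in for DECL_EXTERNAL
  TlsModel tls_model = TlsModel::LocalExec;
};

// Hypothetical, heavily simplified default: a declaration that may be
// defined elsewhere cannot use the local-exec model.
static TlsModel default_tls_model(const Decl &d)
{
  return d.external ? TlsModel::InitialExec : TlsModel::LocalExec;
}

// Order used before the fix: the model is chosen before `external` is set.
static Decl declare_model_first()
{
  Decl d;
  d.tls_model = default_tls_model(d);
  d.external = true;
  return d;
}

// Order restored by the patch: mark external first, then choose the model.
static Decl declare_external_first()
{
  Decl d;
  d.external = true;
  d.tls_model = default_tls_model(d);
  return d;
}
```

In this model declare_model_first ends up with an external declaration that
still carries the local-exec model, which mirrors the wrong default reported
in the PR.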


Re: [C++ PATCH] Don't pedwarn on deprecated or fallthrough attributes (PR c++/79301)

2017-02-15 Thread Jason Merrill
OK.

On Fri, Feb 10, 2017 at 2:53 PM, Jakub Jelinek  wrote:
> Hi!
>
> The reporter complained that even when __has_cpp_attribute (fallthrough)
> and similarly __has_cpp_attribute (deprecated) return true, in pedantic
> mode the compiler will error out on those.
>
> The following patch just removes the pedwarn, i.e. we accept those
> attributes as extensions even in C++11 and C++14 modes.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2017-02-10  Jakub Jelinek  
>
> PR c++/79301
> * parser.c (cp_parser_std_attribute): Don't pedwarn about
> [[deprecated]] with -std=c++11 and [[fallthrough]] with
> -std=c++11 and -std=c++14.
>
> * g++.dg/cpp1y/feat-cxx11-neg.C: Remove (with pedwarn) from
> [[deprecated]] comment.
> * g++.dg/cpp1y/feat-cxx98-neg.C: Likewise.
> * g++.dg/cpp1y/feat-cxx11.C: Likewise.
> * g++.dg/cpp1y/attr-deprecated-neg.C: Don't expect warnings for
> [[deprecated]] in -std=c++11.
> * g++.dg/cpp0x/fallthrough2.C: Don't expect warnings for
> [[fallthrough]] in -std=c++11 and -std=c++14.
>
> --- gcc/cp/parser.c.jj  2017-02-09 23:01:49.0 +0100
> +++ gcc/cp/parser.c 2017-02-10 17:24:15.860948212 +0100
> @@ -24747,22 +24747,10 @@ cp_parser_std_attribute (cp_parser *pars
> TREE_PURPOSE (TREE_PURPOSE (attribute)) = get_identifier ("gnu");
>/* C++14 deprecated attribute is equivalent to GNU's.  */
>else if (is_attribute_p ("deprecated", attr_id))
> -   {
> - if (cxx_dialect == cxx11)
> -   pedwarn (token->location, OPT_Wpedantic,
> -"%<deprecated%> is a C++14 feature;"
> -" use %<gnu::deprecated%>");
> - TREE_PURPOSE (TREE_PURPOSE (attribute)) = get_identifier ("gnu");
> -   }
> +   TREE_PURPOSE (TREE_PURPOSE (attribute)) = get_identifier ("gnu");
>/* C++17 fallthrough attribute is equivalent to GNU's.  */
>else if (is_attribute_p ("fallthrough", attr_id))
> -   {
> - if (cxx_dialect < cxx1z)
> -   pedwarn (token->location, OPT_Wpedantic,
> -"%<fallthrough%> is a C++17 feature;"
> -" use %<gnu::fallthrough%>");
> - TREE_PURPOSE (TREE_PURPOSE (attribute)) = get_identifier ("gnu");
> -   }
> +   TREE_PURPOSE (TREE_PURPOSE (attribute)) = get_identifier ("gnu");
>/* Transactional Memory TS optimize_for_synchronized attribute is
>  equivalent to GNU transaction_callable.  */
>else if (is_attribute_p ("optimize_for_synchronized", attr_id))
> --- gcc/testsuite/g++.dg/cpp1y/feat-cxx11-neg.C.jj  2015-10-11 
> 19:11:09.0 +0200
> +++ gcc/testsuite/g++.dg/cpp1y/feat-cxx11-neg.C 2017-02-10 17:28:32.637566582 
> +0100
> @@ -57,7 +57,7 @@
>
>  //  C++14 attributes:
>
> -//  Attribute [[deprecated]] is allowed in C++11 as an extension (with 
> pedwarn).
> +//  Attribute [[deprecated]] is allowed in C++11 as an extension.
>  //#ifdef __has_cpp_attribute
>  //#  if __has_cpp_attribute(deprecated) == 201309
>  //#error "__has_cpp_attribute(deprecated)" // {  }
> --- gcc/testsuite/g++.dg/cpp1y/feat-cxx98-neg.C.jj  2016-07-07 
> 20:40:27.0 +0200
> +++ gcc/testsuite/g++.dg/cpp1y/feat-cxx98-neg.C 2017-02-10 17:29:46.592592629 
> +0100
> @@ -144,7 +144,7 @@
>
>  //  C++14 attributes:
>
> -//  Attribute [[deprecated]] is allowed in C++11 as an extension (with 
> pedwarn).
> +//  Attribute [[deprecated]] is allowed in C++11 as an extension.
>  //#ifdef __has_cpp_attribute
>  //#  if __has_cpp_attribute(deprecated) == 201309
>  //#error "__has_cpp_attribute(deprecated)" // {  }
> --- gcc/testsuite/g++.dg/cpp1y/feat-cxx11.C.jj  2016-11-03 22:03:27.0 
> +0100
> +++ gcc/testsuite/g++.dg/cpp1y/feat-cxx11.C 2017-02-10 17:28:45.356399081 
> +0100
> @@ -166,7 +166,7 @@
>
>  //  C++14 attributes:
>
> -//  Attribute [[deprecated]] is allowed in C++11 as an extension (with 
> pedwarn).
> +//  Attribute [[deprecated]] is allowed in C++11 as an extension.
>  #ifdef __has_cpp_attribute
>  #  if ! __has_cpp_attribute(deprecated)
>  #error "__has_cpp_attribute(deprecated)"
> --- gcc/testsuite/g++.dg/cpp1y/attr-deprecated-neg.C.jj 2014-10-10 
> 08:19:37.0 +0200
> +++ gcc/testsuite/g++.dg/cpp1y/attr-deprecated-neg.C2017-02-10 
> 17:32:22.592541551 +0100
> @@ -1,22 +1,22 @@
>  // { dg-do compile { target c++11_only } }
>  // { dg-options "-pedantic" }
>
> -class [[deprecated]] A // { dg-warning "'deprecated' is a C..14 feature" }
> +class [[deprecated]] A // { dg-bogus "'deprecated' is a C..14 feature" }
>  {
>  };
>
> -[[deprecated]] // { dg-warning "'deprecated' is a C..14 feature" }
> +[[deprecated]] // { dg-bogus "'deprecated' is a C..14 feature" }
>  int
>  foo(int n)
>  {
>return 42 + n;
>  }
>
> -class [[deprecated("B has been superceded by C")]] B // { dg-warning 
> "'deprecated' is a C..14 

Re: Handle GIMPLE NOPs in is_maybe_undefined (PR, tree-optimization/79529).

2017-02-15 Thread Aldy Hernandez

On 02/15/2017 09:49 AM, Martin Liška wrote:

Hi.

As mentioned in the PR, gimple nops are wrongly handled in is_maybe_undefined 
function.
Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.




  gimple *def = SSA_NAME_DEF_STMT (t);

+  if (!def || gimple_nop_p (def))
+   return true;
+



Out of curiosity, what is T?

Because we bail out early if ssa_undefined_value_p() returns true right
before calling SSA_NAME_DEF_STMT, and ssa_undefined_value_p() will already
bail out on gimple_nop_p.


So T must be one of:


  /* Parameters get their initial value from the function entry.  */
  else if (TREE_CODE (var) == PARM_DECL)
return false;
  /* When returning by reference the return address is actually a hidden
 parameter.  */
  else if (TREE_CODE (var) == RESULT_DECL && DECL_BY_REFERENCE (var))
return false;
  /* Hard register variables get their initial value from the ether.  */
  else if (VAR_P (var) && DECL_HARD_REGISTER (var))
return false;

which I wonder if we should special case in is_maybe_undefined.

Aldy


Re: [AArch64] Accelerate -fstack-protector through pointer authentication extension

2017-02-15 Thread Richard Earnshaw (lists)
On 18/01/17 17:10, Jiong Wang wrote:
>> NOTE, this approach however requires a DWARF change, as the original LR is
>> signed; the binary needs a new libgcc to make sure C++ EH works correctly.
>> Given that this acceleration already requires the user to specify
>> -mstack-protector-dialect=pauth, the target platform should largely have
>> the new libgcc installed; otherwise you can't utilize the new pointer
>> authentication features.
>>
>> gcc/
>> 2016-11-11  Jiong Wang  
>>
>> * config/aarch64/aarch64-opts.h
>> (aarch64_stack_protector_type): New
>> enum.
>> (aarch64_layout_frame): Swap callees and locals when
>> -mstack-protector-dialect=pauth specified.
>> (aarch64_expand_prologue): Use AARCH64_PAUTH_SSP_OR_RA_SIGN
>> instead
>> of AARCH64_ENABLE_RETURN_ADDRESS_SIGN.
>> (aarch64_expand_epilogue): Likewise.
>> * config/aarch64/aarch64.md (*do_return): Likewise.
>> (aarch64_override_options): Sanity check for ILP32 and
>> TARGET_PAUTH.
>> * config/aarch64/aarch64.h (AARCH64_PAUTH_SSP_OPTION,
>> AARCH64_PAUTH_SSP,
>> AARCH64_PAUTH_SSP_OR_RA_SIGN, LINK_SSP_SPEC): New defines.
>> * config/aarch64/aarch64.opt (-mstack-protector-dialect=): New
>> option.
>> * doc/invoke.texi (AArch64 Options): Documents
>> -mstack-protector-dialect=.
>>
>  Patch updated
> to migrate to TARGET_STACK_PROTECT_RUNTIME_ENABLED_P.
> 
> aarch64 cross check OK with the following options enabled on all testcases.
> -fstack-protector-all -mstack-protector-pauth
> 
> OK for trunk?
> gcc/
> 2017-01-18  Jiong Wang  
>* config/aarch64/aarch64-protos.h
> (aarch64_pauth_stack_protector_enabled): New declaration.
> * config/aarch64/aarch64.c (aarch64_layout_frame): Swap
> callee-save area
> and locals area when aarch64_pauth_stack_protector_enabled
> returns true.
> (aarch64_stack_protect_runtime_enabled): New function.
> (aarch64_pauth_stack_protector_enabled): New function.
> (aarch64_return_address_signing_enabled): Enabled by
> aarch64_pauth_stack_protector_enabled.
> (aarch64_override_options): Sanity check for
> -mstack-protector-pauth.
> (TARGET_STACK_PROTECT_RUNTIME_ENABLED_P): Define.
> * config/aarch64/aarch64.h (LINK_SSP_SPEC): Likewise.
> * config/aarch64/aarch64.opt (-mstack-protector-pauth): New option.
> * doc/invoke.texi (AArch64 Options): Documents
> -mstack-protector-pauth.
> 
> gcc/testsuite/
> * gcc.target/aarch64/stack_protector_1.c: New test.
> 
> 
> 1.patch
> 
> 
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 
> 632dd4768d82c340ae4e9b4a93206743756c06e7..a3ad623eef498d00b52d24bf02a5748fad576c3d
>  100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -383,6 +383,7 @@ void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, 
> const_tree, rtx,
>  void aarch64_init_expanders (void);
>  void aarch64_init_simd_builtins (void);
>  void aarch64_emit_call_insn (rtx);
> +bool aarch64_pauth_stack_protector_enabled (void);
>  void aarch64_register_pragmas (void);
>  void aarch64_relayout_simd_types (void);
>  void aarch64_reset_previous_fndecl (void);
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 
> 3718ad1b3bf27c6bdb9e74831fd660e617cccbde..dd742d37ab6fc6fb5085e1c6b5d86d5ce1ce5f8a
>  100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -958,4 +958,11 @@ extern const char *host_detect_local_cpu (int argc, 
> const char **argv);
>  extern tree aarch64_fp16_type_node;
>  extern tree aarch64_fp16_ptr_type_node;
>  
> +#ifndef TARGET_LIBC_PROVIDES_SSP
> +#define LINK_SSP_SPEC "%{!mstack-protector-pauth:\
> +  %{fstack-protector|fstack-protector-all\
> +|fstack-protector-strong|fstack-protector-explicit:\
> +-lssp_nonshared -lssp}}"
> +#endif
> +

I don't think we want to suppress this.  PAUTH-based stack protection
isn't an all-or-nothing solution.  What if some object files are built
with traditional -fstack-protector code?

If the library isn't referenced by any of the input objects we won't
pull anything useful in from the library, so leaving it in the link list
should be harmless.


>  #endif /* GCC_AARCH64_H */
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 6451b08191cf1a44aed502930da8603111f6e8ca..461f7b59584af9315accaecc0256abc9a2df4350
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -2884,8 +2884,28 @@ aarch64_layout_frame (void)
>else if (cfun->machine->frame.wb_candidate1 != INVALID_REGNUM)
>  max_push_offset = 256;
>  
> -  

Re: [RFC PATCH] Improve switchconv optimization (PR tree-optimization/79472)

2017-02-15 Thread Jeff Law

On 02/15/2017 04:51 AM, Jakub Jelinek wrote:

On Wed, Feb 15, 2017 at 12:46:44PM +0100, Richard Biener wrote:

Possibly, but for GCC 8.


To both this switchconv patch and the potential improvement for loading
from const arrays (can create an enhancement PR for that), or just the
latter?


Both I think - the patch is pretty big.


Ok, I'll queue the patch for GCC8 then.


 Maybe we can instead make early
threading not mess this up?


Maybe, but not planning to do that myself, my knowledge about jump threading
is too limited.

The problem is at the point where we thread all we see is this:

  # s_1 = PHI <"foo"(2), "bar"(3), "spam"(4), 0B(5)>
:
  if (s_1 == 0B)
goto ;
  else
goto ;

Nothing more is needed for jump threading to do its job.  This doesn't 
look any different to the threader than any other simple jump threading 
opportunity.


I guess we could look to see if the PHI is the join point for a switch, 
but that seems rather hacky.
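
To make that concrete, here is a minimal sketch of C source that could 
plausibly lower to a PHI of that shape.  It is a hypothetical illustration, 
not the testcase from PR 79472; the function name kind_name and the 
"unknown" fallback are assumptions made up for the example.

```c
#include <stddef.h>

/* Hypothetical example: a switch whose arms assign string constants
   (or a null pointer) to S, followed by a null test at the join point.  */
const char *
kind_name (int kind)
{
  const char *s;

  switch (kind)
    {
    case 0:
      s = "foo";
      break;
    case 1:
      s = "bar";
      break;
    case 2:
      s = "spam";
      break;
    default:
      s = NULL;
      break;
    }

  /* At this point S is the merge of three string constants and a null
     pointer, one per incoming edge.  */
  if (s == NULL)
    return "unknown";
  return s;
}
```

On each incoming edge the value of s is a known constant, so the outcome 
of the null test is known per edge, which is why the block looks like an 
ordinary threading opportunity.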


jeff


Re: C PATCH to fix ICE with -Wdouble-promotion (PR c/79515)

2017-02-15 Thread Bernd Schmidt

On 02/15/2017 12:49 PM, Marek Polacek wrote:

We ICEd on this testcase in do_warn_double_promotion because an invalid
conversion had produced an error result type and accessing that via
TYPE_MAIN_VARIANT crashes.  Fixed in an obvious way.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-02-15  Marek Polacek  

PR c/79515
* c-warn.c (do_warn_double_promotion): Don't warn if an invalid
conversion has occurred.

* gcc.dg/dfp/pr79515.c: New.


Ok.


Bernd




Re: [PATCH][AArch64] Accept more addressing modes for PRFM

2017-02-15 Thread Richard Earnshaw (lists)
On 15/02/17 15:03, Kyrill Tkachov wrote:
> Hi Richard,
> 
> On 15/02/17 15:00, Richard Earnshaw (lists) wrote:
>> On 03/02/17 17:12, Kyrill Tkachov wrote:
>>> Hi all,
>>>
>>> While evaluating Maxim's SW prefetch patches [1] I noticed that the
>>> aarch64 prefetch pattern is
>>> overly restrictive in its address operand. It only accepts simple
>>> register addressing modes.
>>> In fact, the PRFM instruction accepts almost all modes that a normal
>>> 64-bit LDR supports.
>>> The restriction in the pattern leads to explicit address calculation
>>> code to be emitted which we could avoid.
>>>
>>> This patch relaxes the restrictions on the prefetch define_insn. It
>>> creates a predicate and constraint that
>>> allow the full addressing modes that PRFM allows. Thus for the testcase
>>> in the patch (adapted from one of the existing
>>> __builtin_prefetch tests in the testsuite) we can generate a:
>>> prfm    PLDL1STRM, [x1, 8]
>>>
>>> instead of the current
>>> prfm    PLDL1STRM, [x1]
>>> with an explicit increment of x1 by 8 in a separate instruction.
>>>
>>> I've removed the %a output modifier in the output template and wrapped
>>> the address operand into a DImode MEM before
>>> passing it down to aarch64_print_operand.
>>>
>>> This is because operand 0 is an address operand rather than a memory
>>> operand and thus doesn't have a mode associated
>>> with it.  When processing the 'a' output modifier the code in final.c
>>> will call TARGET_PRINT_OPERAND_ADDRESS with a VOIDmode
>>> argument.  This will ICE on aarch64 because we need a mode for the
>>> memory in order for aarch64_classify_address to work
>>> correctly.  Rather than overriding the VOIDmode in
>>> aarch64_print_operand_address I decided to instead create the DImode
>>> MEM in the "prefetch" output template and treat it as a normal 64-bit
>>> memory address, which at the point of assembly output
>>> is what it is anyway.
>>>
>>> With this patch I see a reduction in instruction count in the SPEC2006
>>> benchmarks when SW prefetching is enabled on top
>>> of Maxim's patchset because fewer address calculation instructions are
>>> emitted due to the use of the more expressive
>>> addressing modes. It also fixes a performance regression that I observed
>>> in 410.bwaves from Maxim's patches on Cortex-A72.
>>> I'll be running a full set of benchmarks to evaluate this further, but I
>>> think this is the right thing to do.
>>>
>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>>
>>> Maxim, do you want to try this on top of your patches on your hardware
>>> to see if it helps with the regressions you mentioned?
>>>
>>> Thanks,
>>> Kyrill
>>>
>>>
>>> [1] https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02284.html
>>>
>>> 2016-02-03  Kyrylo Tkachov  
>>>
>>>  * config/aarch64/aarch64.md (prefetch): Adjust predicate and
>>>  constraint on operand 0 to allow more general addressing modes.
>>>  Adjust output template.
>>>  * config/aarch64/aarch64.c (aarch64_address_valid_for_prefetch_p):
>>>  New function.
>>>  * config/aarch64/aarch64-protos.h
>>>  (aarch64_address_valid_for_prefetch_p): Declare prototype.
>>>  * config/aarch64/constraints.md (Dp): New address constraint.
>>>  * config/aarch64/predicates.md (aarch64_prefetch_operand): New
>>>  predicate.
>>>
>>> 2016-02-03  Kyrylo Tkachov  
>>>
>>>  * gcc.target/aarch64/prfm_imm_offset_1.c: New test.
>>>
>>> aarch64-prfm-imm.patch
>>>
>> Hmm, I'm not sure about this.  rtl.texi says that a prefetch code
>> contains an address, not a MEM.  So it's theoretically possible for
>> generic code to want to look inside the first operand and find an
>> address directly.  This change would break that assumption.
> 
> With this change the prefetch operand is still an address, not a MEM
> during all the
> optimisation passes.
> It's wrapped in a MEM only during the ultimate printing of the assembly
> string
> during 'final'.
> 

Ah!  I'd missed that.

This is OK for stage1.

R.

> Kyrill
> 
>> R.
>>
>>> commit a324e2f2ea243fe42f23a026ecbe1435876e2c8b
>>> Author: Kyrylo Tkachov 
>>> Date:   Thu Feb 2 14:46:11 2017 +
>>>
>>>  [AArch64] Accept more addressing modes for PRFM
>>>
>>> diff --git a/gcc/config/aarch64/aarch64-protos.h
>>> b/gcc/config/aarch64/aarch64-protos.h
>>> index babc327..61706de 100644
>>> --- a/gcc/config/aarch64/aarch64-protos.h
>>> +++ b/gcc/config/aarch64/aarch64-protos.h
>>> @@ -300,6 +300,7 @@ extern struct tune_params aarch64_tune_params;
>>> HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned,
>>> unsigned);
>>>   int aarch64_get_condition_code (rtx);
>>> +bool aarch64_address_valid_for_prefetch_p (rtx, bool);
>>>   bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
>>>   unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
>>>   unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
>>> diff --git 

Re: [PATCH PR79347]Maintain profile counter information in vect_do_peeling

2017-02-15 Thread Bin.Cheng
On Tue, Feb 14, 2017 at 2:13 PM, Bin.Cheng  wrote:
> On Tue, Feb 14, 2017 at 1:57 PM, Jan Hubicka  wrote:
>>> Thanks,
>>> bin
>>> 2017-02-13  Bin Cheng  
>>>
>>>   PR tree-optimization/79347
>>>   * tree-vect-loop-manip.c (apply_probability_for_bb): New function.
>>>   (vect_do_peeling): Maintain profile counters during peeling.
>>>
>>> gcc/testsuite/ChangeLog
>>> 2017-02-13  Bin Cheng  
>>>
>>>   PR tree-optimization/79347
>>>   * gcc.dg/vect/pr79347.c: New test.
>>
>>> diff --git a/gcc/testsuite/gcc.dg/vect/pr79347.c 
>>> b/gcc/testsuite/gcc.dg/vect/pr79347.c
>>> new file mode 100644
>>> index 000..586c638
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/vect/pr79347.c
>>> @@ -0,0 +1,13 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-require-effective-target vect_int } */
>>> +/* { dg-additional-options "-fdump-tree-vect-all" } */
>>> +
>>> +short *a;
>>> +int c;
>>> +void n(void)
>>> +{
>>> +  for (int i = 0; i<c;i++)
>>> +    a[i]++;
>>> +}
>>
>> Thanks for fixing the prologue.  I think there is still one extra problem in 
>> the vectorizer.
>> With the internal vectorized loop I now see:
>>
>> ;;   basic block 9, loop depth 1, count 0, freq 956, maybe hot
>> ;;   Invalid sum of incoming frequencies 1961, should be 956
>> ;;prev block 8, next block 10, flags: (NEW, REACHABLE, VISITED)
>> ;;pred:   10 [100.0%]  (FALLTHRU,DFS_BACK,EXECUTABLE)
>> ;;8 [100.0%]  (FALLTHRU)
>>   # i_18 = PHI 
>>   # vectp_a.13_66 = PHI 
>>   # vectp_a.19_75 = PHI 
>>   # ivtmp_78 = PHI 
>>   _2 = (long unsigned int) i_18;
>>   _3 = _2 * 2;
>>   _4 = a.0_1 + _3;
>>   vect__5.15_68 = MEM[(short int *)vectp_a.13_66];
>>   _5 = *_4;
>>   vect__6.16_69 = VIEW_CONVERT_EXPR(vect__5.15_68);
>>   _6 = (unsigned short) _5;
>>   vect__7.17_71 = vect__6.16_69 + vect_cst__70;
>>   _7 = _6 + 1;
>>   vect__8.18_72 = VIEW_CONVERT_EXPR(vect__7.17_71);
>>   _8 = (short int) _7;
>>   MEM[(short int *)vectp_a.19_75] = vect__8.18_72;
>>   i_14 = i_18 + 1;
>>   vectp_a.13_67 = vectp_a.13_66 + 16;
>>   vectp_a.19_76 = vectp_a.19_75 + 16;
>>   ivtmp_79 = ivtmp_78 + 1;
>>   if (ivtmp_79 < bnd.10_59)
>> goto ; [85.00%]
>>   else
>> goto ; [15.00%]
>>
>> So it seems that the frequency of the loop itself is unrealistically scaled
>> down.
>> Before vectorizing the frequency is 8500 and the predicted number of
>> iterations is 6.6.  Now the loop is entered via BB 8 with frequency 1148,
>> so the loop, by exit probability, exits with 15% probability and thus still
>> has 6.6 iterations,
>> but by BB frequencies its body executes fewer times than the preheader.
>>
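
As a rough cross-check of the numbers in the dump above, here is a small
sketch of the arithmetic in C.  The function names are hypothetical and the
model is an assumption (a single entry edge, a single back edge, and an exit
test taken with a fixed probability on every iteration); it is not how GCC
computes its profile.

```c
/* Expected number of header executions per loop entry when the exit edge
   is taken with probability EXIT_PROB on each iteration.  */
double
expected_iterations (double exit_prob)
{
  return 1.0 / exit_prob;
}

/* Sum of the frequencies flowing into the loop header: the entry edge
   plus the back edge, where the back edge carries BACKEDGE_PROB of the
   header's own frequency.  */
double
incoming_frequency (double entry_freq, double header_freq,
                    double backedge_prob)
{
  return entry_freq + header_freq * backedge_prob;
}

/* Header frequency that would be consistent with ENTRY_FREQ and the
   given exit probability under the model above.  */
double
consistent_header_frequency (double entry_freq, double exit_prob)
{
  return entry_freq * expected_iterations (exit_prob);
}
```

With an entry frequency of 1148, a header frequency of 956 and a back edge
probability of 0.85, incoming_frequency gives about 1961, which is the
"invalid sum" the dump reports.  Under the same model a 15% exit probability
corresponds to roughly 6.7 iterations and a header frequency of about 7653.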
>> Now this is a fragile area: vectorizing the loop should scale the number
>> of iterations down 8 times. However guessed CFG profiles are always very
>> "flat". Of course if the loop iterated 6.6 times on average, vectorizing
>> would not make any sense.
>> Making guessed profiles less flat is unrealistic, because the average loop
>> iterates few times,
>> but of course while vectorizing we make the additional guess that the
>> vectorizable loops
>> matter and the guessed profile is probably unrealistic.
> That's what I mentioned in the original patch.  The vectorizer calls
> scale_loop_profile in
> function vect_transform_loop to scale down the loop's frequency regardless
> of the mismatch
> between the loop and the preheader/exit basic blocks.  In fact, after this
> patch all mismatches
> in the vectorizer are introduced by this.  I don't see any way to keep
> consistency between
> the vectorized loop and the rest of the program without visiting the whole
> CFG.  So shall we skip
> scaling down profile counters for the vectorized loop?
>
>>
>> GCC 6 however seems a bit more consistent.
>>> +/* Apply probability PROB to basic block BB and its single succ edge.  */
>>> +
>>> +static void
>>> +apply_probability_for_bb (basic_block bb, int prob)
>>> +{
>>> +  bb->frequency = apply_probability (bb->frequency, prob);
>>> +  bb->count = apply_probability (bb->count, prob);
>>> +  gcc_assert (single_succ_p (bb));
>>> +  single_succ_edge (bb)->count = bb->count;
>>> +}
>>> +
>>>  /* Function vect_do_peeling.
>>>
>>> Input:
>>> @@ -1690,7 +1701,18 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree 
>>> niters, tree nitersm1,
>>>   may be preferred.  */
>>>basic_block anchor = loop_preheader_edge (loop)->src;
>>>if (skip_vector)
>>> -split_edge (loop_preheader_edge (loop));
>>> +{
>>> +  split_edge (loop_preheader_edge (loop));
>>> +
>>> +  /* Due to the order in which we peel prolog and epilog, we first
>>> +  propagate probability to the whole loop.  The purpose is to
>>> +  avoid adjusting probabilities of both prolog and vector loops
>>> +  separately.  Note in this case, the probability of epilog loop
>>> +  needs to be 

Re: [PATCH][AArch64] Accept more addressing modes for PRFM

2017-02-15 Thread Kyrill Tkachov

Hi Richard,

On 15/02/17 15:00, Richard Earnshaw (lists) wrote:

On 03/02/17 17:12, Kyrill Tkachov wrote:

Hi all,

While evaluating Maxim's SW prefetch patches [1] I noticed that the
aarch64 prefetch pattern is
overly restrictive in its address operand. It only accepts simple
register addressing modes.
In fact, the PRFM instruction accepts almost all modes that a normal
64-bit LDR supports.
The restriction in the pattern leads to explicit address calculation
code to be emitted which we could avoid.

This patch relaxes the restrictions on the prefetch define_insn. It
creates a predicate and constraint that
allow the full addressing modes that PRFM allows. Thus for the testcase
in the patch (adapted from one of the existing
__builtin_prefetch tests in the testsuite) we can generate a:
prfm    PLDL1STRM, [x1, 8]

instead of the current
prfm    PLDL1STRM, [x1]
with an explicit increment of x1 by 8 in a separate instruction.
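
For illustration only, a minimal sketch of the kind of source this affects.
This is not the testcase from the patch; the function name
load_and_prefetch_next is a hypothetical one chosen for the example, and the
expected code generation is an assumption for an LP64 AArch64 target at -O2.

```c
/* Hypothetical example: prefetch the element after *P for reading with
   no temporal locality (second and third arguments both 0), then return
   the current element.  With 8-byte long, the prefetched address is P
   plus an immediate offset of 8.  */
long
load_and_prefetch_next (const long *p)
{
  __builtin_prefetch (p + 1, 0, 0);
  return *p;
}
```

With the relaxed pattern one would hope the prefetch address folds into a
single PRFM with an immediate offset instead of needing a separate add.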

I've removed the %a output modifier in the output template and wrapped
the address operand into a DImode MEM before
passing it down to aarch64_print_operand.

This is because operand 0 is an address operand rather than a memory
operand and thus doesn't have a mode associated
with it.  When processing the 'a' output modifier the code in final.c
will call TARGET_PRINT_OPERAND_ADDRESS with a VOIDmode
argument.  This will ICE on aarch64 because we need a mode for the
memory in order for aarch64_classify_address to work
correctly.  Rather than overriding the VOIDmode in
aarch64_print_operand_address I decided to instead create the DImode
MEM in the "prefetch" output template and treat it as a normal 64-bit
memory address, which at the point of assembly output
is what it is anyway.

With this patch I see a reduction in instruction count in the SPEC2006
benchmarks when SW prefetching is enabled on top
of Maxim's patchset because fewer address calculation instructions are
emitted due to the use of the more expressive
addressing modes. It also fixes a performance regression that I observed
in 410.bwaves from Maxim's patches on Cortex-A72.
I'll be running a full set of benchmarks to evaluate this further, but I
think this is the right thing to do.

Bootstrapped and tested on aarch64-none-linux-gnu.

Maxim, do you want to try this on top of your patches on your hardware
to see if it helps with the regressions you mentioned?

Thanks,
Kyrill


[1] https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02284.html

2016-02-03  Kyrylo Tkachov  

 * config/aarch64/aarch64.md (prefetch): Adjust predicate and
 constraint on operand 0 to allow more general addressing modes.
 Adjust output template.
 * config/aarch64/aarch64.c (aarch64_address_valid_for_prefetch_p):
 New function.
 * config/aarch64/aarch64-protos.h
 (aarch64_address_valid_for_prefetch_p): Declare prototype.
 * config/aarch64/constraints.md (Dp): New address constraint.
 * config/aarch64/predicates.md (aarch64_prefetch_operand): New
 predicate.

2016-02-03  Kyrylo Tkachov  

 * gcc.target/aarch64/prfm_imm_offset_1.c: New test.

aarch64-prfm-imm.patch


Hmm, I'm not sure about this.  rtl.texi says that a prefetch code
contains an address, not a MEM.  So it's theoretically possible for
generic code to want to look inside the first operand and find an
address directly.  This change would break that assumption.


With this change the prefetch operand is still an address, not a MEM, during
all the optimisation passes.
It's wrapped in a MEM only during the ultimate printing of the assembly string
during 'final'.

Kyrill


R.


commit a324e2f2ea243fe42f23a026ecbe1435876e2c8b
Author: Kyrylo Tkachov 
Date:   Thu Feb 2 14:46:11 2017 +

 [AArch64] Accept more addressing modes for PRFM

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index babc327..61706de 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -300,6 +300,7 @@ extern struct tune_params aarch64_tune_params;
  
  HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);

  int aarch64_get_condition_code (rtx);
+bool aarch64_address_valid_for_prefetch_p (rtx, bool);
  bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
  unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
  unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index acc093a..c05eff3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4549,6 +4549,24 @@ aarch64_classify_address (struct aarch64_address_info 
*info,
  }
  }
  
+/* Return true if the address X is valid for a PRFM instruction.

+   STRICT_P is true if we should do strict checking with
+   aarch64_classify_address.  */
+
+bool
+aarch64_address_valid_for_prefetch_p (rtx x, bool strict_p)
+{
+  struct 

Re: [PATCH][AArch64] Accept more addressing modes for PRFM

2017-02-15 Thread Richard Earnshaw (lists)
On 03/02/17 17:12, Kyrill Tkachov wrote:
> Hi all,
> 
> While evaluating Maxim's SW prefetch patches [1] I noticed that the
> aarch64 prefetch pattern is
> overly restrictive in its address operand. It only accepts simple
> register addressing modes.
> In fact, the PRFM instruction accepts almost all modes that a normal
> 64-bit LDR supports.
> The restriction in the pattern leads to explicit address calculation
> code to be emitted which we could avoid.
> 
> This patch relaxes the restrictions on the prefetch define_insn. It
> creates a predicate and constraint that
> allow the full addressing modes that PRFM allows. Thus for the testcase
> in the patch (adapted from one of the existing
> __builtin_prefetch tests in the testsuite) we can generate a:
> prfm    PLDL1STRM, [x1, 8]
> 
> instead of the current
> prfm    PLDL1STRM, [x1]
> with an explicit increment of x1 by 8 in a separate instruction.
> 
> I've removed the %a output modifier in the output template and wrapped
> the address operand into a DImode MEM before
> passing it down to aarch64_print_operand.
> 
> This is because operand 0 is an address operand rather than a memory
> operand and thus doesn't have a mode associated
> with it.  When processing the 'a' output modifier the code in final.c
> will call TARGET_PRINT_OPERAND_ADDRESS with a VOIDmode
> argument.  This will ICE on aarch64 because we need a mode for the
> memory in order for aarch64_classify_address to work
> correctly.  Rather than overriding the VOIDmode in
> aarch64_print_operand_address I decided to instead create the DImode
> MEM in the "prefetch" output template and treat it as a normal 64-bit
> memory address, which at the point of assembly output
> is what it is anyway.
> 
> With this patch I see a reduction in instruction count in the SPEC2006
> benchmarks when SW prefetching is enabled on top
> of Maxim's patchset because fewer address calculation instructions are
> emitted due to the use of the more expressive
> addressing modes. It also fixes a performance regression that I observed
> in 410.bwaves from Maxim's patches on Cortex-A72.
> I'll be running a full set of benchmarks to evaluate this further, but I
> think this is the right thing to do.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Maxim, do you want to try this on top of your patches on your hardware
> to see if it helps with the regressions you mentioned?
> 
> Thanks,
> Kyrill
> 
> 
> [1] https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02284.html
> 
> 2016-02-03  Kyrylo Tkachov  
> 
> * config/aarch64/aarch64.md (prefetch): Adjust predicate and
> constraint on operand 0 to allow more general addressing modes.
> Adjust output template.
> * config/aarch64/aarch64.c (aarch64_address_valid_for_prefetch_p):
> New function.
> * config/aarch64/aarch64-protos.h
> (aarch64_address_valid_for_prefetch_p): Declare prototype.
> * config/aarch64/constraints.md (Dp): New address constraint.
> * config/aarch64/predicates.md (aarch64_prefetch_operand): New
> predicate.
> 
> 2016-02-03  Kyrylo Tkachov  
> 
> * gcc.target/aarch64/prfm_imm_offset_1.c: New test.
> 
> aarch64-prfm-imm.patch
> 

Hmm, I'm not sure about this.  rtl.texi says that a prefetch code
contains an address, not a MEM.  So it's theoretically possible for
generic code to want to look inside the first operand and find an
address directly.  This change would break that assumption.

R.

> 
> commit a324e2f2ea243fe42f23a026ecbe1435876e2c8b
> Author: Kyrylo Tkachov 
> Date:   Thu Feb 2 14:46:11 2017 +
> 
> [AArch64] Accept more addressing modes for PRFM
> 
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index babc327..61706de 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -300,6 +300,7 @@ extern struct tune_params aarch64_tune_params;
>  
>  HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
>  int aarch64_get_condition_code (rtx);
> +bool aarch64_address_valid_for_prefetch_p (rtx, bool);
>  bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
>  unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
>  unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index acc093a..c05eff3 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -4549,6 +4549,24 @@ aarch64_classify_address (struct aarch64_address_info 
> *info,
>  }
>  }
>  
> +/* Return true if the address X is valid for a PRFM instruction.
> +   STRICT_P is true if we should do strict checking with
> +   aarch64_classify_address.  */
> +
> +bool
> +aarch64_address_valid_for_prefetch_p (rtx x, bool strict_p)
> +{
> +  struct aarch64_address_info addr;
> +
> +  /* PRFM accepts the same addresses as DImode...  */

Handle GIMPLE NOPs in is_maybe_undefined (PR tree-optimization/79529).

2017-02-15 Thread Martin Liška
Hi.

As mentioned in the PR, GIMPLE NOPs are handled wrongly in the
is_maybe_undefined function.
The patch bootstraps on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
From 54b98e2d035f92ec20bf7b548f90b1d00c4c597b Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 15 Feb 2017 13:46:38 +0100
Subject: [PATCH] Handle GIMPLE NOPs in is_maybe_undefined (PR
 tree-optimization/79529).

gcc/ChangeLog:

2017-02-15  Martin Liska  

	PR tree-optimization/79529
	* tree-ssa-loop-unswitch.c (is_maybe_undefined): Bail out when
	spotting a gimple NOP.
---
 gcc/tree-ssa-loop-unswitch.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/tree-ssa-loop-unswitch.c b/gcc/tree-ssa-loop-unswitch.c
index 4ef3a6bf80a..a52e4719bec 100644
--- a/gcc/tree-ssa-loop-unswitch.c
+++ b/gcc/tree-ssa-loop-unswitch.c
@@ -141,6 +141,9 @@ is_maybe_undefined (const tree name, gimple *stmt, struct loop *loop)
 
   gimple *def = SSA_NAME_DEF_STMT (t);
 
+  if (!def || gimple_nop_p (def))
+	return true;
+
   /* Check that all the PHI args are fully defined.  */
   if (gphi *phi = dyn_cast <gphi *> (def))
 	{
-- 
2.11.0



Re: [PATCH][GRAPHITE] Remove support for ISL 0.14

2017-02-15 Thread Thomas Schwinge
Hi!

On Wed, 15 Feb 2017 13:44:13 +0100, I wrote:
> On Fri, 10 Feb 2017 15:13:57 +0100 (CET), Richard Biener  
> wrote:
> > As a cleanup (and to be able to close bugs only reproducing with ISL 0.14)
> > the following removes support for ISL 0.14 for GCC 7.

> OK to commit the following to restore graphite fuse-*.c testing?  That
> is, revert most of (the remaining pieces of) r232811.

;-) Heh, I forgot to include the obvious gcc/Makefile.in change.  With
that added, committed to trunk in r245483:

commit a7355f503d3d6f0a8e98b48440fcfc72cc7a8963
Author: tschwinge 
Date:   Wed Feb 15 14:43:42 2017 +

Restore Graphite fuse-*.c testing

* Makefile.tpl: Remove HOST_ISLVER.
(HOST_EXPORTS): Remove ISLVER.
* Makefile.in: Regenerate.
gcc/
* Makefile.in (site.exp): Remove "set ISLVER".
gcc/testsuite/
* gcc.dg/graphite/graphite.exp: Merge "fuse_files" into
"opt_files".

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@245483 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 ChangeLog  | 6 ++
 Makefile.in| 2 --
 Makefile.tpl   | 2 --
 gcc/ChangeLog  | 4 
 gcc/Makefile.in| 1 -
 gcc/testsuite/ChangeLog| 5 +
 gcc/testsuite/gcc.dg/graphite/graphite.exp | 8 +---
 7 files changed, 16 insertions(+), 12 deletions(-)

diff --git ChangeLog ChangeLog
index 071a281..f4e6dc3 100644
--- ChangeLog
+++ ChangeLog
@@ -1,3 +1,9 @@
+2017-02-15  Thomas Schwinge  
+
+   * Makefile.tpl: Remove HOST_ISLVER.
+   (HOST_EXPORTS): Remove ISLVER.
+   * Makefile.in: Regenerate.
+
 2017-02-13  Richard Biener  
 
* configure: Re-generate.
diff --git Makefile.in Makefile.in
index 1c0b9e7..da2600b 100644
--- Makefile.in
+++ Makefile.in
@@ -220,7 +220,6 @@ HOST_EXPORTS = \
GMPINC="$(HOST_GMPINC)"; export GMPINC; \
ISLLIBS="$(HOST_ISLLIBS)"; export ISLLIBS; \
ISLINC="$(HOST_ISLINC)"; export ISLINC; \
-   ISLVER="$(HOST_ISLVER)"; export ISLVER; \
LIBELFLIBS="$(HOST_LIBELFLIBS)"; export LIBELFLIBS; \
LIBELFINC="$(HOST_LIBELFINC)"; export LIBELFINC; \
XGCC_FLAGS_FOR_TARGET="$(XGCC_FLAGS_FOR_TARGET)"; export 
XGCC_FLAGS_FOR_TARGET; \
@@ -313,7 +312,6 @@ HOST_GMPINC = @gmpinc@
 # Where to find isl
 HOST_ISLLIBS = @isllibs@
 HOST_ISLINC = @islinc@
-HOST_ISLVER = @islver@
 
 # Where to find libelf
 HOST_LIBELFLIBS = @libelflibs@
diff --git Makefile.tpl Makefile.tpl
index a6a3166..d0fa070 100644
--- Makefile.tpl
+++ Makefile.tpl
@@ -223,7 +223,6 @@ HOST_EXPORTS = \
GMPINC="$(HOST_GMPINC)"; export GMPINC; \
ISLLIBS="$(HOST_ISLLIBS)"; export ISLLIBS; \
ISLINC="$(HOST_ISLINC)"; export ISLINC; \
-   ISLVER="$(HOST_ISLVER)"; export ISLVER; \
LIBELFLIBS="$(HOST_LIBELFLIBS)"; export LIBELFLIBS; \
LIBELFINC="$(HOST_LIBELFINC)"; export LIBELFINC; \
XGCC_FLAGS_FOR_TARGET="$(XGCC_FLAGS_FOR_TARGET)"; export 
XGCC_FLAGS_FOR_TARGET; \
@@ -316,7 +315,6 @@ HOST_GMPINC = @gmpinc@
 # Where to find isl
 HOST_ISLLIBS = @isllibs@
 HOST_ISLINC = @islinc@
-HOST_ISLVER = @islver@
 
 # Where to find libelf
 HOST_LIBELFLIBS = @libelflibs@
diff --git gcc/ChangeLog gcc/ChangeLog
index 7466dab..fa0b01c 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,7 @@
+2017-02-15  Thomas Schwinge  
+
+   * Makefile.in (site.exp): Remove "set ISLVER".
+
 2017-02-15  Jakub Jelinek  
 
PR target/79487
diff --git gcc/Makefile.in gcc/Makefile.in
index 821584a..0cde1ae 100644
--- gcc/Makefile.in
+++ gcc/Makefile.in
@@ -3805,7 +3805,6 @@ site.exp: ./config.status Makefile
  echo "set PLUGINCFLAGS \"$(PLUGINCFLAGS)\"" >> ./site.tmp; \
  echo "set GMPINC \"$(GMPINC)\"" >> ./site.tmp; \
fi
-   @echo "set ISLVER \"$(ISLVER)\"" >> ./site.tmp
 # If newlib has been configured, we need to pass -B to gcc so it can find
 # newlib's crt0.o if it exists.  This will cause a "path prefix not used"
 # message if it doesn't, but the testsuite is supposed to ignore the message -
diff --git gcc/testsuite/ChangeLog gcc/testsuite/ChangeLog
index a379d2a..2097fc9 100644
--- gcc/testsuite/ChangeLog
+++ gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2017-02-15  Thomas Schwinge  
+
+   * gcc.dg/graphite/graphite.exp: Merge "fuse_files" into
+   "opt_files".
+
 2017-02-15  Jakub Jelinek  
 
PR target/79487
diff --git gcc/testsuite/gcc.dg/graphite/graphite.exp 
gcc/testsuite/gcc.dg/graphite/graphite.exp
index 2499431..50aae30 100644
--- gcc/testsuite/gcc.dg/graphite/graphite.exp
+++ gcc/testsuite/gcc.dg/graphite/graphite.exp
@@ -49,10 +49,10 @@ set run_id_files  [lsort [glob 

Re: [PATCH][GRAPHITE] Remove support for ISL 0.14

2017-02-15 Thread Sebastian Pop
On Wed, Feb 15, 2017 at 6:44 AM, Thomas Schwinge
 wrote:
> Hi!
>
> On Fri, 10 Feb 2017 15:13:57 +0100 (CET), Richard Biener  
> wrote:
>> As a cleanup (and to be able to close bugs only reproducing with ISL 0.14)
>> the following removes support for ISL 0.14 for GCC 7.
>
> (This got committed in r245382.)
>
>> --- config/isl.m4 (revision 245328)
>> +++ config/isl.m4 (working copy)
>> @@ -106,27 +106,15 @@ AC_DEFUN([ISL_CHECK_VERSION],
>
>> -if test x"$ac_has_isl_options_set_schedule_serialize_sccs" = x"yes"; 
>> then
>> -  islver="0.15"
>> -  AC_SUBST([islver])
>> +  AC_MSG_RESULT([required isl version is 0.15 or later])
>>  fi
>
> This removed "islver", which is still used:
>
> Makefile.tpl:   ISLVER="$(HOST_ISLVER)"; export ISLVER; \
> Makefile.tpl:HOST_ISLVER = @islver@
> gcc/Makefile.in:@echo "set ISLVER \"$(ISLVER)\"" >> ./site.tmp
> gcc/testsuite/gcc.dg/graphite/graphite.exp:global ISLVER
> gcc/testsuite/gcc.dg/graphite/graphite.exp:if { $ISLVER == "0.15" } {
> gcc/testsuite/gcc.dg/graphite/graphite.exp-dg-runtest $fuse_files 
> "" "-O2 -ffast-math -floop-nest-optimize -fdump-tree-graphite-all"
> gcc/testsuite/gcc.dg/graphite/graphite.exp-}
>
> OK to commit the following to restore graphite fuse-*.c testing?  That
> is, revert most of (the remaining pieces of) r232811.
>

Looks good.  Thanks!

Sebastian


>  Makefile.in| 2 --
>  Makefile.tpl   | 2 --
>  gcc/testsuite/gcc.dg/graphite/graphite.exp | 8 +---
>  3 files changed, 1 insertion(+), 11 deletions(-)
>
> diff --git Makefile.in Makefile.in
> [snipped]
> diff --git Makefile.tpl Makefile.tpl
> index a6a3166..d0fa070 100644
> --- Makefile.tpl
> +++ Makefile.tpl
> @@ -223,7 +223,6 @@ HOST_EXPORTS = \
> GMPINC="$(HOST_GMPINC)"; export GMPINC; \
> ISLLIBS="$(HOST_ISLLIBS)"; export ISLLIBS; \
> ISLINC="$(HOST_ISLINC)"; export ISLINC; \
> -   ISLVER="$(HOST_ISLVER)"; export ISLVER; \
> LIBELFLIBS="$(HOST_LIBELFLIBS)"; export LIBELFLIBS; \
> LIBELFINC="$(HOST_LIBELFINC)"; export LIBELFINC; \
> XGCC_FLAGS_FOR_TARGET="$(XGCC_FLAGS_FOR_TARGET)"; export 
> XGCC_FLAGS_FOR_TARGET; \
> @@ -316,7 +315,6 @@ HOST_GMPINC = @gmpinc@
>  # Where to find isl
>  HOST_ISLLIBS = @isllibs@
>  HOST_ISLINC = @islinc@
> -HOST_ISLVER = @islver@
>
>  # Where to find libelf
>  HOST_LIBELFLIBS = @libelflibs@
> diff --git gcc/testsuite/gcc.dg/graphite/graphite.exp 
> gcc/testsuite/gcc.dg/graphite/graphite.exp
> index 2499431..50aae30 100644
> --- gcc/testsuite/gcc.dg/graphite/graphite.exp
> +++ gcc/testsuite/gcc.dg/graphite/graphite.exp
> @@ -49,10 +49,10 @@ set run_id_files  [lsort [glob -nocomplain 
> $srcdir/$subdir/run-id-*.c ] ]
>  set opt_files [lsort [glob -nocomplain 
> $srcdir/$subdir/interchange-*.c \
>
> $srcdir/$subdir/uns-interchange-*.c \
>
> $srcdir/$subdir/isl-ast-gen-*.c \
> +  $srcdir/$subdir/fuse-*.c \
>$srcdir/$subdir/block-*.c \
>$srcdir/$subdir/uns-block-*.c 
> ] ]
>  set vect_files[lsort [glob -nocomplain $srcdir/$subdir/vect-*.c ] ]
> -set fuse_files[lsort [glob -nocomplain $srcdir/$subdir/fuse-*.c ] ]
>
>  # Tests to be compiled.
>  set dg-do-what-default compile
> @@ -64,11 +64,6 @@ set dg-do-what-default run
>  dg-runtest $run_id_files  "" "-O2 -fgraphite-identity"
>  dg-runtest $opt_files "" "-O2 -ffast-math -floop-nest-optimize 
> -fdump-tree-graphite-all"
>
> -global ISLVER
> -if { $ISLVER == "0.15" } {
> -dg-runtest $fuse_files "" "-O2 -ffast-math -floop-nest-optimize 
> -fdump-tree-graphite-all"
> -}
> -
>  # Vectorizer tests, to be run or compiled, depending on target capabilities.
>  global DEFAULT_VECTCFLAGS
>  set DEFAULT_VECTCFLAGS "-O2 -fgraphite-identity -ftree-vectorize 
> -fno-vect-cost-model -fdump-tree-vect-details -ffast-math"
> @@ -84,7 +79,6 @@ foreach f $id_files  {lremove wait_to_run_files $f}
>  foreach f $run_id_files  {lremove wait_to_run_files $f}
>  foreach f $opt_files {lremove wait_to_run_files $f}
>  foreach f $vect_files{lremove wait_to_run_files $f}
> -foreach f $fuse_files{lremove wait_to_run_files $f}
>  dg-runtest $wait_to_run_files "" "-ansi -pedantic-errors"
>
>  # Clean up.
>
>
> Grüße
>  Thomas


Re: [PATCH][GRAPHITE] Remove support for ISL 0.14

2017-02-15 Thread Richard Biener
On February 15, 2017 1:44:13 PM GMT+01:00, Thomas Schwinge 
 wrote:
>Hi!
>
>On Fri, 10 Feb 2017 15:13:57 +0100 (CET), Richard Biener
> wrote:
>> As a cleanup (and to be able to close bugs only reproducing with ISL
>0.14)
>> the following removes support for ISL 0.14 for GCC 7.
>
>(This got committed in r245382.)
>
>> --- config/isl.m4(revision 245328)
>> +++ config/isl.m4(working copy)
>> @@ -106,27 +106,15 @@ AC_DEFUN([ISL_CHECK_VERSION],
>
>> -if test x"$ac_has_isl_options_set_schedule_serialize_sccs" =
>x"yes"; then
>> -  islver="0.15"
>> -  AC_SUBST([islver])
>> +  AC_MSG_RESULT([required isl version is 0.15 or later])
>>  fi
>
>This removed "islver", which is still used:
>
>Makefile.tpl:   ISLVER="$(HOST_ISLVER)"; export ISLVER; \
>Makefile.tpl:HOST_ISLVER = @islver@
> gcc/Makefile.in:@echo "set ISLVER \"$(ISLVER)\"" >> ./site.tmp
>gcc/testsuite/gcc.dg/graphite/graphite.exp:global ISLVER
>  gcc/testsuite/gcc.dg/graphite/graphite.exp:if { $ISLVER == "0.15" } {
>gcc/testsuite/gcc.dg/graphite/graphite.exp-dg-runtest $fuse_files  
> "" "-O2 -ffast-math -floop-nest-optimize -fdump-tree-graphite-all"
>gcc/testsuite/gcc.dg/graphite/graphite.exp-}
>
>OK to commit the following to restore graphite fuse-*.c testing?  That
>is, revert most of (the remaining pieces of) r232811.

OK.

Richard.

> Makefile.in| 2 --
> Makefile.tpl   | 2 --
> gcc/testsuite/gcc.dg/graphite/graphite.exp | 8 +---
> 3 files changed, 1 insertion(+), 11 deletions(-)
>
>diff --git Makefile.in Makefile.in
>[snipped]
>diff --git Makefile.tpl Makefile.tpl
>index a6a3166..d0fa070 100644
>--- Makefile.tpl
>+++ Makefile.tpl
>@@ -223,7 +223,6 @@ HOST_EXPORTS = \
>   GMPINC="$(HOST_GMPINC)"; export GMPINC; \
>   ISLLIBS="$(HOST_ISLLIBS)"; export ISLLIBS; \
>   ISLINC="$(HOST_ISLINC)"; export ISLINC; \
>-  ISLVER="$(HOST_ISLVER)"; export ISLVER; \
>   LIBELFLIBS="$(HOST_LIBELFLIBS)"; export LIBELFLIBS; \
>   LIBELFINC="$(HOST_LIBELFINC)"; export LIBELFINC; \
>   XGCC_FLAGS_FOR_TARGET="$(XGCC_FLAGS_FOR_TARGET)"; export
>XGCC_FLAGS_FOR_TARGET; \
>@@ -316,7 +315,6 @@ HOST_GMPINC = @gmpinc@
> # Where to find isl
> HOST_ISLLIBS = @isllibs@
> HOST_ISLINC = @islinc@
>-HOST_ISLVER = @islver@
> 
> # Where to find libelf
> HOST_LIBELFLIBS = @libelflibs@
>diff --git gcc/testsuite/gcc.dg/graphite/graphite.exp
>gcc/testsuite/gcc.dg/graphite/graphite.exp
>index 2499431..50aae30 100644
>--- gcc/testsuite/gcc.dg/graphite/graphite.exp
>+++ gcc/testsuite/gcc.dg/graphite/graphite.exp
>@@ -49,10 +49,10 @@ set run_id_files  [lsort [glob -nocomplain
>$srcdir/$subdir/run-id-*.c ] ]
>set opt_files [lsort [glob -nocomplain
>$srcdir/$subdir/interchange-*.c \
>  
> $srcdir/$subdir/uns-interchange-*.c \
>  $srcdir/$subdir/isl-ast-gen-*.c \
>+ $srcdir/$subdir/fuse-*.c \
>  $srcdir/$subdir/block-*.c \
>  $srcdir/$subdir/uns-block-*.c ] ]
>set vect_files[lsort [glob -nocomplain $srcdir/$subdir/vect-*.c
>] ]
>-set fuse_files[lsort [glob -nocomplain
>$srcdir/$subdir/fuse-*.c ] ]
> 
> # Tests to be compiled.
> set dg-do-what-default compile
>@@ -64,11 +64,6 @@ set dg-do-what-default run
> dg-runtest $run_id_files  "" "-O2 -fgraphite-identity"
>dg-runtest $opt_files "" "-O2 -ffast-math -floop-nest-optimize
>-fdump-tree-graphite-all"
> 
>-global ISLVER
>-if { $ISLVER == "0.15" } {
>-dg-runtest $fuse_files "" "-O2 -ffast-math
>-floop-nest-optimize -fdump-tree-graphite-all"
>-}
>-
># Vectorizer tests, to be run or compiled, depending on target
>capabilities.
> global DEFAULT_VECTCFLAGS
>set DEFAULT_VECTCFLAGS "-O2 -fgraphite-identity -ftree-vectorize
>-fno-vect-cost-model -fdump-tree-vect-details -ffast-math"
>@@ -84,7 +79,6 @@ foreach f $id_files  {lremove
>wait_to_run_files $f}
> foreach f $run_id_files  {lremove wait_to_run_files $f}
> foreach f $opt_files {lremove wait_to_run_files $f}
> foreach f $vect_files{lremove wait_to_run_files $f}
>-foreach f $fuse_files{lremove wait_to_run_files $f}
> dg-runtest $wait_to_run_files "" "-ansi -pedantic-errors"
> 
> # Clean up.
>
>
>Grüße
> Thomas



[PATCH] Enable RDPID instruction.

2017-02-15 Thread Koval, Julia
Hi,

This patch enables the RDPID intrinsic, described in the SDM, Vol. 2B, page 4-534:
https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf
and in the intrinsics guide:
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=rdpi=2778,2777,4219

gcc/
* common/config/i386/i386-common.c (OPTION_MASK_ISA_RDPID_SET): New.
(OPTION_MASK_ISA_PKU_UNSET): New.
(ix86_handle_option): Handle -mrdpid.
* config/i386/cpuid.h
(bit_RDPID): New.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect RDPID 
feature.
* config/i386/i386-builtin.def (__builtin_ia32_rdpid): New.
* config/i386/i386-c.c (ix86_target_macros_internal): Handle RDPID flag.
* config/i386/i386.c (ix86_target_string): Add -mrdpid to isa2_opts.
(ix86_valid_target_attribute_inner_p): Add "rdpid".
(ix86_expand_builtin): Handle IX86_BUILTIN_RDPID.
* config/i386/i386.h (TARGET_RDPID, TARGET_RDPID_P): New.
* config/i386/i386.md (define_insn "rdpid"): New.
* config/i386/i386.opt: Add -mrdpid.
* config/i386/immintrin.h (_rdpid_u32): New.
* testsuite/gcc.target/i386/rdpid.c: New test.
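
For illustration, a caller of the new intrinsic might look roughly like the sketch below. This is only a sketch under assumptions: the wrapper name read_processor_id is hypothetical, and it assumes that -mrdpid defines the __RDPID__ macro so the intrinsic can be guarded at compile time. Only _rdpid_u32 and immintrin.h come from the ChangeLog above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__RDPID__)
#include <immintrin.h>
#endif

/* Hypothetical wrapper: returns 1 and stores the processor ID in *OUT
   when the file was compiled with RDPID support (assumed to be signalled
   by __RDPID__), otherwise returns 0 and leaves *OUT untouched.  */
static int
read_processor_id (uint32_t *out)
{
  assert (out != NULL);
#if defined(__RDPID__)
  *out = _rdpid_u32 ();
  return 1;
#else
  return 0;
#endif
}
```

Built without -mrdpid the wrapper simply reports that the value is unavailable, so the same source compiles on any target.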

Ok for trunk?

Julia


rdpid_patch_2_15
Description: rdpid_patch_2_15


[PATCH, Fortran, pr79335, v2] [7 Regression] Conditional jump or move depends on uninitialised in value get_scalar_to_descriptor_type(tree_node*, symbol_attribute) (trans-expr.c:53)

2017-02-15 Thread Andre Vehreschild
Hi all,

attached patch fixes (hopefully) all occurrences of uninitialized
symbol_attributes used for getting a descriptor for a scalar, as reported in the
reopened pr79335.
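
The class of bug is easy to show outside of gfortran. A minimal sketch, assuming a hypothetical struct attr standing in for symbol_attribute and a hypothetical clear_attr standing in for gfc_clear_attr: a flag struct passed by value has to be cleared before only some of its fields are set, otherwise the callee branches on indeterminate bits, which is what valgrind reports.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for symbol_attribute: a bundle of one-bit flags.  */
struct attr
{
  unsigned allocatable : 1;
  unsigned pointer : 1;
  unsigned dimension : 1;
};

/* Hypothetical stand-in for gfc_clear_attr: zero every flag.  */
static void
clear_attr (struct attr *a)
{
  assert (a != NULL);
  memset (a, 0, sizeof *a);
}

/* The callee reads flags the caller never set explicitly, so they must
   have been cleared beforehand.  */
static int
needs_descriptor (struct attr a)
{
  return a.allocatable || a.pointer;
}

static int
scalar_case (void)
{
  struct attr a;
  clear_attr (&a);      /* Without this call, a.pointer is indeterminate.  */
  a.allocatable = 1;
  return needs_descriptor (a);
}
```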

Bootstraps and regtests ok on x86_64-linux/f25. Ok for trunk?

Regards,
Andre

On Tue, 14 Feb 2017 08:32:59 +
"marxin at gcc dot gnu.org"  wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79335
> 
> Martin Liška  changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|WAITING                     |REOPENED
> 
> --- Comment #8 from Martin Liška  ---
> Thanks for fix, however there are still some issues:
> 
> $ valgrind --leak-check=yes --trace-children=yes ./xgcc -B. /home/marxin/Programming/gcc/gcc/testsuite/gfortran.dg/coarray_lib_alloc_4.f90 -fcoarray=lib -lcaf_single
> 
> ==15334== Conditional jump or move depends on uninitialised value(s)
> ==15334==    at 0x95EE91: get_scalar_to_descriptor_type(tree_node*, symbol_attribute) (trans-expr.c:53)
> ==15334==    by 0x95EFF0: gfc_conv_scalar_to_descriptor(gfc_se*, tree_node*, symbol_attribute) (trans-expr.c:71)
> ==15334==    by 0x977D9C: gfc_trans_structure_assign(tree_node*, gfc_expr*, bool, bool) (trans-expr.c:7552)
> ==15334==    by 0x97830F: gfc_conv_structure(gfc_se*, gfc_expr*, int) (trans-expr.c:7646)
> ==15334==    by 0x978AD9: gfc_conv_expr(gfc_se*, gfc_expr*) (trans-expr.c:7813)
> ==15334==    by 0x97F5EC: gfc_trans_assignment_1(gfc_expr*, gfc_expr*, bool, bool, bool, bool) (trans-expr.c:9923)
> ==15334==    by 0x9804A5: gfc_trans_assignment(gfc_expr*, gfc_expr*, bool, bool, bool, bool) (trans-expr.c:10231)
> ==15334==    by 0x9804E1: gfc_trans_init_assign(gfc_code*) (trans-expr.c:10237)
> ==15334==    by 0x9D2655: gfc_trans_allocate(gfc_code*) (trans-stmt.c:6328)
> ==15334==    by 0x92077B: trans_code(gfc_code*, tree_node*) (trans.c:1965)
> ==15334==    by 0x9209F6: gfc_trans_code(gfc_code*) (trans.c:2124)
> ==15334==    by 0x95B503: gfc_generate_function_code(gfc_namespace*) (trans-decl.c:6306)
> ...
> 


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
gcc/fortran/ChangeLog:

2017-02-15  Andre Vehreschild  

PR fortran/79335
* trans-array.c (duplicate_allocatable_coarray): Ensure attributes
passed are properly initialized.
(structure_alloc_comps): Same.
* trans-expr.c (gfc_trans_structure_assign): Same.


diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index d0dfc26..47e8c09 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -7950,6 +7950,8 @@ duplicate_allocatable_coarray (tree dest, tree dest_tok, tree src,
   tree dummy_desc;
 
   gfc_init_se (, NULL);
+  gfc_clear_attr ();
+  attr.allocatable = 1;
   dummy_desc = gfc_conv_scalar_to_descriptor (, dest, attr);
   gfc_add_block_to_block (, );
   size = TYPE_SIZE_UNIT (TREE_TYPE (type));
@@ -8518,14 +8520,15 @@ structure_alloc_comps (gfc_symbol * der_type, tree decl,
 	  else
 		{
 		  gfc_se se;
-		  symbol_attribute attr;
 
 		  gfc_init_se (, NULL);
-		  gfc_clear_attr ();
 		  token = fold_build3_loc (input_location, COMPONENT_REF,
 	   pvoid_type_node, decl, c->caf_token,
 	   NULL_TREE);
-		  comp = gfc_conv_scalar_to_descriptor (, comp, attr);
+		  comp = gfc_conv_scalar_to_descriptor (, comp,
+			c->ts.type == BT_CLASS
+			? CLASS_DATA (c)->attr
+			: c->attr);
 		  gfc_add_block_to_block (, );
 		}
 
diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 87bf069..cc41fe3 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -7516,7 +7516,6 @@ gfc_trans_structure_assign (tree dest, gfc_expr * expr, bool init, bool coarray)
 	  && (!c->expr || c->expr->expr_type == EXPR_NULL))
 	{
 	  tree token, desc, size;
-	  symbol_attribute attr;
 	  bool is_array = cm->ts.type == BT_CLASS
 	  ? CLASS_DATA (cm)->attr.dimension : cm->attr.dimension;
 
@@ -7549,7 +7548,10 @@ gfc_trans_structure_assign (tree dest, gfc_expr * expr, bool init, bool coarray)
 	}
 	  else
 	{
-	  desc = gfc_conv_scalar_to_descriptor (, field, attr);
+	  desc = gfc_conv_scalar_to_descriptor (, field,
+		cm->ts.type == BT_CLASS
+		? CLASS_DATA (cm)->attr
+		: cm->attr);
 	  size = TYPE_SIZE_UNIT (TREE_TYPE (field));
 	}
 	  gfc_add_block_to_block (, );


Re: [PATCH doc] clean up -fdump-tree- options (PR 32003)

2017-02-15 Thread Thomas Schwinge
Hi!

On Wed, 1 Feb 2017 20:26:24 -0700, Martin Sebor  wrote:
> On 02/01/2017 08:06 PM, Sandra Loosemore wrote:
> > On 02/01/2017 06:57 PM, Martin Sebor wrote:
>   PR middle-end/32003
>   * doc/invoke.texi (-fdump-rtl-): Remove pass-specific options from
>   index.

"rtl" vs. "tree" typo.  ;-)

> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -544,29 +544,9 @@ Objective-C and Objective-C++ Dialects}.
>  -fdump-rtl-@var{pass}  -fdump-rtl-@var{pass}=@var{filename} @gol
>  -fdump-statistics @gol
>  -fdump-tree-all @gol
> --fdump-tree-original@r{[}-@var{n}@r{]}  @gol
> -[...]
> --fdump-tree-storeccp@r{[}-@var{n}@r{]} @gol
> --fdump-final-insns=@var{file} @gol

Is it intentional that you've also removed "-fdump-final-insns" here?
(It remains documented further down the file.)


Grüße
 Thomas


Re: [PATCH][GRAPHITE] Remove support for ISL 0.14

2017-02-15 Thread Thomas Schwinge
Hi!

On Fri, 10 Feb 2017 15:13:57 +0100 (CET), Richard Biener  
wrote:
> As a cleanup (and to be able to close bugs only reproducing with ISL 0.14)
> the following removes support for ISL 0.14 for GCC 7.

(This got committed in r245382.)

> --- config/isl.m4 (revision 245328)
> +++ config/isl.m4 (working copy)
> @@ -106,27 +106,15 @@ AC_DEFUN([ISL_CHECK_VERSION],

> -if test x"$ac_has_isl_options_set_schedule_serialize_sccs" = x"yes"; then
> -  islver="0.15"
> -  AC_SUBST([islver])
> +  AC_MSG_RESULT([required isl version is 0.15 or later])
>  fi

This removed "islver", which is still used:

Makefile.tpl:   ISLVER="$(HOST_ISLVER)"; export ISLVER; \
Makefile.tpl:HOST_ISLVER = @islver@
gcc/Makefile.in:@echo "set ISLVER \"$(ISLVER)\"" >> ./site.tmp
gcc/testsuite/gcc.dg/graphite/graphite.exp:global ISLVER
gcc/testsuite/gcc.dg/graphite/graphite.exp:if { $ISLVER == "0.15" } {
gcc/testsuite/gcc.dg/graphite/graphite.exp-dg-runtest $fuse_files   
  "" "-O2 -ffast-math -floop-nest-optimize -fdump-tree-graphite-all"
gcc/testsuite/gcc.dg/graphite/graphite.exp-}

OK to commit the following to restore graphite fuse-*.c testing?  That
is, revert most of (the remaining pieces of) r232811.

 Makefile.in| 2 --
 Makefile.tpl   | 2 --
 gcc/testsuite/gcc.dg/graphite/graphite.exp | 8 +---
 3 files changed, 1 insertion(+), 11 deletions(-)

diff --git Makefile.in Makefile.in
[snipped]
diff --git Makefile.tpl Makefile.tpl
index a6a3166..d0fa070 100644
--- Makefile.tpl
+++ Makefile.tpl
@@ -223,7 +223,6 @@ HOST_EXPORTS = \
GMPINC="$(HOST_GMPINC)"; export GMPINC; \
ISLLIBS="$(HOST_ISLLIBS)"; export ISLLIBS; \
ISLINC="$(HOST_ISLINC)"; export ISLINC; \
-   ISLVER="$(HOST_ISLVER)"; export ISLVER; \
LIBELFLIBS="$(HOST_LIBELFLIBS)"; export LIBELFLIBS; \
LIBELFINC="$(HOST_LIBELFINC)"; export LIBELFINC; \
XGCC_FLAGS_FOR_TARGET="$(XGCC_FLAGS_FOR_TARGET)"; export 
XGCC_FLAGS_FOR_TARGET; \
@@ -316,7 +315,6 @@ HOST_GMPINC = @gmpinc@
 # Where to find isl
 HOST_ISLLIBS = @isllibs@
 HOST_ISLINC = @islinc@
-HOST_ISLVER = @islver@
 
 # Where to find libelf
 HOST_LIBELFLIBS = @libelflibs@
diff --git gcc/testsuite/gcc.dg/graphite/graphite.exp 
gcc/testsuite/gcc.dg/graphite/graphite.exp
index 2499431..50aae30 100644
--- gcc/testsuite/gcc.dg/graphite/graphite.exp
+++ gcc/testsuite/gcc.dg/graphite/graphite.exp
@@ -49,10 +49,10 @@ set run_id_files  [lsort [glob -nocomplain 
$srcdir/$subdir/run-id-*.c ] ]
 set opt_files [lsort [glob -nocomplain $srcdir/$subdir/interchange-*.c 
\
   
$srcdir/$subdir/uns-interchange-*.c \
   $srcdir/$subdir/isl-ast-gen-*.c \
+  $srcdir/$subdir/fuse-*.c \
   $srcdir/$subdir/block-*.c \
   $srcdir/$subdir/uns-block-*.c ] ]
 set vect_files[lsort [glob -nocomplain $srcdir/$subdir/vect-*.c ] ]
-set fuse_files[lsort [glob -nocomplain $srcdir/$subdir/fuse-*.c ] ]
 
 # Tests to be compiled.
 set dg-do-what-default compile
@@ -64,11 +64,6 @@ set dg-do-what-default run
 dg-runtest $run_id_files  "" "-O2 -fgraphite-identity"
 dg-runtest $opt_files "" "-O2 -ffast-math -floop-nest-optimize 
-fdump-tree-graphite-all"
 
-global ISLVER
-if { $ISLVER == "0.15" } {
-dg-runtest $fuse_files "" "-O2 -ffast-math -floop-nest-optimize 
-fdump-tree-graphite-all"
-}
-
 # Vectorizer tests, to be run or compiled, depending on target capabilities.
 global DEFAULT_VECTCFLAGS
 set DEFAULT_VECTCFLAGS "-O2 -fgraphite-identity -ftree-vectorize 
-fno-vect-cost-model -fdump-tree-vect-details -ffast-math"
@@ -84,7 +79,6 @@ foreach f $id_files  {lremove wait_to_run_files $f}
 foreach f $run_id_files  {lremove wait_to_run_files $f}
 foreach f $opt_files {lremove wait_to_run_files $f}
 foreach f $vect_files{lremove wait_to_run_files $f}
-foreach f $fuse_files{lremove wait_to_run_files $f}
 dg-runtest $wait_to_run_files "" "-ansi -pedantic-errors"
 
 # Clean up.


Grüße
 Thomas


Re: [PATCH] Fix DFP conversion from INTEGER_CST to REAL_CST (PR target/79487)

2017-02-15 Thread Jakub Jelinek
On Wed, Feb 15, 2017 at 09:15:48AM +0100, Richard Biener wrote:
> >As the following testcase shows, we store decimal REAL_CSTs always in
> >_Decimal128 internal form and perform all the arithmetics on that, but
> >while
> >for arithmetics we then ensure rounding to the actual type
> >(_Decimal{32,64}
> >or for _Decimal128 no further rounding), e.g. const_binop calls
> >  inexact = real_arithmetic (, code, , );
> >  real_convert (, mode, );
> >when converting integers to _Decimal{32,64} we do nothing like that.
> >We do that only for non-decimal conversions from INTEGER_CSTs to
> >REAL_CSTs.
> >
> >The following patch fixes that.  Bootstrapped/regtested on x86_64-linux
> >(i686-linux fails to bootstrap for other reason), and on 6.x branch on
> >x86_64-linux and i686-linux.  Dominik has kindly tested it on s390x
> >(where
> >the bug has been originally reported on the float-cast-overflow-10.c
> >test).
> >
> >Ok for trunk?
> 
> OK.

Note I think it is not 100% correct because it is then suffering
from double rounding, but that is only the case for conversions from
__int128/unsigned __int128 and those are apparently not supported anyway
(things compile, but unless the conversion is optimized during compilation,
programs don't link, as the libraries don't support it).

E.g. I believe
int
main ()
{
  // 9994999
  _Decimal32 a = (((__int128_t) 0x134261629f6653ULL) << 64) | 0x0c750eb777ffULL;
  _Decimal32 b = 9995LL;
  _Decimal32 c = 999500LL;
  // 9994499
  _Decimal32 d = (((__int128_t) 0x1342616101cf34ULL) << 64) | 0xbc8cce9903ffULL;
  _Decimal32 e = 9994LL;
  if (a != 9.99E+34DF
  || b != 1.00E+8DF
  || c != 1.00E+14DF
  || d != 9.99E+34DF
  || e != 9.99E+7DF)
__builtin_abort ();
  return 0;
}

should pass, but it doesn't, with or without my patch; a is different.  As
only
999
   1000
are exactly representable in _Decimal32 and we do round to even, I believe
both numbers should be rounded down, but when first rounding to _Decimal128
the first number is rounded up to 9995000
and then that is rounded again up to 1000.
I've tried to tweak this, but haven't succeeded.  And I believe all of
_Decimal{32,64} arithmetics is broken similarly if evaluated by the
compiler, as decimal_real_arithmetic helpers perform all the computations
in _Decimal128 precision (so do rounding in it too) and then everything
is rounded again to _Decimal{32,64}.
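
The double rounding described above can be reproduced at a much smaller scale. The following is a minimal sketch in plain integer arithmetic, not decNumber or GCC code; the helper names are hypothetical, and 3 and 2 significant digits stand in for the _Decimal128 and _Decimal32 precisions.

```c
#include <assert.h>

/* Number of decimal digits in V (0 counts as one digit).  */
static int
count_digits (unsigned long long v)
{
  int n = 1;
  while (v >= 10)
    {
      v /= 10;
      n++;
    }
  return n;
}

/* Round V to KEEP significant decimal digits, ties to even.  The result
   keeps its magnitude, e.g. 1349 rounded to 2 digits is 1300.  */
static unsigned long long
round_half_even (unsigned long long v, int keep)
{
  assert (keep > 0);
  int drop = count_digits (v) - keep;
  if (drop <= 0)
    return v;
  unsigned long long scale = 1;
  for (int i = 0; i < drop; i++)
    scale *= 10;
  unsigned long long q = v / scale;
  unsigned long long r = v % scale;
  unsigned long long half = scale / 2;
  /* Round up above the midpoint, or at the midpoint when Q is odd.  */
  if (r > half || (r == half && (q & 1)))
    q++;
  return q * scale;
}
```

Rounding 1349 directly to 2 digits gives 1300. Rounding it first to 3 digits gives 1350, which is now an exact tie, and the second rounding to 2 digits then goes up to 1400. That is the same effect as rounding to the wider format first and to the narrower one afterwards.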

Anyway, I think my patch is a significant improvement and except for the
not really supported int128 -> _Decimal{32,64} conversion it should work
fine, so I'm committing the patch now.

What I've been trying is below (incremental patch), but it didn't seem
to work well, I really don't know how decNumber works.

--- gcc/real.c  2017-02-14 21:35:35.868906203 +0100
+++ gcc/real.c  2017-02-15 12:40:22.551817716 +0100
@@ -101,7 +101,8 @@
 static void do_fix_trunc (REAL_VALUE_TYPE *, const REAL_VALUE_TYPE *);
 
 static unsigned long rtd_divmod (REAL_VALUE_TYPE *, REAL_VALUE_TYPE *);
-static void decimal_from_integer (REAL_VALUE_TYPE *);
+static void decimal_from_integer (REAL_VALUE_TYPE *,
+ const struct real_format *);
 static void decimal_integer_string (char *, const REAL_VALUE_TYPE *,
size_t);
 
@@ -2265,8 +2266,8 @@
 }
 
   if (fmt.decimal_p ())
-decimal_from_integer (r);
-  if (fmt)
+decimal_from_integer (r, fmt);
+  else if (fmt)
 real_convert (r, fmt, r);
 }
 
@@ -2320,12 +2321,12 @@
 /* Convert a real with an integral value to decimal float.  */
 
 static void
-decimal_from_integer (REAL_VALUE_TYPE *r)
+decimal_from_integer (REAL_VALUE_TYPE *r, const struct real_format *fmt)
 {
   char str[256];
 
   decimal_integer_string (str, r, sizeof (str) - 1);
-  decimal_real_from_string (r, str);
+  decimal_real_from_string (r, str, fmt);
 }
 
 /* Returns 10**2**N.  */
--- gcc/dfp.c.jj2017-01-01 12:45:38.0 +0100
+++ gcc/dfp.c   2017-02-15 12:38:49.627086419 +0100
@@ -67,11 +67,19 @@ decimal_from_decnumber (REAL_VALUE_TYPE
 /* Create decimal encoded R from string S.  */
 
 void
-decimal_real_from_string (REAL_VALUE_TYPE *r, const char *s)
+decimal_real_from_string (REAL_VALUE_TYPE *r, const char *s,
+ const real_format *fmt)
 {
   decNumber dn;
   decContext set;
-  decContextDefault (&set, DEC_INIT_DECIMAL128);
+  if (fmt == &decimal_quad_format)
+    decContextDefault (&set, DEC_INIT_DECIMAL128);
+  else if (fmt == &decimal_double_format)
+    decContextDefault (&set, DEC_INIT_DECIMAL64);
+  else if (fmt == &decimal_single_format)
+    decContextDefault (&set, DEC_INIT_DECIMAL64);
+  else
+    gcc_unreachable ();
   set.traps = 0;
 
   decNumberFromString (&dn, s, &set);
@@ -82,6 +90,12 @@ decimal_real_from_string (REAL_VALUE_TYP
   decimal_from_decnumber 

[GIMPLE FE] avoid ICE with same ssa version number for multiple names

2017-02-15 Thread Prathamesh Kulkarni
Hi,
For the following (invalid) test-case:

void __GIMPLE () foo (int a)
{
  int t0;
  int _1;
  _1 = a;
  t0_1 = a;
}

GCC hits the following ICE:
gimplefe-error-4.c: In function ‘foo’:
gimplefe-error-4.c:20:1: error: SSA_NAME_DEF_STMT is wrong
 }
 ^
Expected definition statement:
_1 = a_2(D);

Actual definition statement:
_1 = a_2(D);
gimplefe-error-4.c:20:1: internal compiler error: verify_ssa failed
0xe1458b verify_ssa(bool, bool)
../../gcc/gcc/tree-ssa.c:1184
0xb0d1ed execute_function_todo
../../gcc/gcc/passes.c:1973
0xb0dad5 execute_todo
../../gcc/gcc/passes.c:2016

The reason for the ICE is that in c_parser_parse_ssa_name, ssa_name (1)
returns the tree node for _1, and "t0_1" gets replaced by "_1",
resulting in multiple definitions for _1.

The attached patch checks whether multiple SSA names share the same version
number and emits a diagnostic in that case. For the above case:
gimplefe-error-4.c: In function ‘foo’:
gimplefe-error-4.c:10:3: error: ssa version ‘1’ used anonymously and in ‘t0’
   t0_1 = a;
   ^~~~

OK to commit after bootstrap+test?

Thanks,
Prathamesh
2017-02-15  Prathamesh Kulkarni  

c/
* gimple-parser.c (c_parser_parse_ssa_name): Emit diagnostic if same
ssa version is used with multiple names.

testsuite/
* gcc.dg/gimplefe-error-4.c: New test.

diff --git a/gcc/c/gimple-parser.c b/gcc/c/gimple-parser.c
index d959877..2e163f4 100644
--- a/gcc/c/gimple-parser.c
+++ b/gcc/c/gimple-parser.c
@@ -672,29 +672,49 @@ c_parser_parse_ssa_name (c_parser *parser,
 }
   else
 {
+  /* Separate var name from version.  */
+  char *var_name = XNEWVEC (char, ver_offset + 1);
+  memcpy (var_name, token, ver_offset);
+  var_name[ver_offset] = '\0';
+  /* lookup for parent decl.  */
+  id = get_identifier (var_name);
+  tree parent = lookup_name (id);
+  if (! parent || parent == error_mark_node)
+   {
+ c_parser_error (parser, "base variable or SSA name undeclared");
+ XDELETEVEC (var_name);
+ return error_mark_node;
+   }
   if (version < num_ssa_names)
name = ssa_name (version);
   if (! name)
{
- /* Separate var name from version.  */
- char *var_name = XNEWVEC (char, ver_offset + 1);
- memcpy (var_name, token, ver_offset);
- var_name[ver_offset] = '\0';
- /* lookup for parent decl.  */
- id = get_identifier (var_name);
- tree parent = lookup_name (id);
- XDELETEVEC (var_name);
- if (! parent || parent == error_mark_node)
-   {
- c_parser_error (parser, "base variable or SSA name undeclared"); 
- return error_mark_node;
-   }
  if (VECTOR_TYPE_P (TREE_TYPE (parent))
  || TREE_CODE (TREE_TYPE (parent)) == COMPLEX_TYPE)
DECL_GIMPLE_REG_P (parent) = 1;
  name = make_ssa_name_fn (cfun, parent,
   gimple_build_nop (), version);
}
+  else if (!SSA_NAME_IDENTIFIER (name))
+   {
+ error_at (input_location, "ssa version %<%d%> used anonymously"
+   " and in %<%s%>", version, var_name);
+ XDELETEVEC (var_name);
+ return error_mark_node;
+   }
+  else
+   {
+ const char *ssaname = IDENTIFIER_POINTER (SSA_NAME_IDENTIFIER (name));
+ if (strcmp (ssaname, var_name))
+   {
+ error_at (input_location, "ssa version %<%d%> used for"
+   " multiple names %<%s%>, %<%s%>", version,
+   ssaname, var_name);
+ XDELETEVEC (var_name);
+ return error_mark_node;
+   }
+   }
+  XDELETEVEC (var_name);
 }
 
   return name;
diff --git a/gcc/testsuite/gcc.dg/gimplefe-error-4.c 
b/gcc/testsuite/gcc.dg/gimplefe-error-4.c
new file mode 100644
index 000..a3c652e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gimplefe-error-4.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-fgimple" } */
+
+void __GIMPLE () foo (int a)
+{
+  int t0;
+  int _1;
+
+  _1 = a;
+  t0_1 = a; /* { dg-error "used anonymously and in 't0'" } */
+}
+
+void __GIMPLE () bar (int a)
+{
+  int t0;
+  int t1;
+
+  t0_1 = a;
+  t1_1 = a; /* { dg-error "multiple names 't0', 't1'" } */
+}


[PATCH][GCC6] Backport PR target/76731 fix

2017-02-15 Thread Koval, Julia
Hi,
Is it OK to backport this fix to the GCC 6 branch?

PR target/76731
* config/i386/avx512fintrin.h
(_mm512_i32gather_ps): Change __addr type to void const*.
(_mm512_mask_i32gather_ps): Ditto.
(_mm512_i32gather_pd): Ditto.
(_mm512_mask_i32gather_pd): Ditto.
(_mm512_i64gather_ps): Ditto.
(_mm512_mask_i64gather_ps): Ditto.
(_mm512_i64gather_pd): Ditto.
(_mm512_mask_i64gather_pd): Ditto.
(_mm512_i32gather_epi32): Ditto.
(_mm512_mask_i32gather_epi32): Ditto.
(_mm512_i32gather_epi64): Ditto.
(_mm512_mask_i32gather_epi64): Ditto.
(_mm512_i64gather_epi32): Ditto.
(_mm512_mask_i64gather_epi32): Ditto.
(_mm512_i64gather_epi64): Ditto.
(_mm512_mask_i64gather_epi64): Ditto.
(_mm512_i32scatter_ps): Change __addr type to void*.
(_mm512_mask_i32scatter_ps): Ditto.
(_mm512_i32scatter_pd): Ditto.
(_mm512_mask_i32scatter_pd): Ditto.
(_mm512_i64scatter_ps): Ditto.
(_mm512_mask_i64scatter_ps): Ditto.
(_mm512_i64scatter_pd): Ditto.
(_mm512_mask_i64scatter_pd): Ditto.
(_mm512_i32scatter_epi32): Ditto.
(_mm512_mask_i32scatter_epi32): Ditto.
(_mm512_i32scatter_epi64): Ditto.
(_mm512_mask_i32scatter_epi64): Ditto.
(_mm512_i64scatter_epi32): Ditto.
(_mm512_mask_i64scatter_epi32): Ditto.
(_mm512_i64scatter_epi64): Ditto.
(_mm512_mask_i64scatter_epi64): Ditto.
* config/i386/avx512pfintrin.h
(_mm512_mask_prefetch_i32gather_pd): Change addr type to void const*.
(_mm512_mask_prefetch_i32gather_ps): Ditto.
(_mm512_mask_prefetch_i64gather_pd): Ditto.
(_mm512_mask_prefetch_i64gather_ps): Ditto.
(_mm512_prefetch_i32scatter_pd): Change addr type to void*.
(_mm512_prefetch_i32scatter_ps): Ditto.
(_mm512_mask_prefetch_i32scatter_pd): Ditto.
(_mm512_mask_prefetch_i32scatter_ps): Ditto.
(_mm512_prefetch_i64scatter_pd): Ditto.
(_mm512_prefetch_i64scatter_ps): Ditto.
(_mm512_mask_prefetch_i64scatter_pd): Ditto.
(_mm512_mask_prefetch_i64scatter_ps): Ditto.
* config/i386/avx512vlintrin.h
(_mm256_mmask_i32gather_ps): Change __addr type to void const*.
(_mm_mmask_i32gather_ps): Ditto.
(_mm256_mmask_i32gather_pd): Ditto.
(_mm_mmask_i32gather_pd): Ditto.
(_mm256_mmask_i64gather_ps): Ditto.
(_mm_mmask_i64gather_ps): Ditto.
(_mm256_mmask_i64gather_pd): Ditto.
(_mm_mmask_i64gather_pd): Ditto.
(_mm256_mmask_i32gather_epi32): Ditto.
(_mm_mmask_i32gather_epi32): Ditto.
(_mm256_mmask_i32gather_epi64): Ditto.
(_mm_mmask_i32gather_epi64): Ditto.
(_mm256_mmask_i64gather_epi32): Ditto.
(_mm_mmask_i64gather_epi32): Ditto.
(_mm256_mmask_i64gather_epi64): Ditto.
(_mm_mmask_i64gather_epi64): Ditto.
(_mm256_i32scatter_ps): Change __addr type to void*.
(_mm256_mask_i32scatter_ps): Ditto.
(_mm_i32scatter_ps): Ditto.
(_mm_mask_i32scatter_ps): Ditto.
(_mm256_i32scatter_pd): Ditto.
(_mm256_mask_i32scatter_pd): Ditto.
(_mm_i32scatter_pd): Ditto.
(_mm_mask_i32scatter_pd): Ditto.
(_mm256_i64scatter_ps): Ditto.
(_mm256_mask_i64scatter_ps): Ditto.
(_mm_i64scatter_ps): Ditto.
(_mm_mask_i64scatter_ps): Ditto.
(_mm256_i64scatter_pd): Ditto.
(_mm256_mask_i64scatter_pd): Ditto.
(_mm_i64scatter_pd): Ditto.
(_mm_mask_i64scatter_pd): Ditto.
(_mm256_i32scatter_epi32): Ditto.
(_mm256_mask_i32scatter_epi32): Ditto.
(_mm_i32scatter_epi32): Ditto.
(_mm_mask_i32scatter_epi32): Ditto.
(_mm256_i32scatter_epi64): Ditto.
(_mm256_mask_i32scatter_epi64): Ditto.
(_mm_i32scatter_epi64): Ditto.
(_mm_mask_i32scatter_epi64): Ditto.
(_mm256_i64scatter_epi32): Ditto.
(_mm256_mask_i64scatter_epi32): Ditto.
(_mm_i64scatter_epi32): Ditto.
(_mm_mask_i64scatter_epi32): Ditto.
(_mm256_i64scatter_epi64): Ditto.
(_mm256_mask_i64scatter_epi64): Ditto.
(_mm_i64scatter_epi64): Ditto.
(_mm_mask_i64scatter_epi64): Ditto.
* config/i386/i386-builtin-types.def (V16SF_V16SF_PCFLOAT_V16SI_HI_INT)
(V8DF_V8DF_PCDOUBLE_V8SI_QI_INT, V8SF_V8SF_PCFLOAT_V8DI_QI_INT)
(V8DF_V8DF_PCDOUBLE_V8DI_QI_INT, V16SI_V16SI_PCINT_V16SI_HI_INT)
(V8DI_V8DI_PCINT64_V8SI_QI_INT, V8SI_V8SI_PCINT_V8DI_QI_INT)
(V8DI_V8DI_PCINT64_V8DI_QI_INT, V2DF_V2DF_PCDOUBLE_V4SI_QI_INT)
(V4DF_V4DF_PCDOUBLE_V4SI_QI_INT, V2DF_V2DF_PCDOUBLE_V2DI_QI_INT)
(V4DF_V4DF_PCDOUBLE_V4DI_QI_INT, V4SF_V4SF_PCFLOAT_V4SI_QI_INT)
(V8SF_V8SF_PCFLOAT_V8SI_QI_INT, V4SF_V4SF_PCFLOAT_V2DI_QI_INT)
(V4SF_V4SF_PCFLOAT_V4DI_QI_INT, V2DI_V2DI_PCINT64_V4SI_QI_INT)
 

Re: [RFC PATCH] Improve switchconv optimization (PR tree-optimization/79472)

2017-02-15 Thread Jakub Jelinek
On Wed, Feb 15, 2017 at 12:46:44PM +0100, Richard Biener wrote:
> >> Possibly, but for GCC 8.
> > 
> > To both this switchconv patch and the potential improvement for loading
> > from const arrays (can create an enhancement PR for that), or just the
> > latter?
> 
> Both I think - the patch is pretty big.

Ok, I'll queue the patch for GCC8 then.

>  Maybe we can instead make early
> threading not mess this up?

Maybe, but not planning to do that myself, my knowledge about jump threading
is too limited.

> >> can we teach EVRP about this?  It runs before switch conversion.
> > 
> > I guess so.  It is a matter of calling simplify_switch_using_ranges
> > and then doing some cleanup (you wrote that optimization)
> > - to_update_switch_stmts handling.
> 
> Sounds like a good thing to do (for GCC 8 as well).

Ok, will file enhancement PRs.

Jakub


C PATCH to fix ICE with -Wdouble-promotion (PR c/79515)

2017-02-15 Thread Marek Polacek
We ICEd on this testcase in do_warn_double_promotion because an invalid
conversion had produced an error result type and accessing that via
TYPE_MAIN_VARIANT crashes.  Fixed in an obvious way.
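
The shape of the fix is the usual "check for the error sentinel before looking inside" pattern. A minimal sketch with hypothetical miniature types, not GCC's tree representation:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a type node.  */
struct type_node
{
  struct type_node *main_variant;
  int is_double;
};

/* Hypothetical sentinel playing the role of error_mark_node: it marks a
   failed computation and must never be treated as a type.  Its
   main_variant is NULL, so dereferencing it would crash.  */
static struct type_node error_mark;

static int
should_warn_double_promotion (struct type_node *result_type)
{
  assert (result_type != NULL);
  /* Bail out before touching any type field of the sentinel.  */
  if (result_type == &error_mark)
    return 0;
  return result_type->main_variant->is_double;
}
```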

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-02-15  Marek Polacek  

PR c/79515
* c-warn.c (do_warn_double_promotion): Don't warn if an invalid
conversion has occurred.

* gcc.dg/dfp/pr79515.c: New.

diff --git gcc/c-family/c-warn.c gcc/c-family/c-warn.c
index 3c9077c..09c5760 100644
--- gcc/c-family/c-warn.c
+++ gcc/c-family/c-warn.c
@@ -1864,6 +1864,9 @@ do_warn_double_promotion (tree result_type, tree type1, 
tree type2,
  warn about it.  */
   if (c_inhibit_evaluation_warnings)
 return;
+  /* If an invalid conversion has occurred, don't warn.  */
+  if (result_type == error_mark_node)
+return;
   if (TYPE_MAIN_VARIANT (result_type) != double_type_node
   && TYPE_MAIN_VARIANT (result_type) != complex_double_type_node)
 return;
diff --git gcc/testsuite/gcc.dg/dfp/pr79515.c gcc/testsuite/gcc.dg/dfp/pr79515.c
index e69de29..6f6f09c 100644
--- gcc/testsuite/gcc.dg/dfp/pr79515.c
+++ gcc/testsuite/gcc.dg/dfp/pr79515.c
@@ -0,0 +1,13 @@
+/* PR c/79515 */
+/* { dg-do compile } */
+/* { dg-options "-Wdouble-promotion" } */
+
+extern _Decimal64 x;
+extern int i;
+
+void
+foo (void)
+{
+  if (x <= 2.0) /* { dg-error "mix operands" } */
+i++;
+}

Marek


Re: [RFC PATCH] Improve switchconv optimization (PR tree-optimization/79472)

2017-02-15 Thread Richard Biener
On 15/02/17 08:17, Jakub Jelinek wrote:
> On Wed, Feb 15, 2017 at 08:06:16AM +0100, Richard Biener wrote:
>> On February 14, 2017 9:04:45 PM GMT+01:00, Jakub Jelinek  
>> wrote:
>>> Hi!
>>>
>>> The following patch is an attempt to fix a regression where we no
>>> longer
>>> switch convert one switch because earlier optimizations turn it into
>>> unsupported shape.
>>
>> Is that because of early threading?
> 
> Yes.
> 
>>> and expects to be optimized into return 3 by vrp1.  As switchconv is
>>> earlier
>>> than that, vrp1 sees:
>>>  _1 = a_3(D) & 1;
>>>  _4 = (unsigned int) _1;
>>>  _5 = CSWTCH.1[_4];
>>>  return _5;
>>> and doesn't optimize it.  If the testcase had say case 7: replaced with
>>> default:, it wouldn't pass already before.
>>
>> That looks odd...
> 
> Just a pass ordering issue.
> 
>>   If the patch is ok, what
>>> should
>>> we do with vrp40.c?  Change it in some way (e.g. return variable in one
>>> case) so that switchconv doesn't trigger, or add an optimization in vrp
>>> if we see a load from constant array with known initializer and the
>>> range
>>> is small enough and contains the same value for all those values,
>>> replace
>>> it with the value? 
>>
>> Possibly, but for GCC 8.
> 
> To both this switchconv patch and the potential improvement for loading
> from const arrays (can create an enhancement PR for that), or just the
> latter?

Both I think - the patch is pretty big.  Maybe we can instead make early
threading not mess this up?
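
The idea floated above, replacing a load from a constant array by its value when every index in the known range yields the same element, can be sketched as follows. This is an illustration only, not VRP code: folded_or and the two-entry table in the usage note are hypothetical, and the real pass would work on GIMPLE and value ranges rather than on a std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the common value of table[lo..hi] when all those elements agree,
// and FALLBACK when the range is invalid or the elements differ.
int
folded_or (const std::vector<int> &table, std::size_t lo, std::size_t hi,
           int fallback)
{
  if (lo > hi || hi >= table.size ())
    return fallback;
  for (std::size_t i = lo + 1; i <= hi; ++i)
    if (table[i] != table[lo])
      return fallback;
  return table[lo];
}
```

For an index computed as "a & 1" the range is [0, 1], so a hypothetical table {3, 3} would fold to 3, while {3, 4} would be left alone.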

>> can we teach EVRP about this?  It runs before switch conversion.
> 
> I guess so.  It is a matter of calling simplify_switch_using_ranges
> and then doing some cleanup (you wrote that optimization)
> - to_update_switch_stmts handling.

Sounds like a good thing to do (for GCC 8 as well).

Thanks,
Richard.

> 
>   Jakub
> 



Re: [PATCH, wwwdocs/ARM] Deprecate ARMv5 and ARMv5E support

2017-02-15 Thread Richard Earnshaw (lists)
On 15/02/17 11:23, Thomas Preudhomme wrote:
> Hi,
> 
> ARMv5 and ARMv5E architectures have no known implementation. I therefore
> suggest that we deprecate these architectures. ARMv5T, ARMv5TE and
> ARMv5TEJ would remain supported though.
> 
> Is this ok to commit?
> 
> Best regards,
> 
> Thomas
> 
> deprecate_armv5_armv5e.patch
> 
> 
> cvs diff: Diffing .
> cvs diff: Diffing benchmarks
> cvs diff: Diffing bugs

"cvs -q diff" is your friend :-)

> Index: gcc-7/changes.html
> ===
> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
> retrieving revision 1.55
> diff -u -r1.55 changes.html
> --- gcc-7/changes.html2 Feb 2017 15:21:11 -   1.55
> +++ gcc-7/changes.html14 Feb 2017 09:21:23 -
> @@ -752,6 +752,13 @@
>  ARM
> 
>   
> +   Support for the ARMv5 and ARMv5E architectures has been deprecated and

Suggest "Support for the ARMv5 and ARMv5E architectures (which have no
known implementations) has been ..."

> +   will be removed in a future GCC release.  Note that ARMv5T, ARMv5TE 
> and
> +   ARMv5TEJ architectures remain supported.
> +   The values armv5 and armv5e of
> +   -march are thus deprecated.
> + 
> + 
> The ARMv8.2-A architecture and the ARMv8.2-A 16-bit Floating-Point
> Extensions are now supported.  They can be used by specifying the
> -march=armv8.2-a or -march=armv8.2-a+fp16

Otherwise OK.

R.


[PATCH, wwwdocs/ARM] Deprecate ARMv5 and ARMv5E support

2017-02-15 Thread Thomas Preudhomme

Hi,

ARMv5 and ARMv5E architectures have no known implementation. I therefore suggest 
that we deprecate these architectures. ARMv5T, ARMv5TE and ARMv5TEJ would remain 
supported though.


Is this ok to commit?

Best regards,

Thomas
Index: gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.55
diff -u -r1.55 changes.html
--- gcc-7/changes.html	2 Feb 2017 15:21:11 -	1.55
+++ gcc-7/changes.html	14 Feb 2017 09:21:23 -
@@ -752,6 +752,13 @@
 ARM

  
+   Support for the ARMv5 and ARMv5E architectures has been deprecated and
+   will be removed in a future GCC release.  Note that ARMv5T, ARMv5TE and
+   ARMv5TEJ architectures remain supported.
+   The values armv5 and armv5e of
+   -march are thus deprecated.
+ 
+ 
The ARMv8.2-A architecture and the ARMv8.2-A 16-bit Floating-Point
Extensions are now supported.  They can be used by specifying the
-march=armv8.2-a or -march=armv8.2-a+fp16


[PATCH] PR 68749: S/390: Disable ifcvt-4.c for -m31.

2017-02-15 Thread Dominik Vogt
The attached patch disables the test ifcvt-4.c on s390 and on
s390x with -m31, and adds -march=z196 for s390x.  It should no
longer fail on s390 and s390x.

Tested on s390x biarch.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/testsuite/ChangeLog-pr68749

PR 68749
* gcc.dg/ifcvt-4.c: Disable for -m31, use -march=z196.
From 741e57a26f203a0dc3e0744c63249109f001d7c3 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Wed, 15 Feb 2017 12:13:15 +0100
Subject: [PATCH] PR 68749: S/390: Disable ifcvt-4.c for -m31.

The test needs the conditional move pattern which is available only with
-march=z196 or higher and -m64.
---
 gcc/testsuite/gcc.dg/ifcvt-4.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/ifcvt-4.c b/gcc/testsuite/gcc.dg/ifcvt-4.c
index 466ad15..b4a4bc8 100644
--- a/gcc/testsuite/gcc.dg/ifcvt-4.c
+++ b/gcc/testsuite/gcc.dg/ifcvt-4.c
@@ -1,6 +1,8 @@
 /* { dg-options "-fdump-rtl-ce1 -O2 --param max-rtl-if-conversion-insns=3 --param max-rtl-if-conversion-unpredictable-cost=100" } */
 /* { dg-additional-options "-misel" { target { powerpc*-*-* } } } */
-/* { dg-skip-if "Multiple set if-conversion not guaranteed on all subtargets" { "arm*-*-* hppa*64*-*-* visium-*-*" riscv*-*-* } }  */
+/* { dg-additional-options "-march=z196" { target { s390x-*-* } } } */
+/* { dg-skip-if "Multiple set if-conversion not guaranteed on all subtargets" { "arm*-*-* hppa*64*-*-* s390-*-* visium-*-*" riscv*-*-* } }  */
+/* { dg-skip-if "" { "s390x-*-*" } { "-m31" } }  */
 
 typedef int word __attribute__((mode(word)));
 
-- 
2.3.0



Re: [PATCH v5] add -fprolog-pad=N,M option

2017-02-15 Thread Richard Earnshaw (lists)
On 15/02/17 11:12, Marek Polacek wrote:
> On Wed, Feb 15, 2017 at 11:01:16AM +, Richard Earnshaw (lists) wrote:
>> On 13/01/17 12:19, Torsten Duwe wrote:
>>> Changes since v4: hopefully addressed all of Sandra's requests
>>> and suggestions concerning the documentation snippets, thanks
>>> for the feedback.  If it still isn't clear, feel free to rephrase
>>> -- I'm a programmer, not a technical writer.
>>>
>>
>> Generally, I find 'pad' somewhat confusing.  It's OK for the option name
>> itself, but more generally, we should be talking about either space or
>> number of instructions according to context.
>>
>>> 2017-01-13  Torsten Duwe : 
>>
>> Two spaces between date and name, two more between name and email, no
>> colon at the end.
>>
>>>
>>>  * c-family/c-attribs.c : introduce prolog_pad attribute and create
>>>a handler for it.
>>>
>>
>>
>> Don't leave blank lines between files mentioned here.  All lines should
>> be indented by /exactly/ one tab; don't add extra indentation for
>> continuation lines.  No space before colon.  Capital letter at start of
>> sentences, full stop at the end of each one.  Which function, or data
>> structure did you change (put the name in brackets before the colon)?
>> Document each function or variable changed.
>>
>> The above entry should read:
>>
>>  * c-family/c-attribs.c (c_common_attribute_table): Add entry
>>  for "prolog_pad".
>>  (handle_prolog_pad_attribute): New function.
> 
> Except that the c-family/ prefix shouldn't be there.
> 

Ah, I'd missed that it has its own ChangeLog file and thus needs to be
recorded separately.  Ergo, the LTO entry can't use Likewise since the
context for that will not be immediately above that entry...

R.

>   Marek
> 



Re: [PATCH v5] add -fprolog-pad=N,M option

2017-02-15 Thread Marek Polacek
On Wed, Feb 15, 2017 at 11:01:16AM +, Richard Earnshaw (lists) wrote:
> On 13/01/17 12:19, Torsten Duwe wrote:
> > Changes since v4: hopefully addressed all of Sandra's requests
> > and suggestions concerning the documentation snippets, thanks
> > for the feedback.  If it still isn't clear, feel free to rephrase
> > -- I'm a programmer, not a technical writer.
> > 
> 
> Generally, I find 'pad' somewhat confusing.  It's OK for the option name
> itself, but more generally, we should be talking about either space or
> number of instructions according to context.
> 
> > 2017-01-13  Torsten Duwe : 
> 
> Two spaces between date and name, two more between name and email, no
> colon at the end.
> 
> > 
> >  * c-family/c-attribs.c : introduce prolog_pad attribute and create
> >a handler for it.
> > 
> 
> 
> Don't leave blank lines between files mentioned here.  All lines should
> be indented by /exactly/ one tab; don't add extra indentation for
> continuation lines.  No space before colon.  Capital letter at start of
> sentences, full stop at the end of each one.  Which function, or data
> structure did you change (put the name in brackets before the colon)?
> Document each function or variable changed.
> 
> The above entry should read:
> 
>   * c-family/c-attribs.c (c_common_attribute_table): Add entry
>   for "prolog_pad".
>   (handle_prolog_pad_attribute): New function.

Except that the c-family/ prefix shouldn't be there.

Marek


Re: [PATCH v5] add -fprolog-pad=N,M option

2017-02-15 Thread Richard Earnshaw (lists)
On 13/01/17 12:19, Torsten Duwe wrote:
> Changes since v4: hopefully addressed all of Sandra's requests
> and suggestions concerning the documentation snippets, thanks
> for the feedback.  If it still isn't clear, feel free to rephrase
> -- I'm a programmer, not a technical writer.
> 

Generally, I find 'pad' somewhat confusing.  It's OK for the option name
itself, but more generally, we should be talking about either space or
number of instructions according to context.

> 2017-01-13Torsten Duwe : 

Two spaces between date and name, two more between name and email, no
colon at the end.
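
The header rule above (two spaces after the date, two more before the address, nothing after the closing bracket) can be sketched as a small checker. This is only an illustration: is_valid_changelog_header is a hypothetical helper, not an existing GCC script, and the address in the usage note is a placeholder because the archive stripped the real one.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Accepts "YYYY-MM-DD  Name  <address>" with exactly two spaces in each gap
// and no trailing characters such as a colon.
bool
is_valid_changelog_header (const std::string &line)
{
  static const std::regex pattern (
    R"(\d{4}-\d{2}-\d{2}  [^ <]([^<]*[^ <])?  <[^<> ]+@[^<> ]+>)");
  // regex_match requires the whole line to match the pattern.
  return std::regex_match (line, pattern);
}
```

With this sketch, "2017-01-13  Torsten Duwe  <duwe@example.org>" is accepted, while the same line with a single space after the date or a trailing colon is rejected.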

> 
>* c-family/c-attribs.c : introduce prolog_pad attribute and create
>  a handler for it.
> 


Don't leave blank lines between files mentioned here.  All lines should
be indented by /exactly/ one tab; don't add extra indentation for
continuation lines.  No space before colon.  Capital letter at start of
sentences, full stop at the end of each one.  Which function, or data
structure did you change (put the name in brackets before the colon)?
Document each function or variable changed.

The above entry should read:

* c-family/c-attribs.c (c_common_attribute_table): Add entry
for "prolog_pad".
(handle_prolog_pad_attribute): New function.

>* lto/lto-lang.c : Likewise.

You still need to name the objects affected, even if you can then use
likewise to refer to the immediately preceding entry.

> 
>* common.opt : introduce -fprolog_pad command line option
>  and its variables prolog_nop_pad_size and prolog_nop_pad_entry.
> 
>* doc/extend.texi : document prolog_pad attribute.
> 
>* doc/invoke.texi : document -fprolog_pad command line option.
> 
>* opts.c (OPT_fprolog_pad_): add parser.
> 
>* doc/tm.texi.in (TARGET_ASM_PRINT_PROLOG_PAD): new target hook
> 
>* doc/tm.texi : Likewise.
> 
>* target.def (print_prolog_pad): Likewise.
> 
>* targhooks.h (default_print_prolog_pad): new function.
> 
>* targhooks.c (default_print_prolog_pad): Likewise.
> 
>* testsuite/c-c++-common/attribute-prolog_pad-1.c : New test.
> 
>* toplev.c (process_options): Switch off IPA-RA if
>  prolog pads are being generated.
> 
>* varasm.c (assemble_start_function): look at prolog-pad command
>  line switch and function attributes and maybe generate NOP
>  pads by calling print_prolog_pad.
> 
> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> index ce7fcaa..5c6cf1c 100644
> --- a/gcc/c-family/c-attribs.c
> +++ b/gcc/c-family/c-attribs.c
> @@ -139,6 +139,7 @@ static tree handle_bnd_variable_size_attribute (tree *, 
> tree, tree, int, bool *)
>  static tree handle_bnd_legacy (tree *, tree, tree, int, bool *);
>  static tree handle_bnd_instrument (tree *, tree, tree, int, bool *);
>  static tree handle_fallthrough_attribute (tree *, tree, tree, int, bool *);
> +static tree handle_prolog_pad_attribute (tree *, tree, tree, int, bool *);
>  
>  /* Table of machine-independent attributes common to all C-like languages.
>  
> @@ -345,6 +346,8 @@ const struct attribute_spec c_common_attribute_table[] =
> handle_bnd_instrument, false },
>{ "fallthrough", 0, 0, false, false, false,
> handle_fallthrough_attribute, false },
> +  { "prolog_pad",  1, 2, true, false, false,
> +   handle_prolog_pad_attribute, false },
>{ NULL, 0, 0, false, false, false, NULL, false }
>  };
>  
> @@ -3173,3 +3176,10 @@ handle_fallthrough_attribute (tree *, tree name, tree, 
> int,
>*no_add_attrs = true;
>return NULL_TREE;
>  }
> +
> +static tree
> +handle_prolog_pad_attribute (tree *, tree, tree, int,
> +  bool *)
> +{
> +  return NULL_TREE;
> +}
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 8ad5b77..37d4009 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -163,6 +163,13 @@ bool flag_stack_usage_info = false
>  Variable
>  int flag_debug_asm
>  
> +; If we should generate NOP pads before each function prologue

'If' suggests this is a boolean, but the declaration is for a count, so
this should be "Number of NOP instructions to insert before each
function prologue."

> +Variable
> +HOST_WIDE_INT prolog_nop_pad_size
> +
> +; And how far the asm entry point is into this pad
> +Variable
> +HOST_WIDE_INT prolog_nop_pad_entry
>  
>  ; Balance between GNAT encodings and standard DWARF to emit.
>  Variable
> @@ -2019,6 +2026,10 @@ fprofile-reorder-functions
>  Common Report Var(flag_profile_reorder_functions)
>  Enable function reordering that improves code placement.
>  
> +fprolog-pad=
> +Common Joined Optimization
> +Pad NOPs before each function prologue.
> +

Insert NOP instructions before ...

>  frandom-seed
>  Common Var(common_deferred_options) Defer
>  
> diff --git a/gcc/doc/extend.texi 

Re: PR rtl-optimization/64081: Enable RTL loop unrolling for duplicated exit blocks and back edges (with AIX fixes)

2017-02-15 Thread Aldy Hernandez

On 02/13/2017 07:15 PM, Jeff Law wrote:


> So it seems in your updated patch there is only one call where we ask
> for LOOP_EXIT_COMPLEX, specifically the call from get_loop_location.
>
> But I don't see how asking for LOOP_EXIT_COMPLEX from that location
> would change whether or not we unroll any given loop (which is the core
> of bz64081).
>
> Am I missing something?


Ughh, only the spaghetti that is this code? ;-).

get_loop_location is only called once in the compiler, in 
decide_unrolling().  This call to get_loop_location() will set the loop 
description, particularly desc->simple_p where you point out.


Later on down in decide_unrolling(), we decide the number of iterations, 
and use desc->simple_p to ignore the loop if it is not simple.


  decide_unroll_constant_iterations (loop, flags);
  if (loop->lpt_decision.decision == LPT_NONE)
    decide_unroll_runtime_iterations (loop, flags);
  if (loop->lpt_decision.decision == LPT_NONE)
    decide_unroll_stupid (loop, flags);

Any one of these functions will bail if the loop description was not 
simple_p:


  /* Check for simple loops.  */
  desc = get_simple_loop_desc (loop);

  /* Check simpleness.  */
  if (!desc->simple_p || desc->assumptions)
    {
      if (dump_file)
        fprintf (dump_file,
                 ";; Unable to prove that the number of iterations "
                 "can be counted in runtime\n");
      return;
    }

(Yes, there's a lot of duplicated code in decide_unroll_*_iterations.)

One problem I see here is that the decide_unroll_*_iterations functions all
call get_simple_loop_desc(), which is basically the LOOP_EXIT_SIMPLE variant,
but since a description is already cached we get back the one computed by
the earlier LOOP_EXIT_COMPLEX call.  So the code only works because the
cached value is already there.


I think to make it clearer we could:

1. Add an assert in get_loop_desc to make sure that if we're returning a 
cached loop description, that the LOOP_EXIT_TYPEs match.  Just in case...


2. Change all the decide_unroll_*_iterations variants to specifically 
ask for a LOOP_EXIT_TYPE, not just the simple variant. And have this 
set to LOOP_EXIT_COMPLEX from decide_unrolling.  Right now, this is all 
working because we have only one call to get_loop_location, but I assume 
that could change.


3. And finally, what the heck is get_loop_location doing in cfgloop, 
when it's only used once within loop-unroll.c?  I say we move it to 
loop-unroll.c and mark it static.
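
Suggestion 1 above could look roughly like the sketch below, which uses a toy loop structure rather than the real cfgloop types. ExitType, LoopDesc, Loop, get_loop_desc and analyses_after_two_lookups are all hypothetical names, and the "analysis" is a placeholder that simply records which exit type was requested.

```cpp
#include <cassert>

// Toy model of a cached loop description; not GCC's struct niter_desc.
enum class ExitType { Simple, Complex };

struct LoopDesc
{
  ExitType computed_for;  // which kind of exit analysis produced this
  bool simple_p;
};

struct Loop
{
  bool has_desc = false;
  LoopDesc desc{};
  int analyses = 0;       // counts how often the analysis actually ran
};

// Returns the cached description, asserting that it was computed for the
// same exit type the caller is asking for now.
const LoopDesc &
get_loop_desc (Loop &loop, ExitType wanted)
{
  if (loop.has_desc)
    {
      assert (loop.desc.computed_for == wanted
              && "cached description was computed for a different exit type");
      return loop.desc;
    }
  loop.desc = LoopDesc{ wanted, true };  // placeholder for the real analysis
  loop.has_desc = true;
  ++loop.analyses;
  return loop.desc;
}

// Two lookups with the same exit type run the analysis only once.
int
analyses_after_two_lookups ()
{
  Loop loop;
  get_loop_desc (loop, ExitType::Complex);
  get_loop_desc (loop, ExitType::Complex);
  return loop.analyses;
}
```

In this model a later lookup with ExitType::Simple on the same loop would hit the assertion instead of silently reusing the complex description.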


Does this help?

Aldy


Re: [libstdc++,doc] Strip links to ANSI (web shop)

2017-02-15 Thread Jonathan Wakely

On 09/02/17 15:15 +0100, Gerald Pfeifer wrote:
> On Sun, 5 Feb 2017, Jonathan Wakely wrote:
>> ANSI sells the standard for a much more reasonable price than ISO or
>> any of the other national standards bodies. I think many people use
>> draft standards now, rather than buying it, but for those who do want
>> an official copy ANSI is the cheapest source.
>
> Ah, makes sense.  Unfortunately the direct link into the ANSI
> web shop is broken, and I somehow failed to find a good replacement.
> Do you have one?

The C++14 standard is:
http://webstore.ansi.org/RecordDetail.aspx?sku=ISO%2fIEC+14882%3a2014

> Also, below an updated patch suggestion that mostly removes the
> essential duplication of information in faq.xml.
>
> What do you think?

Should we make the FAQ link to the info in the manual, instead of just
removing it?



Re: [v3 PATCH] Implement C++17 GB50 resolution

2017-02-15 Thread Jonathan Wakely

Hi, Dinka, thanks for the patch.

On 14/02/17 21:22 +, Dinka Ranns wrote:

> diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
> index ceae7f8..6a6995c 100644
> --- a/libstdc++-v3/include/std/chrono
> +++ b/libstdc++-v3/include/std/chrono
> @@ -349,50 +349,50 @@ _GLIBCXX_END_NAMESPACE_VERSION
> operator-() const
> { return duration(-__r); }
>
> -   duration&
> +   constexpr duration&


This needs to use _GLIBCXX17_CONSTEXPR instead of 'constexpr'

These functions aren't constexpr in C++11 and C++14, and the standard
(annoyingly) forbids us from adding constexpr anywhere it isn't
present in the standard.

The macro _GLIBCXX17_CONSTEXPR expands to 'constexpr' if __cplusplus >
201402L, and expands to nothing otherwise.

Each new 'constexpr' you've added needs to use that macro.
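
A minimal sketch of how such a macro behaves, using a made-up macro and class so as not to suggest this is the libstdc++ source: DEMO17_CONSTEXPR and Counter are hypothetical, and only the "__cplusplus > 201402L" condition is taken from the description above.

```cpp
#include <cassert>

// Hypothetical macro mirroring the behaviour described for
// _GLIBCXX17_CONSTEXPR: 'constexpr' after C++14, nothing otherwise.
#if __cplusplus > 201402L
# define DEMO17_CONSTEXPR constexpr
#else
# define DEMO17_CONSTEXPR
#endif

struct Counter
{
  int value;

  // Only constexpr when compiled as C++17 or later.
  DEMO17_CONSTEXPR Counter &
  operator++ ()
  {
    ++value;
    return *this;
  }
};

// Works in every dialect, since it is evaluated at run time.
int
bump_twice ()
{
  Counter c = { 0 };
  ++c;
  ++c;
  return c.value;
}

#if __cplusplus > 201402L
// Only valid when operator++ really is constexpr, i.e. in C++17 mode.
constexpr int
bump_twice_at_compile_time ()
{
  Counter c = { 0 };
  ++c;
  ++c;
  return c.value;
}
static_assert (bump_twice_at_compile_time () == 2,
               "operator++ is usable in constant expressions");
#endif
```

The guarded static_assert is the reason a test exercising the new constexpr operators has to be compiled in C++17 mode: in C++11 or C++14 the macro expands to nothing and the constant-expression use would be rejected.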


> diff --git a/libstdc++-v3/testsuite/20_util/duration/arithmetic/constexpr.cc 
> b/libstdc++-v3/testsuite/20_util/duration/arithmetic/constexpr.cc
> index 285f941..1128a52 100644
> --- a/libstdc++-v3/testsuite/20_util/duration/arithmetic/constexpr.cc
> +++ b/libstdc++-v3/testsuite/20_util/duration/arithmetic/constexpr.cc
> @@ -19,11 +19,31 @@
>
> #include 
> #include 


There should be a blank line before and after this function, however
...


> +constexpr auto test_operators()
> +{
> +  std::chrono::nanoseconds d1 { };
> +  d1++;


This new function uses C++14 return type deduction, so will fail if
the test is run in C++11 mode (the default is C++14, but it can be
overridden on the command-line).


> +  ++d1;
> +  d1--;
> +  --d1;


Also once you change the new 'constexpr' specifiers to use the
_GLIBCXX17_CONSTEXPR macro this test will fail in C++14 mode. I think
this new function needs to be moved to a new test file, such as
testsuite/20_util/duration/arithmetic/constexpr_c++17.cc

That should contain just your new test_operators() function, because
the rest of the class will be tested by the existing constexpr.cc test
file. So something like:

// { dg-options "-std=gnu++17" }
// { dg-do compile { target c++1z } }

// Copyright etc. etc.
// ...

#include 

constexpr auto test_operators()
{
 // ...
}

constexpr auto d4 = test_operators();


Note that the "dg-do compile" line should use the c++1z target instead
of c++11, and needs to override the default dialect with a dg-options
line.

This test doesn't need a "main" function because it's a "dg-do
compile" test, so isn't linked. (The existing test that you modified
didn't need one either, but it doesn't do any harm leaving it there).



> diff --git a/libstdc++-v3/testsuite/20_util/time_point/arithmetic/constexpr.cc 
> b/libstdc++-v3/testsuite/20_util/time_point/arithmetic/constexpr.cc
> new file mode 100644
> index 000..e87a226
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/20_util/time_point/arithmetic/constexpr.cc
> @@ -0,0 +1,39 @@
> +// { dg-do compile { target c++11 } }


Since the time_point member functions will only be constexpr in C++17
this test also needs to use c++1z instead of c++11, and needs to
override the default C++14 dialect, i.e.

// { dg-options "-std=gnu++17" }
// { dg-do compile { target c++1z } }



Re: [PATCH] PR target/79241: S/390: define TARGET_CUSTOM_FUNCTION_DESCRIPTORS.

2017-02-15 Thread Andreas Krebbel
On 02/09/2017 07:16 PM, Dominik Vogt wrote:
> The attached patch fixes PR 79241 on s390x.  Bootstrapped and
> regression tested on s390x biarch (not tested on s390).
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79421
> 
> Ciao
> 
> Dominik ^_^  ^_^
> 
Applied. Thanks!

-Andreas-



Re: [PATCH] Implement P0393R3

2017-02-15 Thread Tim Shen via gcc-patches
On Mon, Jan 9, 2017 at 2:52 AM, Jonathan Wakely  wrote:
> On 08/01/17 22:49 -0800, Tim Shen wrote:
>>
>> On Tue, Jan 3, 2017 at 6:17 AM, Jonathan Wakely 
>> wrote:
>>>
>>> On 01/01/17 04:17 -0800, Tim Shen via libstdc++ wrote:


>>>> +#define _VARIANT_RELATION_FUNCTION_TEMPLATE(__op, __name) \
>>>> +  template \
>>>> +constexpr bool operator __op(const variant<_Types...>& __lhs, \
>>>> +const variant<_Types...>& __rhs) \
>>>> +{ \
>>>> +  return __lhs._M##__name(__rhs,
>>>> std::index_sequence_for<_Types...>{}); \
>>>> +} \
>>>> +\
>>>> +  constexpr bool operator __op(monostate, monostate) noexcept \
>>>> +  { return 0 __op 0; }
>>>> +
>>>> +  _VARIANT_RELATION_FUNCTION_TEMPLATE(<, _erased_less_than)
>>>> +  _VARIANT_RELATION_FUNCTION_TEMPLATE(<=, _erased_less_equal)
>>>> +  _VARIANT_RELATION_FUNCTION_TEMPLATE(==, _erased_equal)
>>>> +  _VARIANT_RELATION_FUNCTION_TEMPLATE(!=, _erased_not_equal)
>>>> +  _VARIANT_RELATION_FUNCTION_TEMPLATE(>=, _erased_greater_than)
>>>> +  _VARIANT_RELATION_FUNCTION_TEMPLATE(>, _erased_greater)
>>>
>>>
>>>
>>> These need double underscore prefixes.
>>
>>
>> Done.
>
>
> I'm sorry, I missed that they get appended to _M to form a member
> function name, so they don't need a double underscore.
>
> But since they all have the same prefix, why not use _M_erased_##name
> and just use less_than, less_equal etc. in the macro invocations?
>
> However, the names are weird, you have >= as greater_than (not
> greater_equal) and > as greater (which is inconsistent with < as
> less_than).
>
> So I'd go with:
>
> _VARIANT_RELATION_FUNCTION_TEMPLATE(<, less)
> _VARIANT_RELATION_FUNCTION_TEMPLATE(<=, less_equal)
> _VARIANT_RELATION_FUNCTION_TEMPLATE(==, equal)
> _VARIANT_RELATION_FUNCTION_TEMPLATE(!=, not_equal)
> _VARIANT_RELATION_FUNCTION_TEMPLATE(>=, greater_equal)
> _VARIANT_RELATION_FUNCTION_TEMPLATE(>, greater)
>
>> +#define _VARIANT_RELATION_FUNCTION_TEMPLATE(__op, __name) \
>
>
> I think we usually use all-caps for macro arguments, so _OP and _NAME,
> but it doesn't really matter.
>
>> +  template \
>> +   static constexpr bool \
>> +   (*_S##__name##_vtable[])(const variant&, const variant&) = \
>> + { &__detail::__variant::__name... };
>> \
>
>
> With the suggestions above this would change to use _S_erased_##_NAME
> and &__detail::__variant::__erased_##_NAME
>
>> +  template \
>> +   constexpr inline bool \
>> +   _M##__name(const variant& __rhs, \
>> +std::index_sequence<__indices...>) const \
>> +   { \
>> + auto __lhs_index = this->index(); \
>> + auto __rhs_index = __rhs.index(); \
>> + if (__lhs_index != __rhs_index || valueless_by_exception()) \
>> +   /* Intentinoal modulo addition. */ \
>
>
> "Intentional" is spelled wrong, but I think simply "Modulo addition"
> is clear enough that it's intentional.
>
>> +   return __lhs_index + 1 __op __rhs_index + 1; \
>> + return _S##__name##_vtable<__indices...>[__lhs_index](*this,
>> __rhs); \
>> }

All done.
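
For readers wondering about the "modulo addition" in the quoted code, here is a minimal sketch of the trick, assuming the valueless index is the all-ones value (as std::variant_npos is). Adding 1 to both unsigned indices wraps the valueless index around to 0, so a valueless variant orders before every real alternative. The names npos and index_less are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// All-ones index used for the valueless state in this sketch.
constexpr std::size_t npos = static_cast<std::size_t> (-1);

// Compare two alternative indices so that npos sorts first: unsigned
// arithmetic wraps, so npos + 1 == 0 while a real index i maps to i + 1.
constexpr bool
index_less (std::size_t lhs, std::size_t rhs)
{
  return lhs + 1 < rhs + 1;
}

static_assert (index_less (npos, 0), "valueless sorts before index 0");
static_assert (!index_less (0, npos), "index 0 does not sort before valueless");
```

The same shifted comparison works for the other relational operators, which is why the macro can apply __op directly to the two incremented indices.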

>>
>> -  template
>
>
> And we'd usually use _Indices for template parameters, but this is
> already inconsistent in .

I didn't change this one, since I prefer in-file consistency. That's
weird, so let's discuss this. There are several naming styles in
libstdc++:
1) __underscore_name, for free functions, variables, and namespaces;
2) _CamelCase, for types, template type parameters in some files (e.g.
, I forgot why I did that :/).
3) _Camel_underscore_name, for types in many other files.
4) _S_underscore_name, _M_underscore_name, for static and member
functions, respectively.

There are two questions:
*) It seems natural to have (1) for non-type template parameters,
since those are also variables.
*) _CamelCase vs _Camel_underscore_name? Which one is preferred?

>
> The patch is OK with those naming tweaks. Thanks, and sorry for the
> mixup about the underscores.
>

No worries! Tested and committed.


-- 
Regards,
Tim Shen


Re: [PATCH] Fix DFP conversion from INTEGER_CST to REAL_CST (PR target/79487)

2017-02-15 Thread Richard Biener
On February 15, 2017 7:53:58 AM GMT+01:00, Jakub Jelinek  
wrote:
>Hi!
>
>As the following testcase shows, we store decimal REAL_CSTs always in
>_Decimal128 internal form and perform all the arithmetics on that, but
>while
>for arithmetics we then ensure rounding to the actual type
>(_Decimal{32,64}
>or for _Decimal128 no further rounding), e.g. const_binop calls
>  inexact = real_arithmetic (, code, , );
>  real_convert (, mode, );
>when converting integers to _Decimal{32,64} we do nothing like that.
>We do that only for non-decimal conversions from INTEGER_CSTs to
>REAL_CSTs.
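
A rough illustration of why the extra rounding step matters, using plain integers rather than GCC's REAL_VALUE_TYPE: a 64-bit integer has up to 19 decimal digits, while _Decimal32 and _Decimal64 hold 7 and 16 significant digits, so the converted value has to be rounded to the target precision. The helpers magnitude_of and round_to_digits are hypothetical, and ties-to-even is assumed as the rounding mode.

```cpp
#include <cassert>
#include <cstdint>

// Magnitude of a signed 64-bit value, computed without signed overflow.
std::uint64_t
magnitude_of (std::int64_t v)
{
  std::uint64_t u = static_cast<std::uint64_t> (v);
  return v < 0 ? 0 - u : u;
}

// Round MAG to at most DIGITS (>= 1) significant decimal digits, ties to
// even.  Assumes MAG <= 2^63 so the rounded result still fits in 64 bits.
std::uint64_t
round_to_digits (std::uint64_t mag, int digits)
{
  int n = 0;                       // number of decimal digits in MAG
  for (std::uint64_t t = mag; t != 0; t /= 10)
    ++n;
  if (n <= digits)
    return mag;                    // already representable exactly
  std::uint64_t scale = 1;         // 10^(n - digits)
  for (int i = 0; i < n - digits; ++i)
    scale *= 10;
  std::uint64_t q = mag / scale;   // digits that are kept
  std::uint64_t r = mag % scale;   // digits that are dropped
  std::uint64_t half = scale / 2;
  if (r > half || (r == half && (q & 1) != 0))
    ++q;
  return q * scale;
}
```

Rounding the magnitude of the most negative 64-bit integer to 7 and 16 digits gives 9223372000000000000 and 9223372036854776000, which correspond to the -9.223372E+18DF and -9.223372036854776E+18DD constants used in the new test.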
>
>The following patch fixes that.  Bootstrapped/regtested on x86_64-linux
>(i686-linux fails to bootstrap for other reason), and on 6.x branch on
>x86_64-linux and i686-linux.  Dominik has kindly tested it on s390x
>(where
>the bug has been originally reported on the float-cast-overflow-10.c
>test).
>
>Ok for trunk?

OK.

Richard.

>2017-02-15  Jakub Jelinek  
>
>   PR target/79487
>   * real.c (real_from_integer): Call real_convert even for decimal.
>
>   * gcc.dg/dfp/pr79487.c: New test.
>   * c-c++-common/ubsan/float-cast-overflow-8.c (TEST): Revert
>   2017-02-13 change.
>
>--- gcc/real.c.jj  2017-01-01 12:45:37.0 +0100
>+++ gcc/real.c 2017-02-14 21:35:35.868906203 +0100
>@@ -2266,7 +2266,7 @@ real_from_integer (REAL_VALUE_TYPE *r, f
> 
>   if (fmt.decimal_p ())
> decimal_from_integer (r);
>-  else if (fmt)
>+  if (fmt)
> real_convert (r, fmt, r);
> }
> 
>--- gcc/testsuite/gcc.dg/dfp/pr79487.c.jj  2017-02-14 22:42:33.137938789
>+0100
>+++ gcc/testsuite/gcc.dg/dfp/pr79487.c 2017-02-14 22:42:22.0
>+0100
>@@ -0,0 +1,16 @@
>+/* PR target/79487 */
>+/* { dg-options "-O2" } */
>+
>+int
>+main ()
>+{
>+  _Decimal32 a = (-9223372036854775807LL - 1LL); 
>+  _Decimal32 b = -9.223372E+18DF;
>+  if (b - a != 0.0DF)
>+__builtin_abort ();
>+  _Decimal64 c = (-9223372036854775807LL - 1LL); 
>+  _Decimal64 d = -9.223372036854776E+18DD;
>+  if (d - c != 0.0DD)
>+__builtin_abort ();
>+  return 0;
>+}
>---
>gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-8.c.jj2017-02-14
>00:08:33.0 +0100
>+++ gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-8.c   2017-02-15
>07:46:46.780778627 +0100
>@@ -8,7 +8,7 @@
> #define TEST(type1, type2) \
>   if (type1##_MIN)\
> { \
>-  volatile type2 min = type1##_MIN;   \
>+  type2 min = type1##_MIN;\
>   type2 add = -1.0;   \
>   while (1)   \
>   {   \
>@@ -28,7 +28,7 @@
>   volatile type1 tem3 = cvt_##type1##_##type2 (-1.0f);\
> } \
>   {   \
>-volatile type2 max = type1##_MAX; \
>+type2 max = type1##_MAX;  \
> type2 add = 1.0;  \
> while (1) \
>   {   \
>
>   Jakub