date:20170109

Patch ping

2017-01-09 Thread Jakub Jelinek

Hi!

I'd like to ping 2 patches:

- DWARF5 - adjust for 161031.2 resolution - remove padding from unit headers
  http://gcc.gnu.org/ml/gcc-patches/2017-01/msg00138.html

- Introduce the noipa attribute
  http://gcc.gnu.org/ml/gcc-patches/2016-12/msg01501.html

Jakub

[PATCH] Optimize away useless snprintf calls with -fprintf-return-value (take 2)

2017-01-09 Thread Jakub Jelinek

On Mon, Jan 02, 2017 at 04:54:57PM -0700, Martin Sebor wrote:
> Looks good to me, thanks!  Just a couple of suggestions:

Here is the updated patch with those comments added (and adjusted for the
current trunk).  Ok for trunk?

2017-01-10  Jakub Jelinek  

* gimple-ssa-sprintf.c (try_substitute_return_value): Remove
info.nowrite calls with no lhs that can't throw.  Return bool
whether gsi_remove has been called or not.
(pass_sprintf_length::handle_gimple_call): Return bool whether
try_substitute_return_value called gsi_remove.  Formatting fix.
(pass_sprintf_length::execute): Don't use gsi_remove if
handle_gimple_call returned true.

* gcc.dg/tree-ssa/builtin-snprintf-1.c: New test.
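For illustration only, here is a minimal sketch of the two shapes of call the ChangeLog distinguishes.  The function names are hypothetical and this is not the contents of builtin-snprintf-1.c.  A bounded call with a zero size writes nothing, so when it also has no lhs it has no observable effect.  When the result is used, the call stays, although its value may be substituted if it is known exactly:

```c
#include <stdio.h>

/* Hypothetical example: zero size, so nothing is written (info.nowrite),
   and the result is discarded (no lhs).  This is the shape of call the
   patch removes, provided the call cannot throw.  */
void discarded_call (void)
{
  snprintf (0, 0, "%i", 123);
}

/* Hypothetical example: the same call with its result used is kept.
   The value (the length the output would have had, here 3) may instead
   be folded to a constant by -fprintf-return-value.  */
int formatted_length (void)
{
  return snprintf (0, 0, "%i", 123);
}
```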

--- gcc/gimple-ssa-sprintf.c.jj 2017-01-09 11:35:03.769828764 +0100
+++ gcc/gimple-ssa-sprintf.c2017-01-10 08:21:34.185771116 +0100
@@ -128,7 +128,7 @@ public:
   fold_return_value = param;
 }
 
-  void handle_gimple_call (gimple_stmt_iterator*);
+  bool handle_gimple_call (gimple_stmt_iterator *);
 
   struct call_info;
   bool compute_format_length (call_info &, format_result *);
@@ -2735,9 +2735,11 @@ get_destination_size (tree dest)
described by INFO, substitute the result for the return value of
the call.  The result is suitable if the number of bytes it represents
is known and exact.  A result that isn't suitable for substitution may
-   have its range set to the range of return values, if that is known.  */
+   have its range set to the range of return values, if that is known.
+   Return true if the call is removed and gsi_next should not be performed
+   in the caller.  */
 
-static void
+static bool
 try_substitute_return_value (gimple_stmt_iterator *gsi,
 const pass_sprintf_length::call_info ,
 const format_result )
@@ -2797,6 +2799,24 @@ try_substitute_return_value (gimple_stmt
   res.constant ? "constant" : "variable");
}
 }
+  else if (lhs == NULL_TREE
+  && info.nowrite
+  && !stmt_ends_bb_p (info.callstmt))
+{
+  /* Remove the call to the bounded function with a zero size
+(e.g., snprintf(0, 0, "%i", 123)) if there is no lhs.  */
+  unlink_stmt_vdef (info.callstmt);
+  gsi_remove (gsi, true);
+  if (dump_file)
+   {
+ location_t callloc = gimple_location (info.callstmt);
+ fprintf (dump_file, "On line %i removing ",
+  LOCATION_LINE (callloc));
+ print_generic_expr (dump_file, info.func, dump_flags);
+ fprintf (dump_file, " call.\n");
+   }
+  return true;
+}
   else
 {
   unsigned HOST_WIDE_INT maxbytes;
@@ -2852,19 +2872,22 @@ try_substitute_return_value (gimple_stmt
 inbounds, (unsigned long)res.number_chars - 1, ign);
}
 }
+
+  return false;
 }
 
 /* Determine if a GIMPLE CALL is to one of the sprintf-like built-in
-   functions and if so, handle it.  */
+   functions and if so, handle it.  Return true if the call is removed
+   and gsi_next should not be performed in the caller.  */
 
-void
+bool
 pass_sprintf_length::handle_gimple_call (gimple_stmt_iterator *gsi)
 {
   call_info info = call_info ();
 
   info.callstmt = gsi_stmt (*gsi);
   if (!gimple_call_builtin_p (info.callstmt, BUILT_IN_NORMAL))
-return;
+return false;
 
   info.func = gimple_call_fndecl (info.callstmt);
   info.fncode = DECL_FUNCTION_CODE (info.func);
@@ -2955,7 +2978,7 @@ pass_sprintf_length::handle_gimple_call
   break;
 
 default:
-  return;
+  return false;
 }
 
   /* The first argument is a pointer to the destination.  */
@@ -3019,11 +3042,9 @@ pass_sprintf_length::handle_gimple_call
 }
 
   if (idx_objsize != HOST_WIDE_INT_M1U)
-{
-  if (tree size = gimple_call_arg (info.callstmt, idx_objsize))
- if (tree_fits_uhwi_p (size))
-   objsize = tree_to_uhwi (size);
-}
+if (tree size = gimple_call_arg (info.callstmt, idx_objsize))
+  if (tree_fits_uhwi_p (size))
+   objsize = tree_to_uhwi (size);
 
   if (info.bounded && !dstsize)
 {
@@ -3048,7 +3069,7 @@ pass_sprintf_length::handle_gimple_call
  location_t loc = gimple_location (info.callstmt);
  warning_at (EXPR_LOC_OR_LOC (dstptr, loc),
  info.warnopt (), "null destination pointer");
- return;
+ return false;
}
 
   /* Set the object size to the smaller of the two arguments
@@ -3077,12 +3098,12 @@ pass_sprintf_length::handle_gimple_call
   location_t loc = gimple_location (info.callstmt);
   warning_at (EXPR_LOC_OR_LOC (info.format, loc),
  info.warnopt (), "null format string");
-  return;
+  return false;
 }
 
   info.fmtstr = get_format_string (info.format, );
   if (!info.fmtstr)
-return;
+return false;
 
   /* The result is the number of bytes output by the formatted

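The bool return value exists because gsi_remove already leaves the iterator on the following statement, so calling gsi_next afterwards would skip one.  Below is a minimal sketch of that protocol using a hypothetical singly linked statement list.  The types and functions here are stand-ins, not GCC's gimple_stmt_iterator API:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a statement sequence and its iterator.  */
struct stmt { int removable; struct stmt *next; };
struct iter { struct stmt **link; };  /* points at the link to the current stmt */

/* Unlink the current statement; the iterator now refers to the next one,
   similar to what gsi_remove does.  */
static void iter_remove (struct iter *it)
{
  *it->link = (*it->link)->next;
}

/* Return true if the statement was removed, in which case the caller
   must not advance the iterator.  */
static bool handle_stmt (struct iter *it)
{
  if ((*it->link)->removable)
    {
      iter_remove (it);
      return true;
    }
  return false;
}

/* Walk the sequence, advancing only when nothing was removed.
   Returns the number of statements kept.  */
static int walk (struct stmt **head)
{
  int kept = 0;
  struct iter it = { head };
  while (*it.link)
    {
      if (handle_stmt (&it))
        continue;                     /* already on the next statement */
      kept++;
      it.link = &(*it.link)->next;    /* the equivalent of gsi_next */
    }
  return kept;
}
```

With two consecutive removable statements, unconditionally advancing after a removal would leave the second one unvisited.  Skipping the advance in that case avoids the problem.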
Re: [PATCH] Introduce --with-gcc-major-version-only configure option

2017-01-09 Thread Jakub Jelinek

On Tue, Jan 10, 2017 at 12:15:41AM +0100, Matthias Klose wrote:
> On 09.01.2017 21:43, Jakub Jelinek wrote:
> > On Fri, Jan 06, 2017 at 01:48:26PM +0100, Jakub Jelinek wrote:
> >> Yet another option is introduce AC_ARG_ENABLE into all those configure
> >> scripts (some macro in config/*.m4) and do the sed conditionally.
> > 
> > Here is a patch to do that.
> > Bootstrapped/regtested on x86_64-linux (without
> > --with-gcc-major-version-only) and on i686-linux (with
> > --with-gcc-major-version-only), then tested make install of both.
> > The former uses the standard gcc -dumpversion of 7.0.0 and 7.0.0 in
> > pathnames (e.g. usr/local/bin/x86_64-pc-linux-gnu-gcc-7.0.0,
> > usr/local/lib/gcc/x86_64-pc-linux-gnu/7.0.0,
> > usr/local/libexec/gcc/x86_64-pc-linux-gnu/7.0.0,
> > usr/local/lib/go/7.0.0/x86_64-pc-linux-gnu,
> > usr/local/include/c++/7.0.0 etc.), while the latter uses
> > gcc -dumpversion of 7 and 7 in pathnames (e.g.
> > i686-pc-linux-gnu-gcc-7, usr/local/lib/gcc/i686-pc-linux-gnu/7,
> > usr/local/libexec/gcc/i686-pc-linux-gnu/7,
> > usr/local/lib/go/7/i686-pc-linux-gnu,
> > usr/local/include/c++/7 etc.).
> > Ok for trunk?
> 
> Thanks for working on this.  I've been using such a layout for the Debian/Ubuntu GCC
> builds for some years.  The one thing I dislike about your patch is the changed
> output of the -dumpversion option, which differs depending on whether you use the
> new configure option or not.  This could break builds of third party software.  I
> would prefer having -dumpversion give the very same output independent of any
> configure options.  Please could you introduce a new option if you really
> need that?

As I said on IRC, I think -dumpversion of 7 in this configuration is the
right thing to do.  In GCC sources, we have 3 uses of -dumpversion,
two of them look like:
gcc_version := $(shell $(GOC) -dumpversion)
...
toolexeclibgodir = $(nover_glibgo_toolexeclibdir)/go/$(gcc_version)/$(target_alias)
libexecsubdir = $(libexecdir)/gcc/$(target_alias)/$(gcc_version)
(in libgo and gotools), one in config/tcl.m4 is like:
if test "`gcc -dumpversion | awk -F. '{print [$]1}'`" -lt "3" ; then
AC_MSG_WARN([64bit mode not supported with GCC < 3.2 on $system])
which works well whether it prints 7 or 7.1.1.
With --with-gcc-major-version-only, the spec_version differs from
BASEVER, and -dumpversion can print only one of them when they are not the
same.  So we either break users that expect they can do
`$CC -dumpmachine`-gcc-`$CC -dumpversion`, or find the C++ includes via
g++ -dumpversion, etc., or we break users that expect 3 numbers separated
by dots, or 2 numbers separated by a dot with an optional third one.
In the past we have not always printed 3 numbers; releases printed just
major.minor, e.g. gcc -dumpversion printed 3.0 (as mentioned in the manual).
But the former 3.0 in the previous versioning scheme corresponds to just 7
in the new one.  So users that expect 3 numbers are already broken,
and printing just one number merely adjusts those assumptions to the current
versioning scheme.  Yes, we can add a new option, but IMNSHO it should
be -dumpbaseversion or -dumpfullversion, which would always print
major.minor.patchlevel.  From the SUSE bugzilla, it looks like SUSE has
been shipping compilers that printed just 5 or 6 for almost 2 years now,
so hopefully any changes needed elsewhere have already been upstreamed.
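The config/tcl.m4 check above only looks at the first dot-separated field, so it behaves the same for "7" and "7.1.1".  Below is a minimal sketch of the same tolerant parsing in C.  The helper dumpversion_major is hypothetical and is not part of GCC:

```c
#include <stdlib.h>

/* Hypothetical helper: return the major version from -dumpversion output
   such as "7", "7.1.1" or "3.0", like awk -F. '{print $1}'.
   Returns -1 if the text does not start with a number.  */
static long dumpversion_major (const char *s)
{
  char *end;
  long major = strtol (s, &end, 10);
  if (end == s || (*end != '\0' && *end != '.' && *end != '\n'))
    return -1;
  return major;
}
```

A consumer written this way does not care whether the compiler was configured with --with-gcc-major-version-only.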

Jakub

[Committed/WWW] Add Cavium ThunderX related changes to changes.html for gcc-7

2017-01-09 Thread Andrew Pinski

This documents the Cavium ThunderX processor support added for GCC 7 in changes.html.
Committed as obvious.

Thanks,
Andrew

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.34
diff -u -p -r1.34 changes.html
--- changes.html9 Jan 2017 11:56:08 -   1.34
+++ changes.html10 Jan 2017 05:46:04 -
@@ -390,7 +390,12 @@ flagged as having failed.
  
Support has been added for the following processors
(GCC identifiers in parentheses): ARM Cortex-A73
-   (cortex-a73) and Broadcom Vulcan (vulcan).
+   (cortex-a73), Broadcom Vulcan (vulcan),
+   Cavium ThunderX CN81xx (thunderxt81<\code>),
+   Cavium ThunderX CN83xx (thunderxt83<\code>),
+   Cavium ThunderX CN88xx (thunderxt88<\code>),
+   Cavium ThunderX CN88xx pass 1.x (thunderxt88p1<\code>),
+   Cavium ThunderX 2 CN99xx (thunderx2t99<\code>).
The GCC identifiers can be used
as arguments to the -mcpu or -mtune options,
for example: -mcpu=cortex-a73 or

Go patch committed: drop size arguments to hash/equal functions

2017-01-09 Thread Ian Lance Taylor

Drop the size arguments for the hash/equal functions stored in type
descriptors.  Types know what size they are.  To make this work,
generate hash/equal functions for types that can use an identity
comparison but are not a standard size and alignment.
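To illustrate the idea in C rather than Go, here is a minimal sketch.  The names are hypothetical, and the "standard" sizes are assumed to be the usual power-of-two ones for which a shared routine exists.  A generic identity comparison needs the size passed in, while a function generated for one particular type already knows its size:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Before: a generic identity comparison that takes the size.  */
static bool memequal_sized (const void *p, const void *q, size_t size)
{
  return memcmp (p, q, size) == 0;
}

/* After (hypothetical): a 3-byte type is comparable by identity but is
   not a standard size, so it gets its own function, which needs no
   size argument.  */
struct rgb { unsigned char r, g, b; };

static bool rgb_equal (const void *p, const void *q)
{
  return memcmp (p, q, sizeof (struct rgb)) == 0;
}
```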

Drop the multiplications by 33 in the generated hash code and the
reflect package hash code.  They are not necessary since we started
passing a seed value around, as the seed includes the hash of the
earlier values.
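Here is a minimal sketch of why passing the seed makes the extra multiply redundant.  It uses an FNV-1a style byte hash, which is an assumption for illustration and not the Go runtime's algorithm.  Each field's hash is fed in as the next field's seed, so the running state already carries the earlier values:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical seeded byte hash (FNV-1a style); the seed is the running
   state.  */
static uint64_t memhash (const void *p, size_t n, uint64_t seed)
{
  const unsigned char *b = p;
  uint64_t h = seed;
  for (size_t i = 0; i < n; i++)
    {
      h ^= b[i];
      h *= UINT64_C (1099511628211);  /* 64-bit FNV prime */
    }
  return h;
}

/* Hash a two-field value by threading the first field's hash in as the
   seed for the second; no "h * 33" step between fields is needed.  */
struct pair { uint32_t a, b; };

static uint64_t pair_hash (const struct pair *v, uint64_t seed)
{
  uint64_t h = memhash (&v->a, sizeof v->a, seed);
  return memhash (&v->b, sizeof v->b, h);
}
```

With this particular hash, chaining the fields gives exactly the same result as hashing their bytes in one pass.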

Copy the algorithms for standard types from the Go 1.7 runtime,
replacing the C functions.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 244236)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-189ea81cc758e000325fd6cca7882c252d33f8f0
+f439989e483b7c2eada6ddcf6e730a791cce603f
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 244166)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -5335,7 +5335,6 @@ Binary_expression::lower_array_compariso
   Expression_list* args = new Expression_list();
   args->push_back(this->operand_address(inserter, this->left_));
   args->push_back(this->operand_address(inserter, this->right_));
-  args->push_back(Expression::make_type_info(at, TYPE_INFO_SIZE));
 
   Expression* ret = Expression::make_call(func, args, false, loc);
 
Index: gcc/go/gofrontend/gogo.cc
===
--- gcc/go/gofrontend/gogo.cc   (revision 244166)
+++ gcc/go/gofrontend/gogo.cc   (working copy)
@@ -2343,7 +2343,7 @@ Gogo::clear_file_scope()
 // parse tree is lowered.
 
 void
-Gogo::queue_specific_type_function(Type* type, Named_type* name,
+Gogo::queue_specific_type_function(Type* type, Named_type* name, int64_t size,
   const std::string& hash_name,
   Function_type* hash_fntype,
   const std::string& equal_name,
@@ -2351,7 +2351,7 @@ Gogo::queue_specific_type_function(Type*
 {
   go_assert(!this->specific_type_functions_are_written_);
   go_assert(!this->in_global_scope());
-  Specific_type_function* tsf = new Specific_type_function(type, name,
+  Specific_type_function* tsf = new Specific_type_function(type, name, size,
   hash_name,
   hash_fntype,
   equal_name,
@@ -2386,7 +2386,7 @@ Specific_type_functions::type(Type* t)
 case Type::TYPE_NAMED:
   {
Named_type* nt = t->named_type();
-   if (!t->compare_is_identity(this->gogo_) && t->is_comparable())
+   if (t->needs_specific_type_functions(this->gogo_))
  t->type_functions(this->gogo_, nt, NULL, NULL, _fn, _fn);
 
// If this is a struct type, we don't want to make functions
@@ -2420,7 +2420,7 @@ Specific_type_functions::type(Type* t)
 
 case Type::TYPE_STRUCT:
 case Type::TYPE_ARRAY:
-  if (!t->compare_is_identity(this->gogo_) && t->is_comparable())
+  if (t->needs_specific_type_functions(this->gogo_))
t->type_functions(this->gogo_, NULL, NULL, NULL, _fn, _fn);
   break;
 
@@ -2443,7 +2443,7 @@ Gogo::write_specific_type_functions()
 {
   Specific_type_function* tsf = this->specific_type_functions_.back();
   this->specific_type_functions_.pop_back();
-  tsf->type->write_specific_type_functions(this, tsf->name,
+  tsf->type->write_specific_type_functions(this, tsf->name, tsf->size,
   tsf->hash_name,
   tsf->hash_fntype,
   tsf->equal_name,
Index: gcc/go/gofrontend/gogo.h
===
--- gcc/go/gofrontend/gogo.h(revision 244166)
+++ gcc/go/gofrontend/gogo.h(working copy)
@@ -563,7 +563,7 @@ class Gogo
   // used when a type-specific function is needed when not at the top
   // level.
   void
-  queue_specific_type_function(Type* type, Named_type* name,
+  queue_specific_type_function(Type* type, Named_type* name, int64_t size,
   const std::string& hash_name,
   Function_type* hash_fntype,
   const std::string& equal_name,
@@ -824,17 +824,18 @@ class Gogo
   {
 Type* type;
 Named_type* name;
+int64_t size;
 std::string hash_name;
 Function_type* hash_fntype;
 std::string equal_name;
 Function_type* equal_fntype;
 
-Specific_type_function(Type* atype,

[PATCH 9j] testsuite: add x86_64-specific files

2017-01-09 Thread David Malcolm

A collection of test cases, capturing the state of various
functions at various places within the pass list, and verifying
that we can restart at various passes.

gcc/testsuite/ChangeLog:
* gcc.dg/rtl/x86_64/dfinit.c: New test case.
* gcc.dg/rtl/x86_64/different-structs.c: New test case.
* gcc.dg/rtl/x86_64/final.c: New test case.
* gcc.dg/rtl/x86_64/into-cfglayout.c: New test case.
* gcc.dg/rtl/x86_64/ira.c: New test case.
* gcc.dg/rtl/x86_64/pro_and_epilogue.c: New test case.
* gcc.dg/rtl/x86_64/test-multiple-fns.c: New test case.
* gcc.dg/rtl/x86_64/test-return-const.c.after-expand.c: New test case.
* gcc.dg/rtl/x86_64/test-return-const.c.before-fwprop.c: New test case.
* gcc.dg/rtl/x86_64/test-rtl.c: New test case.
* gcc.dg/rtl/x86_64/test_1.h: New file.
* gcc.dg/rtl/x86_64/times-two.c.after-expand.c: New test case.
* gcc.dg/rtl/x86_64/times-two.c.before-df.c: New test case.
* gcc.dg/rtl/x86_64/times-two.h: New file.
* gcc.dg/rtl/x86_64/vregs.c: New test case.
---
 gcc/testsuite/gcc.dg/rtl/x86_64/dfinit.c   | 116 ++
 .../gcc.dg/rtl/x86_64/different-structs.c  |  81 +
 gcc/testsuite/gcc.dg/rtl/x86_64/final.c| 133 +
 gcc/testsuite/gcc.dg/rtl/x86_64/into-cfglayout.c   | 117 ++
 gcc/testsuite/gcc.dg/rtl/x86_64/ira.c  | 111 +
 gcc/testsuite/gcc.dg/rtl/x86_64/pro_and_epilogue.c | 110 +
 .../gcc.dg/rtl/x86_64/test-multiple-fns.c  | 105 
 .../rtl/x86_64/test-return-const.c.after-expand.c  |  39 ++
 .../rtl/x86_64/test-return-const.c.before-fwprop.c |  42 +++
 gcc/testsuite/gcc.dg/rtl/x86_64/test-rtl.c | 101 
 gcc/testsuite/gcc.dg/rtl/x86_64/test_1.h   |  16 +++
 .../gcc.dg/rtl/x86_64/times-two.c.after-expand.c   |  70 +++
 .../gcc.dg/rtl/x86_64/times-two.c.before-df.c  |  54 +
 gcc/testsuite/gcc.dg/rtl/x86_64/times-two.h|  22 
 gcc/testsuite/gcc.dg/rtl/x86_64/vregs.c| 112 +
 15 files changed, 1229 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/rtl/x86_64/dfinit.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl/x86_64/different-structs.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl/x86_64/final.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl/x86_64/into-cfglayout.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl/x86_64/ira.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl/x86_64/pro_and_epilogue.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl/x86_64/test-multiple-fns.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl/x86_64/test-return-const.c.after-expand.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl/x86_64/test-return-const.c.before-fwprop.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl/x86_64/test-rtl.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl/x86_64/test_1.h
 create mode 100644 gcc/testsuite/gcc.dg/rtl/x86_64/times-two.c.after-expand.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl/x86_64/times-two.c.before-df.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl/x86_64/times-two.h
 create mode 100644 gcc/testsuite/gcc.dg/rtl/x86_64/vregs.c

diff --git a/gcc/testsuite/gcc.dg/rtl/x86_64/dfinit.c b/gcc/testsuite/gcc.dg/rtl/x86_64/dfinit.c
new file mode 100644
index 000..3425b97
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/rtl/x86_64/dfinit.c
@@ -0,0 +1,116 @@
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-fdump-rtl-dfinit" } */
+
+#include "test_1.h"
+
+/* Lightly-modified dump of test.c.261r.split1 for x86_64.  */
+
+int __RTL (startwith ("no-opt dfinit")) test_1 (int i, int j, int k)
+{
+(function "test_1"
+  (param "i"
+(DECL_RTL (mem/c:SI (plus:DI (reg/f:DI frame)
+(const_int -4)) [1 i+0 S4 A32]))
+(DECL_RTL_INCOMING (reg:SI di [ i ])))
+  (param "j"
+(DECL_RTL (mem/c:SI (plus:DI (reg/f:DI frame)
+(const_int -8)) [1 j+0 S4 A32]))
+(DECL_RTL_INCOMING (reg:SI si [ j ])))
+  (param "k"
+(DECL_RTL (mem/c:SI (plus:DI (reg/f:DI frame)
+(const_int -12)) [1 k+0 S4 A32]))
+(DECL_RTL_INCOMING (reg:SI dx [ k ])))
+  (insn-chain
+(cnote 1 NOTE_INSN_DELETED)
+(block 2
+  (edge-from entry (flags "FALLTHRU"))
+  (cnote 6 [bb 2] NOTE_INSN_BASIC_BLOCK)
+  (cinsn 2 (set (mem/c:SI (plus:DI (reg/f:DI frame)
+(const_int -4)) [1 i+0 S4 A32])
+(reg:SI di [ i ])) "../../src/gcc/testsuite/gcc.dg/rtl/test.c":2)
+  (cinsn 3 (set (mem/c:SI (plus:DI (reg/f:DI frame)
+(const_int -8)) [1 j+0 S4 A32])
+(reg:SI si [ j ])) "../../src/gcc/testsuite/gcc.dg/rtl/test.c":2)
+  (cinsn 4 (set (mem/c:SI (plus:DI (reg/f:DI frame)
+(const_int -12)) [1 k+0 S4 A32])
+(reg:SI dx [ k ])) "../../src/gcc/testsuite/gcc.dg/rtl/test.c":2)
+

[PATCH 9h] testsuite: add platform-independent files

2017-01-09 Thread David Malcolm

This patch adds:
  - an rtl.exp (to make it easy to run just the tests
for __RTL-tagged functions)
  - a test.c source file I used when generating the various RTL
dumps (for reference)
  - a couple of tests of __RTL parser errors

gcc/testsuite/ChangeLog:
* gcc.dg/rtl/rtl.exp: New file.
* gcc.dg/rtl/test.c: New file.
* gcc.dg/rtl/truncated-rtl-file.c: New test case.
* gcc.dg/rtl/unknown-rtx-code.c: New test case.
---
 gcc/testsuite/gcc.dg/rtl/rtl.exp  | 41 +++
 gcc/testsuite/gcc.dg/rtl/test.c   | 31 
 gcc/testsuite/gcc.dg/rtl/truncated-rtl-file.c |  2 ++
 gcc/testsuite/gcc.dg/rtl/unknown-rtx-code.c   |  8 ++
 4 files changed, 82 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/rtl/rtl.exp
 create mode 100644 gcc/testsuite/gcc.dg/rtl/test.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl/truncated-rtl-file.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl/unknown-rtx-code.c

diff --git a/gcc/testsuite/gcc.dg/rtl/rtl.exp b/gcc/testsuite/gcc.dg/rtl/rtl.exp
new file mode 100644
index 000..70a6d8b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/rtl/rtl.exp
@@ -0,0 +1,41 @@
+#   Copyright (C) 2016-2017 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_RTLFLAGS
+if ![info exists DEFAULT_RTLFLAGS] then {
+set DEFAULT_RTLFLAGS ""
+# -fdump-tree-rtl-raw
+}
+
+# Initialize `dg'.
+dg-init
+
+# Gather a list of all tests.
+set tests [lsort [find $srcdir/$subdir *.c]]
+
+verbose "rtl.exp tests: $tests" 1
+
+# Main loop.
+dg-runtest $tests "" $DEFAULT_RTLFLAGS
+
+# All done.
+dg-finish
diff --git a/gcc/testsuite/gcc.dg/rtl/test.c b/gcc/testsuite/gcc.dg/rtl/test.c
new file mode 100644
index 000..ebb8aef
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/rtl/test.c
@@ -0,0 +1,31 @@
+int test_1 (int i, int j, int k)
+{
+  if (i < j)
+return k + 4;
+  else
+return -k;
+}
+
+/* Example showing:
+   - data structure
+   - loop
+   - call to "abort".  */
+
+struct foo
+{
+  int count;
+  float *data;
+};
+
+float test_2 (struct foo *lhs, struct foo *rhs)
+{
+  float result = 0.0f;
+
+  if (lhs->count != rhs->count)
+__builtin_abort ();
+
+  for (int i = 0; i < lhs->count; i++)
+result += lhs->data[i] * rhs->data[i];
+
+  return result;
+}
diff --git a/gcc/testsuite/gcc.dg/rtl/truncated-rtl-file.c b/gcc/testsuite/gcc.dg/rtl/truncated-rtl-file.c
new file mode 100644
index 000..4dd8214
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/rtl/truncated-rtl-file.c
@@ -0,0 +1,2 @@
+void __RTL test (void)
+{ /* { dg-error "no closing brace" } */
diff --git a/gcc/testsuite/gcc.dg/rtl/unknown-rtx-code.c b/gcc/testsuite/gcc.dg/rtl/unknown-rtx-code.c
new file mode 100644
index 000..dd252f1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/rtl/unknown-rtx-code.c
@@ -0,0 +1,8 @@
+void __RTL test (void)
+{
+  (function "test"
+(insn-chain
+  (not-a-valid-kind-of-insn 1 0 0) ;; { dg-error "unknown rtx code" }
+) ;; insn-chain
+  ) ;; function
+}
-- 
1.8.5.3
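The dg-do run tests compile test_1 from RTL and then execute it, so its behaviour must still match the C in test.c.  Below is a hypothetical checker in that spirit.  The real contents of test_1.h are not shown in these patches, and check_test_1 is an illustrative name:

```c
#include <stdlib.h>

/* test_1 as written in gcc.dg/rtl/test.c.  */
int test_1 (int i, int j, int k)
{
  if (i < j)
    return k + 4;
  else
    return -k;
}

/* Hypothetical run-time check of both branches; aborts on mismatch.  */
static void check_test_1 (void)
{
  if (test_1 (0, 0, 3) != -3)   /* i >= j: returns -k */
    abort ();
  if (test_1 (0, 1, 3) != 7)    /* i < j: returns k + 4 */
    abort ();
}
```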

[PATCH 9f] Add a way for the C frontend to compile __RTL-tagged functions

2017-01-09 Thread David Malcolm

The backend is full of singleton state, so we have to compile
__RTL functions as soon as we parse them.  This means that the
C frontend needs to invoke the backend.

This patch adds the support needed.

Normally this would be a no-no: including RTL headers from a frontend
is blocked by this code in system.h:

 /* Front ends should never have to include middle-end headers.  Enforce
this by poisoning the header double-include protection defines.  */
 #ifdef IN_GCC_FRONTEND
 #pragma GCC poison GCC_RTL_H GCC_EXCEPT_H GCC_EXPR_H
 #endif

Hence the patch puts the decl into a new header (run-rtl-passes.h)
that's accessible to the C frontend without exposing any RTL
internals.  (If adding a header for just this decl is overkill, is
there a better place to put the decl?)

gcc/ChangeLog:
* Makefile.in (OBJS): Add run-rtl-passes.o.
* pass_manager.h (gcc::pass_manager::get_rest_of_compilation): New
accessor.
(gcc::pass_manager::get_clean_slate): New accessor.
* run-rtl-passes.c: New file.
* run-rtl-passes.h: New file.
---
 gcc/Makefile.in  |  1 +
 gcc/pass_manager.h   |  6 +
 gcc/run-rtl-passes.c | 66 
 gcc/run-rtl-passes.h | 25 
 4 files changed, 98 insertions(+)
 create mode 100644 gcc/run-rtl-passes.c
 create mode 100644 gcc/run-rtl-passes.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 3d9532b..3ad53ad 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1442,6 +1442,7 @@ OBJS = \
rtlhash.o \
rtlanal.o \
rtlhooks.o \
+   run-rtl-passes.o \
sbitmap.o \
sched-deps.o \
sched-ebb.o \
diff --git a/gcc/pass_manager.h b/gcc/pass_manager.h
index 4d15407..ae97cd4 100644
--- a/gcc/pass_manager.h
+++ b/gcc/pass_manager.h
@@ -82,6 +82,12 @@ public:
 
   opt_pass *get_pass_by_name (const char *name);
 
+  opt_pass *get_rest_of_compilation () const
+  {
+return pass_rest_of_compilation_1;
+  }
+  opt_pass *get_clean_slate () const { return pass_clean_state_1; }
+
 public:
   /* The root of the compilation pass tree, once constructed.  */
   opt_pass *all_passes;
diff --git a/gcc/run-rtl-passes.c b/gcc/run-rtl-passes.c
new file mode 100644
index 000..e1ac4bd
--- /dev/null
+++ b/gcc/run-rtl-passes.c
@@ -0,0 +1,66 @@
+/* run-rtl-passes.c - Run RTL passes directly from frontend
+   Copyright (C) 2016-2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "target.h"
+#include "rtl.h"
+#include "function.h"
+#include "basic-block.h"
+#include "tree-pass.h"
+#include "context.h"
+#include "pass_manager.h"
+#include "bitmap.h"
+#include "df.h"
+#include "regs.h"
+#include "insn-attr-common.h" /* for INSN_SCHEDULING.  */
+#include "insn-attr.h" /* for init_sched_attrs.  */
+#include "run-rtl-passes.h"
+
+/* Run the backend passes, starting at the given pass.
+   Take ownership of INITIAL_PASS_NAME.  */
+
+void
+run_rtl_passes (char *initial_pass_name)
+{
+  cfun->pass_startwith = initial_pass_name;
+  max_regno = max_reg_num ();
+
+  /* Pass "expand" normally sets this up.  */
+#ifdef INSN_SCHEDULING
+  init_sched_attrs ();
+#endif
+
+  bitmap_obstack_initialize (NULL);
+  bitmap_obstack_initialize (_obstack);
+
+  opt_pass *rest_of_compilation
+= g->get_passes ()->get_rest_of_compilation ();
+  gcc_assert (rest_of_compilation);
+  execute_pass_list (cfun, rest_of_compilation);
+
+  opt_pass *clean_slate = g->get_passes ()->get_clean_slate ();
+  gcc_assert (clean_slate);
+  execute_pass_list (cfun, clean_slate);
+
+  bitmap_obstack_release (_obstack);
+
+  cfun->curr_properties |= PROP_rtl;
+}
diff --git a/gcc/run-rtl-passes.h b/gcc/run-rtl-passes.h
new file mode 100644
index 000..1390303
--- /dev/null
+++ b/gcc/run-rtl-passes.h
@@ -0,0 +1,25 @@
+/* run-rtl-passes.h - Run a subset of the RTL passes
+   Copyright (C) 2016-2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A

[PATCH 9d] Don't call delete_tree_ssa for __RTL functions

2017-01-09 Thread David Malcolm

gcc/ChangeLog:
* final.c (rest_of_clean_state): Don't call delete_tree_ssa for
__RTL functions.
---
 gcc/final.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/final.c b/gcc/final.c
index 8a4c9f8..2483381 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -4699,7 +4699,8 @@ rest_of_clean_state (void)
 
   free_bb_for_insn ();
 
-  delete_tree_ssa (cfun);
+  if (cfun->gimple_df)
+delete_tree_ssa (cfun);
 
   /* We can reduce stack alignment on call site only when we are sure that
  the function body just produced will be actually used in the final
-- 
1.8.5.3

[PATCH 9g] Extend .md and RTL parsing to support being wired up to cc1

2017-01-09 Thread David Malcolm

gcc/ChangeLog:
* read-md.c (md_reader::read_char): Support filtering
the input to a subset of line numbers.
(md_reader::md_reader): Initialize fields
m_first_line and m_last_line.
(md_reader::read_file_fragment): New function.
* read-md.h (md_reader::read_file_fragment): New decl.
(md_reader::m_first_line): New field.
(md_reader::m_last_line): New field.
* read-rtl-function.c (function_reader::create_function): Only
create cfun if it doesn't already exist.  Set PROP_rtl on cfun's
curr_properties.  Set DECL_INITIAL to a dummy block.
(read_rtl_function_body_from_file_range): New function.
* read-rtl-function.h (read_rtl_function_body_from_file_range):
New decl.
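Here is a minimal in-memory sketch of the line filtering that read_char gains.  The struct and function names are hypothetical and not the md_reader API.  Characters on lines before the range read as spaces, anything after the range reads as EOF, and a zero range disables filtering:

```c
#include <stdio.h>  /* EOF */

/* Hypothetical reader over a string, mimicking md_reader's filtering.  */
struct line_reader
{
  const char *p;
  int lineno;      /* 1-based number of the current line */
  int first_line;  /* 0 together with last_line means "no filtering" */
  int last_line;
};

static int reader_read_char (struct line_reader *r)
{
  if (*r->p == '\0')
    return EOF;
  int ch = (unsigned char) *r->p++;
  if (ch == '\n')
    r->lineno++;   /* counters advance before filtering, as in read_char */
  if (r->first_line && r->last_line)
    {
      if (r->lineno < r->first_line)
        return ' ';
      if (r->lineno > r->last_line)
        return EOF;
    }
  return ch;
}
```

Because the counters are updated before the filter is applied, mapping the earlier lines to spaces rather than skipping them keeps line and column tracking consistent with the original file.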
---
 gcc/read-md.c   | 34 +++-
 gcc/read-md.h   |  7 +
 gcc/read-rtl-function.c | 83 +++--
 gcc/read-rtl-function.h |  3 ++
 4 files changed, 109 insertions(+), 18 deletions(-)

diff --git a/gcc/read-md.c b/gcc/read-md.c
index ac28944..4036afa 100644
--- a/gcc/read-md.c
+++ b/gcc/read-md.c
@@ -411,6 +411,16 @@ md_reader::read_char (void)
   else
 m_read_md_colno++;
 
+  /* If we're filtering lines, treat everything before the range of
+ interest as a space, and as EOF for everything after.  */
+  if (m_first_line && m_last_line)
+{
+  if (m_read_md_lineno < m_first_line)
+   return ' ';
+  if (m_read_md_lineno > m_last_line)
+   return EOF;
+}
+
   return ch;
 }
 
@@ -991,7 +1001,9 @@ md_reader::md_reader (bool compact)
   m_read_md_lineno (0),
   m_read_md_colno (0),
   m_first_dir_md_include (NULL),
-  m_last_dir_md_include_ptr (_first_dir_md_include)
+  m_last_dir_md_include_ptr (_first_dir_md_include),
+  m_first_line (0),
+  m_last_line (0)
 {
   /* Set the global singleton pointer.  */
   md_reader_ptr = this;
@@ -1314,6 +1326,26 @@ md_reader::read_file (const char *filename)
   return !have_error;
 }
 
+/* Read FILENAME, filtering to just the given lines.  */
+
+bool
+md_reader::read_file_fragment (const char *filename,
+  int first_line,
+  int last_line)
+{
+  m_read_md_filename = filename;
+  m_read_md_file = fopen (m_read_md_filename, "r");
+  if (m_read_md_file == 0)
+{
+  perror (m_read_md_filename);
+  return false;
+}
+  m_first_line = first_line;
+  m_last_line = last_line;
+  handle_toplevel_file ();
+  return !have_error;
+}
+
 /* class noop_reader : public md_reader */
 
 /* A dummy implementation which skips unknown directives.  */
diff --git a/gcc/read-md.h b/gcc/read-md.h
index 4fcbcb4..fea7011 100644
--- a/gcc/read-md.h
+++ b/gcc/read-md.h
@@ -111,6 +111,9 @@ class md_reader
 
   bool read_md_files (int, const char **, bool (*) (const char *));
   bool read_file (const char *filename);
+  bool read_file_fragment (const char *filename,
+  int first_line,
+  int last_line);
 
   /* A hook that handles a single .md-file directive, up to but not
  including the closing ')'.  It takes two arguments: the file position
@@ -245,6 +248,10 @@ class md_reader
 
   /* A table of enum_type structures, hashed by name.  */
   htab_t m_enum_types;
+
+  /* If non-zero, filter the input to just this subset of lines.  */
+  int m_first_line;
+  int m_last_line;
 };
 
 /* Global singleton; constrast with rtx_reader_ptr below.  */
diff --git a/gcc/read-rtl-function.c b/gcc/read-rtl-function.c
index c5cb3f7..f27e174 100644
--- a/gcc/read-rtl-function.c
+++ b/gcc/read-rtl-function.c
@@ -475,23 +475,38 @@ function_reader::create_function ()
   /* We start in cfgrtl mode, rather than cfglayout mode.  */
   rtl_register_cfg_hooks ();
 
-  /* Create cfun.  */
-  tree fn_name = get_identifier (m_name ? m_name : "test_1");
-  tree int_type = integer_type_node;
-  tree return_type = int_type;
-  tree arg_types[3] = {int_type, int_type, int_type};
-  tree fn_type = build_function_type_array (return_type, 3, arg_types);
-  tree fndecl = build_decl_stat (UNKNOWN_LOCATION, FUNCTION_DECL, fn_name,
-fn_type);
-  tree resdecl = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE,
-return_type);
-  DECL_ARTIFICIAL (resdecl) = 1;
-  DECL_IGNORED_P (resdecl) = 1;
-  DECL_RESULT (fndecl) = resdecl;
-  allocate_struct_function (fndecl, false);
-  /* This sets cfun.  */
-
-  current_function_decl = fndecl;
+  /* When run from selftests or "rtl1", cfun is NULL.
+ When run from "cc1" for a C function tagged with __RTL, cfun is the
+ tagged function.  */
+  if (!cfun)
+{
+  tree fn_name = get_identifier (m_name ? m_name : "test_1");
+  tree int_type = integer_type_node;
+  tree return_type = int_type;
+  tree arg_types[3] = {int_type, int_type, int_type};
+  tree fn_type = build_function_type_array (return_type, 3, arg_types);
+  tree

[PATCH 9i] testsuite: add aarch64-specific files

2017-01-09 Thread David Malcolm

gcc/testsuite/ChangeLog:
* gcc.dg/rtl/aarch64/asr_div1.c: New test case.
* gcc.dg/rtl/aarch64/pr71779.c: New test case.
---
 gcc/testsuite/gcc.dg/rtl/aarch64/asr_div1.c | 41 +++
 gcc/testsuite/gcc.dg/rtl/aarch64/pr71779.c  | 50 +
 2 files changed, 91 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/rtl/aarch64/asr_div1.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl/aarch64/pr71779.c

diff --git a/gcc/testsuite/gcc.dg/rtl/aarch64/asr_div1.c b/gcc/testsuite/gcc.dg/rtl/aarch64/asr_div1.c
new file mode 100644
index 000..a95c8c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/rtl/aarch64/asr_div1.c
@@ -0,0 +1,41 @@
+/* { dg-do compile { target aarch64-*-* } } */
+/* { dg-options "-mtune=cortex-a53 -fdump-rtl-combine -O2" } */
+
+/* Taken from
+ gcc/testsuite/gcc.dg/asr_div1.c -O2 -fdump-rtl-all -mtune=cortex-a53
+   for aarch64, hand-edited to the new format.  */
+
+int __RTL (startwith ("combine")) f1 (int n)
+{
+(function "f1"
+  (param "n"
+(DECL_RTL (reg/v:SI <1> [ n ]))
+(DECL_RTL_INCOMING (reg:SI x0 [ n ]))
+  ) ;; param "n"
+  (insn-chain
+(block 2
+  (edge-from entry (flags "FALLTHRU"))
+  (cnote 6 [bb 2] NOTE_INSN_BASIC_BLOCK)
+  (cinsn 8 (set (reg:DI <2>)
+(lshiftrt:DI (reg:DI <0>)
+(const_int 32)))
+"../../src/gcc/testsuite/gcc.dg/asr_div1.c":14
+(expr_list:REG_DEAD (reg:DI <0>)))
+  (cinsn 9 (set (reg:SI <1>)
+(ashiftrt:SI (subreg:SI (reg:DI <2>) 0)
+(const_int 3)))
+"../../src/gcc/testsuite/gcc.dg/asr_div1.c":14
+(expr_list:REG_DEAD (reg:DI <2>)))
+
+  ;; Extra insn, to prevent all of the above from being deleted by DCE
+  (insn 10 (use (reg/i:SI <1>)))
+
+  (edge-to exit (flags "FALLTHRU"))
+) ;; block 2
+  ) ;; insn-chain
+) ;; function
+}
+
+/* Verify that insns 8 and 9 get combined into a shift of 35 (0x23) */
+/* { dg-final { scan-rtl-dump "allowing combination of insns 8 and 9" "combine" } } */
+/* { dg-final { scan-rtl-dump "modifying insn i3 9: r\[0-9\]+:SI#0=r\[0-9\]+:DI>>0x23" "combine" } } */
diff --git a/gcc/testsuite/gcc.dg/rtl/aarch64/pr71779.c b/gcc/testsuite/gcc.dg/rtl/aarch64/pr71779.c
new file mode 100644
index 000..9174abb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/rtl/aarch64/pr71779.c
@@ -0,0 +1,50 @@
+/* { dg-do compile { target aarch64-*-* } } */
+/* { dg-options "-fdump-rtl-cse1" } */
+
+/* Dump taken from comment 2 of PR 71779, of
+   "...the relevant memory access coming out of expand"
+   hand-edited to the compact dump format.  */
+
+int __RTL (startwith ("cse1")) test (int n)
+{
+(function "fragment"
+  (param "n"
+(DECL_RTL (reg/v:SI <1> [ n ]))
+(DECL_RTL_INCOMING (reg:SI x0 [ n ]))
+  ) ;; param "n"
+  (insn-chain
+(block 2
+  (edge-from entry (flags "FALLTHRU"))
+  (cnote 6 [bb 2] NOTE_INSN_BASIC_BLOCK)
+
+;; MEM[(struct isl_obj *)] = _obj_map_vtable;
+(insn 1045 (set (reg:SI <480>)
+(high:SI (symbol_ref:SI ("isl_obj_map_vtable")
+[flags 0xc0]
+)))
+ "y.c":12702)
+(insn 1046 (set (reg/f:SI <479>)
+(lo_sum:SI (reg:SI <480>)
+(symbol_ref:SI ("isl_obj_map_vtable")
+   [flags 0xc0]
+   )))
+ "y.c":12702
+ (expr_list:REG_EQUAL (symbol_ref:SI ("isl_obj_map_vtable")
+ [flags 0xc0]
+ )))
+(insn 1047 (set (reg:DI <481>)
+(subreg:DI (reg/f:SI <479>) 0)) "y.c":12702)
+(insn 1048 (set (zero_extract:DI (reg/v:DI <191> [ obj1D.17368 ])
+(const_int 32)
+(const_int 0))
+(reg:DI <481>)) "y.c":12702)
+;; Extra insn, to prevent all of the above from being deleted by DCE
+(insn 1049 (set (mem:DI (reg:DI <191>) [1 i+0 S4 A32])
+ (const_int 1)))
+  (edge-to exit (flags "FALLTHRU"))
+) ;; block 2
+  ) ;; insn-chain
+) ;; function
+}
+
+/* TODO: scan the dump.  */
-- 
1.8.5.3
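The last scan in asr_div1.c expects combine to merge two insns into one shift by 35 (0x23):
- insn 8, a logical right shift of a 64-bit register by 32, read back through a 32-bit subreg;
- insn 9, an arithmetic right shift of that 32-bit value by 3.

The merge preserves the value only if the combined shift is arithmetic, because the top three bits of the 32-bit result must be copies of bit 63. The sketch below checks that identity on host integers. `two_insns` and `combined` are hypothetical names. It assumes a little-endian target (so subreg byte 0 is the low half), GCC's arithmetic `>>` on negative values, and modular narrowing to signed types.

```cpp
#include <cstdint>

// Hypothetical model of insns 8 and 9 from asr_div1.c (DImode = 64 bits,
// SImode = 32 bits, little-endian so subreg byte 0 is the low half).
std::int32_t two_insns (std::uint64_t x)
{
  // insn 8: (lshiftrt:DI x 32); the (subreg:SI ... 0) keeps the low 32 bits.
  std::uint32_t high = static_cast<std::uint32_t> (x >> 32);
  // insn 9: (ashiftrt:SI ... 3) on the signed 32-bit value.
  return static_cast<std::int32_t> (high) >> 3;
}

// Combined form: one arithmetic right shift of the 64-bit value by 35,
// then keep the low 32 bits.
std::int32_t combined (std::uint64_t x)
{
  return static_cast<std::int32_t> (static_cast<std::int64_t> (x) >> 35);
}
```

For example, both functions map 0x8000000000000000 to -268435456 (that is, -2^28).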

[PATCH 9e] Update "startwith" logic for pass-skipping to handle __RTL functions

2017-01-09 Thread David Malcolm

gcc/ChangeLog:
* passes.c: Include "insn-addr.h".
(should_skip_pass_p): Add logging.  Update logic for running
"expand" to be compatible with both __GIMPLE and __RTL.  Guard
property-provider override so it is only done for gimple passes.
Don't skip dfinit.
(skip_pass): New function.
(execute_one_pass): Call skip_pass when skipping passes.
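The rules listed above amount to an ordered series of checks. The following is a simplified, hypothetical model, not the code in passes.c, and it makes these simplifications:
- `model_pass` and `model_should_skip` are invented names.
- An exact string comparison stands in for determine_pass_name_match.
- The logging controlled by quiet_flag is omitted.
- The extra work done by skip_pass (setting reload_completed, allocating INSN_ADDRESSES, switching cfg hooks) is not modeled.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model of should_skip_pass_p; not GCC's opt_pass.
enum class pass_kind { gimple, rtl };

struct model_pass
{
  std::string name;
  pass_kind kind;
  bool provides_properties;  // stands in for properties_provided != 0
  bool destroys_ssa;         // stands in for properties_destroyed & PROP_ssa
};

// Return true if PASS should be skipped.  START_WITH plays the role of
// cfun->pass_startwith and is cleared once skipping stops.
bool model_should_skip (const model_pass &pass, std::string &start_with)
{
  if (start_with.empty ())
    return false;               // No "startwith": run every pass.
  if (pass.destroys_ssa)
    {
      start_with.clear ();      // __GIMPLE: start anyway when leaving SSA.
      return false;
    }
  if (pass.name == start_with)
    {
      start_with.clear ();      // Found the starting pass.
      return false;
    }
  if (pass.kind == pass_kind::gimple && pass.provides_properties)
    return false;               // Still run GIMPLE property providers.
  if (pass.name.find ("dfinit") != std::string::npos)
    return false;               // Later RTL passes need df.
  return true;                  // "startwith" not reached yet: skip.
}
```

Consider an invented pass list of vregs, into_cfglayout, dfinit, cse1, combine and ira, with a start of "combine". This model runs only dfinit, combine and ira.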
---
 gcc/passes.c | 65 +---
 1 file changed, 58 insertions(+), 7 deletions(-)

diff --git a/gcc/passes.c b/gcc/passes.c
index 31262ed..6954d1e 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -59,6 +59,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgrtl.h"
 #include "tree-ssa-live.h"  /* For remove_unused_locals.  */
 #include "tree-cfgcleanup.h"
+#include "insn-addr.h" /* for INSN_ADDRESSES_ALLOC.  */
 
 using namespace gcc;
 
@@ -2315,26 +2316,73 @@ should_skip_pass_p (opt_pass *pass)
   if (!cfun->pass_startwith)
 return false;
 
-  /* We can't skip the lowering phase yet -- ideally we'd
- drive that phase fully via properties.  */
-  if (!(cfun->curr_properties & PROP_ssa))
-return false;
+ /* For __GIMPLE functions, we have to at least start when we leave
+ SSA.  */
+  if (pass->properties_destroyed & PROP_ssa)
+{
+  if (!quiet_flag)
+   fprintf (stderr, "starting anyway when leaving SSA: %s\n", pass->name);
+  cfun->pass_startwith = NULL;
+  return false;
+}
 
   if (determine_pass_name_match (pass->name, cfun->pass_startwith))
 {
+  if (!quiet_flag)
+   fprintf (stderr, "found starting pass: %s\n", pass->name);
   cfun->pass_startwith = NULL;
   return false;
 }
 
-  /* And also run any property provider.  */
-  if (pass->properties_provided != 0)
+  /* Run any property provider.  */
+  if (pass->type == GIMPLE_PASS
+  && pass->properties_provided != 0)
 return false;
 
+  /* Don't skip df init; later RTL passes need it.  */
+  if (strstr (pass->name, "dfinit") != NULL)
+return false;
+
+  if (!quiet_flag)
+fprintf (stderr, "skipping pass: %s\n", pass->name);
+
   /* If we get here, then we have a "startwith" that we haven't seen yet;
  skip the pass.  */
   return true;
 }
 
+/* Skip the given pass, for handling passes before "startwith"
+   in __GIMPLE and __RTL-marked functions.
+   In theory, this ought to be a no-op, but some of the RTL passes
+   need additional processing here.  */
+
+static void
+skip_pass (opt_pass *pass)
+{
+  /* Pass "reload" sets the global "reload_completed", and many
+ things depend on this (e.g. instructions in .md files).  */
+  if (strcmp (pass->name, "reload") == 0)
+reload_completed = 1;
+
+  /* The INSN_ADDRESSES vec is normally set up by
+ shorten_branches; set it up for the benefit of passes that
+ run after this.  */
+  if (strcmp (pass->name, "shorten") == 0)
+INSN_ADDRESSES_ALLOC (get_max_uid ());
+
+  /* Update the cfg hooks as appropriate.  */
+  if (strcmp (pass->name, "into_cfglayout") == 0)
+{
+  cfg_layout_rtl_register_cfg_hooks ();
+  cfun->curr_properties |= PROP_cfglayout;
+}
+  if (strcmp (pass->name, "outof_cfglayout") == 0)
+{
+  rtl_register_cfg_hooks ();
+  cfun->curr_properties &= ~PROP_cfglayout;
+}
+}
+
 /* Execute PASS. */
 
 bool
@@ -2375,7 +2423,10 @@ execute_one_pass (opt_pass *pass)
 }
 
   if (should_skip_pass_p (pass))
-return true;
+{
+  skip_pass (pass);
+  return true;
+}
 
   /* Pass execution event trigger: useful to identify passes being
  executed.  */
-- 
1.8.5.3

[PATCH 9c] callgraph: handle __RTL functions

2017-01-09 Thread David Malcolm

The RTL backend code is full of singleton state, so we have to handle
functions as soon as we parse them.  This requires various special-casing
in the callgraph code.

gcc/ChangeLog:
* cgraph.h (symtab_node::native_rtl_p): New decl.
* cgraphunit.c (symtab_node::native_rtl_p): New function.
(symtab_node::needed_p): Don't assert for early assembly output
for __RTL functions.
(cgraph_node::finalize_function): Set "force_output" for __RTL
functions.
(cgraph_node::analyze): Bail out early for __RTL functions.
(analyze_functions): Update assertion to support __RTL functions.
(cgraph_node::expand): Bail out early for __RTL functions.
* gimple-expr.c: Include "tree-pass.h".
(gimple_has_body_p): Return false for __RTL functions.
---
 gcc/cgraph.h  |  4 
 gcc/cgraphunit.c  | 41 ++---
 gcc/gimple-expr.c |  3 ++-
 3 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index db2915c..edaae51 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -328,6 +328,10 @@ public:
  configury. This function is used just during symbol creation.  */
   bool needed_p (void);
 
+  /* Return true if this symbol is a function from the C frontend specified
+ directly in RTL form (with "__RTL").  */
+  bool native_rtl_p () const;
+
   /* Return true when there are references to the node.  */
   bool referred_to_p (bool include_self = true);
 
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 81a3ae9..ed699e1 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -217,6 +217,19 @@ static void handle_alias_pairs (void);
 /* Used for vtable lookup in thunk adjusting.  */
 static GTY (()) tree vtable_entry_type;
 
+/* Return true if this symbol is a function from the C frontend specified
+   directly in RTL form (with "__RTL").  */
+
+bool
+symtab_node::native_rtl_p () const
+{
+  if (TREE_CODE (decl) != FUNCTION_DECL)
+return false;
+  if (!DECL_STRUCT_FUNCTION (decl))
+return false;
+  return DECL_STRUCT_FUNCTION (decl)->curr_properties & PROP_rtl;
+}
+
 /* Determine if symbol declaration is needed.  That is, visible to something
either outside this translation unit, something magic in the system
configury */
@@ -225,8 +238,10 @@ symtab_node::needed_p (void)
 {
   /* Double check that no one output the function into assembly file
  early.  */
-  gcc_checking_assert (!DECL_ASSEMBLER_NAME_SET_P (decl)
-  || !TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (decl)));
+  if (!native_rtl_p ())
+  gcc_checking_assert
+   (!DECL_ASSEMBLER_NAME_SET_P (decl)
+|| !TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (decl)));
 
   if (!definition)
 return false;
@@ -435,6 +450,14 @@ cgraph_node::finalize_function (tree decl, bool no_collect)
   && !DECL_DISREGARD_INLINE_LIMITS (decl))
 node->force_output = 1;
 
+  /* __RTL functions were already output as soon as they were parsed (due
+ to the large amount of global state in the backend).
+ Mark such functions as "force_output" to reflect the fact that they
+ will be in the asm file when considering the symbols they reference.
+ The attempt to output them later on will bail out immediately.  */
+  if (node->native_rtl_p ())
+node->force_output = 1;
+
   /* When not optimizing, also output the static functions. (see
  PR24561), but don't do so for always_inline functions, functions
  declared inline and nested functions.  These were optimized out
@@ -568,6 +591,12 @@ cgraph_node::add_new_function (tree fndecl, bool lowered)
 void
 cgraph_node::analyze (void)
 {
+  if (native_rtl_p ())
+{
+  analyzed = true;
+  return;
+}
+
   tree decl = this->decl;
   location_t saved_loc = input_location;
   input_location = DECL_SOURCE_LOCATION (decl);
@@ -1226,7 +1255,8 @@ analyze_functions (bool first_time)
 
  gcc_assert (!cnode->definition || cnode->thunk.thunk_p
  || cnode->alias
- || gimple_has_body_p (decl));
+ || gimple_has_body_p (decl)
+ || cnode->native_rtl_p ());
  gcc_assert (cnode->analyzed == cnode->definition);
}
   node->aux = NULL;
@@ -1965,6 +1995,11 @@ cgraph_node::expand (void)
   /* We ought to not compile any inline clones.  */
   gcc_assert (!global.inlined_to);
 
+  /* __RTL functions are compiled as soon as they are parsed, so don't
+ do it again.  */
+  if (native_rtl_p ())
+return;
+
   announce_function (decl);
   process = 0;
   gcc_assert (lowered);
diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
index b435b99..2ee87c2 100644
--- a/gcc/gimple-expr.c
+++ b/gcc/gimple-expr.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "demangle.h"
 #include "hash-set.h"
 #include "rtl.h"
+#include "tree-pass.h"
 
 /* - Type related -  */
 
@@ -323,7 +324,7 @@ bool

[PATCH 9/9] Add "__RTL" to cc1 (v8)

2017-01-09 Thread David Malcolm

This is a slightly updated version of the patch sent here:

  "[PATCH] Add "__RTL" to cc1 (v7)"
https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01662.html

but split up into smaller parts to (I hope) make review
easier.

Other changes in v8:
 - fix copyright years in new files
 - split out changes to passes.c; fixups to pass skipping

The set of patches wires up the RTL-reading code into cc1 for
functions tagged with "__RTL" in an analogous manner to those
tagged with "__GIMPLE", and adds a collection of testcases using
this functionality.

One difference from the GIMPLE frontend is that, due to the
pervasive singleton state throughout the RTL code, we can't have
more than one RTL function in memory at once.  Hence as soon as
we're done parsing an __RTL-tagged function we have to run the
rest of the passes on it, and emit asm for it, rather than having
the callgraph control this.
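A toy model can illustrate the ordering consequence described above. Everything in it is hypothetical: `parsed_fn`, `compile_now` and `compile_unit` are invented names, and the real callgraph does far more than append names. In the model, an __RTL function is compiled and emitted as soon as it is parsed, which frees the single global slot. All other functions are left for the callgraph to expand later.

```cpp
#include <string>
#include <vector>

// Hypothetical model: the RTL backend has one global "current function",
// so an __RTL function must be finished before the next one is parsed.
struct parsed_fn
{
  std::string name;
  bool rtl_tagged;
};

static std::string current_rtl_fn;        // singleton backend state
static std::vector<std::string> asm_out;  // order in which asm is emitted

static void compile_now (const std::string &name)
{
  current_rtl_fn = name;                  // occupy the singleton
  asm_out.push_back (name);               // run remaining passes, emit asm
  current_rtl_fn.clear ();                // release before the next parse
}

void compile_unit (const std::vector<parsed_fn> &fns)
{
  std::vector<std::string> deferred;      // left to the callgraph
  for (const auto &f : fns)
    if (f.rtl_tagged)
      compile_now (f.name);               // immediately after parsing
    else
      deferred.push_back (f.name);
  for (const auto &name : deferred)       // callgraph expands the rest later
    asm_out.push_back (name);
}
```

For a unit containing a, b (__RTL), c, d (__RTL) in that order, this model emits b, d, a, c.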

Successfully bootstrapped on x86_64-pc-linux-gnu;
v7 was successfully built for 191 target configurations.

David Malcolm (10):
  Add "__RTL" to C frontend
  Don't assume that copy tables were initialized
  callgraph: handle __RTL functions
  Don't call delete_tree_ssa for __RTL functions
  Update "startwith" logic for pass-skipping to handle __RTL functions
  Add a way for the C frontend to compile __RTL-tagged functions
  Extend .md and RTL parsing to support being wired up to cc1
  testsuite: add platform-independent files
  testsuite: add aarch64-specific files
  testsuite: add x86_64-specific files

 gcc/Makefile.in|   1 +
 gcc/c-family/c-common.c|   1 +
 gcc/c-family/c-common.h|   3 +
 gcc/c/c-parser.c   | 109 -
 gcc/c/c-tree.h |   7 +-
 gcc/c/gimple-parser.c  |   8 +-
 gcc/c/gimple-parser.h  |   2 +-
 gcc/cfg.c  |   9 ++
 gcc/cfg.h  |   1 +
 gcc/cfgrtl.c   |   3 +-
 gcc/cgraph.h   |   4 +
 gcc/cgraphunit.c   |  41 ++-
 gcc/final.c|   3 +-
 gcc/function.h |   2 +-
 gcc/gimple-expr.c  |   3 +-
 gcc/pass_manager.h |   6 +
 gcc/passes.c   |  65 --
 gcc/read-md.c  |  34 +-
 gcc/read-md.h  |   7 ++
 gcc/read-rtl-function.c|  83 ++---
 gcc/read-rtl-function.h|   3 +
 gcc/run-rtl-passes.c   |  66 ++
 gcc/run-rtl-passes.h   |  25 
 gcc/testsuite/gcc.dg/rtl/aarch64/asr_div1.c|  41 +++
 gcc/testsuite/gcc.dg/rtl/aarch64/pr71779.c |  50 
 gcc/testsuite/gcc.dg/rtl/rtl.exp   |  41 +++
 gcc/testsuite/gcc.dg/rtl/test.c|  31 +
 gcc/testsuite/gcc.dg/rtl/truncated-rtl-file.c  |   2 +
 gcc/testsuite/gcc.dg/rtl/unknown-rtx-code.c|   8 ++
 gcc/testsuite/gcc.dg/rtl/x86_64/dfinit.c   | 116 ++
 .../gcc.dg/rtl/x86_64/different-structs.c  |  81 +
 gcc/testsuite/gcc.dg/rtl/x86_64/final.c| 133 +
 gcc/testsuite/gcc.dg/rtl/x86_64/into-cfglayout.c   | 117 ++
 gcc/testsuite/gcc.dg/rtl/x86_64/ira.c  | 111 +
 gcc/testsuite/gcc.dg/rtl/x86_64/pro_and_epilogue.c | 110 +
 .../gcc.dg/rtl/x86_64/test-multiple-fns.c  | 105 
 .../rtl/x86_64/test-return-const.c.after-expand.c  |  39 ++
 .../rtl/x86_64/test-return-const.c.before-fwprop.c |  42 +++
 gcc/testsuite/gcc.dg/rtl/x86_64/test-rtl.c | 101 
 gcc/testsuite/gcc.dg/rtl/x86_64/test_1.h   |  16 +++
 .../gcc.dg/rtl/x86_64/times-two.c.after-expand.c   |  70 +++
 .../gcc.dg/rtl/x86_64/times-two.c.before-df.c  |  54 +
 gcc/testsuite/gcc.dg/rtl/x86_64/times-two.h|  22 
 gcc/testsuite/gcc.dg/rtl/x86_64/vregs.c| 112 +
 44 files changed, 1844 insertions(+), 44 deletions(-)
 create mode 100644 gcc/run-rtl-passes.c
 create mode 100644 gcc/run-rtl-passes.h
 create mode 100644 gcc/testsuite/gcc.dg/rtl/aarch64/asr_div1.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl/aarch64/pr71779.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl/rtl.exp
 create mode 100644 gcc/testsuite/gcc.dg/rtl/test.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl/truncated-rtl-file.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl/unknown-rtx-code.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl/x86_64/dfinit.c
 create mode 100644

[PATCH 9a] Add "__RTL" to C frontend

2017-01-09 Thread David Malcolm

This part of the patch adds the ability to tag a function with
"__RTL", analogous to the "__GIMPLE" tag.

gcc/c-family/ChangeLog:
* c-common.c (c_common_reswords): Add "__RTL".
* c-common.h (enum rid): Add RID_RTL.

gcc/c/ChangeLog:
* c-parser.c: Include "read-rtl-function.h" and
"run-rtl-passes.h".
(c_parser_declaration_or_fndef): Rename "gimple-pass-list" in
grammar to gimple-or-rtl-pass-list.  Add rtl-function-definition
production.  Update for renaming of field "gimple_pass" to
"gimple_or_rtl_pass".  If __RTL was seen, call
c_parser_parse_rtl_body.  Convert a timevar_push/pop pair
to an auto_timevar, to cope with early exit.
(c_parser_declspecs): Update RID_GIMPLE handling for renaming of
field "gimple_pass" to "gimple_or_rtl_pass", and for renaming of
c_parser_gimple_pass_list to c_parser_gimple_or_rtl_pass_list.
Handle RID_RTL.
(c_parser_parse_rtl_body): New function.
* c-tree.h (enum c_declspec_word): Add cdw_rtl.
(struct c_declspecs): Rename field "gimple_pass" to
"gimple_or_rtl_pass".  Add field "rtl_p".
* gimple-parser.c (c_parser_gimple_pass_list): Rename to...
(c_parser_gimple_or_rtl_pass_list): ...this, updating accordingly.
* gimple-parser.h (c_parser_gimple_pass_list): Rename to...
(c_parser_gimple_or_rtl_pass_list): ...this.

gcc/ChangeLog:
* function.h (struct function): Update comment for field
"pass_startwith".
---
 gcc/c-family/c-common.c |   1 +
 gcc/c-family/c-common.h |   3 ++
 gcc/c/c-parser.c| 109 +---
 gcc/c/c-tree.h  |   7 +++-
 gcc/c/gimple-parser.c   |   8 ++--
 gcc/c/gimple-parser.h   |   2 +-
 gcc/function.h  |   2 +-
 7 files changed, 119 insertions(+), 13 deletions(-)

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 15ead18..62b762b 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -437,6 +437,7 @@ const struct c_common_resword c_common_reswords[] =
   { "__volatile__",RID_VOLATILE,   0 },
   { "__GIMPLE",RID_GIMPLE, D_CONLY },
   { "__PHI",   RID_PHI,D_CONLY },
+  { "__RTL",   RID_RTL,D_CONLY },
   { "alignas", RID_ALIGNAS,D_CXXONLY | D_CXX11 | D_CXXWARN },
   { "alignof", RID_ALIGNOF,D_CXXONLY | D_CXX11 | D_CXXWARN },
   { "asm", RID_ASM,D_ASM },
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index b838869..fc2ce87 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -124,6 +124,9 @@ enum rid
   /* "__PHI", for parsing PHI function in GIMPLE FE.  */
   RID_PHI,
 
+  /* "__RTL", for the RTL-parsing extension to the C frontend.  */
+  RID_RTL,
+
   /* C11 */
   RID_ALIGNAS, RID_GENERIC,
 
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 6d443da..bcfae86 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -62,6 +62,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "gcc-rich-location.h"
 #include "c-parser.h"
 #include "gimple-parser.h"
+#include "read-rtl-function.h"
+#include "run-rtl-passes.h"
 
 /* We need to walk over decls with incomplete struct/union/enum types
after parsing the whole translation unit.
@@ -1311,6 +1313,8 @@ static tree c_parser_array_notation (location_t, c_parser *, tree, tree);
 static tree c_parser_cilk_clause_vectorlength (c_parser *, tree, bool);
 static void c_parser_cilk_grainsize (c_parser *, bool *);
 
+static void c_parser_parse_rtl_body (c_parser *parser, char *start_with_pass);
+
 /* Parse a translation unit (C90 6.7, C99 6.9).
 
translation-unit:
@@ -1547,7 +1551,11 @@ static void c_finish_oacc_routine (struct oacc_routine_data *, tree, bool);
GIMPLE:
 
gimple-function-definition:
- declaration-specifiers[opt] __GIMPLE (gimple-pass-list) declarator
+ declaration-specifiers[opt] __GIMPLE (gimple-or-rtl-pass-list) declarator
+   declaration-list[opt] compound-statement
+
+   rtl-function-definition:
+ declaration-specifiers[opt] __RTL (gimple-or-rtl-pass-list) declarator
declaration-list[opt] compound-statement  */
 
 static void
@@ -2043,7 +2051,7 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
 tv = TV_PARSE_INLINE;
   else
 tv = TV_PARSE_FUNC;
-  timevar_push (tv);
+  auto_timevar at (g_timer, tv);
 
   /* Parse old-style parameter declarations.  ??? Attributes are
 not allowed to start declaration specifiers here because of a
@@ -2075,12 +2083,28 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
  function body as GIMPLE.  */
   if (specs->gimple_p)
{
- cfun->pass_startwith = specs->gimple_pass;
+ cfun->pass_startwith = specs->gimple_or_rtl_pass;
  bool saved = in_late_binary_op;
  in_late_binary_op = true;

[PATCH 9b] Don't assume that copy tables were initialized

2017-01-09 Thread David Malcolm

gcc/ChangeLog:
* cfg.c (original_copy_tables_initialized_p): New function.
* cfg.h (original_copy_tables_initialized_p): New decl.
* cfgrtl.c (relink_block_chain): Guard the call to
free_original_copy_tables with a call to
original_copy_tables_initialized_p.
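The guard added to relink_block_chain follows a familiar pattern:
- expose a predicate that reports whether some global state is currently allocated;
- consult that predicate before tearing the state down.

The toy model below illustrates only that pattern. `copy_pool`, `init_tables`, `free_tables`, `tables_initialized_p` and `relink` are hypothetical stand-ins. When a free has no prior init, the model returns false rather than asserting.

```cpp
// Hypothetical model of the copy-table lifetime; not GCC's real types.
struct copy_pool
{
  int entries = 0;
};

static copy_pool *pool = nullptr;

void init_tables () { pool = new copy_pool; }

// In this model, freeing tables that were never initialized is misuse;
// it is reported by returning false rather than by asserting.
bool free_tables ()
{
  if (pool == nullptr)
    return false;
  delete pool;
  pool = nullptr;
  return true;
}

// Analogue of original_copy_tables_initialized_p.
bool tables_initialized_p () { return pool != nullptr; }

// Analogue of the patched relink_block_chain: free only when initialized,
// then re-create the tables if we stay in cfglayout mode.
void relink (bool stay_in_cfglayout_mode)
{
  if (tables_initialized_p ())
    free_tables ();
  if (stay_in_cfglayout_mode)
    init_tables ();
}
```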
---
 gcc/cfg.c| 9 +
 gcc/cfg.h| 1 +
 gcc/cfgrtl.c | 3 ++-
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/cfg.c b/gcc/cfg.c
index 97cc755..f30b680 100644
--- a/gcc/cfg.c
+++ b/gcc/cfg.c
@@ -1094,6 +1094,15 @@ free_original_copy_tables (void)
   original_copy_bb_pool = NULL;
 }
 
+/* Return true iff we have had a call to initialize_original_copy_tables
+   without a corresponding call to free_original_copy_tables.  */
+
+bool
+original_copy_tables_initialized_p (void)
+{
+  return original_copy_bb_pool != NULL;
+}
+
 /* Removes the value associated with OBJ from table TAB.  */
 
 static void
diff --git a/gcc/cfg.h b/gcc/cfg.h
index d421d3b..b44f1e1 100644
--- a/gcc/cfg.h
+++ b/gcc/cfg.h
@@ -110,6 +110,7 @@ extern void scale_bbs_frequencies_gcov_type (basic_block *, int, gcov_type,
 extern void initialize_original_copy_tables (void);
 extern void reset_original_copy_tables (void);
 extern void free_original_copy_tables (void);
+extern bool original_copy_tables_initialized_p (void);
 extern void set_bb_original (basic_block, basic_block);
 extern basic_block get_bb_original (basic_block);
 extern void set_bb_copy (basic_block, basic_block);
diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c
index 7604346..b3b1146 100644
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -3646,7 +3646,8 @@ relink_block_chain (bool stay_in_cfglayout_mode)
   /* Maybe reset the original copy tables, they are not valid anymore
  when we renumber the basic blocks in compact_blocks.  If we are
  are going out of cfglayout mode, don't re-allocate the tables.  */
-  free_original_copy_tables ();
+  if (original_copy_tables_initialized_p ())
+free_original_copy_tables ();
   if (stay_in_cfglayout_mode)
 initialize_original_copy_tables ();
 
-- 
1.8.5.3

[PATCH] PR target/79004, Fix char/short -> _Float128 on PowerPC -mcpu=power9

2017-01-09 Thread Michael Meissner

This patch fixes PR target/79004 by removing the optimization that avoided a
direct move when converting an 8-bit or 16-bit integer value in memory to IEEE
128-bit floating point.

I opened a new bug (PR target/79038) to address the underlying issue: the
conversions between IEEE 128-bit floating point and integers were written before
small integers were allowed in the traditional Altivec registers, so they had to
use UNSPEC and explicit temporaries to get the integers into the appropriate
registers.

I have tested this patch with a bootstrap build and make check on a little-endian
power8 system, using an assembler that supports ISA 3.0 instructions.  I added a
new test to verify the results.  Can I check this into the trunk?  This is not an
issue on GCC 6.x.

[gcc]
2017-01-09  Michael Meissner  

PR target/79004
* config/rs6000/rs6000.md (FP_ISA3): Do not optimize converting
char or short to __float128/_Float128 directly.

[gcc/testsuite]
2017-01-09  Michael Meissner  

PR target/79004
* gcc.target/powerpc/pr79004.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 244232)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -521,10 +521,7 @@ (define_mode_iterator SIGNBIT [(KF "FLOA
   (TF "FLOAT128_VECTOR_P (TFmode)")])
 
 ; Iterator for ISA 3.0 supported floating point types
-(define_mode_iterator FP_ISA3 [SF
-  DF
-  (KF "FLOAT128_IEEE_P (KFmode)")
-  (TF "FLOAT128_IEEE_P (TFmode)")])
+(define_mode_iterator FP_ISA3 [SF DF])
 
 ; SF/DF suffix for traditional floating instructions
 (define_mode_attr Ftrad[(SF "s") (DF "")])
Index: gcc/testsuite/gcc.target/powerpc/pr79004.c
===
--- gcc/testsuite/gcc.target/powerpc/pr79004.c  (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr79004.c  (revision 0)
@@ -0,0 +1,118 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_float128_hw_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include 
+
+#ifndef TYPE
+#define TYPE __float128
+#endif
+
+TYPE from_double (double a) { return (TYPE)a; }
+TYPE from_single (float a) { return (TYPE)a; }
+
+TYPE from_double_load (double *a) { return (TYPE)*a; }
+TYPE from_single_load (float *a) { return (TYPE)*a; }
+
+double to_double (TYPE a) { return (double)a; }
+float to_single (TYPE a) { return (float)a; }
+
+void to_double_store (TYPE a, double *p) { *p = (double)a; }
+void to_single_store (TYPE a, float *p) { *p = (float)a; }
+
+TYPE from_sign_char (signed char a) { return (TYPE)a; }
+TYPE from_sign_short (short a) { return (TYPE)a; }
+TYPE from_sign_int (int a) { return (TYPE)a; }
+TYPE from_sign_long (long a) { return (TYPE)a; }
+
+TYPE from_sign_char_load (signed char *a) { return (TYPE)*a; }
+TYPE from_sign_short_load (short *a) { return (TYPE)*a; }
+TYPE from_sign_int_load (int *a) { return (TYPE)*a; }
+TYPE from_sign_long_load (long *a) { return (TYPE)*a; }
+
+TYPE from_sign_char_load_4 (signed char *a) { return (TYPE)a[4]; }
+TYPE from_sign_short_load_4 (short *a) { return (TYPE)a[4]; }
+TYPE from_sign_int_load_4 (int *a) { return (TYPE)a[4]; }
+TYPE from_sign_long_load_4 (long *a) { return (TYPE)a[4]; }
+
+TYPE from_sign_char_load_n (signed char *a, long n) { return (TYPE)a[n]; }
+TYPE from_sign_short_load_n (short *a, long n) { return (TYPE)a[n]; }
+TYPE from_sign_int_load_n (int *a, long n) { return (TYPE)a[n]; }
+TYPE from_sign_long_load_n (long *a, long n) { return (TYPE)a[n]; }
+
+signed char to_sign_char (TYPE a) { return (signed char)a; }
+short to_sign_short (TYPE a) { return (short)a; }
+int to_sign_int (TYPE a) { return (int)a; }
+long to_sign_long (TYPE a) { return (long)a; }
+
+void to_sign_char_store (TYPE a, signed char *p) { *p = (signed char)a; }
+void to_sign_short_store (TYPE a, short *p) { *p = (short)a; }
+void to_sign_int_store (TYPE a, int *p) { *p = (int)a; }
+void to_sign_long_store (TYPE a, long *p) { *p = (long)a; }
+
+void to_sign_char_store_4 (TYPE a, signed char *p) { p[4] = (signed char)a; }
+void to_sign_short_store_4 (TYPE a, short *p) { p[4] = (short)a; }
+void to_sign_int_store_4 (TYPE a, int *p) { p[4] = (int)a; }
+void to_sign_long_store_4 (TYPE a, long *p) { p[4] = (long)a; }
+
+void to_sign_char_store_n (TYPE a, signed char *p, long n) { p[n] = (signed char)a; }
+void to_sign_short_store_n (TYPE a, short *p, long n) { p[n] = (short)a; }
+void to_sign_int_store_n (TYPE a, int *p, long

Re: [PATCH] Introduce --with-gcc-major-version-only configure option

2017-01-09 Thread Matthias Klose

On 09.01.2017 21:43, Jakub Jelinek wrote:
> Hi!
> 
> On Fri, Jan 06, 2017 at 01:48:26PM +0100, Jakub Jelinek wrote:
>> Yet another option is introduce AC_ARG_ENABLE into all those configure
>> scripts (some macro in config/*.m4) and do the sed conditionally.
> 
> Here is a patch to do that.
> Bootstrapped/regtested on x86_64-linux (without
> --with-gcc-major-version-only) and on i686-linux (with
> --with-gcc-major-version-only), then tested make install of both.
> The former uses the standard gcc -dumpversion of 7.0.0 and 7.0.0 in
> pathnames (e.g. usr/local/bin/x86_64-pc-linux-gnu-gcc-7.0.0,
> usr/local/lib/gcc/x86_64-pc-linux-gnu/7.0.0,
> usr/local/libexec/gcc/x86_64-pc-linux-gnu/7.0.0,
> usr/local/lib/go/7.0.0/x86_64-pc-linux-gnu,
> usr/local/include/c++/7.0.0 etc.), while the latter uses
> gcc -dumpversion of 7 and 7 in pathnames (e.g.
> i686-pc-linux-gnu-gcc-7, usr/local/lib/gcc/i686-pc-linux-gnu/7,
> usr/local/libexec/gcc/i686-pc-linux-gnu/7,
> usr/local/lib/go/7/i686-pc-linux-gnu,
> usr/local/include/c++/7 etc.).
> Ok for trunk?

Thanks for working on this.  I have been using such a layout for the Debian/Ubuntu
GCC builds for some years.  The one thing I dislike about your patch is the changed
output of the -dumpversion option, which differs depending on whether you use the
new configure option or not.  This could break builds of third-party software.  I
would prefer -dumpversion to give the very same output independent of any
configure options.  Could you please introduce a new option if you really need
that?

Matthias

Re: Use a specfile that actually allows building programs on NetBSD

2017-01-09 Thread coypu

3-month ping, 1-week ping (trying again), etc.

This patch has zero effect on non-NetBSD users and was already
accepted in NetBSD years ago.

On Wed, Jan 04, 2017 at 11:24:27AM +, coypu wrote:
> Like most operating systems, NetBSD has a libc which contains
> stuff it needs for most programs to work, and people expect
> it to be linked without explicitly specifying -lc.
> 
> This patch is needed for just about any program to work.
> 
> ---
>  gcc/config/netbsd.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/gcc/config/netbsd.h b/gcc/config/netbsd.h
> index f2d6cc6..65ce943 100644
> --- a/gcc/config/netbsd.h
> +++ b/gcc/config/netbsd.h
> @@ -96,6 +96,7 @@ along with GCC; see the file COPYING3.  If not see
> %{!pg:-lposix}}   \
>   %{p:-lposix_p}  \
>   %{pg:-lposix_p}}\
> +   %{shared:-lc} \
> %{!shared:\
>   %{!symbolic:\
> %{!p: \
> @@ -109,6 +110,7 @@ along with GCC; see the file COPYING3.  If not see
> %{!pg:-lposix}}   \
>   %{p:-lposix_p}  \
>   %{pg:-lposix_p}}\
> +   %{shared:-lc} \
> %{!shared:\
>   %{!symbolic:\
> %{!p: \
> -- 
> 2.9.0

[committed PATCH] PR79026 The tests changed by revision r244006 now fail on darwin

2017-01-09 Thread Dominique d'Humières

I have committed the following patch to trunk as revision r244244 (pre-approved
by Uros Bizjak in Bugzilla):

--- ../_clean/gcc/testsuite/gcc.target/i386/pr78904-2.c 2017-01-09 23:14:04.0 +0100
+++ gcc/testsuite/gcc.target/i386/pr78904-2.c   2017-01-09 22:41:49.0 +0100
@@ -1,5 +1,6 @@
 /* PR target/78904 */
 /* { dg-do compile } */
+/* { dg-require-effective-target nonpic } */
 /* { dg-options "-O2 -masm=att" } */
 
 struct S1
--- ../_clean/gcc/testsuite/gcc.target/i386/pr78904-4.c 2017-01-09 23:14:14.0 +0100
+++ gcc/testsuite/gcc.target/i386/pr78904-4.c   2017-01-09 22:41:49.0 +0100
@@ -1,5 +1,6 @@
 /* PR target/78904 */
 /* { dg-do compile } */
+/* { dg-require-effective-target nonpic } */
 /* { dg-options "-O2 -masm=att" } */
 
 typedef __SIZE_TYPE__ size_t;
--- ../_clean/gcc/testsuite/gcc.target/i386/pr78904-6.c 2017-01-09 23:14:24.0 +0100
+++ gcc/testsuite/gcc.target/i386/pr78904-6.c   2017-01-09 22:41:49.0 +0100
@@ -1,5 +1,6 @@
 /* PR target/78904 */
 /* { dg-do compile } */
+/* { dg-require-effective-target nonpic } */
 /* { dg-options "-O2 -masm=att" } */
 
 typedef __SIZE_TYPE__ size_t;
--- ../_clean/gcc/testsuite/gcc.target/i386/pr78967-2.c 2017-01-09 23:14:39.0 +0100
+++ gcc/testsuite/gcc.target/i386/pr78967-2.c   2017-01-09 22:41:49.0 +0100
@@ -1,6 +1,7 @@
 /* PR target/78967 */
 /* { dg-do compile } */
 /* { dg-options "-O2 -masm=att" } */
+/* { dg-require-effective-target nonpic } */
 /* { dg-final { scan-assembler-not "movzbl" } } */
 
 typedef __SIZE_TYPE__ size_t;

Dominique

Re: C++ PATCH to implement C++17 variadic using

2017-01-09 Thread Jakub Jelinek

On Mon, Jan 09, 2017 at 05:02:03PM -0500, Jason Merrill wrote:
> On Mon, Jan 9, 2017 at 4:56 PM, Jakub Jelinek  wrote:
> > On Mon, Jan 09, 2017 at 04:50:20PM -0500, Jason Merrill wrote:
> >> The last C++17 feature
> >
> > What about P0490R0?
> 
> That's the response to various national body comments, I don't see
> anything in there that I would consider a new feature, most is just
> wording clarification.  Are you thinking of something in particular?

Yeah, it isn't a feature, just lots of minor changes.
What I wanted to point out is that we should nevertheless go through them and
implement them, unless they are reverted.  I hope GB 20 will be, because
right now decomp is broken for const structures when  is included.

Jakub

Re: [Patch] PR71017 - libgcc/config/i386/cpuinfo.c:346:17: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'

2017-01-09 Thread Dominique d'Humières


> Le 9 janv. 2017 à 20:37, Uros Bizjak  a écrit :
> 
> Hello!
> 
>> The following patch fixes errors of the kind
>> 
>> libgcc/config/i386/cpuinfo.c:260:17: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
>> 
>> 2017-01-07  Dominique d'Humieres  
>> 
>>PR target/71017
>>* config/i386/cpuid.h: Fix undefined behavior.
> 
>> Is it OK for trunk/branches?
> 
> OK.
> 
> Thanks,
> Uros.

Thanks, committed to trunk as revision r244248. Is it OK for the 5 and 6
branches?

Dominique

Re: C++ PATCH to implement C++17 variadic using

2017-01-09 Thread Jason Merrill

On Mon, Jan 9, 2017 at 4:56 PM, Jakub Jelinek  wrote:
> On Mon, Jan 09, 2017 at 04:50:20PM -0500, Jason Merrill wrote:
>> The last C++17 feature
>
> What about P0490R0?

That's the response to various national body comments, I don't see
anything in there that I would consider a new feature, most is just
wording clarification.  Are you thinking of something in particular?

Jason

C++ PATCH to implement C++17 variadic using

2017-01-09 Thread Jason Merrill

The last C++17 feature was pretty trivial to implement, as expected.

Tested x86_64-pc-linux-gnu, applying to trunk.
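For readers new to P0195R2, the two forms that the parser changes below accept (a pack expansion in a using-declaration, and a comma-separated list of using-declarators) can be illustrated with a small C++17 sketch. The names `overloaded`, `A`, `B` and `AB` are hypothetical and not part of the patch.

```cpp
// C++17 (P0195R2): pack expansion in a using-declaration.
// Each base class contributes one operator() overload, and
// "using Fs::operator()...;" brings all of them into the derived class.
template <typename... Fs>
struct overloaded : Fs... {
  using Fs::operator()...;
};

// Deduction guide so that overloaded{f, g} deduces Fs from the lambdas.
template <typename... Fs>
overloaded(Fs...) -> overloaded<Fs...>;

// C++17: several using-declarators in a single using-declaration.
struct A { constexpr int f(int) const { return 1; } };
struct B { constexpr int f(double) const { return 2; } };
struct AB : A, B {
  using A::f, B::f;  // without this, looking up f in AB would be ambiguous
};
```

For example, `overloaded{[](int) { return 1; }, [](const char *) { return 2; }}` returns 1 when called with `1` and 2 when called with `"x"`, and `AB{}.f(0.5)` selects `B::f`.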
commit 10650d842cd49cad2adb396bc19192bf52975be8
Author: Jason Merrill 
Date:   Mon Jan 9 15:08:47 2017 -0500

Implement P0195R2, C++17 variadic using.

* parser.c (cp_parser_using_declaration): Handle ellipsis and comma.
* pt.c (tsubst_decl): Handle pack expansion in USING_DECL_SCOPE.
* error.c (dump_decl): Likewise.

diff --git a/gcc/cp/cp-tree.def b/gcc/cp/cp-tree.def
index 038227b..ff4f4ef 100644
--- a/gcc/cp/cp-tree.def
+++ b/gcc/cp/cp-tree.def
@@ -199,7 +199,8 @@ DEFTREECODE (BOUND_TEMPLATE_TEMPLATE_PARM, "bound_template_template_parm",
 DEFTREECODE (UNBOUND_CLASS_TEMPLATE, "unbound_class_template", tcc_type, 0)
 
 /* A using declaration.  USING_DECL_SCOPE contains the specified
-   scope.  In a member using decl, unless DECL_DEPENDENT_P is true,
+   scope.  In a variadic using-declaration, this is a TYPE_PACK_EXPANSION.
+   In a member using decl, unless DECL_DEPENDENT_P is true,
USING_DECL_DECLS contains the _DECL or OVERLOAD so named.  This is
not an alias, but is later expanded into multiple aliases.  */
 DEFTREECODE (USING_DECL, "using_decl", tcc_declaration, 0)
diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index fde8499..72044a9 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -1268,10 +1268,21 @@ dump_decl (cxx_pretty_printer *pp, tree t, int flags)
   break;
 
 case USING_DECL:
-  pp_cxx_ws_string (pp, "using");
-  dump_type (pp, USING_DECL_SCOPE (t), flags);
-  pp_cxx_colon_colon (pp);
-  dump_decl (pp, DECL_NAME (t), flags);
+  {
+   pp_cxx_ws_string (pp, "using");
+   tree scope = USING_DECL_SCOPE (t);
+   bool variadic = false;
+   if (PACK_EXPANSION_P (scope))
+ {
+   scope = PACK_EXPANSION_PATTERN (scope);
+   variadic = true;
+ }
+   dump_type (pp, scope, flags);
+   pp_cxx_colon_colon (pp);
+   dump_decl (pp, DECL_NAME (t), flags);
+   if (variadic)
+ pp_cxx_ws_string (pp, "...");
+  }
   break;
 
 case STATIC_ASSERT:
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index e8c0642..aa045c4 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -18372,6 +18372,7 @@ cp_parser_using_declaration (cp_parser* parser,
   /* Look for the `using' keyword.  */
   cp_parser_require_keyword (parser, RID_USING, RT_USING);
   
+ again:
   /* Peek at the next token.  */
   token = cp_lexer_peek_token (parser->lexer);
   /* See if it's `typename'.  */
@@ -18438,6 +18439,16 @@ cp_parser_using_declaration (cp_parser* parser,
   if (!cp_parser_parse_definitely (parser))
return false;
 }
+  else if (cp_lexer_next_token_is (parser->lexer, CPP_ELLIPSIS))
+{
+  cp_token *ell = cp_lexer_consume_token (parser->lexer);
+  if (cxx_dialect < cxx1z
+ && !in_system_header_at (ell->location))
+   pedwarn (ell->location, 0,
+"pack expansion in using-declaration only available "
+"with -std=c++1z or -std=gnu++1z");
+  qscope = make_pack_expansion (qscope);
+}
 
   /* The function we call to handle a using-declaration is different
  depending on what scope we are in.  */
@@ -18455,7 +18466,7 @@ cp_parser_using_declaration (cp_parser* parser,
   if (at_class_scope_p ())
{
  /* Create the USING_DECL.  */
- decl = do_class_using_decl (parser->scope, identifier);
+ decl = do_class_using_decl (qscope, identifier);
 
  if (decl && typename_p)
USING_DECL_TYPENAME_P (decl) = 1;
@@ -18490,6 +18501,17 @@ cp_parser_using_declaration (cp_parser* parser,
}
 }
 
+  if (!access_declaration_p
+  && cp_lexer_next_token_is (parser->lexer, CPP_COMMA))
+{
+  cp_token *comma = cp_lexer_consume_token (parser->lexer);
+  if (cxx_dialect < cxx1z)
+   pedwarn (comma->location, 0,
+"comma-separated list in using-declaration only available "
+"with -std=c++1z or -std=gnu++1z");
+  goto again;
+}
+
   /* Look for the final `;'.  */
   cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
 
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 366c59a..dec7d39 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -12591,16 +12591,42 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
   if (DECL_DEPENDENT_P (t)
  || uses_template_parms (USING_DECL_SCOPE (t)))
{
- tree inst_scope = tsubst_copy (USING_DECL_SCOPE (t), args,
-complain, in_decl);
+ tree scope = USING_DECL_SCOPE (t);
  tree name = tsubst_copy (DECL_NAME (t), args, complain, in_decl);
- r = do_class_using_decl (inst_scope, name);
- if (!r)
-   r = error_mark_node;
+ if (PACK_EXPANSION_P (scope))
+   {
+ tree vec =

Re: C++ PATCH to implement C++17 variadic using

2017-01-09 Thread Jakub Jelinek

On Mon, Jan 09, 2017 at 04:50:20PM -0500, Jason Merrill wrote:
> The last C++17 feature

What about P0490R0?

Jakub

[PR tree-optimization/79007 tree-optimization/67955] Don't be so conservative with pt.null

2017-01-09 Thread Jeff Law



This follows the discussion with Richi.  Bootstrapped and regression tested on
x86_64-linux-gnu.  I thought I had done the same with the prior patch, but
that patch clearly made the new dse-points-to test fail everywhere.


Installed on the trunk.

Jeff
commit 18c5f2e1d38dfd248f7d17eb9656251191a8bd15
Author: law 
Date:   Mon Jan 9 21:53:02 2017 +

PR tree-optimization/79007
PR tree-optimization/67955
* tree-ssa-alias.c (same_addr_size_stores_p): Only need to be
conservative for pt.null when flag_non_call_exceptions is on.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@244247 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 76cb51b..ed75ea8 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2017-01-04  Jeff Law  
+
+   PR tree-optimization/79007
+   PR tree-optimization/67955
+   * tree-ssa-alias.c (same_addr_size_stores_p): Only need to be
+   conservative for pt.null when flag_non_call_exceptions is on.
+
 2017-01-09  Jakub Jelinek  
 
PR translation/79019
diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index 871fa12..83fa6f5 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -2373,9 +2373,9 @@ same_addr_size_stores_p (tree base1, HOST_WIDE_INT offset1, HOST_WIDE_INT size1,
   || !pt_solution_singleton_or_null_p (>pt, _uid))
 return false;
 
-  /* If the solution has a singleton and NULL, then we can not
- be sure that the two stores hit the same address.  */
-  if (pi->pt.null)
+  /* Be conservative with non-call exceptions when the address might
+ be NULL.  */
+  if (flag_non_call_exceptions && pi->pt.null)
 return false;
 
   /* Check that ptr points relative to obj.  */

Re: [PATCH] Spelling and typo fixes in translatable strings (PR translation/79019, PR translation/79020)

2017-01-09 Thread Joseph Myers

On Mon, 9 Jan 2017, Jakub Jelinek wrote:

> I'm not a native English speaker, so I'd appreciate corrections (e.g. not
> 100% sure about the endianess -> endianness).  Bootstrapped/regtested on
> x86_64-linux and i686-linux, ok for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [patch] aarch64--freebsd support for gcc.

2017-01-09 Thread Andreas Tobler


On 13.10.16 17:14, Jeff Law wrote:

On 10/12/2016 01:43 PM, Andreas Tobler wrote:


libgcc:

2016-10-10  Andreas Tobler  

* config.host: Add support for aarch64-*-freebsd*.

gcc:

2016-10-10  Andreas Tobler  

* config.gcc: Add aarch64-*-freebsd* support.
* config.host: Likewise.
* config/aarch64/aarch64-freebsd.h: New file.
* config/aarch64/t-aarch64-freebsd: Ditto.

toplevel:

2016-10-10  Andreas Tobler 

* configure.ac: Add aarch64-*-freebsd*.
* configure: Regenerate.

Certainly OK for the trunk.  Jakub, Richi & Joseph make the rules for
the release branches.


I had a chat with Jakub and learned that, as long as there is no branch
freeze or similar, any global reviewer can approve such a backport.
So may I ask whether you would approve this patch for 6.x and 5.x?

Yes.  Approved for 5.x and 6.x.


After fixing a bootstrap comparison failure on 6.x, I finally committed it to
6.x (244242) and 5.x (244243).


Thanks,
Andreas

[PATCH] Introduce --with-gcc-major-version-only configure option

2017-01-09 Thread Jakub Jelinek

Hi!

On Fri, Jan 06, 2017 at 01:48:26PM +0100, Jakub Jelinek wrote:
> Yet another option is introduce AC_ARG_ENABLE into all those configure
> scripts (some macro in config/*.m4) and do the sed conditionally.

Here is a patch to do that.
Bootstrapped/regtested on x86_64-linux (without
--with-gcc-major-version-only) and on i686-linux (with
--with-gcc-major-version-only), then tested make install of both.
The former uses the standard gcc -dumpversion of 7.0.0 and 7.0.0 in
pathnames (e.g. usr/local/bin/x86_64-pc-linux-gnu-gcc-7.0.0,
usr/local/lib/gcc/x86_64-pc-linux-gnu/7.0.0,
usr/local/libexec/gcc/x86_64-pc-linux-gnu/7.0.0,
usr/local/lib/go/7.0.0/x86_64-pc-linux-gnu,
usr/local/include/c++/7.0.0 etc.), while the latter uses
gcc -dumpversion of 7 and 7 in pathnames (e.g.
i686-pc-linux-gnu-gcc-7, usr/local/lib/gcc/i686-pc-linux-gnu/7,
usr/local/libexec/gcc/i686-pc-linux-gnu/7,
usr/local/lib/go/7/i686-pc-linux-gnu,
usr/local/include/c++/7 etc.).
Ok for trunk?
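The net effect of --with-gcc-major-version-only on the version string can be sketched in C++. This is only an illustration: the patch does the work in the build machinery (the GCC_BASE_VER m4 macro and @get_gcc_base_ver@ in the Makefiles), and `major_only` is a hypothetical helper, not code from the patch.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: keep only the major component of a BASE-VER string,
// e.g. "7.0.0" -> "7", as used by gcc -dumpversion and in the installation
// paths when --with-gcc-major-version-only is given.
constexpr std::string_view major_only(std::string_view basever) {
  std::size_t i = 0;
  while (i < basever.size() && basever[i] != '.')
    ++i;
  return basever.substr(0, i);
}
```

Without the option, the full BASE-VER string (7.0.0) is used unchanged.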

2017-01-09  Jakub Jelinek  

* configure: Regenerated.
config/
* acx.m4 (GCC_BASE_VER): New m4 function.
(ACX_TOOL_DIRS): Require GCC_BASE_VER, for
--with-gcc-major-version-only use just major number from BASE-VER.
gcc/
* configure.ac: Add GCC_BASE_VER.
* Makefile.in (version): Use @get_gcc_base_ver@ instead of cat to get
version from BASE-VER file.
* doc/install.texi: Document --with-gcc-major-version-only.
* configure: Regenerated.
libatomic/
* configure.ac: Add GCC_BASE_VER.
* Makefile.am (gcc_version): Use @get_gcc_base_ver@ instead of cat to
get version from BASE-VER file.
* testsuite/Makefile.in: Regenerated.
* configure: Regenerated.
* Makefile.in: Regenerated.
libgomp/
* configure.ac: Add GCC_BASE_VER.
* Makefile.am (gcc_version): Use @get_gcc_base_ver@ instead of cat to
get version from BASE-VER file.
* testsuite/Makefile.in: Regenerated.
* configure: Regenerated.
* Makefile.in: Regenerated.
libgcc/
* configure.ac: Add GCC_BASE_VER.
* Makefile.in (version): Use @get_gcc_base_ver@ instead of cat to get
version from BASE-VER file.
* configure: Regenerated.
libssp/
* configure.ac: Add GCC_BASE_VER.
* Makefile.am (gcc_version): Use @get_gcc_base_ver@ instead of cat to
get version from BASE-VER file.
* configure: Regenerated.
* Makefile.in: Regenerated.
liboffloadmic/
* configure.ac: Add GCC_BASE_VER.
* Makefile.am (gcc_version): Use @get_gcc_base_ver@ instead of cat to
get version from BASE-VER file.
* aclocal.m4: Include ../config/acx.m4.
* configure: Regenerated.
* Makefile.in: Regenerated.
libquadmath/
* configure.ac: Add GCC_BASE_VER.
* Makefile.am (gcc_version): Use @get_gcc_base_ver@ instead of cat to
get version from BASE-VER file.
* configure: Regenerated.
* Makefile.in: Regenerated.
libmpx/
* configure.ac: Add GCC_BASE_VER.
* Makefile.am (gcc_version): Use @get_gcc_base_ver@ instead of cat to
get version from BASE-VER file.
* configure: Regenerated.
* Makefile.in: Regenerated.
libada/
* configure.ac: Add GCC_BASE_VER.
* Makefile.in (version): Use @get_gcc_base_ver@ instead of cat to get
version from BASE-VER file.
* configure: Regenerated.
lto-plugin/
* configure.ac: Add GCC_BASE_VER.
* Makefile.am (gcc_version): Use @get_gcc_base_ver@ instead of cat to
get version from BASE-VER file.
* configure: Regenerated.
* Makefile.in: Regenerated.
libitm/
* configure.ac: Add GCC_BASE_VER.
* Makefile.am (gcc_version): Use @get_gcc_base_ver@ instead of cat to
get version from BASE-VER file.
* testsuite/Makefile.in: Regenerated.
* configure: Regenerated.
* Makefile.in: Regenerated.
fixincludes/
* configure.ac: Add GCC_BASE_VER.
* Makefile.in (gcc_version): Use @get_gcc_base_ver@ instead of cat to
get version from BASE-VER file.
* configure: Regenerated.
libcilkrts/
* configure.ac: Add GCC_BASE_VER.
* Makefile.am (gcc_version): Use @get_gcc_base_ver@ instead of cat to
get version from BASE-VER file.
* aclocal.m4: Include ../config/acx.m4.
* configure: Regenerated.
* Makefile.in: Regenerated.
libcc1/
* configure.ac: Add GCC_BASE_VER.  For --with-gcc-major-version-only
use just major number from BASE-VER.
* configure: Regenerated.
* Makefile.in: Regenerated.
libobjc/
* configure.ac: Add GCC_BASE_VER.
* Makefile.in (gcc_version): Use @get_gcc_base_ver@ instead of cat to
get version from BASE-VER file.
* configure: Regenerated.
libstdc++-v3/
* configure.ac: Add GCC_BASE_VER.
* fragment.am (gcc_version):

Re: [PATCH] Fix late dwarf generated early from optimized out globals

2017-01-09 Thread Andreas Tobler


On 09.01.17 18:36, Jakub Jelinek wrote:

On Mon, Jan 09, 2017 at 06:25:05PM +0100, Andreas Tobler wrote:

On 09.01.17 12:25, Jakub Jelinek wrote:

On Mon, Jan 09, 2017 at 11:53:38AM +0100, Richard Biener wrote:

OK, attached is the part I bootstrapped successfully on amd64-*-freebsd12 and
aarch64-*-freebsd12. From the amd64 run you'll find some test results at the
usual place. The aarch64 run takes some more time.

I hope I got it right this time :)
What do you think?


Looks good to me with the added comment to dwarf2out_late_global_decl
exchanged to the one on trunk.


The formatting is completely wrong: lines indented e.g. by 7 spaces (or a tab
plus 1 or 3 spaces), a /* comment inside a { block starting in the same column
as the { (it should be 2 columns to the right), && ! not aligned below VAR_P,
and indentation by 3 columns instead of 2.


Hehe, yep. This time done with emacs ;)

Here is the hopefully final patch, with a proper ChangeLog and the formatting fixed.

Ok to apply?


Formatting LGTM, so I think Richard's approval applies now.


Thanks a lot!

Committed as 244240.

Andreas

Re: Pretty printers for versioned namespace

2017-01-09 Thread François Dumont


On 04/01/2017 13:52, Jonathan Wakely wrote:

On 24/12/16 14:47 +0100, François Dumont wrote:

I'd prefer not to have to use the regex matches in libstdc++.exp as
they complicate things.

For the two examples above, the whatis results are bad even for the
non-versioned namespace. For specializations of basic_string we only
have type printers that recognize the standard typedefs like
std::u16string, but not other specializations. We really want it to
show std::basic_string not the full name. That would
require a TemplateTypePrinter for basic_string. The attached patch
works, and should be easy to incorporate into your changes for the
versioned namespace.


+add_one_template_type_printer(obj, 'basic_string',
+'basic_string<((un)?signed char), std::char_traits<\\1 ?>, std::allocator<\\1 ?> >',
+'basic_string<{1}>')
+

I had considered a similar but more generic approach:

+add_one_template_type_printer(obj, 'basic_string',
+'basic_string<(.*)?, std::char_traits<\\1 ?>, std::allocator<\\1 ?> >',
+'basic_string<{1}>')
+


but it had a bad effect on the rendering of the std::string type, so I gave up
on that approach. Your version is indeed enough to cover the less exotic
instantiations of std::basic_string.

I also updated the 48362.cc test case, as it had already been adapted for the
versioned namespace. However, I had to keep one occurrence of '__7' when
displaying types inside a tuple; I think that is OK.

Tested with the versioned namespace. Is it OK to commit once I have completed
testing without the versioned namespace?

François
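The regular expression in the patch's Python `is_specialization_of` helper accepts a type name with or without the `__7::` versioned-namespace component. A C++ transliteration using std::regex shows which names it matches; it is a hypothetical sketch for illustration, not part of the printers.

```cpp
#include <regex>
#include <string>

// Versioned-namespace component, as in the patch's vers_nsp.
const std::string vers_nsp = "__7::";

// Hypothetical C++ counterpart of the Python helper: true if TYPE names
// std::TEMPLATE_NAME<...>, with or without the "__7::" component.
bool is_specialization_of(const std::string &type,
                          const std::string &template_name) {
  // Same pattern as '^std::(%s)?%s<.*>$' in printers.py.
  const std::regex re("^std::(" + vers_nsp + ")?" + template_name + "<.*>$");
  return std::regex_match(type, re);
}
```

Like the Python original, the template name is spliced into the pattern unescaped, which is safe only because names such as `tuple` and `__uniq_ptr_impl` contain no regex metacharacters.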

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 7690a6b..9de1a96 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -36,6 +36,8 @@ import sys
 # We probably can't do much about this until this GDB PR is addressed:
 # 
 
+vers_nsp = '__7::'
+
 if sys.version_info[0] > 2:
 ### Python 3 stuff
 Iterator = object
@@ -100,11 +102,15 @@ def find_type(orig, name):
 raise ValueError("Cannot find type %s::%s" % (str(orig), name))
 typ = field.type
 
+# Test if a type is a given template instantiation.
+def is_specialization_of(type, template_name):
+   return re.match('^std::(%s)?%s<.*>$' % (vers_nsp, template_name), type) is not None
+
 class SharedPointerPrinter:
 "Print a shared_ptr or weak_ptr"
 
 def __init__ (self, typename, val):
-self.typename = typename
+self.typename = typename.replace(vers_nsp, '')
 self.val = val
 
 def to_string (self):
@@ -127,9 +133,9 @@ class UniquePointerPrinter:
 
 def to_string (self):
 impl_type = self.val.type.fields()[0].type.tag
-if impl_type.startswith('std::__uniq_ptr_impl<'): # New implementation
+if is_specialization_of(impl_type, '__uniq_ptr_impl'): # New implementation
 v = self.val['_M_t']['_M_t']['_M_head_impl']
-elif impl_type.startswith('std::tuple<'):
+elif is_specialization_of(impl_type, 'tuple'):
 v = self.val['_M_t']['_M_head_impl']
 else:
 raise ValueError("Unsupported implementation for unique_ptr: %s" % self.val.type.fields()[0].type.tag)
@@ -179,7 +185,7 @@ class StdListPrinter:
 return ('[%d]' % count, val)
 
 def __init__(self, typename, val):
-self.typename = typename
+self.typename = typename.replace(vers_nsp, '')
 self.val = val
 
 def children(self):
@@ -299,7 +305,7 @@ class StdVectorPrinter:
 return ('[%d]' % count, elt)
 
 def __init__(self, typename, val):
-self.typename = typename
+self.typename = typename.replace(vers_nsp, '')
 self.val = val
 self.is_bool = val.type.template_argument(0).code  == gdb.TYPE_CODE_BOOL
 
@@ -403,7 +409,7 @@ class StdTuplePrinter:
 return ('[%d]' % self.count, impl['_M_head_impl'])
 
 def __init__ (self, typename, val):
-self.typename = typename
+self.typename = typename.replace(vers_nsp, '')
 self.val = val;
 
 def children (self):
@@ -418,7 +424,7 @@ class StdStackOrQueuePrinter:
 "Print a std::stack or std::queue"
 
 def __init__ (self, typename, val):
-self.typename = typename
+self.typename = typename.replace(vers_nsp, '')
 self.visualizer = gdb.default_visualizer(val['c'])
 
 def children (self):
@@ -496,7 +502,10 @@ class StdRbtreeIteratorPrinter:
 def __init__ (self, typename, val):
 self.val = val
 valtype = self.val.type.template_argument(0).strip_typedefs()
-nodetype = gdb.lookup_type('std::_Rb_tree_node<' + str(valtype) + '>')
+if typename.startswith('std::' + vers_nsp):
+nodetype = gdb.lookup_type('std::' + vers_nsp + '_Rb_tree_node<' + str(valtype) + '>')
+else:
+nodetype =

[PATCH] Spelling and typo fixes in translatable strings (PR translation/79019, PR translation/79020)

2017-01-09 Thread Jakub Jelinek

Hi!

These two PRs made me run aspell -c po/gcc.pot.  I didn't want to spend too
much time on it, so I kept pressing I (ignore) like crazy and surely missed
lots of things; here is what I noticed (including the two issues filed in
Bugzilla):

containg -> containing
intructions -> instructions
outpoing -> outgoing
endianess -> endianness
signess -> signedness
caling -> calling
isnsns -> insns
occured -> occurred
instrumetnation -> instrumentation
byt -> but
vairant -> variant
invokation -> invocation

British English spellings:
recognised -> recognized
normalised -> normalized
initialisation -> initialization

After that I diffed gcc.pot and adjusted the original sources; with the
exception of the UK spellings, I also grepped other files in the gcc/
subdirectory for similar typos and fixed them too.  Some of the bugs were just
string literals split across multiple lines with no space before the closing "
at the end of one line and no space after the opening " at the start of the
next.
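The split-literal bugs follow from how adjacent string literals are concatenated in C and C++: nothing is inserted at the seam. A minimal sketch, with made-up message text:

```cpp
// Adjacent string literals are joined with nothing in between, so a message
// split across lines needs an explicit space on one side of the seam.
constexpr char bad[] = "option is only valid"
                       "with this target";   // "...validwith this target"
constexpr char good[] = "option is only valid "
                        "with this target";  // "...valid with this target"

static_assert(sizeof good == sizeof bad + 1, "good has exactly one extra space");
```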

I'm not a native English speaker, so I'd appreciate corrections (e.g. not
100% sure about the endianess -> endianness).  Bootstrapped/regtested on
x86_64-linux and i686-linux, ok for trunk?

2017-01-09  Jakub Jelinek  

PR translation/79019
PR translation/79020
* params.def (PARAM_INLINE_MIN_SPEEDUP,
PARAM_IPA_CP_SINGLE_CALL_PENALTY,
PARAM_USE_AFTER_SCOPE_DIRECT_EMISSION_THRESHOLD): Fix typos
in descriptions.
* config/avr/avr.opt (maccumulate-args): Likewise.
* config/msp430/msp430.opt (mwarn-mcu): Likewise.
* common.opt (freport-bug): Likewise.
* cif-code.def (CIF_FINAL_ERROR): Likewise.
* doc/invoke.texi (ipa-cp-single-call-penalty): Likewise.
* config/s390/s390.c (s390_invalid_binary_op): Fix spelling in
translatable string.
* config/i386/i386.c (function_value_32): Likewise.
* config/nios2/nios2.c (nios2_valid_target_attribute_rec): Likewise.
* config/msp430/msp430.c (msp430_option_override, msp430_attr):
Likewise.
* config/msp430/driver-msp430.c (msp430_select_hwmult_lib): Likewise.
* common/config/msp430/msp430-common.c (msp430_handle_option):
Likewise.
* symtab.c (symtab_node::verify_base): Likewise.
* opts.c (set_debug_level): Likewise.
* tree.c (verify_type_variant): Likewise.  Fix typo in comment.
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): Add
missing whitespace to translatable strings.
* config/avr/avr.md (bswapsi2): Fix typo in comment.
* config/sh/superh.h: Likewise.
* config/i386/xopintrin.h: Likewise.
* config/i386/znver1.md: Likewise.
* config/rs6000/rs6000.c (struct rs6000_opt_mask): Likewise.
* ipa-inline-analysis.c (compute_inline_parameters): Likewise.
* double-int.h (struct double_int): Likewise.
* double-int.c (div_and_round_double): Likewise.
* wide-int.cc: Likewise.
* tree-ssa.c (non_rewritable_mem_ref_base): Likewise.
* tree-ssa-sccvn.c (vn_reference_lookup_3): Likewise.
* cfgcleanup.c (crossjumps_occured): Renamed to ...
(crossjumps_occurred): ... this.
(try_crossjump_bb, try_head_merge_bb, try_optimize_cfg, cleanup_cfg):
Adjust all uses.
cp/
* semantics.c (finish_omp_clauses): Add missing whitespace to
translatable strings.
* cp-cilkplus.c (cpp_validate_cilk_plus_loop_aux): Fix comment typo.
lto/
* lto-symtab.c (lto_symtab_merge_symbols): Fix comment typo.
fortran/
* decl.c (attr_decl1): Fix spelling in translatable string.
* intrinsic.texi: Fix spelling - invokation -> invocation.
* lang.opt (faggressive-function-elimination, gfc_convert): Fix
typos in descriptions.
* openmp.c (resolve_omp_clauses): Add missing whitespace to
translatable strings.
c-family/
* c.opt (Wnormalized=): Fix typo in description.
testsuite/
* c-c++-common/goacc/host_data-2.c (f): Adjust expected spelling of
diagnostics.
* gfortran.dg/initialization_17.f90: Likewise.

--- gcc/params.def.jj   2017-01-08 17:41:14.0 +0100
+++ gcc/params.def  2017-01-08 19:06:14.858050159 +0100
@@ -51,7 +51,7 @@ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCO
 
 DEFPARAM (PARAM_INLINE_MIN_SPEEDUP,
  "inline-min-speedup",
- "The minimal estimated speedup allowing inliner to ignore inline-insns-single and inline-isnsns-auto.",
+ "The minimal estimated speedup allowing inliner to ignore inline-insns-single and inline-insns-auto.",
  10, 0, 0)
 
 /* The single function inlining limit. This is the maximum size
@@ -1007,7 +1007,7 @@ DEFPARAM (PARAM_IPA_CP_RECURSION_PENALTY
 
 DEFPARAM (PARAM_IPA_CP_SINGLE_CALL_PENALTY,
  "ipa-cp-single-call-penalty",
- "Percentage penalty functions containg a single call to another "
+ "Percentage

[committed] increase buffer size to avoid truncation warning in asan.c (PR 79033)

2017-01-09 Thread Martin Sebor


To fix the bootstrap failure, I committed r244237 as an obviously
safe fix for bug 79033 (asan.c not compiling with make
BOOT_CFLAGS='-O0').

Martin

Re: [Patch] PR71017 - libgcc/config/i386/cpuinfo.c:346:17: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'

2017-01-09 Thread Uros Bizjak

Hello!

> The following patch fixes errors of the kind
>
> libgcc/config/i386/cpuinfo.c:260:17: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
>
> 2017-01-07  Dominique d'Humieres  
>
> PR target/71017
> * config/i386/cpuid.h: Fix undefined behavior.

> Is it OK for trunk/branches?

OK.

Thanks,
Uros.
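The UBSan report comes from shifting 1 into the sign bit of a 32-bit int, which is undefined behavior in C (cpuinfo.c is C code). The ChangeLog only says that the patch fixes undefined behavior in cpuid.h; presumably the fix makes such masks unsigned. A sketch with a hypothetical mask name:

```cpp
#include <cstdint>

// Hypothetical feature mask in the style of <cpuid.h>.  Written as
// (1 << 31), the shift would overflow a 32-bit signed int, which is
// undefined in C and is what -fsanitize=undefined reports at run time.
// Shifting an unsigned 1 is well defined and yields 0x80000000.
#define bit_EXAMPLE_FEATURE (1u << 31)

// Test one CPUID output register against a feature mask.
constexpr bool has_feature(std::uint32_t reg, std::uint32_t mask) {
  return (reg & mask) != 0;
}
```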

[v3 PATCH] Reduce the size of variant, it doesn't need an index of type size_t internally.

2017-01-09 Thread Ville Voutilainen

Tested on Linux-x64.

2017-01-09  Ville Voutilainen  

Reduce the size of variant, it doesn't need an index of
type size_t internally.
* include/std/variant (parse_numbers.h): New include.
(__select_index): New.
(_Variant_storage::_M_reset_impl): Use
__index_type for comparison with variant_npos.
(_Variant_storage::__index_type): New.
(_Variant_storage::_M_index): Change the
type from size_t to __index_type.
(_Variant_storage::__index_type): New.
(_Variant_storage::_M_index): Change the
type from size_t to __index_type.
(_Variant_base::_M_valid): Use __index_type for comparison
with variant_npos.
(variant::__index_type): New.
(variant::index): Use __index_type for comparison with variant_npos.
* testsuite/20_util/variant/index_type.cc: New.
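The diff below makes two related changes: the stored index uses a narrower integer type, and every comparison with variant_npos first converts variant_npos to that type, because a narrow sentinel widened back to size_t never equals variant_npos. The sketch below illustrates both points with hypothetical names (`smallest_index_t`, `compact_index`); the patch itself selects the type with `__select_int`, not `std::conditional_t`.

```cpp
#include <climits>
#include <cstddef>
#include <type_traits>
#include <variant>

// Hypothetical: the smallest unsigned type that can hold indices 0..N-1
// plus an all-ones "valueless" sentinel distinct from every index.
template <std::size_t N>
using smallest_index_t = std::conditional_t<
    (N < UCHAR_MAX), unsigned char,
    std::conditional_t<(N < USHRT_MAX), unsigned short, unsigned int>>;

template <std::size_t N>
struct compact_index {
  using type = smallest_index_t<N>;
  // Converting variant_npos (all bits set) to the narrow type keeps the
  // all-ones pattern, which serves as the narrow sentinel.
  type raw = static_cast<type>(std::variant_npos);

  // Widen back to size_t, mapping the narrow sentinel to variant_npos.
  constexpr std::size_t get() const {
    return raw == static_cast<type>(std::variant_npos) ? std::variant_npos
                                                       : raw;
  }
};

// The pitfall: a narrow sentinel compared directly with variant_npos is
// widened to size_t first and never compares equal.
static_assert(static_cast<unsigned char>(std::variant_npos) != std::variant_npos);
```

This is the same reason the patched `_M_valid()` and `index()` compare against `__index_type(variant_npos)` rather than `variant_npos`.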
diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index 3d025a7..b016f32 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
@@ -314,6 +315,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 struct _Variant_storage;
 
+  template 
+  using __select_index =
+typename __select_int::_Select_int_base
+::type::value_type;
+
   template
 struct _Variant_storage
 {
@@ -332,7 +340,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
constexpr void _M_reset_impl(std::index_sequence<__indices...>)
{
- if (_M_index != variant_npos)
+ if (_M_index != __index_type(variant_npos))
_S_vtable<__indices...>[_M_index](*this);
}
 
@@ -346,7 +354,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { _M_reset(); }
 
   _Variadic_union<_Types...> _M_u;
-  size_t _M_index;
+  using __index_type = __select_index<_Types...>;
+  __index_type _M_index;
 };
 
   template
@@ -364,7 +373,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { _M_index = variant_npos; }
 
   _Variadic_union<_Types...> _M_u;
-  size_t _M_index;
+  using __index_type = __select_index<_Types...>;
+  __index_type _M_index;
 };
 
   // Helps SFINAE on special member functions. Otherwise it can live in variant
@@ -487,7 +497,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   constexpr bool
   _M_valid() const noexcept
-  { return this->_M_index != variant_npos; }
+  {
+   return this->_M_index !=
+ typename _Storage::__index_type(variant_npos);
+  }
 };
 
   // For how many times does _Tp appear in _Tuple?
@@ -944,6 +957,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  __detail::__variant::__index_of_v<_Tp, _Types...>;
 
 public:
+  using __index_type = typename _Base::_Storage::__index_type;
   constexpr variant()
   noexcept(is_nothrow_default_constructible_v<__to_type<0>>) = default;
   variant(const variant&) = default;
@@ -1086,7 +1100,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return !this->_M_valid(); }
 
   constexpr size_t index() const noexcept
-  { return this->_M_index; }
+  {
+   if (this->_M_index == __index_type(variant_npos))
+ return variant_npos;
+   return this->_M_index;
+  }
 
   void
   swap(variant& __rhs)
diff --git a/libstdc++-v3/testsuite/20_util/variant/index_type.cc b/libstdc++-v3/testsuite/20_util/variant/index_type.cc
new file mode 100644
index 000..ea6b4d6
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/variant/index_type.cc
@@ -0,0 +1,43 @@
+// { dg-options "-std=gnu++17" }
+// { dg-do compile }
+
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+#include 
+#include 
+#include 
+
+template  struct Bogus {};
+
+template  auto f(std::index_sequence)
+{
+  return std::variant{};
+}
+
+static_assert(sizeof(char) >= sizeof(int) ||
+ CHAR_BIT != 8 ||
+ std::is_same_v<
+ decltype(f(std::make_index_sequence<3>()))::__index_type,
+

Re: [patch,libgomp] Make libgomp Fortran modules multilib-aware

2017-01-09 Thread Mike Stump

On Jan 9, 2017, at 2:04 AM, FX  wrote:
> 
> Given lack of review of this Fortran-specific patch for libgomp, can a 
> Fortran maintainer approve it please?

Ok.

Re: [PATCH] PR79017 workaround incomplete C99 math on darwin

2017-01-09 Thread Mike Stump

On Jan 9, 2017, at 9:16 AM, Jonathan Wakely  wrote:
> 
> Older versions of OS X (at least Leopard) are missing some
> declarations of C99 functions from , which causes our
> configure test to decide that all C99 functions are missing from
> . Rather than splitting up the check into dozens of smaller
> checks for individual functions (which would be stage 1 material) this
> just adds a special case for the six missing functions, so that darwin
> checks for them separately and defines a new macro to say they're
> missing.
> 
>   PR libstdc++/79017
>   * acinclude.m4 (GLIBCXX_CHECK_C99_TR1): Check for llrint and llround
>   functions separately on darwin and if they're missing define
>   _GLIBCXX_NO_C99_ROUNDING_FUNCS.
>   * config.h.in: Regenerate.
>   * configure: Regenerate.
>   * include/c_global/cmath [_GLIBCXX_NO_C99_ROUNDING_FUNCS] (llrint)
>   (llrintf, llrintl, llround, llroundf, llroundl): Do not define.

Another approach might work if llrint is the same as lrint and lrint is
present: liberty could then define forwarders, on a per-function basis, for
any that aren't there.

I think this is the case when sizeof (long long) == sizeof(long).
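Mike's suggestion can be sketched as follows, assuming a target where long and long long have the same range (true on LP64 targets such as 64-bit Darwin). The `_fwd` names are hypothetical; the proposed patch instead leaves these functions undefined via _GLIBCXX_NO_C99_ROUNDING_FUNCS.

```cpp
#include <climits>
#include <cmath>

#if LONG_MAX == LLONG_MAX
// long and long long have the same range here, so the results of lrint and
// lround are exactly what llrint and llround would return.
inline long long llrint_fwd(double x) { return std::lrint(x); }
inline long long llround_fwd(double x) { return std::lround(x); }
#endif
```

In the default round-to-nearest mode, `llrint_fwd(2.5)` rounds half to even and gives 2, while `llround_fwd(2.5)` rounds half away from zero and gives 3.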

Re: [PATCH, rs6000] Add vec_nabs builtin support

2017-01-09 Thread Carl E. Love

Oops, I accidentally hit send while trying to insert the file.

On Mon, 2017-01-09 at 09:58 -0800, Carl E. Love wrote:
> GCC maintainers:
> 
> The following patch adds two more built-ins that are missing.
> Specifically:
> 
> vector signed char vec_nabs (vector signed char)
>   vector signed short vec_nabs (vector signed short)
>   vector signed int vec_nabs (vector signed int)
>   vector signed long long vec_nabs (vector signed long long)
>   vector float vec_nabs (vector float)
>   vector double vec_nabs (vector double)
>  
> 
> The patch has been bootstrapped and tested on
> powerpc64le-unknown-linux-gnu (Power 8 LE) and on 
> powerpc64-unknown-linux-gnu (Power 8 BE 64-bit, 32-bit) and on 
> powerpc64-unknown-linux-gnu (Power 7 64-bit, 32-bit) with no
> regressions.
> 
> Is this OK for trunk?
> 
> Carl Love
> 
> 
> ---

gcc/ChangeLog:

2017-01-09  Carl Love  

* config/rs6000/rs6000-c.c: Add support for built-in functions
vector signed char vec_nabs (vector signed char)
vector signed short vec_nabs (vector signed short)
vector signed int vec_nabs (vector signed int)
vector signed long long vec_nabs (vector signed long long)
vector float vec_nabs (vector float)
vector double vec_nabs (vector double)
* config/rs6000/rs6000-builtin.def: Add definitions for NABS functions
and NABS overload.
* config/rs6000/altivec.md: Add define to expand nabs<mode>2 types.
* config/rs6000/altivec.h: Add define for vec_nabs built-in function.
* doc/extend.texi: Update the built-in documentation file for the
new built-in functions.

gcc/testsuite/ChangeLog:

2017-01-09  Carl Love  

* gcc.target/powerpc/builtins-3.c: Add tests for the new built-ins
to the test suite file.
* gcc.target/powerpc/builtins-3-p8.c: Add tests for the new built-ins
to the test suite file.
---
 gcc/config/rs6000/altivec.h  |  1 +
 gcc/config/rs6000/altivec.md | 25 +
 gcc/config/rs6000/rs6000-builtin.def |  9 +
 gcc/config/rs6000/rs6000-c.c | 12 ++
 gcc/doc/extend.texi  |  8 
 gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c | 12 +-
 gcc/testsuite/gcc.target/powerpc/builtins-3.c| 47 +++-
 7 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 73567ff..17bc33e 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -189,6 +189,7 @@
 #define vec_vupklsh __builtin_vec_vupklsh
 #define vec_vupklsb __builtin_vec_vupklsb
 #define vec_abs __builtin_vec_abs
+#define vec_nabs __builtin_vec_nabs
 #define vec_abss __builtin_vec_abss
 #define vec_add __builtin_vec_add
 #define vec_adds __builtin_vec_adds
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index c2063d5..2c8d20b 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2741,6 +2741,31 @@
 })
 
 ;; Generate
+;;vspltisw SCRATCH1,0
+;;vsubu?m SCRATCH2,SCRATCH1,%1
+;;vmins? %0,%1,SCRATCH2"
+(define_expand "nabs<mode>2"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 4)
+        (minus:VI2 (match_dup 2)
+                   (match_operand:VI2 1 "register_operand" "v")))
+   (set (match_operand:VI2 0 "register_operand" "=v")
+        (smin:VI2 (match_dup 1) (match_dup 4)))]
+  ""
+{
+  int i, n_elt = GET_MODE_NUNITS (<MODE>mode);
+  rtvec v = rtvec_alloc (n_elt);
+
+  /* Create an all 0 constant.  */
+  for (i = 0; i < n_elt; ++i)
+    RTVEC_ELT (v, i) = const0_rtx;
+
+  operands[2] = gen_reg_rtx (<MODE>mode);
+  operands[3] = gen_rtx_CONST_VECTOR (<MODE>mode, v);
+  operands[4] = gen_reg_rtx (<MODE>mode);
+})
+
+;; Generate
 ;;vspltisw SCRATCH1,-1
 ;;vslw SCRATCH2,SCRATCH1,SCRATCH1
 ;;vandc %0,%1,SCRATCH2
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 2329c1f..1cdf9a8 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1129,6 +1129,14 @@ BU_ALTIVEC_A (ABSS_V4SI,  "abss_v4si",   SAT, altivec_abss_v4si)
 BU_ALTIVEC_A (ABSS_V8HI,  "abss_v8hi", SAT,altivec_abss_v8hi)
 BU_ALTIVEC_A (ABSS_V16QI, "abss_v16qi",SAT,altivec_abss_v16qi)
 
+/* Altivec NABS functions.  */
+BU_ALTIVEC_A (NABS_V2DI,  "nabs_v2di", CONST,  nabsv2di2)
+BU_ALTIVEC_A (NABS_V4SI,  "nabs_v4si", CONST,  nabsv4si2)
+BU_ALTIVEC_A (NABS_V8HI,  "nabs_v8hi", CONST,  nabsv8hi2)
+BU_ALTIVEC_A (NABS_V16QI, "nabs_v16qi",CONST,  nabsv16qi2)
+BU_ALTIVEC_A (NABS_V4SF,  "nabs_v4sf", CONST,  vsx_nabsv4sf2)
+BU_ALTIVEC_A (NABS_V2DF,  "nabs_v2df", CONST,  vsx_nabsv2df2)
+
 /* 1 argument Altivec builtin

[PATCH, rs6000] Add vec_nabs builtin support

2017-01-09 Thread Carl E. Love

GCC maintainers:

The following patch adds two more built-ins that are missing.
Specifically:

vector signed char vec_nabs (vector signed char)
vector signed short vec_nabs (vector signed short)
vector signed int vec_nabs (vector signed int)
vector signed long long vec_nabs (vector signed long long)
vector float vec_nabs (vector float)
vector double vec_nabs (vector double)
 

The patch has been bootstrapped and tested on
powerpc64le-unknown-linux-gnu (Power 8 LE) and on 
powerpc64-unknown-linux-gnu (Power 8 BE 64-bit, 32-bit) and on 
powerpc64-unknown-linux-gnu (Power 7 64-bit, 32-bit) with no
regressions.

Is this OK for trunk?

Carl Love


---
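As a rough illustration of what the new integer overloads compute, here is a scalar C model of one lane of the patch's nabs<mode>2 expander (splat zero, vsubu?m, vmins?), i.e. min (x, 0 - x) = -|x|. The function names are hypothetical and this is a sketch, not the generated code; the float overloads, which go through the separate vsx_nabsv4sf2 and vsx_nabsv2df2 patterns, are not modeled.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical model of one integer lane of the expander:
     SCRATCH1 = 0                    (vspltisw SCRATCH1,0)
     SCRATCH2 = SCRATCH1 - x         (vsubu?m: modulo subtraction)
     result   = min (x, SCRATCH2)    (vmins?: signed minimum)
   The unsigned subtraction reproduces the vector wrap-around; converting
   the result back to int relies on GCC's documented modulo behavior for
   out-of-range integer conversions.  */
int
nabs_lane (int x)
{
  int neg = (int) (0u - (unsigned int) x);
  return x < neg ? x : neg;
}

/* Apply the lane model to each of the four elements of a
   "vector signed int".  */
void
nabs_v4si_model (int out[4], const int in[4])
{
  for (size_t k = 0; k < 4; k++)
    out[k] = nabs_lane (in[k]);
}
```

The wrap-around case matters: for INT_MIN both x and 0 - x are INT_MIN, so the lane keeps INT_MIN, just as a modulo vector subtract followed by a signed minimum would.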

Re: [PATCH] Fix late dwarf generated early from optimized out globals

2017-01-09 Thread Jakub Jelinek

On Mon, Jan 09, 2017 at 06:25:05PM +0100, Andreas Tobler wrote:
> On 09.01.17 12:25, Jakub Jelinek wrote:
> > On Mon, Jan 09, 2017 at 11:53:38AM +0100, Richard Biener wrote:
> > > > Ok, attached the part I bootstrapped successfully on amd64-*-freebsd12 and
> > > > aarch64-*-freebsd12. From the amd64 run you'll find some test results at the
> > > > usual place. The aarch64 run takes some more time.
> > > > 
> > > > I hope I got it right this time :)
> > > > What do you think?
> > > 
> > > Looks good to me with the added comment to dwarf2out_late_global_decl
> > > exchanged to the one on trunk.
> > 
> > The formatting is completely wrong.
> > Lines indented e.g. by 7 spaces (or tab + 1/3 space(s)),
> > /* comment inside of { block starting in the same column as {
> > (should be 2 columns to the right), && ! not aligned below VAR_P,
> > or indenting by 3 columns instead of 2.
> 
> Hehe, yep. This time done with emacs ;)
> 
> Here the hopefully final patch with proper ChangeLog and formatting fixed.
> 
> Ok to apply?

Formatting LGTM, so I think Richard's approval applies now.

Jakub

Re: [PATCH] Fix late dwarf generated early from optimized out globals

2017-01-09 Thread Andreas Tobler


On 09.01.17 12:25, Jakub Jelinek wrote:

On Mon, Jan 09, 2017 at 11:53:38AM +0100, Richard Biener wrote:

Ok, attached the part I bootstrapped successfully on amd64-*-freebsd12 and
aarch64-*-freebsd12. From the amd64 run you'll find some test results at the
usual place. The aarch64 run takes some more time.

I hope I got it right this time :)
What do you think?


Looks good to me with the added comment to dwarf2out_late_global_decl
exchanged to the one on trunk.


The formatting is completely wrong.
Lines indented e.g. by 7 spaces (or tab + 1/3 space(s)),
/* comment inside of { block starting in the same column as {
(should be 2 columns to the right), && ! not aligned below VAR_P,
or indenting by 3 columns instead of 2.


Hehe, yep. This time done with emacs ;)

Here the hopefully final patch with proper ChangeLog and formatting fixed.

Ok to apply?

Thanks,
Andreas

2017-01-09  Andreas Tobler  

Backport from mainline
2016-09-19  Richard Biener  

* dwarf2out.c (dwarf2out_late_global_decl): When being during the
early debug phase do not add locations but only const value
attributes.

Backport from mainline
2016-10-20  Richard Biener  

* cgraphunit.c (analyze_functions): Set node->definition to
false to signal symbol removal to debug_hooks->late_global_decl.

Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 244100)
+++ gcc/dwarf2out.c (working copy)
@@ -23752,7 +23752,16 @@
 {
   dw_die_ref die = lookup_decl_die (decl);
   if (die)
-   add_location_or_const_value_attribute (die, decl, false);
+    {
+      /* We get called via the symtab code invoking late_global_decl
+         for symbols that are optimized out.  Do not add locations
+         for those.  */
+      varpool_node *node = varpool_node::get (decl);
+      if (! node || ! node->definition)
+        tree_add_const_value_attribute_for_decl (die, decl);
+      else
+        add_location_or_const_value_attribute (die, decl, false);
+    }
 }
 }
 
Index: gcc/cgraphunit.c
===
--- gcc/cgraphunit.c(revision 244100)
+++ gcc/cgraphunit.c(working copy)
@@ -1193,8 +1193,16 @@
 at looking at optimized away DECLs, since
 late_global_decl will subsequently be called from the
 contents of the now pruned symbol table.  */
- if (!decl_function_context (node->decl))
-   (*debug_hooks->late_global_decl) (node->decl);
+ if (VAR_P (node->decl)
+     && !decl_function_context (node->decl))
+   {
+     /* We are reclaiming totally unreachable code and variables
+        so they effectively appear as readonly.  Show that to
+        the debug machinery.  */
+     TREE_READONLY (node->decl) = 1;
+     node->definition = false;
+     (*debug_hooks->late_global_decl) (node->decl);
+   }
 
  node->remove ();
  continue;

Re: [PR tree-optimization/67955] Exploit PTA in DSE

2017-01-09 Thread Richard Biener

On January 9, 2017 6:02:16 PM GMT+01:00, Jeff Law  wrote:
>On 01/09/2017 02:36 AM, Richard Biener wrote:
>>>
>>>
>>> a = 1;
>>> 
>>> a = 2;
>>>
>>>
>>> If "a" escapes such that its value can be queried in the exception handler,
>>> then the exception handler would be able to observe the first store and thus
>>> it should not be removed.
>>
>> Yes, and it won't as long as the EH is thrown internally (and thus we have
>> a CFG reflecting it).  When it's only externally caught we lose, of course...
>>
>> We'd need an Ada testcase to actually show behavior that is not conforming
>> to an existing language specification though.
>I'm not versed enough in Ada to even attempt to pull together a testcase
>for this.
>
>>
>> I suspect we have a similar issue in C++ for sth like
>>
>> void __attribute__((const)) foo () { throw; }
>>
>> int x;
>> void bar ()
>> {
>>   x = 1;
>>   foo ();
>>   x = 2;
>> }
>>
>> where foo is const but not nothrow.
>I wouldn't be surprised if there's other problems with const functions 
>that can throw.
>
>>
>>> We also have to be cognizant of systems where there is memory mapped at
>>> location 0.  When that is true, we must check pt.null and honor it, even if
>>> it pessimizes code.
>>
>> With -fno-delete-null-pointer-checks (that's what such systems set) PTA computes
>> 0 as "nonlocal" and thus it won't be a singleton points-to solution.
>Ah, good.
>
>>
>>>
>>>
 For

 int foo (int *p, int b)
 {
   int *q;
   int i = 1;
   if (b)
     q = &i;
   else
     q = (void *)0;
   *q = 2;
   i = 3;
   return *q;
 }
>>>
>>> So on a system where *0 is a valid memory address, *q = 2 does not make
>>> anything dead, nor does i = 3 unless we were to isolate the THEN/ELSE
>>> blocks.
>>>
>>> On a system where *0 traps, there is no way to observe the value of "i" in
>>> the handler.  Thus i = 1 is a dead store.  I believe we must keep the *q = 2
>>> store because it can trigger a signal/exception which is itself an
>>> observable side effect?  Right?
>>
>> But writing to 0 invokes undefined behavior which we have no obligation to
>> preserve (unless we make it well-defined with -fnon-call-exceptions -fexceptions
>> as a GCC extension).
>It may invoke undefined behavior, but to date we have explicitly chosen
>to preserve the *0 =  to trigger a fault via the virtual
>memory system.  We kicked this around extensively in Nov 2013 with the
>introduction of isolation of erroneous paths.

Note we do not preserve traps with -ftrapv -fnon-call-exceptions either.  More
generally, we happily reorder global side effects with externally throwing
stmts.

>I think part of what pushed us that direction was that a program could 
>catch the signal related to the NULL dereference, then do something 
>sensible in the handler.  That also happens to match what the Go 
>language requires.
>
>
>
>>

 we remove all stores but the last store to i and the load from q (but we
 don't replace q with &i here, a missed optimization if removing the other
 stores is valid).
>>>
>>> But if we remove the *q = 2 store, we remove an observable side effect, the
>>> trap/exception itself if we reach that statement via the ELSE path.
>>
>> As said above - I don't think we have to care for C/C++ w/o
>> -fnon-call-exceptions.
>So in the immediate term I propose we conditionalize the pt.null check 
>on non-call exceptions.  Then I'll look more closely at the example
>above and see what we can reasonably do there.

Fix the pt-null bugs in PTA so we do not need the conservative fallback.  As 
said I have partial patches for this.

Richard.

>jeff
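To make the dead-store argument concrete, the following C sketch puts the thread's foo (with q = &i on the THEN path) next to a hypothetical foo_after_dse that shows, at source level, what remains when all stores but the last store to i and the load from q are removed. It is an illustration under that assumption, not GCC's actual output. Only the b != 0 path is well defined: with b == 0 both functions dereference a null pointer, which is exactly the case under discussion.

```c
#include <assert.h>
#include <stddef.h>

/* The example from the thread.  */
int
foo (int *p, int b)
{
  int *q;
  int i = 1;
  (void) p;            /* unused, as in the original */
  if (b)
    q = &i;
  else
    q = (void *) 0;
  *q = 2;
  i = 3;
  return *q;
}

/* Hypothetical source-level picture after DSE: the stores i = 1 and
   *q = 2 are gone, the last store to i and the load from q remain.
   Call only with b != 0.  */
int
foo_after_dse (int *p, int b)
{
  int i;
  int *q;
  (void) p;
  q = b ? &i : (int *) 0;
  i = 3;
  return *q;
}
```

On the THEN path both versions return 3, which is why the two earlier stores are dead there; the disagreement in the thread is only about the ELSE path.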

[PATCH] PR79017 workaround incomplete C99 math on darwin

2017-01-09 Thread Jonathan Wakely


Older versions of OS X (at least Leopard) are missing some
declarations of C99 functions from <math.h>, which causes our
configure test to decide that all C99 functions are missing from
<math.h>. Rather than splitting up the check into dozens of smaller
checks for individual functions (which would be stage 1 material) this
just adds a special case for the six missing functions, so that darwin
checks for them separately and defines a new macro to say they're
missing.

PR libstdc++/79017
* acinclude.m4 (GLIBCXX_CHECK_C99_TR1): Check for llrint and llround
functions separately on darwin and if they're missing define
_GLIBCXX_NO_C99_ROUNDING_FUNCS.
* config.h.in: Regenerate.
* configure: Regenerate.
* include/c_global/cmath [_GLIBCXX_NO_C99_ROUNDING_FUNCS] (llrint)
(llrintf, llrintl, llround, llroundf, llroundl): Do not define.

Tested x86_64-linux, committed to trunk.

commit 2802004b233a381e67afee0d9fc4b83712bc7b56
Author: Jonathan Wakely 
Date:   Mon Jan 9 15:22:03 2017 +

PR79017 workaround incomplete C99 math on darwin

PR libstdc++/79017
* acinclude.m4 (GLIBCXX_CHECK_C99_TR1): Check for llrint and llround
functions separately on darwin and if they're missing define
_GLIBCXX_NO_C99_ROUNDING_FUNCS.
* config.h.in: Regenerate.
* configure: Regenerate.
* include/c_global/cmath [_GLIBCXX_NO_C99_ROUNDING_FUNCS] (llrint)
(llrintf, llrintl, llround, llroundf, llroundl): Do not define.

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index eef107a..4e04cce 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -1890,12 +1890,14 @@ AC_DEFUN([GLIBCXX_CHECK_C99_TR1], [
  lgamma(0.0);
  lgammaf(0.0f);
  lgammal(0.0l);
+ #ifndef __APPLE__ /* see below */
  llrint(0.0);
  llrintf(0.0f);
  llrintl(0.0l);
  llround(0.0);
  llroundf(0.0f);
  llroundl(0.0l);
+ #endif
  log1p(0.0);
  log1pf(0.0f);
  log1pl(0.0l);
@@ -1954,6 +1956,29 @@ AC_DEFUN([GLIBCXX_CHECK_C99_TR1], [
 AC_DEFINE(_GLIBCXX_USE_C99_MATH_TR1, 1,
  [Define if C99 functions or macros in  should be imported
  in  in namespace std::tr1.])
+
+case "${target_os}" in
+  darwin*)
+AC_MSG_CHECKING([for ISO C99 rounding functions in <math.h>])
+AC_CACHE_VAL(glibcxx_cv_c99_math_llround, [
+  AC_TRY_COMPILE([#include <math.h>],
+[llrint(0.0);
+ llrintf(0.0f);
+ llrintl(0.0l);
+ llround(0.0);
+ llroundf(0.0f);
+ llroundl(0.0l);
+],
+[glibcxx_cv_c99_math_llround=yes],
+[glibcxx_cv_c99_math_llround=no])
+  ])
+   AC_MSG_RESULT($glibcxx_cv_c99_math_llround)
+;;
+esac
+if test x"$glibcxx_cv_c99_math_llround" = x"no"; then
+  AC_DEFINE(_GLIBCXX_NO_C99_ROUNDING_FUNCS, 1,
+   [Define if C99 llrint and llround functions are missing from <math.h>.])
+fi
   fi
 
   # Check for the existence of  functions (NB: doesn't make
diff --git a/libstdc++-v3/include/c_global/cmath b/libstdc++-v3/include/c_global/cmath
index 3630a5b..6e7508f 100644
--- a/libstdc++-v3/include/c_global/cmath
+++ b/libstdc++-v3/include/c_global/cmath
@@ -1012,12 +1012,14 @@ _GLIBCXX_END_NAMESPACE_VERSION
 #undef lgamma
 #undef lgammaf
 #undef lgammal
+#ifndef _GLIBCXX_NO_C99_ROUNDING_FUNCS
 #undef llrint
 #undef llrintf
 #undef llrintl
 #undef llround
 #undef llroundf
 #undef llroundl
+#endif
 #undef log1p
 #undef log1pf
 #undef log1pl
@@ -1143,6 +1145,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using ::lgammaf;
   using ::lgammal;
 
+#ifndef _GLIBCXX_NO_C99_ROUNDING_FUNCS
   using ::llrint;
   using ::llrintf;
   using ::llrintl;
@@ -1150,6 +1153,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using ::llround;
   using ::llroundf;
   using ::llroundl;
+#endif
 
   using ::log1p;
   using ::log1pf;

Re: [PR tree-optimization/67955] Exploit PTA in DSE

2017-01-09 Thread Jeff Law


On 01/09/2017 02:36 AM, Richard Biener wrote:



a = 1;

a = 2;


If "a" escapes such that its value can be queried in the exception handler,
then the exception handler would be able to observe the first store and thus
it should not be removed.


Yes, and it won't as long as the EH is thrown internally (and thus we have
a CFG reflecting it).  When it's only externally caught we lose, of course...

We'd need an Ada testcase to actually show behavior that is not conforming
to an existing language specification though.
I'm not versed enough in Ada to even attempt to pull together a testcase 
for this.




I suspect we have a similar issue in C++ for sth like

void __attribute__((const)) foo () { throw; }

int x;
void bar ()
{
  x = 1;
  foo ();
  x = 2;
}

where foo is const but not nothrow.
I wouldn't be surprised if there's other problems with const functions 
that can throw.





We also have to be cognizant of systems where there is memory mapped at
location 0.  When that is true, we must check pt.null and honor it, even if
it pessimizes code.


With -fno-delete-null-pointer-checks (that's what such systems set) PTA computes
0 as "nonlocal" and thus it won't be a singleton points-to solution.

Ah, good.







For

int foo (int *p, int b)
{
  int *q;
  int i = 1;
  if (b)
    q = &i;
  else
    q = (void *)0;
  *q = 2;
  i = 3;
  return *q;
}


So on a system where *0 is a valid memory address, *q = 2 does not make
anything dead, nor does i = 3 unless we were to isolate the THEN/ELSE
blocks.

On a system where *0 traps, there is no way to observe the value of "i" in
the handler.  Thus i = 1 is a dead store.  I believe we must keep the *q = 2
store because it can trigger a signal/exception which is itself an
observable side effect?  Right?


But writing to 0 invokes undefined behavior which we have no obligation to
preserve (unless we make it well-defined with -fnon-call-exceptions -fexceptions
as a GCC extension).
It may invoke undefined behavior, but to date we have explicitly chosen 
to preserve the *0 =  to trigger a fault via the virtual 
memory system.  We kicked this around extensively in Nov 2013 with the 
introduction of isolation of erroneous paths.


I think part of what pushed us that direction was that a program could 
catch the signal related to the NULL dereference, then do something 
sensible in the handler.  That also happens to match what the Go 
language requires.








we remove all stores but the last store to i and the load from q (but we
don't replace q with &i here, a missed optimization if removing the other
stores is valid).


But if we remove the *q = 2 store, we remove an observable side effect, the
trap/exception itself if we reach that statement via the ELSE path.


As said above - I don't think we have to care for C/C++ w/o
-fnon-call-exceptions.
So in the immediate term I propose we conditionalize the pt.null check 
on non-call exceptions.  Then I'll look more closely at the example
above and see what we can reasonably do there.


jeff

Re: [PATCH] improve string find algorithm

2017-01-09 Thread Aditya K


Thanks,
-Aditya




From: Jonathan Wakely 
Sent: Monday, January 9, 2017 6:33 AM
To: Aditya K
Cc: Aditya Kumar; Sebastian Pop; libstd...@gcc.gnu.org; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] improve string find algorithm
    
On 06/01/17 22:19 +, Aditya K wrote:
>> Could you try the corrected patch on your benchmarks?
>
>For the test-case you gave there is a regression.
>
>Benchmark                                 Time      CPU   Iterations
>---------------------------------------------------------------------
>Without the patch: BM_StringRegression    81 ns    81 ns      8503740
>With the patch:    BM_StringRegression   109 ns   109 ns      6346500
>
>
>The real advantage is when there are fewer matches as seen in
>BM_StringFindNoMatch. The code for the benchmark can be found in
>https://github.com/llvm-mirror/libcxx/blob/master/benchmarks/string.bench.cpp

>However, I have written an independent std-benchmark that can be used just by 
>exporting the CC, CXX, LD_LIBRARY_FLAGS: 
>https://github.com/hiraditya/std-benchmark

I think the large improvements are worth the smaller regression, so
I'm committing the patch I sent last week.
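The large gains on BM_StringFindNoMatch are consistent with a search that first jumps to candidate positions with memchr and only then compares the rest of the needle. The C sketch below shows that general strategy; it is an assumption about the approach, not the committed libstdc++ code, and find_substr and NPOS are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NPOS ((size_t) -1)

/* Return the offset of the first occurrence of NEEDLE (M bytes) in
   HAY (N bytes), or NPOS if there is none.  */
size_t
find_substr (const char *hay, size_t n, const char *needle, size_t m)
{
  if (m == 0)
    return 0;                           /* empty needle matches at 0 */
  if (m > n)
    return NPOS;

  const char *p = hay;
  const char *last = hay + (n - m);     /* last offset a match can start */
  while (p <= last)
    {
      /* Skip directly to the next byte equal to the needle's first byte.  */
      p = memchr (p, needle[0], (size_t) (last - p) + 1);
      if (p == NULL)
        return NPOS;
      /* Only now compare the remaining M - 1 bytes.  */
      if (memcmp (p + 1, needle + 1, m - 1) == 0)
        return (size_t) (p - hay);
      ++p;
    }
  return NPOS;
}
```

When the first character is rare, memchr does almost all of the work, which fits the no-match numbers; when it occurs at nearly every position, the extra per-candidate call overhead could explain a small regression like BM_StringRegression.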



>Following are the results:
>--
>Without the patch:
>
>
>Run on (8 X 3403.85 MHz CPU s)
>2017-01-06 15:47:30
>***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
>Benchmark Time   CPU Iterations
>---
>BM_StringFindNoMatch/10   6 ns  6 ns  114499418
>BM_StringFindNoMatch/64  34 ns 34 ns   20578576
>BM_StringFindNoMatch/512    222 ns    222 ns    3136787
>BM_StringFindNoMatch/4096  1728 ns   1729 ns 401913
>BM_StringFindNoMatch/32768    13679 ns  13684 ns  50680
>BM_StringFindNoMatch/131072   54570 ns  54591 ns  12506
>BM_StringFindAllMatch/1   4 ns  4 ns  180640260
>BM_StringFindAllMatch/8   6 ns  6 ns  119682220
>BM_StringFindAllMatch/64  7 ns  7 ns   97679753
>BM_StringFindAllMatch/512    19 ns 19 ns   36035174
>BM_StringFindAllMatch/4096   92 ns 92 ns    7516363
>BM_StringFindAllMatch/32768 849 ns    849 ns 809284
>BM_StringFindAllMatch/131072   3610 ns   3611 ns 193894
>BM_StringFindMatch1/1 27273 ns  27283 ns  25579
>BM_StringFindMatch1/8 27289 ns  27300 ns  25516
>BM_StringFindMatch1/64    27297 ns  27307 ns  25561
>BM_StringFindMatch1/512   27303 ns  27314 ns  25579
>BM_StringFindMatch1/4096  27488 ns  27499 ns  25366
>BM_StringFindMatch1/32768 28157 ns  28168 ns  24750
>BM_StringFindMatch2/1 27273 ns  27284 ns  25562
>BM_StringFindMatch2/8 27296 ns  27306 ns  2
>BM_StringFindMatch2/64    27312 ns  27323 ns  25549
>BM_StringFindMatch2/512   27327 ns  27338 ns  25558
>BM_StringFindMatch2/4096  27513 ns  27524 ns  25349
>BM_StringFindMatch2/32768 28161 ns  28172 ns  24788
>BM_StringRegression  81 ns 81 ns    8503740
>
>
>
>With the patch
>
>
>Run on (8 X 1071.8 MHz CPU s)
>2017-01-06 16:06:29
>***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
>Benchmark Time   CPU Iterations
>---
>BM_StringFindNoMatch/10   6 ns  6 ns  121302159
>BM_StringFindNoMatch/64   7 ns  7 ns  102003502
>BM_StringFindNoMatch/512 15 ns 15 ns   44820639
>BM_StringFindNoMatch/4096    77 ns 77 ns    9016958
>BM_StringFindNoMatch/32768  555 ns    555 ns    1227219
>BM_StringFindNoMatch/131072    2688 ns   2689 ns 259488
>BM_StringFindAllMatch/1   8 ns  8 ns   85893410
>BM_StringFindAllMatch/8   9 ns  9 ns   80811804
>BM_StringFindAllMatch/64  9 ns  9 ns   74237599
>BM_StringFindAllMatch/512    23 ns 23 ns   31163379
>BM_StringFindAllMatch/4096   94 ns 94 ns    7317385
>BM_StringFindAllMatch/32768 847 ns    848 ns 803901
>BM_StringFindAllMatch/131072   3551

Re: [doc, committed] clean up include search path documentation in cpp.texi

2017-01-09 Thread Joseph Myers

On Sat, 7 Jan 2017, Sandra Loosemore wrote:

> I've checked in this patch to modernize the tutorial information about the
> preprocessor search path in cpp.texi -- in particular, removing the discussion
> of the deprecated -I- option, better integrating information about the
> preferred replacement -iquote and -isystem options into the flow, and removing
> some other redundant or obsolete bits.

Note the argument (see bug 19541) that -I- should be not deprecated 
because -iquote isn't actually a full replacement.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: C++ PATCH for c++/78948 (instantiation from discarded statement)

2017-01-09 Thread Jakub Jelinek

On Mon, Jan 09, 2017 at 09:10:06AM -0500, Nathan Sidwell wrote:
> On 01/09/2017 09:03 AM, Jakub Jelinek wrote:
> 
> > FAIL: g++.dg/cpp1z/constexpr-if10.C   (test for excess errors)
> > 
> > Could we do e.g.
> > sed -i -e 's/long long/int */g' testsuite/g++.dg/cpp1z/constexpr-if10.C
> > so that it is something where if constexpr will be always true?
> 
> that would seem fine to me

Here is what I've committed to trunk then
(the testcase still fails with older cc1plus and
make check-c++-all RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} dg.exp=constexpr-if*.C'
and passes with current cc1plus).

2017-01-09  Jakub Jelinek  

PR c++/78948
* g++.dg/cpp1z/constexpr-if10.C: Fix PR number in comment.
(main): Use int* instead of long long.

--- gcc/testsuite/g++.dg/cpp1z/constexpr-if10.C.jj  2017-01-08 17:41:17.0 +0100
+++ gcc/testsuite/g++.dg/cpp1z/constexpr-if10.C 2017-01-09 17:15:54.006314980 +0100
@@ -1,4 +1,4 @@
-// PR c++/79848
+// PR c++/78948
 // { dg-options -std=c++1z }
 
 template 
@@ -9,8 +9,8 @@ void sizeof_mismatch()
 
 int main()
 {
-  if constexpr(sizeof(long long) == sizeof(char*))
+  if constexpr(sizeof(int*) == sizeof(char*))
 ;
   else
-sizeof_mismatch();
+sizeof_mismatch();
 }


Jakub

Re: [PATCH] Fix exgettext to handle multi-line help texts from *.opt files (PR translation/78745)

2017-01-09 Thread Thomas Schwinge

Hi!

On Thu, 29 Dec 2016 16:15:01 +0100, Jakub Jelinek  wrote:
>   PR translation/78745
>   * exgettext: Handle multi-line help texts in *.opt files.

With this committed in r243981, I noticed the following new snippet in
gcc/po/gcc.pot:

+#: config/nvptx/nvptx.c:1132
+msgid "tid.y;"
+msgstr ""

gcc/config/nvptx/nvptx.c:

   1126 #define ENTRY_TEMPLATE(PS, PS_BYTES, MAD_PS_32) "\
   1127  (.param.u" PS " %arg, .param.u" PS " %stack, .param.u" PS " %sz)\n\
   1128 {\n\
   1129 .reg.u32 %r<3>;\n\
   1130 .reg.u" PS " %R<4>;\n\
   1131 mov.u32 %r0, %tid.y;\n\
   1132 mov.u32 %r1, %ntid.y;\n\
   1133 mov.u32 %r2, %ctaid.x;\n\
   [...]

As I understand it, this is because of the special handling to collect
"all %e and %n strings from driver specs, so those can be translated too"
(function spec_error_string): the "%ntid.y;" on line 1132 is picked up as
a %n string with the text "tid.y;".  Probably harmless enough to just
ignore it?


Grüße
 Thomas


> --- gcc/po/exgettext.jj   2016-01-04 14:55:54.0 +0100
> +++ gcc/po/exgettext  2016-12-28 19:18:08.142715830 +0100
> @@ -237,6 +237,8 @@ echo "scanning option files..." >&2
>  field = 0
>  while (getline < file) {
>   if (/^[ \t]*(;|$)/ || !/^[^ \t]/) {
> + if (field > 2)
> + printf("_(\"%s\")\n", line)
>   field = 0
>   } else {
>   if ((field == 1) && /MissingArgError/) {
> @@ -275,12 +277,15 @@ echo "scanning option files..." >&2
>   if (field == 2) {
>   line = $0
>   printf("#line %d \"%s\"\n", lineno, file)
> - printf("_(\"%s\")\n", line)
> + } else if (field > 2) {
> + line = line " " $0
>   }
>   field++;
>   }
>   lineno++;
>  }
> +if (field > 2)
> + printf("_(\"%s\")\n", line)
>}') >> $emsg
>  
>  # Run the xgettext commands, with temporary added as a file to scan.
> 
>   Jakub
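The core of the exgettext change is that every help-text line of a .opt record (field > 2) is now appended with `line = line " " $0` and printed once at the end of the record, instead of printing only the first help line. A rough C model of that accumulation, with join_help_text as a hypothetical helper and the record assumed to be already split into lines (name, properties, then help text):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Join LINES[2] .. LINES[N-1] (the help text of one .opt record) with
   single spaces into OUT, which has room for OUTSZ bytes.  Returns 1 if
   the record has help text, 0 otherwise.  Output is truncated, never
   overflowed, if OUT is too small.  */
int
join_help_text (const char *const *lines, size_t n, char *out, size_t outsz)
{
  if (outsz == 0)
    return 0;
  out[0] = '\0';
  if (n < 3)
    return 0;                       /* no help text in this record */
  for (size_t i = 2; i < n; i++)
    {
      size_t used = strlen (out);
      /* Continuation lines are separated by one space, like the awk
         concatenation line " " $0.  */
      if (i > 2 && used + 1 < outsz)
        {
          out[used++] = ' ';
          out[used] = '\0';
        }
      strncat (out, lines[i], outsz - used - 1);
    }
  return 1;
}
```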

Re: [PATCH] Do not error when -E provided (PR pch/78970).

2017-01-09 Thread Jakub Jelinek

On Mon, Jan 09, 2017 at 04:09:57PM +0100, Martin Liška wrote:
> Hello.
> 
> As reported here https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78970#c7,
> we should not report an error when one wants to pre-compile a {c,c++}-header file.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Martin

> >From bf226badedf38c81d09e34bde6ce0ff694e5b4fd Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Mon, 9 Jan 2017 14:20:41 +0100
> Subject: [PATCH] Do not error when -E provided (PR pch/78970).
> 
> gcc/ChangeLog:
> 
> 2017-01-09  Martin Liska  
> 
>   PR pch/78970
>   * gcc.c (driver_handle_option): Handle OPT_E and set
>   have_E.
>   (lookup_compiler): Do not show error message with have_E.
> ---
>  gcc/gcc.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/gcc.c b/gcc/gcc.c
> index 1d2ed99ef5f..b9cdc5d81a8 100644
> --- a/gcc/gcc.c
> +++ b/gcc/gcc.c
> @@ -1931,6 +1931,9 @@ static int have_c = 0;
>  /* Was the option -o passed.  */
>  static int have_o = 0;
>  
> +/* Were the option -E passed.  */

Was ?

Ok with that change.

Jakub

[PATCH 2/2] IPA ICF: make algorithm stable to survive -fcompare-debug

2017-01-09 Thread Martin Liška

The second part of the patch sorts the final congruence classes, their groups
and the items included in the groups according to DECL_UID.

Both patches can bootstrap together on ppc64le-redhat-linux and survive 
regression tests.

Ready to be installed?
Martin
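Sorting by DECL_UID gives the merge step an order that does not depend on how the items happened to be collected, which is what makes the result stable. A minimal C sketch of the same comparator pattern with qsort, using a hypothetical struct item in place of sem_item:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a sem_item: only the UID is used as key.  */
struct item
{
  int uid;
  const char *name;
};

/* qsort comparator over an array of item pointers, with the same
   -1/0/1 shape as sort_sem_items_by_decl_uid.  Comparing instead of
   returning uid1 - uid2 avoids signed overflow.  */
int
cmp_items_by_uid (const void *a, const void *b)
{
  const struct item *i1 = *(const struct item *const *) a;
  const struct item *i2 = *(const struct item *const *) b;

  if (i1->uid < i2->uid)
    return -1;
  else if (i1->uid > i2->uid)
    return 1;
  else
    return 0;
}

/* Put N item pointers into ascending UID order, independent of the
   order in which they were collected.  */
void
sort_items_by_uid (struct item **v, size_t n)
{
  if (n < 2)
    return;
  qsort (v, n, sizeof *v, cmp_items_by_uid);
}
```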
>From c3baaad9da1fdaa95ff5b8a69fc7925ede13d8c9 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 6 Jan 2017 13:30:08 +0100
Subject: [PATCH 2/2] IPA ICF: make algorithm stable to survive -fcompare-debug

gcc/testsuite/ChangeLog:

2017-01-06  Martin Liska  

	* gcc.dg/ipa/ipa-icf-1.c: Change scanned pattern.
	* gcc.dg/ipa/ipa-icf-10.c: Likewise.
	* gcc.dg/ipa/ipa-icf-11.c: Likewise.
	* gcc.dg/ipa/ipa-icf-12.c: Likewise.
	* gcc.dg/ipa/ipa-icf-13.c: Likewise.
	* gcc.dg/ipa/ipa-icf-16.c: Likewise.
	* gcc.dg/ipa/ipa-icf-18.c: Likewise.
	* gcc.dg/ipa/ipa-icf-2.c: Likewise.
	* gcc.dg/ipa/ipa-icf-20.c: Likewise.
	* gcc.dg/ipa/ipa-icf-21.c: Likewise.
	* gcc.dg/ipa/ipa-icf-23.c: Likewise.
	* gcc.dg/ipa/ipa-icf-25.c: Likewise.
	* gcc.dg/ipa/ipa-icf-26.c: Likewise.
	* gcc.dg/ipa/ipa-icf-27.c: Likewise.
	* gcc.dg/ipa/ipa-icf-3.c: Likewise.
	* gcc.dg/ipa/ipa-icf-35.c: Likewise.
	* gcc.dg/ipa/ipa-icf-36.c: Likewise.
	* gcc.dg/ipa/ipa-icf-37.c: Likewise.
	* gcc.dg/ipa/ipa-icf-5.c: Likewise.
	* gcc.dg/ipa/ipa-icf-7.c: Likewise.
	* gcc.dg/ipa/ipa-icf-8.c: Likewise.
	* gcc.dg/ipa/pr64307.c: Likewise.
	* gcc.dg/ipa/pr77653.c: Likewise.

gcc/ChangeLog:

2017-01-06  Martin Liska  

	* ipa-icf.c (sort_sem_items_by_decl_uid): New function.
	(sort_congruence_classes_by_decl_uid): Likewise.
	(sort_congruence_class_groups_by_decl_uid): Likewise.
	(sem_item_optimizer::merge_classes): Sort class, groups in these
	classes and members in the groups by DECL_UID of declarations.
	This would make merge operations stable.
---
 gcc/ipa-icf.c | 92 +--
 gcc/testsuite/gcc.dg/ipa/ipa-icf-1.c  |  2 +-
 gcc/testsuite/gcc.dg/ipa/ipa-icf-10.c |  2 +-
 gcc/testsuite/gcc.dg/ipa/ipa-icf-11.c |  2 +-
 gcc/testsuite/gcc.dg/ipa/ipa-icf-12.c |  2 +-
 gcc/testsuite/gcc.dg/ipa/ipa-icf-13.c |  6 +--
 gcc/testsuite/gcc.dg/ipa/ipa-icf-16.c |  2 +-
 gcc/testsuite/gcc.dg/ipa/ipa-icf-18.c |  2 +-
 gcc/testsuite/gcc.dg/ipa/ipa-icf-2.c  |  2 +-
 gcc/testsuite/gcc.dg/ipa/ipa-icf-20.c |  2 +-
 gcc/testsuite/gcc.dg/ipa/ipa-icf-21.c |  2 +-
 gcc/testsuite/gcc.dg/ipa/ipa-icf-23.c |  2 +-
 gcc/testsuite/gcc.dg/ipa/ipa-icf-25.c |  4 +-
 gcc/testsuite/gcc.dg/ipa/ipa-icf-26.c |  2 +-
 gcc/testsuite/gcc.dg/ipa/ipa-icf-27.c |  2 +-
 gcc/testsuite/gcc.dg/ipa/ipa-icf-3.c  |  2 +-
 gcc/testsuite/gcc.dg/ipa/ipa-icf-35.c |  6 +--
 gcc/testsuite/gcc.dg/ipa/ipa-icf-36.c | 10 ++--
 gcc/testsuite/gcc.dg/ipa/ipa-icf-37.c | 10 ++--
 gcc/testsuite/gcc.dg/ipa/ipa-icf-5.c  |  2 +-
 gcc/testsuite/gcc.dg/ipa/ipa-icf-7.c  |  2 +-
 gcc/testsuite/gcc.dg/ipa/ipa-icf-8.c  |  2 +-
 gcc/testsuite/gcc.dg/ipa/pr64307.c|  2 +-
 gcc/testsuite/gcc.dg/ipa/pr77653.c|  2 +-
 24 files changed, 124 insertions(+), 40 deletions(-)

diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index 28de251c421..4c835c39e3d 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -3380,6 +3380,66 @@ sem_item_optimizer::dump_cong_classes (void)
   free (histogram);
 }
 
+/* Sort pair of sem_items A and B by DECL_UID.  */
+
+static int
+sort_sem_items_by_decl_uid (const void *a, const void *b)
+{
+  const sem_item *i1 = *(const sem_item * const *)a;
+  const sem_item *i2 = *(const sem_item * const *)b;
+
+  int uid1 = DECL_UID (i1->decl);
+  int uid2 = DECL_UID (i2->decl);
+
+  if (uid1 < uid2)
+return -1;
+  else if (uid1 > uid2)
+return 1;
+  else
+return 0;
+}
+
+/* Sort pair of congruence_classes A and B by DECL_UID of the first member.  */
+
+static int
+sort_congruence_classes_by_decl_uid (const void *a, const void *b)
+{
+  const congruence_class *c1 = *(const congruence_class * const *)a;
+  const congruence_class *c2 = *(const congruence_class * const *)b;
+
+  int uid1 = DECL_UID (c1->members[0]->decl);
+  int uid2 = DECL_UID (c2->members[0]->decl);
+
+  if (uid1 < uid2)
+return -1;
+  else if (uid1 > uid2)
+return 1;
+  else
+return 0;
+}
+
+/* Sort pair of congruence_class_groups A and B by
+   DECL_UID of the first member of the first class in the group.  */
+
+static int
+sort_congruence_class_groups_by_decl_uid (const void *a, const void *b)
+{
+  const congruence_class_group *g1
+    = *(const congruence_class_group * const *)a;
+  const congruence_class_group *g2
+    = *(const congruence_class_group * const *)b;
+
+  int uid1 = DECL_UID (g1->classes[0]->members[0]->decl);
+  int uid2 = DECL_UID (g2->classes[0]->members[0]->decl);
+
+  if (uid1 < uid2)
+    return -1;
+  else if (uid1 > uid2)
+    return 1;
+  else
+    return 0;
+}
+
 /* After reduction is done, we can declare all items in a group
to be equal. PREV_CLASS_COUNT is start number of classes
before reduction. True is returned if there's a merge operation
@@ -3397,6 +3457,22 @@

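The three comparators added above follow the usual qsort pattern for an array of pointers: each argument points at an array element, so it is dereferenced once to reach the item itself. A minimal standalone sketch of the same pattern, assuming a hypothetical toy_item type in place of sem_item and a plain uid field in place of DECL_UID:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a sem_item; only the UID matters here.  */
struct toy_item
{
  int uid;
};

/* qsort comparator over an array of POINTERS to toy_item: A and B each
   point at an array element, i.e. at a 'struct toy_item *'.  */
static int
cmp_items_by_uid (const void *a, const void *b)
{
  const struct toy_item *i1 = *(const struct toy_item *const *) a;
  const struct toy_item *i2 = *(const struct toy_item *const *) b;

  /* Three-way result (-1, 0 or 1) without subtraction.  */
  return (i1->uid > i2->uid) - (i1->uid < i2->uid);
}

/* Sort N item pointers in ITEMS by ascending UID.  */
static void
sort_items_by_uid (struct toy_item **items, size_t n)
{
  assert (items != NULL || n == 0);
  qsort (items, n, sizeof *items, cmp_items_by_uid);
}
```

The `(x > y) - (x < y)` form is equivalent to the patch's if/else chain; both avoid the overflow risk that returning `uid1 - uid2` would carry for arbitrary int values.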
[PATCH] Do not error when -E provided (PR pch/78970).

2017-01-09 Thread Martin Liška

Hello.

As reported here https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78970#c7,
we should not report an error when one wants to pre-compile a {c,c++}-header file.

The patch bootstraps on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Martin
From bf226badedf38c81d09e34bde6ce0ff694e5b4fd Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 9 Jan 2017 14:20:41 +0100
Subject: [PATCH] Do not error when -E provided (PR pch/78970).

gcc/ChangeLog:

2017-01-09  Martin Liska  

	PR pch/78970
	* gcc.c (driver_handle_option): Handle OPT_E and set
	have_E.
	(lookup_compiler): Do not show error message with have_E.
---
 gcc/gcc.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/gcc.c b/gcc/gcc.c
index 1d2ed99ef5f..b9cdc5d81a8 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -1931,6 +1931,9 @@ static int have_c = 0;
 /* Was the option -o passed.  */
 static int have_o = 0;
 
+/* Was the option -E passed.  */
+static int have_E = 0;
+
 /* Pointer to output file name passed in with -o. */
 static const char *output_file = 0;
 
@@ -4067,6 +4070,10 @@ driver_handle_option (struct gcc_options *opts,
   validated = true;
   break;
 
+case OPT_E:
+  have_E = true;
+  break;
+
 case OPT_x:
   spec_lang = arg;
   if (!strcmp (spec_lang, "none"))
@@ -8328,7 +8335,8 @@ lookup_compiler (const char *name, size_t length, const char *language)
 	  {
 	if (name != NULL && strcmp (name, "-") == 0
 		&& (strcmp (cp->suffix, "@c-header") == 0
-		|| strcmp (cp->suffix, "@c++-header") == 0))
+		|| strcmp (cp->suffix, "@c++-header") == 0)
+		&& !have_E)
 	  fatal_error (input_location,
 			   "cannot use %<-%> as input filename for a "
 			   "precompiled header");
-- 
2.11.0

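The effect of the lookup_compiler change can be restated as a standalone predicate. The helper below is hypothetical (the driver tests this inline), but it mirrors the patched condition: `-` as the input name with a @c-header or @c++-header spec is only fatal when -E was not passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper restating the patched test in lookup_compiler:
   return true if reading a header from stdin ("-") should trigger the
   "cannot use '-' as input filename for a precompiled header" error.  */
static bool
rejects_stdin_pch (const char *name, const char *suffix, bool have_E)
{
  assert (suffix != NULL);
  return name != NULL && strcmp (name, "-") == 0
         && (strcmp (suffix, "@c-header") == 0
             || strcmp (suffix, "@c++-header") == 0)
         && !have_E;
}
```

Under this reading, the PR's case (`-E` with a header read from `-`) should no longer reach fatal_error, while the same input without -E still does.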
[PATCH 1/2] Revert m_classes_vec introduction.

2017-01-09 Thread Martin Liška

The first patch essentially removes what Jakub installed in r242910,
except for the formatting changes.

Martin
From 32f4ccb48dfd84e4f64fb38f5122f5dc61482f3b Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 6 Jan 2017 11:36:13 +0100
Subject: [PATCH 1/2] Revert m_classes_vec introduction.

gcc/ChangeLog:

2017-01-06  Martin Liska  

	* ipa-icf.c (sem_item_optimizer::sem_item_optimizer): Remove
	usage of m_classes_vec.
	(sem_item_optimizer::~sem_item_optimizer): Likewise.
	(sem_item_optimizer::get_group_by_hash): Likewise.
	(sem_item_optimizer::subdivide_classes_by_equality): Likewise.
	(sem_item_optimizer::subdivide_classes_by_sensitive_refs): Likewise.
	(sem_item_optimizer::verify_classes): Likewise.
	(sem_item_optimizer::process_cong_reduction): Likewise.
	(sem_item_optimizer::dump_cong_classes): Likewise.
	(sem_item_optimizer::merge_classes): Likewise.
	* ipa-icf.h (congruence_class_hash): Rename from
	congruence_class_group_hash.  Remove declaration of
	m_classes_vec.
---
 gcc/ipa-icf.c | 106 +++---
 gcc/ipa-icf.h |   7 ++--
 2 files changed, 51 insertions(+), 62 deletions(-)

diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index d3f2ca14eac..28de251c421 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -2288,7 +2288,6 @@ sem_item_optimizer::sem_item_optimizer ()
   m_varpool_node_hooks (NULL)
 {
   m_items.create (0);
-  m_classes_vec.create (0);
   bitmap_obstack_initialize (_bmstack);
 }
 
@@ -2297,19 +2296,18 @@ sem_item_optimizer::~sem_item_optimizer ()
   for (unsigned int i = 0; i < m_items.length (); i++)
 delete m_items[i];
 
-  unsigned int l;
-  congruence_class_group *it;
-  FOR_EACH_VEC_ELT (m_classes_vec, l, it)
+
+  for (hash_table::iterator it = m_classes.begin ();
+   it != m_classes.end (); ++it)
 {
-  for (unsigned int i = 0; i < it->classes.length (); i++)
-	delete it->classes[i];
+  for (unsigned int i = 0; i < (*it)->classes.length (); i++)
+	delete (*it)->classes[i];
 
-  it->classes.release ();
-  free (it);
+  (*it)->classes.release ();
+  free (*it);
 }
 
   m_items.release ();
-  m_classes_vec.release ();
 
   bitmap_obstack_release (_bmstack);
 }
@@ -2502,7 +2500,6 @@ sem_item_optimizer::get_group_by_hash (hashval_t hash, sem_item_type type)
   else
 {
   item->classes.create (1);
-  m_classes_vec.safe_push (item);
   *slot = item;
 }
 
@@ -2847,15 +2844,14 @@ sem_item_optimizer::parse_nonsingleton_classes (void)
 void
 sem_item_optimizer::subdivide_classes_by_equality (bool in_wpa)
 {
-  unsigned int l;
-  congruence_class_group *it;
-  FOR_EACH_VEC_ELT (m_classes_vec, l, it)
+  for (hash_table ::iterator it = m_classes.begin ();
+   it != m_classes.end (); ++it)
 {
-  unsigned int class_count = it->classes.length ();
+  unsigned int class_count = (*it)->classes.length ();
 
   for (unsigned i = 0; i < class_count; i++)
 	{
-	  congruence_class *c = it->classes[i];
+	  congruence_class *c = (*it)->classes[i];
 
 	  if (c->members.length() > 1)
 	{
@@ -2864,7 +2860,7 @@ sem_item_optimizer::subdivide_classes_by_equality (bool in_wpa)
 	  sem_item *first = c->members[0];
 	  new_vector.safe_push (first);
 
-	  unsigned class_split_first = it->classes.length ();
+	  unsigned class_split_first = (*it)->classes.length ();
 
 	  for (unsigned j = 1; j < c->members.length (); j++)
 		{
@@ -2881,9 +2877,9 @@ sem_item_optimizer::subdivide_classes_by_equality (bool in_wpa)
 		  bool integrated = false;
 
 		  for (unsigned k = class_split_first;
-			   k < it->classes.length (); k++)
+			   k < (*it)->classes.length (); k++)
 			{
-			  sem_item *x = it->classes[k]->members[0];
+			  sem_item *x = (*it)->classes[k]->members[0];
 			  bool equals
 			= in_wpa ? x->equals_wpa (item, m_symtab_node_map)
  : x->equals (item, m_symtab_node_map);
@@ -2891,7 +2887,7 @@ sem_item_optimizer::subdivide_classes_by_equality (bool in_wpa)
 			  if (equals)
 			{
 			  integrated = true;
-			  add_item_to_class (it->classes[k], item);
+			  add_item_to_class ((*it)->classes[k], item);
 
 			  break;
 			}
@@ -2904,7 +2900,7 @@ sem_item_optimizer::subdivide_classes_by_equality (bool in_wpa)
 			  m_classes_count++;
 			  add_item_to_class (c, item);
 
-			  it->classes.safe_push (c);
+			  (*it)->classes.safe_push (c);
 			}
 		}
 		}
@@ -2935,16 +2931,15 @@ sem_item_optimizer::subdivide_classes_by_sensitive_refs ()
 
   unsigned newly_created_classes = 0;
 
-  unsigned int l;
-  congruence_class_group *it;
-  FOR_EACH_VEC_ELT (m_classes_vec, l, it)
+  for (hash_table ::iterator it = m_classes.begin ();
+   it != m_classes.end (); ++it)
 {
-  unsigned int class_count = it->classes.length ();
+  unsigned int class_count = (*it)->classes.length ();
   auto_vec new_classes;
 
   for (unsigned i = 0; i < class_count; i++)
 	{
-	  congruence_class *c =

Re: [PATCH] Speed-up use-after-scope (re-writing to SSA) (version 2)

2017-01-09 Thread Martin Liška

On 12/22/2016 06:21 PM, Jakub Jelinek wrote:
> On Thu, Dec 22, 2016 at 06:03:50PM +0100, Martin Liška wrote:
>> Done by hash_map.
> 
> Ok.
> 
>>> 3) I think you just want to do copy_node, plus roughly what
>>>copy_decl_for_dup_finish does (and set DECL_ARTIFICIAL and
>>>DECL_IGNORED_P) - except that you don't have copy_body_data
>>>so you can't use it directly (well, you could create copy_body_data
>>>just for that purpose and set src_fn and dst_fn to current_function_decl
>>>and the rest to NULL)
>>
>> I decided to use the function with prepared copy_body_data ;)
> 
> Ok.
> 
>>> I'd really like to see the storing to poisoned var becoming non-addressable
>>> in action (if it can ever happen, so it isn't just theoretical) to look at
>>> what it does.
>>
>> Well, having following sample:
>>
>> int
>> main (int argc, char **argv)
>> {
>>   int *ptr = 0;
>>
>>   {
>> int a;
>> ptr = 
>> *ptr = 12345;
>>   }
>>
>>   *ptr = 12345;
>>   return *ptr;
>> }
>>
>> Right after rewriting into SSA it looks as follows:
>>
>> main (int argc, char * * argv)
>> {
>>   int a;
>>   int * ptr;
>>   int _8;
>>
>>[0.00%]:
>>   a_9 = 12345;
>>   a_10 = ASAN_POISON ();
>>   a_11 = 12345;
>>   _8 = a_11;
>>   return _8;
>>
>> }
> 
> But we do not want to rewrite into SSA that way, but instead as
> 
> main (int argc, char * * argv)
> {
>   int a;
>   int * ptr;
>   int _8;
> 
>[0.00%]:
>   a_9 = 12345;
>   a_10 = ASAN_POISON ();
>   ASAN_POISON (a_10);
>   a_11 = 12345;
>   _8 = a_11;
>   return _8;
> 
> }

I'm still not sure how to do that. The problem is that the transformation from:

  ASAN_MARK (UNPOISON, , 4);
  a = 5;
  ASAN_MARK (POISON, , 4);

to 

  a_8 = 5;
  a_9 = ASAN_POISON ();

happens in tree-ssa.c, after SSA is created, in the situation where we prove that 'a'
does not need to live in memory. That said, the question is how to identify that we
need to transform into SSA in a different way:

   a_10 = ASAN_POISON ();
   ASAN_POISON (a_10);

Thanks for help,
Martin

> 
> or something similar, so that you can 1) emit a diagnostics at the spot
> where the out of scope store happens 2) differentiate between reads from
> out of scope var and stores to out of scope var
> 
> What we need is to hook into tree-into-ssa.c for this, where a_11 is
> created, find out that there is a store to a var that has ASAN_POISON result
> as currently active definition.  Something like if we emit ASAN_POISON
> for some var, during tree-into-ssa.c if we see a store to that var that we
> need to rewrite into SSA pretend there is a read from that var first at
> that location and if it is result of ASAN_POISON, emit the additional
> stmt.
> 
>   Jakub
>

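Jakub's suggestion can be modeled on a toy statement list for a single variable `a`. The sketch below is purely illustrative (toy_kind, toy_rename and the textual output format are hypothetical, not tree-into-ssa.c code): when renaming a store, it first checks whether the currently active definition of `a` came from ASAN_POISON () and, if so, emits an extra `ASAN_POISON (a_N);` use before creating the new definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy statement kinds for a single variable 'a' (hypothetical; these
   are not GIMPLE codes).  Every toy store writes the constant 12345.  */
enum toy_kind { TOY_STORE, TOY_POISON };

#define TOY_LINE_LEN 32

/* Rename N toy statements into SSA-like text lines in OUT (at most
   MAX_OUT lines).  A store to 'a' whose active definition is the result
   of ASAN_POISON () is preceded by a use of that definition, so a later
   pass could diagnose the out-of-scope store at that spot.  Returns the
   number of lines written.  */
static int
toy_rename (const enum toy_kind *stmts, int n,
            char out[][TOY_LINE_LEN], int max_out)
{
  int next_version = 9;      /* start at 9 to match the dump above */
  int cur_def = -1;          /* active SSA version of 'a', -1 if none */
  bool cur_is_poison = false;
  int lines = 0;

  assert (n >= 0 && max_out >= 0);
  for (int i = 0; i < n; i++)
    {
      /* Pretend the store reads 'a' first; if that read would see a
         poisoned definition, emit the extra use.  */
      if (stmts[i] == TOY_STORE && cur_is_poison)
        {
          if (lines == max_out)
            break;
          snprintf (out[lines++], TOY_LINE_LEN, "ASAN_POISON (a_%d);",
                    cur_def);
        }
      if (lines == max_out)
        break;

      /* Every statement creates a new SSA version of 'a'.  */
      cur_def = next_version++;
      cur_is_poison = (stmts[i] == TOY_POISON);
      if (cur_is_poison)
        snprintf (out[lines++], TOY_LINE_LEN, "a_%d = ASAN_POISON ();",
                  cur_def);
      else
        snprintf (out[lines++], TOY_LINE_LEN, "a_%d = 12345;", cur_def);
    }
  return lines;
}
```

For the input store, poison, store, this produces `a_9 = 12345;`, `a_10 = ASAN_POISON ();`, `ASAN_POISON (a_10);` and `a_11 = 12345;`, the same shape as the desired dump quoted above.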
Re: C++ PATCH for c++/78948 (instantiation from discarded statement)

2017-01-09 Thread Nathan Sidwell


On 01/09/2017 09:03 AM, Jakub Jelinek wrote:

> FAIL: g++.dg/cpp1z/constexpr-if10.C   (test for excess errors)
>
> Could we do e.g.
> sed -i -e 's/long long/int */g' testsuite/g++.dg/cpp1z/constexpr-if10.C
> so that it is something where if constexpr will be always true?

That would seem fine to me.

nathan
--
Nathan Sidwell

Re: [PATCH] Outer vs. inner loop ifcvt (PR tree-optimization/78899)

2017-01-09 Thread Richard Biener

On Mon, 9 Jan 2017, Jakub Jelinek wrote:

> On Mon, Jan 09, 2017 at 11:08:08AM +0100, Richard Biener wrote:
> > > > > There is one thing my patch doesn't do but should for efficiency, if loop1
> > > > > (outer loop) is not successfully outer-loop vectorized, then we should mark
> > > > > loop2 (its inner loop) as dont_vectorize if the outer loop has been
> > > > > LOOP_VECTORIZED guarded.  Then the gcc.dg/gomp/pr68128-1.c change
> > > > > wouldn't be actually needed.
> > 
> > (*)
> > Ok, I don't have too many spare cycles either so can you fix (*)?  Then
> > we can go with the extra versionings for GCC 7 for the moment and if any
> > of us has enough time to revisit this soon we can.
> 
> Here is untested (except for the affected testcases) patch to do that,
> ok if it passes bootstrap/regtest?  I'll file [8 Regression] with details
> what we want to undo and do for GCC 8.

Ok.

Thanks,
Richard.

> 2017-01-09  Jakub Jelinek  
> 
>   PR tree-optimization/78899
>   * tree-if-conv.c (version_loop_for_if_conversion): Instead of
>   returning bool return struct loop *, NULL for failure and the new
>   loop on success.
>   (versionable_outer_loop_p): Don't version outer loop if it has
>   dont_vectorized bit set.
>   (tree_if_conversion): When versioning outer loop, ensure
>   tree_if_conversion is performed also on the inner loop of the
>   non-vectorizable outer loop copy.
>   * tree-vectorizer.c (set_uid_loop_bbs): Formatting fix.  Fold
>   LOOP_VECTORIZED in inner loop of the scalar outer loop and
>   prevent vectorization of it.
>   (vectorize_loops): For outer + inner LOOP_VECTORIZED, ensure
>   the outer loop vectorization of the non-scalar version is attempted
>   before vectorization of the inner loop in scalar version.  If
>   outer LOOP_VECTORIZED guarded loop is not vectorized, prevent
>   vectorization of its inner loop.
>   * tree-vect-loop-manip.c (rename_variables_in_bb): If outer_loop
>   has 2 inner loops, rename also on edges from bb whose single pred
>   is outer_loop->header.  Fix typo in function comment.
> 
>   * gcc.target/i386/pr78899.c: New test.
>   * gcc.dg/pr71077.c: New test.
> 
> --- gcc/tree-if-conv.c.jj 2017-01-06 19:34:04.052560851 +0100
> +++ gcc/tree-if-conv.c	2017-01-09 11:52:59.154806369 +0100
> @@ -2535,7 +2535,7 @@ combine_blocks (struct loop *loop)
> loop to execute.  The vectorizer pass will fold this
> internal call into either true or false.  */
>  
> -static bool
> +static struct loop *
>  version_loop_for_if_conversion (struct loop *loop)
>  {
>basic_block cond_bb;
> @@ -2566,7 +2566,7 @@ version_loop_for_if_conversion (struct l
>  ifc_bbs[i]->aux = saved_preds[i];
>  
>if (new_loop == NULL)
> -return false;
> +return NULL;
>  
>new_loop->dont_vectorize = true;
>new_loop->force_vectorize = false;
> @@ -2574,7 +2574,7 @@ version_loop_for_if_conversion (struct l
>gimple_call_set_arg (g, 1, build_int_cst (integer_type_node, new_loop->num));
>gsi_insert_before (, g, GSI_SAME_STMT);
>update_ssa (TODO_update_ssa);
> -  return true;
> +  return new_loop;
>  }
>  
>  /* Return true when LOOP satisfies the follow conditions that will
> @@ -2594,6 +2594,7 @@ static bool
>  versionable_outer_loop_p (struct loop *loop)
>  {
>if (!loop_outer (loop)
> +  || loop->dont_vectorize
>|| !loop->inner
>|| loop->inner->next
>|| !single_exit (loop)
> @@ -2602,7 +2603,7 @@ versionable_outer_loop_p (struct loop *l
>|| !single_pred_p (loop->latch)
>|| !single_pred_p (loop->inner->latch))
>  return false;
> -  
> +
>basic_block outer_exit = single_pred (loop->latch);
>basic_block inner_exit = single_pred (loop->inner->latch);
>  
> @@ -2789,7 +2790,10 @@ tree_if_conversion (struct loop *loop)
>  {
>unsigned int todo = 0;
>bool aggressive_if_conv;
> +  struct loop *rloop;
>  
> + again:
> +  rloop = NULL;
>ifc_bbs = NULL;
>any_pred_load_store = false;
>any_complicated_phi = false;
> @@ -2829,8 +2833,31 @@ tree_if_conversion (struct loop *loop)
>struct loop *vloop
>   = (versionable_outer_loop_p (loop_outer (loop))
>  ? loop_outer (loop) : loop);
> -  if (!version_loop_for_if_conversion (vloop))
> +  struct loop *nloop = version_loop_for_if_conversion (vloop);
> +  if (nloop == NULL)
>   goto cleanup;
> +  if (vloop != loop)
> + {
> +   /* If versionable_outer_loop_p decided to version the
> +  outer loop, version also the inner loop of the non-vectorized
> +  loop copy.  So we transform:
> +   loop1
> + loop2
> +  into:
> +   if (LOOP_VECTORIZED (1, 3))
> + {
> +   loop1
> + loop2
> + }
> +   else
> + loop3 (copy of loop1)
> +   if (LOOP_VECTORIZED

Re: C++ PATCH for c++/78948 (instantiation from discarded statement)

2017-01-09 Thread Jakub Jelinek

On Mon, Jan 09, 2017 at 08:49:25AM -0500, Nathan Sidwell wrote:
> On 01/08/2017 01:34 AM, Jason Merrill wrote:
> > P0292 defines the notion of "discarded statement" which is almost but
> > not quite the same as "unevaluated operand".  This PR shows a case
> > where we need to be able to tell that we're in a discarded statement
> > at a lower level than in the parser, so this patch moves the
> > information about being in a discarded statement from the parser into
> > saved_scope.  I've also added a test for a couple of cases that
> > demonstrate why we can't just use cp_unevaluated_context.
> 
> > +  if constexpr(sizeof(long long) == sizeof(char*))
> > +    ;
> > +  else
> > +    sizeof_mismatch();
> 
> This is going to behave differently on 32 and 64 bit HW.  Is that
> intentional? (If so, a comment would be nice)

Yeah, it fails on i686-linux and other ILP32 and similar targets.

FAIL: g++.dg/cpp1z/constexpr-if10.C   (test for excess errors)

Could we do e.g.
sed -i -e 's/long long/int */g' testsuite/g++.dg/cpp1z/constexpr-if10.C
so that it is something where if constexpr will be always true?

Jakub

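The portability problem can be seen without any C++17 machinery. The sketch below (function names made up for illustration) evaluates the original condition and the one produced by the proposed sed command. The first is false on ILP32 targets such as i686 or aarch64 with -mabi=ilp32, where long long is 8 bytes but pointers are 4; the second compares two object pointer sizes, which is what the suggestion relies on being always true (the C standard does not strictly guarantee it).

```c
#include <assert.h>
#include <stdbool.h>

/* Original test condition: holds on LP64 targets, fails on ILP32.  */
static bool
long_long_is_pointer_sized (void)
{
  return sizeof (long long) == sizeof (char *);
}

/* Condition after the suggested sed: int * versus char *.  */
static bool
int_ptr_is_char_ptr_sized (void)
{
  return sizeof (int *) == sizeof (char *);
}
```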
Re: C++ PATCH for c++/78948 (instantiation from discarded statement)

2017-01-09 Thread Kyrill Tkachov



On 09/01/17 13:49, Nathan Sidwell wrote:
> On 01/08/2017 01:34 AM, Jason Merrill wrote:
>> P0292 defines the notion of "discarded statement" which is almost but
>> not quite the same as "unevaluated operand".  This PR shows a case
>> where we need to be able to tell that we're in a discarded statement
>> at a lower level than in the parser, so this patch moves the
>> information about being in a discarded statement from the parser into
>> saved_scope.  I've also added a test for a couple of cases that
>> demonstrate why we can't just use cp_unevaluated_context.
>
>> +  if constexpr(sizeof(long long) == sizeof(char*))
>> +    ;
>> +  else
>> +    sizeof_mismatch();
>
> This is going to behave differently on 32 and 64 bit HW.  Is that intentional?
> (If so, a comment would be nice)

Indeed I see this test failing when testing aarch64 with -mabi=ilp32.

Kyrill

> nathan

Re: C++ PATCH for c++/78948 (instantiation from discarded statement)

2017-01-09 Thread Nathan Sidwell


On 01/08/2017 01:34 AM, Jason Merrill wrote:
> P0292 defines the notion of "discarded statement" which is almost but
> not quite the same as "unevaluated operand".  This PR shows a case
> where we need to be able to tell that we're in a discarded statement
> at a lower level than in the parser, so this patch moves the
> information about being in a discarded statement from the parser into
> saved_scope.  I've also added a test for a couple of cases that
> demonstrate why we can't just use cp_unevaluated_context.
>
> +  if constexpr(sizeof(long long) == sizeof(char*))
> +    ;
> +  else
> +    sizeof_mismatch();

This is going to behave differently on 32 and 64 bit HW.  Is that
intentional? (If so, a comment would be nice)

nathan

--
Nathan Sidwell

[PATCH] [ARC] Clean up arc header file.

2017-01-09 Thread Claudiu Zissulescu

This patch revamps the ARC header file by using separate
headers for different toolchain targets. Each target header file holds the
backend macro definitions specific to that target. Thus, we have:
 - elf.h is used for bare-metal toolchains.
 - linux.h is used for our Linux toolchains.
 - big.h is used for big-endian toolchains.

This patch also removes the ARC specifics from config.gcc, consolidating
everything in the new header files above.


OK to apply?
Claudiu

gcc/
2016-07-29  Claudiu Zissulescu  

* config.gcc (arc*-): Clean up, use arc/big.h, arc/elf.h, and
arc/linux.h headers.
* config/arc/arc.h (TARGET_OS_CPP_BUILTINS): Remove.
(LINK_SPEC): Likewise.
(ARC_TLS_EXTRA_START_SPEC): Likewise.
(EXTRA_SPECS): Likewise.
(STARTFILE_SPEC): Likewise.
(ENDFILE_SPEC): Likewise.
(LIB_SPEC): Likewise.
(TARGET_SDATA_DEFAULT): Likewise.
(TARGET_MMEDIUM_CALLS_DEFAULT): Likewise.
(MULTILIB_DEFAULTS): Likewise.
(DWARF2_UNWIND_INFO): Likewise.
* config/arc/big.h: New file.
* config/arc/elf.h: Likewise.
* config/arc/linux.h: Likewise.
* config/arc/t-uClibc: Remove.
---
 gcc/config.gcc  |  15 +++---
 gcc/config/arc/arc.h| 120 +---
 gcc/config/arc/big.h|  22 +
 gcc/config/arc/elf.h|  55 ++
 gcc/config/arc/linux.h  |  76 ++
 gcc/config/arc/t-uClibc |  20 
 6 files changed, 171 insertions(+), 137 deletions(-)
 create mode 100644 gcc/config/arc/big.h
 create mode 100644 gcc/config/arc/elf.h
 create mode 100644 gcc/config/arc/linux.h
 delete mode 100644 gcc/config/arc/t-uClibc

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 7c27546..8e41b31 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -319,6 +319,7 @@ arc*-*-*)
c_target_objs="arc-c.o"
cxx_target_objs="arc-c.o"
extra_options="${extra_options} arc/arc-tables.opt"
+   extra_headers="arc-simd.h"
;;
 arm*-*-*)
cpu_type=arm
@@ -1003,8 +1004,7 @@ alpha*-dec-*vms*)
tmake_file="${tmake_file} alpha/t-vms alpha/t-alpha"
;;
 arc*-*-elf*)
-   extra_headers="arc-simd.h"
-   tm_file="arc/arc-arch.h dbxelf.h elfos.h newlib-stdint.h ${tm_file}"
+   tm_file="arc/arc-arch.h dbxelf.h elfos.h newlib-stdint.h arc/elf.h ${tm_file}"
tmake_file="arc/t-multilib arc/t-arc"
extra_gcc_objs="driver-arc.o"
if test "x$with_cpu" != x; then
@@ -1021,15 +1021,12 @@ arc*-*-elf*)
*)  echo "with_endian=${with_endian} not supported."; exit 1 ;;
esac
case ${with_endian} in
-   big*)   tm_defines="DRIVER_ENDIAN_SELF_SPECS=\\\"%{!EL:%{!mlittle-endian:-mbig-endian}}\\\" ${tm_defines}"
+   big*)   tm_file="arc/big.h ${tm_file}"
esac
;;
 arc*-*-linux-uclibc*)
-   extra_headers="arc-simd.h"
-   tm_file="arc/arc-arch.h dbxelf.h elfos.h gnu-user.h linux.h linux-android.h glibc-stdint.h ${tm_file}"
-   tmake_file="${tmake_file} arc/t-uClibc arc/t-arc"
-   tm_defines="${tm_defines} TARGET_SDATA_DEFAULT=0"
-   tm_defines="${tm_defines} TARGET_MMEDIUM_CALLS_DEFAULT=1"
+   tm_file="arc/arc-arch.h dbxelf.h elfos.h gnu-user.h linux.h linux-android.h glibc-stdint.h arc/linux.h ${tm_file}"
+   tmake_file="${tmake_file} arc/t-arc"
extra_gcc_objs="driver-arc.o"
if test "x$with_cpu" != x; then
tm_defines="${tm_defines} TARGET_CPU_BUILD=PROCESSOR_$with_cpu"
@@ -1045,7 +1042,7 @@ arc*-*-linux-uclibc*)
*)  echo "with_endian=${with_endian} not supported."; exit 1 ;;
esac
case ${with_endian} in
-   big*)   tm_defines="DRIVER_ENDIAN_SELF_SPECS=\\\"%{!EL:%{!mlittle-endian:-mbig-endian}}\\\" ${tm_defines}"
+   big*)   tm_file="arc/big.h ${tm_file}"
esac
 ;;
 arm-wrs-vxworks)
diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index da13ea1..549e698 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -1,14 +1,6 @@
 /* Definitions of target machine for GNU compiler, Synopsys DesignWare ARC cpu.
Copyright (C) 1994-2017 Free Software Foundation, Inc.
 
-   Sources derived from work done by Sankhya Technologies (www.sankhya.com) on
-   behalf of Synopsys Inc.
-
-   Position Independent Code support added,Code cleaned up,
-   Comments and Support For ARC700 instructions added by
-   Saurabh Verma (saurabh.ve...@codito.com)
-   Ramana Radhakrishnan(ramana.radhakrish...@codito.com)
-
 This file is part of GCC.
 
 GCC is free software; you can redistribute it and/or modify
@@ -57,32 +49,9 @@ along with GCC; see the file COPYING3.  If not see
 #define SYMBOL_REF_SHORT_CALL_P(X) \
((SYMBOL_REF_FLAGS (X) & SYMBOL_FLAG_SHORT_CALL) != 0)
 
-#undef ASM_SPEC
-#undef LINK_SPEC
-#undef STARTFILE_SPEC
-#undef ENDFILE_SPEC
-#undef SIZE_TYPE
-#undef

Re: Implement -Wduplicated-branches (PR c/64279) (v3)

2017-01-09 Thread Marek Polacek

On Mon, Jan 09, 2017 at 12:18:01PM +0100, Jakub Jelinek wrote:
> On Mon, Jan 09, 2017 at 10:21:47AM +0100, Marek Polacek wrote:
> > +/* Callback function to determine whether an expression TP or one of its
> > +   subexpressions comes from macro expansion.  Used to suppress bogus
> > +   warnings.  */
> > +
> > +static tree
> > +expr_from_macro_expansion_r (tree *tp, int *, void *)
> > +{
> > +  if (CAN_HAVE_LOCATION_P (*tp)
> > +  && from_macro_expansion_at (EXPR_LOCATION (*tp)))
> > +return integer_zero_node;
> > +
> > +  return NULL_TREE;
> > +}
> 
> I know this is hard issue, but won't it disable the warning way too often?
> 
> Perhaps it is good enough for the initial version (GCC 7), but doesn't it stop
> whenever one uses NULL in the branches, or some other trivial macros like
> that?  Perhaps we want to do the analysis if there is anything from macro
> expansion side-by-side on both the expressions and if you find something
> from a macro expansion, then still warn if both corresponding expressions
> are from the same macro expansion (either only non-function like one, or
> perhaps also function-like one with the same arguments, if it is possible
> to figure out those somehow)?  And perhaps it would be nice to choose
> warning level, whether you want to warn only under these rules (no macros
> or something smarter if implemented) vs. some certainly non-default more
> aggressive mode that will just warn no matter what macros there are.

I agree that not warning for 
  if (foo)
    return NULL;
  else
    return NULL;
is bad.  But how can I compare those expressions side-by-side?  I'm not finding
anything. :(

As for the idea of multiple levels, sure, I could do that, although I'd prefer
to get the initial version in first.

Marek

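The two situations under discussion can be made concrete. Both functions below (names are illustrative only) have identical then and else arms. The first is the plain case -Wduplicated-branches targets; in the second both arms come from the NULL macro, so, per this thread, the v3 patch's macro-expansion check is expected to suppress the warning there, which is the false negative Jakub raised.

```c
#include <assert.h>
#include <stddef.h>

/* Identical arms with no macros involved: the kind of code the new
   warning is meant to flag.  */
int
same_arms (int foo, int x)
{
  if (foo)
    x += 1;
  else
    x += 1;
  return x;
}

/* Identical arms that both expand the NULL macro; with the v3 patch's
   expr_from_macro_expansion_r check this is not expected to warn.  */
void *
same_null_arms (int foo)
{
  if (foo)
    return NULL;
  else
    return NULL;
}
```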
[RFC] [PATCH] Ignore Debug options for ICF equality.

2017-01-09 Thread Martin Liška

Hello.

Thanks, Alexander, for fixing the issue. In the meantime, I worked on a patch
that would be more generic and would introduce a cl_optimization_eq function.
It's definitely stage1 material, and it adds a 'Debug' keyword to Optimization
options (equivalent to PerFunction, which is currently in trunk). However,
there are differences:

- cl_optimization_hash has a new argument 'ignored_flags' that specifies which
  flags are ignored (currently only Debug is handled here).
- cl_optimization_eq - a new function, taking the same argument.

For the future, if there is consensus, I'd be happy to rename 'optimization'
(Optimization) to 'per_function' (PerFunction). I think Optimization is an
unfortunate name.

Thoughts?
Martin
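The proposal, hashing and comparing per-function optimization options while skipping those marked Debug, can be illustrated with a toy model. Everything below (toy_opt, TOY_CL_DEBUG, toy_opt_hash, toy_opt_eq) is hypothetical and is not the code generated by optc-save-gen.awk; it only shows why the hash and the equality function must ignore the same set of flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits, loosely modeled on the proposed CL_DEBUG.  */
#define TOY_CL_OPTIMIZATION 1u
#define TOY_CL_DEBUG        2u

/* Hypothetical option descriptor: a name and its flag bits.  */
struct toy_opt
{
  const char *name;
  unsigned flags;
};

static const struct toy_opt toy_opts[] = {
  { "ftree-vectorize",      TOY_CL_OPTIMIZATION },
  { "fvar-tracking",        TOY_CL_OPTIMIZATION | TOY_CL_DEBUG },
  { "fvar-tracking-uninit", TOY_CL_OPTIMIZATION | TOY_CL_DEBUG },
};

#define TOY_N_OPTS (sizeof toy_opts / sizeof toy_opts[0])

/* Hash the option values VALS (one per toy_opts entry), skipping every
   option whose flags intersect IGNORED_FLAGS.  FNV-1a style mixing.  */
static uint32_t
toy_opt_hash (const int *vals, unsigned ignored_flags)
{
  uint32_t h = 2166136261u;
  assert (vals != NULL);
  for (size_t i = 0; i < TOY_N_OPTS; i++)
    {
      if (toy_opts[i].flags & ignored_flags)
        continue;
      h ^= (uint32_t) vals[i];
      h *= 16777619u;
    }
  return h;
}

/* Return true if A and B agree on every option not masked out by
   IGNORED_FLAGS.  Skipping the same options as toy_opt_hash keeps the
   two consistent: sets that compare equal always hash equally.  */
static bool
toy_opt_eq (const int *a, const int *b, unsigned ignored_flags)
{
  assert (a != NULL && b != NULL);
  for (size_t i = 0; i < TOY_N_OPTS; i++)
    if (!(toy_opts[i].flags & ignored_flags) && a[i] != b[i])
      return false;
  return true;
}
```

With TOY_CL_DEBUG passed as the ignored flags, two functions that differ only in the var-tracking options hash and compare equal, so ICF could still merge them; passing 0 restores the strict comparison.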
From 68f800b3093c6a5bf9fff86ec362af766ad5288b Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 6 Jan 2017 16:01:23 +0100
Subject: [PATCH] Ignore Debug options for ICF equality.

gcc/testsuite/ChangeLog:

2017-01-09  Martin Liska  

	* gcc.dg/ipa/ipa-icf-38.c: New test.
	* gcc.dg/ipa/ipa-icf-39.c: New test.

gcc/ChangeLog:

2017-01-09  Martin Liska  

	* common.opt: Add Debug attribute for debug
	optimization flags.
	* ipa-icf.c (sem_function::get_hash): Ignore CL_DEBUG options.
	(sem_item::ignore_attr_p): New function.
	(sem_item::compare_attributes): Use the function.
	(sem_function::equals_wpa): Fix typo.
	* ipa-icf.h (ignore_attr_p): Declare new function.
	* ipa-inline.c (can_inline_edge_p): Remove comparison
	of optimization flags.
	* opt-functions.awk (switch_opts_type_flags): New function.
	* optc-save-gen.awk: Add new assert.
	(cl_optimization_hash): Add new argument.
	(cl_optimization_eq): New function.
	* opth-gen.awk: Update declaration.
	* opts.h (CL_DEBUG): Define new macro.
---
 gcc/common.opt|  8 ++--
 gcc/ipa-icf.c | 25 +++
 gcc/ipa-icf.h |  3 ++
 gcc/ipa-inline.c  |  7 
 gcc/opt-functions.awk | 12 +-
 gcc/optc-save-gen.awk | 79 ---
 gcc/opth-gen.awk  | 16 +++
 gcc/opts.h|  1 +
 gcc/testsuite/gcc.dg/ipa/ipa-icf-38.c | 23 ++
 gcc/testsuite/gcc.dg/ipa/ipa-icf-39.c | 13 ++
 10 files changed, 123 insertions(+), 64 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-icf-38.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-icf-39.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 9e751bda6be..d4a5e2461af 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2644,7 +2644,7 @@ Common Undocumented Var(flag_use_linker_plugin)
 ; will be set according to optimize, debug_info_level and debug_hooks
 ; in process_options ().
 fvar-tracking
-Common Report Var(flag_var_tracking) Init(2) Optimization
+Common Report Var(flag_var_tracking) Init(2) Optimization Debug
 Perform variable tracking.
 
 ; Positive if we should track variables at assignments, negative if
@@ -2652,13 +2652,13 @@ Perform variable tracking.
 ; annotations.  When flag_var_tracking_assignments ==
 ; AUTODETECT_VALUE it will be set according to flag_var_tracking.
 fvar-tracking-assignments
-Common Report Var(flag_var_tracking_assignments) Init(2) Optimization
+Common Report Var(flag_var_tracking_assignments) Init(2) Optimization Debug
 Perform variable tracking by annotating assignments.
 
 ; Nonzero if we should toggle flag_var_tracking_assignments after
 ; processing options and computing its default.  */
 fvar-tracking-assignments-toggle
-Common Report Var(flag_var_tracking_assignments_toggle) Optimization
+Common Report Var(flag_var_tracking_assignments_toggle) Optimization Debug
 Toggle -fvar-tracking-assignments.
 
 ; Positive if we should track uninitialized variables, negative if
@@ -2666,7 +2666,7 @@ Toggle -fvar-tracking-assignments.
 ; annotations.  When flag_var_tracking_uninit == AUTODETECT_VALUE it
 ; will be set according to flag_var_tracking.
 fvar-tracking-uninit
-Common Report Var(flag_var_tracking_uninit) Optimization
+Common Report Var(flag_var_tracking_uninit) Optimization Debug
 Perform variable tracking and also tag variables that are uninitialized.
 
 ftree-vectorize
diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index 4c835c39e3d..94e6a9ed5a0 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -83,6 +83,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-icf.h"
 #include "stor-layout.h"
 #include "dbgcnt.h"
+#include "opts.h"
 
 using namespace ipa_icf_gimple;
 
@@ -289,10 +290,7 @@ sem_function::get_hash (void)
 hstate.add_wide_int
 	 (cl_target_option_hash
 	   (TREE_TARGET_OPTION (DECL_FUNCTION_SPECIFIC_TARGET (decl;
-  if (DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl))
-	hstate.add_wide_int
-	 (cl_optimization_hash
-	   (TREE_OPTIMIZATION (DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl;
+  hstate.add_wide_int (cl_optimization_hash (opts_for_fn (decl), CL_DEBUG));
   hstate.add_flag

Re: [PATCH] PR78968 add configure check for __cxa_thread_atexit in libc

2017-01-09 Thread Jonathan Wakely


On 06/01/17 17:06 +, Jonathan Wakely wrote:
> On 04/01/17 15:42 +, Jonathan Wakely wrote:
>> FreeBSD 11 adds __cxa_thread_atexit to libc, so we should use that
>> instead of defining our own inferior version. This also avoids
>> multiple definitions of the symbol.
>>
>> PR libstdc++/78968
>> * config.h.in: Regenerate.
>> * configure: Likewise.
>> * configure.ac: Check for __cxa_thread_atexit.
>> * libsupc++/atexit_thread.cc [_GLIBCXX_HAVE___CXA_THREAD_ATEXIT]:
>> Don't define __cxa_thread_atexit if libc provides it.
>>
>> Tested powerpc64le-linux, committed to trunk.
>
> This adds the check for freebsd cross-compilers. Tested by building
> x86_64-unknown-freebsd11.0 on x86_64-unknown-linux-gnu.
>
> Committed to trunk.

And this adjusts the testsuite so that the test which depends on
correct thread_local destruction order runs for FreeBSD.

Tested x86_64-linux and x86_64-freebsd11. Committed to trunk.


commit 9d592302e39c785eb27beb982768816ad60d6bc8
Author: Jonathan Wakely 
Date:   Mon Jan 9 11:42:21 2017 +

Define testsuite macro for correct thread_local destructors

	* testsuite/30_threads/condition_variable/members/3.cc: Use new macro
	to detect correct thread_local destructors.
	* testsuite/util/testsuite_hooks.h (CORRECT_THREAD_LOCAL_DTORS):
	Define.

diff --git a/libstdc++-v3/testsuite/30_threads/condition_variable/members/3.cc b/libstdc++-v3/testsuite/30_threads/condition_variable/members/3.cc
index 3f6885d..cedb2ab 100644
--- a/libstdc++-v3/testsuite/30_threads/condition_variable/members/3.cc
+++ b/libstdc++-v3/testsuite/30_threads/condition_variable/members/3.cc
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 std::mutex mx;
 std::condition_variable cv;
@@ -40,12 +41,12 @@ void func()
 {
   std::unique_lock lock{mx};
   std::notify_all_at_thread_exit(cv, std::move(lock));
-#if _GLIBCXX_HAVE___CXA_THREAD_ATEXIT_IMPL
+#if CORRECT_THREAD_LOCAL_DTORS
   // Correct order of thread_local destruction needs __cxa_thread_atexit_impl
-  static thread_local Inc inc;
-#else
-  Inc inc;
+  // or similar support from libc.
+  static thread_local
 #endif
+  Inc inc;
 }
 
 int main()
diff --git a/libstdc++-v3/testsuite/util/testsuite_hooks.h b/libstdc++-v3/testsuite/util/testsuite_hooks.h
index 6baff15..6f064a4 100644
--- a/libstdc++-v3/testsuite/util/testsuite_hooks.h
+++ b/libstdc++-v3/testsuite/util/testsuite_hooks.h
@@ -81,6 +81,12 @@
 # define THROW(X) noexcept(false)
 #endif
 
+#if _GLIBCXX_HAVE___CXA_THREAD_ATEXIT || _GLIBCXX_HAVE___CXA_THREAD_ATEXIT_IMPL
+// Correct order of thread_local destruction needs __cxa_thread_atexit_impl
+// or similar support from libc.
+# define CORRECT_THREAD_LOCAL_DTORS 1
+#endif
+
 namespace __gnu_test
 {
   // All macros are defined in GLIBCXX_CONFIGURE_TESTSUITE and imported

Re: [PATCH] improve string find algorithm

2017-01-09 Thread Jonathan Wakely


On 06/01/17 22:19 +, Aditya K wrote:

Could you try the corrected patch on your benchmarks?


For the test-case you gave there is a regression.

Benchmark                                  Time      CPU  Iterations
--------------------------------------------------------------------
Without the patch: BM_StringRegression    81 ns    81 ns     8503740
With the patch:    BM_StringRegression   109 ns   109 ns     6346500


The real advantage shows when there are fewer matches, as seen in
BM_StringFindNoMatch. The code for the benchmark can be found at
https://github.com/llvm-mirror/libcxx/blob/master/benchmarks/string.bench.cpp
However, I have written an independent std-benchmark that can be used just by
exporting CC, CXX and LD_LIBRARY_FLAGS:
https://github.com/hiraditya/std-benchmark


I think the large improvements are worth the smaller regression, so
I'm committing the patch I sent last week.
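Why do the no-match cases improve so much? A common way to speed up substring search is to scan for the needle's first character with a memchr-style search, then compare the whole needle only at those candidate positions. The sketch below assumes this is the idea behind the patch. It is not the libstdc++ code, and the function name sketch_find is hypothetical. It uses std::char_traits<char>::find and std::char_traits<char>::compare, which typically map to memchr and memcmp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: find needle n (length nlen) in haystack h
// (length hlen), starting at pos.  Returns std::string::npos if absent.
std::size_t sketch_find(const char* h, std::size_t hlen,
                        const char* n, std::size_t nlen,
                        std::size_t pos = 0)
{
  using traits = std::char_traits<char>;
  if (nlen == 0)
    return pos <= hlen ? pos : std::string::npos;
  if (pos >= hlen)
    return std::string::npos;

  const char* first = h + pos;
  const char* const last = h + hlen;
  std::size_t len = hlen - pos;            // characters left to search
  while (len >= nlen)
    {
      // Fast scan for the next position where the first character matches;
      // only len - nlen + 1 starting positions can still fit the needle.
      first = traits::find(first, len - nlen + 1, n[0]);
      if (!first)
        return std::string::npos;
      // Confirm the whole needle at this candidate.
      if (traits::compare(first, n, nlen) == 0)
        return static_cast<std::size_t>(first - h);
      len = static_cast<std::size_t>(last - ++first);
    }
  return std::string::npos;
}
```

When the first character is rare, almost all the work happens inside the fast scan, which fits the large BM_StringFindNoMatch gains. When that character occurs at nearly every position, each candidate pays for an extra call, which may explain the smaller regression reported above.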




Following are the results:
--
Without the patch:


Run on (8 X 3403.85 MHz CPU s)
2017-01-06 15:47:30
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
Benchmark                           Time          CPU  Iterations
-----------------------------------------------------------------
BM_StringFindNoMatch/10             6 ns         6 ns   114499418
BM_StringFindNoMatch/64            34 ns        34 ns    20578576
BM_StringFindNoMatch/512          222 ns       222 ns     3136787
BM_StringFindNoMatch/4096        1728 ns      1729 ns      401913
BM_StringFindNoMatch/32768      13679 ns     13684 ns       50680
BM_StringFindNoMatch/131072     54570 ns     54591 ns       12506
BM_StringFindAllMatch/1             4 ns         4 ns   180640260
BM_StringFindAllMatch/8             6 ns         6 ns   119682220
BM_StringFindAllMatch/64            7 ns         7 ns    97679753
BM_StringFindAllMatch/512          19 ns        19 ns    36035174
BM_StringFindAllMatch/4096         92 ns        92 ns     7516363
BM_StringFindAllMatch/32768       849 ns       849 ns      809284
BM_StringFindAllMatch/131072     3610 ns      3611 ns      193894
BM_StringFindMatch1/1          27273 ns     27283 ns       25579
BM_StringFindMatch1/8          27289 ns     27300 ns       25516
BM_StringFindMatch1/64         27297 ns     27307 ns       25561
BM_StringFindMatch1/512        27303 ns     27314 ns       25579
BM_StringFindMatch1/4096       27488 ns     27499 ns       25366
BM_StringFindMatch1/32768      28157 ns     28168 ns       24750
BM_StringFindMatch2/1          27273 ns     27284 ns       25562
BM_StringFindMatch2/8          27296 ns     27306 ns           2
BM_StringFindMatch2/64         27312 ns     27323 ns       25549
BM_StringFindMatch2/512        27327 ns     27338 ns       25558
BM_StringFindMatch2/4096       27513 ns     27524 ns       25349
BM_StringFindMatch2/32768      28161 ns     28172 ns       24788
BM_StringRegression               81 ns        81 ns     8503740



With the patch


Run on (8 X 1071.8 MHz CPU s)
2017-01-06 16:06:29
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
Benchmark                           Time          CPU  Iterations
-----------------------------------------------------------------
BM_StringFindNoMatch/10             6 ns         6 ns   121302159
BM_StringFindNoMatch/64             7 ns         7 ns   102003502
BM_StringFindNoMatch/512           15 ns        15 ns    44820639
BM_StringFindNoMatch/4096          77 ns        77 ns     9016958
BM_StringFindNoMatch/32768        555 ns       555 ns     1227219
BM_StringFindNoMatch/131072      2688 ns      2689 ns      259488
BM_StringFindAllMatch/1             8 ns         8 ns    85893410
BM_StringFindAllMatch/8             9 ns         9 ns    80811804
BM_StringFindAllMatch/64            9 ns         9 ns    74237599
BM_StringFindAllMatch/512          23 ns        23 ns    31163379
BM_StringFindAllMatch/4096         94 ns        94 ns     7317385
BM_StringFindAllMatch/32768       847 ns       848 ns      803901
BM_StringFindAllMatch/131072     3551 ns      3552 ns      196844
BM_StringFindMatch1/1           1337 ns      1337 ns      518042
BM_StringFindMatch1/8           1338 ns      1338 ns      519431
BM_StringFindMatch1/64          1340 ns      1341 ns      513974
BM_StringFindMatch1/512         1355 ns      1356 ns      511857
BM_StringFindMatch1/4096        1489 ns      1489 ns      465629
BM_StringFindMatch1/32768       2203 ns      2204 ns      316044
BM_StringFindMatch2/1           1337 ns      1338 ns      519057
BM_StringFindMatch2/8           1337 ns      1337 ns

Re: [PATCH] Fix late dwarf generated early from optimized out globals

2017-01-09 Thread Jakub Jelinek

On Mon, Jan 09, 2017 at 11:53:38AM +0100, Richard Biener wrote:
> > Ok, attached the part I bootstrapped successfully on amd64-*-freebsd12 and
> > aarch64-*-freebsd12. From the amd64 run you'll find some test results at the
> > usual place. The aarch64 run takes some more time.
> > 
> > I hope I got it right this time :)
> > What do you think?
> 
> Looks good to me with the added comment to dwarf2out_late_global_decl
> exchanged to the one on trunk.

The formatting is completely wrong: lines indented by e.g. 7 spaces (or a
tab plus 1 or 3 spaces), /* comments inside a { block starting in the same
column as the { (they should be 2 columns to the right), && ! not aligned
below VAR_P, or indenting by 3 columns instead of 2.
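For reference, here is a small function laid out the way the review asks, following the GNU Coding Standards. Each nesting level adds two columns. A block's braces sit two columns in from the controlling statement. A comment inside a block starts two columns right of the brace. A continuation line begins with its operator, aligned under the first operand. The function and its parameters are hypothetical; only the layout matters.

```cpp
#include <cassert>

/* Hypothetical predicate, used only to show the layout: return true
   for a variable that is not external and has a location.  */

static bool
sketch_should_emit (bool is_var, bool is_external, bool has_location)
{
  if (is_var)
    {
      /* A comment inside a braced block starts two columns to the
         right of the brace, aligned with the code it describes.  */
      if (!is_external
          && has_location)
        return true;
    }
  return false;
}
```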

Jakub

[PATCH] MIPS: Fix generation of DIV.G and MOD.G for Loongson targets.

2017-01-09 Thread Toma Tabacu

Hi,

The expand_DIVMOD function, introduced in r241660, will pick the divmod4
(or the udivmod4) pattern when it checks for the presence of hardware
div/mod instructions, which results in the generation of the old DIV
instruction.

Unfortunately, this interferes with the generation of DIV.G and MOD.G
(the div3 and mod3 patterns) for Loongson targets, which
causes test failures.

This patch prevents the selection of divmod4 and udivmod4 when
targeting Loongson by adding !ISA_HAS_DIV3 to the match condition.
ISA_HAS_DIV3 checks for the presence of the 3-operand Loongson-specific DIV.G
and MOD.G instructions.
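For context, the DIVMOD expansion concerns code where the same operands feed both a division and a modulus, as in the hypothetical function below. GCC's divmod transformation may combine the two operations into a single DIVMOD internal call, which expand_DIVMOD then lowers. With the patch, a Loongson target (where ISA_HAS_DIV3 holds) is expected to get DIV.G and MOD.G for such code instead of the old DIV. The generated assembly has not been checked here, so treat this as a sketch of the kind of input involved, not as a test case.

```cpp
#include <cassert>

// Hypothetical example: the quotient and remainder of the same operands.
struct qr { int quot; int rem; };

qr
sketch_divmod (int a, int b)
{
  qr r;
  r.quot = a / b;  // together with the next line, a candidate for
  r.rem = a % b;   // a single DIVMOD internal call
  return r;
}
```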

Tested with mips-mti-elf.

This solution might be excessive, however, as it effectively forbids the
generation of the old DIV instruction for Loongson targets, which actually do
support it.

Is this OK?

Regards,
Toma

gcc/ChangeLog:

* config/mips/mips.md (divmod4): Add check for !ISA_HAS_DIV3.
(udivmod4): Likewise.

diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 076cde6..f2fedcc 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -2851,7 +2851,7 @@
   (set (match_operand:GPR 3 "register_operand")
   (mod:GPR (match_dup 1)
(match_dup 2)))])]
-  "ISA_HAS_DIV && !TARGET_FIX_VR4120"
+  "ISA_HAS_DIV && !TARGET_FIX_VR4120 && !ISA_HAS_DIV3"
 {
   if (TARGET_MIPS16)
 {
@@ -2916,7 +2916,7 @@
   (set (match_operand:GPR 3 "register_operand")
   (umod:GPR (match_dup 1)
 (match_dup 2)))])]
-  "ISA_HAS_DIV && !TARGET_FIX_VR4120"
+  "ISA_HAS_DIV && !TARGET_FIX_VR4120 && !ISA_HAS_DIV3"
 {
   if (TARGET_MIPS16)
 {

Re: [PATCH][wwwdocs][ARM] ACLE Coprocessor Intrinsics

2017-01-09 Thread Andre Vieira (lists)

On 09/01/17 11:35, Kyrill Tkachov wrote:
> 
> On 09/01/17 11:33, Andre Vieira (lists) wrote:
>> Hi,
>>
>> This patch adds a change entry for the ACLE Coprocessor Intrinsics in
>> the ARM section.
>>
>> Is this OK?
>>
>> Cheers,
>> Andre
> 
> --- htdocs/gcc-7/changes.html   21 Nov 2016 10:11:59 -  1.25
> +++ htdocs/gcc-7/changes.html   9 Jan 2017 10:47:11 -
> @@ -336,6 +336,11 @@
> This option is only available when generating non-pic code for ARMv7-M
> targets.
>   
> + 
> +  Support for the ACLE Coprocessor Intrinsics has been added. This enables
> +  the generation of coprocessor instructions through the use of intrinsics
> +  such as cdp and ldc.
> +
> 
> Better to say "such as cdp, ldc, and others".
> 
> Ok with that change.
> 
> Thanks,
> Kyrill
> 

Hi,

OK, committed attached patch.

Cheers,
Andre
Index: htdocs/gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.25
diff -u -r1.25 changes.html
--- htdocs/gcc-7/changes.html	21 Nov 2016 10:11:59 -	1.25
+++ htdocs/gcc-7/changes.html	9 Jan 2017 11:37:30 -
@@ -336,6 +336,11 @@
This option is only available when generating non-pic code for ARMv7-M
targets.
  
+ 
+  Support for the ACLE Coprocessor Intrinsics has been added. This enables
+  the generation of coprocessor instructions through the use of intrinsics
+  such as cdp, ldc, and others.
+

 
 AVR

Re: [PATCH][wwwdocs][ARM] ACLE Coprocessor Intrinsics

2017-01-09 Thread Kyrill Tkachov



On 09/01/17 11:33, Andre Vieira (lists) wrote:

Hi,

This patch adds a change entry for the ACLE Coprocessor Intrinsics in
the ARM section.

Is this OK?

Cheers,
Andre


--- htdocs/gcc-7/changes.html   21 Nov 2016 10:11:59 -  1.25
+++ htdocs/gcc-7/changes.html   9 Jan 2017 10:47:11 -
@@ -336,6 +336,11 @@
This option is only available when generating non-pic code for ARMv7-M
targets.
  
+ 
+  Support for the ACLE Coprocessor Intrinsics has been added. This enables
+  the generation of coprocessor instructions through the use of intrinsics
+  such as cdp and ldc.
+

Better to say "such as cdp, ldc, and others".

Ok with that change.

Thanks,
Kyrill

[PATCH][wwwdocs][ARM] ACLE Coprocessor Intrinsics

2017-01-09 Thread Andre Vieira (lists)

Hi,

This patch adds a change entry for the ACLE Coprocessor Intrinsics in
the ARM section.

Is this OK?

Cheers,
Andre
Index: htdocs/gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.25
diff -u -r1.25 changes.html
--- htdocs/gcc-7/changes.html	21 Nov 2016 10:11:59 -	1.25
+++ htdocs/gcc-7/changes.html	9 Jan 2017 10:47:11 -
@@ -336,6 +336,11 @@
This option is only available when generating non-pic code for ARMv7-M
targets.
  
+ 
+  Support for the ACLE Coprocessor Intrinsics has been added. This enables
+  the generation of coprocessor instructions through the use of intrinsics
+  such as cdp and ldc.
+

 
 AVR

Re: [PATCH] Outer vs. inner loop ifcvt (PR tree-optimization/78899)

2017-01-09 Thread Jakub Jelinek

On Mon, Jan 09, 2017 at 11:08:08AM +0100, Richard Biener wrote:
> > > > There is one thing my patch doesn't do but should for efficiency, if loop1
> > > > (outer loop) is not successfully outer-loop vectorized, then we should mark
> > > > loop2 (its inner loop) as dont_vectorize if the outer loop has been
> > > > LOOP_VECTORIZED guarded.  Then the gcc.dg/gomp/pr68128-1.c change
> > > > wouldn't be actually needed.
> 
> (*)
> Ok, I don't have too many spare cycles either so can you fix (*)?  Then
> we can go with the extra versionings for GCC 7 for the moment and if any
> of us has enough time to revisit this soon we can.

Here is an untested (except for the affected testcases) patch to do that;
OK if it passes bootstrap/regtest?  I'll file an [8 Regression] PR with
details of what we want to undo and do for GCC 8.

2017-01-09  Jakub Jelinek  

PR tree-optimization/78899
* tree-if-conv.c (version_loop_for_if_conversion): Instead of
returning bool return struct loop *, NULL for failure and the new
loop on success.
(versionable_outer_loop_p): Don't version outer loop if it has
dont_vectorize bit set.
(tree_if_conversion): When versioning outer loop, ensure
tree_if_conversion is performed also on the inner loop of the
non-vectorizable outer loop copy.
* tree-vectorizer.c (set_uid_loop_bbs): Formatting fix.  Fold
LOOP_VECTORIZED in inner loop of the scalar outer loop and
prevent vectorization of it.
(vectorize_loops): For outer + inner LOOP_VECTORIZED, ensure
the outer loop vectorization of the non-scalar version is attempted
before vectorization of the inner loop in scalar version.  If
outer LOOP_VECTORIZED guarded loop is not vectorized, prevent
vectorization of its inner loop.
* tree-vect-loop-manip.c (rename_variables_in_bb): If outer_loop
has 2 inner loops, rename also on edges from bb whose single pred
is outer_loop->header.  Fix typo in function comment.

* gcc.target/i386/pr78899.c: New test.
* gcc.dg/pr71077.c: New test.

--- gcc/tree-if-conv.c.jj   2017-01-06 19:34:04.052560851 +0100
+++ gcc/tree-if-conv.c  2017-01-09 11:52:59.154806369 +0100
@@ -2535,7 +2535,7 @@ combine_blocks (struct loop *loop)
loop to execute.  The vectorizer pass will fold this
internal call into either true or false.  */
 
-static bool
+static struct loop *
 version_loop_for_if_conversion (struct loop *loop)
 {
   basic_block cond_bb;
@@ -2566,7 +2566,7 @@ version_loop_for_if_conversion (struct l
 ifc_bbs[i]->aux = saved_preds[i];
 
   if (new_loop == NULL)
-return false;
+return NULL;
 
   new_loop->dont_vectorize = true;
   new_loop->force_vectorize = false;
@@ -2574,7 +2574,7 @@ version_loop_for_if_conversion (struct l
   gimple_call_set_arg (g, 1, build_int_cst (integer_type_node, new_loop->num));
   gsi_insert_before (, g, GSI_SAME_STMT);
   update_ssa (TODO_update_ssa);
-  return true;
+  return new_loop;
 }
 
 /* Return true when LOOP satisfies the follow conditions that will
@@ -2594,6 +2594,7 @@ static bool
 versionable_outer_loop_p (struct loop *loop)
 {
   if (!loop_outer (loop)
+  || loop->dont_vectorize
   || !loop->inner
   || loop->inner->next
   || !single_exit (loop)
@@ -2602,7 +2603,7 @@ versionable_outer_loop_p (struct loop *l
   || !single_pred_p (loop->latch)
   || !single_pred_p (loop->inner->latch))
 return false;
-  
+
   basic_block outer_exit = single_pred (loop->latch);
   basic_block inner_exit = single_pred (loop->inner->latch);
 
@@ -2789,7 +2790,10 @@ tree_if_conversion (struct loop *loop)
 {
   unsigned int todo = 0;
   bool aggressive_if_conv;
+  struct loop *rloop;
 
+ again:
+  rloop = NULL;
   ifc_bbs = NULL;
   any_pred_load_store = false;
   any_complicated_phi = false;
@@ -2829,8 +2833,31 @@ tree_if_conversion (struct loop *loop)
   struct loop *vloop
= (versionable_outer_loop_p (loop_outer (loop))
   ? loop_outer (loop) : loop);
-  if (!version_loop_for_if_conversion (vloop))
+  struct loop *nloop = version_loop_for_if_conversion (vloop);
+  if (nloop == NULL)
goto cleanup;
+  if (vloop != loop)
+    {
+      /* If versionable_outer_loop_p decided to version the
+         outer loop, version also the inner loop of the non-vectorized
+         loop copy.  So we transform:
+          loop1
+            loop2
+         into:
+          if (LOOP_VECTORIZED (1, 3))
+            {
+              loop1
+                loop2
+            }
+          else
+            loop3 (copy of loop1)
+              if (LOOP_VECTORIZED (4, 5))
+                loop4 (copy of loop2)
+              else
+                loop5 (copy of loop4)  */
+      gcc_assert (nloop->inner && nloop->inner->next == NULL);
+      rloop = nloop->inner;
+    }
 }
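The transformation in that comment can be pictured at the source level. In the hypothetical sketch below, a plain vectorize_outer flag plays the role of LOOP_VECTORIZED (1, 3). A vectorize_inner flag plays the role of LOOP_VECTORIZED (4, 5), which guards the inner loop of the scalar outer copy. In GCC both are internal calls that the vectorizer folds to true or false, not runtime flags, and this is not GCC output. Every version must compute the same result, and the assertions check exactly that.

```cpp
#include <cassert>

constexpr int N = 4;
constexpr int M = 3;

// Hypothetical source-level picture of the versioned loop nest; the
// two flags stand in for the LOOP_VECTORIZED internal calls.
static int
sum_positive (const int a[N][M], bool vectorize_outer, bool vectorize_inner)
{
  int s = 0;
  if (vectorize_outer)                    // LOOP_VECTORIZED (1, 3)
    {
      for (int i = 0; i < N; i++)         // loop1
        for (int j = 0; j < M; j++)       // loop2, if-converted
          s += a[i][j] > 0 ? a[i][j] : 0;
    }
  else
    for (int i = 0; i < N; i++)           // loop3 (copy of loop1)
      {
        if (vectorize_inner)              // LOOP_VECTORIZED (4, 5)
          for (int j = 0; j < M; j++)     // loop4, if-converted
            s += a[i][j] > 0 ? a[i][j] : 0;
        else
          for (int j = 0; j < M; j++)     // loop5, branches kept
            if (a[i][j] > 0)
              s += a[i][j];
      }
  return s;
}
```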

Re: Implement -Wduplicated-branches (PR c/64279) (v3)

2017-01-09 Thread Jakub Jelinek

On Mon, Jan 09, 2017 at 10:21:47AM +0100, Marek Polacek wrote:
> +/* Callback function to determine whether an expression TP or one of its
> +   subexpressions comes from macro expansion.  Used to suppress bogus
> +   warnings.  */
> +
> +static tree
> +expr_from_macro_expansion_r (tree *tp, int *, void *)
> +{
> +  if (CAN_HAVE_LOCATION_P (*tp)
> +  && from_macro_expansion_at (EXPR_LOCATION (*tp)))
> +return integer_zero_node;
> +
> +  return NULL_TREE;
> +}

I know this is a hard issue, but won't it disable the warning way too often?

Perhaps it is good enough for the initial version (GCC 7), but doesn't it stop
whenever one uses NULL in the branches, or some other trivial macros like
that?  Perhaps we want to do the analysis if there is anything from macro
expansion side-by-side on both the expressions and if you find something
from a macro expansion, then still warn if both corresponding expressions
are from the same macro expansion (either only non-function like one, or
perhaps also function-like one with the same arguments, if it is possible
to figure out those somehow)?  And perhaps it would be nice to be able to choose
a warning level: warn only under these rules (no macros, or something smarter
if implemented), or enable some certainly non-default, more
aggressive mode that will just warn no matter what macros there are.
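A hypothetical example makes the concern concrete. Both functions below have identical then and else branches, but the second spells the value with the NULL macro. Assuming the callback is applied to the branch bodies, it would find a macro expansion in the second function and suppress the warning there, even though the duplication is just as real. The names are illustrative, and it has not been verified which of the two a given GCC release actually diagnoses under -Wduplicated-branches.

```cpp
#include <cassert>
#include <cstddef>

static int zero;

// Identical branches, no macros involved.
int *
pick_plain (bool cond)
{
  int *p;
  if (cond)
    p = &zero;
  else
    p = &zero;
  return p;
}

// Identical branches that both expand the NULL macro.
int *
pick_null (bool cond)
{
  int *p;
  if (cond)
    p = NULL;
  else
    p = NULL;
  return p;
}
```

A side-by-side comparison of the two branches, as suggested above, would see that both NULLs come from the same macro and could still warn.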

Jakub

Re: [patch] Fix wrong code for return of small aggregates on big-endian

2017-01-09 Thread Richard Biener

On Mon, Jan 9, 2017 at 11:43 AM, Eric Botcazou  wrote:
> Hi,
>
> this is a regression present on all active branches for big-endian targets
> returning small aggregate types in registers under certain circumstances and
> when optimization is enabled: when the bitfield path of store_field is taken,
> the function ends up calling store_bit_field to store the value.  Now the
> behavior of store_bit_field is awkward when the mode is BLKmode: it always
> takes its value from the lsb up to the word size but expects it left justified
> beyond it (see expmed.c:890 and below) and I missed that when I got rid of the
> stack temporaries that were originally generated in that case.
>
> Of course that's OK for little-endian targets but not for big-endian targets,
> and I have a couple of C++ testcases exposing the issue on SPARC 64-bit and a
> couple of Ada testcases exposing the issue on PowerPC with the SVR4 ABI (the
> Linux ABI is immune since it always returns on the stack); I think they cover
> all the cases in the problematic code.
>
> The attached fix was tested on a bunch of platforms: x86/Linux, x86-64/Linux,
> PowerPC/Linux, PowerPC64/Linux, PowerPC/VxWorks, Aarch64/Linux, SPARC/Solaris
> and SPARC64/Solaris with no regressions.  OK for the mainline? other branches?

Ok for trunk and branches after a short burn-in.

Thanks,
Richard.

>
> 2017-01-09  Eric Botcazou  
>
> * expr.c (store_field): In the bitfield case, if the value comes from
> a function call and is of an aggregate type returned in registers, do
> not modify the field mode; extract the value in all cases if the mode
> is BLKmode and the size is not larger than a word.
>
>
> 2017-01-09  Eric Botcazou  
>
> * g++.dg/opt/call2.C: New test.
> * g++.dg/opt/call3.C: Likewise.
> * gnat.dg/array26.adb: New test.
> * gnat.dg/array26_pkg.ad[sb]: New helper.
> * gnat.dg/array27.adb: New test.
> * gnat.dg/array27_pkg.ad[sb]: New helper.
> * gnat.dg/array28.adb: New test.
> * gnat.dg/array28_pkg.ad[sb]: New helper.
>
> --
> Eric Botcazou
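The lsb-aligned versus left-justified distinction can be shown with plain integer arithmetic. The helpers below are hypothetical and only model a value of a given width sitting in a 64-bit word; this is not GCC's expmed.c logic. Reading a left-justified value as if it were lsb-aligned, or the reverse, yields the wrong bits. That is the kind of mismatch the patch addresses for small BLKmode values on big-endian targets.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: move a BITS-wide value (1 <= BITS <= 64) between
// the lsb-aligned and the left-justified position in a 64-bit word.
static std::uint64_t
left_justify (std::uint64_t v, unsigned bits)
{
  return bits >= 64 ? v : v << (64 - bits);
}

static std::uint64_t
right_justify (std::uint64_t v, unsigned bits)
{
  return bits >= 64 ? v : v >> (64 - bits);
}
```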

Re: [PATCH] PR78991 make __gnu_cxx::__ops constructors explicit

2017-01-09 Thread Jonathan Wakely


On 09/01/17 11:05 +, Kyrill Tkachov wrote:


On 09/01/17 10:47, Jonathan Wakely wrote:

On 09/01/17 10:39 +, Kyrill Tkachov wrote:

Hi Jonathan,

On 06/01/17 12:40, Jonathan Wakely wrote:

This solves a problem when using libstdc++ with Clang, due to Clang
more eagerly instantiating constexpr function templates during
argument deduction. G++ has some shortcuts to avoid this problem, but
Clang doesn't, and it's not clear that it's strictly speaking a bug in
Clang or if it's following the standard. By making these constructors
explicit we stop them being considered by overload resolution for
copying these functors, which stops us ending up back in the
std::function SFINAE checks.

I'm also using _GLIBCXX_MOVE to turn some internal copies into moves,
because otherwise using something like std::function with 
results in a number of potentially expensive copies.
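Here is a minimal sketch of the idea. It uses a hypothetical IterCompIter wrapper modelled loosely on __gnu_cxx::__ops::_Iter_comp_iter; it is not the real libstdc++ code. With an explicit constructor, the wrapper is no longer implicitly convertible from the comparator, so overload resolution cannot use that constructor for conversions, while direct construction still works. The constructor and the factory move the function object instead of copying it, mirroring the _GLIBCXX_MOVE change.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for __gnu_cxx::__ops::_Iter_comp_iter.
template<typename Compare>
struct IterCompIter
{
  Compare comp;

  // explicit: not usable for implicit conversions from Compare.
  explicit IterCompIter(Compare c) : comp(std::move(c)) { }

  template<typename It1, typename It2>
  bool operator()(It1 a, It2 b) { return bool(comp(*a, *b)); }
};

// Factory that moves the function object into the wrapper.
template<typename Compare>
IterCompIter<Compare>
make_iter_comp_iter(Compare c)
{ return IterCompIter<Compare>(std::move(c)); }

using Cmp = std::function<bool(int, int)>;

static_assert(!std::is_convertible<Cmp, IterCompIter<Cmp>>::value,
              "an explicit constructor is not an implicit conversion");
static_assert(std::is_constructible<IterCompIter<Cmp>, Cmp>::value,
              "direct-initialization still works");
```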

  PR libstdc++/78991
  * include/bits/predefined_ops.h (_Iter_comp_iter, _Iter_comp_val)
  (_Val_comp_iter, _Iter_equals_val, _Iter_pred, _Iter_comp_to_val)
  (_Iter_comp_to_iter, _Iter_negate): Make constructors explicit and
  move function objects.
  (__iter_comp_iter, __iter_comp_val, __val_comp_iter, __pred_iter)
  (__iter_comp_val, __iter_comp_iter, __negate): Move function objects.
  * testsuite/25_algorithms/sort/78991.cc: New test.

Tested powerpc64le-linux, committed to trunk.

I'll backport the 'explicit' constructors (but not the _GLIBCXX_MOVE
changes) to the branches too.



I see this test fail on the GCC 5 branch on arm and aarch64 (error message 
pasted below).
Does the test need a gnu++11 guard or something on the branch?


I thought I'd changed that before committing, I'll fix it.


Thanks.
Also, I think 30_threads/thread/cons/lwg2097.cc needs something similar.


Yes. I fixed both those tests on my testing machine, but not on the
one where I committed from.

Committed to gcc-5-branch.

commit ea8019b493c1aa6b223969ec5f1153e6f381acde
Author: Jonathan Wakely 
Date:   Mon Jan 9 11:06:56 2017 +

Add missing dg-options to C++11 test.

	* testsuite/30_threads/thread/cons/lwg2097.cc: Compile with
	-std=gnu++11.

diff --git a/libstdc++-v3/testsuite/30_threads/thread/cons/lwg2097.cc b/libstdc++-v3/testsuite/30_threads/thread/cons/lwg2097.cc
index 3ec4325..d5d6288 100644
--- a/libstdc++-v3/testsuite/30_threads/thread/cons/lwg2097.cc
+++ b/libstdc++-v3/testsuite/30_threads/thread/cons/lwg2097.cc
@@ -1,3 +1,4 @@
+// { dg-options "-std=gnu++11" }
 // { dg-do compile }
 // { dg-require-cstdint "" }
 // { dg-require-gthreads "" }

Re: [PATCH] PR78991 make __gnu_cxx::__ops constructors explicit

2017-01-09 Thread Jonathan Wakely


On 09/01/17 10:47 +, Jonathan Wakely wrote:

On 09/01/17 10:39 +, Kyrill Tkachov wrote:

Hi Jonathan,

On 06/01/17 12:40, Jonathan Wakely wrote:

This solves a problem when using libstdc++ with Clang, due to Clang
more eagerly instantiating constexpr function templates during
argument deduction. G++ has some shortcuts to avoid this problem, but
Clang doesn't, and it's not clear that it's strictly speaking a bug in
Clang or if it's following the standard. By making these constructors
explicit we stop them being considered by overload resolution for
copying these functors, which stops us ending up back in the
std::function SFINAE checks.

I'm also using _GLIBCXX_MOVE to turn some internal copies into moves,
because otherwise using something like std::function with 
results in a number of potentially expensive copies.

  PR libstdc++/78991
  * include/bits/predefined_ops.h (_Iter_comp_iter, _Iter_comp_val)
  (_Val_comp_iter, _Iter_equals_val, _Iter_pred, _Iter_comp_to_val)
  (_Iter_comp_to_iter, _Iter_negate): Make constructors explicit and
  move function objects.
  (__iter_comp_iter, __iter_comp_val, __val_comp_iter, __pred_iter)
  (__iter_comp_val, __iter_comp_iter, __negate): Move function objects.
  * testsuite/25_algorithms/sort/78991.cc: New test.

Tested powerpc64le-linux, committed to trunk.

I'll backport the 'explicit' constructors (but not the _GLIBCXX_MOVE
changes) to the branches too.



I see this test fail on the GCC 5 branch on arm and aarch64 (error message 
pasted below).
Does the test need a gnu++11 guard or something on the branch?


I thought I'd changed that before committing, I'll fix it.


Committed to gcc-5-branch.

commit 366c9e60ffa0536ab87de4e70ec807c2eb5fb66b
Author: Jonathan Wakely 
Date:   Mon Jan 9 10:54:44 2017 +

Add missing dg-options to C++14 test

	* testsuite/25_algorithms/sort/78991.cc: Compile with -std=gnu++14.

diff --git a/libstdc++-v3/testsuite/25_algorithms/sort/78991.cc b/libstdc++-v3/testsuite/25_algorithms/sort/78991.cc
index d947538..260878e 100644
--- a/libstdc++-v3/testsuite/25_algorithms/sort/78991.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/sort/78991.cc
@@ -15,6 +15,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
+// { dg-options "-std=gnu++14" }
 // { dg-do compile }
 
 // PR 78991

Re: [PATCH] PR78991 make __gnu_cxx::__ops constructors explicit

2017-01-09 Thread Kyrill Tkachov



On 09/01/17 10:47, Jonathan Wakely wrote:

On 09/01/17 10:39 +, Kyrill Tkachov wrote:

Hi Jonathan,

On 06/01/17 12:40, Jonathan Wakely wrote:

This solves a problem when using libstdc++ with Clang, due to Clang
more eagerly instantiating constexpr function templates during
argument deduction. G++ has some shortcuts to avoid this problem, but
Clang doesn't, and it's not clear that it's strictly speaking a bug in
Clang or if it's following the standard. By making these constructors
explicit we stop them being considered by overload resolution for
copying these functors, which stops us ending up back in the
std::function SFINAE checks.

I'm also using _GLIBCXX_MOVE to turn some internal copies into moves,
because otherwise using something like std::function with 
results in a number of potentially expensive copies.

   PR libstdc++/78991
   * include/bits/predefined_ops.h (_Iter_comp_iter, _Iter_comp_val)
   (_Val_comp_iter, _Iter_equals_val, _Iter_pred, _Iter_comp_to_val)
   (_Iter_comp_to_iter, _Iter_negate): Make constructors explicit and
   move function objects.
   (__iter_comp_iter, __iter_comp_val, __val_comp_iter, __pred_iter)
   (__iter_comp_val, __iter_comp_iter, __negate): Move function objects.
   * testsuite/25_algorithms/sort/78991.cc: New test.

Tested powerpc64le-linux, committed to trunk.

I'll backport the 'explicit' constructors (but not the _GLIBCXX_MOVE
changes) to the branches too.



I see this test fail on the GCC 5 branch on arm and aarch64 (error message 
pasted below).
Does the test need a gnu++11 guard or something on the branch?


I thought I'd changed that before committing, I'll fix it.


Thanks.
Also, I think 30_threads/thread/cons/lwg2097.cc needs something similar.
I see it failing on the GCC 5 branch with:

In file included from $BLD/aarch64-unknown-linux-gnu/libstdc++-v3/include/thread:35:0,
                 from $SRC/libstdc++-v3/testsuite/30_threads/thread/cons/lwg2097.cc:22:
$BLD/aarch64-unknown-linux-gnu/libstdc++-v3/include/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
 #error This file requires compiler and library support \
  ^
$SRC/libstdc++-v3/testsuite/30_threads/thread/cons/lwg2097.cc:24:12: error: 'std::thread' has not been declared
 using std::thread;
            ^
$SRC/libstdc++-v3/testsuite/30_threads/thread/cons/lwg2097.cc:25:12: error: 'std::is_constructible' has not been declared
 using std::is_constructible;
            ^
$SRC/libstdc++-v3/testsuite/30_threads/thread/cons/lwg2097.cc:27:14: error: expected constructor, destructor, or type conversion before '(' token
 static_assert( !is_constructible::value, "" );
              ^
$SRC/libstdc++-v3/testsuite/30_threads/thread/cons/lwg2097.cc:28:14: error: expected constructor, destructor, or type conversion before '(' token
 static_assert( !is_constructible::value, "" );
              ^
$SRC/libstdc++-v3/testsuite/30_threads/thread/cons/lwg2097.cc:29:14: error: expected constructor, destructor, or type conversion before '(' token
 static_assert( !is_constructible::value, "" );
              ^

FAIL: 30_threads/thread/cons/lwg2097.cc (test for excess errors)

Re: Implement -Wduplicated-branches (PR c/64279) (v3)

2017-01-09 Thread Marek Polacek

On Mon, Jan 09, 2017 at 11:57:48AM +0100, Richard Biener wrote:
> On Mon, 9 Jan 2017, Marek Polacek wrote:
> 
> > On Thu, Jan 05, 2017 at 04:41:28PM +0100, Jakub Jelinek wrote:
> > > On Thu, Jan 05, 2017 at 04:39:40PM +0100, Marek Polacek wrote:
> > > > Coming back to this...
> > > 
> > > > > Right, after h0 == h1 is missing && operand_equal_p (thenb, elseb, 0)
> > > > > or so (the exact last operand needs to be figured out).
> > > > > OEP_ONLY_CONST is certainly wrong, we want the same VAR_DECLs to mean the
> > > > > same thing.  0 is a tiny bit better, but still it will give up on e.g. pure
> > > > > and other calls.  OEP_PURE_SAME is tiny bit better than that, but still
> > > > > calls with the same arguments to the same function will not be considered
> > > > > equal, plus likely operand_equal_p doesn't handle STATEMENT_LIST etc.
> > > > > So maybe we need another OEP_* mode for this.
> > > > 
> > > > Yea, if I add "&& operand_equal_p (thenb, elseb, 0)" then this warning doesn't
> > > > trigger for certain cases, such as MODIFY_EXPR, RETURN_EXPR, probably
> > > > STATEMENT_LIST and others.  So I suppose I could introduce a new OEP_ mode for
> > > > this (names?  OEP_EXTENDED?) and then in operand_equal_p in case tcc_expression
> > > > do
> > > > 
> > > >   case MODIFY_EXPR:
> > > >     if (flags & OEP_EXTENDED)
> > > >       // compare LHS and RHS of both
> > > >  
> > > > ?
> > > 
> > > Yeah.  Not sure what is the best name for that.  Maybe Richi has some clever
> > > ideas.
> > 
> > Here it is.  The changes in operand_equal_p should only trigger with the new
> > OEP_LEXICOGRAPHIC, and given the macro location issue, the warning isn't yet
> > enabled by either -Wall or -Wextra, so this all should be safe.
> > 
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> @@ -2722,6 +2722,9 @@ combine_comparisons (location_t loc,
> If OEP_ADDRESS_OF is set, we are actually comparing addresses of objects,
> not values of expressions.
>  
> +   If OEP_LEXICOGRAPHIC is set, then also handle expressions such as
> +   MODIFY_EXPR, RETURN_EXPR, as well as STATEMENT_LISTs.
> +
> 
> I'd say "also handle expressions with side-effects such as ..."
> 
> otherwise the middle-end changes look good to me - I'll defer to
> C FE maintainers for the rest.

Thanks, I'll fix it up.

Marek
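The difference between the ordinary comparison and the new mode can be pictured with a toy model. This is a sketch only: Node, Kind, toy_equal and TOY_LEXICOGRAPHIC are hypothetical names, not GCC's tree or operand_equal_p. Without the flag, side-effecting nodes and calls to impure functions never compare equal, because two evaluations need not yield the same value; with it, the question becomes whether the two trees are spelled the same, which is what a duplicated-branches check needs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy expression tree; not GCC's representation.
enum class Kind { Var, Const, Plus, Assign, Return, Call };

struct Node
{
  Kind kind;
  std::string name;              // variable or callee name
  long value = 0;                // constant value
  bool pure_callee = false;      // Call: callee has no side effects
  std::vector<const Node *> ops; // operands
};

// Hypothetical flag, named after the patch's OEP_LEXICOGRAPHIC.
constexpr unsigned TOY_LEXICOGRAPHIC = 1u;

bool toy_equal (const Node *a, const Node *b, unsigned flags)
{
  assert (a && b);
  if (a->kind != b->kind || a->ops.size () != b->ops.size ())
    return false;
  switch (a->kind)
    {
    case Kind::Var:
      return a->name == b->name;
    case Kind::Const:
      return a->value == b->value;
    case Kind::Assign:
    case Kind::Return:
      // Side-effecting nodes are only comparable in the lexicographic mode.
      if (!(flags & TOY_LEXICOGRAPHIC))
        return false;
      break;
    case Kind::Call:
      if (a->name != b->name)
        return false;
      // Two calls to an impure function may return different values, so
      // they only match when comparing the source form.
      if (!a->pure_callee && !(flags & TOY_LEXICOGRAPHIC))
        return false;
      break;
    case Kind::Plus:
      break;
    }
  // Same kind and arity: compare the operands pairwise.
  for (std::size_t i = 0; i < a->ops.size (); ++i)
    if (!toy_equal (a->ops[i], b->ops[i], flags))
      return false;
  return true;
}
```

In the patch the two branches are first compared by hash (the `h0 == h1` check quoted above) and only then by operand_equal_p; this sketch models just the second step.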

Re: Implement -Wduplicated-branches (PR c/64279) (v3)

2017-01-09 Thread Richard Biener

On Mon, 9 Jan 2017, Marek Polacek wrote:

> On Thu, Jan 05, 2017 at 04:41:28PM +0100, Jakub Jelinek wrote:
> > On Thu, Jan 05, 2017 at 04:39:40PM +0100, Marek Polacek wrote:
> > > Coming back to this...
> > 
> > > > Right, after h0 == h1 is missing && operand_equal_p (thenb, elseb, 0)
> > > > or so (the exact last operand needs to be figured out).
> > > > OEP_ONLY_CONST is certainly wrong, we want the same VAR_DECLs to mean the
> > > > same thing.  0 is a tiny bit better, but still it will give up on e.g. pure
> > > > and other calls.  OEP_PURE_SAME is tiny bit better than that, but still
> > > > calls with the same arguments to the same function will not be considered
> > > > equal, plus likely operand_equal_p doesn't handle STATEMENT_LIST etc.
> > > > So maybe we need another OEP_* mode for this.
> > > 
> > > Yea, if I add "&& operand_equal_p (thenb, elseb, 0)" then this warning doesn't
> > > trigger for certain cases, such as MODIFY_EXPR, RETURN_EXPR, probably
> > > STATEMENT_LIST and others.  So I suppose I could introduce a new OEP_ mode for
> > > this (names?  OEP_EXTENDED?) and then in operand_equal_p in case tcc_expression
> > > do
> > > 
> > >   case MODIFY_EXPR:
> > >     if (flags & OEP_EXTENDED)
> > >       // compare LHS and RHS of both
> > >  
> > > ?
> > 
> > Yeah.  Not sure what is the best name for that.  Maybe Richi has some clever
> > ideas.
> 
> Here it is.  The changes in operand_equal_p should only trigger with the new
> OEP_LEXICOGRAPHIC, and given the macro location issue, the warning isn't yet
> enabled by either -Wall or -Wextra, so this all should be safe.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

@@ -2722,6 +2722,9 @@ combine_comparisons (location_t loc,
If OEP_ADDRESS_OF is set, we are actually comparing addresses of objects,
not values of expressions.
 
+   If OEP_LEXICOGRAPHIC is set, then also handle expressions such as
+   MODIFY_EXPR, RETURN_EXPR, as well as STATEMENT_LISTs.
+

I'd say "also handle expressions with side-effects such as ..."

otherwise the middle-end changes look good to me - I'll defer to
C FE maintainers for the rest.

Thanks,
Richard.
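A minimal input of the kind the new option targets, assuming it behaves as the thread describes. The file name dup.cc is hypothetical and the diagnostic text is deliberately not reproduced, since it may differ. Each function returns the same value whichever branch is taken, which is exactly why the duplicated branches are suspicious.

```cpp
// Compile with, e.g.: g++ -std=c++14 -Wduplicated-branches -c dup.cc
// Per the patch description, the option is not enabled by -Wall or -Wextra.

constexpr int bump (int x, bool flag)
{
  if (flag)
    x += 1;   // then-branch...
  else
    x += 1;   // ...is identical to it: a candidate for the warning
  return x;
}

constexpr int twice (bool c, int a)
{
  // Identical arms of ?: are also covered, per the ChangeLog entries for
  // build_conditional_expr in the C and C++ front ends.
  return c ? a * 2 : a * 2;
}
```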

> 2017-01-09  Marek Polacek  
> 
>   PR c/64279
>   * c-common.h (do_warn_duplicated_branches_r): Declare.
>   * c-gimplify.c (c_genericize): Walk the function tree calling
>   do_warn_duplicated_branches_r.
>   * c-warn.c (expr_from_macro_expansion_r): New.
>   (do_warn_duplicated_branches): New.
>   (do_warn_duplicated_branches_r): New.
>   * c.opt (Wduplicated-branches): New option.
> 
>   * c-typeck.c (build_conditional_expr): Warn about duplicated branches.
> 
>   * call.c (build_conditional_expr_1): Warn about duplicated branches.
>   * semantics.c (finish_expr_stmt): Build statement using the proper
>   location.
> 
>   * doc/invoke.texi: Document -Wduplicated-branches.
>   * fold-const.c (operand_equal_p): Handle MODIFY_EXPR, INIT_EXPR,
>   COMPOUND_EXPR, PREDECREMENT_EXPR, PREINCREMENT_EXPR,
>   POSTDECREMENT_EXPR, POSTINCREMENT_EXPR, CLEANUP_POINT_EXPR, EXPR_STMT,
>   STATEMENT_LIST, and RETURN_EXPR.  For non-pure non-const functions
>   return 0 only when not OEP_LEXICOGRAPHIC.
>   (fold_build_cleanup_point_expr): Use the expression
>   location when building CLEANUP_POINT_EXPR.
>   * tree-core.h (enum operand_equal_flag): Add OEP_LEXICOGRAPHIC.
>   * tree.c (add_expr): Handle error_mark_node.
> 
>   * c-c++-common/Wduplicated-branches-1.c: New test.
>   * c-c++-common/Wduplicated-branches-10.c: New test.
>   * c-c++-common/Wduplicated-branches-11.c: New test.
>   * c-c++-common/Wduplicated-branches-12.c: New test.
>   * c-c++-common/Wduplicated-branches-2.c: New test.
>   * c-c++-common/Wduplicated-branches-3.c: New test.
>   * c-c++-common/Wduplicated-branches-4.c: New test.
>   * c-c++-common/Wduplicated-branches-5.c: New test.
>   * c-c++-common/Wduplicated-branches-6.c: New test.
>   * c-c++-common/Wduplicated-branches-7.c: New test.
>   * c-c++-common/Wduplicated-branches-8.c: New test.
>   * c-c++-common/Wduplicated-branches-9.c: New test.
>   * c-c++-common/Wimplicit-fallthrough-7.c: Coalesce dg-warning.
>   * g++.dg/cpp0x/lambda/lambda-switch.C: Move dg-warning.
>   * g++.dg/ext/builtin-object-size3.C: Likewise.
>   * g++.dg/gomp/loop-1.C: Likewise.
>   * g++.dg/warn/Wduplicated-branches1.C: New test.
>   * g++.dg/warn/Wduplicated-branches2.C: New test.
> 
> diff --git gcc/c-family/c-common.h gcc/c-family/c-common.h
> index b838869..06918db 100644
> --- gcc/c-family/c-common.h
> +++ gcc/c-family/c-common.h
> @@ -1537,6 +1537,7 @@ extern void maybe_warn_bool_compare (location_t, enum tree_code, tree, tree);
>  extern bool maybe_warn_shift_overflow (location_t, tree, tree);
>  extern void warn_duplicated_cond_add_or_warn (location_t, tree, vec 
>

Re: [PATCH] Fix late dwarf generated early from optimized out globals

2017-01-09 Thread Richard Biener

On Thu, 5 Jan 2017, Andreas Tobler wrote:

> On 05.01.17 13:05, Richard Biener wrote:
> > On Wed, 4 Jan 2017, Andreas Tobler wrote:
> > 
> > > On 04.01.17 10:21, Richard Biener wrote:
> > > > On Wed, 28 Dec 2016, Andreas Tobler wrote:
> > > > 
> > > > > On 28.12.16 19:24, Richard Biener wrote:
> > > > > > On December 27, 2016 11:17:00 PM GMT+01:00, Andreas Tobler wrote:
> > > > > > > On 16.09.16 13:30, Richard Biener wrote:
> > > > > > > > On Thu, 15 Sep 2016, Richard Biener wrote:
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > This addresses sth I needed to address with the early LTO debug
> > > > > > > > > patches (you might now figure I'm piecemeal merging stuff from that
> > > > > > > > > patch).
> > > > > > > > > 
> > > > > > > > > When the cgraph code optimizes out a global we call the late_global_decl
> > > > > > > > > debug hook to eventually add a DW_AT_const_value to its DIE (we don't
> > > > > > > > > really expect a location as that will be invalid after optimizing out
> > > > > > > > > and will be pruned).
> > > > > > > > > 
> > > > > > > > > With the early LTO debug patches I have introduced an early_dwarf_finished
> > > > > > > > > flag (mainly for consistency checking) and I figured I can use that to
> > > > > > > > > detect the call to the late hook during the early phase and provide
> > > > > > > > > the following cleaned up variant of avoiding to create locations that
> > > > > > > > > require later pruning (which doesn't work with emitting the early DIEs).
> > > > > > > > > 
> > > > > > > > > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> > > > > > > > > 
> > > > > > > > > I verified it does the correct thing for a unit like
> > > > > > > > > 
> > > > > > > > > static const int i = 2;
> > > > > > > > > 
> > > > > > > > > (but ISTR we do have at least one testcase in the testsuite as well).
> > > > > > > > > 
> > > > > > > > > Will commit if testing finishes successfully.
> > > > > > > > 
> > > > > > > > Ok, so it showed issues when merging that back to early-LTO-debug.
> > > > > > > > Turns out in LTO we never call early_finish and thus early_dwarf_finished
> > > > > > > > was never set.  Also dwarf2out_late_global_decl itself is a better
> > > > > > > > place to constrain generating locations.
> > > > > > > > 
> > > > > > > > The following variant is in very late stage of testing.
> > > > > > > > 
> > > > > > > > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> > > > > > > > LTO bootstrap on x86_64-unknown-linux-gnu in stage3.  LTO bootstrap
> > > > > > > > with early-LTO-debug in stage3, bootstrapped with early-LTO-debug,
> > > > > > > > testing in progress.
> > > > > > > 
> > > > > > > Any chance to backport this commit (r240228) to 6.x?
> > > > > > > It fixes a bootstrap comparison issue on aarch64-*-freebsd*.
> > > > > > > The aarch64-*-freebsd* port is not yet merged to 6.x and 5.4.x due to
> > > > > > > the bootstrap comparison failure I faced.
> > > > > > 
> > > > > > Did you analyze the bootstrap miscompare?  I suspect the patch merely
> > > > > > papers over the problem.
> > > > > 
> > > > > gcc/contrib/compare-debug -p prev-gcc/ipa-icf.o gcc/ipa-icf.o
> > > > > prev-gcc/ipa-icf.o.stripped. gcc/ipa-icf.o.stripped. differ: char 52841, line 253
> > > > > 
> > > > > 
> > > > > The objdump -dSx diff on the non stripped object looked always more or less
> > > > > the same, a rodata offset which was different.
> > > > > 
> > > > > -   1448: R_AARCH64_ADD_ABS_LO12_NC .rodata+0x1d8
> > > > > +   1448: R_AARCH64_ADD_ABS_LO12_NC .rodata+0x410
> > > > 
> > > > Hmm, sounds like a constant pool entry was created by -g at a different
> > > > time (and thus offset) from regular compilation.  So yes, the patch
> > > > in question should have the effect of "fixing" this.
> > > > 
> > > > Note that I later changed the fix with
> > > > 
> > > > 2016-10-20  Richard Biener  
> > > > 
> > > > * cgraphunit.c (analyze_functions): Set node->definition to
> > > > false to signal symbol removal to debug_hooks->late_global_decl.
> > > > * ipa.c (symbol_table::remove_unreachable_nodes): When not in
> > > > WPA signal symbol removal to the debuginfo machinery.
> > > > * dwarf2out.c (dwarf2out_late_global_decl): Instead of
> > > > using early_finised to guard the we're called for symbol
> > > > removal case look at the symtabs definition flag.
> > > > (gen_variable_die): Remove redundant check.
> > > > 
> > > >

Re: [PATCH/AARCH64] Add -mcpu=thunderx2t99 support

2017-01-09 Thread James Greenhalgh

On Thu, Dec 29, 2016 at 07:43:17PM -0800, Andrew Pinski wrote:
> Hi,
>   This patch adds -mcpu=thunderx2t99.  Cavium has acquired the Vulcan
> IP from Broadcom.  I am keeping the old -mcpu=vulcan as backwards
> compatible but renaming all of the structures to be based on the new
> name of the chip.  In the next few weeks, I am auditing the current
> tuning and will be posting some changes too.
> 
> OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.
> Also tested -mcpu=native on a ThunderX2 CN99xx machine.

OK.

Thanks,
James

> 
> Thanks,
> Andrew
> 
> ChangeLog:
> * config/aarch64/aarch64-cores.def: Add thunderx2t99.  Change vulcan
> to reference thunderx2t99 for the tuning structure.
> * config/aarch64/aarch64-cost-tables.h (vulcan_extra_costs): Rename to ...
> (thunderx2t99_extra_costs): This.
> * config/aarch64/aarch64-tune.md: Regenerate.
> * config/aarch64/aarch64.c (vulcan_addrcost_table): Rename to ...
> (thunderx2t99_addrcost_table): This.
> (vulcan_regmove_cost): Rename to ...
> (thunderx2t99_regmove_cost): This.
> (vulcan_vector_cost): Rename to ...
> (thunderx2t99_vector_cost): This.
> (vulcan_branch_cost): Rename to ...
> (thunderx2t99_branch_cost): This.
> (vulcan_tunings): Rename to ...
> (thunderx2t99_tunings): This and s/vulcan/thunderx2t99/.
> * doc/invoke.texi (AARCH64/mtune): Add thunderx2t99.

Re: [PATCH] Implement P0393R3

2017-01-09 Thread Jonathan Wakely


On 08/01/17 22:49 -0800, Tim Shen wrote:

On Tue, Jan 3, 2017 at 6:17 AM, Jonathan Wakely  wrote:

On 01/01/17 04:17 -0800, Tim Shen via libstdc++ wrote:


+#define _VARIANT_RELATION_FUNCTION_TEMPLATE(__op, __name) \
+  template \
+constexpr bool operator __op(const variant<_Types...>& __lhs, \
+const variant<_Types...>& __rhs) \
+{ \
+  return __lhs._M##__name(__rhs, std::index_sequence_for<_Types...>{}); \
+} \
+\
+  constexpr bool operator __op(monostate, monostate) noexcept \
+  { return 0 __op 0; }
+
+  _VARIANT_RELATION_FUNCTION_TEMPLATE(<, _erased_less_than)
+  _VARIANT_RELATION_FUNCTION_TEMPLATE(<=, _erased_less_equal)
+  _VARIANT_RELATION_FUNCTION_TEMPLATE(==, _erased_equal)
+  _VARIANT_RELATION_FUNCTION_TEMPLATE(!=, _erased_not_equal)
+  _VARIANT_RELATION_FUNCTION_TEMPLATE(>=, _erased_greater_than)
+  _VARIANT_RELATION_FUNCTION_TEMPLATE(>, _erased_greater)



These need double underscore prefixes.


Done.


I'm sorry, I missed that they get appended to _M to form a member
function name, so they don't need a double underscore.

But since they all have the same prefix, why not use _M_erased_##name
and just use less_than, less_equal etc. in the macro invocations?

However, the names are weird, you have >= as greater_than (not
greater_equal) and > as greater (which is inconsistent with < as
less_than).

So I'd go with:

_VARIANT_RELATION_FUNCTION_TEMPLATE(<, less)
_VARIANT_RELATION_FUNCTION_TEMPLATE(<=, less_equal)
_VARIANT_RELATION_FUNCTION_TEMPLATE(==, equal)
_VARIANT_RELATION_FUNCTION_TEMPLATE(!=, not_equal)
_VARIANT_RELATION_FUNCTION_TEMPLATE(>=, greater_equal)
_VARIANT_RELATION_FUNCTION_TEMPLATE(>, greater)


+#define _VARIANT_RELATION_FUNCTION_TEMPLATE(__op, __name) \


I think we usually use all-caps for macro arguments, so _OP and _NAME,
but it doesn't really matter.


+  template \
+   static constexpr bool \
+   (*_S##__name##_vtable[])(const variant&, const variant&) = \
+ { &__detail::__variant::__name... }; \


With the suggestions above this would change to use _S_erased_##_NAME
and &__detail::__variant::__erased_##_NAME


+  template \
+   constexpr inline bool \
+   _M##__name(const variant& __rhs, \
+std::index_sequence<__indices...>) const \
+   { \
+ auto __lhs_index = this->index(); \
+ auto __rhs_index = __rhs.index(); \
+ if (__lhs_index != __rhs_index || valueless_by_exception()) \
+   /* Intentinoal modulo addition. */ \


"Intentional" is spelled wrong, but I think simply "Modulo addition"
is clear enough that it's intentional.


+   return __lhs_index + 1 __op __rhs_index + 1; \
+ return _S##__name##_vtable<__indices...>[__lhs_index](*this, __rhs); \
}

-  template


And we'd usually use _Indices for template parameters, but this is
already inconsistent in .

The patch is OK with those naming tweaks. Thanks, and sorry for the
mixup about the underscores.
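The "modulo addition" step in the macro can be isolated in a small sketch (C++17). index_less below is a hypothetical helper, not part of the patch: when the active alternatives differ, or a side is valueless, the generated operators compare index() + 1 on both sides. Because std::variant_npos is std::size_t(-1), variant_npos + 1 wraps to 0, so a valueless variant orders before every variant that holds a value.

```cpp
#include <cstddef>
#include <variant>

// Hypothetical helper isolating the index comparison used when the
// alternatives differ: unsigned arithmetic makes variant_npos + 1 == 0.
constexpr bool index_less (std::size_t lhs_index, std::size_t rhs_index)
{
  return lhs_index + 1 < rhs_index + 1;
}

static_assert(index_less(std::variant_npos, 0), "valueless sorts first");
static_assert(!index_less(0, std::variant_npos), "");

// The resulting behaviour of std::variant's operators (C++17).
using V = std::variant<int, char>;
static_assert(V(100) < V('a'), "different alternatives: index 0 < index 1");
static_assert(V(1) < V(2), "same alternative: compare the contained values");
static_assert(V('a') != V(97), "equal values in different alternatives differ");
```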

Re: [PR78365] ICE in determine_value_range, at tree-ssa-loop-niter.c:413

2017-01-09 Thread Richard Biener

On Fri, Jan 6, 2017 at 7:00 PM, Martin Jambor  wrote:
> Hi,
>
> On Wed, Dec 14, 2016 at 01:12:11PM +0100, Richard Biener wrote:
>> On Wed, Dec 14, 2016 at 11:15 AM, Martin Jambor  wrote:
>
>> > ...
>
>> > +/* Emulate effects of unary OPERATION and/or conversion from SRC_TYPE to
>> > +   DST_TYPE on value range in SRC_VR and store it to DST_VR.  Return true if
>> > +   the result is a range or an anti-range.  */
>> > +
>> > +static bool
>> > +ipa_vr_operation_and_type_effects (value_range *dst_vr, value_range *src_vr,
>> > +  enum tree_code operation,
>> > +  tree dst_type, tree src_type)
>> > +{
>> > +  memset (dst_vr, 0, sizeof (*dst_vr));
>>
>> The memset is not necessary.
>
> Apparently it is.  Without it, I ended up with corrupted
> dst->vr_bitmup.  I got ICEs when I removed the memset and tracked it
> down to:
>
> (gdb) p dst_vr->equiv->first->next
> $14 = (bitmap_element *) 0x16
>
> after extract_range_from_unary_expr returns.

Ah, I see that set_value_range_to_* expect properly initialized ->equiv.
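What ipa_vr_operation_and_type_effects is for can be pictured with a toy model. ToyRange and convert_range are hypothetical, and GCC's real conversion rules in extract_range_from_unary_expr are more involved (this sketch simply gives up where GCC might compute something else). A range known in the caller's type is carried over to the callee's parameter type only when the conversion keeps it meaningful. The destination is reset first, which is the role the memset plays in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Toy closed interval [min, max]; known == false means "no useful range".
struct ToyRange
{
  bool known;
  std::int64_t min, max;
};

// Convert SRC to the integer type Dst.  Returns true and fills DST only if
// every value of SRC is representable in Dst; otherwise DST stays
// "unknown", much as the patch's helper returns false when the result is
// neither a range nor an anti-range.
template <typename Dst>
bool convert_range (ToyRange &dst, const ToyRange &src)
{
  static_assert (std::numeric_limits<Dst>::is_integer
                 && std::numeric_limits<Dst>::digits <= 63,
                 "Dst must fit in std::int64_t");
  dst = ToyRange{false, 0, 0};   // clean state first, like the memset
  if (!src.known)
    return false;
  assert (src.min <= src.max);
  const std::int64_t lo = std::numeric_limits<Dst>::min ();
  const std::int64_t hi = std::numeric_limits<Dst>::max ();
  if (src.min < lo || src.max > hi)
    return false;
  dst = src;
  return true;
}
```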

>>
>> > +  extract_range_from_unary_expr (dst_vr, operation, dst_type, src_vr, src_type);
>> > +  if (dst_vr->type == VR_RANGE || dst_vr->type == VR_ANTI_RANGE)
>> > +return true;
>> > +  else
>> > +return false;
>> > +}
>> > +
>> >  /* Propagate value range across jump function JFUNC that is associated with
>> > edge CS with param of callee of PARAM_TYPE and update DEST_PLATS
>> > accordingly.  */
>> > @@ -1849,7 +1866,6 @@ propagate_vr_accross_jump_function (cgraph_edge *cs,
>> > struct ipcp_param_lattices *dest_plats,
>> > tree param_type)
>> >  {
>> > -  struct ipcp_param_lattices *src_lats;
>> >ipcp_vr_lattice *dest_lat = &dest_plats->m_value_range;
>> >
>> >if (dest_lat->bottom_p ())
>> > @@ -1862,31 +1878,23 @@ propagate_vr_accross_jump_function (cgraph_edge *cs,
>> >
>> >if (jfunc->type == IPA_JF_PASS_THROUGH)
>> >  {
>> > -  struct ipa_node_params *caller_info = IPA_NODE_REF (cs->caller);
>> > -  int src_idx = ipa_get_jf_pass_through_formal_id (jfunc);
>> > -  src_lats = ipa_get_parm_lattices (caller_info, src_idx);
>> > +  enum tree_code operation = ipa_get_jf_pass_through_operation (jfunc);
>> >
>> > -  if (ipa_get_jf_pass_through_operation (jfunc) == NOP_EXPR)
>> > -   return dest_lat->meet_with (src_lats->m_value_range);
>> > -  else if (param_type
>> > -  && (TREE_CODE_CLASS (ipa_get_jf_pass_through_operation (jfunc))
>> > -  == tcc_unary))
>> > +  if (TREE_CODE_CLASS (operation) == tcc_unary)
>> > {
>> > - value_range vr;
>> > - memset (&vr, 0, sizeof (vr));
>> > + struct ipa_node_params *caller_info = IPA_NODE_REF (cs->caller);
>> > + int src_idx = ipa_get_jf_pass_through_formal_id (jfunc);
>> >   tree operand_type = ipa_get_type (caller_info, src_idx);
>> > - enum tree_code operation = ipa_get_jf_pass_through_operation (jfunc);
>> > + struct ipcp_param_lattices *src_lats
>> > +   = ipa_get_parm_lattices (caller_info, src_idx);
>> >
>> >   if (src_lats->m_value_range.bottom_p ())
>> > return dest_lat->set_to_bottom ();
>> > -
>> > - extract_range_from_unary_expr (&vr,
>> > -operation,
>> > -param_type,
>> > -&src_lats->m_value_range.m_vr,
>> > -operand_type);
>> > - if (vr.type == VR_RANGE
>> > - || vr.type == VR_ANTI_RANGE)
>> > + value_range vr;
>> > + if (ipa_vr_operation_and_type_effects (&vr,
>> > +&src_lats->m_value_range.m_vr,
>> > +operation, param_type,
>> > +operand_type))
>> > return dest_lat->meet_with (&vr);
>> > }
>> >  }
>> > @@ -1906,8 +1914,11 @@ propagate_vr_accross_jump_function (cgraph_edge *cs,
>> > }
>> >  }
>> >
>> > -  if (jfunc->vr_known)
>> > -return dest_lat->meet_with (&jfunc->m_vr);
>> > +  value_range vr;
>> > +  if (jfunc->vr_known
>> > +  && ipa_vr_operation_and_type_effects (&vr, &jfunc->m_vr, NOP_EXPR,
>> > +   param_type, jfunc->passed_type))
>>
>> but instead of a new jfunc->passed_type you can use TREE_TYPE (jfunc->m_vr.min)
>> for example.
>
> Great, thanks a lot for this suggestion.  I have used that and removed
> the new field addition from the patch and used your suggestion
> instead.
>
>>
>> I notice that ipa_jump_func is badly laid out:
>>
>> struct GTY (()) ipa_jump_func
>> {
>>   /* Aggregate contants description.  See struct ipa_agg_jump_function and

Re: [PATCH] PR78991 make __gnu_cxx::__ops constructors explicit

2017-01-09 Thread Jonathan Wakely


On 09/01/17 10:39 +, Kyrill Tkachov wrote:

Hi Jonathan,

On 06/01/17 12:40, Jonathan Wakely wrote:

This solves a problem when using libstdc++ with Clang, due to Clang
more eagerly instantiating constexpr function templates during
argument deduction. G++ has some shortcuts to avoid this problem, but
Clang doesn't, and it's not clear that it's strictly speaking a bug in
Clang or if it's following the standard. By making these constructors
explicit we stop them being considered by overload resolution for
copying these functors, which stops us ending up back in the
std::function SFINAE checks.

I'm also using _GLIBCXX_MOVE to turn some internal copies into moves,
because otherwise using something like std::function with 
results in a number of potentially expensive copies.

   PR libstdc++/78991
   * include/bits/predefined_ops.h (_Iter_comp_iter, _Iter_comp_val)
   (_Val_comp_iter, _Iter_equals_val, _Iter_pred, _Iter_comp_to_val)
   (_Iter_comp_to_iter, _Iter_negate): Make constructors explicit and
   move function objects.
   (__iter_comp_iter, __iter_comp_val, __val_comp_iter, __pred_iter)
   (__iter_comp_val, __iter_comp_iter, __negate): Move function objects.
   * testsuite/25_algorithms/sort/78991.cc: New test.

Tested powerpc64le-linux, committed to trunk.

I'll backport  the 'explicit' constructors (but not the _GLIBCXX_MOVE
changes) to the branches too.



I see this test fail on the GCC 5 branch on arm and aarch64 (error message 
pasted below).
Does the test need a gnu++11 guard or something on the branch?


I thought I'd changed that before committing, I'll fix it.
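The effect of making the constructors explicit can be sketched with a reduced wrapper. IterCompIter and Less are illustrative names, not the libstdc++ types in predefined_ops.h. The wrapper can still be built directly from a comparator, but the converting constructor is no longer a candidate in copy-initialization, so copying the wrapper (for instance when it is passed by value between algorithm helpers) no longer has to consider a conversion through the comparator type.

```cpp
#include <type_traits>
#include <utility>

// Simplified, hypothetical analogue of a __gnu_cxx::__ops wrapper.
template<typename Compare>
struct IterCompIter
{
  Compare comp;

  // explicit: usable for direct-initialization only, so copy-initialization
  // of the wrapper is left to its copy/move constructors.
  explicit IterCompIter(Compare c) : comp(std::move(c)) { }

  template<typename It1, typename It2>
    bool operator()(It1 it1, It2 it2)
    { return bool(comp(*it1, *it2)); }
};

struct Less
{
  bool operator()(int a, int b) const { return a < b; }
};

static_assert(std::is_constructible<IterCompIter<Less>, Less>::value,
              "direct construction still works");
static_assert(!std::is_convertible<Less, IterCompIter<Less>>::value,
              "no implicit conversion from the comparator");
```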

[patch] Fix wrong code for return of small aggregates on big-endian

2017-01-09 Thread Eric Botcazou

Hi,

this is a regression present on all active branches for big-endian targets 
returning small aggregate types in registers under certain circumstances and 
when optimization is enabled: when the bitfield path of store_field is taken, 
the function ends up calling store_bit_field to store the value.  Now the 
behavior of store_bit_field is awkward when the mode is BLKmode: it always 
takes its value from the lsb up to the word size but expects it left justified 
beyond it (see expmed.c:890 and below) and I missed that when I got rid of the 
stack temporaries that were originally generated in that case.

Of course that's OK for little-endian targets but not for big-endian targets, 
and I have a couple of C++ testcases exposing the issue on SPARC 64-bit and a 
couple of Ada testcases exposing the issue on PowerPC with the SVR4 ABI (the 
Linux ABI is immune since it always returns on the stack); I think they cover 
all the cases in the problematic code.
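What "left justified" versus "from the lsb up" means can be illustrated with plain integer arithmetic. This is a conceptual sketch only: left_justified_to_lsb is a hypothetical helper, not what extract_bit_field does internally. It models a 64-bit register holding a small value of BITSIZE bits in its most significant bits, which, per the comment in the patch, is where TEMP sits at that point; a consumer reading from the lsb needs the value shifted down first.

```cpp
#include <cstdint>

// Move a BITSIZE-bit value from the top of a 64-bit register to the bottom.
constexpr std::uint64_t left_justified_to_lsb (std::uint64_t reg,
                                               unsigned bitsize)
{
  return bitsize == 0 ? 0
         : bitsize >= 64 ? reg
         : reg >> (64 - bitsize);
}

// A 3-byte value 0xAABBCC held left justified in a 64-bit register.
static_assert(left_justified_to_lsb(0xAABBCC0000000000ull, 24) == 0xAABBCCull,
              "value moved to the low-order bits");
```

On a little-endian target the value already starts at the lsb, so no such adjustment is needed, which matches the observation that only big-endian targets were affected.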

The attached fix was tested on a bunch of platforms: x86/Linux, x86-64/Linux, 
PowerPC/Linux, PowerPC64/Linux, PowerPC/VxWorks, Aarch64/Linux, SPARC/Solaris 
and SPARC64/Solaris with no regressions.  OK for the mainline? other branches?


2017-01-09  Eric Botcazou  

* expr.c (store_field): In the bitfield case, if the value comes from
a function call and is of an aggregate type returned in registers, do
not modify the field mode; extract the value in all cases if the mode
is BLKmode and the size is not larger than a word.


2017-01-09  Eric Botcazou  

* g++.dg/opt/call2.C: New test.
* g++.dg/opt/call3.C: Likewise.
* gnat.dg/array26.adb: New test.
* gnat.dg/array26_pkg.ad[sb]: New helper.
* gnat.dg/array27.adb: New test.
* gnat.dg/array27_pkg.ad[sb]: New helper.
* gnat.dg/array28.adb: New test.
* gnat.dg/array28_pkg.ad[sb]: New helper.

-- 
Eric Botcazou
Index: expr.c
===
--- expr.c	(revision 244194)
+++ expr.c	(working copy)
@@ -6888,33 +6888,30 @@ store_field (rtx target, HOST_WIDE_INT b
   if (GET_CODE (temp) == PARALLEL)
 	{
 	  HOST_WIDE_INT size = int_size_in_bytes (TREE_TYPE (exp));
-	  rtx temp_target;
-	  if (mode == BLKmode || mode == VOIDmode)
-	mode = smallest_mode_for_size (size * BITS_PER_UNIT, MODE_INT);
-	  temp_target = gen_reg_rtx (mode);
+	  machine_mode temp_mode
+	= smallest_mode_for_size (size * BITS_PER_UNIT, MODE_INT);
+	  rtx temp_target = gen_reg_rtx (temp_mode);
 	  emit_group_store (temp_target, temp, TREE_TYPE (exp), size);
 	  temp = temp_target;
 	}
-  else if (mode == BLKmode)
+
+  /* Handle calls that return BLKmode values in registers.  */
+  else if (mode == BLKmode && REG_P (temp) && TREE_CODE (exp) == CALL_EXPR)
+	{
+	  rtx temp_target = gen_reg_rtx (GET_MODE (temp));
+	  copy_blkmode_from_reg (temp_target, temp, TREE_TYPE (exp));
+	  temp = temp_target;
+	}
+
+  /* The behavior of store_bit_field is awkward when mode is BLKmode:
+	 it always takes its value from the lsb up to the word size but
+	 expects it left justified beyond it.  At this point TEMP is left
+	 justified so extract the value in the former case.  */
+  if (mode == BLKmode && bitsize <= BITS_PER_WORD)
 	{
-	  /* Handle calls that return BLKmode values in registers.  */
-	  if (REG_P (temp) && TREE_CODE (exp) == CALL_EXPR)
-	{
-	  rtx temp_target = gen_reg_rtx (GET_MODE (temp));
-	  copy_blkmode_from_reg (temp_target, temp, TREE_TYPE (exp));
-	  temp = temp_target;
-	}
-	  else
-	{
-	  HOST_WIDE_INT size = int_size_in_bytes (TREE_TYPE (exp));
-	  rtx temp_target;
-	  mode = smallest_mode_for_size (size * BITS_PER_UNIT, MODE_INT);
-	  temp_target = gen_reg_rtx (mode);
-	  temp_target
-	= extract_bit_field (temp, size * BITS_PER_UNIT, 0, 1,
- temp_target, mode, mode, false);
-	  temp = temp_target;
-	}
+	  machine_mode temp_mode = smallest_mode_for_size (bitsize, MODE_INT);
+	  temp = extract_bit_field (temp, bitsize, 0, 1, NULL_RTX, temp_mode,
+temp_mode, false);
 	}
 
   /* Store the value in the bitfield.  */
// { dg-do run }
// { dg-options "-O" }

struct Foo
{
  Foo() : a(1), b(1), c('a') {}
  int a;
  int b;
  char c;
};

static Foo copy_foo(Foo) __attribute__((noinline, noclone));

static Foo copy_foo(Foo A)
{
  return A;
}

struct Bar : Foo
{
  Bar(Foo t) : Foo(copy_foo(t)) {}
};

Foo F;

int main (void)
{
  Bar B (F);

  if (B.a != 1 || B.b != 1 || B.c != 'a')
    __builtin_abort ();

  return 0;
}
// { dg-do run }
// { dg-options "-O" }

struct Foo
{
  Foo() : a(1), c('a') {}
  short int a;
  char c;
};

static Foo copy_foo(Foo) __attribute__((noinline, noclone));

static Foo copy_foo(Foo A)
{
  return A;
}

struct Bar : Foo
{
  Bar(Foo t) : Foo(copy_foo(t)) {}
};

Foo F;

int main (void)
{
  Bar B (F);

  if (B.a != 1 || B.c != 'a')

Re: [PATCH] PR78991 make __gnu_cxx::__ops constructors explicit

2017-01-09 Thread Kyrill Tkachov


Hi Jonathan,

On 06/01/17 12:40, Jonathan Wakely wrote:

This solves a problem when using libstdc++ with Clang, due to Clang
more eagerly instantiating constexpr function templates during
argument deduction. G++ has some shortcuts to avoid this problem, but
Clang doesn't, and it's not clear that it's strictly speaking a bug in
Clang or if it's following the standard. By making these constructors
explicit we stop them being considered by overload resolution for
copying these functors, which stops us ending up back in the
std::function SFINAE checks.

I'm also using _GLIBCXX_MOVE to turn some internal copies into moves,
because otherwise using something like std::function with 
results in a number of potentially expensive copies.

PR libstdc++/78991
* include/bits/predefined_ops.h (_Iter_comp_iter, _Iter_comp_val)
(_Val_comp_iter, _Iter_equals_val, _Iter_pred, _Iter_comp_to_val)
(_Iter_comp_to_iter, _Iter_negate): Make constructors explicit and
move function objects.
(__iter_comp_iter, __iter_comp_val, __val_comp_iter, __pred_iter)
(__iter_comp_val, __iter_comp_iter, __negate): Move function objects.
* testsuite/25_algorithms/sort/78991.cc: New test.

Tested powerpc64le-linux, committed to trunk.

I'll backport  the 'explicit' constructors (but not the _GLIBCXX_MOVE
changes) to the branches too.



I see this test fail on the GCC 5 branch on arm and aarch64 (error message 
pasted below).
Does the test need a gnu++11 guard or something on the branch?

Thanks,
Kyrill

$SRC/libstdc++-v3/testsuite/25_algorithms/sort/78991.cc:28:16: warning: 
defaulted and deleted functions only available with -std=c++11 or -std=gnu++11
   function() = default;
^
$SRC/libstdc++-v3/testsuite/25_algorithms/sort/78991.cc:30:40: error: 
'result_of_t' in namespace 'std' does not name a template type
   template>
^
$SRC/libstdc++-v3/testsuite/25_algorithms/sort/78991.cc:30:51: error: expected '>' 
before '<' token
   template>
   ^
$SRC/libstdc++-v3/testsuite/25_algorithms/sort/78991.cc:31:17: error: expected 
unqualified-id before '{' token
 function(F) { }
 ^
$SRC/libstdc++-v3/testsuite/25_algorithms/sort/78991.cc: In function 'int 
main()':
$SRC/libstdc++-v3/testsuite/25_algorithms/sort/78991.cc:38:11: warning: 
extended initializer lists only available with -std=c++11 or -std=gnu++11
   int a[2]{ 2, 1 };
   ^
$SRC/libstdc++-v3/testsuite/25_algorithms/sort/78991.cc:39:29: warning: 
extended initializer lists only available with -std=c++11 or -std=gnu++11
   std::sort(a, a+2, function{});
 ^
compiler exited with status 1
output is:
$SRC/libstdc++-v3/testsuite/25_algorithms/sort/78991.cc:28:16: warning: 
defaulted and deleted functions only available with -std=c++11 or -std=gnu++11
   function() = default;
^
$SRC/libstdc++-v3/testsuite/25_algorithms/sort/78991.cc:30:40: error: 
'result_of_t' in namespace 'std' does not name a template type
   template>
^
$SRC/libstdc++-v3/testsuite/25_algorithms/sort/78991.cc:30:51: error: expected '>' 
before '<' token
   template>
   ^
$SRC/libstdc++-v3/testsuite/25_algorithms/sort/78991.cc:31:17: error: expected 
unqualified-id before '{' token
 function(F) { }
 ^
$SRC/libstdc++-v3/testsuite/25_algorithms/sort/78991.cc: In function 'int 
main()':
$SRC/libstdc++-v3/testsuite/25_algorithms/sort/78991.cc:38:11: warning: 
extended initializer lists only available with -std=c++11 or -std=gnu++11
   int a[2]{ 2, 1 };
   ^
$SRC/libstdc++-v3/testsuite/25_algorithms/sort/78991.cc:39:29: warning: 
extended initializer lists only available with -std=c++11 or -std=gnu++11
   std::sort(a, a+2, function{});
 ^

FAIL: 25_algorithms/sort/78991.cc (test for excess errors)

Re: [PATCH 6/6][ARM] Implement support for ACLE Coprocessor MCRR and MRRC intrinsics

2017-01-09 Thread Kyrill Tkachov


Hi Andre,

On 06/01/17 15:15, Kyrill Tkachov wrote:


On 06/01/17 14:58, Andre Vieira (lists) wrote:

On 05/01/17 11:11, Kyrill Tkachov wrote:

Hi Andre,

On 09/11/16 10:12, Andre Vieira (lists) wrote:

Hi,

This patch implements support for the ARM ACLE Coprocessor MCRR and MRRC
intrinsics. See below a table mapping the intrinsics to their respective
instructions:

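+-----------------------------------------------------+----------------------------------+
| Intrinsic signature                                 | Instruction pattern              |
+-----------------------------------------------------+----------------------------------+
| void __arm_mcrr(coproc, opc1, uint64_t value, CRm)  | MCRR coproc, opc1, Rt, Rt2, CRm  |
+-----------------------------------------------------+----------------------------------+
| void __arm_mcrr2(coproc, opc1, uint64_t value, CRm) | MCRR2 coproc, opc1, Rt, Rt2, CRm |
+-----------------------------------------------------+----------------------------------+
| uint64_t __arm_mrrc(coproc, opc1, CRm)              | MRRC coproc, opc1, Rt, Rt2, CRm  |
+-----------------------------------------------------+----------------------------------+
| uint64_t __arm_mrrc2(coproc, opc1, CRm)             | MRRC2 coproc, opc1, Rt, Rt2, CRm |
+-----------------------------------------------------+----------------------------------+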

Note that any untyped variable in the intrinsic signature is required to
be a compile-time constant and has the type 'unsigned int'.  We do some
boundary checks: coproc: [0-15], opc1: [0-7], CR*: [0-31].  If any of
these requirements is not met, a diagnostic is issued.

I added a new arm_arch variable for ARMv5TE to use when deciding whether
or not the MCRR and MRRC intrinsics are available.
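The argument constraints above can be mirrored by a host-side compile-time check. This is a sketch only: mcrr_args_in_range is a hypothetical helper using the bounds quoted in the mail, while GCC performs its own checks and issues the diagnostic itself. Because the untyped arguments must be compile-time constants, a constexpr predicate checked with static_assert is a natural model.

```cpp
// Hypothetical mirror of the documented bounds for the constant arguments:
// coproc in [0, 15], opc1 in [0, 7], CRm in [0, 31].
constexpr bool mcrr_args_in_range (unsigned coproc, unsigned opc1,
                                   unsigned crm)
{
  return coproc <= 15 && opc1 <= 7 && crm <= 31;
}

// Constant arguments as they might be passed to __arm_mcrr (15, 0, value, 2)
// on an ARMv5TE-or-later target.
static_assert(mcrr_args_in_range(15, 0, 2), "within the documented bounds");
```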

Is this OK for trunk?

Same as with the previous two patches, the define_insns need constraints,
and I believe you'll also want to rebase this patch on top of Richard's
rework of the architecture feature bits for the ARMv5TE hunk.

Thanks,
Kyrill


Regards,
Andre

gcc/ChangeLog:
2016-11-09  Andre Vieira 

* config/arm/arm.md (): New.
(): New.
* config/arm/arm.c (arm_arch5te): New.
(arm_option_override): Set arm_arch5te.
(arm_coproc_builtin_available): Add support for mcrr, mcrr2, mrrc
and mrrc2.
* config/arm/arm-builtins.c (MCRR_QUALIFIERS): Define to...
(arm_mcrr_qualifiers): ... this. New.
(MRRC_QUALIFIERS): Define to...
(arm_mrrc_qualifiers): ... this. New.
* config/arm/arm_acle.h (__arm_mcrr, __arm_mcrr2, __arm_mrrc,
__arm_mrrc2): New.
* config/arm/arm_acle_builtins.def (mcrr, mcrr2, mrrc, mrrc2): New.
* config/arm/iterators.md (MCRRI, mcrr, MCRR): New.
(MRRCI, mrrc, MRRC): New.
* config/arm/unspecs.md (VUNSPEC_MCRR, VUNSPEC_MCRR2, VUNSPEC_MRRC,
VUNSPEC_MRRC2): New.

gcc/testsuite/ChangeLog:

2016-11-09  Andre Vieira 

* gcc.target/arm/acle/mcrr: New.
* gcc.target/arm/acle/mcrr2: New.
* gcc.target/arm/acle/mrrc: New.
* gcc.target/arm/acle/mrrc2: New.


Hi,

Reworked this patch according to comments, rebased it, and restricted
MCRR2/MRRC2 so that they are only available for ARMv6 and later.

Is this OK for trunk?


Ok.
Thanks,
Kyrill



Also, can you please propose a patch for the GCC 7 changes page mentioning this work?

Thanks,
Kyrill



Regards,
Andre

gcc/ChangeLog:
2017-01-xx  Andre Vieira  

   * config/arm/arm.md (): New.
   (): New.
   * config/arm/arm.c (arm_arch5te): New.
   (arm_option_override): Set arm_arch5te.
   (arm_coproc_builtin_available): Add support for mcrr, mcrr2, mrrc
   and mrrc2.
   * config/arm/arm-builtins.c (MCRR_QUALIFIERS): Define to...
   (arm_mcrr_qualifiers): ... this. New.
   (MRRC_QUALIFIERS): Define to...
   (arm_mrrc_qualifiers): ... this. New.
   * config/arm/arm_acle.h (__arm_mcrr, __arm_mcrr2, __arm_mrrc,
   __arm_mrrc2): New.
   * config/arm/arm_acle_builtins.def (mcrr, mcrr2, mrrc, mrrc2): New.
   * config/arm/iterators.md (MCRRI, mcrr, MCRR): New.
   (MRRCI, mrrc, MRRC): New.
   * config/arm/unspecs.md (VUNSPEC_MCRR, VUNSPEC_MCRR2, VUNSPEC_MRRC,
   VUNSPEC_MRRC2): New.

gcc/testsuite/ChangeLog:

2017-01-xx  Andre Vieira  

   * gcc.target/arm/acle/mcrr: New.
   * gcc.target/arm/acle/mcrr2: New.
   * gcc.target/arm/acle/mrrc: New.
   * gcc.target/arm/acle/mrrc2: New.

Re: [PATCH] Fix lto-bootstrap (PR bootstrap/79003).

2017-01-09 Thread Christophe Lyon

Hi,

On 7 January 2017 at 12:43, Richard Biener  wrote:
> On January 6, 2017 8:00:21 PM GMT+01:00, Jakub Jelinek  wrote:
>>On Fri, Jan 06, 2017 at 05:58:05PM +0100, Christophe Lyon wrote:
>>> > Trying now:
>>> >
>>> > 2017-01-06  Jakub Jelinek  
>>> >
>>> > * Makefile.in (CFLAGS, CPPFLAGS, LDFLAGS): Remove -fno-lto.
>>> > (NOLTO_FLAGS): New variable.
>>> > (ALL_CFLAGS): Use it.
>>> > * configure.ac (nolto_flags): New ACX_PROG_CC_WARNING_OPTS,
>>> > check for whether -fno-lto works.
>>> > * configure: Regenerated.
>>> >
>>> OK thanks for the prompt fix, I'll let you know if it doesn't work.
>>
>>The patch passed bootstrap (non-bootstrap-lto) on x86_64-linux and
>>i686-linux and I see -fno-lto being used everywhere I expected (with a
>>bootstrap compiler that does support -fno-lto).
>>Ok for trunk, if it works even for Christophe?
>
> OK.
>
Thanks for fixing this over the weekend: my builds complete again.

Christophe

> Richard.
>
>>   Jakub
>

Re: [PATCH] Outer vs. inner loop ifcvt (PR tree-optimization/78899)

2017-01-09 Thread Richard Biener

On Mon, 9 Jan 2017, Jakub Jelinek wrote:

> On Mon, Jan 09, 2017 at 10:08:24AM +0100, Richard Biener wrote:
> > > if if-conversion thinks outer loop vectorization might be successful.
> > > In this case, loop2 is if-converted.  This works well if the outer loop
> > > versioning is subsequently successful, doesn't work at all if it is
> > > unsuccessful (loop2 itself isn't LOOP_VECTORIZED guarded, so when we are
> > > vectorizing, we use loop2 itself as its scalar loop (so it will contain
> > > MASK_LOAD/MASK_STORE etc. that we then can't expand; also, as loop1 isn't
> > > vectorized, LOOP_VECTORIZED (1, 3) is folded into false and thus we
> > > effectively are vectorizing loop2 in dead code, loop3/loop4 will be used
> > > instead (loop3 is marked as dont_vectorize, so we don't try to vectorize it,
> > > loop4 isn't, so might be vectorized, but only if no if-conversion is needed
> > > for it (but tree-if-conversion determined it is needed)).
> > > With my patch, we have instead:
> > >   if (LOOP_VECTORIZED (1, 3))
> > >     {
> > >       loop1
> > >         loop2
> > >     }
> > >   else
> > >     loop3 (copy of loop1)
> > >       if (LOOP_VECTORIZED (4, 5))
> > >         loop4 (copy of loop2)
> > >       else
> > >         loop5 (copy of loop4)
> > > loop2 and loop4 are if-converted, so either outer loop vectorization of
> > > loop1 is successful, then we use loop1/loop2 as the vectorized loop
> > > and loop3/loop5 as the corresponding scalar loop, or it is unsuccessful,
> > > then we use non-vectorized loop3, either with successfully vectorized
> > > loop4 as inner loop (loop5 is corresponding scalar_loop, for epilogues,
> > > versioning for alignment etc.), or we fail to vectorize anything and
> > > end up with scalar loop3/loop5.
> > 
> > But that causes even more versioning (plus redundant if-conversion).
> 
> Yes, one more loop (i.e. 2->3 loops versioned).
> 
> > > One option is to keep r242520 in, then the problem is that:
> > > 1) we would need to defer folding LOOP_VECTORIZED (1, 3) into false when
> > > outer loop vectorization failed, if there is still possible inner loop
> > > vectorization (not that difficult)
> > 
> > Yeah, that's something r242520 missed to address it seems.  We've
> > expected this works as expected...  heh.  Testsuite coverage is low
> > here it seems.
> 
> The insufficient testsuite coverage is the biggest problem, I think.
> As I said, this part isn't that hard and I could do it.
> 
> > > 2) we'd need to use loop4 as the scalar_loop for the vectorization of
> > > loop2, but that loop is not adjacent to the vectorized loop, so we'd need
> > > to somehow transform all the SSA_NAMEs that might be affected by that
> > > different placement (as if all the loop3 PHIs were loop1 PHIs instead,
> > > and deal with the SSA_NAMEs set in loop4 and used outside of loop4 as
> > > if those were those in loop2 instead); this is the hard part I'm not really
> > > enthusiastic to write
> > 
> > We use the special scalar_loop always if we have the loop_vectorized
> > guard, right?  Which is of course good.  And required for masked loads/stores.
> > 
> > I see how this looks somewhat unfortunate.
> 
> Yes, the scalar loop is used in many places, every time we need something
> that will not be vectorized, we use that (because, we don't have and don't
> want scalar MASK_LOAD/MASK_STORE, or without -ftree-loop-if-convert now also
> the COND_EXPRs all around).
> 
> > Certainly handling r242520 in a different way looks best then (w/o
> > r242520 the followup to always version loops in if-conversion shows
> > a lot of vect.exp regressions).
> > 
> > > There is one thing my patch doesn't do but should for efficiency, if loop1
> > > (outer loop) is not successfully outer-loop vectorized, then we should mark
> > > loop2 (its inner loop) as dont_vectorize if the outer loop has been
> > > LOOP_VECTORIZED guarded.  Then the gcc.dg/gomp/pr68128-1.c change
> > > wouldn't be actually needed.

(*)

> > Yes.
> > 
> > Are you willing to try re-doing r242520?
> 
> Given the amount of time it took me to debug even this version of the patch
> (wasted more than a day on the tree-vect-loop-manip.c change, the ICEs on
> pr71077 testcase looked very cryptic), I'm afraid
> I don't know the SSA_NAME renaming/vect manipulation well enough not to
> spend a week or more on that and I probably need to spend that time on other
> PRs instead; we are over 50 P1-P3s behind e.g. the GCC 5 schedule (compared
> with the Jan 8th GCC 5 status report).
> If you think you'd be able to handle it faster or if somebody else (Bill
> as the author of r242520) is willing to handle this part, I can help with
> the tree-vectorizer.c part.
> I think the minimum number of functions that need changing is
> slpeel_tree_duplicate_loop_to_edge_cfg
> vect_loop_versioning
> If the tree-vectorizer.c part supplies loop4 as the LOOP_VINFO_SCALAR_LOOP
> for

Re: [patch,libgomp] Make libgomp Fortran modules multilib-aware

2017-01-09 Thread FX

Given the lack of review of this Fortran-specific patch for libgomp, can a
Fortran maintainer approve it, please?

FX

Index: libgomp/Makefile.am
===
--- libgomp/Makefile.am (revision 235843)
+++ libgomp/Makefile.am (working copy)
@@ -10,7 +10,7 @@ config_path = @config_path@
 search_path = $(addprefix $(top_srcdir)/config/, $(config_path)) $(top_srcdir) \
  $(top_srcdir)/../include
 
-fincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/finclude
+fincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)$(MULTISUBDIR)/finclude
 libsubincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include
 
 vpath % $(strip $(search_path))




> *ping*
> 
> This patch from May makes libgomp install its Fortran modules in the correct 
> multilib-aware directories, following what libgfortran does.
> 
> 
> 
> 
>> The attached patch allows libgomp to install its Fortran modules in the 
>> correct multilib-aware directories, just like libgfortran does.
>> Without it, multilib Fortran OpenMP code using the modules fails to compile 
>> because the modules are not found:
>> 
>> $ gfortran -fopenmp a.f90 
>> $ gfortran -fopenmp a.f90 -m32
>> a.f90:1:6:
>> 
>>  use omp_lib
>> 1
>> Fatal Error: Can't open module file ‘omp_lib.mod’ for reading at (1): No such file or directory
>> compilation terminated.
>> 
>> 
>> 
>> Bootstrapped and tested on x86_64-apple-darwin15. OK to commit?
>> 
>> FX
>> 
>> 
>> 
>> 
>> 
>> 
>> 2016-05-03  Francois-Xavier Coudert  
>> 
>>  PR libgomp/60670
>>  * Makefile.am: Make fincludedir multilib-aware.
>>  * Makefile.in: Regenerate.

Re: [PATCH, GCC/testsuite/ARM, ping] Skip optional_mthumb tests if GCC has a default mode

2017-01-09 Thread Thomas Preudhomme


Hi Jeff,

On 06/01/17 21:12, Jeff Law wrote:

On 01/03/2017 10:19 AM, Thomas Preudhomme wrote:

Ping?

Best regards,

Generic parts seem fine to me.  They may be a bit specific to arm right now, but
we can generalize as needed in the future.


What's too ARM-specific about it? The default_mode procedure sits together with
the other ARM-specific procedures, and check_configured_with seems quite general
to me. Is there something else you would like to see?


Best regards,

Thomas

[PATCH, GCC/LRA, gcc-5/6-branch] Fix PR78617: Fix conflict detection in rematerialization

2017-01-09 Thread Thomas Preudhomme


Hi,

Is it ok to backport the fix for PR78617 (incorrect conflict detection in
rematerialization) to GCC 5 and GCC 6? The patch applies cleanly and the
testsuite showed no regressions in the following configurations:


- an arm-none-eabi GCC cross-compiler targeting Cortex-M0 and Cortex-M3
- a bootstrapped arm-linux-gnueabihf GCC native compiler
- a bootstrapped x86_64-linux-gnu GCC native compiler

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2017-01-03 Thomas Preud'homme 

Backport from mainline
2016-12-07 Thomas Preud'homme 

PR rtl-optimization/78617
* lra-remat.c (do_remat): Initialize live_hard_regs from live in
registers, also setting hard registers mapped to pseudo registers.


*** gcc/testsuite/ChangeLog ***


2017-01-03 Thomas Preud'homme 

Backport from mainline
2016-12-07 Thomas Preud'homme 

PR rtl-optimization/78617
* gcc.c-torture/execute/pr78617.c: New test.


Best regards,

Thomas
diff --git a/gcc/lra-remat.c b/gcc/lra-remat.c
index 5e5d62c50b011fe53c5652a4406d711feb448885..17da91b7f2144b2eaf48ce13f547239013c6e7c3 100644
--- a/gcc/lra-remat.c
+++ b/gcc/lra-remat.c
@@ -1124,6 +1124,7 @@ update_scratch_ops (rtx_insn *remat_insn)
 static bool
 do_remat (void)
 {
+  unsigned regno;
   rtx_insn *insn;
   basic_block bb;
   bitmap_head avail_cands;
@@ -1131,12 +1132,21 @@ do_remat (void)
   bool changed_p = false;
   /* Living hard regs and hard registers of living pseudos.  */
   HARD_REG_SET live_hard_regs;
+  bitmap_iterator bi;
 
   bitmap_initialize (_cands, _obstack);
   bitmap_initialize (_cands, _obstack);
   FOR_EACH_BB_FN (bb, cfun)
 {
-  REG_SET_TO_HARD_REG_SET (live_hard_regs, df_get_live_out (bb));
+  CLEAR_HARD_REG_SET (live_hard_regs);
+  EXECUTE_IF_SET_IN_BITMAP (df_get_live_in (bb), 0, regno, bi)
+	{
+	  int hard_regno = regno < FIRST_PSEUDO_REGISTER
+			   ? regno
+			   : reg_renumber[regno];
+	  if (hard_regno >= 0)
+	    SET_HARD_REG_BIT (live_hard_regs, hard_regno);
+	}
   bitmap_and (_cands, _remat_bb_data (bb)->avin_cands,
 		  _remat_bb_data (bb)->livein_cands);
   /* Activating insns are always in the same block as their corresponding
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr78617.c b/gcc/testsuite/gcc.c-torture/execute/pr78617.c
new file mode 100644
index ..89c4f6dea8cb507b963f91debb94cbe16eb1db90
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr78617.c
@@ -0,0 +1,25 @@
+int a = 0;
+int d = 1;
+int f = 1;
+
+int fn1() {
+  return a || 1 >> a;
+}
+
+int fn2(int p1, int p2) {
+  return p2 >= 2 ? p1 : p1 >> 1;
+}
+
+int fn3(int p1) {
+  return d ^ p1;
+}
+
+int fn4(int p1, int p2) {
+  return fn3(!d > fn2((f = fn1() - 1000) || p2, p1));
+}
+
+int main() {
+  if (fn4(0, 0) != 1)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/lra-remat.c b/gcc/lra-remat.c
index 187ee3e7752d1ebe15ba8e8014620c0a94e11424..79504d4eb1a052d906a69178d847e6e618b468ec 100644
--- a/gcc/lra-remat.c
+++ b/gcc/lra-remat.c
@@ -1116,6 +1116,7 @@ update_scratch_ops (rtx_insn *remat_insn)
 static bool
 do_remat (void)
 {
+  unsigned regno;
   rtx_insn *insn;
   basic_block bb;
   bitmap_head avail_cands;
@@ -1123,12 +1124,21 @@ do_remat (void)
   bool changed_p = false;
   /* Living hard regs and hard registers of living pseudos.  */
   HARD_REG_SET live_hard_regs;
+  bitmap_iterator bi;
 
   bitmap_initialize (_cands, _obstack);
   bitmap_initialize (_cands, _obstack);
   FOR_EACH_BB_FN (bb, cfun)
 {
-  REG_SET_TO_HARD_REG_SET (live_hard_regs, df_get_live_out (bb));
+  CLEAR_HARD_REG_SET (live_hard_regs);
+  EXECUTE_IF_SET_IN_BITMAP (df_get_live_in (bb), 0, regno, bi)
+	{
+	  int hard_regno = regno < FIRST_PSEUDO_REGISTER
+			   ? regno
+			   : reg_renumber[regno];
+	  if (hard_regno >= 0)
+	    SET_HARD_REG_BIT (live_hard_regs, hard_regno);
+	}
   bitmap_and (_cands, _remat_bb_data (bb)->avin_cands,
 		  _remat_bb_data (bb)->livein_cands);
   /* Activating insns are always in the same block as their corresponding
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr78617.c b/gcc/testsuite/gcc.c-torture/execute/pr78617.c
new file mode 100644
index ..89c4f6dea8cb507b963f91debb94cbe16eb1db90
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr78617.c
@@ -0,0 +1,25 @@
+int a = 0;
+int d = 1;
+int f = 1;
+
+int fn1() {
+  return a || 1 >> a;
+}
+
+int fn2(int p1, int p2) {
+  return p2 >= 2 ? p1 : p1 >> 1;
+}
+
+int fn3(int p1) {
+  return d ^ p1;
+}
+
+int fn4(int p1, int p2) {
+  return fn3(!d > fn2((f = fn1() - 1000) || p2, p1));
+}
+
+int main() {
+  if (fn4(0, 0) != 1)
+    __builtin_abort ();
+  return 0;
+}

Re: [PATCH] Outer vs. inner loop ifcvt (PR tree-optimization/78899)

2017-01-09 Thread Jakub Jelinek

On Mon, Jan 09, 2017 at 10:08:24AM +0100, Richard Biener wrote:
> > if if-conversion thinks outer loop vectorization might be successful.
> > In this case, loop2 is if-converted.  This works well if the outer loop
> > versioning is subsequently successful, doesn't work at all if it is
> > unsuccessful (loop2 itself isn't LOOP_VECTORIZED guarded, so when we are
> > vectorizing, we use loop2 itself as its scalar loop (so it will contain
> > MASK_LOAD/MASK_STORE etc. that we then can't expand; also, as loop1 isn't
> > vectorized, LOOP_VECTORIZED (1, 3) is folded into false and thus we
> > effectively are vectorizing loop2 in dead code, loop3/loop4 will be used
> > instead (loop3 is marked as dont_vectorize, so we don't try to vectorize it,
> > loop4 isn't, so might be vectorized, but only if no if-conversion is needed
> > for it (but tree-if-conversion determined it is needed)).
> > With my patch, we have instead:
> >   if (LOOP_VECTORIZED (1, 3))
> >     {
> >       loop1
> >         loop2
> >     }
> >   else
> >     loop3 (copy of loop1)
> >       if (LOOP_VECTORIZED (4, 5))
> >         loop4 (copy of loop2)
> >       else
> >         loop5 (copy of loop4)
> > loop2 and loop4 are if-converted, so either outer loop vectorization of
> > loop1 is successful, then we use loop1/loop2 as the vectorized loop
> > and loop3/loop5 as the corresponding scalar loop, or it is unsuccessful,
> > then we use non-vectorized loop3, either with successfully vectorized
> > loop4 as inner loop (loop5 is corresponding scalar_loop, for epilogues,
> > versioning for alignment etc.), or we fail to vectorize anything and
> > end up with scalar loop3/loop5.
> 
> But that causes even more versioning (plus redundant if-conversion).

Yes, one more loop (i.e. 2->3 loops versioned).

> > One option is to keep r242520 in, then the problem is that:
> > 1) we would need to defer folding LOOP_VECTORIZED (1, 3) into false when
> > outer loop vectorization failed, if there is still possible inner loop
> > vectorization (not that difficult)
> 
> Yeah, that's something r242520 missed to address it seems.  We've
> expected this works as expected...  heh.  Testsuite coverage is low
> here it seems.

The insufficient testsuite coverage is the biggest problem, I think.
As I said, this part isn't that hard and I could do it.

> > 2) we'd need to use loop4 as the scalar_loop for the vectorization of
> > loop2, but that loop is not adjacent to the vectorized loop, so we'd need
> > to somehow transform all the SSA_NAMEs that might be affected by that
> > different placement (as if all the loop3 PHIs were loop1 PHIs instead,
> > and deal with the SSA_NAMEs set in loop4 and used outside of loop4 as
> > if those were those in loop2 instead); this is the hard part I'm not really
> > enthusiastic to write
> 
> We use the special scalar_loop always if we have the loop_vectorized
> guard, right?  Which is of course good.  And required for masked 
> loads/stores.
> 
> I see how this looks somewhat unfortunate.

Yes, the scalar loop is used in many places, every time we need something
that will not be vectorized, we use that (because, we don't have and don't
want scalar MASK_LOAD/MASK_STORE, or without -ftree-loop-if-convert now also
the COND_EXPRs all around).

> Certainly handling r242520 in a different way looks best then (w/o
> r242520 the followup to always version loops in if-conversion shows
> a lot of vect.exp regressions).
> 
> > There is one thing my patch doesn't do but should for efficiency, if loop1
> > (outer loop) is not successfully outer-loop vectorized, then we should mark
> > loop2 (its inner loop) as dont_vectorize if the outer loop has been
> > LOOP_VECTORIZED guarded.  Then the gcc.dg/gomp/pr68128-1.c change
> > wouldn't be actually needed.
> 
> Yes.
> 
> Are you willing to try re-doing r242520?

Given the amount of time it took me to debug even this version of the patch
(wasted more than a day on the tree-vect-loop-manip.c change, the ICEs on
pr71077 testcase looked very cryptic), I'm afraid
I don't know the SSA_NAME renaming/vect manipulation well enough not to
spend a week or more on that and I probably need to spend that time on other
PRs instead; we are over 50 P1-P3s behind e.g. the GCC 5 schedule (compared
with the Jan 8th GCC 5 status report).
If you think you'd be able to handle it faster or if somebody else (Bill
as the author of r242520) is willing to handle this part, I can help with
the tree-vectorizer.c part.
I think the minimum number of functions that need changing is
slpeel_tree_duplicate_loop_to_edge_cfg
vect_loop_versioning
If the tree-vectorizer.c part supplies loop4 as the LOOP_VINFO_SCALAR_LOOP
for loop2, then the check for whether this needs the extra, more complicated
handling could be whether the vectorized loop's outer loop differs from
scalar_loop's outer loop.

Jakub

Re: [PR tree-optimization/67955] Exploit PTA in DSE

2017-01-09 Thread Richard Biener

On Sat, Jan 7, 2017 at 7:01 PM, Jeff Law  wrote:
> On 01/05/2017 01:34 AM, Richard Biener wrote:
>>
>> On Wed, Jan 4, 2017 at 8:24 PM, Jeff Law  wrote:
>>>
>>>
>>> The more I think about this the more I'm sure we need to verify pt.null is
>>> not in the points-to set.  I've taken the above testcase and added it as a
>>> negative test.  Bootstrapped, regression tested and committed to the trunk
>>> along with the other minor cleanups you pointed out.
>>
>>
>> Note disabling this for pt.null == 1 makes it pretty useless given we compute
>> that conservatively to always 1 in points-to analysis (and only VRP ever sets
>> it to zero).  See PTA's find_what_p_points_to.  This is because PTA does
>> not conservatively compute whether a pointer may be NULL (all bugs, I have
>> partly fixed some and have an incomplete patch to fix others -- at the point
>> I looked into this we had no users of pt.null info and thus I decided the
>> extra constraints and complexity weren't worth the compile-time/memory-use).
>>
>> Without -fnon-call-exceptions removing the *0 = 2 store is IMHO ok, so we
>> only have to make sure to not break the exception case.
>
> I spent a goodly amount of time thinking about this...  I think the key
> point is whether or not removing the store is observable in a conforming
> program.
>
> Essentially if we get a non-call exception or receive a signal between the
> "dead" store and the subsequent store, then we could observe that the "dead"
> store was removed if the object being stored escapes.
>
> This seems to have larger implications than just the cases we're looking at
> (assume "a" is something in memory, of course).
>
>
> a = 1;
> 
> a = 2;
>
>
> If "a" escapes such that its value can be queried in the exception handler,
> then the exception handler would be able to observe the first store and thus
> it should not be removed.

Yes, and it won't as long as the EH is thrown internally (and thus we have
a CFG reflecting it).  When it's only caught externally we lose, of course...

We'd need an Ada testcase to actually show behavior that is not conforming
to an existing language specification though.

I suspect we have a similar issue in C++ for sth like

void __attribute__((const)) foo () { throw; }

int x;
void bar ()
{
  x = 1;
  foo ();
  x = 2;
}

where foo is const but not nothrow.

> We also have to be cognizant of systems where there is memory mapped at
> location 0.  When that is true, we must check pt.null and honor it, even if
> it pessimizes code.

With -fno-delete-null-pointer-checks (that's what such systems set) PTA computes
0 as "nonlocal" and thus it won't be a singleton points-to solution.

>
>
>> For
>>
>> int foo (int *p, int b)
>> {
>>   int *q;
>>   int i = 1;
>>   if (b)
>> q = 
>>   else
>> q = (void *)0;
>>   *q = 2;
>>   i = 3;
>>   return *q;
>> }
>
> So on a system where *0 is a valid memory address, *q = 2 does not make
> anything dead, nor does i = 3 unless we were to isolate the THEN/ELSE
> blocks.
>
> On a system where *0 traps, there is no way to observe the value of "i" in
> the handler.  Thus i = 1 is a dead store.  I believe we must keep the *q = 2
> store because it can trigger a signal/exception which is itself an
> observable side effect?  Right?

But writing to 0 invokes undefined behavior which we have no obligation to
preserve (unless we make it well-defined with -fnon-call-exceptions -fexceptions
as a GCC extension).

>>
>> we remove all stores but the last store to i and the load from q (but we
>> don't replace q with  here, a missed optimization if removing the other
>> stores is valid).
>
> But if we remove the *q = 2 store, we remove an observable side effect, the
> trap/exception itself if we reach that statement via the ELSE path.

As said above - I don't think we have to care for C/C++ w/o
-fnon-call-exceptions.

Richard.

>
> Jeff

Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2017-01-09 Thread Richard Biener

On Sat, Jan 7, 2017 at 1:54 PM, Aldy Hernandez  wrote:
> On 01/04/2017 07:11 AM, Richard Biener wrote:
>>
>> On Tue, Jan 3, 2017 at 6:36 PM, Aldy Hernandez  wrote:
>>>
>>> On 12/20/2016 09:16 AM, Richard Biener wrote:


 On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez 
 wrote:
>
>
> Hi folks.
>
> This is a follow-up on Jeff and Richi's interaction on the aforementioned PR
> here:
>
> https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html
>
> I decided to explore the idea of analyzing may-undefness on-demand, which
> actually looks rather cheap.
>
> BTW, I don't understand why we don't have auto_bitmap's, as we already have
> auto_sbitmap's.  I've implemented the former based on auto_sbitmap's code we
> already have.
>
> The attached patch fixes the bug without introducing any regressions.
>
> I also tested the patch by compiling 242 .ii files with -O3.  These were
> gathered from a stage1 build with -save-temps.  There is a slight time
> degradation of 4 seconds within 27 minutes of user time:
>
> tainted:26:52
> orig:   26:48
>
> This was the average aggregate time of two runs compiling all 242 .ii files.
> IMO, this looks reasonable.  It is, after all, -O3.  Is it acceptable?



 +  while (!worklist.is_empty ())
 +{
 +  name = worklist.pop ();
 +  gcc_assert (TREE_CODE (name) == SSA_NAME);
 +
 +  if (ssa_undefined_value_p (name, true))
 +   return true;
 +
 +  bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));

 it should already be set, as we use visited_ssa as "was it ever on the worklist",
 so maybe renaming it would be a good thing as well.
>>>
>>>
>>>
>>> I don't understand what you would prefer here.
>>
>>
>> Set the bit when you put name on the worklist (and do not do that if the
>> bit is set).  Thus simply remove the above and add a bitmap_set_bit
>> for the initial name you put on the worklist.
>>

 + if (TREE_CODE (name) == SSA_NAME)
 +   {
 + /* If an SSA has already been seen, this may be a loop.
 +Fail conservatively.  */
 + if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
 +   return false;

 so to me "conservative" is returning true, not false.
>>>
>>>
>>>
>>> OK
>>>

 + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
 + worklist.safe_push (name);

 but for loops we can just continue and ignore this use.  And bitmap_set_bit
 returns whether it set a bit, thus

 if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)))
   worklist.safe_push (name);

 should work?
>>>
>>>
>>>
>>> Fixed.
>>>

 +  /* Check that any SSA names used to define NAME is also fully
 +defined.  */
 +  use_operand_p use_p;
 +  ssa_op_iter iter;
 +  FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
 +   {
 + name = USE_FROM_PTR (use_p);
 + if (TREE_CODE (name) == SSA_NAME)

 always true.

 +   {
 + /* If an SSA has already been seen, this may be a loop.
 +Fail conservatively.  */
 + if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
 +   return false;
 + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
 + worklist.safe_push (name);

 See above.

 In principle the thing is sound but I'd like to be able to pass in a bitmap of
 known maybe-undefined/must-defined SSA names to have a cache for
 multiple invocations from within a pass (like this unswitching case).
>>>
>>>
>>>
>>> Done, though bitmaps are now calculated as part of the instantiation.
>>>

 Also once you hit defs that are in a post-dominated region of the loop entry
 you can treat them as not undefined (as their use invokes undefined
 behavior).  This is also how you treat function parameters (well,
 ssa_undefined_value_p does), where the call site invokes undefined behavior
 when passing in undefined values.  So we need an extra parameter specifying
 the post-dominance region.
>>>
>>>
>>>
>>> Done.
>>>

 You do not handle memory or calls conservatively, which means the existing
 testcase only needs some obfuscation to become a problem again.  To fix
 that, before /* Check that any SSA names used to define NAME is also fully
 defined.  */ bail out conservatively, like

if (! is_gimple_assign (def)
   ||

Re: [PATCH] Fix up vectorizable_condition for comparisons of 2 booleans (PR tree-optimization/78938)

2017-01-09 Thread Richard Biener

On Thu, 5 Jan 2017, Jakub Jelinek wrote:

> Hi!
> 
> As mentioned in the PR, while vectorizable_comparison has code to deal
> with comparison of 2 booleans by transforming those into one or two
> BIT_*_EXPR operations that work equally well on normal vectors as well
> as the AVX512 bitset masks, vectorizable_condition lacks that and we
> ICE during expansion because of that.
> The following patch teaches vectorizable_condition to do that too.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Richard.

> 2017-01-05  Jakub Jelinek  
> 
>   PR tree-optimization/78938
>   * tree-vect-stmts.c (vectorizable_condition): For non-masked COND_EXPR
>   where comp_vectype is VECTOR_BOOLEAN_TYPE_P, use
>   BIT_{NOT,XOR,AND,IOR}_EXPR on the comparison operands instead of
>   {EQ,NE,GE,GT,LE,LT}_EXPR directly inside of VEC_COND_EXPR.  Formatting
>   fixes.
> 
>   * gcc.dg/vect/pr78938.c: New test.
> 
> --- gcc/tree-vect-stmts.c.jj  2017-01-01 12:45:39.0 +0100
> +++ gcc/tree-vect-stmts.c 2017-01-05 15:54:41.075218409 +0100
> @@ -7731,7 +7731,8 @@ vectorizable_condition (gimple *stmt, gi
>  {
>tree scalar_dest = NULL_TREE;
>tree vec_dest = NULL_TREE;
> -  tree cond_expr, then_clause, else_clause;
> +  tree cond_expr, cond_expr0 = NULL_TREE, cond_expr1 = NULL_TREE;
> +  tree then_clause, else_clause;
>stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>tree comp_vectype = NULL_TREE;
>tree vec_cond_lhs = NULL_TREE, vec_cond_rhs = NULL_TREE;
> @@ -7741,7 +7742,7 @@ vectorizable_condition (gimple *stmt, gi
>    loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>    enum vect_def_type dt, dts[4];
>    int ncopies;
> -  enum tree_code code;
> +  enum tree_code code, cond_code, bitop1 = NOP_EXPR, bitop2 = NOP_EXPR;
>    stmt_vec_info prev_stmt_info = NULL;
>    int i, j;
>    bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
> @@ -7825,11 +7826,76 @@ vectorizable_condition (gimple *stmt, gi
>    if (vec_cmp_type == NULL_TREE)
>      return false;
>  
> +  cond_code = TREE_CODE (cond_expr);
> +  if (!masked)
> +    {
> +      cond_expr0 = TREE_OPERAND (cond_expr, 0);
> +      cond_expr1 = TREE_OPERAND (cond_expr, 1);
> +    }
> +
> +  if (!masked && VECTOR_BOOLEAN_TYPE_P (comp_vectype))
> +    {
> +      /* Boolean values may have another representation in vectors
> +         and therefore we prefer bit operations over comparison for
> +         them (which also works for scalar masks).  We store opcodes
> +         to use in bitop1 and bitop2.  Statement is vectorized as
> +         BITOP2 (rhs1 BITOP1 rhs2) or rhs1 BITOP2 (BITOP1 rhs2)
> +         depending on bitop1 and bitop2 arity.  */
> +      switch (cond_code)
> +        {
> +        case GT_EXPR:
> +          bitop1 = BIT_NOT_EXPR;
> +          bitop2 = BIT_AND_EXPR;
> +          break;
> +        case GE_EXPR:
> +          bitop1 = BIT_NOT_EXPR;
> +          bitop2 = BIT_IOR_EXPR;
> +          break;
> +        case LT_EXPR:
> +          bitop1 = BIT_NOT_EXPR;
> +          bitop2 = BIT_AND_EXPR;
> +          std::swap (cond_expr0, cond_expr1);
> +          break;
> +        case LE_EXPR:
> +          bitop1 = BIT_NOT_EXPR;
> +          bitop2 = BIT_IOR_EXPR;
> +          std::swap (cond_expr0, cond_expr1);
> +          break;
> +        case NE_EXPR:
> +          bitop1 = BIT_XOR_EXPR;
> +          break;
> +        case EQ_EXPR:
> +          bitop1 = BIT_XOR_EXPR;
> +          bitop2 = BIT_NOT_EXPR;
> +          break;
> +        default:
> +          return false;
> +        }
> +      cond_code = SSA_NAME;
> +    }
> +
>    if (!vec_stmt)
>      {
>        STMT_VINFO_TYPE (stmt_info) = condition_vec_info_type;
> +      if (bitop1 != NOP_EXPR)
> +        {
> +          machine_mode mode = TYPE_MODE (comp_vectype);
> +          optab optab;
> +
> +          optab = optab_for_tree_code (bitop1, comp_vectype, optab_default);
> +          if (!optab || optab_handler (optab, mode) == CODE_FOR_nothing)
> +            return false;
> +
> +          if (bitop2 != NOP_EXPR)
> +            {
> +              optab = optab_for_tree_code (bitop2, comp_vectype,
> +                                           optab_default);
> +              if (!optab || optab_handler (optab, mode) == CODE_FOR_nothing)
> +                return false;
> +            }
> +        }
>        return expand_vec_cond_expr_p (vectype, comp_vectype,
> -                                     TREE_CODE (cond_expr));
> +                                     cond_code);
>      }
>  
>    /* Transform.  */
> @@ -7858,11 +7924,11 @@ vectorizable_condition (gimple *stmt, gi
> auto_vec vec_defs;
>  
> if (masked)
> -   ops.safe_push (cond_expr);
> + ops.safe_push (cond_expr);
> else
>   {
> -   ops.safe_push (TREE_OPERAND (cond_expr, 0));
> -   ops.safe_push (TREE_OPERAND (cond_expr, 1));
> +   ops.safe_push (cond_expr0);
> +   ops.safe_push (cond_expr1);
>   }
>ops.safe_push

Re: Implement -Wduplicated-branches (PR c/64279) (v3)

2017-01-09 Thread Marek Polacek

On Thu, Jan 05, 2017 at 04:41:28PM +0100, Jakub Jelinek wrote:
> On Thu, Jan 05, 2017 at 04:39:40PM +0100, Marek Polacek wrote:
> > Coming back to this...
> 
> > > Right, after h0 == h1 is missing && operand_equal_p (thenb, elseb, 0)
> > > or so (the exact last operand needs to be figured out).
> > > OEP_ONLY_CONST is certainly wrong, we want the same VAR_DECLs to mean the
> > > same thing.  0 is a tiny bit better, but still it will give up on e.g. pure
> > > and other calls.  OEP_PURE_SAME is tiny bit better than that, but still
> > > calls with the same arguments to the same function will not be considered
> > > equal, plus likely operand_equal_p doesn't handle STATEMENT_LIST etc.
> > > So maybe we need another OEP_* mode for this.
> > 
> > Yea, if I add "&& operand_equal_p (thenb, elseb, 0)" then this warning doesn't
> > trigger for certain cases, such as MODIFY_EXPR, RETURN_EXPR, probably
> > STATEMENT_LIST and others.  So I suppose I could introduce a new OEP_ mode for
> > this (names?  OEP_EXTENDED?) and then in operand_equal_p in case tcc_expression
> > do
> > 
> >   case MODIFY_EXPR:
> > if (flags & OEP_EXTENDED)
> >   // compare LHS and RHS of both
> >  
> > ?
> 
> Yeah.  Not sure what is the best name for that.  Maybe Richi has some clever
> ideas.

Here it is.  The changes in operand_equal_p should only trigger with the new
OEP_LEXICOGRAPHIC, and given the macro location issue, the warning isn't yet
enabled by either -Wall or -Wextra, so all of this should be safe.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-01-09  Marek Polacek  

PR c/64279
* c-common.h (do_warn_duplicated_branches_r): Declare.
* c-gimplify.c (c_genericize): Walk the function tree calling
do_warn_duplicated_branches_r.
* c-warn.c (expr_from_macro_expansion_r): New.
(do_warn_duplicated_branches): New.
(do_warn_duplicated_branches_r): New.
* c.opt (Wduplicated-branches): New option.

* c-typeck.c (build_conditional_expr): Warn about duplicated branches.

* call.c (build_conditional_expr_1): Warn about duplicated branches.
* semantics.c (finish_expr_stmt): Build statement using the proper
location.

* doc/invoke.texi: Document -Wduplicated-branches.
* fold-const.c (operand_equal_p): Handle MODIFY_EXPR, INIT_EXPR,
COMPOUND_EXPR, PREDECREMENT_EXPR, PREINCREMENT_EXPR,
POSTDECREMENT_EXPR, POSTINCREMENT_EXPR, CLEANUP_POINT_EXPR, EXPR_STMT,
STATEMENT_LIST, and RETURN_EXPR.  For non-pure non-const functions
return 0 only when not OEP_LEXICOGRAPHIC.
(fold_build_cleanup_point_expr): Use the expression
location when building CLEANUP_POINT_EXPR.
* tree-core.h (enum operand_equal_flag): Add OEP_LEXICOGRAPHIC.
* tree.c (add_expr): Handle error_mark_node.

* c-c++-common/Wduplicated-branches-1.c: New test.
* c-c++-common/Wduplicated-branches-10.c: New test.
* c-c++-common/Wduplicated-branches-11.c: New test.
* c-c++-common/Wduplicated-branches-12.c: New test.
* c-c++-common/Wduplicated-branches-2.c: New test.
* c-c++-common/Wduplicated-branches-3.c: New test.
* c-c++-common/Wduplicated-branches-4.c: New test.
* c-c++-common/Wduplicated-branches-5.c: New test.
* c-c++-common/Wduplicated-branches-6.c: New test.
* c-c++-common/Wduplicated-branches-7.c: New test.
* c-c++-common/Wduplicated-branches-8.c: New test.
* c-c++-common/Wduplicated-branches-9.c: New test.
* c-c++-common/Wimplicit-fallthrough-7.c: Coalesce dg-warning.
* g++.dg/cpp0x/lambda/lambda-switch.C: Move dg-warning.
* g++.dg/ext/builtin-object-size3.C: Likewise.
* g++.dg/gomp/loop-1.C: Likewise.
* g++.dg/warn/Wduplicated-branches1.C: New test.
* g++.dg/warn/Wduplicated-branches2.C: New test.

diff --git gcc/c-family/c-common.h gcc/c-family/c-common.h
index b838869..06918db 100644
--- gcc/c-family/c-common.h
+++ gcc/c-family/c-common.h
@@ -1537,6 +1537,7 @@ extern void maybe_warn_bool_compare (location_t, enum tree_code, tree, tree);
 extern bool maybe_warn_shift_overflow (location_t, tree, tree);
 extern void warn_duplicated_cond_add_or_warn (location_t, tree, vec<tree> **);
 extern bool diagnose_mismatched_attributes (tree, tree);
+extern tree do_warn_duplicated_branches_r (tree *, int *, void *);

 /* In c-attribs.c.  */
 extern bool attribute_takes_identifier_p (const_tree);
diff --git gcc/c-family/c-gimplify.c gcc/c-family/c-gimplify.c
index c327ca7..57edb41 100644
--- gcc/c-family/c-gimplify.c
+++ gcc/c-family/c-gimplify.c
@@ -125,6 +125,10 @@ c_genericize (tree fndecl)
 );
 }

+  if (warn_duplicated_branches)
+    walk_tree_without_duplicates (&DECL_SAVED_TREE (fndecl),
+                                  do_warn_duplicated_branches_r, NULL);
+
   /* Dump the C-specific tree IR.  */

Re: [PATCH] Outer vs. inner loop ifcvt (PR tree-optimization/78899)

2017-01-09 Thread Richard Biener

On Mon, 9 Jan 2017, Jakub Jelinek wrote:

> On Sat, Jan 07, 2017 at 12:46:26PM +0100, Richard Biener wrote:
> > > The following patch tweaks tree-if-conv.c so that when it wants to version
> > > an outer loop, it actually transforms:
> > >   loop1
> > >     loop2
> > > into:
> > >   if (LOOP_VECTORIZED (1, 3))
> > >     {
> > >       loop1
> > >         loop2
> > >     }
> > >   else
> > >     loop3 (copy of loop1)
> > >       if (LOOP_VECTORIZED (4, 5))
> > >         loop4 (copy of loop2)
> > >       else
> > >         loop5 (copy of loop4)
> > 
> > Huh.  Why isn't the else case equal to the if case for the vectorizer?
> > That is, we have the inner loop if-converted and thus for the if case
> > either outer or inner loop vectorization should be possible.
> > 
> > So - why doesn't it work that way?
> 
> The patch is a consequence of the r242520 changes:
> http://gcc.gnu.org/ml/gcc-patches/2016-11/msg01541.html
> Previously, we had
>   loop1
>     loop2
> transformed into:
>   loop1
>     if (LOOP_VECTORIZED (2, 3))
>       loop2
>     else
>       loop3 (copy of loop2)
> (if we actually find if-conversion of loop2 desirable, e.g. there are masked
> loads/stores etc.).  loop2 is if-converted after the versioning.
> This works well if the inner loop is vectorized, but doesn't work at all for
> (in fact, prevents) outer loop vectorization.
> Now, with r242520, it is transformed into:
>   if (LOOP_VECTORIZED (1, 3))
>     {
>       loop1
>         loop2
>     }
>   else
>     loop3 (copy of loop1)
>       loop4 (copy of loop2)
> if if-conversion thinks outer loop vectorization might be successful.
> In this case, loop2 is if-converted.  This works well if the outer loop
> versioning is subsequently successful, but doesn't work at all if it is
> unsuccessful: loop2 itself isn't guarded by LOOP_VECTORIZED, so when we are
> vectorizing it, we use loop2 itself as its scalar loop, which will contain
> MASK_LOAD/MASK_STORE etc. that we then can't expand.  Also, as loop1 isn't
> vectorized, LOOP_VECTORIZED (1, 3) is folded into false and thus we are
> effectively vectorizing loop2 in dead code; loop3/loop4 will be used
> instead.  loop3 is marked as dont_vectorize, so we don't try to vectorize
> it; loop4 isn't, so it might be vectorized, but only if no if-conversion is
> needed for it (but tree-if-conversion has determined that it is needed).
> With my patch, we have instead:
>   if (LOOP_VECTORIZED (1, 3))
>     {
>       loop1
>         loop2
>     }
>   else
>     loop3 (copy of loop1)
>       if (LOOP_VECTORIZED (4, 5))
>         loop4 (copy of loop2)
>       else
>         loop5 (copy of loop4)
> loop2 and loop4 are if-converted, so either outer loop vectorization of
> loop1 succeeds, in which case we use loop1/loop2 as the vectorized loop
> and loop3/loop5 as the corresponding scalar loop, or it fails, in which
> case we use the non-vectorized loop3, either with a successfully vectorized
> loop4 as its inner loop (loop5 being the corresponding scalar_loop for
> epilogues, versioning for alignment etc.), or we fail to vectorize anything
> and end up with scalar loop3/loop5.

But that causes even more versioning (plus redundant if-conversion).

> Without the patch I've posted, there are some remaining options, but those
> would mean a large amount of work in the loop manipulation code etc.
> 
> One option is to keep r242520; the problem then is that:
> 1) we would need to defer folding LOOP_VECTORIZED (1, 3) into false when
> outer loop vectorization failed, if there is still possible inner loop
> vectorization (not that difficult)

Yeah, it seems r242520 failed to address that.  We just assumed this
worked...  heh.  Testsuite coverage seems to be low here.

> 2) we'd need to use loop4 as the scalar_loop for the vectorization of
> loop2, but that loop is not adjacent to the vectorized loop, so we'd need
> to somehow transform all the SSA_NAMEs that might be affected by that
> different placement (as if all the loop3 PHIs were loop1 PHIs instead,
> and deal with the SSA_NAMEs set in loop4 and used outside of loop4 as
> if they were the ones in loop2 instead); this is the hard part, and I'm not
> really enthusiastic about writing it

We always use the special scalar_loop if we have the loop_vectorized
guard, right?  Which is of course good, and required for masked
loads/stores.

I see how this looks somewhat unfortunate.

> Another option is to revert r242520, and then do something for the outer loop
> vectorization.  Right now we expect a certain fixed form (5 basic blocks in
> the outer loop, lots of assumptions about the cfg of that; I don't know
> everywhere it is hardcoded).  We'd also need to allow a 7+ basic block form,
> where one of the extra loops is just if LOOP_VECTORIZED (x, y), then there is

Re: [PATCH] Outer vs. inner loop ifcvt (PR tree-optimization/78899)

2017-01-09 Thread Jakub Jelinek

On Sat, Jan 07, 2017 at 12:46:26PM +0100, Richard Biener wrote:
> > The following patch tweaks tree-if-conv.c so that when it wants to version
> > an outer loop, it actually transforms:
> >   loop1
> >     loop2
> > into:
> >   if (LOOP_VECTORIZED (1, 3))
> >     {
> >       loop1
> >         loop2
> >     }
> >   else
> >     loop3 (copy of loop1)
> >       if (LOOP_VECTORIZED (4, 5))
> >         loop4 (copy of loop2)
> >       else
> >         loop5 (copy of loop4)
> 
> Huh.  Why isn't the else case equal to the if case for the vectorizer?  That
> is, we have the inner loop if-converted and thus for the if case either outer
> or inner loop vectorization should be possible.
> 
> So - why doesn't it work that way?

The patch is a consequence of the r242520 changes:
http://gcc.gnu.org/ml/gcc-patches/2016-11/msg01541.html
Previously, we had
  loop1
    loop2
transformed into:
  loop1
    if (LOOP_VECTORIZED (2, 3))
      loop2
    else
      loop3 (copy of loop2)
(if we actually find if-conversion of loop2 desirable, e.g. there are masked
loads/stores etc.).  loop2 is if-converted after the versioning.
This works well if the inner loop is vectorized, but doesn't work at all for
(in fact, prevents) outer loop vectorization.
Now, with r242520, it is transformed into:
  if (LOOP_VECTORIZED (1, 3))
    {
      loop1
        loop2
    }
  else
    loop3 (copy of loop1)
      loop4 (copy of loop2)
if if-conversion thinks outer loop vectorization might be successful.
In this case, loop2 is if-converted.  This works well if the outer loop
versioning is subsequently successful, but doesn't work at all if it is
unsuccessful: loop2 itself isn't guarded by LOOP_VECTORIZED, so when we are
vectorizing it, we use loop2 itself as its scalar loop, which will contain
MASK_LOAD/MASK_STORE etc. that we then can't expand.  Also, as loop1 isn't
vectorized, LOOP_VECTORIZED (1, 3) is folded into false and thus we are
effectively vectorizing loop2 in dead code; loop3/loop4 will be used
instead.  loop3 is marked as dont_vectorize, so we don't try to vectorize
it; loop4 isn't, so it might be vectorized, but only if no if-conversion is
needed for it (but tree-if-conversion has determined that it is needed).
With my patch, we have instead:
  if (LOOP_VECTORIZED (1, 3))
    {
      loop1
        loop2
    }
  else
    loop3 (copy of loop1)
      if (LOOP_VECTORIZED (4, 5))
        loop4 (copy of loop2)
      else
        loop5 (copy of loop4)
loop2 and loop4 are if-converted, so either outer loop vectorization of
loop1 succeeds, in which case we use loop1/loop2 as the vectorized loop
and loop3/loop5 as the corresponding scalar loop, or it fails, in which
case we use the non-vectorized loop3, either with a successfully vectorized
loop4 as its inner loop (loop5 being the corresponding scalar_loop for
epilogues, versioning for alignment etc.), or we fail to vectorize anything
and end up with scalar loop3/loop5.

Without the patch I've posted, there are some remaining options, but those
would mean a large amount of work in the loop manipulation code etc.

One option is to keep r242520; the problem then is that:
1) we would need to defer folding LOOP_VECTORIZED (1, 3) into false when
outer loop vectorization failed, if there is still possible inner loop
vectorization (not that difficult)
2) we'd need to use loop4 as the scalar_loop for the vectorization of
loop2, but that loop is not adjacent to the vectorized loop, so we'd need
to somehow transform all the SSA_NAMEs that might be affected by that
different placement (as if all the loop3 PHIs were loop1 PHIs instead,
and deal with the SSA_NAMEs set in loop4 and used outside of loop4 as
if they were the ones in loop2 instead); this is the hard part, and I'm not
really enthusiastic about writing it

Another option is to revert r242520, and then do something for the outer loop
vectorization.  Right now we expect a certain fixed form (5 basic blocks in
the outer loop, lots of assumptions about the cfg of that; I don't know
everywhere it is hardcoded).  We'd also need to allow a 7+ basic block form,
where one of the extra loops is just if LOOP_VECTORIZED (x, y), then there
is an if-converted single basic block loop x, and then perhaps a loop y with
many basic blocks.  Lots of the vectorization code expects ->inner to be THE
inner loop; that would no longer be the case, as it would be either the
scalar or the vector loop, etc.

To me, my patch, at least for GCC7, looks like far less work than either of
the other options would require.  I know it has the drawback that, compared
to the other options, there is more loop copying.  In that regard, the
option to revert r242520 and deal with it would be the most effective: only
a single loop is versioned, with r242520 and without my
