Simplify contrib/gcc_update wrt. Java

2017-01-16 Thread Gerald Pfeifer
I committed this straightforward simplification, and while I was 
at it also adjusted the copyright dates.

Gerald

2017-01-17  Gerald Pfeifer  

* gcc_update: Remove entries related to GCJ and libgcj.
Complete copyright years and adjust my e-mail address.

Index: gcc_update
===
--- gcc_update  (revision 244516)
+++ gcc_update  (working copy)
@@ -4,9 +4,9 @@
 # repository, with an emphasis on treating generated files correctly, so
 # that autoconf, gperf et al are not required for the ``end'' user.
 #
-# (C) 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2009, 2010,
-# 2011 Free Software Foundation
-# Originally by Gerald Pfeifer , August 1998.
+# (C) 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2009, 
+# 2010, 2011, 2012, 2013, 2014, 2015, 2017 Free Software Foundation
+# Originally by Gerald Pfeifer , August 1998.
 #
 # This script is Free Software, and it can be copied, distributed and
 # modified as defined in the GNU General Public License.  A copy of
@@ -97,7 +97,6 @@
 gcc/config/tilepro/mul-tables.c: gcc/config/tilepro/gen-mul-tables.cc
 # And then, language-specific files
 gcc/cp/cfns.h: gcc/cp/cfns.gperf
-gcc/java/keyword.h: gcc/java/keyword.gperf
 # testsuite
 # Without this, _Pragma3.c can have a false negative.
 gcc/testsuite/gcc.dg/cpp/_Pragma3.c: gcc/testsuite/gcc.dg/cpp/mi1c.h
@@ -131,13 +130,6 @@
 libquadmath/aclocal.m4: libquadmath/configure.ac libquadmath/acinclude.m4
 libquadmath/Makefile.in: libquadmath/Makefile.am libquadmath/configure.ac 
libgfortran/aclocal.m4
 libgfortran/configure: libgfortran/configure.ac libgfortran/aclocal.m4
-libjava/aclocal.m4: libjava/configure.ac
-libjava/Makefile.in: libjava/Makefile.am libjava/configure.ac 
libjava/aclocal.m4
-libjava/configure: libjava/configure.ac libjava/aclocal.m4
-libjava/libltdl/aclocal.m4: libjava/libltdl/configure.ac 
libjava/libltdl/acinclude.m4
-libjava/libltdl/Makefile.in: libjava/libltdl/Makefile.am 
libjava/libltdl/configure.ac libjava/libltdl/aclocal.m4
-libjava/libltdl/configure: libjava/libltdl/configure.ac 
libjava/libltdl/aclocal.m4
-libjava/libltdl/config-h.in: libjava/libltdl/configure.ac 
libjava/libltdl/aclocal.m4
 libcpp/aclocal.m4: libcpp/configure.ac
 libcpp/Makefile.in: libcpp/configure.ac libcpp/aclocal.m4
 libcpp/configure: libcpp/configure.ac libcpp/aclocal.m4


Re: [PATCH] avoid calling memset et al. with excessively large sizes (PR 79095)

2017-01-16 Thread Jakub Jelinek
On Mon, Jan 16, 2017 at 05:06:40PM -0700, Martin Sebor wrote:
> The test case submitted in bug 79095 - [7 regression] spurious
> stringop-overflow warning shows that GCC optimizes some loops
> into calls to memset with size arguments in excess of the object
> size limit.  Since such calls will unavoidably lead to a buffer
> overflow and memory corruption the attached patch detects them
> and replaces them with a trap.  That both prevents the buffer
> overflow and eliminates the warning.

I fear this is going to break various 32-bit database programs and similar
that mmap say 3GB of RAM and then work on that memory chunk as contiguous.
Some things don't work too well in that case (pointer differences), but it
is unlikely they would be using those, while your patch actively breaks it
even for loops that can be transformed into memset (memcpy of course isn't a
problem, because you need some virtual address space to copy it from).

And as written in the PR, IMNSHO the warning should not be enabled by
default at its current verboseness and false positive rate.

Jakub


Re: [PATCH] Tree-level fix for PR 69526

2017-01-16 Thread Robin Dapp
Ping.

To put it shortly, I'm not sure how to differentiate between:

example range of a: [3,3]
(ulong)(a + UINT_MAX) + 1 --> (ulong)(a) + (ulong)(-1 + 1), sign extend

example range of a: [0,0]
(ulong)(a + UINT_MAX) + 1 --> (ulong)(a) + (ulong)(UINT_MAX + 1), no
sign extend

In this case, there is an "ok" overflow in the first example's range and
no overflow in the second, but I don't think this suffices as criterion.
As said, your proposed canonicalization (" unsigned X plus CST to
unsigned X minus CST' if CST' has a smaller absolute value than CST)
might help here, but I'm unsure how to do it currently.



[java,i386] Remove LIBGCJ_SONAME for cygwin and mingw32

2017-01-16 Thread Gerald Pfeifer
With libgcj gone, no need to define LIBGCJ_SONAME any more in any
of the ports.

Applied.

Gerald

2017-01-17  Gerald Pfeifer  

* config/i386/cygwin.h (LIBGCJ_SONAME): Remove.
* config/i386/mingw32.h (LIBGCJ_SONAME): Remove.

Index: config/i386/cygwin.h
===
--- config/i386/cygwin.h(revision 244516)
+++ config/i386/cygwin.h(working copy)
@@ -153,6 +153,3 @@
 #endif
 #define LIBGCC_SONAME "cyggcc_s" LIBGCC_EH_EXTN "-1.dll"
 
-/* We should find a way to not have to update this manually.  */
-#define LIBGCJ_SONAME "cyggcj" /*LIBGCC_EH_EXTN*/ "-16.dll"
-
Index: config/i386/mingw32.h
===
--- config/i386/mingw32.h   (revision 244516)
+++ config/i386/mingw32.h   (working copy)
@@ -252,5 +252,3 @@
 #endif
 #define LIBGCC_SONAME "libgcc_s" LIBGCC_EH_EXTN "-1.dll"
 
-/* We should find a way to not have to update this manually.  */
-#define LIBGCJ_SONAME "libgcj" /*LIBGCC_EH_EXTN*/ "-16.dll"


Re: [PATCH 2/6] RISC-V Port: gcc

2017-01-16 Thread Palmer Dabbelt
On Sat, 14 Jan 2017 02:05:27 PST (-0800), mer...@debian.org wrote:
> Palmer Dabbelt wrote:
>
>> diff --git a/gcc/config/riscv/linux.h b/gcc/config/riscv/linux.h
>> new file mode 100644
>> index 000..045f6cc
>> --- /dev/null
>> +++ b/gcc/config/riscv/linux.h
>> [...]
>>  +#define GLIBC_DYNAMIC_LINKER "/lib" XLEN_SPEC "/" ABI_SPEC "/ld.so.1"
>
> [with XLEN_SPEC being either 32 or 64 and ABI_SPEC being one of
>  ilp32, ilp32f, ilp32d, lp64, lp64f, lp64d]
>
> Hello everybody,
>
> I am not fully happy with the way the dynamic linker path (which
> gets embedded into every Linux executable built by gcc and
> therefore cannot be changed later) is defined here.  The dynamic
> linker path must be unique over all platforms for which a Linux
> port exists to make multiarch installations (i.e. having
> dynamically linked binaries for multiple architectures/ABIs in
> the same root filesystem) possible.  The path specifier as cited
> above contains nothing that makes the linker path inherently
> specific to RISC-V.  While there is AFAIK no other architecture
> that currently uses exactly this specific linker path model with
> the ABI specifier as a separate subdirectory (instead of encoding
> it into the filename), so that there technically isn't a naming
> conflict, I think that we should follow the convention of the
> other "modern" Linux architectures, which all include the
> architecture name in their linker path:
>
>   * arm64:/lib/ld-linux-aarch64.so.1
>   * armhf:/lib/ld-linux-armhf.so.3
>   * ia64: /lib/ld-linux-ia64.so.2
>   * mips n64: /lib64/ld-linux-mipsn8.so.1
>   * nios2:/lib/ld-linux-nios2.so.1
>   * x86_64:   /lib64/ld-linux-x86-64.so.2
>
> So the actual ld.so binary should be called something like
> "ld-linux-rv.so.1" instead of just "ld.so.1". With everything
> else staying the same, that would give us a dynamic linker path
> along the lines of "/lib64/lp64f/ld-linux-rv.so.1" for an RV64G
> system.
>
> Changing the linker path is of course an incompatible change, but
> RISC-V gcc support as submitted upstream now already incoporates
> a number of incompatible changes to the previous development
> versions (including a different dynamic linker path than older
> riscv-gcc versions), so moving from the previous development
> versions to upstream results in incompatible binaries anyway,
> therefore IMHO now is the time to get these things sorted out
> once for all.

I'm perfectly fine with this approach.  We have a policy of not maintaining ABI
compatibility until code is upstream and released (so we'll be ABI compatible
in binutils when 2.28 comes out soon) because we want to fix these sorts of
issues the right way rather than having being stuck with something bad forever.

Just to be clear, the paths you'd like would look exactly like

  rv32-ilp32: /lib32/ilp32/ld-linux-rv.so.1
  rv64-lp64d: /lib64/lp64d/ld-linux-rv.so.1

?  If so, that should be a pretty straight-forward change.  I'll incorporate it
into our v2 patchset.  I'd also be OK with something like
"/lib64/lp64d/ld-linux-rv64imafd-lp64d.so.1", if for some reason that's better
(it looks a bit more like the other architectures to me).  I'm really OK with
pretty much anything here, so feel free to offer suggestions -- otherwise I'll
just go with what's listed above.

Thanks!


Re: [PATCH 4/6] RISC-V Port: libsanitizer

2017-01-16 Thread Palmer Dabbelt
On Thu, 12 Jan 2017 15:39:54 PST (-0800), jos...@codesourcery.com wrote:
> Have these changes been sent upstream?  Although at the present
> development stage applying selected changes might be better than a bulk
> merge from upstream libsanitizer, they should still go upstream so they
> aren't a local patch at the time of the next merge.

Sorry, I didn't know there was an upstream to this.  I went to send this
upstream to the LLVM compiler-rt repo, but I think this port is actually
broken: it turns out we hadn't built it in a while (which is why the double-&&
wasn't noticed, it must have come from a rebase) and it looks like we don't
quite have a sane set of Linux headers right now so we don't quite match up
with the SANITIZER_USES_CANONICAL_LINUX_SYSCALLS macro any more.

I think it'd be best if we just drop this for now and wait for our Linux port
to stabilize a bit (which we're planning on doing after the GCC code reviews).
There's a bit too much stuff in flight for me to be able to keep track of it
all right now.

Sorry!

Also, thanks for finding the "&& &&" bug -- I hadn't noticed we weren't
compiling libsanitizer any more.


[patch committed] [SH] Fix PR target/78633

2017-01-16 Thread Kaz Kojima
Hi,

I've applied the quick fix below for PR target/78633 which results
a build failure on the target.  Tested on sh4-unknown-linux-gnu.

Regards,
kaz
--
2017-01-17  Kaz Kojima  

PR target/78633
* config/sh/sh.md (cmpeqsi_t+1): Call copy_rtx to avoid invalid
RTL sharing.

diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index c6956a0..2645fca 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -858,7 +858,8 @@
 operands of the tstsi_t insn, which is generally the case.  */
   if (dump_file)
fprintf (dump_file, "cmpeqsi_t: replacing with tstsi_t\n");
-  emit_insn (gen_tstsi_t (XEXP (op.set_src, 0), XEXP (op.set_src, 1)));
+  emit_insn (gen_tstsi_t (copy_rtx (XEXP (op.set_src, 0)),
+ copy_rtx (XEXP (op.set_src, 1;
   DONE;
 }
 


Re: PR79066, non-PIC code generated for powerpc glibc with -fpic

2017-01-16 Thread Alan Modra
On Mon, Jan 16, 2017 at 01:49:36PM -0600, Segher Boessenkool wrote:
> On Mon, Jan 16, 2017 at 03:50:01PM +1030, Alan Modra wrote:
> > > > > Okay for trunk if there is nothing unexpected.  Thanks!
> > > > 
> > > > I guess I should at least build glibc.
> > > 
> > > Yes exactly, something big that uses pic -- it is pretty obvious it won't
> > > change anything for non-pic.
> > 
> > glibc built fine without elf_high/elf_low with no regressions in its
> > testsuite.
> 
> I meant comparing the generated code, sorry.

libc.so had differences in debug sections due to libgcc source paths
being different, but other than that there were no differences.

> > However, there is a problem in rs6000_emit_allocate_stack
> > -fstack-limit-symbol=SYMBOL code.  This now might ICE if someone tries
> > to use the option with -fpic/PIC.  I reckon the option combination to
> > be little used, so it shouldn't hurt to disable -fstack-limit-symbol
> > for PIC.  (We were generating non-PIC for the trap, so we probably
> > would have gotten a complaint about text relocs in shared libraries.)
> > 
> > This revised patch has been bootstrapped and regression tested as
> > before, and tested with glibc too.  OK?
> > 
> > PR target/79066
> > * config/rs6000/rs6000.md (elf_high, elf_low): Disable when pic.
> > * config/rs6000/rs6000.c (rs6000_emit_allocate_stack): Don't allow
> > symbolic stack limit when pic.
> > testsuite/
> > * gcc.target/powerpc/pr79066.c: New.
> > 
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index 11394b2..2dd6bbe 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -27668,7 +27668,8 @@ rs6000_emit_allocate_stack (HOST_WIDE_INT size, rtx 
> > copy_reg, int copy_off)
> > }
> >else if (GET_CODE (stack_limit_rtx) == SYMBOL_REF
> >&& TARGET_32BIT
> > -  && DEFAULT_ABI == ABI_V4)
> > +  && DEFAULT_ABI == ABI_V4
> > +  && !flag_pic)
> > {
> >   rtx toload = gen_rtx_CONST (VOIDmode,
> >   gen_rtx_PLUS (Pmode,
> 
> Please at least make it sorry() for ABI_V4 && flag_pic.  Or what does it
> result in with the patch as-is?

I think we're OK as is.  A warning is emitted "stack limit expression
is not supported".

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] Make multiple_target.c aware of LTO (PR lto/66295)

2017-01-16 Thread Jan Hubicka
> Hello.
> 
> Not being expert in multi_target area, however it consists of 2 passes. The 
> first
> one (ipa_target_clone) is responsible for creation of multiple targets for 
> functions
> decorated with __attribute__((target_clones("xxx"))). I guess the pass should 
> be
> called just in LGEN phase and consecutive clone materialization takes care of 
> these
> clones. I'm also adding lto test-case.

It is declared as SIMPLE_IPA_PASS. Then it should either run early (i.e. be in
the queue after pass_ipa_lower_emutls (to run at compile stage) or before 
pass_ipa_pta
(to run at lgen stage).

If it really needs to run in the middle of IPA pass queue, then it should be
declared as IPA_PASS and run at WPA only.

Honza
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> mvc test-cases work find on x86_64-linux-gnu.
> 
> Ready to be installed?
> Martin

> >From 7ec7045680e10838c43b2713a4fa34b205ba5004 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Fri, 13 Jan 2017 15:26:45 +0100
> Subject: [PATCH] Make multiple_target.c aware of LTO (PR lto/66295)
> 
> gcc/ChangeLog:
> 
> 2017-01-13  Martin Liska  
> 
>   PR lto/66295
>   * multiple_target.c (pass_target_clone::gate): Run the pass
>   just in LGEN (in LTO mode).
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-01-13  Martin Liska  
> 
>   PR lto/66295
>   * gcc.target/i386/mvc9.c: New test.
> ---
>  gcc/multiple_target.c|  2 +-
>  gcc/testsuite/gcc.target/i386/mvc9.c | 29 +
>  2 files changed, 30 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/mvc9.c
> 
> diff --git a/gcc/multiple_target.c b/gcc/multiple_target.c
> index 5be3980db20..3df1e297122 100644
> --- a/gcc/multiple_target.c
> +++ b/gcc/multiple_target.c
> @@ -378,7 +378,7 @@ public:
>  bool
>  pass_target_clone::gate (function *)
>  {
> -  return true;
> +  return !flag_wpa && !flag_ltrans;
>  }
>  
>  } // anon namespace
> diff --git a/gcc/testsuite/gcc.target/i386/mvc9.c 
> b/gcc/testsuite/gcc.target/i386/mvc9.c
> new file mode 100644
> index 000..d510b6c18b4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/mvc9.c
> @@ -0,0 +1,29 @@
> +/* { dg-do run } */
> +/* { dg-require-ifunc "" } */
> +/* { dg-require-effective-target lto } */
> +/* { dg-options "-flto" } */
> +
> +__attribute__((target_clones("avx","arch=slm","arch=core-avx2","default")))
> +int
> +foo ()
> +{
> +  return -2;
> +}
> +
> +int
> +bar ()
> +{
> +  return 2;
> +}
> +
> +int
> +main ()
> +{
> +  int r = 0;
> +  r += bar ();
> +  r += foo ();
> +  r += bar ();
> +  r += foo ();
> +  r += bar ();
> +  return r - 2;
> +}
> -- 
> 2.11.0
> 



Re: [PATCH] fix integer overflow bugs in gimple-ssa-sprintf.c (PR 78608)

2017-01-16 Thread Martin Sebor

If the FIXME was a future thing, then this is OK with the nits fixed. If
the FIXME was a marker for something you intended to address now and
just forgot, then we either need another iteration or a follow-up patch
depending on the severity of the FIXME in your  mind.


As we discussed privately, I went ahead and committed just a minimal
patch to resolve the bug in r244511.  Most of the rest of the changes
in the patch have either already been made to resolve other bugs or
are a part of the bigger patch for bug 78703 that's under review.

I wanted to resolve the bug without making merging the bigger patch
harder than it needs to be.

Thanks
Martin


[PATCH] avoid calling memset et al. with excessively large sizes (PR 79095)

2017-01-16 Thread Martin Sebor

The test case submitted in bug 79095 - [7 regression] spurious
stringop-overflow warning shows that GCC optimizes some loops
into calls to memset with size arguments in excess of the object
size limit.  Since such calls will unavoidably lead to a buffer
overflow and memory corruption the attached patch detects them
and replaces them with a trap.  That both prevents the buffer
overflow and eliminates the warning.

Martin
PR c++/79095 - [7 regression] spurious stringop-overflow warning

gcc/ChangeLog:

	PR c++/79095
	* tree-loop-distribution.c (maybe_emit_trap): New function.
	(generate_memset_builtin): Call it.
	(generate_memcpy_builtin): Same.

gcc/testsuite/ChangeLog:

	PR c++/79095
	* g++.dg/pr79095.C: New test.

Index: gcc/testsuite/g++.dg/pr79095.C
===
--- gcc/testsuite/g++.dg/pr79095.C	(revision 0)
+++ gcc/testsuite/g++.dg/pr79095.C	(working copy)
@@ -0,0 +1,41 @@
+/* PR c++/79095 - spurious stringop-overflow warning
+   { dg-do compile }
+   { dg-options "-O3 -Wall -fdump-tree-optimized" } */
+
+typedef long unsigned int size_t;
+
+inline void
+fill (int *p, size_t n, int)
+{
+  while (n--)
+*p++ = 0;
+}
+
+struct B
+{
+  int* p0, *p1, *p2;
+
+  size_t size () const {
+return size_t (p1 - p0);
+  }
+
+  void resize (size_t n) {
+if (n > size())
+  append (n - size());
+  }
+
+  void append (size_t n)
+  {
+if (size_t (p2 - p1) >= n) {
+  fill (p1, n, 0);
+}
+  }
+};
+
+void foo (B )
+{
+  b.resize (b.size () - 1);
+}
+
+
+/* { dg-final { scan-tree-dump-not "memset" "optimized" } } */
Index: gcc/tree-loop-distribution.c
===
--- gcc/tree-loop-distribution.c	(revision 244508)
+++ gcc/tree-loop-distribution.c	(working copy)
@@ -796,6 +796,49 @@ const_with_all_bytes_same (tree val)
   return buf[0];
 }
 
+/* If NBYTES is greater than SSIZE_MAX emit a trap and return true.
+   Otherwise return false.  FNCODE identifies the built-in function
+   being generated.  */
+
+static bool
+maybe_emit_trap (gimple_stmt_iterator gsi, tree nbytes,
+		 built_in_function fncode)
+{
+  if (TREE_CODE (nbytes) != INTEGER_CST
+  || tree_int_cst_le (nbytes, TYPE_MAX_VALUE (ssizetype)))
+return false;
+
+  tree fn = builtin_decl_implicit (BUILT_IN_TRAP);
+  gimple *fn_call = gimple_build_call (fn, 0);
+  gsi_insert_after (, fn_call, GSI_CONTINUE_LINKING);
+  split_block (gimple_bb (fn_call), fn_call);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  const char *fname = 0;
+
+  switch (fncode)
+	{
+	case BUILT_IN_MEMCPY:
+	  fname = "memcpy";
+	  break;
+	case BUILT_IN_MEMMOVE:
+	  fname = "memove";
+	  break;
+	case BUILT_IN_MEMSET:
+	  fname = "memset";
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+
+  fprintf (dump_file, "generated trap for an out-of-bounds %s "
+	   "with %wu size", fname, tree_to_uhwi (nbytes));
+}
+
+  return true;
+}
+
 /* Generate a call to memset for PARTITION in LOOP.  */
 
 static void
@@ -817,6 +860,13 @@ generate_memset_builtin (struct loop *loop, partit
  partition->plus_one);
   nb_bytes = force_gimple_operand_gsi (, nb_bytes, true, NULL_TREE,
    false, GSI_CONTINUE_LINKING);
+
+  /* If the number of bytes is in excess of SSIZE_MAX avoid generating
+ the memset call that would certainly overflow and emit a trap
+ instead.  */
+  if (maybe_emit_trap (gsi, nb_bytes, BUILT_IN_MEMSET))
+return;
+
   mem = build_addr_arg_loc (loc, partition->main_dr, nb_bytes);
   mem = force_gimple_operand_gsi (, mem, true, NULL_TREE,
   false, GSI_CONTINUE_LINKING);
@@ -873,6 +923,7 @@ generate_memcpy_builtin (struct loop *loop, partit
  partition->plus_one);
   nb_bytes = force_gimple_operand_gsi (, nb_bytes, true, NULL_TREE,
    false, GSI_CONTINUE_LINKING);
+
   dest = build_addr_arg_loc (loc, partition->main_dr, nb_bytes);
   src = build_addr_arg_loc (loc, partition->secondary_dr, nb_bytes);
   if (partition->kind == PKIND_MEMCPY
@@ -881,6 +932,12 @@ generate_memcpy_builtin (struct loop *loop, partit
   else
 kind = BUILT_IN_MEMMOVE;
 
+  /* If the number of bytes is in excess of SSIZE_MAX avoid generating
+ the mem(cpy|move) call that would certainly overflow and instead
+ emit a trap.  */
+  if (maybe_emit_trap (gsi, nb_bytes, kind))
+return;
+
   dest = force_gimple_operand_gsi (, dest, true, NULL_TREE,
    false, GSI_CONTINUE_LINKING);
   src = force_gimple_operand_gsi (, src, true, NULL_TREE,


RE: [PATCH, MIPS] Target flag and build option to disable indexed memory OPs.

2017-01-16 Thread Moore, Catherine


> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Matthew Fortune
> Sent: Monday, January 16, 2017 11:25 AM
> To: Doug Gilmore ; gcc-
> patc...@gcc.gnu.org
> Cc: Moore, Catherine 
> Subject: RE: [PATCH, MIPS] Target flag and build option to disable
> indexed memory OPs.
> 
> Doug Gilmore 
> > I recently bisected PR78176 to problems introduced with r21650.
> >
> > Given the short time until the release, we would like to provide a
> > target flag and build option to avoid the bug until we are able to
> > resolve the problem with the commit.  Note that as Matthew Fortune
> has
> > mentioned in the PR:
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78176#c5
> >
> > the problem could also be addressed by updates to the Linux kernel
> since
> > the problem is only exposed by running MIPS 32-bit binaries on 64-
> bit
> > kernels.
> >
> > Bootstrapped on X86_64, regression tested on X86_64 and MIPS.
> >
> > OK to commit?
> 
> Given this is a generic reference to indexed load/store and the issue
> could
> affect any indexed operation then I think it needs to include all of the
> following as well:
> 
> /* ISA has lwxs instruction (load w/scaled index address.  */
> #define ISA_HAS_LWXS((TARGET_SMARTMIPS ||
> TARGET_MICROMIPS) \
>  && !TARGET_MIPS16)
> 
> /* ISA has lbx, lbux, lhx, lhx, lhux, lwx, lwux, or ldx instruction. */
> #define ISA_HAS_LBX (TARGET_OCTEON2)
> #define ISA_HAS_LBUX(ISA_HAS_DSP || TARGET_OCTEON2)
> #define ISA_HAS_LHX (ISA_HAS_DSP || TARGET_OCTEON2)
> #define ISA_HAS_LHUX(TARGET_OCTEON2)
> #define ISA_HAS_LWX (ISA_HAS_DSP || TARGET_OCTEON2)
> #define ISA_HAS_LWUX(TARGET_OCTEON2 && TARGET_64BIT)
> #define ISA_HAS_LDX ((ISA_HAS_DSP || TARGET_OCTEON2) \
>  && TARGET_64BIT)
> 
> The DSP LBUX/LHX/LWX/LDX intrinsics will also need a new AVAIL
> predicate
> to disable them. The snag is that some DSP code will fail to compile if it
> uses the DSP load intrinsics directly.
> 
> I see no way of avoiding that. Therefore, distributions that use
> --without-indexed-load-store will have to cope with some potential
> DSP
> fallout if they enable DSP at all.
> 
> @Catherine: I'd like your input here if possible as I advocated this
> approach, comments on option names welcome too.  I quite like the
> verbose
> name.

Okay, based on my reading of the comments in the bug report, you are proposing 
this option as a workaround to a kernel deficiency.  I don't see any agreement 
that this is actually a compiler bug.
Do we really need to include the DSP instrinsics as well?   Do you think that 
many distributions actually enable DSP?  

The option name itself is acceptable to me.  I'd like to see documentation that 
explains when this problem is exposed.  I'd like to limit the fix to LWXS and 
I'd like to see the testcase from the bug report added to the testsuite.
I also agree that the preprocessor macro is a good idea (even if we decide to 
forgo the DSP portion of the patch).

Catherine

> 
> @Doug: Have you tried running the testsuite with the configure option
> --without-indexed-load-store? There may be tests that need adjusting
> where they
> test indexed load/store. We probably need a pre-processor macro
> to detect if the option is enabled/disabled so that DSP code can be
> predicated
> on indexed load being available.
> 
> Thanks,
> Matthew


Reverting mingw32 patch from late 2016

2017-01-16 Thread Jeff Law


Your patch for ming32 causes bootstrapping problems for mingw32.  Given 
that mingw32 isn't something I can afford to spend time debugging, I've 
reverted the patch until it can be fixed.


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78880

Jeff


Add tests for recent dse bugs

2017-01-16 Thread Jeff Law



ACATS already had a test covering the Ada issue, Eric also added a test 
to the gnat.dg testsuite.  So that's well covered.


The test for the bootstrap comparison failure was (as expected) trivial 
to construct (ssa-dse-29.c).  The test for the ppc64 big endian failures 
was easy to extract from sel-sched.c (ssa-dse-2.C)  What was more 
interesting was the scanning.


I finally decided to enhance the dumps a bit to facilitate testing.  If 
we have a trimmable store, we now dump the trimming information.  The 
test then verifies two things.  First that we don't get a negative trim, 
second that the dead memmove call gets deleted.


Bootstrapped and regression tested on x86_64-linux-gnu.  Installing on 
the trunk.


Again folks, sorry for the breakage over the weekend.

Jeff
commit b241d3db9d195bdffa4349cea077ee0e94985435
Author: law 
Date:   Mon Jan 16 23:43:05 2017 +

2017-01-16  Jeff Law  

PR tree-optimization/79090
PR tree-optimization/33562
PR tree-optimization/61912
PR tree-optimization/77485
* tree-ssa-dse.c (compute_trims): Accept STMT argument.  Dump STMT
and computed trims into the dump file.

PR tree-optimization/79090
PR tree-optimization/33562
PR tree-optimization/61912
PR tree-optimization/77485
* tree-ssa-dse.c (compute_trims): Accept STMT argument.  Dump STMT
and computed trims into the dump file.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@244509 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 265e3a5..8507528 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2017-01-16  Jeff Law  
+
+   PR tree-optimization/79090
+   PR tree-optimization/33562
+   PR tree-optimization/61912
+   PR tree-optimization/77485
+   * tree-ssa-dse.c (compute_trims): Accept STMT argument.  Dump STMT
+   and computed trims into the dump file.
+
 2017-01-17  Uros Bizjak  
 
* config/i386/i386.h (LIMIT_RELOAD_CLASS): Remove.
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index ef1fea5..09c5d52 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,12 @@
+2017-01-16  Jeff Law  
+
+   PR tree-optimization/33562
+   PR tree-optimization/61912
+   PR tree-optimization/77485
+   PR tree-optimization/79090
+   * gcc.dg/tree-ssa/ssa-dse-29.c: New test.
+   * g++.dg/tree-ssa/ssa-dse-2.C: New test.
+
 2017-01-16  Jakub Jelinek  
 
PR c/79089
diff --git a/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-2.C 
b/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-2.C
new file mode 100644
index 000..d69edd2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/ssa-dse-2.C
@@ -0,0 +1,59 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse2-details" } */
+
+typedef long unsigned int size_t;
+extern "C"
+{
+  extern void *memmove (void *__dest, const void *__src, size_t __n) throw ()
+__attribute__ ((__nonnull__ (1, 2)));
+}
+extern void abort () __attribute__ ((__noreturn__));
+struct vec_prefix { unsigned m_num; };
+struct vl_embed { };
+struct vl_ptr { };
+struct va_heap { typedef vl_ptr default_layout; };
+template < typename T, typename A = va_heap, typename L = typename 
A::default_layout > struct vec { };
+template < typename T, typename A > struct vec < T, A, vl_embed >
+{
+  unsigned length (void) const { return m_vecpfx.  m_num; }
+  bool is_empty (void) const { return m_vecpfx.  m_num == 0; }
+  void block_remove (unsigned, unsigned);
+  vec_prefix m_vecpfx;
+  T m_vecdata[1];
+};
+template < typename T, typename A > inline void vec < T, A, vl_embed 
>::block_remove (unsigned ix, unsigned len)
+{
+  T * slot = _vecdata[ix];
+  m_vecpfx.m_num -= len;
+  memmove (slot, slot + len, (m_vecpfx.m_num - ix) * sizeof (T));
+}
+
+template < typename T > struct vec < T, va_heap, vl_ptr >
+{
+  bool is_empty (void) const { return m_vec ?  m_vec-> is_empty () : true; }
+  unsigned length (void) const { return m_vec ?  m_vec-> length () : 0; }
+  void block_remove (unsigned, unsigned);
+  vec < T, va_heap, vl_embed > * m_vec;
+};
+template < typename T > inline void vec < T, va_heap, vl_ptr >::block_remove 
(unsigned ix, unsigned len)
+{
+  m_vec->block_remove (ix, len);
+}
+
+typedef struct _list_node * _list_t;
+typedef struct _expr expr_def;
+typedef expr_def * expr_t;
+typedef _list_t av_set_t;
+static vec < expr_t > vec_av_set;
+bool
+fill_vec_av_set (av_set_t av)
+{
+  if (vec_av_set.length () > 0)
+vec_av_set.block_remove (0, vec_av_set.length ());
+  ((!(vec_av_set.is_empty ())? abort () , 0 : 0));
+}
+
+/* { dg-final { scan-tree-dump-not "Trimming statement .head = -" "dse2" } } */
+/* { dg-final { scan-tree-dump "Deleted dead call: " "dse2" } } */
+
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-29.c 

Re: [PATCH 9f] Add a way for the C frontend to compile __RTL-tagged functions

2017-01-16 Thread David Malcolm
On Mon, 2017-01-16 at 14:54 -0700, Jeff Law wrote:
> On 01/09/2017 07:38 PM, David Malcolm wrote:
> > The backend is full of singleton state, so we have to compile
> > __RTL-functions as soon as we parse them.  This means that the
> > C frontend needs to invoke the backed.
> > 
> > This patch adds the support needed.
> > 
> > Normally this would be a no-no, and including rtl headers is
> > blocked by this within system.h:
> > 
> >  /* Front ends should never have to include middle-end headers. 
> >  Enforce
> > this by poisoning the header double-include protection defines.
> >   */
> >  #ifdef IN_GCC_FRONTEND
> >  #pragma GCC poison GCC_RTL_H GCC_EXCEPT_H GCC_EXPR_H
> >  #endif
> > 
> > Hence the patch puts the decl into a new header (run-rtl-passes.h)
> > that's accessible to the C frontend without exposing any RTL
> > internals.  (If adding a header for just this decl is overkill, is
> > there a better place to put the decl?)
> > 
> > gcc/ChangeLog:
> > * Makefile.in (OBJS): Add run-rtl-passes.o.
> > * pass_manager.h (gcc::pass_manager::get_rest_of_compilation):
> > New
> > accessor.
> > (gcc::pass_manager::get_clean_slate): New accessor.
> > * run-rtl-passes.c: New file.
> > * run-rtl-passes.h: New file.
> It feels like this is dependent upon something that I haven't seen?!?

I may have split things up a bit too much; sorry.  The code in this
patch is called by patch 9a.

>  
> Where is get_rest_of_compilation used? 

In this patch, in run-rtl-passes.c:run_rtl_passes, thusly:

+  opt_pass *rest_of_compilation
+= g->get_passes ()->get_rest_of_compilation ();
+  gcc_assert (rest_of_compilation);
+  execute_pass_list (cfun, rest_of_compilation);


>  Where is pass_clean_state_1?

(as used in this part of the patch):

>   * pass_manager.h (gcc::pass_manager::get_rest_of_compilation):
>   New accessor.

+  opt_pass *get_clean_slate () const { return pass_clean_state_1; }

This is a new accessor method for the pass_manager class.
pass_clean_state_1 is private member data of the pass_manager, it's the
instance of class pass_clean_state.

This field of pass_manager is created by the 

  /* References to all of the individual passes.
 These fields are generated via macro expansion.

  (...etc...)

  #define NEXT_PASS(PASS, NUM) opt_pass *PASS ## _ ## NUM

  (...etc...)

  #include "pass-instances.def"

code within pass_manager.h.  This line within SRC/gcc/passes.def:

  NEXT_PASS (pass_clean_state);

becomes this line within BUILD/gcc/pass-instances.def:

  NEXT_PASS (pass_clean_state, 1);

and this means that we effectively have this field within class
pass_manager (along with dozens of others):

   opt_pass *pass_clean_state_1;

They're all private; hence the need for this accessor.

Similarly for get_rest_of_compilation.



[PATCH, i386]: Remove LIMIT_RELOAD_CLASS macro

2017-01-16 Thread Uros Bizjak
Hello!

This is reload-only macro, not needed after target moved to LRA.

2017-01-17  Uros Bizjak  

* config/i386/i386.h (LIMIT_RELOAD_CLASS): Remove.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.h
===
--- config/i386/i386.h  (revision 244504)
+++ config/i386/i386.h  (working copy)
@@ -1559,30 +1559,6 @@
 #define INDEX_REG_CLASS INDEX_REGS
 #define BASE_REG_CLASS GENERAL_REGS
 
-/* Place additional restrictions on the register class to use when it
-   is necessary to be able to hold a value of mode MODE in a reload
-   register for which class CLASS would ordinarily be used.
-
-   We avoid classes containing registers from multiple units due to
-   the limitation in ix86_secondary_memory_needed.  We limit these
-   classes to their "natural mode" single unit register class, depending
-   on the unit availability.
-
-   Please note that reg_class_subset_p is not commutative, so these
-   conditions mean "... if (CLASS) includes ALL registers from the
-   register set."  */
-
-#define LIMIT_RELOAD_CLASS(MODE, CLASS)
\
-  (((MODE) == QImode && !TARGET_64BIT  \
-&& reg_class_subset_p (Q_REGS, (CLASS))) ? Q_REGS  \
-   : (((MODE) == SImode || (MODE) == DImode)   \
-  && reg_class_subset_p (GENERAL_REGS, (CLASS))) ? GENERAL_REGS\
-   : (SSE_FLOAT_MODE_P (MODE) && TARGET_SSE_MATH   \
-  && reg_class_subset_p (SSE_REGS, (CLASS))) ? SSE_REGS\
-   : (X87_FLOAT_MODE_P (MODE)  \
-  && reg_class_subset_p (FLOAT_REGS, (CLASS))) ? FLOAT_REGS
\
-   : (CLASS))
-
 /* If we are copying between general and FP registers, we need a memory
location. The same is true for SSE and MMX registers.  */
 #define SECONDARY_MEMORY_NEEDED(CLASS1, CLASS2, MODE) \


Re: [PATCH, rs6000] Add support for vec_rlnm and vec_rlmi

2017-01-16 Thread Bill Schmidt

> On Jan 16, 2017, at 4:24 PM, Segher Boessenkool  
> wrote:
> 
> Hi Bill,
> 
> A few comments:
> 
> On Mon, Jan 16, 2017 at 12:12:22PM -0600, Bill Schmidt wrote:
>>  * config/rs6000/rs6000-builtin.def (VRLWNM): New monomorphic
>>  function entry.
> 
> I had to look up if "monomorphic" is an existing word in this context.
> Unfortunately it is, sigh (it clashes hard with all the more common
> meanings).

Eh, "more common" if you are an algebra or biology bigot. :P

> 
>> --- gcc/config/rs6000/altivec.h  (revision 244498)
>> +++ gcc/config/rs6000/altivec.h  (working copy)
>> @@ -168,6 +168,9 @@
>> #define vec_re __builtin_vec_re
>> #define vec_round __builtin_vec_round
>> #define vec_recipdiv __builtin_vec_recipdiv
>> +#define vec_rlmi __builtin_vec_rlmi
>> +#define vec_vrlnm __builtin_vec_rlnm
>> +#define vec_rlnm(a,b,c) (__builtin_vec_rlnm(a,(b<<8)|c))
> 
> This needs parens around the arguments.

Oops, fixing.

> 
>> +The result of @code{vec_rlmi} is obtained by rotating each element of
>> +the first argument vector left and inserting it under mask into the
>> +second argument vector.
> 
> Did you swap first and second here?

No -- this is the way the vec_rlmi interface is defined in the appendix.

Thanks for the review!
Bill

> 
> Okay for trunk with those points addressed.  Thanks!
> 
> 
> Segher
> 



Re: [PATCH 9g] Extend .md and RTL parsing to support being wired up to cc1

2017-01-16 Thread David Malcolm
On Mon, 2017-01-16 at 15:04 -0700, Jeff Law wrote:
> On 01/09/2017 07:38 PM, David Malcolm wrote:
> > gcc/ChangeLog:
> > * read-md.c (md_reader::read_char): Support filtering
> > the input to a subset of line numbers.
> > (md_reader::md_reader): Initialize fields
> > m_first_line and m_last_line.
> > (md_reader::read_file_fragment): New function.
> > * read-md.h (md_reader::read_file_fragment): New decl.
> > (md_reader::m_first_line): New field.
> > (md_reader::m_last_line): New field.
> > * read-rtl-function.c (function_reader::create_function): Only
> > create cfun if it doesn't already exist.  Set PROP_rtl on
> > cfun's
> > curr_properties.  Set DECL_INITIAL to a dummy block.
> > (read_rtl_function_body_from_file_range): New function.
> > * read-rtl-function.h (read_rtl_function_body_from_file_range):
> > New decl.
> OK.
> 
> So what's left?

Thanks; I believe patches 9a-b, 9d, and 9g-9j are approved.
Patches 9c, 9e and 9f need work.  Am looking at them now.

Dave


Re: [PATCH 1/6] RISC-V Port: gcc/config/riscv/riscv.c

2017-01-16 Thread Andrew Waterman
On Thu, Jan 12, 2017 at 1:38 PM, Joseph Myers  wrote:
> On Wed, 11 Jan 2017, Palmer Dabbelt wrote:
>
>> +#include 
>
> This is included in system.h, so don't include it here.

OK.

>
>> +  error ("unknown cpu `%s' for -mtune", cpu_string);
>
> This is using very-old-style `' quotes.  Diagnostics should use e.g. %qs
> for quoting the output of a single % directive, or %< and %> for quoting
> anything more complicated, so that Unicode quotes can be used when the
> locale permits.
>
> Likewise elsewhere in this patch and in patch 2.

Will do, throughout.

>
>> +#undef TARGET_LRA_P
>> +#define TARGET_LRA_P hook_bool_void_true
>
> Using LRA is the default; you shouldn't need this definition.

Ah, glad to see LRA is now the default.

>
> I don't see a definition of TARGET_ATOMIC_ASSIGN_EXPAND_FENV.  Since you
> have floating-point exceptions, I'd expect lack this to result in
> c11-atomic-exec-5.c failing.

Indeed, and after implementing the hook, those failures disappeared.  Thanks.

>
> --
> Joseph S. Myers
> jos...@codesourcery.com


Re: [PATCH, bugfix] builtin expansion of strcmp for rs6000

2017-01-16 Thread Segher Boessenkool
On Mon, Jan 16, 2017 at 03:09:35PM -0600, Aaron Sawdey wrote:
> Tulio noted that glibc's strncmp test was failing. This turned out to
> be the use of signed HOST_WIDE_INT for handling strncmp length. The
> glibc test calls strncmp with length 2^64-1, presumably to provoke
> exactly this type of bug. Fixing the issue required changing
> select_block_compare_mode() and expand_block_compare() as well.
> 
> The other change is if we must emit a runtime check for the 4k
> crossing, then we might as well set base_align to 8 and emit the best
> possible code.

Some nits...

> --- gcc/config/rs6000/rs6000.c(revision 244503)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -19310,7 +19310,8 @@
> WORD_MODE_OK indicates using WORD_MODE is allowed, else SImode is
> the largest allowable mode.  */
>  static machine_mode
> -select_block_compare_mode (HOST_WIDE_INT offset, HOST_WIDE_INT bytes,
> +select_block_compare_mode (unsigned HOST_WIDE_INT offset,
> +unsigned HOST_WIDE_INT bytes,
>  HOST_WIDE_INT align, bool word_mode_ok)

"align" should probably be unsigned as well?

> @@ -19410,8 +19411,8 @@
>gcc_assert (GET_MODE (target) == SImode);
>  
>/* Anything to move? */
> -  HOST_WIDE_INT bytes = INTVAL (bytes_rtx);
> -  if (bytes <= 0)
> +  unsigned HOST_WIDE_INT bytes = INTVAL (bytes_rtx);
> +  if (bytes == 0)
>  return true;

UINTVAL?  Please check the rest of the patch for this, too.

>/* We don't want to generate too much code.  */
>if (ROUND_UP (bytes, load_mode_size) / load_mode_size
> -  > rs6000_block_compare_inline_limit)
> +  > (unsigned HOST_WIDE_INT)rs6000_block_compare_inline_limit)
>  return false;

Space after cast operator.  Why do you need a cast at all?  It already
is unsigned.

> +   /* -m32 -mpowerpc64 results in word_mode being DImode even
> +  though otherwise it is 32-bit. The length arg to strncmp
> +  is a size_t which will be the same size as pointers.  */
> +   rtx len_rtx;
> +   if (TARGET_64BIT)
> + len_rtx = gen_reg_rtx(DImode);
> +   else
> + len_rtx = gen_reg_rtx(SImode);

Space before opening paren in function calls.

> +  while (bytes_to_compare > 0)
>  {
>/* Compare sequence:
>   check each 8B with: ld/ld cmpd bne
> +  If equal, use rldicr/cmpb to check for zero byte.
>   cleanup code at end:
>   cmpb  get byte that differs
>   cmpb  look for zero byte

Mixed spaces/tabs indent.

> @@ -19919,37 +19968,45 @@
>   }
>   }
>  
> -  int remain = bytes - cmp_bytes;
> +  /* Cases to handle.  A and B are chunks of the two strings.
> + 1: Not end of comparison:
> +A != B: branch to cleanup code to compute result.
> +   A == B: check for 0 byte, next block if not found.
> + 2: End of the inline comparison:
> +A != B: branch to cleanup code to compute result.
> +   A == B: check for 0 byte, call strcmp/strncmp
> +  3: compared requested N bytes:
> +A == B: branch to result 0.
> +   A != B: cleanup code to compute result.  */

And here.

> @@ -19957,10 +20014,102 @@
>JUMP_LABEL (j) = dst_label;
>LABEL_NUSES (dst_label) += 1;
>  
> +  if (remain > 0 || equality_compare_rest)
> + {
> +   /* Generate a cmpb to test for a 0 byte and branch
> +  to final result if found.  */
> +   rtx cmpb_zero = gen_reg_rtx (word_mode);
> +   rtx lab_ref_fin = gen_rtx_LABEL_REF (VOIDmode, final_move_label);
> +   rtx condz = gen_reg_rtx (CCmode);
> +   rtx zero_reg = gen_reg_rtx (word_mode);
> +   if (word_mode == SImode)
> + {
> +   emit_insn (gen_movsi (zero_reg, GEN_INT(0)));

Space before (0).

> +   emit_insn (gen_cmpbsi3 (cmpb_zero, tmp_reg_src1, zero_reg));
> +   if ( cmp_bytes < word_mode_size )

No spaces inside the parens.

> + { /* Don't want to look at zero bytes past end.  */

Put the comment on a separate line please.

> +   emit_insn (gen_movdi (zero_reg, GEN_INT(0)));
> +   emit_insn (gen_cmpbdi3 (cmpb_zero, tmp_reg_src1, zero_reg));
> +   if ( cmp_bytes < word_mode_size )
> + { /* Don't want to look at zero bytes past end.  */

Again and again and again :-)

Looks good otherwise.


Segher


Re: [PATCH] Add AVX512 k-mask intrinsics

2017-01-16 Thread Jakub Jelinek
On Tue, Jan 17, 2017 at 01:30:11AM +0300, Andrew Senkevich wrote:
> here is one more part of intrinsics for k-mask registers shifts:

The software developer manuals describe KSHIFT{L,R}* like:
KSHIFTLW
COUNT <- imm8[7:0]
DEST[MAX_KL-1:0] <- 0
IF COUNT <=15
THEN DEST[15:0] <- SRC1[15:0] << COUNT;
FI;

What is the behavior when src1 == dest, like:
  kshiftld $3, %k3, %k3
?  Is it just a bug in the SDM and will it actually do the expected thing
(set %k3 to %k3 << 3 and clear just the upper bits), or do we need
an early-clobber on the destination to make sure GCC never emits these
insns with the same register as both input and output?

Jakub


Re: [PATCH][PR tree-optimization/79090] Fix two minor DSE bugs

2017-01-16 Thread Richard Biener
On January 16, 2017 7:27:53 PM GMT+01:00, Jeff Law  wrote:
>On 01/16/2017 01:51 AM, Richard Biener wrote:
>> On Sun, Jan 15, 2017 at 10:34 AM, Jeff Law  wrote:
>>>
>>> At one time I know I had the max_size == size test in
>valid_ao_ref_for_dse.
>>> But it got lost at some point.  This is what caused the Ada failure.
>>>
>>> Technically it'd be OK for the potentially dead store to have a
>variable
>>> size as long as the later stores covered the entire range of the
>potentially
>>> dead store.  I doubt this happens enough to be worth checking.
>>>
>>> The ppc64 big endian failures were more interesting.  We had this in
>the IL:
>>>
>>> memmove (dst, src, 0)
>>>
>>> The trimming code assumes that there's at least one live byte in the
>store,
>>> which obviously isn't the case here.  The net result is we compute
>an
>>> incorrect trim and the copy goes wild with incorrect addresses and
>lengths.
>>> This is trivial to fix by validating that the store has a nonzero
>length.
>>>
>>> I was a bit curious how often this happened in practice because such
>a call
>>> is trivially dead.  ~80 during a bootstrap and a few dozen in the
>testsuite.
>>> Given how trivial it is to detect and optimize, this patch includes
>removal
>>> of such calls.  This hunk makes the check for zero size in
>>> valid_ao_ref_for_dse redundant, but I'd like to keep the check -- if
>we add
>>> more builtin support without filtering zero size we'd regress again.
>>
>> Interesting - we do fold memset (..., 0) away so this means we either
>> have an unfolded memset stmt in the IL before DSE.
>It's actually exposed by fre3, both in the original test and in the 
>reduced testcase.  In the reduced testcase we have this just prior to
>FRE3:
>
>;;   basic block 3, loop depth 0, count 0, freq 7326, maybe hot
>;;prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
>;;pred:   2 [73.3%]  (TRUE_VALUE,EXECUTABLE)
>   _3 = MEM[(const struct vec *)_4].m_num;
>   if (_3 != 0)
> goto ; [36.64%]
>   else
> goto ; [63.36%]
>;;succ:   4 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>;;5 [63.4%]  (FALSE_VALUE,EXECUTABLE)
>
>;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
>;;prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
>;;pred:   3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>   _6 = vec_av_set.m_vec;
>   _7 = _6->m_num;
>   _8 = _7 - _3;
>   _6->m_num = _8;
>   _9 = (long unsigned int) _8;
>   _10 = _9 * 4;
>   slot.2_11 = slot;
>   dest.3_12 = dest;
>   memmove (dest.3_12, slot.2_11, _10);
>;;succ:   5 [100.0%]  (FALLTHRU,EXECUTABLE)
>
>
>_3 has the value _6->m_num.  Thus _8 will have the value 0, which in 
>turn makes _10 have the value zero as seen in the .fre3 dump:
>
>;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
>;;prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
>;;pred:   3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
>   _4->m_num = 0;
>   slot.2_11 = slot;
>   dest.3_12 = dest;
>   memmove (dest.3_12, slot.2_11, 0);
>
>In the full test its similar.
>
>I don't know if you want to try and catch this in FRE though.

Ah, I think I have patches for this since a long time in my tree...  We're 
folding calls in a restricted way for some historical reason.

>If we detect in DCE (where it makes reasonable sense) rather than DSE, 
>then we detect the dead mem* about 17 passes earlier and the dead 
>argument setup about 20 passes earlier.  In the testcase I looked at, I
>
>didn't see additional secondary optimizations enabled, but I could 
>imagine cases where it might.  Seems like a gcc-8 thing though.

I'll give it a quick look tomorrow.

Richard.

>Jeff



Re: Implement -Wduplicated-branches (PR c/64279) (v3)

2017-01-16 Thread Jeff Law

On 01/09/2017 02:21 AM, Marek Polacek wrote:

On Thu, Jan 05, 2017 at 04:41:28PM +0100, Jakub Jelinek wrote:

On Thu, Jan 05, 2017 at 04:39:40PM +0100, Marek Polacek wrote:

Coming back to this...



Right, after h0 == h1 is missing && operand_equal_p (thenb, elseb, 0)
or so (the exact last operand needs to be figured out).
OEP_ONLY_CONST is certainly wrong, we want the same VAR_DECLs to mean the
same thing.  0 is a tiny bit better, but still it will give up on e.g. pure
and other calls.  OEP_PURE_SAME is tiny bit better than that, but still
calls with the same arguments to the same function will not be considered
equal, plus likely operand_equal_p doesn't handle STATEMENT_LIST etc.
So maybe we need another OEP_* mode for this.


Yea, if I add "&& operand_equal_p (thenb, elseb, 0)" then this warning doesn't
trigger for certain cases, such as MODIFY_EXPR, RETURN_EXPR, probably
STATEMENT_LIST and others.  So I suppose I could introduce a new OEP_ mode for
this (names?  OEP_EXTENDED?) and then in operand_equal_p in case tcc_expression
do

  case MODIFY_EXPR:
if (flags & OEP_EXTENDED)
  // compare LHS and RHS of both

?


Yeah.  Not sure what is the best name for that.  Maybe Richi has some clever
ideas.


Here it is.  The changes in operand_equal_p should only trigger with the new
OEP_LEXICOGRAPHIC, and given the macro location issue, the warning isn't yet
enabled by neither -Wall nor -Wextra, so this all should be safe.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-01-09  Marek Polacek  

PR c/64279
* c-common.h (do_warn_duplicated_branches_r): Declare.
* c-gimplify.c (c_genericize): Walk the function tree calling
do_warn_duplicated_branches_r.
* c-warn.c (expr_from_macro_expansion_r): New.
(do_warn_duplicated_branches): New.
(do_warn_duplicated_branches_r): New.
* c.opt (Wduplicated-branches): New option.

* c-typeck.c (build_conditional_expr): Warn about duplicated branches.

* call.c (build_conditional_expr_1): Warn about duplicated branches.
* semantics.c (finish_expr_stmt): Build statement using the proper
location.

* doc/invoke.texi: Document -Wduplicated-branches.
* fold-const.c (operand_equal_p): Handle MODIFY_EXPR, INIT_EXPR,
COMPOUND_EXPR, PREDECREMENT_EXPR, PREINCREMENT_EXPR,
POSTDECREMENT_EXPR, POSTINCREMENT_EXPR, CLEANUP_POINT_EXPR, EXPR_STMT,
STATEMENT_LIST, and RETURN_EXPR.  For non-pure non-const functions
return 0 only when not OEP_LEXICOGRAPHIC.
(fold_build_cleanup_point_expr): Use the expression
location when building CLEANUP_POINT_EXPR.
* tree-core.h (enum operand_equal_flag): Add OEP_LEXICOGRAPHIC.
* tree.c (add_expr): Handle error_mark_node.

* c-c++-common/Wduplicated-branches-1.c: New test.
* c-c++-common/Wduplicated-branches-10.c: New test.
* c-c++-common/Wduplicated-branches-11.c: New test.
* c-c++-common/Wduplicated-branches-12.c: New test.
* c-c++-common/Wduplicated-branches-2.c: New test.
* c-c++-common/Wduplicated-branches-3.c: New test.
* c-c++-common/Wduplicated-branches-4.c: New test.
* c-c++-common/Wduplicated-branches-5.c: New test.
* c-c++-common/Wduplicated-branches-6.c: New test.
* c-c++-common/Wduplicated-branches-7.c: New test.
* c-c++-common/Wduplicated-branches-8.c: New test.
* c-c++-common/Wduplicated-branches-9.c: New test.
* c-c++-common/Wimplicit-fallthrough-7.c: Coalesce dg-warning.
* g++.dg/cpp0x/lambda/lambda-switch.C: Move dg-warning.
* g++.dg/ext/builtin-object-size3.C: Likewise.
* g++.dg/gomp/loop-1.C: Likewise.
* g++.dg/warn/Wduplicated-branches1.C: New test.
* g++.dg/warn/Wduplicated-branches2.C: New test.

s/indentical/identical in the doc/invoke.texi changes.






diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index 96e7351..ed8ffe4 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -5193,6 +5193,15 @@ build_conditional_expr (location_t colon_loc, tree 
ifexp, bool ifexp_bcp,
 ret = build1 (EXCESS_PRECISION_EXPR, semantic_result_type, ret);

   protected_set_expr_location (ret, colon_loc);
+
+  /* If the OP1 and OP2 are the same and don't have side-effects,
+ warn here, because the COND_EXPR will be turned into OP1.  */
+  if (warn_duplicated_branches
+  && TREE_CODE (ret) == COND_EXPR
+  && (op1 == op2 || operand_equal_p (op1, op2, 0)))
Did you want OEP_LEXICOGRAPHIC here, or in this context do we not have 
to worry about the more complex forms?  What about a statement 
expressions?  Have we lowered those at this point already?



diff --git gcc/cp/call.c gcc/cp/call.c
index e431221..84931fb 100644
--- gcc/cp/call.c
+++ gcc/cp/call.c
@@ -5302,6 +5302,13 @@ build_conditional_expr_1 (location_t loc, tree arg1, 
tree arg2, tree arg3,
  valid_operands:
   result = build3_loc (loc, 

Re: [PATCH] Add AVX512 k-mask intrinsics

2017-01-16 Thread Andrew Senkevich
Hi,

here is one more part of intrinsics for k-mask registers shifts:

gcc/
* config/i386/avx512bwintrin.h: Add k-mask registers shift intrinsics.
* config/i386/avx512dqintrin.h: Ditto.
* config/i386/avx512fintrin.h: Ditto.
* config/i386/i386-builtin-types.def: Add new types.
* gcc/config/i386/i386.c: Handle new types.
* config/i386/i386-builtin.def (__builtin_ia32_kshiftliqi,
__builtin_ia32_kshiftlihi, __builtin_ia32_kshiftlisi,
__builtin_ia32_kshiftlidi, __builtin_ia32_kshiftriqi,
__builtin_ia32_kshiftrihi, __builtin_ia32_kshiftrisi,
__builtin_ia32_kshiftridi): New.
* config/i386/sse.md (k2): Rename *k.

gcc/testsuite/
* gcc.target/i386/avx512bw-kshiftld-1.c: New test.
* gcc.target/i386/avx512bw-kshiftlq-1.c: Ditto.
* gcc.target/i386/avx512dq-kshiftlb-1.c: Ditto.
* gcc.target/i386/avx512f-kshiftlw-1.c: Ditto.
* gcc.target/i386/avx512bw-kshiftrd-1.c: Ditto.
* gcc.target/i386/avx512bw-kshiftrq-1.c: Ditto.
* gcc.target/i386/avx512dq-kshiftrb-1.c: Ditto.
* gcc.target/i386/avx512f-kshiftrw-1.c: Ditto.


Is it Ok for trunk?


--
WBR,
Andrew


avx512-kmask-intrin-part4.patch
Description: Binary data


Re: [PATCH, rs6000] Add support for vec_rlnm and vec_rlmi

2017-01-16 Thread Segher Boessenkool
Hi Bill,

A few comments:

On Mon, Jan 16, 2017 at 12:12:22PM -0600, Bill Schmidt wrote:
>   * config/rs6000/rs6000-builtin.def (VRLWNM): New monomorphic
>   function entry.

I had to look up if "monomorphic" is an existing word in this context.
Unfortunately it is, sigh (it clashes hard with all the more common
meanings).

> --- gcc/config/rs6000/altivec.h   (revision 244498)
> +++ gcc/config/rs6000/altivec.h   (working copy)
> @@ -168,6 +168,9 @@
>  #define vec_re __builtin_vec_re
>  #define vec_round __builtin_vec_round
>  #define vec_recipdiv __builtin_vec_recipdiv
> +#define vec_rlmi __builtin_vec_rlmi
> +#define vec_vrlnm __builtin_vec_rlnm
> +#define vec_rlnm(a,b,c) (__builtin_vec_rlnm(a,(b<<8)|c))

This needs parens around the arguments.

> +The result of @code{vec_rlmi} is obtained by rotating each element of
> +the first argument vector left and inserting it under mask into the
> +second argument vector.

Did you swap first and second here?

Okay for trunk with those points addressed.  Thanks!


Segher


Re: [ping] libobjc patches and zlib update

2017-01-16 Thread Jeff Law

On 01/12/2017 04:24 AM, Matthias Klose wrote:

[CCing some global reviewers]

Please review the two libobjc patches:

  https://gcc.gnu.org/ml/gcc-patches/2016-12/msg02003.html
- not a regression, but a simplification of the new
  configure option.
  https://gcc.gnu.org/ml/gcc-patches/2016-12/msg02004.html
- fix the --disable-shared build with external bdw-gc.
  Fixed the texi typo as mentioned by Andreas.

These are both OK.



and the proposed zlib update:

  https://gcc.gnu.org/ml/gcc-patches/2017-01/msg00315.html

I acked this last week.

Thanks for your patience.
jeff


Re: [PATCH 9g] Extend .md and RTL parsing to support being wired up to cc1

2017-01-16 Thread Jeff Law

On 01/09/2017 07:38 PM, David Malcolm wrote:

gcc/ChangeLog:
* read-md.c (md_reader::read_char): Support filtering
the input to a subset of line numbers.
(md_reader::md_reader): Initialize fields
m_first_line and m_last_line.
(md_reader::read_file_fragment): New function.
* read-md.h (md_reader::read_file_fragment): New decl.
(md_reader::m_first_line): New field.
(md_reader::m_last_line): New field.
* read-rtl-function.c (function_reader::create_function): Only
create cfun if it doesn't already exist.  Set PROP_rtl on cfun's
curr_properties.  Set DECL_INITIAL to a dummy block.
(read_rtl_function_body_from_file_range): New function.
* read-rtl-function.h (read_rtl_function_body_from_file_range):
New decl.

OK.

So what's left?

jeff



Re: [PATCH 9i] testsuite: add aarch64-specific files

2017-01-16 Thread Jeff Law

On 01/09/2017 07:38 PM, David Malcolm wrote:

gcc/testsuite/ChangeLog:
* gcc.dg/rtl/aarch64/asr_div1.c: New test case.
* gcc.dg/rtl/aarch64/pr71779.c: New test case.

OK.
jeff



Re: [PATCH 9j] testsuite: add x86_64-specific files

2017-01-16 Thread Jeff Law

On 01/09/2017 07:38 PM, David Malcolm wrote:

A collection of test cases, capturing the state of various
functions at various places within the pass list, and verifying
that we can restart at various passes.

gcc/testsuite/ChangeLog:
* gcc.dg/rtl/x86_64/dfinit.c: New test case.
* gcc.dg/rtl/x86_64/different-structs.c: New test case.
* gcc.dg/rtl/x86_64/final.c: New test case.
* gcc.dg/rtl/x86_64/into-cfglayout.c: New test case.
* gcc.dg/rtl/x86_64/ira.c: New test case.
* gcc.dg/rtl/x86_64/pro_and_epilogue.c: New test case.
* gcc.dg/rtl/x86_64/test-multiple-fns.c: New test case.
* gcc.dg/rtl/x86_64/test-return-const.c.after-expand.c: New test case.
* gcc.dg/rtl/x86_64/test-return-const.c.before-fwprop.c: New test case.
* gcc.dg/rtl/x86_64/test-rtl.c: New test case.
* gcc.dg/rtl/x86_64/test_1.h: New file.
* gcc.dg/rtl/x86_64/times-two.c.after-expand.c: New test case.
* gcc.dg/rtl/x86_64/times-two.c.before-df.c: New test case.
* gcc.dg/rtl/x86_64/times-two.h: New file.
* gcc.dg/rtl/x86_64/vregs.c: New test case.

This all looks fairly reasonable.

jeff



Re: [PATCH 9h] testsuite: add platform-independent files

2017-01-16 Thread Jeff Law

On 01/09/2017 07:38 PM, David Malcolm wrote:

This patch adds:
  - an rtl.exp (to make it easy to run just the tests
for __RTL-tagged functions)
  - a test.c source file I used when generating the various RTL
dumps (for reference)
  - a couple of tests of __RTL parser errors

gcc/testsuite/ChangeLog:
* gcc.dg/rtl/rtl.exp: New file.
* gcc.dg/rtl/test.c: New file.
* gcc.dg/rtl/truncated-rtl-file.c: New test case.
* gcc.dg/rtl/unknown-rtx-code.c: New test case.

OK.
jeff



Re: [PATCH 9f] Add a way for the C frontend to compile __RTL-tagged functions

2017-01-16 Thread Jeff Law

On 01/09/2017 07:38 PM, David Malcolm wrote:

The backend is full of singleton state, so we have to compile
__RTL-functions as soon as we parse them.  This means that the
C frontend needs to invoke the backed.

This patch adds the support needed.

Normally this would be a no-no, and including rtl headers is
blocked by this within system.h:

 /* Front ends should never have to include middle-end headers.  Enforce
this by poisoning the header double-include protection defines.  */
 #ifdef IN_GCC_FRONTEND
 #pragma GCC poison GCC_RTL_H GCC_EXCEPT_H GCC_EXPR_H
 #endif

Hence the patch puts the decl into a new header (run-rtl-passes.h)
that's accessible to the C frontend without exposing any RTL
internals.  (If adding a header for just this decl is overkill, is
there a better place to put the decl?)

gcc/ChangeLog:
* Makefile.in (OBJS): Add run-rtl-passes.o.
* pass_manager.h (gcc::pass_manager::get_rest_of_compilation): New
accessor.
(gcc::pass_manager::get_clean_slate): New accessor.
* run-rtl-passes.c: New file.
* run-rtl-passes.h: New file.
It feels like this is dependent upon something that I haven't seen?!? 
Where is get_rest_of_compilation used?  Where is pass_clean_state_1?



jeff


Re: [PATCH 9e] Update "startwith" logic for pass-skipping to handle __RTL functions

2017-01-16 Thread Jeff Law

On 01/09/2017 07:38 PM, David Malcolm wrote:

gcc/ChangeLog:
* passes.c: Include "insn-addr.h".
(should_skip_pass_p): Add logging.  Update logic for running
"expand" to be compatible with both __GIMPLE and __RTL.  Guard
property-provider override so it is only done for gimple passes.
Don't skip dfinit.
(skip_pass): New function.
(execute_one_pass): Call skip_pass when skipping passes.
---
 gcc/passes.c | 65 +---
 1 file changed, 58 insertions(+), 7 deletions(-)

diff --git a/gcc/passes.c b/gcc/passes.c
index 31262ed..6954d1e 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -59,6 +59,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgrtl.h"
 #include "tree-ssa-live.h"  /* For remove_unused_locals.  */
 #include "tree-cfgcleanup.h"
+#include "insn-addr.h" /* for INSN_ADDRESSES_ALLOC.  */

insn-addr?  Yuk.




 using namespace gcc;

@@ -2315,26 +2316,73 @@ should_skip_pass_p (opt_pass *pass)
   if (!cfun->pass_startwith)
 return false;

-  /* We can't skip the lowering phase yet -- ideally we'd
- drive that phase fully via properties.  */
-  if (!(cfun->curr_properties & PROP_ssa))
-return false;
+ /* For __GIMPLE functions, we have to at least start when we leave
+ SSA.  */
+  if (pass->properties_destroyed & PROP_ssa)
+{
+  if (!quiet_flag)
+   fprintf (stderr, "starting anyway when leaving SSA: %s\n", pass->name);
+  cfun->pass_startwith = NULL;
+  return false;
+}
This seems to need a comment -- it's not obvious how destroying the SSA 
property maps to a pass that can not be skipped.



-  /* And also run any property provider.  */
-  if (pass->properties_provided != 0)
+  /* Run any property provider.  */
+  if (pass->type == GIMPLE_PASS
+  && pass->properties_provided != 0)
 return false;
So comment needed here too.  I read this as "if a gimple pass provides a 
property then it should not be skipped.  Which means that an RTL pass 
that provides a property can?





+  /* Don't skip df init; later RTL passes need it.  */
+  if (strstr (pass->name, "dfinit") != NULL)
+return false;

Which seems like a failing in RTL passes saying they need DF init.




+/* Skip the given pass, for handling passes before "startwith"
+   in __GIMPLE and__RTL-marked functions.
+   In theory, this ought to be a no-op, but some of the RTL passes
+   need additional processing here.  */
+
+static void
+skip_pass (opt_pass *pass)

...
This all feels like a failing in how we handle state in the RTL world. 
And I suspect it's prone to error.  Imagine if I'm hacking on something 
in the RTL world and my code depends on something else being set up.   I 
really ought to have a way within my pass to indicate what I depend on. 
Having it hidden away in passes.c makes it easy to miss/forget.




+{
+  /* Pass "reload" sets the global "reload_completed", and many
+ things depend on this (e.g. instructions in .md files).  */
+  if (strcmp (pass->name, "reload") == 0)
+reload_completed = 1;

Seems like this ought to be a property provided by LRA/reload.



+
+  /* The INSN_ADDRESSES vec is normally set up by
+ shorten_branches; set it up for the benefit of passes that
+ run after this.  */
+  if (strcmp (pass->name, "shorten") == 0)
+INSN_ADDRESSES_ALLOC (get_max_uid ());

Similarly ought to be provided by shorten-branches


+
+  /* Update the cfg hooks as appropriate.  */
+  if (strcmp (pass->name, "into_cfglayout") == 0)
+{
+  cfg_layout_rtl_register_cfg_hooks ();
+  cfun->curr_properties |= PROP_cfglayout;
+}
+  if (strcmp (pass->name, "outof_cfglayout") == 0)
+{
+  rtl_register_cfg_hooks ();
+  cfun->curr_properties &= ~PROP_cfglayout;
+}
+}

This feels somewhat different, but still a hack.

I don't have strong suggestions on how to approach this, but what we've 
got here feels like a hack and one prone to bitrot.


jeff


[committed] Fix a tree sharing bug during gimplification (PR c/79089)

2017-01-16 Thread Jakub Jelinek
Hi!

gimplify_init_constructor sometimes uses object == lhs twice, once in
gimple_build_assign and then as the result value, which is wrong if
object is something that can't be shared such as COMPONENT_REF.  Fixed by
unsharing it in that case.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2017-01-16  Jakub Jelinek  

PR c/79089
* gimplify.c (gimplify_init_constructor): If want_value and
object == lhs, unshare lhs to avoid invalid tree sharing.  Formatting
fix.

* gcc.c-torture/compile/pr79089.c: New test.

--- gcc/gimplify.c.jj   2017-01-05 11:43:11.0 +0100
+++ gcc/gimplify.c  2017-01-16 10:23:15.709864970 +0100
@@ -4586,8 +4586,8 @@ gimplify_init_constructor (tree *expr_p,
 }
 
   object = TREE_OPERAND (*expr_p, 0);
-  ctor = TREE_OPERAND (*expr_p, 1) =
-optimize_compound_literals_in_ctor (TREE_OPERAND (*expr_p, 1));
+  ctor = TREE_OPERAND (*expr_p, 1)
+= optimize_compound_literals_in_ctor (TREE_OPERAND (*expr_p, 1));
   type = TREE_TYPE (ctor);
   elts = CONSTRUCTOR_ELTS (ctor);
   ret = GS_ALL_DONE;
@@ -4911,6 +4911,8 @@ gimplify_init_constructor (tree *expr_p,
 {
   tree lhs = TREE_OPERAND (*expr_p, 0);
   tree rhs = TREE_OPERAND (*expr_p, 1);
+  if (want_value && object == lhs)
+   lhs = unshare_expr (lhs);
   gassign *init = gimple_build_assign (lhs, rhs);
   gimplify_seq_add_stmt (pre_p, init);
 }
--- gcc/testsuite/gcc.c-torture/compile/pr79089.c.jj2017-01-16 
10:24:05.772227397 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr79089.c   2017-01-16 
10:23:51.0 +0100
@@ -0,0 +1,12 @@
+/* PR c/79089 */
+
+struct S { int b; };
+struct T { struct S c; } a;
+int d;
+struct S e;
+
+void
+foo ()
+{
+  e = ({ d++; a.c = (struct S) {}; });
+}

Jakub


Re: [PATCH 9c] callgraph: handle __RTL functions

2017-01-16 Thread Jeff Law

On 01/09/2017 07:38 PM, David Malcolm wrote:

The RTL backend code is full of singleton state, so we have to handle
functions as soon as we parse them.  This requires various special-casing
in the callgraph code.

gcc/ChangeLog:
* cgraph.h (symtab_node::native_rtl_p): New decl.
* cgraphunit.c (symtab_node::native_rtl_p): New function.
(symtab_node::needed_p): Don't assert for early assembly output
for __RTL functions.
(cgraph_node::finalize_function): Set "force_output" for __RTL
functions.
(cgraph_node::analyze): Bail out early for __RTL functions.
(analyze_functions): Update assertion to support __RTL functions.
(cgraph_node::expand): Bail out early for __RTL functions.
* gimple-expr.c: Include "tree-pass.h".
(gimple_has_body_p): Return false for __RTL functions.
---
 gcc/cgraph.h  |  4 
 gcc/cgraphunit.c  | 41 ++---
 gcc/gimple-expr.c |  3 ++-
 3 files changed, 44 insertions(+), 4 deletions(-)




diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 81a3ae9..ed699e1 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
 @@ -568,6 +591,12 @@ cgraph_node::add_new_function (tree fndecl, bool 
lowered)

 void
 cgraph_node::analyze (void)
 {
+  if (native_rtl_p ())
+{
+  analyzed = true;
+  return;
+}
So my concern here would be how this interacts with the rest of the 
cgraph machinery.  Essentially you're saying we've built all the 
properties for the given code.  But AFAICT that can't be true and cgraph 
isn't actually aware of any of the properties of the native RTL code 
(even such things as what functions the native RTL code might call).


So I guess my question is how do you ensure that even though cgraph 
hasn't looked at code that we're appropriately conservative with how the 
file is processed?  Particularly if there's other code in the source 
file that is expected to interact with the RTL native code?


Jeff


Re: [PATCH 9d] Don't call delete_tree_ssa for __RTL functions

2017-01-16 Thread Jeff Law

On 01/10/2017 06:42 AM, Richard Biener wrote:

On Tue, Jan 10, 2017 at 3:38 AM, David Malcolm  wrote:

gcc/ChangeLog:
* final.c (rest_of_clean_state): Don't call delete_tree_ssa for
__RTL functions.


Heh, so you are lucky that nothing looks at this.  MEM_EXPRs can
contain SSA names
these days (for points-to info), but I suppose you don't parse
MEM_EXPRs fully yet.

I'm somewhat inclined to tell you calling init_tree_ssa () for __RTL
functions...
Perhaps, but I suspect it'll just open up another can-o-worms WRT state 
that needs to be initialized.


Jeff



Re: [PATCH 9d] Don't call delete_tree_ssa for __RTL functions

2017-01-16 Thread Jeff Law

On 01/09/2017 07:38 PM, David Malcolm wrote:

gcc/ChangeLog:
* final.c (rest_of_clean_state): Don't call delete_tree_ssa for
__RTL functions.
OK.   And just for the record, these patches were submitted prior to 
stage1 close.  I'm hesitant to go forward with them unless the set as a 
whole can easily be seen to not affect the existing front-end.


jeff



Re: [PATCH, GCC/LRA, gcc-5/6-branch] Fix PR78617: Fix conflict detection in rematerialization

2017-01-16 Thread Vladimir Makarov



On 01/16/2017 02:26 PM, Jeff Law wrote:

On 01/13/2017 11:19 AM, Thomas Preudhomme wrote:

Ping? I'm not sure if an ok from Valdimir is enough or if I also need RM
approval.

Vlad's approval is all you need.


Thomas, the patch is ok for backporting.  It is pretty safe.



Re: [PATCH] Fix as flags for -c xx.s -g0 -g (PR driver/49726)

2017-01-16 Thread Jeff Law

On 01/10/2017 03:51 PM, Jakub Jelinek wrote:

Hi!

ASM_DEBUG_SPEC uses usually %{g:%{!g0:--gdwarf2}} or something similar.
In the past, it used to be %{g:--gdwarf2}.  Both have problems (and thus
this is also a regression).  The problem with the current ASM_DEBUG_SPEC
is that if one uses -g0 -g, which normally means that -g overrides -g0,
we still don't supply --gdwarf2 to as.  Also, there are many options
starting with -g, so if one just uses gcc -grecord-gcc-switches foo.s,
--gdwarf2 is passed to as too, even when gcc -grecord-gcc-switches foo.c
actually doesn't emit any debug info.  The problem with older
ASM_DEBUG_SPEC has been that even -g0 resulted in --gdwarf2 being passed
to as, or -g -g0 too.

The following patch fixes that by introducing a new spec function, which
compares debug_info_level against its argument (so in some other place
we could e.g. check if debug_info_level >= 3 and similar if needed).
To make this work, I had to mark all the -g* options with the Driver
flag, so that debug_info_level isn't computed just in the FEs, but also in
the driver.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-01-10  Jakub Jelinek  

PR driver/49726
* gcc.c (debug_level_greater_than_spec_func): New function.
(static_spec_functions): Add debug-level-gt spec function.
(ASM_DEBUG_SPEC, cpp_options): Use %:debug-level-gt(0) instead of
!g0.
* config/darwin.h (DSYMUTIL_SPEC, ASM_DEBUG_SPEC): Likewise.
* config/darwin9.h (DSYMUTIL_SPEC, ASM_DEBUG_SPEC): Likewise.
* common.opt (g, gcoff, gdwarf, gdwarf-, ggdb, gno-pubnames,
gpubnames, ggnu-pubnames, gno-record-gcc-switches,
grecord-gcc-switches, gno-strict-dwarf, gstrict-dwarf, gstabs,
gstabs+, gtoggle, gvms, gxcoff, gxcoff+): Add Driver flag.
c-family/
* c.opt (gen-decls): Add Driver flag.
ada/
* gcc-interface/lang.opt (gant, gnatO, gnat): Add Driver flag.

OK.
jeff



Re: [PATCH, bugfix] builtin expansion of strcmp for rs6000

2017-01-16 Thread Aaron Sawdey
Here is an updated version of this patch. 

Tulio noted that glibc's strncmp test was failing. This turned out to
be the use of signed HOST_WIDE_INT for handling strncmp length. The
glibc test calls strncmp with length 2^64-1, presumably to provoke
exactly this type of bug. Fixing the issue required changing
select_block_compare_mode() and expand_block_compare() as well.

The other change is if we must emit a runtime check for the 4k
crossing, then we might as well set base_align to 8 and emit the best
possible code.

OK for trunk if bootstrap/regtest passes on ppc64/ppc64le?

Thanks,
  Aaron

On Wed, 2017-01-11 at 11:26 -0600, Aaron Sawdey wrote:
> This expands on the previous patch. For strcmp and for strncmp with N
> larger than 64, the first 64 bytes of comparison is expanded inline
> and
> then a call to strcmp or strncmp is emitted to compare the remainder
> if
> the strings are found to be equal at that point. 
> 
> -mstring-compare-inline-limit=N determines how many block comparisons
> are emitted. With the default 8, and 64-bit code, you get 64 bytes. 
> 
> Performance testing on a power8 system shows that the code is
> anywhere
> from 2-8 times faster than RHEL7.2 glibc strcmp/strncmp depending on
> alignment and length.
> 
> In the process of adding this I discovered that the expansion being
> generated for strncmp had a bug in that it was not testing for a zero
> byte to terminate the comparison. As a result inputs like
> strncmp("AB\0CDEFGX", "AB\0CDEFGY", 9) would be compared not equal.
> The
> initial comparison of a doubleword would be equal so a second one
> would
> be fetched and compared, ignoring the zero byte that should have
> terminated comparison. The fix is to use a cmpb to check for zero
> bytes
> in the equality case before comparing the next chunk. I updated the
> strncmp-1.c test case to check for this, and also added a new 
> strcmp-1.c test case to check strcmp expansion. Also both now have a
> length 100 tests to check the transition from the inline comparison
> to
> the library call for the remainder.
> 
> ChangeLog
> 2017-01-11  Aaron Sawdey  
>   * config/rs6000/rs6000-protos.h (expand_strn_compare): Add arg.
>   * config/rs6000/rs6000.c (expand_strn_compare): Add ability to
> expand
>   strcmp. Fix bug where comparison didn't stop with zero byte.
>   * config/rs6000/rs6000.md (cmpstrnsi): Args to
> expand_strn_compare.
>   (cmpstrsi): Add pattern.
> 
> gcc.dg/ChangeLog
> 2017-01-11  Aaron Sawdey  
>   * gcc.dg/strcmp-1.c: New.
>   * gcc.dg/strncmp-1.c: Add test for a bug that escaped.
> 
> 
> 
> 
-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC ToolchainIndex: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h	(revision 244503)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -78,7 +78,7 @@
 extern int expand_block_clear (rtx[]);
 extern int expand_block_move (rtx[]);
 extern bool expand_block_compare (rtx[]);
-extern bool expand_strn_compare (rtx[]);
+extern bool expand_strn_compare (rtx[], int);
 extern const char * rs6000_output_load_multiple (rtx[]);
 extern bool rs6000_is_valid_mask (rtx, int *, int *, machine_mode);
 extern bool rs6000_is_valid_and_mask (rtx, machine_mode);
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c	(revision 244503)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -19310,7 +19310,8 @@
WORD_MODE_OK indicates using WORD_MODE is allowed, else SImode is
the largest allowable mode.  */
 static machine_mode
-select_block_compare_mode (HOST_WIDE_INT offset, HOST_WIDE_INT bytes,
+select_block_compare_mode (unsigned HOST_WIDE_INT offset,
+			   unsigned HOST_WIDE_INT bytes,
 			   HOST_WIDE_INT align, bool word_mode_ok)
 {
   /* First see if we can do a whole load unit
@@ -19322,7 +19323,7 @@
  Do largest chunk possible without violating alignment rules.  */
 
   /* The most we can read without potential page crossing.  */
-  HOST_WIDE_INT maxread = ROUND_UP (bytes, align);
+  unsigned HOST_WIDE_INT maxread = ROUND_UP (bytes, align);
 
   if (word_mode_ok && bytes >= UNITS_PER_WORD)
 return word_mode;
@@ -19410,8 +19411,8 @@
   gcc_assert (GET_MODE (target) == SImode);
 
   /* Anything to move? */
-  HOST_WIDE_INT bytes = INTVAL (bytes_rtx);
-  if (bytes <= 0)
+  unsigned HOST_WIDE_INT bytes = INTVAL (bytes_rtx);
+  if (bytes == 0)
 return true;
 
   /* The code generated for p7 and older is not faster than glibc
@@ -19432,14 +19433,14 @@
 
   /* Strategy phase.  How many ops will this take and should we expand it?  */
 
-  int offset = 0;
+  unsigned HOST_WIDE_INT offset = 0;
   machine_mode load_mode =
 select_block_compare_mode (offset, bytes, base_align, word_mode_ok);
-  int 

[PATCH, i386]: Tighten checks in HARD_REGNO_CALLER_SAVE_MODE

2017-01-16 Thread Uros Bizjak
Hello!

In addition to mask registers, QImode and HImode fixups in
HARD_REGNO_CALLER_SAVE_MODE apply only to general registers.

2017-01-16  Uros Bizjak  

* config/i386/i386.h (HARD_REGNO_CALLER_SAVE_MODE): Apply HImode and
QImode fixups to general and mask registers only.

Bootstrapped and regression tested on x86_64-linux-gnu.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.h
===
--- config/i386/i386.h  (revision 244497)
+++ config/i386/i386.h  (working copy)
@@ -1211,9 +1211,10 @@ extern const char *host_detect_local_cpu (int argc
   (CC_REGNO_P (REGNO) ? VOIDmode   \
: (MODE) == VOIDmode && (NREGS) != 1 ? VOIDmode \
: (MODE) == VOIDmode ? choose_hard_reg_mode ((REGNO), (NREGS), false) \
-   : (MODE) == HImode && !(TARGET_PARTIAL_REG_STALL\
+   : (MODE) == HImode && !((GENERAL_REGNO_P (REGNO)\
+   && TARGET_PARTIAL_REG_STALL)\
   || MASK_REGNO_P (REGNO)) ? SImode\
-   : (MODE) == QImode && !(TARGET_64BIT || QI_REGNO_P (REGNO)  \
+   : (MODE) == QImode && !(ANY_QI_REGNO_P (REGNO)  \
   || MASK_REGNO_P (REGNO)) ? SImode\
: (MODE))
 


Re: PR79066, non-PIC code generated for powerpc glibc with -fpic

2017-01-16 Thread Segher Boessenkool
On Mon, Jan 16, 2017 at 03:50:01PM +1030, Alan Modra wrote:
> > > > Okay for trunk if there is nothing unexpected.  Thanks!
> > > 
> > > I guess I should at least build glibc.
> > 
> > Yes exactly, something big that uses pic -- it is pretty obvious it won't
> > change anything for non-pic.
> 
> glibc built fine without elf_high/elf_low with no regressions in its
> testsuite.

I meant comparing the generated code, sorry.

> However, there is a problem in rs6000_emit_allocate_stack
> -fstack-limit-symbol=SYMBOL code.  This now might ICE if someone tries
> to use the option with -fpic/PIC.  I reckon the option combination to
> be little used, so it shouldn't hurt to disable -fstack-limit-symbol
> for PIC.  (We were generating non-PIC for the trap, so we probably
> would have gotten a complaint about text relocs in shared libraries.)
> 
> This revised patch has been bootstrapped and regression tested as
> before, and tested with glibc too.  OK?
> 
>   PR target/79066
>   * config/rs6000/rs6000.md (elf_high, elf_low): Disable when pic.
>   * config/rs6000/rs6000.c (rs6000_emit_allocate_stack): Don't allow
>   symbolic stack limit when pic.
> testsuite/
>   * gcc.target/powerpc/pr79066.c: New.
> 
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 11394b2..2dd6bbe 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -27668,7 +27668,8 @@ rs6000_emit_allocate_stack (HOST_WIDE_INT size, rtx 
> copy_reg, int copy_off)
>   }
>else if (GET_CODE (stack_limit_rtx) == SYMBOL_REF
>  && TARGET_32BIT
> -&& DEFAULT_ABI == ABI_V4)
> +&& DEFAULT_ABI == ABI_V4
> +&& !flag_pic)
>   {
> rtx toload = gen_rtx_CONST (VOIDmode,
> gen_rtx_PLUS (Pmode,

Please at least make it sorry() for ABI_V4 && flag_pic.  Or what does it
result in with the patch as-is?

Okay with that change (if needed).  Thanks!


Segher


Re: [PATCH, GCC/LRA, gcc-5/6-branch] Fix PR78617: Fix conflict detection in rematerialization

2017-01-16 Thread Jeff Law

On 01/13/2017 11:19 AM, Thomas Preudhomme wrote:

Ping? I'm not sure if an ok from Valdimir is enough or if I also need RM
approval.

Vlad's approval is all you need.


jeff



Re: [PATCH] Allow building GCC with PTX offloading even without CUDA being installed (gcc and nvptx-tools patches)

2017-01-16 Thread Jeff Law

On 01/13/2017 11:28 AM, Jakub Jelinek wrote:

On Fri, Jan 13, 2017 at 06:19:02PM +, Joseph Myers wrote:

--- libgomp/plugin/cuda/cuda.h.jj   2017-01-13 15:58:00.966544147 +0100
+++ libgomp/plugin/cuda/cuda.h  2017-01-13 17:02:47.355817896 +0100
@@ -0,0 +1,174 @@
+/* CUDA API description.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.


The new file should presumably have the runtime license exception.


Agreed (though, most likely the file isn't copyrightable anyway, but
we use copyright boilerplate for various files that might not be
copyrightable).  But we should use it not just for cuda.h, but also
for hsa.h and hsa_ext_finalize.h (CCing Martin who has added those).

2017-01-13  Jakub Jelinek  

* plugin/cuda/cuda.h: Add GCC runtime library exception.
* plugin/hsa.h: Likewise.
* plugin/hsa_ext_finalize.h: Likewise.
Yea, seems like an oversight.  Certainly the intention is that using 
cuda & hsa in and of itself doesn't require the resulting executable to 
be GPL licensed.


jeff



Re: [PATCH][PR tree-optimization/79090] Fix two minor DSE bugs

2017-01-16 Thread Jeff Law

On 01/16/2017 01:51 AM, Richard Biener wrote:

On Sun, Jan 15, 2017 at 10:34 AM, Jeff Law  wrote:


At one time I know I had the max_size == size test in valid_ao_ref_for_dse.
But it got lost at some point.  This is what caused the Ada failure.

Technically it'd be OK for the potentially dead store to have a variable
size as long as the later stores covered the entire range of the potentially
dead store.  I doubt this happens enough to be worth checking.

The ppc64 big endian failures were more interesting.  We had this in the IL:

memmove (dst, src, 0)

The trimming code assumes that there's at least one live byte in the store,
which obviously isn't the case here.  The net result is we compute an
incorrect trim and the copy goes wild with incorrect addresses and lengths.
This is trivial to fix by validating that the store has a nonzero length.

I was a bit curious how often this happened in practice because such a call
is trivially dead.  ~80 during a bootstrap and a few dozen in the testsuite.
Given how trivial it is to detect and optimize, this patch includes removal
of such calls.  This hunk makes the check for zero size in
valid_ao_ref_for_dse redundant, but I'd like to keep the check -- if we add
more builtin support without filtering zero size we'd regress again.


Interesting - we do fold memset (..., 0) away so this means we either
have an unfolded memset stmt in the IL before DSE.
It's actually exposed by fre3, both in the original test and in the 
reduced testcase.  In the reduced testcase we have this just prior to FRE3:


;;   basic block 3, loop depth 0, count 0, freq 7326, maybe hot
;;prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
;;pred:   2 [73.3%]  (TRUE_VALUE,EXECUTABLE)
  _3 = MEM[(const struct vec *)_4].m_num;
  if (_3 != 0)
goto ; [36.64%]
  else
goto ; [63.36%]
;;succ:   4 [36.6%]  (TRUE_VALUE,EXECUTABLE)
;;5 [63.4%]  (FALSE_VALUE,EXECUTABLE)

;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
;;prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
;;pred:   3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
  _6 = vec_av_set.m_vec;
  _7 = _6->m_num;
  _8 = _7 - _3;
  _6->m_num = _8;
  _9 = (long unsigned int) _8;
  _10 = _9 * 4;
  slot.2_11 = slot;
  dest.3_12 = dest;
  memmove (dest.3_12, slot.2_11, _10);
;;succ:   5 [100.0%]  (FALLTHRU,EXECUTABLE)


_3 has the value _6->m_num.  Thus _8 will have the value 0, which in 
turn makes _10 have the value zero as seen in the .fre3 dump:


;;   basic block 4, loop depth 0, count 0, freq 2684, maybe hot
;;prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
;;pred:   3 [36.6%]  (TRUE_VALUE,EXECUTABLE)
  _4->m_num = 0;
  slot.2_11 = slot;
  dest.3_12 = dest;
  memmove (dest.3_12, slot.2_11, 0);

In the full test its similar.

I don't know if you want to try and catch this in FRE though.

If we detect in DCE (where it makes reasonable sense) rather than DSE, 
then we detect the dead mem* about 17 passes earlier and the dead 
argument setup about 20 passes earlier.  In the testcase I looked at, I 
didn't see additional secondary optimizations enabled, but I could 
imagine cases where it might.  Seems like a gcc-8 thing though.


Jeff




[PATCH] Fix testcase for PR c/78304

2017-01-16 Thread David Malcolm
On Mon, 2017-01-16 at 13:31 +0100, Rainer Orth wrote:
> Hi Christophe,
> 
> > > Successfully bootstrapped on x86_64-pc-linux-gnu;
> > > adds 34 PASS results to gcc.sum.
> > > 
> > These 2 tests fail on arm:
> > 
> >   gcc.dg/format/pr78304.c (test for warnings, line 9)
> >   gcc.dg/format/pr78304.c   -DWIDE   (test for warnings, line 9)
> 
> also on sparc-sun-solaris2.12 and i386-pc-solaris2.12, 32-bit only.
> 
>   Rainer

Sorry about the failures.

The tests I committed made assumptions about size_t and long
being invalid for use with "%u".

The tests only need some invalid type, so this patch converts
them to attempt a print "const char *" with "%u", which should be
invalid for every target (and hence generate the expected warning).

I reproduced the problem on i686-pc-linux-gnu, and the patch fixes
it there. 

Committed to trunk as r244502.

Does this fix the test for you?

Thanks; sorry again.
Dave

gcc/testsuite/ChangeLog:
PR c/78304
* gcc.dg/format/pr78304.c: Convert argument from integral type
to a pointer.
* gcc.dg/format/pr78304-2.c: Likewise.
---
 gcc/testsuite/gcc.dg/format/pr78304-2.c | 4 ++--
 gcc/testsuite/gcc.dg/format/pr78304.c   | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/format/pr78304-2.c 
b/gcc/testsuite/gcc.dg/format/pr78304-2.c
index 5ee6d65..83648c4 100644
--- a/gcc/testsuite/gcc.dg/format/pr78304-2.c
+++ b/gcc/testsuite/gcc.dg/format/pr78304-2.c
@@ -5,7 +5,7 @@ extern int printf (const char *, ...);
 
 # define PRIu32"u"
 
-void test (long size)
+void test (const char *msg)
 {
-  printf ("size: %" PRIu32 "\n", size); /* { dg-warning "expects argument of 
type" } */
+  printf ("size: %" PRIu32 "\n", msg); /* { dg-warning "expects argument of 
type" } */
 }
diff --git a/gcc/testsuite/gcc.dg/format/pr78304.c 
b/gcc/testsuite/gcc.dg/format/pr78304.c
index d0a96f6..f6ad807 100644
--- a/gcc/testsuite/gcc.dg/format/pr78304.c
+++ b/gcc/testsuite/gcc.dg/format/pr78304.c
@@ -4,7 +4,7 @@
 #include 
 #include 
 
-void test (size_t size)
+void test (const char *msg)
 {
-  printf ("size: %" PRIu32 "\n", size); /* { dg-warning "expects argument of 
type" } */
+  printf ("size: %" PRIu32 "\n", msg); /* { dg-warning "expects argument of 
type" } */
 }
-- 
1.8.5.3



[PATCH, rs6000] Add support for vec_rlnm and vec_rlmi

2017-01-16 Thread Bill Schmidt
Hi,

ISA 3.0 introduces new instructions vrlwmi, vrldmi, vrlwnm, and vrldnm.
This patch provides access to them via built-ins, including the vec_rlmi
and vec_rlnm built-ins mandated by Appendix A of the ELFv2 ABI document.
I also added a vec_vrlnm built-in, which is a more direct translation of
the vrlwnm and vrldnm instructions that some users might prefer.

This has been bootstrapped and tested on powerpc64le-unknown-linux-gnu
with no regressions.  I am in process of testing them on a big-endian
system as well.  Provided there are no problems there, is this ok for
trunk?

Thanks,
Bill


[gcc]

2017-01-16  Bill Schmidt  

* config/rs6000/altivec.h (vec_rlmi): New #define.
(vec_vrlnm): Likewise.
(vec_rlnm): Likewise.
* config/rs6000/altivec.md (UNSPEC_VRLMI): New UNSPEC enum value.
(UNSPEC_VRLNM): Likewise.
(VIlong): New mode iterator.
(altivec_vrlmi): New define_insn.
(altivec_vrlnm): Likewise.
* config/rs6000/rs6000-builtin.def (VRLWNM): New monomorphic
function entry.
(VRLDNM): Likewise.
(RLNM): New polymorphic function entry.
(VRLWMI): New monomorphic function entry.
(VRLDMI): Likewise.
(RLMI): New polymorphic function entry.
* config/rs6000/r6000-c.c (altivec_overloaded_builtin_table): Add
new entries for P9V_BUILTIN_VEC_RLMI and P9V_BUILTIN_VEC_RLNM.
* doc/extend.texi: Add description of vec_rlmi, vec_rlnm, and
vec_vrlnm.

[gcc/testsuite]

2017-01-16  Bill Schmidt  

* vec-rlmi-rlnm.c: New file.


Index: gcc/config/rs6000/altivec.h
===
--- gcc/config/rs6000/altivec.h (revision 244498)
+++ gcc/config/rs6000/altivec.h (working copy)
@@ -168,6 +168,9 @@
 #define vec_re __builtin_vec_re
 #define vec_round __builtin_vec_round
 #define vec_recipdiv __builtin_vec_recipdiv
+#define vec_rlmi __builtin_vec_rlmi
+#define vec_vrlnm __builtin_vec_rlnm
+#define vec_rlnm(a,b,c) (__builtin_vec_rlnm(a,(b<<8)|c))
 #define vec_rsqrt __builtin_vec_rsqrt
 #define vec_rsqrte __builtin_vec_rsqrte
 #define vec_vsubfp __builtin_vec_vsubfp
Index: gcc/config/rs6000/altivec.md
===
--- gcc/config/rs6000/altivec.md(revision 244498)
+++ gcc/config/rs6000/altivec.md(working copy)
@@ -156,6 +156,8 @@
UNSPEC_CMPRB
UNSPEC_CMPRB2
UNSPEC_CMPEQB
+   UNSPEC_VRLMI
+   UNSPEC_VRLNM
 ])
 
 (define_c_enum "unspecv"
@@ -168,8 +170,10 @@
 
 ;; Like VI, defined in vector.md, but add ISA 2.07 integer vector ops
 (define_mode_iterator VI2 [V4SI V8HI V16QI V2DI])
-;; Short vec in modes
+;; Short vec int modes
 (define_mode_iterator VIshort [V8HI V16QI])
+;; Longer vec int modes for rotate/mask ops
+(define_mode_iterator VIlong [V2DI V4SI])
 ;; Vec float modes
 (define_mode_iterator VF [V4SF])
 ;; Vec modes, pity mode iterators are not composable
@@ -1627,6 +1631,25 @@
   "vrl %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vrlmi"
+  [(set (match_operand:VIlong 0 "register_operand" "=v")
+(unspec:VIlong [(match_operand:VIlong 1 "register_operand" "0")
+   (match_operand:VIlong 2 "register_operand" "v")
+   (match_operand:VIlong 3 "register_operand" "v")]
+  UNSPEC_VRLMI))]
+  "TARGET_P9_VECTOR"
+  "vrlmi %0,%2,%3"
+  [(set_attr "type" "veclogical")])
+
+(define_insn "altivec_vrlnm"
+  [(set (match_operand:VIlong 0 "register_operand" "=v")
+(unspec:VIlong [(match_operand:VIlong 1 "register_operand" "v")
+   (match_operand:VIlong 2 "register_operand" "v")]
+  UNSPEC_VRLNM))]
+  "TARGET_P9_VECTOR"
+  "vrlnm %0,%1,%2"
+  [(set_attr "type" "veclogical")])
+
 (define_insn "altivec_vsl"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
 (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
Index: gcc/config/rs6000/rs6000-builtin.def
===
--- gcc/config/rs6000/rs6000-builtin.def(revision 244498)
+++ gcc/config/rs6000/rs6000-builtin.def(working copy)
@@ -1918,6 +1918,8 @@ BU_P9V_OVERLOAD_2 (VSRV,  "vsrv")
 BU_P9V_AV_2 (VADUB,"vadub",CONST,  vaduv16qi3)
 BU_P9V_AV_2 (VADUH,"vaduh",CONST,  vaduv8hi3)
 BU_P9V_AV_2 (VADUW,"vaduw",CONST,  vaduv4si3)
+BU_P9V_AV_2 (VRLWNM,   "vrlwnm",   CONST,  altivec_vrlwnm)
+BU_P9V_AV_2 (VRLDNM,   "vrldnm",   CONST,  altivec_vrldnm)
 
 /* ISA 3.0 vector overloaded 2 argument functions. */
 BU_P9V_OVERLOAD_2 (VADU,   "vadu")
@@ -1924,7 +1926,15 @@ BU_P9V_OVERLOAD_2 (VADU, "vadu")
 BU_P9V_OVERLOAD_2 (VADUB,  "vadub")
 BU_P9V_OVERLOAD_2 (VADUH,  "vaduh")
 BU_P9V_OVERLOAD_2 (VADUW,  "vaduw")

Re: [driver, doc] Support escaping special characters in specs

2017-01-16 Thread Sandra Loosemore

On 01/16/2017 03:54 AM, Rainer Orth wrote:

Hi Sandra,


On 01/13/2017 05:59 AM, Rainer Orth wrote:

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -26391,6 +26391,13 @@ be as many clauses as you need.  This ma

  @end table

+The switch matching text @code{S} in a %@{@code{S}@},
+%@{@code{S}:@code{X}@} or similar construct can use a backslash to
+ignore the special meaning of the character following it, thus allowing
+literal matching of a character that is otherwise specially treated.
+For example, %@{@code{std=iso9899\:1999}:@code{X}@} would substitute
+@code{X} if the @option{-std=iso9899:1999} option were given.
+


I see this "%@{@code{..." markup appears in the paragraph just before this,
but it's wrong.  The whole thing needs to be wrapped in @samp and the
nested @codes removed, like

s/%@{@code{S}:@code{X}@}/@samp{%@{S:X@}}/

etc.


I see, fixed.  I assume this applies to the uses inside @item, too, and
irrespective of %{S:X} or %{S}?


It looked to me like this table environment uses

@table @code

so there should be no need for additional @code markup within the @item 
tags.


I'm not asking you to repair existing bad markup elsewhere in this 
section, just not propagate it to your new paragraph of text.


-Sandra



RE: [PATCH] MIPS: Fix generation of DIV.G and MOD.G for Loongson targets.

2017-01-16 Thread Maciej W. Rozycki
On Mon, 16 Jan 2017, Toma Tabacu wrote:

> After searching through the archives, I have found an interesting bit of
> information about DIV.G/MOD.G in the original submission thread:
> 
> > > Ruan Beihong 23 July 2008:
> > > 
> > > I've seen the Loongson 2F manual carefully. The (d)div(u) is 
> > > internally splited into one (d)div(u).g and one (d)mod(u).g. So I said 
> > > before was wrong. The truth is that, (d)div(u).g and (d)mod(u).g are 
> > > always faster than (d)div(u), at least the time spend on mflo/mfhi is 
> > > saved. 
> > > 
> > > James Ruan 
> > 
> > Richard Sandiford 24 July 2008:
> > 
> > OK, great.  In that case, it should simply be a case of disabling
> > the divmod-related insns for Loongson, in addition to your patch.
> > (Probably stating the obvious there, sorry.)
> > 
> > Richard
> 
> Here's the link for part 1 of the submission thread (has the quotes from 
> above):
> https://gcc.gnu.org/ml/gcc-patches/2008-07/msg01529.html
> and here's part 2:
> https://gcc.gnu.org/ml/gcc-patches/2008-11/msg00273.html

 Thanks for digging this out!

> If DIV.G/MOD.G are faster, according to Ruan Beihong, and also smaller than 
> DIV
> (or the same size [1]), as pointed out by Maciej, then I am led to the same
> conclusion as Richard Sandiford: that only DIV.G/MOD.G should be generated for
> Loongson.
> 
> I think it would still be a good idea to add a test for separated DIV.G/MOD.G,
> though.

 Possibly, though the combined tests need to stay then, to make sure 
generic DIV/DIVU is not ever produced.

> What are your thoughts on this ?
> Have I misunderstood something in the context of the submission thread ?
> 
> Regards,
> Toma
> 
> [1] I've noticed that GCC generates the same TEQ instruction twice if both
> DIV.G and MOD.G are needed, which makes the sequence just as big as
> DIV + TEQ + MFHI + MFLO; this seems unnecessary to me.

 This ought to be handled then, likely by adding Loongson-specific RTL 
insns matching the `divmod4' and `udivmod4' expanders.  It may 
be as simple as say (conceptually, untested):

(define_insn "divmod4_loongson"
  [(set (match_operand:GPR 0 "register_operand" "=d")
(any_div:GPR (match_operand:GPR 1 "register_operand" "d")
 (match_operand:GPR 2 "register_operand" "d")))
   (set (match_operand:GPR 3 "register_operand" "=d")
(any_mod:GPR (match_dup 1)
 (match_dup 2)))]
  "TARGET_LOONGSON_2EF"
{
  return mips_output_division
("div.g\t%0,%1,%2\;mod.g\t%3,%1,%2", operands);
}
  [(set_attr "type" "idiv")
   (set_attr "mode" "")])

although any final fix will have to take an instruction count adjustment 
into account too, as `mips_idiv_insns' won't as it stands handle the new 
case.

  Maciej


Re: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like numbers in GIMPLE.

2017-01-16 Thread Tamar Christina
Ping. Does this still have a chance or should I table it till Stage 1 opens 
again?

Tamar.


From: Tamar Christina
Sent: Tuesday, January 3, 2017 9:55:51 AM
To: Jeff Law; Joseph Myers
Cc: GCC Patches; Wilco Dijkstra; rguent...@suse.de; Michael Meissner; nd
Subject: Re: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like numbers 
in GIMPLE.

Hi Jeff,

I wasn't sure if you saw the updated patch attached to the previous email or if 
you just hadn't had the time to look at it yet.

Cheers,
Tamar


From: Jeff Law 
Sent: Monday, December 19, 2016 8:27:33 PM
To: Tamar Christina; Joseph Myers
Cc: GCC Patches; Wilco Dijkstra; rguent...@suse.de; Michael Meissner; nd
Subject: Re: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like numbers 
in GIMPLE.

On 12/15/2016 03:14 AM, Tamar Christina wrote:
>
>> On a high level, presumably there's no real value in keeping the old
>> code to "fold" fpclassify.  By exposing those operations as integer
>> logicals for the fast path, if the FP value becomes a constant during
>> the optimization pipeline we'll see the reinterpreted values flowing
>> into the new integer logical tests and they'll simplify just like
>> anything else.  Right?
>
> Yes, if it becomes a constant it will be folded away, both in the integer and 
> the fp case.
Thanks for clarifying.


>
>> The old IBM format is still supported, though they are expected to be
>> moveing towards a standard ieee 128 bit format.  So my only concern is
>> that we preserve correct behavior for those cases -- I don't really care
>> about optimizing them.  So I think you need to keep them.
>
> Yes, I re-added them. It's mostly a copy paste from what they were in the
> other functions. But I have no way of testing it.
Understood.

>>  > +  const HOST_WIDE_INT type_width = TYPE_PRECISION (type);
>>> +  return (format->is_binary_ieee_compatible
>>> +   && FLOAT_WORDS_BIG_ENDIAN == WORDS_BIG_ENDIAN
>>> +   /* We explicitly disable quad float support on 32 bit systems.  */
>>> +   && !(UNITS_PER_WORD == 4 && type_width == 128)
>>> +   && targetm.scalar_mode_supported_p (mode));
>>> +}
>> Presumably this is why you needed the target.h inclusion.
>>
>> Note that on some systems we even disable 64bit floating point support.
>> I suspect this check needs a little re-thinking as I don't think that
>> checking for a specific UNITS_PER_WORD is correct, nor is checking the
>> width of the type.  I'm not offhand sure what the test should be, just
>> that I think we need something better here.
>
> I think what I really wanted to test here is if there was an integer mode 
> available
> which has the exact width as the floating point one. So I have replaced this 
> with
> just a call to int_mode_for_mode. Which is probably more correct.
I'll need to think about it, but would inherently think that
int_mode_for_mode is better than an explicit check of UNITS_PER_WORD and
typewidth.


>
>>> +
>>> +/* Determines if the given number is a NaN value.
>>> +   This function is the last in the chain and only has to
>>> +   check if it's preconditions are true.  */
>>> +static tree
>>> +is_nan (gimple_seq *seq, tree arg, location_t loc)
>> So in the old code we checked UNGT_EXPR, in the new code's slow path you
>> check UNORDERED.  Was that change intentional?
>
> The old FP code used UNORDERED and the new one was using ORDERED and negating 
> the result.
> I've replaced it with UNORDERED, but both are correct.
OK.  Just wanted to make sure.

jeff



RE: [PATCH] MIPS: Fix generation of DIV.G and MOD.G for Loongson targets.

2017-01-16 Thread Toma Tabacu
After searching through the archives, I have found an interesting bit of
information about DIV.G/MOD.G in the original submission thread:

> > Ruan Beihong 23 July 2008:
> > 
> > I've seen the Loongson 2F manual carefully. The (d)div(u) is 
> > internally splited into one (d)div(u).g and one (d)mod(u).g. So I said 
> > before was wrong. The truth is that, (d)div(u).g and (d)mod(u).g are 
> > always faster than (d)div(u), at least the time spend on mflo/mfhi is 
> > saved. 
> > 
> > James Ruan 
> 
> Richard Sandiford 24 July 2008:
> 
> OK, great.  In that case, it should simply be a case of disabling
> the divmod-related insns for Loongson, in addition to your patch.
> (Probably stating the obvious there, sorry.)
> 
> Richard

Here's the link for part 1 of the submission thread (has the quotes from above):
https://gcc.gnu.org/ml/gcc-patches/2008-07/msg01529.html
and here's part 2:
https://gcc.gnu.org/ml/gcc-patches/2008-11/msg00273.html

If DIV.G/MOD.G are faster, according to Ruan Beihong, and also smaller than DIV
(or the same size [1]), as pointed out by Maciej, then I am led to the same
conclusion as Richard Sandiford: that only DIV.G/MOD.G should be generated for
Loongson.

I think it would still be a good idea to add a test for separated DIV.G/MOD.G,
though.

What are your thoughts on this ?
Have I misunderstood something in the context of the submission thread ?

Regards,
Toma

[1] I've noticed that GCC generates the same TEQ instruction twice if both
DIV.G and MOD.G are needed, which makes the sequence just as big as
DIV + TEQ + MFHI + MFLO; this seems unnecessary to me.



RE: [PATCH, MIPS] Target flag and build option to disable indexed memory OPs.

2017-01-16 Thread Matthew Fortune
Doug Gilmore 
> I recently bisected PR78176 to problems introduced with r21650.
> 
> Given the short time until the release, we would like to provide a
> target flag and build option to avoid the bug until we are able to
> resolve the problem with the commit.  Note that as Matthew Fortune has
> mentioned in the PR:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78176#c5
> 
> the problem could also be addressed by updates to the Linux kernel since
> the problem is only exposed by running MIPS 32-bit binaries on 64-bit
> kernels.
> 
> Bootstrapped on X86_64, regression tested on X86_64 and MIPS.
> 
> OK to commit?

Given this is a generic reference to indexed load/store and the issue could
affect any indexed operation then I think it needs to include all of the
following as well:

/* ISA has lwxs instruction (load w/scaled index address.  */
#define ISA_HAS_LWXS((TARGET_SMARTMIPS || TARGET_MICROMIPS) \
 && !TARGET_MIPS16)

/* ISA has lbx, lbux, lhx, lhx, lhux, lwx, lwux, or ldx instruction. */
#define ISA_HAS_LBX (TARGET_OCTEON2)
#define ISA_HAS_LBUX(ISA_HAS_DSP || TARGET_OCTEON2)
#define ISA_HAS_LHX (ISA_HAS_DSP || TARGET_OCTEON2)
#define ISA_HAS_LHUX(TARGET_OCTEON2)
#define ISA_HAS_LWX (ISA_HAS_DSP || TARGET_OCTEON2)
#define ISA_HAS_LWUX(TARGET_OCTEON2 && TARGET_64BIT)
#define ISA_HAS_LDX ((ISA_HAS_DSP || TARGET_OCTEON2) \
 && TARGET_64BIT)

The DSP LBUX/LHX/LWX/LDX intrinsics will also need a new AVAIL predicate
to disable them. The snag is that some DSP code will fail to compile if it
uses the DSP load intrinsics directly.

I see no way of avoiding that. Therefore, distributions that use
--without-indexed-load-store will have to cope with some potential DSP
fallout if they enable DSP at all.

@Catherine: I'd like your input here if possible as I advocated this
approach, comments on option names welcome too.  I quite like the verbose
name.

@Doug: Have you tried running the testsuite with the configure option
--without-indexed-load-store? There may be tests that need adjusting where they
test indexed load/store. We probably need a pre-processor macro
to detect if the option is enabled/disabled so that DSP code can be predicated
on indexed load being available.

Thanks,
Matthew


Re: libgo patch committed: Update to Go1.8rc1

2017-01-16 Thread Rainer Orth
Hi Ian,

> This seems to need a version of defs_solaris.go that works with the
> constants and structs living in syscall.

I've made some progress here: I need to check  for
SIOCGLIF* ioctls, and  for the IFT_* constants.  I've
updated configure.in and sysinfo.c accordingly, and adapted
defs_solaris.go as well.  However, I still run into a number of errors:

/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/defs_solaris.go:29:22:
 error: reference to undefined identifier 'syscall._sockaddr_storage'
 type sockaddrStorage syscall._sockaddr_storage
  ^

/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/defs_solaris.go:32:36:
 error: reference to undefined identifier 'syscall.LIFC_NOXMIT'
  sysLIFC_NOXMIT  = syscall.LIFC_NOXMIT
^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/defs_solaris.go:33:36:
 error: reference to undefined identifier 'syscall.LIFC_EXTERNAL_SOURCE'
  sysLIFC_EXTERNAL_SOURCE = syscall.LIFC_EXTERNAL_SOURCE
^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/defs_solaris.go:34:36:
 error: reference to undefined identifier 'syscall.LIFC_TEMPORARY'
  sysLIFC_TEMPORARY   = syscall.LIFC_TEMPORARY
^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/defs_solaris.go:35:36:
 error: reference to undefined identifier 'syscall.LIFC_ALLZONES'
  sysLIFC_ALLZONES= syscall.LIFC_ALLZONES
^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/defs_solaris.go:36:36:
 error: reference to undefined identifier 'syscall.LIFC_UNDER_IPMP'
  sysLIFC_UNDER_IPMP  = syscall.LIFC_UNDER_IPMP
^

The LIFC_* constants are only present in their _LIFC_* form in
sysinfo.go so far; need update to mksysinfo.sh.

/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/defs_solaris.go:40:31:
 error: reference to undefined identifier 'syscall.SIOCGLIFADDR'
  sysSIOCGLIFADDR= syscall.SIOCGLIFADDR
   ^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/defs_solaris.go:41:31:
 error: reference to undefined identifier 'syscall.SIOCGLIFDSTADDR'
  sysSIOCGLIFDSTADDR = syscall.SIOCGLIFDSTADDR
   ^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/defs_solaris.go:42:31:
 error: reference to undefined identifier 'syscall.SIOCGLIFFLAGS'
  sysSIOCGLIFFLAGS   = syscall.SIOCGLIFFLAGS
   ^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/defs_solaris.go:43:31:
 error: reference to undefined identifier 'syscall.SIOCGLIFMTU'
  sysSIOCGLIFMTU = syscall.SIOCGLIFMTU
   ^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/defs_solaris.go:44:31:
 error: reference to undefined identifier 'syscall.SIOCGLIFNETMASK'
  sysSIOCGLIFNETMASK = syscall.SIOCGLIFNETMASK
   ^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/defs_solaris.go:45:31:
 error: reference to undefined identifier 'syscall.SIOCGLIFMETRIC'
  sysSIOCGLIFMETRIC  = syscall.SIOCGLIFMETRIC
   ^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/defs_solaris.go:46:31:
 error: reference to undefined identifier 'syscall.SIOCGLIFNUM'
  sysSIOCGLIFNUM = syscall.SIOCGLIFNUM
   ^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/defs_solaris.go:47:31:
 error: reference to undefined identifier 'syscall.SIOCGLIFINDEX'
  sysSIOCGLIFINDEX   = syscall.SIOCGLIFINDEX
   ^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/defs_solaris.go:48:31:
 error: reference to undefined identifier 'syscall.SIOCGLIFSUBNET'
  sysSIOCGLIFSUBNET  = syscall.SIOCGLIFSUBNET
   ^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/defs_solaris.go:49:31:
 error: reference to undefined identifier 'syscall.SIOCGLIFLNKINFO'
  sysSIOCGLIFLNKINFO = syscall.SIOCGLIFLNKINFO
   ^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/defs_solaris.go:50:31:
 error: reference to undefined identifier 'syscall.SIOCGLIFCONF'
  sysSIOCGLIFCONF= syscall.SIOCGLIFCONF
   ^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/defs_solaris.go:51:31:
 error: reference to undefined identifier 'syscall.SIOCGLIFHWADDR'
  sysSIOCGLIFHWADDR  = syscall.SIOCGLIFHWADDR
   ^

All of them show as

// unknowndefine SIOCGLIFMTU _IOWR('i', 122, struct lifreq)

in gen-sysinfo.go.  Not sure what's going on here...

/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/defs_solaris.go:73:31:
 error: invalid reference to unexported identifier 'syscall._sizeof_lifnum'
  sizeofLifnum   = syscall._sizeof_lifnum
   ^

Re: [PATCH] Fix gcc.dg/tree-ssa/scev-[345].c testcases

2017-01-16 Thread Christophe Lyon
On 16 January 2017 at 10:43, Richard Biener  wrote:
> On Mon, 16 Jan 2017, Christophe Lyon wrote:
>
>> On 13 January 2017 at 12:16, Bin.Cheng  wrote:
>> > On Fri, Jan 13, 2017 at 9:46 AM, Richard Biener  wrote:
>> >>
>> >> The following is an attempt to change those testcases to be less dependent
>> >> on previous passes.  The original motivation of the testcases seems to be
>> >> testing SCEV capabilities and in turn IVOPTs decisions, thus the testcases
>> >> are changed to check the IVO dump, use the GIMPLE FE feeding the loop
>> >> pipeline directly and skip lowering/store-motion we meanwhile do to
>> >> the testcase.
>> >>
>> >> To avoid some existing issue with CFG construction after GIMPLE parsing
>> >> we need to be able to add GIMPLE_NOPs which the patch enables to generate
>> >> from empty stmts (previously those resulted in parse errors).
>> >>
>> >> Tested the testcases on x86_64 with {,-m32} sofar I'll appreciate
>> >> testing on more targets.
>> > I checked aarch64-elf/aarch64-linux with default configuration, all
>> > passed with this change.
>> >
>>
>> For me, the testcases don't compile with this patch:
>> gcc.dg/tree-ssa/scev-3.c: In function 'f':
>> gcc.dg/tree-ssa/scev-3.c:30:3: error: '__MEM' undeclared (first use in
>> this function)
>> gcc.dg/tree-ssa/scev-3.c:30:3: note: each undeclared identifier is
>> reported only once for each function it appears in
>> gcc.dg/tree-ssa/scev-3.c:30:9: error: expected '=' before '<' token
>>
>> Did I misapply the patch?
>
> You need
>
> 2017-01-12  Richard Biener  
>
> * gimple-parser.c (c_parser_gimple_postfix_expression): Parse
> __MEM.
>
> in c/ChangeLog for it to work.
>

Indeed, thanks.
With it, I confirm the tests now pass on aarch64 and arm.
Thanks

Christophe

> Richard.
>
>> > Thanks,
>> > bin
>> >>
>> >> Full bootstrap / regtest running on x86_64-unknown-linux-gnu.
>> >>
>> >> Richard.
>> >>
>> >
>> >> -}
>> >>  }
>> >>
>> >> -/* { dg-final { scan-tree-dump-times "" 1 "optimized" } } */
>> >> +/* { dg-final { scan-tree-dump-times "" 1 "ivopts" } } */
>>
>>
>
> --
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)


[PATCH] PR66145 use new ABI for std::ios::failure exceptions

2017-01-16 Thread Jonathan Wakely

This changes the version of std::ios_base::failure thrown by
libstdc++.so to be the new __cxx11 ABI one, matching the default for
headers that are trying to catch the exception.

There's no simple way to do this as an easy transition, but hopefully
by now most people are either using the new ABI or building GCC
without the dual ABI.


PR libstdc++/66145
* src/c++11/functexcept.cc: Use new ABI for std::ios_base::failure
exceptions.
* testsuite/27_io/basic_ios/copyfmt/char/1.cc: Don't override ABI
for test, so new ios::failure can be caught.
* testsuite/27_io/basic_ios/exceptions/char/1.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/char/
exceptions_failbit.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/wchar_t/
exceptions_failbit.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_other/char/
exceptions_null.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_other/wchar_t/
exceptions_null.cc: Likewise.
* testsuite/27_io/basic_istream/sentry/char/12297.cc: Likewise.
* testsuite/27_io/basic_istream/sentry/wchar_t/12297.cc: Likewise.
* testsuite/27_io/basic_ostream/inserters_other/char/
exceptions_null.cc: Likewise.
* testsuite/27_io/basic_ostream/inserters_other/wchar_t/
exceptions_null.cc: Likewise.
* testsuite/27_io/ios_base/storage/2.cc: Likewise.

Tested powerpc64le-linux, committed to trunk.

commit a8b9e77459aa5886dbaf3d6aaf2edea45aa60ad9
Author: Jonathan Wakely 
Date:   Mon Jan 16 15:55:36 2017 +

PR66145 use new ABI for std::ios::failure exceptions

PR libstdc++/66145
* src/c++11/functexcept.cc: Use new ABI for std::ios_base::failure
exceptions.
* testsuite/27_io/basic_ios/copyfmt/char/1.cc: Don't override ABI
for test, so new ios::failure can be caught.
* testsuite/27_io/basic_ios/exceptions/char/1.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/char/
exceptions_failbit.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/wchar_t/
exceptions_failbit.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_other/char/
exceptions_null.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_other/wchar_t/
exceptions_null.cc: Likewise.
* testsuite/27_io/basic_istream/sentry/char/12297.cc: Likewise.
* testsuite/27_io/basic_istream/sentry/wchar_t/12297.cc: Likewise.
* testsuite/27_io/basic_ostream/inserters_other/char/
exceptions_null.cc: Likewise.
* testsuite/27_io/basic_ostream/inserters_other/wchar_t/
exceptions_null.cc: Likewise.
* testsuite/27_io/ios_base/storage/2.cc: Likewise.

diff --git a/libstdc++-v3/src/c++11/functexcept.cc 
b/libstdc++-v3/src/c++11/functexcept.cc
index 3210619..6281534 100644
--- a/libstdc++-v3/src/c++11/functexcept.cc
+++ b/libstdc++-v3/src/c++11/functexcept.cc
@@ -20,8 +20,9 @@
 // see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 // .
 
-// We don't want to change the type thrown by __throw_ios_failure (yet?)
-#define _GLIBCXX_USE_CXX11_ABI 0
+// Determines the version of ios_base::failure thrown by __throw_ios_failure.
+// If !_GLIBCXX_USE_DUAL_ABI this will get undefined automatically.
+#define _GLIBCXX_USE_CXX11_ABI 1
 
 #include 
 #include 
diff --git a/libstdc++-v3/testsuite/27_io/basic_ios/copyfmt/char/1.cc 
b/libstdc++-v3/testsuite/27_io/basic_ios/copyfmt/char/1.cc
index e1a30f3..d922b18 100644
--- a/libstdc++-v3/testsuite/27_io/basic_ios/copyfmt/char/1.cc
+++ b/libstdc++-v3/testsuite/27_io/basic_ios/copyfmt/char/1.cc
@@ -17,9 +17,6 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// The library still throws the original definition of std::ios::failure
-// { dg-options "-D_GLIBCXX_USE_CXX11_ABI=0" }
-
 // 27.4.4.2 basic_ios member functions
 
 // NB: Don't include any other headers in this file.
diff --git a/libstdc++-v3/testsuite/27_io/basic_ios/exceptions/char/1.cc 
b/libstdc++-v3/testsuite/27_io/basic_ios/exceptions/char/1.cc
index df073cf..a7f829a 100644
--- a/libstdc++-v3/testsuite/27_io/basic_ios/exceptions/char/1.cc
+++ b/libstdc++-v3/testsuite/27_io/basic_ios/exceptions/char/1.cc
@@ -17,9 +17,6 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// The library still throws the original definition of std::ios::failure
-// { dg-options "-D_GLIBCXX_USE_CXX11_ABI=0" }
-
 // 27.4.4.2 basic_ios member functions
 
 // NB: Don't include any other headers in this file.
diff --git 
a/libstdc++-v3/testsuite/27_io/basic_istream/extractors_arithmetic/char/exceptions_failbit.cc
 
b/libstdc++-v3/testsuite/27_io/basic_istream/extractors_arithmetic/char/exceptions_failbit.cc

Re: [C++ Patch] PR 71737

2017-01-16 Thread Paolo Carlini

On 16/01/2017 16:11, David Edelsohn wrote:

This patch caused a libstdc++ regression on AIX

libstdc++-v3/include/tr1/shared_ptr.h:556: internal compiler error:
tree check: expected class 'type', have 'exceptional' (error_mark) in
build_variant_type_copy, at tree.c:6737

I noticed, patch reverted for the time being.

Sorry about the annoyance,
Paolo.


Re: [PATCH][AArch64][GCC 6] PR target/79041: Correct -mpc-relative-literal-loads logic in aarch64_classify_symbol

2017-01-16 Thread Kyrill Tkachov


On 13/01/17 16:35, James Greenhalgh wrote:

On Wed, Jan 11, 2017 at 04:32:45PM +, Kyrill Tkachov wrote:

Hi all,

In this PR we generated ADRP/ADD instructions with :lo12: relocations on
symbols even though -mpc-relative-literal-loads is used. This is due to the
confusing double-negative logic of the nopcrelative_literal_loads aarch64
variable and its relation to the aarch64_nopcrelative_literal_loads global
variable in the GCC 6 branch.

Wilco fixed this on trunk as part of a larger patch (r237607) and parts of
that patch were backported, but other parts weren't and that patch now
doesn't apply cleanly to the branch.

As I commented to Jakub at the time he made the first partial backport,
I'd much rather just see all of Wilco's patch backported. We're not on
the verge of a 6 release now, so please just backport Wilco's patch (as
should have been done all along if this had been correctly identified as
a fix rather than a clean-up).


Unfortunately simply backporting the patch does not fix this PR.
The aarch64_classify_symbol changes do not have the desired effect
and the :lo12: relocations are generated.
I'll look into it, but I believe that would require a bigger change than this 
one-liner.

Thanks,
Kyri


Thanks,
James






[PING][PATCH, ARM] PR71607: New approach to arm_disable_literal_pool

2017-01-16 Thread Andre Vieira (lists)
On 28/12/16 09:58, Andre Vieira (lists) wrote:
> On 29/11/16 09:45, Andre Vieira (lists) wrote:
>> On 17/11/16 10:00, Ramana Radhakrishnan wrote:
>>> On Thu, Oct 6, 2016 at 2:57 PM, Andre Vieira (lists)
>>>  wrote:
 Hello,

 This patch tackles the issue reported in PR71607. This patch takes a
 different approach for disabling the creation of literal pools. Instead
 of disabling the patterns that would normally transform the rtl into
 actual literal pools, it disables the creation of this literal pool rtl
 by making the target hook TARGET_CANNOT_FORCE_CONST_MEM return true if
 arm_disable_literal_pool is true. I added patterns to split floating
 point constants for both SF and DFmode. A pattern to handle the
 addressing of label_refs had to be included as well since all
 "memory_operand" patterns are disabled when
 TARGET_CANNOT_FORCE_CONST_MEM returns true. Also the pattern for
 splitting 32-bit immediates had to be changed, it was not accepting
 unsigned 32-bit unsigned integers with the MSB set. I believe
 const_int_operand expects the mode of the operand to be set to VOIDmode
 and not SImode. I have only changed it in the patterns that were
 affecting this code, though I suggest looking into changing it in the
 rest of the ARM backend.

 I added more test cases. No regressions for arm-none-eabi with
 Cortex-M0, Cortex-M3 and Cortex-M7.

 Is this OK for trunk?
>>>
>>> Including -mslow-flash-data in your multilib flags ? If no regressions
>>> with that ok .
>>>
>>>
>>> regards
>>> Ramana
>>>

>>
>> Hello,
>>
>> I found some new ICE's with the -mslow-flash-data testing so I had to
>> rework this patch. I took the opportunity to rebase it as well.
>>
>> The problem was with the way the old version of the patch handled label
>> references.  After some digging I found I wasn't using the right target
>> hook and so I implemented the 'TARGET_USE_BLOCKS_FOR_CONSTANT_P' for
>> ARM.  This target hook determines whether a literal pool ends up in an
>> 'object_block' structure. So I reverted the changes made in the old
>> version of the patch to the ARM implementation of the
>> 'TARGET_CANNOT_FORCE_CONST_MEM' hook and rely on
>> 'TARGET_USE_BLOCKS_FOR_CONSTANT_P' instead. This patch adds an ARM
>> implementation for this hook that returns false if
>> 'arm_disable_literal_pool' is set to true and true otherwise.
>>
>> This version of the patch also reverts back to using the check for
>> 'SYMBOL_REF' in 'thumb2_legitimate_address_p' that was removed in the
>> last version, this code is required to place the label references in
>> rodata sections.
>>
>> Another thing this patch does is revert the changes made to the 32-bit
>> constant split in arm.md. The reason this was needed before was because
>> 'real_to_target' returns a long array and does not sign-extend values in
>> it, which would make sense on hosts with 64-bit longs. To fix this the
>> value is now casted to 'int' first.  It would probably be a good idea to
>> change the 'real_to_target' function to return an array with
>> 'HOST_WIDE_INT' elements instead and either use all 64-bits or
>> sign-extend them.  Something for the future?
>>
>> I added more test cases in this patch and reran regression tests for:
>> Cortex-M0, Cortex-M4 with and without -mslow-flash-data. Also did a
>> bootstrap+regressions on arm-none-linux-gnueabihf.
>>
>> Is this OK for trunk?
>>
>> Cheers,
>> Andre
>>
>> gcc/ChangeLog:
>>
>> 2016-11-29  Andre Vieira  
>>
>> PR target/71607
>> * config/arm/arm.md (use_literal_pool): Removes.
>> (64-bit immediate split): No longer takes cost into consideration
>> if 'arm_disable_literal_pool' is enabled.
>> * config/arm/arm.c (arm_use_blocks_for_constant_p): New.
>> (TARGET_USE_BLOCKS_FOR_CONSTANT_P): Define.
>> (arm_max_const_double_inline_cost): Remove use of
>> arm_disable_literal_pool.
>> * config/arm/vfp.md (no_literal_pool_df_immediate): New.
>> (no_literal_pool_sf_immediate): New.
>>
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2016-11-29  Andre Vieira  
>> Thomas Preud'homme  
>>
>> PR target/71607
>> * gcc.target/arm/thumb2-slow-flash-data.c: Renamed to ...
>> * gcc.target/arm/thumb2-slow-flash-data-1.c: ... this.
>> * gcc.target/arm/thumb2-slow-flash-data-2.c: New.
>> * gcc.target/arm/thumb2-slow-flash-data-3.c: New.
>> * gcc.target/arm/thumb2-slow-flash-data-4.c: New.
>> * gcc.target/arm/thumb2-slow-flash-data-5.c: New.
>>
> Hello,
> 
> As I have mentioned in my commit message for the fix on embedded-6  (see
> https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01304.html) I found an
> issue with this code when testing its backport on the embedded-6-branch.
> 
> I misread the definition of the target hook
> TARGET_USE_BLOCKS_FOR_CONSTANT_P and it seems the 

Re: libgo patch committed: Update to Go1.8rc1

2017-01-16 Thread Rainer Orth
Hi Ian,

> On Sun, Jan 15, 2017 at 3:12 AM, Andreas Schwab  wrote:
>> In file included from ../../../libgo/runtime/runtime.h:113:0,
>>  from ../../../libgo/runtime/go-libmain.c:15:
>> ./runtime.inc:650:8: error: redefinition of 'struct siginfo'
>>  struct siginfo {
>> ^~~
>> In file included from /usr/include/signal.h:79:0,
>>  from ../../../libgo/runtime/runtime.h:9,
>>  from ../../../libgo/runtime/go-libmain.c:15:
>> /usr/include/bits/siginfo.h:46:16: note: originally defined here
>>  typedef struct siginfo
>> ^~~
>> In file included from ../../../libgo/runtime/runtime.h:113:0,
>>  from ../../../libgo/runtime/go-main.c:17:
>> ./runtime.inc:650:8: error: redefinition of 'struct siginfo'
>>  struct siginfo {
>> ^~~
>> In file included from /usr/include/signal.h:79:0,
>>  from ../../../libgo/runtime/runtime.h:9,
>>  from ../../../libgo/runtime/go-main.c:17:
>> /usr/include/bits/siginfo.h:46:16: note: originally defined here
>>  typedef struct siginfo
>> ^~~
>> In file included from ../../../libgo/runtime/runtime.h:113:0,
>>  from ../../../libgo/runtime/aeshash.c:7:
>> ./runtime.inc:650:8: error: redefinition of 'struct siginfo'
>>  struct siginfo {
>> ^~~
>> In file included from /usr/include/signal.h:79:0,
>>  from ../../../libgo/runtime/runtime.h:9,
>>  from ../../../libgo/runtime/aeshash.c:7:
>> /usr/include/bits/siginfo.h:46:16: note: originally defined here
>>  typedef struct siginfo
>> ^~~
>> make[4]: *** [libgobegin_a-go-main.o] Error 1
>
> Thanks.  I committed this patch, which I think should fix the problem.
> Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.

I'm getting further on Solaris now, but the build still fails:

/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/syscall.go:16:3: 
error: procIoctl is not a function; //go:linkname is only supported for 
functions
 //go:linkname procIoctl libc_ioctl
   ^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/address.go:27:43: 
error: reference to undefined name 'sysAF_INET'
 func (a *Inet4Addr) Family() int { return sysAF_INET }
   ^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/address.go:37:43: 
error: reference to undefined name 'sysAF_INET6'
 func (a *Inet6Addr) Family() int { return sysAF_INET6 }
   ^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/address.go:59:12: 
error: use of undefined type 'lifreq'
   var lifr lifreq
^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/address.go:61:8: 
error: reference to field 'Name' in object which has no fields or methods
lifr.Name[i] = int8(ll.Name[i])
^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/address.go:64:17: 
error: reference to undefined name 'sysSIOCGLIFADDR'
ioc := int64(sysSIOCGLIFADDR)
 ^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/address.go:69:12: 
error: reference to undefined name 'sockaddrStorage'
sa := (*sockaddrStorage)(unsafe.Pointer([0]))
^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/address.go:69:11: 
error: expected pointer
sa := (*sockaddrStorage)(unsafe.Pointer([0]))
   ^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/address.go:69:49: 
error: reference to field 'Lifru' in object which has no fields or methods
sa := (*sockaddrStorage)(unsafe.Pointer([0]))
 ^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/address.go:70:37: 
error: reference to field 'Lifru1' in object which has no fields or methods
l := int(littleEndian.Uint32(lifr.Lifru1[:4]))
 ^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/address.go:75:9: 
error: reference to undefined name 'sysAF_INET'
case sysAF_INET:
 ^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/address.go:77:23: 
error: reference to field 'Lifru' in object which has no fields or methods
 copy(a.IP[:], lifr.Lifru[4:8])
   ^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/address.go:79:9: 
error: reference to undefined name 'sysAF_INET6'
case sysAF_INET6:
 ^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/address.go:80:71: 
error: reference to field 'Lifru' in object which has no fields or methods
 a := {PrefixLen: l, ZoneID: 
int(littleEndian.Uint32(lifr.Lifru[24:28]))}
   ^
/vol/gcc/src/hg/trunk/local/libgo/go/golang_org/x/net/lif/address.go:81:23: 
error: reference to field 'Lifru' in object which has no fields or methods
 copy(a.IP[:], lifr.Lifru[8:24])
  

Re: [C++ Patch] PR 71737

2017-01-16 Thread David Edelsohn
This patch caused a libstdc++ regression on AIX

libstdc++-v3/include/tr1/shared_ptr.h:556: internal compiler error:
tree check: expected class 'type', have 'exceptional' (error_mark) in
build_variant_type_copy, at tree.c:6737


Re: [PATCH][AArch64 - v4] Simplify eh_return implementation

2017-01-16 Thread Wilco Dijkstra
Here is the updated version:

This patch simplifies the handling of the EH return value.  We force the use of 
the
frame pointer so the return location is always at FP + 8.  This means we can 
emit
a simple volatile access in EH_RETURN_HANDLER_RTX without needing md
patterns, splitters and frame offset calculations.  The new implementation also
fixes various bugs in aarch64_final_eh_return_addr, which does not work with
-fomit-frame-pointer, alloca or outgoing arguments.

Bootstrap OK, GCC Regression OK, OK for trunk? Would it be useful to backport
this to GCC6.x?

ChangeLog:

2017-01-16  Wilco Dijkstra  

PR77455
gcc/
* config/aarch64/aarch64.md (eh_return): Remove pattern and splitter.
* config/aarch64/aarch64.h (AARCH64_EH_STACKADJ_REGNUM): Remove.
(EH_RETURN_HANDLER_RTX): New define.
* config/aarch64/aarch64.c (aarch64_frame_pointer_required):
Force frame pointer in EH return functions.
(aarch64_expand_epilogue): Add barrier for eh_return.
(aarch64_final_eh_return_addr): Remove.
(aarch64_eh_return_handler_rtx): New function.
* config/aarch64/aarch64-protos.h (aarch64_final_eh_return_addr):
Remove.
(aarch64_eh_return_handler_rtx): New prototype.

testsuite/
* gcc.target/aarch64/eh_return.c: New test.
--

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
29a3bd71151aa4fb7c6728f0fb52e2f3f233f41d..602f54f934a9b62e782154e7f19cafe3b1bf0481
 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -358,7 +358,7 @@ int aarch64_hard_regno_mode_ok (unsigned, machine_mode);
 int aarch64_hard_regno_nregs (unsigned, machine_mode);
 int aarch64_uxt_size (int, HOST_WIDE_INT);
 int aarch64_vec_fpconst_pow_of_2 (rtx);
-rtx aarch64_final_eh_return_addr (void);
+rtx aarch64_eh_return_handler_rtx (void);
 rtx aarch64_mask_from_zextract_ops (rtx, rtx);
 const char *aarch64_output_move_struct (rtx *operands);
 rtx aarch64_return_addr (int, rtx);
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
5aee9eb5b42d200343484ac4d18d650d26027078..924376223f52e9f614ea2b4bc12bd24d2ab2720a
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -400,9 +400,9 @@ extern unsigned aarch64_architecture_version;
 #define ASM_DECLARE_FUNCTION_NAME(STR, NAME, DECL) \
   aarch64_declare_function_name (STR, NAME, DECL)
 
-/* The register that holds the return address in exception handlers.  */
-#define AARCH64_EH_STACKADJ_REGNUM (R0_REGNUM + 4)
-#define EH_RETURN_STACKADJ_RTX gen_rtx_REG (Pmode, AARCH64_EH_STACKADJ_REGNUM)
+/* For EH returns X4 contains the stack adjustment.  */
+#define EH_RETURN_STACKADJ_RTX gen_rtx_REG (Pmode, R4_REGNUM)
+#define EH_RETURN_HANDLER_RTX  aarch64_eh_return_handler_rtx ()
 
 /* Don't use __builtin_setjmp until we've defined it.  */
 #undef DONT_USE_BUILTIN_SETJMP
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
4e5fad912ba24ffeca36e2d601b67e09c005bb54..d2585dc6eb14fc82c952ca8dbeda445d249fdb9b
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2762,6 +2762,10 @@ aarch64_frame_pointer_required (void)
   && (!crtl->is_leaf || df_regs_ever_live_p (LR_REGNUM)))
 return true;
 
+  /* Force a frame pointer for EH returns so the return address is at FP+8.  */
+  if (crtl->calls_eh_return)
+return true;
+
   return false;
 }
 
@@ -3623,7 +3627,8 @@ aarch64_expand_epilogue (bool for_sibcall)
 + cfun->machine->frame.saved_varargs_size) != 0;
 
   /* Emit a barrier to prevent loads from a deallocated stack.  */
-  if (final_adjust > crtl->outgoing_args_size || cfun->calls_alloca)
+  if (final_adjust > crtl->outgoing_args_size || cfun->calls_alloca
+  || crtl->calls_eh_return)
 {
   emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
   need_barrier_p = false;
@@ -3691,52 +3696,40 @@ aarch64_expand_epilogue (bool for_sibcall)
 emit_jump_insn (ret_rtx);
 }
 
-/* Return the place to copy the exception unwinding return address to.
-   This will probably be a stack slot, but could (in theory be the
-   return register).  */
-rtx
-aarch64_final_eh_return_addr (void)
-{
-  HOST_WIDE_INT fp_offset;
-
-  aarch64_layout_frame ();
+/* Implement EH_RETURN_HANDLER_RTX.  EH returns need to either return
+   normally or return to a previous frame after unwinding.
 
-  fp_offset = cfun->machine->frame.frame_size
- - cfun->machine->frame.hard_fp_offset;
+   An EH return uses a single shared return sequence.  The epilogue is
+   exactly like a normal epilogue except that it has an extra input
+   register (EH_RETURN_STACKADJ_RTX) which contains the stack adjustment
+   that must be applied after the frame has been destroyed.  An extra label
+   is inserted before the epilogue which initializes this register to zero,
+   and this is the 

Re: [2/5][DWARF] Generate dwarf information for -msign-return-address by introducing new DWARF mapping hook

2017-01-16 Thread Jiong Wang

On 13/01/17 18:02, Jiong Wang wrote:

On 13/01/17 16:09, Richard Earnshaw (lists) wrote:

On 06/01/17 11:47, Jiong Wang wrote:


This patch is an update on DWARF generation for return address signing.

According to new proposal, we simply needs to generate 
REG_CFA_WINDOW_SAVE

annotation.

gcc/

2017-01-06  Jiong Wang  

 * config/aarch64/aarch64.c (aarch64_expand_prologue): Generate
dwarf
 annotation (REG_CFA_WINDOW_SAVE) for return address signing.
 (aarch64_expand_epilogue): Likewise.



I don't think we should be overloading REG_CFA_WINDOW_SAVE internally in
the compiler -- it's one thing to do it in the dwarf output tables, but
quite another to be doing it elsewhere in the compiler.

Instead we should create a new reg note kind and use that, but in the
final dwarf output then emit the overloaded opcode.


I can see the reason for doing this is if you want to seperate the 
interpretion
of GCC CFA reg-note and the final DWARF CFA operation.  My 
understanding is all
reg notes defined in gcc/reg-note.def should have general meaning, 
even the
CFA_WINDOW_SAVE.  For those which are architecture specific we might 
need a

mechanism to define them in backend only.
   For general reg-notes in gcc/reg-note.def, they are not always have 
the

corresponding standard DWARF CFA operation, for example CFA_WINDOW_SAVE,
therefore if we want to achieve what you described, I think we also 
need to
define a new target hook which maps a GCC CFA reg-note into 
architecture DWARF

CFA operation.

Regards,
Jiong



Here is the patch.

Introduced one new target hook TARGET_DWARF_MAP_REGNOTE_TO_CFA.  The purpose is
to allow GCC to map DWARF CFA reg notes in reg-note.def, which looks to me have
generic meaning, into target private DWARF CFI if there is no standard DWARF CFI
mapping.

One new GCC reg-note REG_TOGGLE_RA_MANGLE introduced as well, currently, it's
only used by AArch64 to implement return address signing and is mapped to
AArch64's target private DWARF CFI.

Does this approach and implementation looks OK?

I can come up with seperate patches to define this hook on Sparc for
CFA_WINDOW_SAVE, and to remove redundant including of dwarf2.h although there is
"ifdef" protector in header file.

The default hook implementation "default_dwarf_map_regnote_to_cfa" in
targhooks.c used the types "enum reg_note" and "enum dwarf_call_frame_info"
which is not included in coretypes.h thus this patch has several change in
header files.  I have done X86 bootstrap to make sure no build breakage.  I'd
appreciate there is better ideas to handle these type define.

Thanks.

gcc/ChangeLog:

2017-01-16  Jiong Wang  

* target.def (dwarf_map_regnote_to_cfa): New hook.
* targhooks.c (default_dwarf_map_regnote_to_cfa): Default implementation
for TARGET_DWARF_MAP_REGNOTE_TO_CFA.
* targhooks.h (default_dwarf_map_regnote_to_cfa): New declaration.
* rtl.h (enum reg_note): Move enum reg_note to...
* coretypes.h: ... here.
(dwarf2.h): New include file.
* reg-notes.def (CFA_TOGGLE_RA_MANGLE): New reg-note.
* combine-stack-adj.c (no_unhandled_cfa): Handle
REG_CFA_TOGGLE_RA_MANGLE.
* dwarf2cfi.c (dwarf2out_frame_debug_cfa_toggle_ra_mangle): New
function.
(dwarf2out_frame_debug): Handle REG_CFA_TOGGLE_RA_MANGLE.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Documents TARGET_DWARF_MAP_REGNOTE_TO_CFA.
* config/aarch64/aarch64.c (aarch64_map_regnote_to_cfa): Implements
TARGET_DWARF_MAP_REGNOTE_TO_CFA.
(aarch64_expand_prologue): Generate DWARF info for return address
signing.
(aarch64_expand_epilogue): Likewise.
(TARGET_DWARF_MAP_REGNOTE_TO_CFA): Define.

diff --git a/gcc/target.def b/gcc/target.def
index 0443390..6aaa9e6 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3995,6 +3995,14 @@ the CFI label attached to the insn, @var{pattern} is the pattern of\n\
 the insn and @var{index} is @code{UNSPEC_INDEX} or @code{UNSPECV_INDEX}.",
  void, (const char *label, rtx pattern, int index), NULL)
 
+/* This target hook allows the backend to map GCC DWARF CFA reg-note to
+   architecture specific DWARF call frame instruction.  */
+DEFHOOK
+(dwarf_map_regnote_to_cfa,
+ "Maps the incoming GCC DWARF CFA reg-note to architecture specific DWARF call\
+ frame instruction.",
+ enum dwarf_call_frame_info, (enum reg_note), default_dwarf_map_regnote_to_cfa)
+
 /* ??? Documenting this hook requires a GFDL license grant.  */
 DEFHOOK_UNDOC
 (stdarg_optimize_hook,
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 2f2abd3..df07911 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1711,6 +1711,17 @@ default_dwarf_frame_reg_mode (int regno)
   return save_mode;
 }
 
+/* Determine the correct mode for a Dwarf frame register that represents
+   register REGNO.  */
+
+enum dwarf_call_frame_info
+default_dwarf_map_regnote_to_cfa (enum 

Re: [PATCH] Speed-up use-after-scope (re-writing to SSA) (version 2)

2017-01-16 Thread Jakub Jelinek
On Mon, Jan 09, 2017 at 03:58:04PM +0100, Martin Liška wrote:
> >> Well, having following sample:
> >>
> >> int
> >> main (int argc, char **argv)
> >> {
> >>   int *ptr = 0;
> >>
> >>   {
> >> int a;
> >> ptr = 
> >> *ptr = 12345;
> >>   }
> >>
> >>   *ptr = 12345;
> >>   return *ptr;
> >> }
> >>

> I'm still not sure how to do that. Problem is that transformation from:
> 
>   ASAN_MARK (UNPOISON, , 4);
>   a = 5;
>   ASAN_MARK (POISON, , 4);
> 
> to 
> 
>   a_8 = 5;
>   a_9 = ASAN_POISON ();
> 
> happens in tree-ssa.c, after SSA is created, in situation where we prove the 
> 'a'
> does not need to live in memory. Thus said, question is how to identify that 
> we
> need to transform into SSA in a different way:
> 
>a_10 = ASAN_POISON ();
>ASAN_POISON (a_10);

I meant something like this (completely untested, and without the testcase
added to the testsuite).
The incremental patch as is relies on the ASAN_POISON_USE call having the
argument the result of ASAN_POISON, it would ICE if that is not the case
(especially if -fsanitize-recover=address).  Dunno if some optimization
might decide to create a PHI in between, say merge two unrelated vars for
if (something)
  {
x_1 = ASAN_POISON ();
...
ASAN_POISON_USE (x_1);
  }
else
  {
y_2 = ASAN_POISON ();
...
ASAN_POISON_USE (y_2);
  }
to turn that into:
if (something)
  x_1 = ASAN_POISON ();
else
  y_2 = ASAN_POISON ();
_3 = PHI ;
...
ASAN_POISON_USE (_3);

If it did, we would ICE because ASAN_POISON_USE would survive this way until
expansion.  A quick fix for the ICE (if it can ever happen) would be easy,
in sanopt remove ASAN_POISON_USE calls which have argument that is not lhs
of ASAN_POISON (all other ASAN_POISON_USE calls will be handled by my
incremental patch).  Of course that would also mean in that case we'd report
a read rather than write.  But if it can't happen or is very unlikely to
happen, then it is a non-issue.

Something missing from the patch is some change in DCE to remove ASAN_POISON
calls without lhs earlier.  I think we can't make ASAN_POISON ECF_CONST, we
don't want it to be merged for different variables.

--- gcc/internal-fn.def.jj  2017-01-16 13:19:49.0 +0100
+++ gcc/internal-fn.def 2017-01-16 14:25:37.427962196 +0100
@@ -167,6 +167,7 @@ DEF_INTERNAL_FN (ABNORMAL_DISPATCHER, EC
 DEF_INTERNAL_FN (ASAN_CHECK, ECF_TM_PURE | ECF_LEAF | ECF_NOTHROW, ".R...")
 DEF_INTERNAL_FN (ASAN_MARK, ECF_LEAF | ECF_NOTHROW, ".R..")
 DEF_INTERNAL_FN (ASAN_POISON, ECF_LEAF | ECF_NOTHROW | ECF_NOVOPS, NULL)
+DEF_INTERNAL_FN (ASAN_POISON_USE, ECF_LEAF | ECF_NOTHROW | ECF_NOVOPS, NULL)
 DEF_INTERNAL_FN (ADD_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (SUB_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
--- gcc/asan.c.jj   2017-01-16 13:19:49.0 +0100
+++ gcc/asan.c  2017-01-16 14:52:34.022044223 +0100
@@ -3094,6 +3094,8 @@ create_asan_shadow_var (tree var_decl,
 return *slot;
 }
 
+/* Expand ASAN_POISON ifn.  */
+
 bool
 asan_expand_poison_ifn (gimple_stmt_iterator *iter,
bool *need_commit_edge_insert,
@@ -3107,8 +3109,8 @@ asan_expand_poison_ifn (gimple_stmt_iter
   return true;
 }
 
-  tree shadow_var  = create_asan_shadow_var (SSA_NAME_VAR (poisoned_var),
-shadow_vars_mapping);
+  tree shadow_var = create_asan_shadow_var (SSA_NAME_VAR (poisoned_var),
+   shadow_vars_mapping);
 
   bool recover_p;
   if (flag_sanitize & SANITIZE_USER_ADDRESS)
@@ -3122,16 +3124,16 @@ asan_expand_poison_ifn (gimple_stmt_iter
 ASAN_MARK_POISON),
  build_fold_addr_expr (shadow_var), size);
 
-  use_operand_p use_p;
+  gimple *use;
   imm_use_iterator imm_iter;
-  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, poisoned_var)
+  FOR_EACH_IMM_USE_STMT (use, imm_iter, poisoned_var)
 {
-  gimple *use = USE_STMT (use_p);
   if (is_gimple_debug (use))
continue;
 
   int nargs;
-  tree fun = report_error_func (false, recover_p, tree_to_uhwi (size),
+  bool store_p = gimple_call_internal_p (use, IFN_ASAN_POISON_USE);
+  tree fun = report_error_func (store_p, recover_p, tree_to_uhwi (size),
);
 
   gcall *call = gimple_build_call (fun, 1,
@@ -3160,7 +3162,10 @@ asan_expand_poison_ifn (gimple_stmt_iter
   else
{
  gimple_stmt_iterator gsi = gsi_for_stmt (use);
- gsi_insert_before (, call, GSI_NEW_STMT);
+ if (store_p)
+   gsi_replace (, call, true);
+ else
+   gsi_insert_before (, call, GSI_NEW_STMT);
}
 }
 
--- gcc/tree-into-ssa.c.jj  2017-01-01 12:45:35.0 +0100
+++ gcc/tree-into-ssa.c 2017-01-16 14:32:14.853808726 +0100
@@ -38,6 +38,7 @@ along with GCC; see the file 

Re: Powerpc bootstrap failure due to duplicate case value

2017-01-16 Thread Alan Modra
On Mon, Jan 16, 2017 at 12:41:29PM +0100, Jakub Jelinek wrote:
> Or, as a switch it could be of the form:
>   switch (INSN_CODE (insn))
> {
> #ifdef HAVE_ctrsi_internal1
> case CODE_FOR_ctrsi_internal1:
> case CODE_FOR_ctrsi_internal2:
> case CODE_FOR_ctrsi_internal3:
> case CODE_FOR_ctrsi_internal4:
> #endif
> #ifdef HAVE_ctrdi_internal1
> case CODE_FOR_ctrdi_internal1:
> case CODE_FOR_ctrdi_internal2:
> case CODE_FOR_ctrdi_internal3:
> case CODE_FOR_ctrdi_internal4:
> #endif
>   return false;
> default:
>   break;
> }

I didn't think of that.  Segher and I discussed the problem on #gcc
and both independently decided an if() was the obvious fix.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [committed] Don't suppress bogus usage of macros from system headers in -Wformat (PR c/78304)

2017-01-16 Thread Rainer Orth
Hi Christophe,

>> Successfully bootstrapped on x86_64-pc-linux-gnu;
>> adds 34 PASS results to gcc.sum.
>>
> These 2 tests fail on arm:
>
>   gcc.dg/format/pr78304.c (test for warnings, line 9)
>   gcc.dg/format/pr78304.c   -DWIDE   (test for warnings, line 9)

also on sparc-sun-solaris2.12 and i386-pc-solaris2.12, 32-bit only.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] PR78702 fix accessibility of locale::facet::__shim

2017-01-16 Thread Jonathan Wakely

This fixes a bug that prevents building the library with Clang (though
I don't know why anybody's doing that). G++ doesn't notice it due to
one of our bugs with access checking in templates.

PR libstdc++/78702
* include/bits/locale_classes.h (locale::facet::__shim): Change from
private to protected.
* src/c++11/cxx11-shim_facets.cc (__shim_accessor): Define helper to
make locale::facet::__shim accessible.

Tested powercp64le-linux, committed to trunk.

commit 71f9cd2ee86aaea28cb293be20d01adaa1178f9c
Author: Jonathan Wakely 
Date:   Mon Jan 16 11:11:55 2017 +

PR78702 fix accessibility of locale::facet::__shim

PR libstdc++/78702
* include/bits/locale_classes.h (locale::facet::__shim): Change from
private to protected.
* src/c++11/cxx11-shim_facets.cc (__shim_accessor): Define helper to
make locale::facet::__shim accessible.

diff --git a/libstdc++-v3/include/bits/locale_classes.h 
b/libstdc++-v3/include/bits/locale_classes.h
index 41adac6..b63e9c8 100644
--- a/libstdc++-v3/include/bits/locale_classes.h
+++ b/libstdc++-v3/include/bits/locale_classes.h
@@ -461,10 +461,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
 }
 
-class __shim;
-
 const facet* _M_sso_shim(const id*) const;
 const facet* _M_cow_shim(const id*) const;
+
+  protected:
+class __shim; // For internal use only.
   };
 
 
diff --git a/libstdc++-v3/src/c++11/cxx11-shim_facets.cc 
b/libstdc++-v3/src/c++11/cxx11-shim_facets.cc
index 9ad7300..b69959f 100644
--- a/libstdc++-v3/src/c++11/cxx11-shim_facets.cc
+++ b/libstdc++-v3/src/c++11/cxx11-shim_facets.cc
@@ -226,8 +226,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   namespace // unnamed
   {
+struct __shim_accessor : facet
+{
+  using facet::__shim;  // Redeclare protected member as public.
+};
+using __shim = __shim_accessor::__shim;
+
 template
-  struct numpunct_shim : std::numpunct<_CharT>, facet::__shim
+  struct numpunct_shim : std::numpunct<_CharT>, __shim
   {
typedef typename numpunct<_CharT>::__cache_type __cache_type;
 
@@ -251,7 +257,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   };
 
 template
-  struct collate_shim : std::collate<_CharT>, facet::__shim
+  struct collate_shim : std::collate<_CharT>, __shim
   {
typedef basic_string<_CharT>string_type;
 
@@ -276,7 +282,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   };
 
 template
-  struct time_get_shim : std::time_get<_CharT>, facet::__shim
+  struct time_get_shim : std::time_get<_CharT>, __shim
   {
typedef typename std::time_get<_CharT>::iter_type iter_type;
typedef typename std::time_get<_CharT>::char_type char_type;
@@ -330,7 +336,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   };
 
 template
-  struct moneypunct_shim : std::moneypunct<_CharT, _Intl>, facet::__shim
+  struct moneypunct_shim : std::moneypunct<_CharT, _Intl>, __shim
   {
typedef typename moneypunct<_CharT, _Intl>::__cache_type __cache_type;
 
@@ -357,7 +363,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   };
 
 template
-  struct money_get_shim : std::money_get<_CharT>, facet::__shim
+  struct money_get_shim : std::money_get<_CharT>, __shim
   {
typedef typename std::money_get<_CharT>::iter_type iter_type;
typedef typename std::money_get<_CharT>::char_type char_type;
@@ -398,7 +404,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   };
 
 template
-  struct money_put_shim : std::money_put<_CharT>, facet::__shim
+  struct money_put_shim : std::money_put<_CharT>, __shim
   {
typedef typename std::money_put<_CharT>::iter_type iter_type;
typedef typename std::money_put<_CharT>::char_type char_type;
@@ -427,7 +433,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   };
 
 template
-  struct messages_shim : std::messages<_CharT>, facet::__shim
+  struct messages_shim : std::messages<_CharT>, __shim
   {
typedef messages_base::catalog  catalog;
typedef basic_string<_CharT>string_type;


Re: Powerpc bootstrap failure due to duplicate case value

2017-01-16 Thread Jakub Jelinek
On Mon, Jan 16, 2017 at 09:53:17PM +1030, Alan Modra wrote:
> Commited as obvious.
> 
>   PR target/79098
>   * config/rs6000/rs6000.c (rs6000_legitimate_combined_insn): Don't
>   use a switch.

Perhaps it would be useful to write why it can't be written as a switch.
Or, as a switch it could be of the form:
  switch (INSN_CODE (insn))
{
#ifdef HAVE_ctrsi_internal1
case CODE_FOR_ctrsi_internal1:
case CODE_FOR_ctrsi_internal2:
case CODE_FOR_ctrsi_internal3:
case CODE_FOR_ctrsi_internal4:
#endif
#ifdef HAVE_ctrdi_internal1
case CODE_FOR_ctrdi_internal1:
case CODE_FOR_ctrdi_internal2:
case CODE_FOR_ctrdi_internal3:
case CODE_FOR_ctrdi_internal4:
#endif
  return false;
default:
  break;
}

> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 11394b2..f1d5d9d 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -9085,40 +9085,41 @@ rs6000_const_not_ok_for_debug_p (rtx x)
>  static bool
>  rs6000_legitimate_combined_insn (rtx_insn *insn)
>  {
> -  switch (INSN_CODE (insn))
> -{
> -  /* Reject creating doloop insns.  Combine should not be allowed
> -  to create these for a number of reasons:
> -  1) In a nested loop, if combine creates one of these in an
> -  outer loop and the register allocator happens to allocate ctr
> -  to the outer loop insn, then the inner loop can't use ctr.
> -  Inner loops ought to be more highly optimized.
> -  2) Combine often wants to create one of these from what was
> -  originally a three insn sequence, first combining the three
> -  insns to two, then to ctrsi/ctrdi.  When ctrsi/ctrdi is not
> -  allocated ctr, the splitter takes use back to the three insn
> -  sequence.  It's better to stop combine at the two insn
> -  sequence.
> -  3) Faced with not being able to allocate ctr for ctrsi/crtdi
> -  insns, the register allocator sometimes uses floating point
> -  or vector registers for the pseudo.  Since ctrsi/ctrdi is a
> -  jump insn and output reloads are not implemented for jumps,
> -  the ctrsi/ctrdi splitters need to handle all possible cases.
> -  That's a pain, and it gets to be seriously difficult when a
> -  splitter that runs after reload needs memory to transfer from
> -  a gpr to fpr.  See PR70098 and PR71763 which are not fixed
> -  for the difficult case.  It's better to not create problems
> -  in the first place.  */
> -case CODE_FOR_ctrsi_internal1:
> -case CODE_FOR_ctrdi_internal1:
> -case CODE_FOR_ctrsi_internal2:
> -case CODE_FOR_ctrdi_internal2:
> -case CODE_FOR_ctrsi_internal3:
> -case CODE_FOR_ctrdi_internal3:
> -case CODE_FOR_ctrsi_internal4:
> -case CODE_FOR_ctrdi_internal4:
> -  return false;
> -}
> +  int icode = INSN_CODE (insn);
> +
> +  /* Reject creating doloop insns.  Combine should not be allowed
> + to create these for a number of reasons:
> + 1) In a nested loop, if combine creates one of these in an
> + outer loop and the register allocator happens to allocate ctr
> + to the outer loop insn, then the inner loop can't use ctr.
> + Inner loops ought to be more highly optimized.
> + 2) Combine often wants to create one of these from what was
> + originally a three insn sequence, first combining the three
> + insns to two, then to ctrsi/ctrdi.  When ctrsi/ctrdi is not
> + allocated ctr, the splitter takes use back to the three insn
> + sequence.  It's better to stop combine at the two insn
> + sequence.
> + 3) Faced with not being able to allocate ctr for ctrsi/crtdi
> + insns, the register allocator sometimes uses floating point
> + or vector registers for the pseudo.  Since ctrsi/ctrdi is a
> + jump insn and output reloads are not implemented for jumps,
> + the ctrsi/ctrdi splitters need to handle all possible cases.
> + That's a pain, and it gets to be seriously difficult when a
> + splitter that runs after reload needs memory to transfer from
> + a gpr to fpr.  See PR70098 and PR71763 which are not fixed
> + for the difficult case.  It's better to not create problems
> + in the first place.  */
> +  if (icode != CODE_FOR_nothing
> +  && (icode == CODE_FOR_ctrsi_internal1
> +   || icode == CODE_FOR_ctrdi_internal1
> +   || icode == CODE_FOR_ctrsi_internal2
> +   || icode == CODE_FOR_ctrdi_internal2
> +   || icode == CODE_FOR_ctrsi_internal3
> +   || icode == CODE_FOR_ctrdi_internal3
> +   || icode == CODE_FOR_ctrsi_internal4
> +   || icode == CODE_FOR_ctrdi_internal4))
> +return false;
> +
>return true;
>  }
>  
> 
> -- 
> Alan Modra
> Australia Development Lab, IBM

Jakub


Re: [v3 PATCH] PR libstdc++/78389

2017-01-16 Thread Jonathan Wakely

On 16/01/17 11:24 +, Jonathan Wakely wrote:

OK for trunk with the additional changes to use better magic numbers
in the tests.


Oh, and for the branches too.


Re: [1/5][AArch64] Return address protection on AArch64

2017-01-16 Thread James Greenhalgh
On Fri, Jan 13, 2017 at 05:05:43PM +, Jiong Wang wrote:
> On 13/01/17 16:04, James Greenhalgh wrote:
> >On Fri, Jan 06, 2017 at 11:47:07AM +, Jiong Wang wrote:
> >>On 11/11/16 18:22, Jiong Wang wrote:
> >>gcc/
> >>2017-01-06  Jiong Wang  
> >>
> >> * config/aarch64/aarch64-opts.h (aarch64_function_type): New enum.
> >> * config/aarch64/aarch64-protos.h
> >> (aarch64_return_address_signing_enabled): New declaration.
> >> * config/aarch64/aarch64.c 
> >> (aarch64_return_address_signing_enabled):
> >> New function.
> >> (aarch64_expand_prologue): Sign return address before it's pushed 
> >> onto
> >> stack.
> >> (aarch64_expand_epilogue): Authenticate return address fetched from
> >> stack.
> >> (aarch64_override_options): Sanity check for ILP32 and ISA level.
> >> (aarch64_attributes): New function attributes for 
> >> "sign-return-address".
> >> * config/aarch64/aarch64.md (UNSPEC_AUTI1716, UNSPEC_AUTISP,
> >> UNSPEC_PACI1716, UNSPEC_PACISP, UNSPEC_XPACLRI): New unspecs.
> >> ("*do_return"): Generate combined instructions according to key 
> >> index.
> >> ("sp", " >> New.
> >> * config/aarch64/iterators.md (PAUTH_LR_SP, PAUTH_17_16): New 
> >> integer
> >> iterators.
> >> (pauth_mnem_prefix, pauth_hint_num_a): New integer attributes.
> >> * config/aarch64/aarch64.opt (msign-return-address=): New.
> >> * doc/extend.texi (AArch64 Function Attributes): Documents
> >> "sign-return-address=".
> >> * doc/invoke.texi (AArch64 Options): Documents 
> >> "-msign-return-address=".
> >>
> >>gcc/testsuite/
> >>2017-01-06  Jiong Wang  
> >>
> >> * gcc.target/aarch64/return_address_sign_1.c: New testcase.
> >> * gcc.target/aarch64/return_address_sign_scope_1.c: New testcase.
> >I have a few comments on this patch
> 
> All fixed.  New patch attached.

OK with two trivial changes (see below).

We're late in stage 3, but as this option is default off and doesn't affect
the rest of the AArch64 port I think it will be OK to take at this stage.
Because of the timing, please give Richard/Marcus a 24hr chance to comment
on whether they think this is acceptable.

> gcc/
> 2017-01-13  Jiong Wang  
> 
> * config/aarch64/aarch64-opts.h (aarch64_function_type): New enum.
> * config/aarch64/aarch64-protos.h
> (aarch64_return_address_signing_enabled): New declaration.
> * config/aarch64/aarch64.c (aarch64_return_address_signing_enabled):
> New function.
> (aarch64_expand_prologue): Sign return address before it's pushed onto
> stack.
> (aarch64_expand_epilogue): Authenticate return address fetched from
> stack.
> (aarch64_override_options): Sanity check for ILP32 and ISA level.
> (aarch64_attributes): New function attributes for 
> "sign-return-address".
> * config/aarch64/aarch64.md (UNSPEC_AUTI1716, UNSPEC_AUTISP,
> UNSPEC_PACI1716, UNSPEC_PACISP, UNSPEC_XPACLRI): New unspecs.
> ("*do_return"): Generate combined instructions according to key index.
> ("sp", " * config/aarch64/iterators.md (PAUTH_LR_SP, PAUTH_17_16): New integer
> iterators.
> (pauth_mnem_prefix, pauth_hint_num_a): New integer attributes.
> * config/aarch64/aarch64.opt (msign-return-address=): New.
> * doc/extend.texi (AArch64 Function Attributes): Documents
> "sign-return-address=".
> * doc/invoke.texi (AArch64 Options): Documents 
> "-msign-return-address=".
> 
> gcc/testsuite/
> 2017-01-13  Jiong Wang  
> 
> * gcc.target/aarch64/return_address_sign_1.c: New testcase for no
> combined instructions.
> * gcc.target/aarch64/return_address_sign_2.c: New testcase for 
> combined
> instructions.
> * gcc.target/aarch64/return_address_sign_3.c: New testcase for disable
> of pointer authentication.
> 

> diff --git a/gcc/config/aarch64/aarch64-opts.h 
> b/gcc/config/aarch64/aarch64-opts.h
> index 9f37b9b..ba5d052 100644
> --- a/gcc/config/aarch64/aarch64-opts.h
> +++ b/gcc/config/aarch64/aarch64-opts.h
> @@ -71,4 +71,14 @@ enum aarch64_code_model {
>AARCH64_CMODEL_LARGE
>  };
>  
> +/* Function types -msign-return-address should sign.  */
> +enum aarch64_function_type {
> +  /* Don't sign any function.  */
> +  AARCH64_FUNCTION_NONE,
> +  /* Non-leaf functions.  */
> +  AARCH64_FUNCTION_NON_LEAF,
> +  /* All functions.  */
> +  AARCH64_FUNCTION_ALL
> +};
> +
>  #endif
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 29a3bd7..632dd47 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -386,6 +386,7 

Re: [v3 PATCH] PR libstdc++/78389

2017-01-16 Thread Jonathan Wakely

On 15/01/17 19:07 +0200, Ville Voutilainen wrote:

   PR libstdc++/78389
   Fix backwards size adjustments.


I don't think repeating this text here and ...


   * include/bits/list.tcc (merge(list&&)):
   Fix backwards size adjustments.


... here is useful.

More useful would be a good Git-style commit log with a separate
description on the first line. So the svn commit would have a subject
line, followed by a blank line, followed by the content of the
ChangeLog entry e.g.

PR78389 fix backwards size adjustments

PR libstdc++/78389
* include/bits/list.tcc (merge(list&&)): Fix backwards size
adjustments.
(merge(list&&, _StrictWeakOrdering)): Likewise.
* testsuite/23_containers/list/operations/78389.cc: Add
better test for the sizes.

See the output of "git log --pretty=oneline --abbrev-commit" for why
that's useful (and how messy and unhelpful that output is when commits
don't have a separate summary on the first line). See also
http://chris.beams.io/posts/git-commit/

Anyway ...

OK for trunk with the additional changes to use better magic numbers
in the tests.



Powerpc bootstrap failure due to duplicate case value

2017-01-16 Thread Alan Modra
Commited as obvious.

PR target/79098
* config/rs6000/rs6000.c (rs6000_legitimate_combined_insn): Don't
use a switch.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 11394b2..f1d5d9d 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -9085,40 +9085,41 @@ rs6000_const_not_ok_for_debug_p (rtx x)
 static bool
 rs6000_legitimate_combined_insn (rtx_insn *insn)
 {
-  switch (INSN_CODE (insn))
-{
-  /* Reject creating doloop insns.  Combine should not be allowed
-to create these for a number of reasons:
-1) In a nested loop, if combine creates one of these in an
-outer loop and the register allocator happens to allocate ctr
-to the outer loop insn, then the inner loop can't use ctr.
-Inner loops ought to be more highly optimized.
-2) Combine often wants to create one of these from what was
-originally a three insn sequence, first combining the three
-insns to two, then to ctrsi/ctrdi.  When ctrsi/ctrdi is not
-allocated ctr, the splitter takes use back to the three insn
-sequence.  It's better to stop combine at the two insn
-sequence.
-3) Faced with not being able to allocate ctr for ctrsi/crtdi
-insns, the register allocator sometimes uses floating point
-or vector registers for the pseudo.  Since ctrsi/ctrdi is a
-jump insn and output reloads are not implemented for jumps,
-the ctrsi/ctrdi splitters need to handle all possible cases.
-That's a pain, and it gets to be seriously difficult when a
-splitter that runs after reload needs memory to transfer from
-a gpr to fpr.  See PR70098 and PR71763 which are not fixed
-for the difficult case.  It's better to not create problems
-in the first place.  */
-case CODE_FOR_ctrsi_internal1:
-case CODE_FOR_ctrdi_internal1:
-case CODE_FOR_ctrsi_internal2:
-case CODE_FOR_ctrdi_internal2:
-case CODE_FOR_ctrsi_internal3:
-case CODE_FOR_ctrdi_internal3:
-case CODE_FOR_ctrsi_internal4:
-case CODE_FOR_ctrdi_internal4:
-  return false;
-}
+  int icode = INSN_CODE (insn);
+
+  /* Reject creating doloop insns.  Combine should not be allowed
+ to create these for a number of reasons:
+ 1) In a nested loop, if combine creates one of these in an
+ outer loop and the register allocator happens to allocate ctr
+ to the outer loop insn, then the inner loop can't use ctr.
+ Inner loops ought to be more highly optimized.
+ 2) Combine often wants to create one of these from what was
+ originally a three insn sequence, first combining the three
+ insns to two, then to ctrsi/ctrdi.  When ctrsi/ctrdi is not
+ allocated ctr, the splitter takes use back to the three insn
+ sequence.  It's better to stop combine at the two insn
+ sequence.
+ 3) Faced with not being able to allocate ctr for ctrsi/crtdi
+ insns, the register allocator sometimes uses floating point
+ or vector registers for the pseudo.  Since ctrsi/ctrdi is a
+ jump insn and output reloads are not implemented for jumps,
+ the ctrsi/ctrdi splitters need to handle all possible cases.
+ That's a pain, and it gets to be seriously difficult when a
+ splitter that runs after reload needs memory to transfer from
+ a gpr to fpr.  See PR70098 and PR71763 which are not fixed
+ for the difficult case.  It's better to not create problems
+ in the first place.  */
+  if (icode != CODE_FOR_nothing
+  && (icode == CODE_FOR_ctrsi_internal1
+ || icode == CODE_FOR_ctrdi_internal1
+ || icode == CODE_FOR_ctrsi_internal2
+ || icode == CODE_FOR_ctrdi_internal2
+ || icode == CODE_FOR_ctrsi_internal3
+ || icode == CODE_FOR_ctrdi_internal3
+ || icode == CODE_FOR_ctrsi_internal4
+ || icode == CODE_FOR_ctrdi_internal4))
+return false;
+
   return true;
 }
 

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH][AArch64 - v3] Simplify eh_return implementation

2017-01-16 Thread James Greenhalgh
On Fri, Jan 13, 2017 at 07:50:48PM +, Wilco Dijkstra wrote:
> James Greenhalgh wrote:
> 
> > I've been putting off reviewing this patch for a while now, because I don't
> > understand enough about the current eh_return code to understand why what
> > you're proposing is correct.
> >
> > The best way to progress this patch would be to go in to more detail as to
> > what the current code does, why we don't need it, and why your new
> > implementation is sufficient. The current code looks like it covers special
> > cases, and calls out DSE and CSELIB as needing special handling - your
> > new code doesn't.
> > 
> > Could you explain your patch to me assuming I know very little about the
> > implementation, with particular focus on why we no longer need the special
> > cases?
> 
> Well eh_return is far more complex than the few lines of assembly code it was
> intending to avoid...
>
> Basically the EH return feature wants the capability to either return
> normally or return to a previous frame after unwinding. Oddly enough the
> design is to use a single shared return sequence. The epilog is exactly like
> a normal epilog except that it has an extra input register which contains the
> stack adjustment which must be applied after the frame has been destroyed. An
> extra label is inserted before the eplilog which sets this stack adjustment
> to zero, and this is the entry point for a normal return. 
> 
> An EH return updates the return address, initializes the stack adjustment and
> jumps directly into the epilog (bypassing the zeroing of the adjustment).
> Sounds insane already?  Well it gets worse...
> 
> Given the return address is typically saved on the stack when a function
> makes a call, we now need to overwrite the saved LR outside the epilog. This
> poses a problem as we need to emit this store before the epilog is emitted,
> ie. before the offset of LR is even known. We also need to ensure the store
> is not removed. 
> 
> The existing implementation tries to do this by using a special eh_return
> pattern which emits the store as late as possible (at least after the frame
> layout has been finalized).  However it doesn't do this late enough so DSE is
> still run after it...
> 
> To try to avoid DSE removing the store, the existing implementation tries to
> make the store alias with the load by using the same base register and
> offset. Given the epilog always reads LR from SP, it follows that whenever FP
> is used for the store, it will be optimized away.
> 
> Even when it uses SP as the base for the store, SP may be updated in the
> epilog before LR is read (for example alloca functions or frames with large
> locals or outgoing arguments will update SP before reading LR). Worse, the
> offset it uses for the store is incorrect, so if the store is not optimized,
> it just corrupts a random callee-save rather than overwriting LR.
> 
> So basically to conclude the existing implementation only works for one
> specific case where there are a specific number of callee-saves, no floating
> point registers, no outgoing arguments, no alloca used, a small number of
> locals, and frame pointer is enabled.
> 
> The new implementation is trivially correct by forcing the use of a frame
> pointer so that the location of LR is fixed, known early, and using a
> volatile means no optimization could remove the store. It's also much simpler
> by avoiding all the workarounds using patterns, splitters and trying to
> ensure the store appears not dead.

Great, thanks for the detailed explanation. If we can borrow most of this
text and place it in the code as a comment, I think this patch will be good
to go.

How about putting this slight rewording at the top of
aarch64_eh_return_handler_rtx:

/* The EH return feature wants the capability to either return
   normally or return to a previous frame after unwinding.

   The design is to use a single shared return sequence.  The epilogue is
   exactly like a normal epilogue except that it has an extra input
   register which contains the stack adjustment which must be applied after
   the frame has been destroyed.  An extra label is inserted before the
   epilogue which sets this stack adjustment to zero, and this is the
   entry point for a normal return.

   An EH return updates the return address, initializes the stack
   adjustment and jumps directly into the epilogue (bypassing the zeroing
   of the adjustment).  But, it gets worse...

   Given the return address is typically saved on the stack when a function
   makes a call, we now need to overwrite the saved LR outside the epilogue.
   This poses a problem as we need to emit this store before the
   epilogue is emitted, i.e. before the offset of LR is even known.  We also
   need to ensure the store is not removed. 

   Previously, we did the by using a special eh_return pattern which
   would emits the store as late as possible (at least after the frame
   layout has been finalized).  However, DSE would still be run 

Re: [driver, doc] Support escaping special characters in specs

2017-01-16 Thread Rainer Orth
Hi Sandra,

> On 01/13/2017 05:59 AM, Rainer Orth wrote:
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -26391,6 +26391,13 @@ be as many clauses as you need.  This ma
>>
>>  @end table
>>
>> +The switch matching text @code{S} in a %@{@code{S}@},
>> +%@{@code{S}:@code{X}@} or similar construct can use a backslash to
>> +ignore the special meaning of the character following it, thus allowing
>> +literal matching of a character that is otherwise specially treated.
>> +For example, %@{@code{std=iso9899\:1999}:@code{X}@} would substitute
>> +@code{X} if the @option{-std=iso9899:1999} option were given.
>> +
>
> I see this "%@{@code{..." markup appears in the paragraph just before this,
> but it's wrong.  The whole thing needs to be wrapped in @samp and the
> nested @codes removed, like
>
> s/%@{@code{S}:@code{X}@}/@samp{%@{S:X@}}/
>
> etc.

I see, fixed.  I assume this applies to the uses inside @item, too, and
irrespective of %{S:X} or %{S}?

> I also suggest using the present tense here instead of the subjunctive...
>
> s/would substitute/substitutes/
> s/were given/is given/

Makes sense: fixed both in gcc.c and invoke.texi.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


RE: [PATCH] PR79079 Fix __builtin_mul_overflow code gen for !TRULY_NOOP_TRUNCATION target

2017-01-16 Thread Matthew Fortune
Kito Cheng  writes:
> On Mon, Jan 16, 2017 at 02:42:08PM +0800, Kito Cheng wrote:
> > 2017-01-16  Kito Cheng 
> > Kuan-Lin Chen 
> >
> > PR target/PR79079
> > * gcc/internal-fn.c (expand_mul_overflow): Use convert_modes
> instead of
> > gen_lowpart.

Thanks for the fix Kito, much appreciated.

I believe this is candidate for backport to GCC 6 given a week or so in trunk
to make sure it's OK.

Matthew


Re: [committed] Don't suppress bogus usage of macros from system headers in -Wformat (PR c/78304)

2017-01-16 Thread Christophe Lyon
Hi David,

On 13 January 2017 at 21:04, David Malcolm  wrote:
> c-lex.c: lex_string uses cpp_get_token rather than
> cpp_get_token_with_location, and hence the C family of frontends
> record the physical locations of tokens in string concatenations, rather
> than the virtual locations, discarding any macro expansion information.
>
> The resulting *tree* node from the concatenation does use virtual
> locations (if appropriate).
>
> Hence if we have a macro expansion within a string literal, the macro
> expansion information is only recorded for the initial string token, and
> not for any followup string tokens concatenated with it.
>
> Hence any usage of macros defined in system headers in the concatenated
> string literal (apart from as the first token) will erroneously be
> treated as being within the system header by the substring location
> code, and hence we fail to emit a warning for this bogus code:
>
> void test (const char *msg)
> {
>   printf ("size: %" PRIu32 "\n", msg);
> }
>
> format_warning_va determines whether or not the pertinent substring
> (here "%u") is spelled fully within the spelling of the whole format
> string, and if it is (case 1), it uses just the pertinent substring as
> the location of the diagnostic.
>
> In the above case, we have the '%' within the format string, but the
> 'u' is within the macro definition; the location of 'u' is used for the
> caret location, and this is the location used for determining if we're
> inside a system header, and so we erroneously bail out without printing
>  the warning.
>
> This patch fixes things by tightening up the logic for case 1 of
> format_warning_va to detect non-trivial situations like this, and to fall
> back on cases 2 and 3, so that the location of the first string token of
> the format string will be used as the primary location of the diagnostics
> (and hence won't be in a system header); the location of the macro
> definition may be printed as a supplementary note (for case 2).
>
> For the test case above, this gives us back the missing warning
> (hitting case 2, and hence printing the macro defn):
>
> pr78304.c: In function ‘test’:
> pr78304.c:9:11: warning: format ‘%u’ expects argument of type
> ‘unsigned int’, but argument 2 has type ‘size_t {aka long unsigned int}’ 
> [-Wformat=]
>printf ("size: %" PRIu32 "\n", size);
>^
> In file included from pr78304.c:4:0:
> /usr/include/inttypes.h:104:19: note: format string is defined here
>  # define PRIu32  "u"
>
> Successfully bootstrapped on x86_64-pc-linux-gnu;
> adds 34 PASS results to gcc.sum.
>
These 2 tests fail on arm:

  gcc.dg/format/pr78304.c (test for warnings, line 9)
  gcc.dg/format/pr78304.c   -DWIDE   (test for warnings, line 9)

Christophe



> Committed to trunk as r244453.
>
> gcc/ChangeLog:
> PR c/78304
> * substring-locations.c (format_warning_va): Strengthen case 1 so
> that both endpoints of the substring must be within the format
> range for just the substring to be printed.
>
> gcc/testsuite/ChangeLog:
> PR c/78304
> * gcc.dg/format/diagnostic-ranges.c (test_macro): Undef INT_FMT.
> (test_macro_2): New test.
> (test_macro_3): New test.
> (test_macro_4): New test.
> (test_non_contiguous_strings): Convert line number to line offset.
> * gcc.dg/format/pr78304-2.c: New test case.
> * gcc.dg/format/pr78304.c: New test case.
> ---
>  gcc/substring-locations.c   |  2 +
>  gcc/testsuite/gcc.dg/format/diagnostic-ranges.c | 53 
> -
>  gcc/testsuite/gcc.dg/format/pr78304-2.c | 11 +
>  gcc/testsuite/gcc.dg/format/pr78304.c   | 10 +
>  4 files changed, 75 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/format/pr78304-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/format/pr78304.c
>
> diff --git a/gcc/substring-locations.c b/gcc/substring-locations.c
> index 8b41f2b..e2d8dd7 100644
> --- a/gcc/substring-locations.c
> +++ b/gcc/substring-locations.c
> @@ -118,6 +118,8 @@ format_warning_va (const substring_loc _loc,
>else
>  {
>if (fmt_substring_range.m_start >= fmt_loc_range.m_start
> + && fmt_substring_range.m_start <= fmt_loc_range.m_finish
> + && fmt_substring_range.m_finish >= fmt_loc_range.m_start
>   && fmt_substring_range.m_finish <= fmt_loc_range.m_finish)
> /* Case 1.  */
> {
> diff --git a/gcc/testsuite/gcc.dg/format/diagnostic-ranges.c 
> b/gcc/testsuite/gcc.dg/format/diagnostic-ranges.c
> index e5e6ade..cab30f2 100644
> --- a/gcc/testsuite/gcc.dg/format/diagnostic-ranges.c
> +++ b/gcc/testsuite/gcc.dg/format/diagnostic-ranges.c
> @@ -254,12 +254,63 @@ void test_macro (const char *msg)
>~^
>%s
> { dg-end-multiline-output "" } */
> +#undef INT_FMT
> +}
> +
> +void test_macro_2 (const char *msg)
> +{
> +#define PRIu32 "u" /* { 

Re: [PATCH] Fix gcc.dg/tree-ssa/scev-[345].c testcases

2017-01-16 Thread Rainer Orth
Hi Richard,

> The following is an attempt to change those testcases to be less dependent
> on previous passes.  The original motivation of the testcases seems to be
> testing SCEV capabilities and in turn IVOPTs decisions, thus the testcases
> are changed to check the IVO dump, use the GIMPLE FE feeding the loop
> pipeline directly and skip lowering/store-motion we meanwhile do to
> the testcase.
>
> To avoid some existing issue with CFG construction after GIMPLE parsing
> we need to be able to add GIMPLE_NOPs which the patch enables to generate
> from empty stmts (previously those resulted in parse errors).
>
> Tested the testcases on x86_64 with {,-m32} sofar I'll appreciate
> testing on more targets.

I had included the patch in this weekend's Solaris bootstraps and all
scev-?.c failures were gone on both sparc and x86.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Fix gcc.dg/tree-ssa/scev-[345].c testcases

2017-01-16 Thread Richard Biener
On Mon, 16 Jan 2017, Christophe Lyon wrote:

> On 13 January 2017 at 12:16, Bin.Cheng  wrote:
> > On Fri, Jan 13, 2017 at 9:46 AM, Richard Biener  wrote:
> >>
> >> The following is an attempt to change those testcases to be less dependent
> >> on previous passes.  The original motivation of the testcases seems to be
> >> testing SCEV capabilities and in turn IVOPTs decisions, thus the testcases
> >> are changed to check the IVO dump, use the GIMPLE FE feeding the loop
> >> pipeline directly and skip lowering/store-motion we meanwhile do to
> >> the testcase.
> >>
> >> To avoid some existing issue with CFG construction after GIMPLE parsing
> >> we need to be able to add GIMPLE_NOPs which the patch enables to generate
> >> from empty stmts (previously those resulted in parse errors).
> >>
> >> Tested the testcases on x86_64 with {,-m32} sofar I'll appreciate
> >> testing on more targets.
> > I checked aarch64-elf/aarch64-linux with default configuration, all
> > passed with this change.
> >
> 
> For me, the testcases don't compile with this patch:
> gcc.dg/tree-ssa/scev-3.c: In function 'f':
> gcc.dg/tree-ssa/scev-3.c:30:3: error: '__MEM' undeclared (first use in
> this function)
> gcc.dg/tree-ssa/scev-3.c:30:3: note: each undeclared identifier is
> reported only once for each function it appears in
> gcc.dg/tree-ssa/scev-3.c:30:9: error: expected '=' before '<' token
> 
> Did I misapply the patch?

You need

2017-01-12  Richard Biener  

* gimple-parser.c (c_parser_gimple_postfix_expression): Parse
__MEM.

in c/ChangeLog for it to work.

Richard.

> > Thanks,
> > bin
> >>
> >> Full bootstrap / regtest running on x86_64-unknown-linux-gnu.
> >>
> >> Richard.
> >>
> >
> >> -}
> >>  }
> >>
> >> -/* { dg-final { scan-tree-dump-times "" 1 "optimized" } } */
> >> +/* { dg-final { scan-tree-dump-times "" 1 "ivopts" } } */
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PATCH] Fix PR71433

2017-01-16 Thread Richard Biener

The following makes VRP deal better with the situation where the
same assertion is to be inserted on all predecessors of a BB.
That avoids the spurious array-bound warning (and enables some
optimization) when jump-threading f**ed up the CFG in such redundant
way.

(the fully "correct" way to deal with this VRP shortcoming is
to make assert insertion also insert PHIs for merges as there are
cases - such like this - where the merged range is useful;  but
as comments say PHI insertion is expensive and via update-ssa
it is also quite excessive)

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Richard.

2017-01-16  Richard Biener  

PR tree-optimization/71433
* tree-vrp.c (register_new_assert_for): Merge same asserts
on all incoming edges.
(process_assert_insertions_for): Handle insertions at the
beginning of BBs.

* gcc.dg/Warray-bounds-20.c: New testcase.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 244484)
+++ gcc/tree-vrp.c  (working copy)
@@ -5032,6 +5032,17 @@ register_new_assert_for (tree name, tree
  loc->si = si;
  return;
}
+ /* If we have the same assertion on all incoming edges of a BB
+instead insert it at the beginning of it.  */
+ if (e && loc->e
+ && dest_bb == loc->e->dest
+ && EDGE_COUNT (dest_bb->preds) == 2)
+   {
+ loc->bb = dest_bb;
+ loc->e = NULL;
+ loc->si = gsi_none ();
+ return;
+   }
}
 
   /* Update the last node of the list and move to the next one.  */
@@ -6429,6 +6440,15 @@ process_assert_insertions_for (tree name
   return true;
 }
 
+  /* If the stmt iterator points at the end then this is an insertion
+ at the beginning of a block.  */
+  if (gsi_end_p (loc->si))
+{
+  gimple_stmt_iterator si = gsi_after_labels (loc->bb);
+  gsi_insert_before (, assert_stmt, GSI_SAME_STMT);
+  return false;
+
+}
   /* Otherwise, we can insert right after LOC->SI iff the
  statement must not be the last statement in the block.  */
   stmt = gsi_stmt (loc->si);
Index: gcc/testsuite/gcc.dg/Warray-bounds-20.c
===
--- gcc/testsuite/gcc.dg/Warray-bounds-20.c (nonexistent)
+++ gcc/testsuite/gcc.dg/Warray-bounds-20.c (working copy)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Warray-bounds" } */
+
+int t[1];
+int fct (int r, long e)
+{
+  int d = 0;
+  if (r == 4)
+r = 1;
+  if (e < -52)
+d = r == 0 ? 1 : 2;
+  else
+{
+  int i, n = 53;
+  if (__builtin_expect (e < 0, 0))
+   n += e;
+  for (i = 1 ; i < n / 64 + 1 ; i++)
+   t[i] = 0; /* { dg-bogus "array bounds" } */
+}
+  return d;
+}


Re: [PATCH] BRIG frontend: request for a global review

2017-01-16 Thread Pekka Jääskeläinen
OK,

I'll see into adapting the Jakub's idea and also check if some of the
simplest builtins are better expanded
directly to tree nodes instead.

I'm not sure if lto support is needed though as the assumption now is
to have fully linked input to this FE
(all necessary BRIG modules fed in at build time which can be
guaranteed by the HSA finalizer runtime API)
without modules from other FEs never linked in.

Quickly testing, I couldn't measure startup time difference with the
builtins on/off, but of course there
will be some cycles wasted somewhere especially if some code in gcc
traverses all the builtins.

For the audience that might want to enable the FE, I expect the BRIG
FE to be used with x86_64
to have an HSA CPU Agent e.g. with phsa to avoid the need for a HSA
supported GPU in the
machine for developing HSA using projects, and also by other HSA
finalizer implementations
downstream or up. Hopefully this includes also finalization for
AMDGPUs or PTX in the future (if someone
is interested in implementing it). So, there might not be "mainstream
needs" currently that would
warrant enabling by default currently for masses, but hopefully more
so in the future after HSA gets
wider adoption.

Thanks,
Pekka

On Mon, Jan 16, 2017 at 11:07 AM, Jakub Jelinek  wrote:
> On Mon, Jan 16, 2017 at 09:46:43AM +0100, Richard Biener wrote:
>> There are 187 of them (well, simple grep of DEF_HSAIL, so probably a bit 
>> less).
>> They aren't really documented but I guess that __hsail_bitmask_u64 for 
>> example
>> is really equivalent to sth like -1U >> n << m?  So I'm not sure why
>> you have builtins
>> like these represened as functions rather than as "expanded" code sequences?
>>
>> If that's the ones you are talking about having special target
>> specific expansion.
>>
>> Note that builtins add to GCC startup times and if you don't expect
>> people to enable
>> BRIG then I wonder why you are submitting it for inclusion ;)
>
> I guess the question is when the DEF_HSAIL* builtins are actually needed.
> If the FE is separate from the other FEs, I guess it would be enough to
> define those builtins
> 1) in the BRIG FE
> 2) in tree-core.h
> 3) in lto1 (only if any such builtin has been seen in the IL
>
> So, perhaps define DEF_HSAIL* just to DEF_BUILTIN_STUB in builtins.def
> unless already defined, and override it in the BRIG FE where you create its
> builtins, and then have some routine in the middle-end similar to
> initialize_sanitizer_builtins which lazily initializes the DEF_HSAIL*
> builtins during LTO reading if a call to any of the builtins in the hsail
> range is noticed?
>
> Jakub


Re: [PATCH] Fix gcc.dg/tree-ssa/scev-[345].c testcases

2017-01-16 Thread Christophe Lyon
On 13 January 2017 at 12:16, Bin.Cheng  wrote:
> On Fri, Jan 13, 2017 at 9:46 AM, Richard Biener  wrote:
>>
>> The following is an attempt to change those testcases to be less dependent
>> on previous passes.  The original motivation of the testcases seems to be
>> testing SCEV capabilities and in turn IVOPTs decisions, thus the testcases
>> are changed to check the IVO dump, use the GIMPLE FE feeding the loop
>> pipeline directly and skip lowering/store-motion we meanwhile do to
>> the testcase.
>>
>> To avoid some existing issue with CFG construction after GIMPLE parsing
>> we need to be able to add GIMPLE_NOPs which the patch enables to generate
>> from empty stmts (previously those resulted in parse errors).
>>
>> Tested the testcases on x86_64 with {,-m32} sofar I'll appreciate
>> testing on more targets.
> I checked aarch64-elf/aarch64-linux with default configuration, all
> passed with this change.
>

For me, the testcases don't compile with this patch:
gcc.dg/tree-ssa/scev-3.c: In function 'f':
gcc.dg/tree-ssa/scev-3.c:30:3: error: '__MEM' undeclared (first use in
this function)
gcc.dg/tree-ssa/scev-3.c:30:3: note: each undeclared identifier is
reported only once for each function it appears in
gcc.dg/tree-ssa/scev-3.c:30:9: error: expected '=' before '<' token

Did I misapply the patch?

> Thanks,
> bin
>>
>> Full bootstrap / regtest running on x86_64-unknown-linux-gnu.
>>
>> Richard.
>>
>
>> -}
>>  }
>>
>> -/* { dg-final { scan-tree-dump-times "" 1 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "" 1 "ivopts" } } */


[PATCH] Make multiple_target.c aware of LTO (PR lto/66295)

2017-01-16 Thread Martin Liška
Hello.

Not being expert in multi_target area, however it consists of 2 passes. The 
first
one (ipa_target_clone) is responsible for creation of multiple targets for 
functions
decorated with __attribute__((target_clones("xxx"))). I guess the pass should be
called just in LGEN phase and consecutive clone materialization takes care of 
these
clones. I'm also adding lto test-case.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
mvc test-cases work find on x86_64-linux-gnu.

Ready to be installed?
Martin
>From 7ec7045680e10838c43b2713a4fa34b205ba5004 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 13 Jan 2017 15:26:45 +0100
Subject: [PATCH] Make multiple_target.c aware of LTO (PR lto/66295)

gcc/ChangeLog:

2017-01-13  Martin Liska  

	PR lto/66295
	* multiple_target.c (pass_target_clone::gate): Run the pass
	just in LGEN (in LTO mode).

gcc/testsuite/ChangeLog:

2017-01-13  Martin Liska  

	PR lto/66295
	* gcc.target/i386/mvc9.c: New test.
---
 gcc/multiple_target.c|  2 +-
 gcc/testsuite/gcc.target/i386/mvc9.c | 29 +
 2 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/mvc9.c

diff --git a/gcc/multiple_target.c b/gcc/multiple_target.c
index 5be3980db20..3df1e297122 100644
--- a/gcc/multiple_target.c
+++ b/gcc/multiple_target.c
@@ -378,7 +378,7 @@ public:
 bool
 pass_target_clone::gate (function *)
 {
-  return true;
+  return !flag_wpa && !flag_ltrans;
 }
 
 } // anon namespace
diff --git a/gcc/testsuite/gcc.target/i386/mvc9.c b/gcc/testsuite/gcc.target/i386/mvc9.c
new file mode 100644
index 000..d510b6c18b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/mvc9.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-require-ifunc "" } */
+/* { dg-require-effective-target lto } */
+/* { dg-options "-flto" } */
+
+__attribute__((target_clones("avx","arch=slm","arch=core-avx2","default")))
+int
+foo ()
+{
+  return -2;
+}
+
+int
+bar ()
+{
+  return 2;
+}
+
+int
+main ()
+{
+  int r = 0;
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  r += foo ();
+  r += bar ();
+  return r - 2;
+}
-- 
2.11.0



Re: [PATCH] BRIG frontend: request for a global review

2017-01-16 Thread Jakub Jelinek
On Mon, Jan 16, 2017 at 09:46:43AM +0100, Richard Biener wrote:
> There are 187 of them (well, simple grep of DEF_HSAIL, so probably a bit 
> less).
> They aren't really documented but I guess that __hsail_bitmask_u64 for example
> is really equivalent to sth like -1U >> n << m?  So I'm not sure why
> you have builtins
> like these represened as functions rather than as "expanded" code sequences?
> 
> If that's the ones you are talking about having special target
> specific expansion.
> 
> Note that builtins add to GCC startup times and if you don't expect
> people to enable
> BRIG then I wonder why you are submitting it for inclusion ;)

I guess the question is when the DEF_HSAIL* builtins are actually needed.
If the FE is separate from the other FEs, I guess it would be enough to
define those builtins
1) in the BRIG FE
2) in tree-core.h
3) in lto1 (only if any such builtin has been seen in the IL

So, perhaps define DEF_HSAIL* just to DEF_BUILTIN_STUB in builtins.def
unless already defined, and override it in the BRIG FE where you create its
builtins, and then have some routine in the middle-end similar to
initialize_sanitizer_builtins which lazily initializes the DEF_HSAIL*
builtins during LTO reading if a call to any of the builtins in the hsail
range is noticed?

Jakub


Re: [PATCH][PR tree-optimization/79090] Fix two minor DSE bugs

2017-01-16 Thread Richard Biener
On Sun, Jan 15, 2017 at 10:34 AM, Jeff Law  wrote:
>
> At one time I know I had the max_size == size test in valid_ao_ref_for_dse.
> But it got lost at some point.  This is what caused the Ada failure.
>
> Technically it'd be OK for the potentially dead store to have a variable
> size as long as the later stores covered the entire range of the potentially
> dead store.  I doubt this happens enough to be worth checking.
>
> The ppc64 big endian failures were more interesting.  We had this in the IL:
>
> memmove (dst, src, 0)
>
> The trimming code assumes that there's at least one live byte in the store,
> which obviously isn't the case here.  The net result is we compute an
> incorrect trim and the copy goes wild with incorrect addresses and lengths.
> This is trivial to fix by validating that the store has a nonzero length.
>
> I was a bit curious how often this happened in practice because such a call
> is trivially dead.  ~80 during a bootstrap and a few dozen in the testsuite.
> Given how trivial it is to detect and optimize, this patch includes removal
> of such calls.  This hunk makes the check for zero size in
> valid_ao_ref_for_dse redundant, but I'd like to keep the check -- if we add
> more builtin support without filtering zero size we'd regress again.

Interesting - we do fold memset (..., 0) away so this means we either
have an unfolded
memset stmt in the IL before DSE.

> I wanted to get this in tonight given the Ada and ppc64 breakage so I didn't
> do testcase reductions for the ppc test or the zero length optimization
> (acats covers the Ada breakage).   I'll get testcases done Monday.  I'll
> also finish generating suitable baselines for ppc64-linux-gnu over the rest
> of the weekend to verify if this fixes all the ppc64 regressions or just a
> subset of them.
>
> Bootstrapped on x86_64-linux-gnu and ppc64-linux-gnu.  Regression tested on
> x86_64-linux-gnu.  Also verified the ppc64 testresults are improved
> (anything where sel-sched was faulting ought to be fixed now,  maybe
> others).  Installing on the trunk.
>
> Jeff
>
> commit 04c5d9cab65e2f5b219f9610621bea06173b9cb8
> Author: Jeff Law 
> Date:   Sun Jan 15 01:37:32 2017 -0700
>
> PR tree-optimization/79090
> * tree-ssa-dse.c (valid_ao_ref_for_dse): Reject zero length and
> variable length stores.
> (compute_trims): Delete dead assignment to *trim_tail.
> (dse_dom_walker::dse_optimize_stmt): Optimize mem* calls with
> zero length.
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 746f6af..36982c6 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,12 @@
> +2017-01-14  Jeff Law  
> +
> +   PR tree-optimization/79090
> +   * tree-ssa-dse.c (valid_ao_ref_for_dse): Reject zero length and
> +   variable length stores.
> +   (compute_trims): Delete dead assignment to *trim_tail.
> +   (dse_dom_walker::dse_optimize_stmt): Optimize mem* calls with
> +   zero length.
> +
>  2017-01-14  Bernd Schmidt  
>
> PR rtl-optimization/78626
> diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
> index 2e6f8ff..65dcb12 100644
> --- a/gcc/tree-ssa-dse.c
> +++ b/gcc/tree-ssa-dse.c
> @@ -129,6 +129,8 @@ valid_ao_ref_for_dse (ao_ref *ref)
>  {
>return (ao_ref_base (ref)
>   && ref->max_size != -1
> + && ref->size != 0
> + && ref->max_size == ref->size
>   && (ref->offset % BITS_PER_UNIT) == 0
>   && (ref->size % BITS_PER_UNIT) == 0
>   && (ref->size != -1));
> @@ -221,7 +223,6 @@ compute_trims (ao_ref *ref, sbitmap live, int
> *trim_head, int *trim_tail)
>   the REF to compute the trims.  */
>
>/* Now identify how much, if any of the tail we can chop off.  */
> -  *trim_tail = 0;
>int last_orig = (ref->size / BITS_PER_UNIT) - 1;
>int last_live = bitmap_last_set_bit (live);
>*trim_tail = (last_orig - last_live) & ~0x1;
> @@ -700,6 +701,16 @@ dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator
> *gsi)
>   case BUILT_IN_MEMMOVE:
>   case BUILT_IN_MEMSET:
> {
> + /* Occasionally calls with an explicit length of zero
> +show up in the IL.  It's pointless to do analysis
> +on them, they're trivially dead.  */
> + tree size = gimple_call_arg (stmt, 2);
> + if (integer_zerop (size))
> +   {
> + delete_dead_call (gsi);
> + return;
> +   }
> +
>   gimple *use_stmt;
>   enum dse_store_status store_status;
>   m_byte_tracking_enabled
>


Re: [PATCH] BRIG frontend: request for a global review

2017-01-16 Thread Richard Biener
On Fri, Jan 13, 2017 at 4:54 PM, Pekka Jääskeläinen  wrote:
> On Fri, Jan 13, 2017 at 2:34 PM, Richard Biener
>  wrote:
>> On Thu, Jan 12, 2017 at 3:55 PM, Pekka Jääskeläinen  
>> wrote:
>>> Hi,
>>>
>>> A gentle ping...
>>
>> Looking at 002/
>>
>> What's the reason of having brig-builtins.def?  I do not see any
>> middle-end stuff using them
>> and if you just emit calls to the runtime then you do not need
>> "builtins" for this -- just build
>> the function decls in the frontend.  The Fortran frontend has examples
>> for how to do that
>> for almost all entries to the libgfortran runtime.
>>
>> So all changes besides in brig/ look unnecessary to me.
>
> Most of the HSAIL instructions represented as builtins here are fine
> grained enough for
> them to be candidates for expanding as sensible target instruction
> sequences instead of runtime
> function calls.
>
> I understand that there is no code in any of the targets to do so yet.
> But still I wonder is it
> meaningful to convert them to direct runtime function calls -- if/when
> a target wants
> to optimize their HSAIL codegen they have to be moved back. Especially given
> the BRIG FE is not enabled by default and the inclusion of the BRIG
> builtins is done
> only if enabled, does it cause harm?

There are 187 of them (well, simple grep of DEF_HSAIL, so probably a bit less).
They aren't really documented but I guess that __hsail_bitmask_u64 for example
is really equivalent to sth like -1U >> n << m?  So I'm not sure why
you have builtins
like these represened as functions rather than as "expanded" code sequences?

If that's the ones you are talking about having special target
specific expansion.

Note that builtins add to GCC startup times and if you don't expect
people to enable
BRIG then I wonder why you are submitting it for inclusion ;)

Richard.

> BR,
> --Pekka


Re: [PATCH] system.h: Poison strndup (PR bootstrap/78616)

2017-01-16 Thread Richard Biener
On Fri, Jan 13, 2017 at 10:17 PM, David Malcolm  wrote:
> This patch poisons strndup (in system.h), as requested in the
> discussion of PR bootstrap/78616.
>
> Successfully bootstrapped on x86_64-pc-linux-gnu.
>
> OK for trunk?

Ok.

Richard.

> gcc/ChangeLog:
> PR bootstrap/78616
> * system.h: Poison strndup.
> ---
>  gcc/system.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/system.h b/gcc/system.h
> index 0cd58db..c0f08a9 100644
> --- a/gcc/system.h
> +++ b/gcc/system.h
> @@ -840,7 +840,8 @@ extern void fancy_abort (const char *, int, const char *) 
> ATTRIBUTE_NORETURN;
>  #ifndef USES_ISL
>  #undef calloc
>  #undef strdup
> - #pragma GCC poison calloc strdup
> +#undef strndup
> + #pragma GCC poison calloc strdup strndup
>  #endif
>
>  #if !defined(FLEX_SCANNER) && !defined(YYBISON)
> --
> 1.8.5.3
>


[ping] 3 aarch64/arm/rs6000 patches

2017-01-16 Thread Eric Botcazou
aarch64 (Enable descriptors for nested functions in Ada):
  https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01253.html

arm (Enable descriptors for nested functions in Ada):
  https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01254.html

rs6000 (Fix reload failures in 64-bit mode):
  https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00568.html

Thanks in advance.

-- 
Eric Botcazou


Re: [patch,avr]: Increase branch costs after reload.

2017-01-16 Thread Denis Chertykov
2017-01-13 17:28 GMT+04:00 Georg-Johann Lay :
> This adds a penalty of 4 to the post-reload branch costs.
>
> Purpose is reduce the number of out-of-line blocks like in
>
> unsigned long variant5 (unsigned in)
> {
> unsigned long out = 0;
> if (in & (1 << 0)) out |= 0xful << (4*0);
> if (in & (1 << 1)) out |= 0xful << (4*1);
> if (in & (1 << 2)) out |= 0xful << (4*2);
> if (in & (1 << 3)) out |= 0xful << (4*3);
> return out;
> }
>
> without the patch, code is
>
>
> variant5:
> mov r18,r24  ;  67  movqi_insn/1[length = 1]
> sbrs r24,0   ;  10  *sbrx_branchhi  [length = 2]
> rjmp .L6
> ldi r22,lo8(15)  ;  5   *movsi/5[length = 4]
> ldi r23,0
> ldi r24,0
> ldi r25,0
> .L2:
> 
> .L6:
> ldi r22,0;  4   *movsi/2[length = 3]
> ldi r23,0
> movw r24,r22
> rjmp .L2 ;  74  jump[length = 1]
>
>
> and with the patch it reads:
>
> variant5:
> mov r18,r24  ;  67  movqi_insn/1[length = 1]
> ldi r22,lo8(15)  ;  5   *movsi/5[length = 4]
> ldi r23,0
> ldi r24,0
> ldi r25,0
> sbrc r18,0   ;  10  *sbrx_branchhi  [length = 2]
> rjmp .L2
> ldi r22,0;  4   *movsi/2[length = 3]
> ldi r23,0
> movw r24,r22
> .L2:
> 
>
>
> Using fall-through safes insn 74.
>
> Main blocker for not increasing default branch costs in general
> is do_store_flag which is a heap of assertions not using
> rtx_costs and which gives the best results with the old
> default of 0, which is not changed.
>
> Tested without regressions.
>
> Ok for trunk?
>
> Johann
>
> * config/avr/avr.h (BRANCH_COST) [reload_completed]: Increase by 4.


Approved.
Please apply.