Re: [PATCH, rs6000] GIMPLE folding for vector compares

2017-10-12 Thread Segher Boessenkool
Hi Will,

On Thu, Oct 12, 2017 at 03:03:12PM -0500, Will Schmidt wrote:
>   * config/rs6000/rs6000.c: (rs6000_gimple_fold_builtin) Add support for
>   folding of vector compares.  (builtin_function_type) Add compare
>   builtins to the list of functions having unsigned arguments.
>   * config/rs6000/vsx.md:  Add vcmpne{b,h,w} instructions.

* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
folding of vector compares.
(builtin_function_type): Add compare builtins to the list of functions
having unsigned arguments.
* config/rs6000/vsx.md (vcmpneb_spec, vcmpneh_spec, vcmpnew_spec): New.

> +case ALTIVEC_BUILTIN_VCMPEQUB:
> +case ALTIVEC_BUILTIN_VCMPEQUH:
> +case ALTIVEC_BUILTIN_VCMPEQUW:
> +case P8V_BUILTIN_VCMPEQUD:
> +  {
> + arg0 = gimple_call_arg (stmt, 0);
> + arg1 = gimple_call_arg (stmt, 1);
> + lhs = gimple_call_lhs (stmt);
> + gimple *g = gimple_build_assign (lhs, EQ_EXPR, arg0, arg1);
> + gimple_set_location (g, gimple_location (stmt));
> + gsi_replace (gsi, g, true);
> + return true;
> +  }

I wonder how much it helps to factor out the bodies here...  So this could
be like:

+case ALTIVEC_BUILTIN_VCMPEQUB:
+case ALTIVEC_BUILTIN_VCMPEQUH:
+case ALTIVEC_BUILTIN_VCMPEQUW:
+case P8V_BUILTIN_VCMPEQUD:
+  gsi_replace_call_2arg (gsi, EQ_EXPR, stmt);
+  return true;

with

static void
gsi_replace_call_2arg (gimple_stmt_iterator *gsi, tree_code code, gimple *stmt)
{
  tree arg0 = gimple_call_arg (stmt, 0);
  tree arg1 = gimple_call_arg (stmt, 1);
  tree lhs = gimple_call_lhs (stmt);
  gimple *g = gimple_build_assign (lhs, code, arg0, arg1);
  gimple_set_location (g, gimple_location (stmt));
  gsi_replace (gsi, g, true);
}

(But maybe too many other cases need special code?  And it could use a
better name).

>  default:
>   if (TARGET_DEBUG_BUILTIN)
>  fprintf (stderr, "gimple builtin intrinsic not matched:%d %s %s\n",
>   fn_code, fn_name1, fn_name2);
>break;
>  }
> -
>return false;
>  }

Please drop this part.  Whitespace is good ;-)

> @@ -18112,10 +18188,27 @@ builtin_function_type (machine_mode mode_ret, 
> machine_mode mode_arg0,

> +  h.uns_p[1]=1;
> +  h.uns_p[2]=1;

+  h.uns_p[1] = 1;
+  h.uns_p[2] = 1;


> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md

> +;; Vector Compare Not Equal Byte (specified/not+eq:)
> +(define_insn "vcmpneb_spec"
> +  [(set (match_operand:V16QI 0 "altivec_register_operand" "=v")
> +  (not:V16QI
> +(eq:V16QI (match_operand:V16QI 1 "altivec_register_operand" "v")
> +  (match_operand:V16QI 2 "altivec_register_operand" "v"]
> +  "TARGET_P9_VECTOR"
> +  "vcmpneb %0,%1,%2"
> +  [(set_attr "type" "vecsimple")]
> +)

+  [(set_attr "type" "vecsimple")])

What does "_spec" mean?  That it is not an unspec?  :-)

If a name is not (expected to be) used directly, it should start with *.

Do we still need the unspec version?


Segher


[PATCH] Add gnu::unique_ptr

2017-10-12 Thread David Malcolm
From: Trevor Saunders 

I had a go at updating Trevor's unique_ptr patch from July
( https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02084.html )

One of the sticking points was what to call the namespace; there was
wariness about using "gtl" as the name.

Richi proposed (https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00155.html):
> If it should be short use g::.  We can also use gnu:: I guess and I
> agree gnutools:: is a little long (similar to libiberty::).  Maybe
> gt:: as a short-hand for gnutools.  

Pedro noted (https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00157.html):
> Exactly 3 letters has the nice property of making s/gtl::foo/std::foo/
> super trivial down the road; you don't have to care about reindenting
> stuff

Hence this version of the patch uses "gnu::" - 3 letters, one of the
ones Richi proposed, and *not* a match for ".tl" (e.g. "gtl");
(FWIW personally "gnu::" is my favorite, followed by "gcc::").

The include/unique-ptr.h in this patch is identical to that posted
by Trevor in July, with the following changes (by me):
- renaming of "gtl" to "gnu" 
- renaming of DEFINE_GDB_UNIQUE_PTR to DEFINE_GNU_UNIQUE_PTR
- renaming of xfree_deleter to xmalloc_deleter, and making it
  use "free" rather than "xfree" (which doesn't exist)

I also went and added a gcc/unique-ptr-tests.cc file containing
selftests (my thinking here is that although std::unique_ptr ought
to already be well-tested, we need to ensure that the fallback
implementation is sane when building with C++ prior to C++11).
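
For flavour, here is a minimal usage sketch (the caller and struct below are
hypothetical, not from the patch; only the gnu::unique_ptr name and the header
come from the description above):

  #include "unique-ptr.h"

  struct state { int value; };

  void
  use_state ()
  {
    /* std::unique_ptr under C++11 and later; the fallback implementation
       otherwise.  The object is deleted automatically on scope exit.  */
    gnu::unique_ptr<state> p (new state ());
    p->value = 42;
  }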

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu,
using gcc 4.8 for the initial bootstrap (hence testing both gnu++03
and then gnu++14 in the selftests, for stage 1 and stages 2 and 3
respectively).

I also manually tested selftests with both gcc 4.8 and trunk on
the same hardware (again, to exercise both the with/without C++11
behavior).

Tested with "make selftest-valgrind" (no new issues).

OK for trunk?

Motivation/use-cases:
(a) I have an updated version of the
name_hint/deferred_diagnostic patch kit, which uses this (rather
than the refcounting used by
https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00439.html ); and
(b) having this available ought to allow various other cleanups
e.g. the example ones identified by Trevor here:
  https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02085.html

Thanks
Dave


Blurb from Trevor:

For most of the history of this see
  https://sourceware.org/ml/gdb-patches/2016-10/msg00223.html
The changes are mostly s/gdb/gtl/g

include/ChangeLog:

2017-07-29  Trevor Saunders  

* unique-ptr.h: New file.

Combined ChangeLog follows:

gcc/ChangeLog:

David Malcolm 

* Makefile.in (OBJS): Add unique-ptr-tests.o.
* selftest-run-tests.c (selftest::run_tests): Call
selftest::unique_ptr_tests_cc_tests.
* selftest.h (selftest::unique_ptr_tests_cc_tests): New decl.
* unique-ptr-tests.cc: New file.

include/ChangeLog:

Trevor Saunders  
David Malcolm 

* unique-ptr.h: New file.
---
 gcc/Makefile.in  |   1 +
 gcc/selftest-run-tests.c |   1 +
 gcc/selftest.h   |   1 +
 gcc/unique-ptr-tests.cc  | 177 ++
 include/unique-ptr.h | 386 +++
 5 files changed, 566 insertions(+)
 create mode 100644 gcc/unique-ptr-tests.cc
 create mode 100644 include/unique-ptr.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 878ce7b..2809619 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1568,6 +1568,7 @@ OBJS = \
tree-vrp.o \
tree.o \
typed-splay-tree.o \
+   unique-ptr-tests.o \
valtrack.o \
value-prof.o \
var-tracking.o \
diff --git a/gcc/selftest-run-tests.c b/gcc/selftest-run-tests.c
index 30e476d..b05e0fc 100644
--- a/gcc/selftest-run-tests.c
+++ b/gcc/selftest-run-tests.c
@@ -66,6 +66,7 @@ selftest::run_tests ()
   sreal_c_tests ();
   fibonacci_heap_c_tests ();
   typed_splay_tree_c_tests ();
+  unique_ptr_tests_cc_tests ();
 
   /* Mid-level data structures.  */
   input_c_tests ();
diff --git a/gcc/selftest.h b/gcc/selftest.h
index 96eccac..adc0b68 100644
--- a/gcc/selftest.h
+++ b/gcc/selftest.h
@@ -194,6 +194,7 @@ extern void store_merging_c_tests ();
 extern void typed_splay_tree_c_tests ();
 extern void tree_c_tests ();
 extern void tree_cfg_c_tests ();
+extern void unique_ptr_tests_cc_tests ();
 extern void vec_c_tests ();
 extern void wide_int_cc_tests ();
 extern void predict_c_tests ();
diff --git a/gcc/unique-ptr-tests.cc b/gcc/unique-ptr-tests.cc
new file mode 100644
index 000..df18467
--- /dev/null
+++ b/gcc/unique-ptr-tests.cc
@@ -0,0 +1,177 @@
+/* Unit tests for unique-ptr.h.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms 

Re: [PATCH, rs6000] Correct unaligned_load vector cost for Power9

2017-10-12 Thread Segher Boessenkool
On Mon, Oct 09, 2017 at 03:46:17PM -0500, Pat Haugen wrote:
> Power9 has efficient unaligned load insns. The following patch fixes the
> cost to reflect that. There was no similar code for the unaligned_store
> case.  Bootstrap/regtest on powerpc64le-linux with no new regressions.
> Ok for trunk?

Sorry this fell through the cracks.  Yes, okay.  Thanks!


Segher


> 2017-10-09  Pat Haugen  
> 
>   * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Remove
>   TARGET_P9_VECTOR code for unaligned_load case.
> 
> 
> Index: gcc/config/rs6000/rs6000.c
> ===
> --- gcc/config/rs6000/rs6000.c(revision 253547)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -5438,9 +5438,6 @@ rs6000_builtin_vectorization_cost (enum
>  return 3;
> 
>case unaligned_load:
> - if (TARGET_P9_VECTOR)
> -   return 3;
> -
>   if (TARGET_EFFICIENT_UNALIGNED_VSX)
> return 1;


Re: [PATCH 09/22] Enable building libbacktrace with Intel CET

2017-10-12 Thread Ian Lance Taylor
"Tsimbalist, Igor V"  writes:

> Enable building libbacktrace with CET options.
>
> libbacktrace/
>   * configure.ac: Add CET_FLAGS to EXTRA_FLAGS.
>   * aclocal.m4: Regenerate.
>   * Makefile.in: Likewise.
>   * configure: Likewise.

> +if test x$enable_cet = xyes; then
> +  CET_FLAGS="-fcf-protection -mcet -include cet.h"
> +fi

Is this really right?  Why the -include option?  CET protection sounds
like it should work for any language, but -include is C-specific and
doesn't seem to have anything to do with code generation.

Of course, for libbacktrace, that is a generated file.  The patch to
libbacktrace/configure.ac is fine if the general approach is approved.

Ian


Re: [PATCH 08/22] Add Intel CET support for EH in libgcc.

2017-10-12 Thread Hans-Peter Nilsson
On Thu, 12 Oct 2017, Tsimbalist, Igor V wrote:
>   * unwind.inc (_Unwind_RaiseException_Phase2): Use FRAMES_P_DECL,
>   FRAMES_VAR_DECL_1, FRAMES_VAR_INC and FRAMES_P_UPDATE.
>   (_Unwind_RaiseException): Use FRAMES_VAR_DECL, FRAMES_VAR_P and
>   FRAMES_VAR.
>   (_Unwind_ForcedUnwind_Phase2): Use FRAMES_P_DECL,
>   FRAMES_VAR_DECL_1, FRAMES_VAR_INC, FRAMES_P_UPDATE.
>   (_Unwind_ForcedUnwind): Use FRAMES_VAR_DECL, FRAMES_VAR_P and
>   FRAMES_VAR.
>   (_Unwind_Resume): Use FRAMES_VAR_DECL, FRAMES_VAR_P and
>   FRAMES_VAR.
>   (_Unwind_Resume_or_Rethrow): Use FRAMES_VAR_DECL, FRAMES_VAR_P
>   and FRAMES_VAR.

I think you'll have to test this on a non-CET target too,
because it looks like this won't fly:

-  uw_install_context (&this_context, &cur_context);
+  uw_install_context (&this_context, &cur_context, FRAMES_VAR);

You'll be introducing a naked comma for the default empty
FRAMES_VAR.  I like the basic approach though, FWIW.
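
To make the concern concrete, a hypothetical sketch (the macro bodies below
are illustrative only, not the definitions from the patch): if FRAMES_VAR
expands to nothing on a non-CET target, the call site above preprocesses to
"uw_install_context (&this_context, &cur_context, );", which will not
compile.  Folding the comma into the macro itself would avoid that:

  /* Illustrative only -- not the definitions from the patch.  */
  #ifdef USE_CET_SHADOW_STACK        /* hypothetical configuration macro */
  # define FRAMES_VAR , frames       /* carries its own leading comma */
  #else
  # define FRAMES_VAR                /* expands to nothing: no stray comma */
  #endif

  /* Call site: uw_install_context (&this_context, &cur_context FRAMES_VAR);  */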

brgds, H-P


RE: [PATCH 07/22] Enable building libgcc with CET options.

2017-10-12 Thread Tsimbalist, Igor V
> -Original Message-
> From: Joseph Myers [mailto:jos...@codesourcery.com]
> Sent: Thursday, October 12, 2017 10:36 PM
> To: Tsimbalist, Igor V 
> Cc: gcc-patches@gcc.gnu.org; Jeff Law ; i...@airs.com
> Subject: Re: [PATCH 07/22] Enable building libgcc with CET options.
> 
> On Thu, 12 Oct 2017, Tsimbalist, Igor V wrote:
> 
> > Enable building libgcc with CET options by default on Linux/x86 if
> > binutils supports CET v2.0.
> > It can be disabled with --disable-cet.  It is an error to configure
> GCC with --enable-cet if binutils doesn't support CET v2.0.
> >
> > config/
> > * cet.m4: New file
> 
> This file is checking $target.  That's only ever appropriate in directories
> building compilers and similar tools; target library directories should check
> $host, as the host for target libraries is the target for the compiler.

Fixed.

> This file has a comment
> 
> > +dnl GCC_CET_LIBRARY
> > +dnl(SHELL-CODE_HANDLER)
> 
> which doesn't seem to match the subsequent definition of GCC_CET_FLAGS.

Fixed.

> I don't see any documentation of the new configure option.  I'd expect the
> first patch adding such an option to document it in install.texi, and then
> subsequent patches to update that documentation if those patches extend
> the option to cover more things.

Added the description of this configure option to install.texi. 

The updated patch is attached.

Igor

> --
> Joseph S. Myers
> jos...@codesourcery.com


0007-Enable-building-libgcc-with-CET-options.patch
Description: 0007-Enable-building-libgcc-with-CET-options.patch


Re: Make more use of byte_lowpart_offset

2017-10-12 Thread Jeff Law
On 08/23/2017 04:53 AM, Richard Sandiford wrote:
> This patch uses byte_lowpart_offset in places that open-coded the
> calculation.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu, and by making sure
> that there were no differences in testsuite assembly output for one
> target per CPU.  OK to install?
> 
> Richard
> 
> 
> 2017-08-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * caller-save.c (replace_reg_with_saved_mem): Use byte_lowpart_offset.
>   * combine.c (gen_lowpart_for_combine): Likewise.
>   * dwarf2out.c (rtl_for_decl_location): Likewise.
>   * final.c (alter_subreg): Likewise.
>   * rtlhooks.c (gen_lowpart_general): Likewise.
>   (gen_lowpart_if_possible): Likewise.
OK.
jeff


Re: [PATCH] Fix bitmap_bit_in_range_p (PR tree-optimization/82493).

2017-10-12 Thread Jeff Law
On 10/11/2017 12:13 AM, Martin Liška wrote:
> 2017-10-10  Martin Liska  
> 
>   PR tree-optimization/82493
>   * sbitmap.c (bitmap_bit_in_range_p): Fix the implementation.
>   (test_range_functions): New function.
>   (sbitmap_c_tests): Likewise.
>   * selftest-run-tests.c (selftest::run_tests): Run new tests.
>   * selftest.h (sbitmap_c_tests): New function.
I went ahead and committed this along with a patch to fix the off-by-one
error in live_bytes_read.  Bootstrapped and regression tested on x86.

Actual patch attached for archival purposes.

Jeff
commit 00112593cb12ac28e78c33a0aaeebd91a09f3826
Author: law 
Date:   Thu Oct 12 21:53:21 2017 +

PR tree-optimization/82493
* sbitmap.c (bitmap_bit_in_range_p): Fix the implementation.
(test_range_functions): New function.
(sbitmap_c_tests): Likewise.
* selftest-run-tests.c (selftest::run_tests): Run new tests.
* selftest.h (sbitmap_c_tests): New function.

* tree-ssa-dse.c (live_bytes_read): Fix thinko.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@253699 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 20fb303d03a..b5981edddc4 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,14 @@
+2017-10-12  Martin Liska  
+
+   PR tree-optimization/82493
+   * sbitmap.c (bitmap_bit_in_range_p): Fix the implementation.
+   (test_range_functions): New function.
+   (sbitmap_c_tests): Likewise.
+   * selftest-run-tests.c (selftest::run_tests): Run new tests.
+   * selftest.h (sbitmap_c_tests): New function.
+
+   * tree-ssa-dse.c (live_bytes_read): Fix thinko.
+
 2017-10-12  Michael Meissner  
 
* config/rs6000/amo.h: Fix spacing issue.
diff --git a/gcc/sbitmap.c b/gcc/sbitmap.c
index 4bf13a11a1d..baef4d05f0d 100644
--- a/gcc/sbitmap.c
+++ b/gcc/sbitmap.c
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "sbitmap.h"
+#include "selftest.h"
 
 typedef SBITMAP_ELT_TYPE *sbitmap_ptr;
 typedef const SBITMAP_ELT_TYPE *const_sbitmap_ptr;
@@ -322,29 +323,22 @@ bitmap_set_range (sbitmap bmap, unsigned int start, 
unsigned int count)
 bool
 bitmap_bit_in_range_p (const_sbitmap bmap, unsigned int start, unsigned int 
end)
 {
+  gcc_checking_assert (start <= end);
   unsigned int start_word = start / SBITMAP_ELT_BITS;
   unsigned int start_bitno = start % SBITMAP_ELT_BITS;
 
-  /* Testing within a word, starting at the beginning of a word.  */
-  if (start_bitno == 0 && (end - start) < SBITMAP_ELT_BITS)
-{
-  SBITMAP_ELT_TYPE mask = ((SBITMAP_ELT_TYPE)1 << (end - start)) - 1;
-  return (bmap->elms[start_word] & mask) != 0;
-}
-
   unsigned int end_word = end / SBITMAP_ELT_BITS;
   unsigned int end_bitno = end % SBITMAP_ELT_BITS;
 
-  /* Testing starts somewhere in the middle of a word.  Test up to the
- end of the word or the end of the requested region, whichever comes
- first.  */
+  /* Check beginning of first word if different from zero.  */
   if (start_bitno != 0)
 {
-  unsigned int nbits = ((start_word == end_word)
-   ? end_bitno - start_bitno
-   : SBITMAP_ELT_BITS - start_bitno);
-  SBITMAP_ELT_TYPE mask = ((SBITMAP_ELT_TYPE)1 << nbits) - 1;
-  mask <<= start_bitno;
+  SBITMAP_ELT_TYPE high_mask = ~(SBITMAP_ELT_TYPE)0;
+  if (start_word == end_word && end_bitno + 1 < SBITMAP_ELT_BITS)
+   high_mask = ((SBITMAP_ELT_TYPE)1 << (end_bitno + 1)) - 1;
+
+  SBITMAP_ELT_TYPE low_mask = ((SBITMAP_ELT_TYPE)1 << start_bitno) - 1;
+  SBITMAP_ELT_TYPE mask = high_mask - low_mask;
   if (bmap->elms[start_word] & mask)
return true;
   start_word++;
@@ -364,8 +358,9 @@ bitmap_bit_in_range_p (const_sbitmap bmap, unsigned int 
start, unsigned int end)
 }
 
   /* Now handle residuals in the last word.  */
-  SBITMAP_ELT_TYPE mask
-= ((SBITMAP_ELT_TYPE)1 << (SBITMAP_ELT_BITS - end_bitno)) - 1;
+  SBITMAP_ELT_TYPE mask = ~(SBITMAP_ELT_TYPE)0;
+  if (end_bitno + 1 < SBITMAP_ELT_BITS)
+mask = ((SBITMAP_ELT_TYPE)1 << (end_bitno + 1)) - 1;
   return (bmap->elms[start_word] & mask) != 0;
 }
 
@@ -821,3 +816,92 @@ dump_bitmap_vector (FILE *file, const char *title, const 
char *subtitle,
 
   fprintf (file, "\n");
 }
+
+#if CHECKING_P
+
+namespace selftest {
+
+/* Selftests for sbitmaps.  */
+
+
+/* Verify range functions for sbitmap.  */
+
+static void
+test_range_functions ()
+{
+  sbitmap s = sbitmap_alloc (1024);
+  bitmap_clear (s);
+
+  ASSERT_FALSE (bitmap_bit_in_range_p (s, 512, 1023));
+  bitmap_set_bit (s, 100);
+
+  ASSERT_FALSE (bitmap_bit_in_range_p (s, 512, 1023));
+  ASSERT_FALSE (bitmap_bit_in_range_p (s, 0, 99));
+  ASSERT_FALSE (bitmap_bit_in_range_p (s, 101, 1023));
+  

Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.

2017-10-12 Thread Jim Wilson
On Fri, 2017-09-22 at 14:11 -0700, Andrew Pinski wrote:
> On Fri, Sep 22, 2017 at 11:39 AM, Jim Wilson 
> wrote:
> > 
> > On Fri, Sep 22, 2017 at 10:58 AM, Andrew Pinski 
> > wrote:
> > > 
> > > Two overall comments:
> > > * What about splitting register_offset into two different
> > > elements,
> > > one for non 128bit modes and one for 128bit (and more; OI, etc.)
> > > modes
> > > so you get better address generation right away for the simd load
> > > cases rather than having LRA/reload having to reload the address
> > > into
> > > a register.
> > I'm not sure if changing register_offset cost would make a
> > difference,
> > since costs are usually used during optimization, not during
> > address
> > generation.  This is something that I didn't think to try
> > though.  I
> > can try taking a look at this.
> It does taken into account when fwprop is propagating the addition
> into
> the MEM (the tree level is always a_1 = POINTER_PLUS_EXPR;
> MEM_REF(a_1)).
> IV-OPTS will produce much better code if the address_cost is correct.
> 
> It looks like no other pass (combine, etc.) would take that into
> account except for postreload CSE but maybe they should.

I tried increasing the cost of register_offset.  This got rid of the
reg+reg addressing mode in the middle of the main loop for lmbench
stream copy, but did not eliminate it after the main loop.

The tree optimized dump has 
  _52 = a_15 + _51;
  _53 = c_17 + _51;
  _54 = *_52;
  *_53 = _54;
and the RTL expand dump has
(insn 64 63 65 10 (set (reg:DF 96 [ _54 ])
(mem:DF (plus:DI (reg/v/f:DI 78 [ a ])
(reg:DI 93 [ _51 ])) [3 *_52+0 S8 A64])) "stream.c":223
-1
 (nil))
(insn 65 64 66 10 (set (mem:DF (plus:DI (reg/v/f:DI 79 [ c ])
(reg:DI 93 [ _51 ])) [3 *_53+0 S8 A64])
(reg:DF 96 [ _54 ])) "stream.c":223 -1
 (nil))

That may be fixable, but there is a bigger problem here which is that
increasing the costs of register_offset affects both loads and stores.
 On falkor, it is only quad-word stores that are inefficient with a
reg+reg address.  Quad-word loads with a reg+reg address are faster
than the equivalent add/ldr.  Disabling reg+reg address for quad-word
loads will hurt performance.

Since the address cost stuff makes no distinction between loads and
stores, I see no way to get the result I need by using address costs.
 I can only get the result I need by modifying the md file.

> > I did try writing a patch to modify predicates to disallow reg
> > offset
> > for 128bit modes, and that got complicated, as I had to split apart
> > a
> > number of patterns in the aarch64-simd.md file that accept both VD
> > and
> > VQ modes.  I ended up with a patch 3-4 times as big as the one I
> > submitted, without any additional performance improvement, so it
> > wasn't worth the trouble.
> > 
> > > 
> > > * Maybe adding a testcase to the testsuite to show this change.
> > Yes, I can add a testcase.
> > 
> > > 
> > > One extra comment:
> > > * should we change the generic tuning to avoid reg+reg for 128bit
> > > modes?
> > Are there other targets with a similar problem?  I only know that
> > it
> > is a problem for Falkor.  It might be a loss for some targets as it
> > is
> > replacing one instruction with two.
> Well that is why I was suggesting the address cost model change.
> Because the cost model change actually might provide better code in
> the first place and still allow for reasonable generic code to be
> produced.

The patch I posted only affects Falkor.  It doesn't change generic
code.  I don't know of any reason why we need to change generic code
here.

The Falkor core has out-of-order execution and multiple function units,
so there isn't any noticeable performance gain from trying to fix this
earlier.  Fixing this with a md file change gives optimal performance
for the testcases I've looked at.

Since I'm no longer at Linaro, I expect that someone else will take
over this patch submission.  I will create a bug report to document the
issue, to make it easier to track it and hand off to someone else.

Jim



[PATCH 21/22] Enable building libitm with Intel CET

2017-10-12 Thread Tsimbalist, Igor V
Enable building libitm with Intel CET options.

libitm/
* Makefile.in: Regenerate.
* acinclude.m4: Add enable.m4 and cet.m4.
* config/x86/sjlj.S
(_ITM_beginTransaction): Save Shadow Stack pointer.
(GTM_longjmp): Restore Shadow Stack pointer.
* config/x86/target.h (struct gtm_jmpbuf):
Add Shadow Stack pointer.
* configure: Regenerate.
* configure.ac: Set CET_FLAGS. Update XCFLAGS, libtool_VERSION.
* testsuite/Makefile.in: Regenerate.

* config/cet.m4: Define ENABLE_CET_COMPATIBILITY. Set
enable_cet_compatibility.


0021-Enable-building-libitm-with-Intel-CET.PATCH
Description: 0021-Enable-building-libitm-with-Intel-CET.PATCH


[PATCH 20/22] Enable building libobjc with Intel CET

2017-10-12 Thread Tsimbalist, Igor V
Enable building libobjc with Intel CET options.

libobjc/
* Makefile.in: Regenerate.
* aclocal.m4: Likewise.
* configure: Likewise.
* configure.ac: Set CET_FLAGS. Update XCFLAGS.



0020-Enable-building-libobjc-with-Intel-CET.PATCH
Description: 0020-Enable-building-libobjc-with-Intel-CET.PATCH


[PATCH 19/22] Enable building libgfortran with Intel CET

2017-10-12 Thread Tsimbalist, Igor V
Enable building libgfortran with Intel CET options.

libgfortran/
* acinclude.m4: Add enable.m4, cet.m4.
* configure: Regenerate.
* configure.ac: Set CET_FLAGS. Update AM_FCFLAGS, AM_CFLAGS,
CFLAGS.


0019-Enable-building-libgfortran-with-Intel-CET.PATCH
Description: 0019-Enable-building-libgfortran-with-Intel-CET.PATCH


Re: [PATCH] C++: show location of unclosed extern "C" specifications (v2)

2017-10-12 Thread Jason Merrill
On Thu, Oct 12, 2017 at 2:45 PM, David Malcolm  wrote:
> - put the note on the string-literal, rather than the extern:
> note: 'extern "C"' linkage started here
>  extern "C" {
> ^~~

Maybe a range spanning both tokens?

OK with or without that change.

Jason


[patch] Fix PR debug/82509

2017-10-12 Thread Eric Botcazou
Hi,

this PR reports a couple of problems with the support of the DW_AT_endianity 
attribute associated with the scalar_storage_order source attribute: it does 
not persist through typedefs and it can contaminate native order DIEs.

The attached patch revamps it by associating native order DIEs and reverse 
order DIEs into adjacent pairs for base types, as well as looking through 
typedefs for base types with reverse order.  This makes it possible to have a 
single reverse order DIE for each base type and look it up efficiently.

Tested on x86_64-suse-linux, OK for the mainline?  What about the 7 branch?


2017-10-12  Eric Botcazou  

PR debug/82509
* dwarf2out.c (base_type_die): Remove early return for corner cases.
Allocate the new DIE manually and do not call add_pubtype on it.
(is_base_type): Remove ERROR_MARK and return 0 for VOID_TYPE.
(modified_type_die): Adjust the lookup for reverse order DIEs.  Skip
typedefs for base types with DW_AT_endianity.  Make sure a DIE with
native order exists for base types, attach the DIE manually and call
add_pubtype on it.  Do not equate a reverse order DIE to the type.


2017-10-12  Eric Botcazou  

* gcc.dg/debug/dwarf2/sso.c: Rename into...
* gcc.dg/debug/dwarf2/sso-1.c: ...this.
* gcc.dg/debug/dwarf2/sso-2.c: New test.
* gcc.dg/debug/dwarf2/sso-3.c: Likewise.

-- 
Eric BotcazouIndex: dwarf2out.c
===
--- dwarf2out.c	(revision 253628)
+++ dwarf2out.c	(working copy)
@@ -12090,9 +12090,6 @@ base_type_die (tree type, bool reverse)
   struct fixed_point_type_info fpt_info;
   tree type_bias = NULL_TREE;
 
-  if (TREE_CODE (type) == ERROR_MARK || TREE_CODE (type) == VOID_TYPE)
-return 0;
-
   /* If this is a subtype that should not be emitted as a subrange type,
  use the base type.  See subrange_type_for_debug_p.  */
   if (TREE_CODE (type) == INTEGER_TYPE && TREE_TYPE (type) != NULL_TREE)
@@ -12185,7 +12182,8 @@ base_type_die (tree type, bool reverse)
   gcc_unreachable ();
 }
 
-  base_type_result = new_die (DW_TAG_base_type, comp_unit_die (), type);
+  base_type_result = ggc_cleared_alloc<die_node> ();
+  base_type_result->die_tag = DW_TAG_base_type;
 
   add_AT_unsigned (base_type_result, DW_AT_byte_size,
 		   int_size_in_bytes (type));
@@ -12241,8 +12239,6 @@ base_type_die (tree type, bool reverse)
 		 | dw_scalar_form_reference,
 		 NULL);
 
-  add_pubtype (type, base_type_result);
-
   return base_type_result;
 }
 
@@ -12270,8 +12266,6 @@ is_base_type (tree type)
 {
   switch (TREE_CODE (type))
 {
-case ERROR_MARK:
-case VOID_TYPE:
 case INTEGER_TYPE:
 case REAL_TYPE:
 case FIXED_POINT_TYPE:
@@ -12280,6 +12274,7 @@ is_base_type (tree type)
 case POINTER_BOUNDS_TYPE:
   return 1;
 
+case VOID_TYPE:
 case ARRAY_TYPE:
 case RECORD_TYPE:
 case UNION_TYPE:
@@ -12485,6 +12480,8 @@ modified_type_die (tree type, int cv_qua
   /* Only these cv-qualifiers are currently handled.  */
   const int cv_qual_mask = (TYPE_QUAL_CONST | TYPE_QUAL_VOLATILE
 			| TYPE_QUAL_RESTRICT | TYPE_QUAL_ATOMIC);
+  const bool reverse_base_type
+= need_endianity_attribute_p (reverse) && is_base_type (type);
 
   if (code == ERROR_MARK)
 return NULL;
@@ -12535,29 +12532,33 @@ modified_type_die (tree type, int cv_qua
 	qualified_type = size_type_node;
 }
 
-
   /* If we do, then we can just use its DIE, if it exists.  */
   if (qualified_type)
 {
   mod_type_die = lookup_type_die (qualified_type);
 
-  /* DW_AT_endianity doesn't come from a qualifier on the type.  */
+  /* DW_AT_endianity doesn't come from a qualifier on the type, so it is
+	 dealt with specially: the DIE with the attribute, if it exists, is
+	 placed immediately after the regular DIE for the same base type.  */
   if (mod_type_die
-	  && (!need_endianity_attribute_p (reverse)
-	  || !is_base_type (type)
-	  || get_AT_unsigned (mod_type_die, DW_AT_endianity)))
+	  && (!reverse_base_type
+	  || ((mod_type_die = mod_type_die->die_sib) != NULL
+		  && get_AT_unsigned (mod_type_die, DW_AT_endianity))))
 	return mod_type_die;
 }
 
   name = qualified_type ? TYPE_NAME (qualified_type) : NULL;
 
   /* Handle C typedef types.  */
-  if (name && TREE_CODE (name) == TYPE_DECL && DECL_ORIGINAL_TYPE (name)
+  if (name
+  && TREE_CODE (name) == TYPE_DECL
+  && DECL_ORIGINAL_TYPE (name)
   && !DECL_ARTIFICIAL (name))
 {
   tree dtype = TREE_TYPE (name);
 
-  if (qualified_type == dtype)
+  /* Skip the typedef for base types with DW_AT_endianity, no big deal.  */
+  if (qualified_type == dtype && !reverse_base_type)
 	{
 	  tree origin = decl_ultimate_origin (name);
 
@@ -12729,7 +12730,21 @@ modified_type_die (tree type, int cv_qua
   item_type = TREE_TYPE (type);
 }
   else if 

[PATCH 18/22] Enable building libmpx with Intel CET

2017-10-12 Thread Tsimbalist, Igor V
Enable building libmpx with Intel CET options.

libmpx/
* Makefile.in: Regenerate.
* acinclude.m4: Add enable.m4 and cet.m4.
* configure: Regenerate.
* configure.ac: Set CET_FLAGS. Update XCFLAGS.
* mpxrt/Makefile.am: Update libmpx_la_CFLAGS.
* mpxrt/Makefile.in: Regenerate.
* mpxwrap/Makefile.am: Add AM_CFLAGS. Update
libmpxwrappers_la_CFLAGS.
* mpxwrap/Makefile.in: Regenerate.



0018-Enable-building-libmpx-with-Intel-CET.PATCH
Description: 0018-Enable-building-libmpx-with-Intel-CET.PATCH


Re: [PATCH 07/22] Enable building libgcc with CET options.

2017-10-12 Thread Joseph Myers
On Thu, 12 Oct 2017, Tsimbalist, Igor V wrote:

> Enable building libgcc with CET options by default on Linux/x86 if
> binutils supports CET v2.0.
> It can be disabled with --disable-cet.  It is an error to configure
> GCC with --enable-cet if binutils doesn't support CET v2.0.
> 
> config/
>   * cet.m4: New file

This file is checking $target.  That's only ever appropriate in 
directories building compilers and similar tools; target library 
directories should check $host, as the host for target libraries is the 
target for the compiler.

This file has a comment

> +dnl GCC_CET_LIBRARY
> +dnl(SHELL-CODE_HANDLER)

which doesn't seem to match the subsequent definition of GCC_CET_FLAGS.

I don't see any documentation of the new configure option.  I'd expect the 
first patch adding such an option to document it in install.texi, and then 
subsequent patches to update that documentation if those patches extend 
the option to cover more things.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH 17/22] Enable building libquadmath with Intel CET

2017-10-12 Thread Tsimbalist, Igor V
Enable building libquadmath with Intel CET options.

libquadmath/
* Makefile.am: Update AM_CFLAGS.
* Makefile.in: Regenerate.
* acinclude.m4: Add enable.m4 and cet.m4.
* configure: Regenerate.
* configure.ac: Set CET_FLAGS. Update XCFLAGS.



0017-Enable-building-libquadmath-with-Intel-CET.PATCH
Description: 0017-Enable-building-libquadmath-with-Intel-CET.PATCH


[PATCH 16/22] Enable building libssp with Intel CET

2017-10-12 Thread Tsimbalist, Igor V
Enable building libssp with Intel CET options.

libssp/
* Makefile.am: Update AM_CFLAGS.
* Makefile.in: Regenerate.
* configure: Likewise.
* aclocal.m4: Likewise.
* configure.ac: Set CET_FLAGS. Update XCFLAGS.



0016-Enable-building-libssp-with-Intel-CET.PATCH
Description: 0016-Enable-building-libssp-with-Intel-CET.PATCH


[PATCH 15/22] Enable building libvtv with Intel CET

2017-10-12 Thread Tsimbalist, Igor V
Enable building libvtv with Intel CET options.

libvtv/
* acinclude.m4: Add enable.m4 and cet.m4.
* libvtv/configure: Regenerate.
* libvtv/configure.ac: Set CET_FLAGS. Update XCFLAGS.



0015-Enable-building-libvtv-with-Intel-CET.PATCH
Description: 0015-Enable-building-libvtv-with-Intel-CET.PATCH


[PATCH 14/22] Enable building libsanitizer with Intel CET

2017-10-12 Thread Tsimbalist, Igor V
Enable building libsanitizer with Intel CET options.

libsanitizer/
* acinclude.m4: Add enable.m4 and cet.m4.
* Makefile.in: Regenerate.
* asan/Makefile.am: Update AM_CXXFLAGS.
* asan/Makefile.in: Regenerate.
* configure: Likewise.
* configure.ac: Set CET_FLAGS. Update EXTRA_CFLAGS,
EXTRA_CXXFLAGS.
* interception/Makefile.am: Update AM_CXXFLAGS.
* interception/Makefile.in: Regenerate.
* libbacktrace/Makefile.am: Update AM_CFLAGS, AM_CXXFLAGS.
* libbacktrace/Makefile.in: Regenerate.
* lsan/Makefile.am: Update AM_CXXFLAGS.
* lsan/Makefile.in: Regenerate.
* sanitizer_common/Makefile.am: Update AM_CXXFLAGS.
* sanitizer_common/Makefile.in: Regenerate.
* tsan/Makefile.am: Update AM_CXXFLAGS.
* tsan/Makefile.in: Regenerate.
* ubsan/Makefile.am: Update AM_CXXFLAGS.
* ubsan/Makefile.in: Regenerate.


0014-Enable-building-libsanitizer-with-Intel-CET.PATCH
Description: 0014-Enable-building-libsanitizer-with-Intel-CET.PATCH


[PATCH 13/22] Enable building libstdc++-v3 with Intel CET

2017-10-12 Thread Tsimbalist, Igor V
Enable building libstdc++v3 with CET options.

libstdc++-v3/
* acinclude.m4: Add cet.m4.
* configure.ac: Set CET_FLAGS. Update EXTRA_CFLAGS.
* libsupc++/Makefile.am: Add EXTRA_CFLAGS.
* Makefile.in: Regenerate.
* configure: Likewise.
* doc/Makefile.in: Likewise.
* include/Makefile.in: Likewise.
* libsupc++/Makefile.in: Likewise.
* po/Makefile.in: Likewise.
* python/Makefile.in: Likewise.
* src/Makefile.in: Likewise.
* src/c++11/Makefile.in: Likewise.
* src/c++98/Makefile.in: Likewise.
* src/filesystem/Makefile.in: Likewise.
* testsuite/Makefile.in: Likewise.



0013-Enable-building-libstdc-v3-with-Intel-CET.PATCH
Description: 0013-Enable-building-libstdc-v3-with-Intel-CET.PATCH


[PATCH 12/22] Enable building libgomp with Intel CET

2017-10-12 Thread Tsimbalist, Igor V
Enable building libgomp with CET options.

libgomp/
* configure.ac: Set CET_FLAGS, update XCFLAGS and FCFLAGS.
* acinclude.m4: Add cet.m4.
* configure: Regenerate.
* Makefile.in: Likewise.
* testsuite/Makefile.in: Likewise



0012-Enable-building-libgomp-with-Intel-CET.PATCH
Description: 0012-Enable-building-libgomp-with-Intel-CET.PATCH


[PATCH 11/22] Enable building libatomic with Intel CET

2017-10-12 Thread Tsimbalist, Igor V
Enable building libatomic with CET options.

libatomic/
* configure.ac: Set CET_FLAGS, update XCFLAGS.
* acinclude.m4: Add cet.m4 and enable.m4.
* configure: Regenerate.
* Makefile.in: Likewise.
* testsuite/Makefile.in: Likewise.



0011-Enable-building-libatomic-with-Intel-CET.PATCH
Description: 0011-Enable-building-libatomic-with-Intel-CET.PATCH


[PATCH 10/22] Enable building libcilkrts with Intel CET

2017-10-12 Thread Tsimbalist, Igor V
Enable building libcilkrts with CET options.

libcilkrts/
* Makefile.am: Add AM_CXXFLAGS and XCXXFLAGS.
* configure.ac: Set CET_FLAGS, update XCFLAGS, XCXXFLAGS.
* Makefile.in: Regenerate.
* aclocal.m4: Likewise.
* configure: Likewise.



0010-Enable-building-libcilkrts-with-Intel-CET.PATCH
Description: 0010-Enable-building-libcilkrts-with-Intel-CET.PATCH


[PATCH, rs6000] GIMPLE folding for vector compares

2017-10-12 Thread Will Schmidt
Hi,

Add support for gimple folding of vec_cmp_{eq,ge,gt,le,ne} for
the integer data types.
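
As an illustration (not part of the patch), the kind of source this affects,
assuming the usual altivec.h interface:

  #include <altivec.h>

  vector bool int
  eq_w (vector signed int a, vector signed int b)
  {
    /* Previously this stayed as a call to the VCMPEQUW builtin in GIMPLE;
       with the folding it becomes a plain vector comparison, roughly
         _3 = a_1 == b_2;  */
    return vec_cmpeq (a, b);
  }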

This adds a handful of entries to the switch statement in builtin_function_type
for those builtins having unsigned arguments.

Three entries are added to vsx.md to enable the vcmpne{b,h,w} instructions, where
we would otherwise generate a vcmpeq + vnor sequence.

This patch requires the previously posted "allow integer return type from 
vector compares" patch.

A handful of existing tests required updates to their specified optimization
levels to continue to generate the desired code.  builtins-3-p9.c in particular
has been updated to reflect improved code gen with the higher specified
optimization level.   Testcase coverage is otherwise handled by the 
already-in-tree
gcc.target/powerpc/fold-vec-cmp-*.c tests.

Tested OK on P6 and newer. OK for trunk?

Thanks,
-Will

[gcc]

2017-10-12  Will Schmidt  

* config/rs6000/rs6000.c: (rs6000_gimple_fold_builtin) Add support for
folding of vector compares.  (builtin_function_type) Add compare
builtins to the list of functions having unsigned arguments.
* config/rs6000/vsx.md:  Add vcmpne{b,h,w} instructions.

[testsuite]

2017-10-12  Will Schmidt  

* gcc.target/powerpc/builtins-3-p9.c: Add -O1, update
expected codegen checks.
* gcc.target/powerpc/vec-cmp-sel.c: Mark vars as volatile.
* gcc.target/powerpc/vsu/vec-cmpne-0.c: Add -O1.
* gcc.target/powerpc/vsu/vec-cmpne-1.c: Add -O1.
* gcc.target/powerpc/vsu/vec-cmpne-2.c: Add -O1.
* gcc.target/powerpc/vsu/vec-cmpne-3.c: Add -O1.
* gcc.target/powerpc/vsu/vec-cmpne-4.c: Add -O1.
* gcc.target/powerpc/vsu/vec-cmpne-5.c: Add -O1.
* gcc.target/powerpc/vsu/vec-cmpne-6.c: Add -O1.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 12ddd97..7e73239 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -16605,17 +16605,93 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   build_int_cst (arg2_type, 0)), arg0);
 gimple_set_location (g, loc);
 gsi_replace (gsi, g, true);
 return true;
   }
+/* Vector compares (integer); EQ, NE, GE, GT, LE.  */
+case ALTIVEC_BUILTIN_VCMPEQUB:
+case ALTIVEC_BUILTIN_VCMPEQUH:
+case ALTIVEC_BUILTIN_VCMPEQUW:
+case P8V_BUILTIN_VCMPEQUD:
+  {
+   arg0 = gimple_call_arg (stmt, 0);
+   arg1 = gimple_call_arg (stmt, 1);
+   lhs = gimple_call_lhs (stmt);
+   gimple *g = gimple_build_assign (lhs, EQ_EXPR, arg0, arg1);
+   gimple_set_location (g, gimple_location (stmt));
+   gsi_replace (gsi, g, true);
+   return true;
+  }
+case P9V_BUILTIN_CMPNEB:
+case P9V_BUILTIN_CMPNEH:
+case P9V_BUILTIN_CMPNEW:
+  {
+   arg0 = gimple_call_arg (stmt, 0);
+   arg1 = gimple_call_arg (stmt, 1);
+   lhs = gimple_call_lhs (stmt);
+   gimple *g = gimple_build_assign (lhs, NE_EXPR, arg0, arg1);
+   gimple_set_location (g, gimple_location (stmt));
+   gsi_replace (gsi, g, true);
+   return true;
+  }
+case VSX_BUILTIN_CMPGE_16QI:
+case VSX_BUILTIN_CMPGE_U16QI:
+case VSX_BUILTIN_CMPGE_8HI:
+case VSX_BUILTIN_CMPGE_U8HI:
+case VSX_BUILTIN_CMPGE_4SI:
+case VSX_BUILTIN_CMPGE_U4SI:
+case VSX_BUILTIN_CMPGE_2DI:
+case VSX_BUILTIN_CMPGE_U2DI:
+  {
+   arg0 = gimple_call_arg (stmt, 0);
+   arg1 = gimple_call_arg (stmt, 1);
+   lhs = gimple_call_lhs (stmt);
+   gimple *g = gimple_build_assign (lhs, GE_EXPR, arg0, arg1);
+   gimple_set_location (g, gimple_location (stmt));
+   gsi_replace (gsi, g, true);
+   return true;
+  }
+case ALTIVEC_BUILTIN_VCMPGTSB:
+case ALTIVEC_BUILTIN_VCMPGTUH:
+case ALTIVEC_BUILTIN_VCMPGTSH:
+case ALTIVEC_BUILTIN_VCMPGTUW:
+case ALTIVEC_BUILTIN_VCMPGTSW:
+case ALTIVEC_BUILTIN_VCMPGTUB:
+case P8V_BUILTIN_VCMPGTUD:
+case P8V_BUILTIN_VCMPGTSD:
+  {
+   arg0 = gimple_call_arg (stmt, 0);
+   arg1 = gimple_call_arg (stmt, 1);
+   lhs = gimple_call_lhs (stmt);
+   gimple *g = gimple_build_assign (lhs, GT_EXPR, arg0, arg1);
+   gimple_set_location (g, gimple_location (stmt));
+   gsi_replace (gsi, g, true);
+   return true;
+  }
+case VSX_BUILTIN_CMPLE_16QI:
+case VSX_BUILTIN_CMPLE_U16QI:
+case VSX_BUILTIN_CMPLE_8HI:
+case VSX_BUILTIN_CMPLE_U8HI:
+case VSX_BUILTIN_CMPLE_4SI:
+case VSX_BUILTIN_CMPLE_U4SI:
+case VSX_BUILTIN_CMPLE_2DI:
+case VSX_BUILTIN_CMPLE_U2DI:
+  {
+   arg0 = gimple_call_arg (stmt, 0);
+   arg1 = gimple_call_arg (stmt, 1);
+   lhs = gimple_call_lhs (stmt);
+   gimple *g = gimple_build_assign (lhs, LE_EXPR, arg0, arg1);
+   gimple_set_location (g, gimple_location (stmt));
+   gsi_replace (gsi, g, true);
+   

[PATCH] (gimple) Allow integer return type from vector compares

2017-10-12 Thread Will Schmidt
Hi, 

Update the logic in verify_gimple_comparison to allow a vector integer result
from a vector comparison, where it previously was limited to only allowing
compares with boolean results.  This allows powerpc intrinsics such as this
one to build (after gimple folding):
   vector bool int vec_cmpeq (vector signed int, vector signed int);

This has been tested in conjunction with the "rs6000 GIMPLE folding for vector
compares" patch (posted separately) on p6 and newer.

OK for trunk?

Thanks,
-Will

[gcc]

2017-10-12  Will Schmidt  

* tree-cfg.c (verify_gimple_comparison): Allow vector integer results
from vector compares.

diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index b5e0460..adf3607 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3624,14 +3624,16 @@ verify_gimple_comparison (tree type, tree op0, tree 
op1, enum tree_code code)
  debug_generic_expr (op0_type);
  debug_generic_expr (op1_type);
  return true;
 }
 }
-  /* Or a boolean vector type with the same element count
- as the comparison operand types.  */
+  /* Or a vector type with the same element count
+ as the comparison operand types.  The vector type may
+ be boolean or integer.  */
   else if (TREE_CODE (type) == VECTOR_TYPE
-  && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
+  && (( TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
+  || ( TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE)))
 {
   if (TREE_CODE (op0_type) != VECTOR_TYPE
  || TREE_CODE (op1_type) != VECTOR_TYPE)
 {
   error ("non-vector operands in vector comparison");




[PATCH 09/22] Enable building libbacktrace with Intel CET

2017-10-12 Thread Tsimbalist, Igor V
Enable building libbacktrace with CET options.

libbacktrace/
* configure.ac: Add CET_FLAGS to EXTRA_FLAGS.
* aclocal.m4: Regenerate.
* Makefile.in: Likewise.
* configure: Likewise.

Igor




0009-Enable-building-libbacktrace-with-Intel-CET.PATCH
Description: 0009-Enable-building-libbacktrace-with-Intel-CET.PATCH


[PATCH 08/22] Add Intel CET support for EH in libgcc.

2017-10-12 Thread Tsimbalist, Igor V
Control-flow Enforcement Technology (CET), published by Intel, introduces
the Shadow Stack feature, which ensures a return from a function is done
to exactly the same location from where the function was called. When EH
is present the control-flow transfer may skip some stack frames and the
shadow stack has to be adjusted so that it does not signal a violation of
a control-flow transfer. This is done by counting the number of skipped
frames and adjusting the shadow stack pointer by that number.
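
As a rough sketch of the mechanism (only the _Unwind_Frames_Extra name comes
from the ChangeLog below; the model itself is invented for illustration): each
skipped frame left one extra return address on the shadow stack, so the
adjustment pops one shadow-stack entry per skipped frame.

  /* Toy model, not the patch's code.  */
  static void
  shadow_stack_pop_one (void)          /* stands in for an incssp step on CET hardware */
  {
  }

  static void
  unwind_frames_extra_model (unsigned long frames)
  {
    while (frames--)                   /* frames counted during the unwind phase */
      shadow_stack_pop_one ();
  }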

gcc/
* config/i386/i386.c (ix86_expand_epilogue): Change simple
return to indirect jump for EH return. Change explicit 'false'
argument in pro_epilogue_adjust_stack with a value of
flag_cf_protection.
* config/i386/i386.md (simple_return_indirect_internal): Remove
SImode restriction to support 64-bit.

libgcc/
* config/i386/linux-unwind.h: Include
config/i386/shadow-stack-unwind.h.
* config/i386/shadow-stack-unwind.h: New file.
* unwind-dw2.c: (uw_install_context): Add a FRAMES argument and
pass it to _Unwind_Frames_Extra.
* unwind-generic.h (FRAMES_P_DECL): New.
(FRAMES_VAR): Likewise.
(FRAMES_VAR_P): Likewise.
(FRAMES_VAR_DECL): Likewise.
(FRAMES_VAR_DECL_1): Likewise.
(FRAMES_VAR_INC): Likewise.
(FRAMES_P_UPDATE): Likewise.
(_Unwind_Frames_Extra): Likewise.
* unwind.inc (_Unwind_RaiseException_Phase2): Use FRAMES_P_DECL,
FRAMES_VAR_DECL_1, FRAMES_VAR_INC and FRAMES_P_UPDATE.
(_Unwind_RaiseException): Use FRAMES_VAR_DECL, FRAMES_VAR_P and
FRAMES_VAR.
(_Unwind_ForcedUnwind_Phase2): Use FRAMES_P_DECL,
FRAMES_VAR_DECL_1, FRAMES_VAR_INC, FRAMES_P_UPDATE.
(_Unwind_ForcedUnwind): Use FRAMES_VAR_DECL, FRAMES_VAR_P and
FRAMES_VAR.
(_Unwind_Resume): Use FRAMES_VAR_DECL, FRAMES_VAR_P and
FRAMES_VAR.
(_Unwind_Resume_or_Rethrow): Use FRAMES_VAR_DECL, FRAMES_VAR_P
and FRAMES_VAR. 

Igor




0008-Add-Intel-CET-support-for-EH-in-libgcc.patch
Description: 0008-Add-Intel-CET-support-for-EH-in-libgcc.patch


[PATCH 07/22] Enable building libgcc with CET options.

2017-10-12 Thread Tsimbalist, Igor V
Enable building libgcc with CET options by default on Linux/x86 if
binutils supports CET v2.0.
It can be disabled with --disable-cet.  It is an error to configure
GCC with --enable-cet if binutils doesn't support CET v2.0.

config/
* cet.m4: New file

gcc/
* config.gcc (extra_headers): Add cet.h for Linux/x86 targets.
* config/i386/cet.h: New file.

libgcc/
* Makefile.in (configure_deps): Add $(srcdir)/../config/cet.m4.
(CET_FLAGS): New.
* config/i386/t-linux
(HOST_LIBGCC2_CFLAGS): Add $(CET_FLAGS).
(CRTSTUFF_T_CFLAGS): Add $(CET_FLAGS).
* configure.ac: Include ../config/cet.m4.
Set and substitute CET_FLAGS.
* configure: Regenerated.

Igor




0007-Enable-building-libgcc-with-CET-options.patch
Description: 0007-Enable-building-libgcc-with-CET-options.patch


[PATCH] Fix various arithmetic patterns with %[abcd]h destination (PR target/82524)

2017-10-12 Thread Jakub Jelinek
Hi!

As mentioned in the PR, there are two bugs in these.  One is that
the zero_extract destination is effectively another input operand (for the
remaining bits that are unchanged) and thus the constraint can't be =Q,
but has to be +Q.
And the other problem is that then LRA ICEs whenever it has 3 different
operands, because it is unable to reload it properly.  Uros mentioned
that it could be reloaded by using *insvqi_2-like insn to move the
8 bits from the operand that should use "0" constraint into the destination
register, but LRA isn't able to do that right now.
So this patch instead adds insn conditions that either the destination
is the same as the first input operand or as one of the input operands
(the latter for commutative patterns).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/7.3?

2017-10-12  Jakub Jelinek  

PR target/82524
* config/i386/i386.md (addqi_ext_1, andqi_ext_1,
*andqi_ext_1_cc, *qi_ext_1, *xorqi_ext_1_cc): Change
=Q constraints to +Q and into insn condition add check
that operands[0] and operands[1] are equal.
(*addqi_ext_2, *andqi_ext_2, *qi_ext_2): Change
=Q constraints to +Q and into insn condition add check
that operands[0] is equal to either operands[1] or operands[2].

* gcc.c-torture/execute/pr82524.c: New test.

--- gcc/config/i386/i386.md.jj  2017-10-12 14:05:15.0 +0200
+++ gcc/config/i386/i386.md 2017-10-12 17:07:11.723151868 +0200
@@ -6264,7 +6264,7 @@ (define_insn "*add_5"
(set_attr "mode" "")])
 
 (define_insn "addqi_ext_1"
-  [(set (zero_extract:SI (match_operand 0 "ext_register_operand" "=Q,Q")
+  [(set (zero_extract:SI (match_operand 0 "ext_register_operand" "+Q,Q")
 (const_int 8)
 (const_int 8))
(subreg:SI
@@ -6275,7 +6275,8 @@ (define_insn "addqi_ext_1"
   (const_int 8)) 0)
(match_operand:QI 2 "general_operand" "QnBc,m")) 0))
(clobber (reg:CC FLAGS_REG))]
-  ""
+  "/* FIXME: without this LRA can't reload this pattern, see PR82524.  */
+   rtx_equal_p (operands[0], operands[1])"
 {
   switch (get_attr_type (insn))
 {
@@ -6300,7 +6301,7 @@ (define_insn "addqi_ext_1"
(set_attr "mode" "QI")])
 
 (define_insn "*addqi_ext_2"
-  [(set (zero_extract:SI (match_operand 0 "ext_register_operand" "=Q")
+  [(set (zero_extract:SI (match_operand 0 "ext_register_operand" "+Q")
 (const_int 8)
 (const_int 8))
(subreg:SI
@@ -6314,7 +6315,9 @@ (define_insn "*addqi_ext_2"
   (const_int 8)
   (const_int 8)) 0)) 0))
   (clobber (reg:CC FLAGS_REG))]
-  ""
+  "/* FIXME: without this LRA can't reload this pattern, see PR82524.  */
+   rtx_equal_p (operands[0], operands[1])
+   || rtx_equal_p (operands[0], operands[2])"
   "add{b}\t{%h2, %h0|%h0, %h2}"
   [(set_attr "type" "alu")
(set_attr "mode" "QI")])
@@ -8998,7 +9001,7 @@ (define_insn "*andqi_2_slp"
(set_attr "mode" "QI")])
 
 (define_insn "andqi_ext_1"
-  [(set (zero_extract:SI (match_operand 0 "ext_register_operand" "=Q,Q")
+  [(set (zero_extract:SI (match_operand 0 "ext_register_operand" "+Q,Q")
 (const_int 8)
 (const_int 8))
(subreg:SI
@@ -9009,7 +9012,8 @@ (define_insn "andqi_ext_1"
   (const_int 8)) 0)
(match_operand:QI 2 "general_operand" "QnBc,m")) 0))
(clobber (reg:CC FLAGS_REG))]
-  ""
+  "/* FIXME: without this LRA can't reload this pattern, see PR82524.  */
+   rtx_equal_p (operands[0], operands[1])"
   "and{b}\t{%2, %h0|%h0, %2}"
   [(set_attr "isa" "*,nox64")
(set_attr "type" "alu")
@@ -9027,7 +9031,7 @@ (define_insn "*andqi_ext_1_cc"
   (const_int 8)) 0)
(match_operand:QI 2 "general_operand" "QnBc,m"))
  (const_int 0)))
-   (set (zero_extract:SI (match_operand 0 "ext_register_operand" "=Q,Q")
+   (set (zero_extract:SI (match_operand 0 "ext_register_operand" "+Q,Q")
 (const_int 8)
 (const_int 8))
(subreg:SI
@@ -9037,14 +9041,16 @@ (define_insn "*andqi_ext_1_cc"
   (const_int 8)
   (const_int 8)) 0)
(match_dup 2)) 0))]
-  "ix86_match_ccmode (insn, CCNOmode)"
+  "ix86_match_ccmode (insn, CCNOmode)
+   /* FIXME: without this LRA can't reload this pattern, see PR82524.  */
+   && rtx_equal_p (operands[0], operands[1])"
   "and{b}\t{%2, %h0|%h0, %2}"
   [(set_attr "isa" "*,nox64")
(set_attr "type" "alu")
(set_attr "mode" "QI")])
 
 (define_insn "*andqi_ext_2"
-  [(set (zero_extract:SI (match_operand 0 "ext_register_operand" "=Q")
+  [(set (zero_extract:SI (match_operand 0 "ext_register_operand" "+Q")
 (const_int 8)
 (const_int 8))
(subreg:SI
@@ 

[PATCH] Improve rotate fold-const pattern matching (PR target/82498)

2017-10-12 Thread Jakub Jelinek
Hi!

Marc in the PR mentioned that it is not really good that the recommended
rotate pattern is recognized only during forwprop1 and later, which is after
einline, and that inlining or early opts could have changed things too much
for us to still recognize it.
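
For reference, the recommended rotate idiom in question looks like this
(32-bit sketch):

  unsigned int
  rotl32 (unsigned int x, unsigned int n)
  {
    return (x << n) | (x >> (-n & 31));   /* well defined even for n == 0 */
  }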

The following patch handles that pattern in fold_binary_loc too, and while
I've touched it, it cleans a lot of ugliness/duplication in that code.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-10-12  Jakub Jelinek  

PR target/82498
* fold-const.c (fold_binary_loc) : Code cleanups,
instead of handling MINUS_EXPR twice (once for each argument),
canonicalize operand order and handle just once, use rtype where
possible.  Handle (A << B) | (A >> (-B & (Z - 1))).

* gcc.dg/tree-ssa/pr82498.c: New test.

--- gcc/fold-const.c.jj 2017-10-11 22:37:51.0 +0200
+++ gcc/fold-const.c2017-10-12 13:17:45.360589554 +0200
@@ -9429,7 +9429,10 @@ fold_binary_loc (location_t loc,
   /* (A << C1) + (A >> C2) if A is unsigned and C1+C2 is the size of A
 is a rotate of A by C1 bits.  */
   /* (A << B) + (A >> (Z - B)) if A is unsigned and Z is the size of A
-is a rotate of A by B bits.  */
+is a rotate of A by B bits.
+Similarly for (A << B) | (A >> (-B & C3)) where C3 is Z-1,
+though in this case CODE must be | and not + or ^, otherwise
+it doesn't return A when B is 0.  */
   {
enum tree_code code0, code1;
tree rtype;
@@ -9447,25 +9450,32 @@ fold_binary_loc (location_t loc,
== GET_MODE_UNIT_PRECISION (TYPE_MODE (rtype
  {
tree tree01, tree11;
+   tree orig_tree01, orig_tree11;
enum tree_code code01, code11;
 
-   tree01 = TREE_OPERAND (arg0, 1);
-   tree11 = TREE_OPERAND (arg1, 1);
+   tree01 = orig_tree01 = TREE_OPERAND (arg0, 1);
+   tree11 = orig_tree11 = TREE_OPERAND (arg1, 1);
STRIP_NOPS (tree01);
STRIP_NOPS (tree11);
code01 = TREE_CODE (tree01);
code11 = TREE_CODE (tree11);
+   if (code11 != MINUS_EXPR
+   && (code01 == MINUS_EXPR || code01 == BIT_AND_EXPR))
+ {
+   std::swap (code0, code1);
+   std::swap (code01, code11);
+   std::swap (tree01, tree11);
+   std::swap (orig_tree01, orig_tree11);
+ }
if (code01 == INTEGER_CST
&& code11 == INTEGER_CST
&& (wi::to_widest (tree01) + wi::to_widest (tree11)
-   == element_precision (TREE_TYPE (TREE_OPERAND (arg0, 0)
+   == element_precision (rtype)))
  {
tem = build2_loc (loc, LROTATE_EXPR,
- TREE_TYPE (TREE_OPERAND (arg0, 0)),
- TREE_OPERAND (arg0, 0),
+ rtype, TREE_OPERAND (arg0, 0),
  code0 == LSHIFT_EXPR
- ? TREE_OPERAND (arg0, 1)
- : TREE_OPERAND (arg1, 1));
+ ? orig_tree01 : orig_tree11);
return fold_convert_loc (loc, type, tem);
  }
else if (code11 == MINUS_EXPR)
@@ -9477,39 +9487,37 @@ fold_binary_loc (location_t loc,
STRIP_NOPS (tree111);
if (TREE_CODE (tree110) == INTEGER_CST
&& 0 == compare_tree_int (tree110,
- element_precision
- (TREE_TYPE (TREE_OPERAND
- (arg0, 0
+ element_precision (rtype))
&& operand_equal_p (tree01, tree111, 0))
- return
-   fold_convert_loc (loc, type,
- build2 ((code0 == LSHIFT_EXPR
-  ? LROTATE_EXPR
-  : RROTATE_EXPR),
- TREE_TYPE (TREE_OPERAND (arg0, 
0)),
- TREE_OPERAND (arg0, 0),
- TREE_OPERAND (arg0, 1)));
+ {
+   tem = build2_loc (loc, (code0 == LSHIFT_EXPR
+   ? LROTATE_EXPR : RROTATE_EXPR),
+ rtype, TREE_OPERAND (arg0, 0),
+ orig_tree01);
+   return fold_convert_loc (loc, type, tem);
+ }
  }
-   else if (code01 == MINUS_EXPR)
+   else if (code == BIT_IOR_EXPR
+&& code11 == BIT_AND_EXPR
+&& pow2p_hwi 

[PATCH] Avoid UB in ia32intrin.h rotate patterns (PR target/82498)

2017-10-12 Thread Jakub Jelinek
Hi!

The ia32intrin.h rotate intrinsics require the second argument to be
in between 1 and 31 (or 63), otherwise they invoke UB.  But, we can do much
better while generating the same instruction when optimizing, so the
following patch uses the patterns we pattern recognize well and where
the second argument can be any value.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-10-12  Jakub Jelinek  

PR target/82498
* config/i386/ia32intrin.h (__rold, __rord, __rolq, __rorq): Allow
any values of __C while still being pattern recognizable as a simple
rotate instruction.

* gcc.dg/ubsan/pr82498.c: New test.

--- gcc/config/i386/ia32intrin.h.jj 2017-01-01 12:45:42.0 +0100
+++ gcc/config/i386/ia32intrin.h2017-10-12 09:55:24.235602737 +0200
@@ -147,7 +147,8 @@ extern __inline unsigned int
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 __rold (unsigned int __X, int __C)
 {
-  return (__X << __C) | (__X >> (32 - __C));
+  __C &= 31;
+  return (__X << __C) | (__X >> (-__C & 31));
 }
 
 /* 8bit ror */
@@ -171,7 +172,8 @@ extern __inline unsigned int
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 __rord (unsigned int __X, int __C)
 {
-  return (__X >> __C) | (__X << (32 - __C));
+  __C &= 31;
+  return (__X >> __C) | (__X << (-__C & 31));
 }
 
 /* Pause */
@@ -239,7 +241,8 @@ extern __inline unsigned long long
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 __rolq (unsigned long long __X, int __C)
 {
-  return (__X << __C) | (__X >> (64 - __C));
+  __C &= 63;
+  return (__X << __C) | (__X >> (-__C & 63));
 }
 
 /* 64bit ror */
@@ -247,7 +250,8 @@ extern __inline unsigned long long
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 __rorq (unsigned long long __X, int __C)
 {
-  return (__X >> __C) | (__X << (64 - __C));
+  __C &= 63;
+  return (__X >> __C) | (__X << (-__C & 63));
 }
 
 /* Read flags register */
--- gcc/testsuite/gcc.dg/ubsan/pr82498.c.jj 2017-10-12 09:40:36.025438511 
+0200
+++ gcc/testsuite/gcc.dg/ubsan/pr82498.c2017-10-12 10:06:06.636790077 
+0200
@@ -0,0 +1,159 @@
+/* PR target/82498 */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-fsanitize=undefined -fno-sanitize-recover=undefined" } */
+
+#include <x86intrin.h>
+
+volatile unsigned int a;
+volatile unsigned long long b;
+volatile int c;
+
+int
+main ()
+{
+  a = 0x12345678U;
+  a = __rold (a, 0);
+  if (a != 0x12345678U)
+__builtin_abort ();
+  a = __rold (a, 32);
+  if (a != 0x12345678U)
+__builtin_abort ();
+  a = __rold (a, -32);
+  if (a != 0x12345678U)
+__builtin_abort ();
+  a = __rold (a, 37);
+  if (a != 0x468acf02U)
+__builtin_abort ();
+  a = __rold (a, -5);
+  if (a != 0x12345678U)
+__builtin_abort ();
+  a = __rord (a, 0);
+  if (a != 0x12345678U)
+__builtin_abort ();
+  a = __rord (a, 32);
+  if (a != 0x12345678U)
+__builtin_abort ();
+  a = __rord (a, -32);
+  if (a != 0x12345678U)
+__builtin_abort ();
+  a = __rord (a, -37);
+  if (a != 0x468acf02U)
+__builtin_abort ();
+  a = __rord (a, 5);
+  if (a != 0x12345678U)
+__builtin_abort ();
+  c = 0;
+  a = __rold (a, c);
+  if (a != 0x12345678U)
+__builtin_abort ();
+  c = 32;
+  a = __rold (a, c);
+  if (a != 0x12345678U)
+__builtin_abort ();
+  c = -32;
+  a = __rold (a, c);
+  if (a != 0x12345678U)
+__builtin_abort ();
+  c = 37;
+  a = __rold (a, c);
+  if (a != 0x468acf02U)
+__builtin_abort ();
+  c = -5;
+  a = __rold (a, c);
+  if (a != 0x12345678U)
+__builtin_abort ();
+  c = 0;
+  a = __rord (a, c);
+  if (a != 0x12345678U)
+__builtin_abort ();
+  c = 32;
+  a = __rord (a, c);
+  if (a != 0x12345678U)
+__builtin_abort ();
+  c = -32;
+  a = __rord (a, c);
+  if (a != 0x12345678U)
+__builtin_abort ();
+  c = -37;
+  a = __rord (a, c);
+  if (a != 0x468acf02U)
+__builtin_abort ();
+  c = 5;
+  a = __rord (a, c);
+  if (a != 0x12345678U)
+__builtin_abort ();
+#ifdef __x86_64__
+  b = 0x123456789abcdef1ULL;
+  b = __rolq (b, 0);
+  if (b != 0x123456789abcdef1ULL)
+__builtin_abort ();
+  b = __rolq (b, 64);
+  if (b != 0x123456789abcdef1ULL)
+__builtin_abort ();
+  b = __rolq (b, -64);
+  if (b != 0x123456789abcdef1ULL)
+__builtin_abort ();
+  b = __rolq (b, 69);
+  if (b != 0x468acf13579bde22ULL)
+__builtin_abort ();
+  b = __rolq (b, -5);
+  if (b != 0x123456789abcdef1ULL)
+__builtin_abort ();
+  b = __rorq (b, 0);
+  if (b != 0x123456789abcdef1ULL)
+__builtin_abort ();
+  b = __rorq (b, 64);
+  if (b != 0x123456789abcdef1ULL)
+__builtin_abort ();
+  b = __rorq (b, -64);
+  if (b != 0x123456789abcdef1ULL)
+__builtin_abort ();
+  b = __rorq (b, -69);
+  if (b != 0x468acf13579bde22ULL)
+__builtin_abort ();
+  b = __rorq (b, 5);
+  if (b != 0x123456789abcdef1ULL)
+__builtin_abort ();
+  c = 0;
+  b = __rolq (b, c);
+  if (b != 0x123456789abcdef1ULL)
+

Re: [patch, fortran] New take on PR 82373

2017-10-12 Thread Steve Kargl
On Thu, Oct 12, 2017 at 08:21:46PM +0200, Thomas Koenig wrote:
> 
> after some thought, I think the PR can be fixed by something
> far less invasive than my previous patch.
> 
> The new version of the patch simply issues an error for a
> non-printable character (which should never be legal).
> Anything else should be caught by other error reporting
> routines.
> 
> OK for trunk?
> 

I like this patch much better than the last.  It's
ok with me, but you might want to give 24 hours to let
the world rotate before committing (i.e., allow others
to comment).

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow


Re: [PATCH] Improve x86 and + rotate (PR target/82498)

2017-10-12 Thread Jakub Jelinek
On Thu, Oct 12, 2017 at 10:40:22AM +0200, Uros Bizjak wrote:
> > So, if you aren't against it, I can extend the patch to handle the 4
> > other mask patterns; as for other modes, SImode is what is being handled
> > already, DImode is not a problem, because the FEs truncate the shift counts
> > to integer_type_node already, and for HImode I haven't seen problem
> > probably because most tunings avoid HImode math and so it isn't worth
> > optimizing.
> 
> OK, I think that we can live with 4 new patterns. Since these are all
> written in the same way (as in the patch you posted), the amended
> patch is pre-approved for mainline.

Thanks, here is what I've committed to trunk after another bootstrap/regtest
on x86_64-linux and i686-linux:

2017-10-12  Jakub Jelinek  

PR target/82498
* config/i386/i386.md (*ashl3_mask_1,
*3_mask_1, *3_mask_1,
*_mask_1, *btr_mask_1): New define_insn_and_split
patterns.

* gcc.target/i386/pr82498-1.c: New test.
* gcc.target/i386/pr82498-2.c: New test.

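As an illustration of the kind of source these patterns target (an
editorial sketch, not part of the patch; the function name is made up),
the explicit masking of the rotate count below -- the same idiom the fixed
__rold/__rolq intrinsics now use -- matches what the hardware shift and
rotate instructions already do, so the new define_insn_and_split patterns
should let the separate AND disappear from the generated code:

/* Sketch only: rotate with a masked variable count.  x86 shifts and
   rotates truncate the count to the operand size themselves, so the
   "c &= 31" should not need its own AND instruction.  */
unsigned int
rol32 (unsigned int x, unsigned int c)
{
  c &= 31;
  return (x << c) | (x >> (-c & 31));
}
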
--- gcc/config/i386/i386.md.jj  2017-10-11 22:37:55.933863355 +0200
+++ gcc/config/i386/i386.md 2017-10-12 11:30:38.191535974 +0200
@@ -10228,6 +10228,26 @@ (define_insn_and_split "*ashl3_mas
   (clobber (reg:CC FLAGS_REG))])]
   "operands[2] = gen_lowpart (QImode, operands[2]);")
 
+(define_insn_and_split "*ashl3_mask_1"
+  [(set (match_operand:SWI48 0 "nonimmediate_operand")
+   (ashift:SWI48
+ (match_operand:SWI48 1 "nonimmediate_operand")
+ (and:QI
+   (match_operand:QI 2 "register_operand")
+   (match_operand:QI 3 "const_int_operand"
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (ASHIFT, mode, operands)
+   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1))
+  == GET_MODE_BITSIZE (mode)-1
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(parallel
+ [(set (match_dup 0)
+  (ashift:SWI48 (match_dup 1)
+(match_dup 2)))
+  (clobber (reg:CC FLAGS_REG))])])
+
 (define_insn "*bmi2_ashl3_1"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
(ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "rm")
@@ -10728,6 +10748,26 @@ (define_insn_and_split "*3_mask_1"
+  [(set (match_operand:SWI48 0 "nonimmediate_operand")
+   (any_shiftrt:SWI48
+ (match_operand:SWI48 1 "nonimmediate_operand")
+ (and:QI
+   (match_operand:QI 2 "register_operand")
+   (match_operand:QI 3 "const_int_operand"
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (, mode, operands)
+   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1))
+  == GET_MODE_BITSIZE (mode)-1
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(parallel
+ [(set (match_dup 0)
+  (any_shiftrt:SWI48 (match_dup 1)
+ (match_dup 2)))
+  (clobber (reg:CC FLAGS_REG))])])
+
 (define_insn_and_split "*3_doubleword"
   [(set (match_operand:DWI 0 "register_operand" "=")
(any_shiftrt:DWI (match_operand:DWI 1 "register_operand" "0")
@@ -11187,6 +11227,26 @@ (define_insn_and_split "*3_mask_1"
+  [(set (match_operand:SWI48 0 "nonimmediate_operand")
+   (any_rotate:SWI48
+ (match_operand:SWI48 1 "nonimmediate_operand")
+ (and:QI
+   (match_operand:QI 2 "register_operand")
+   (match_operand:QI 3 "const_int_operand"
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (, mode, operands)
+   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1))
+  == GET_MODE_BITSIZE (mode)-1
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(parallel
+ [(set (match_dup 0)
+  (any_rotate:SWI48 (match_dup 1)
+(match_dup 2)))
+  (clobber (reg:CC FLAGS_REG))])])
+
 ;; Implement rotation using two double-precision
 ;; shift instructions and a scratch register.
 
@@ -11494,6 +11554,30 @@ (define_insn_and_split "*_ma
   (clobber (reg:CC FLAGS_REG))])]
   "operands[1] = gen_lowpart (QImode, operands[1]);")
 
+(define_insn_and_split "*_mask_1"
+  [(set (match_operand:SWI48 0 "register_operand")
+   (any_or:SWI48
+ (ashift:SWI48
+   (const_int 1)
+   (and:QI
+ (match_operand:QI 1 "register_operand")
+ (match_operand:QI 2 "const_int_operand")))
+ (match_operand:SWI48 3 "register_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_USE_BT
+   && (INTVAL (operands[2]) & (GET_MODE_BITSIZE (mode)-1))
+  == GET_MODE_BITSIZE (mode)-1
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(parallel
+ [(set (match_dup 0)
+  (any_or:SWI48
+(ashift:SWI48 (const_int 1)
+  (match_dup 1))
+(match_dup 3)))
+  (clobber (reg:CC FLAGS_REG))])])
+
 (define_insn "*btr"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
(and:SWI48
@@ -11535,6 +11619,30 @@ (define_insn_and_split "*btr_mask"
   

Go patch committed: Fix import of indirectly imported type alias

2017-10-12 Thread Ian Lance Taylor
When the Go frontend imported a reference to an indirectly imported
type alias, it was looking for the " = " before the optional package
name that appears for an indirect reference, but the exporter was
putting it after.  This patch fixes that.  The test case is
https://golang.org/cl/70290, and will be brought into the GCC
testsuite in due course.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 253664)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-af46ad16dc34773877068393d331ac8ae34b2219
+44132970e4b6c1186036bf8eda8982fb6e905d6f
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/import.cc
===
--- gcc/go/gofrontend/import.cc (revision 253311)
+++ gcc/go/gofrontend/import.cc (working copy)
@@ -756,13 +756,6 @@ Import::read_type()
 
   this->require_c_string(" ");
 
-  bool is_alias = false;
-  if (this->match_c_string("= "))
-{
-  stream->advance(2);
-  is_alias = true;
-}
-
   // The package name may follow.  This is the name of the package in
   // the package clause of that package.  The type name will include
   // the pkgpath, which may be different.
@@ -775,6 +768,13 @@ Import::read_type()
   this->require_c_string(" ");
 }
 
+  bool is_alias = false;
+  if (this->match_c_string("= "))
+{
+  stream->advance(2);
+  is_alias = true;
+}
+
   // Declare the type in the appropriate package.  If we haven't seen
   // it before, mark it as invisible.  We declare it before we read
   // the actual definition of the type, since the definition may refer


[C++ PATCH] cp_expr tweak and delete unused enumerations

2017-10-12 Thread Nathan Sidwell
The cp_expr class doesn't have a const-qualified operator * accessor. 
This leaves one with a confusing error message if one ever tries to 
dereference a constant cp_expr object:

  const cp_expr thing = ...;
  tree bob = *thing; // ERROR

What happens is that the non-const operator * is not suitable, but the
implicit conversion to a tree is, which is then dereferenced.  So we end
up trying to assign a struct to a pointer-to-struct.


cp_expr::operator-> has a similar problem.

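As a self-contained sketch of the failure mode and of the fix (editorial
illustration only, not GCC code: node, tree_like and expr_wrapper are
invented stand-ins for tree and cp_expr):

struct node { int payload; };
typedef node *tree_like;

class expr_wrapper
{
public:
  expr_wrapper (tree_like v) : m_value (v) {}
  /* Implicit conversion to the underlying pointer.  */
  operator tree_like () const { return m_value; }
  /* Non-const accessor; unusable on a const object.  */
  tree_like &operator* () { return m_value; }
  /* The fix: a const-qualified accessor returning by value.  */
  tree_like operator* () const { return m_value; }
private:
  tree_like m_value;
};

int
main ()
{
  node n = { 42 };
  const expr_wrapper thing (&n);
  /* Without the const operator*, overload resolution would fall back to
     the implicit conversion and dereference its result, giving the
     confusing "cannot convert 'node' to 'node*'" error.  With it, we
     simply get the wrapped pointer back.  */
  tree_like bob = *thing;
  return bob->payload == 42 ? 0 : 1;
}
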
The attached patch adds const-qualified accessors.

I also noticed that some of the values of cp_tree_node_structure_enum 
were no longer used, leading to warnings about them not being handled in 
switch statements.  (We never use LAST_TS_CP_ENUM to size an array).


Deleted in this patch.

Applying to trunk.

nathan

--
Nathan Sidwell
2017-10-12  Nathan Sidwell  

	* cp-tree.h (cp_expr): Add const operator * and operator->
	accessors.
	(cp_tree_node_structure_enum): Delete TS_CP_BINDING,
	TS_CP_WRAPPER, LAST_TS_CP_ENUM.

Index: cp-tree.h
===
--- cp-tree.h	(revision 253683)
+++ cp-tree.h	(working copy)
@@ -65,7 +65,9 @@ public:
   /* Implicit conversions to tree.  */
   operator tree () const { return m_value; }
   tree & operator* () { return m_value; }
+  tree operator* () const { return m_value; }
   tree & operator-> () { return m_value; }
+  tree operator-> () const { return m_value; }
 
   tree get_value () const { return m_value; }
   location_t get_location () const { return m_loc; }
@@ -1467,11 +1469,9 @@ enum cp_tree_node_structure_enum {
   TS_CP_IDENTIFIER,
   TS_CP_TPI,
   TS_CP_PTRMEM,
-  TS_CP_BINDING,
   TS_CP_OVERLOAD,
   TS_CP_BASELINK,
   TS_CP_TEMPLATE_DECL,
-  TS_CP_WRAPPER,
   TS_CP_DEFAULT_ARG,
   TS_CP_DEFERRED_NOEXCEPT,
   TS_CP_STATIC_ASSERT,
@@ -1480,8 +1480,7 @@ enum cp_tree_node_structure_enum {
   TS_CP_LAMBDA_EXPR,
   TS_CP_TEMPLATE_INFO,
   TS_CP_CONSTRAINT_INFO,
-  TS_CP_USERDEF_LITERAL,
-  LAST_TS_CP_ENUM
+  TS_CP_USERDEF_LITERAL
 };
 
 /* The resulting tree type.  */


RE: 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation

2017-10-12 Thread Tsimbalist, Igor V
Attached is an updated patch according to your comments. New tests are
added to test ICF optimization in the presence of the nocf_check attribute.

Igor


> -Original Message-
> From: Tsimbalist, Igor V
> Sent: Tuesday, September 19, 2017 11:30 PM
> To: Uros Bizjak 
> Cc: gcc-patches@gcc.gnu.org; Tsimbalist, Igor V
> 
> Subject: RE: 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation
> 
> > -Original Message-
> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> > ow...@gcc.gnu.org] On Behalf Of Uros Bizjak
> > Sent: Tuesday, September 19, 2017 6:13 PM
> > To: Tsimbalist, Igor V 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation
> >
> > On Tue, Sep 19, 2017 at 5:18 PM, Tsimbalist, Igor V
> >  wrote:
> > >> -Original Message-
> > >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> > >> ow...@gcc.gnu.org] On Behalf Of Uros Bizjak
> > >> Sent: Monday, September 18, 2017 12:17 PM
> > >> To: gcc-patches@gcc.gnu.org
> > >> Cc: Tsimbalist, Igor V ; Tsimbalist,
> > >> Igor V 
> > >> Subject: Re:
> > >> 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation
> > >>
> > >> Hello!
> > >>
> > >> > gcc/testsuite/
> > >> >
> > >> > * g++.dg/cet-notrack-1.C: New test.
> > >> > * gcc.target/i386/cet-intrin-1.c: Likewise.
> > >> > * gcc.target/i386/cet-intrin-10.c: Likewise.
> > >> > * gcc.target/i386/cet-intrin-2.c: Likewise.
> > >> > * gcc.target/i386/cet-intrin-3.c: Likewise.
> > >> > * gcc.target/i386/cet-intrin-4.c: Likewise.
> > >> > * gcc.target/i386/cet-intrin-5.c: Likewise.
> > >> > * gcc.target/i386/cet-intrin-6.c: Likewise.
> > >> > * gcc.target/i386/cet-intrin-7.c: Likewise.
> > >> > * gcc.target/i386/cet-intrin-8.c: Likewise.
> > >> > * gcc.target/i386/cet-intrin-9.c: Likewise.
> > >> > * gcc.target/i386/cet-label.c: Likewise.
> > >> > * gcc.target/i386/cet-notrack-1a.c: Likewise.
> > >> > * gcc.target/i386/cet-notrack-1b.c: Likewise.
> > >> > * gcc.target/i386/cet-notrack-2a.c: Likewise.
> > >> > * gcc.target/i386/cet-notrack-2b.c: Likewise.
> > >> > * gcc.target/i386/cet-notrack-3.c: Likewise.
> > >> > * gcc.target/i386/cet-notrack-4a.c: Likewise.
> > >> > * gcc.target/i386/cet-notrack-4b.c: Likewise.
> > >> > * gcc.target/i386/cet-notrack-5a.c: Likewise.
> > >> > * gcc.target/i386/cet-notrack-5b.c: Likewise.
> > >> > * gcc.target/i386/cet-notrack-6a.c: Likewise.
> > >> > * gcc.target/i386/cet-notrack-6b.c: Likewise.
> > >> > * gcc.target/i386/cet-notrack-7.c: Likewise.
> > >> > * gcc.target/i386/cet-property-1.c: Likewise.
> > >> > * gcc.target/i386/cet-property-2.c: Likewise.
> > >> > * gcc.target/i386/cet-rdssp-1.c: Likewise.
> > >> > * gcc.target/i386/cet-sjlj-1.c: Likewise.
> > >> > * gcc.target/i386/cet-sjlj-2.c: Likewise.
> > >> > * gcc.target/i386/cet-sjlj-3.c: Likewise.
> > >> > * gcc.target/i386/cet-switch-1.c: Likewise.
> > >> > * gcc.target/i386/cet-switch-2.c: Likewise.
> > >> > * lib/target-supports.exp (check_effective_target_cet): New proc.
> > >>
> > >> A couple of questions:
> > >>
> > >> +/* { dg-do compile } */
> > >> +/* { dg-options "-O2 -mcet" } */
> > >> +/* { dg-final { scan-assembler-times "setssbsy" 2 } } */
> > >> +
> > >> +#include 
> > >> +
> > >> +void f1 (void)
> > >> +{
> > >> +  __builtin_ia32_setssbsy ();
> > >> +}
> > >> +
> > >> +void f2 (void)
> > >> +{
> > >> +  _setssbsy ();
> > >> +}
> > >>
> > >> Is there a reason that both, __builtin and intrinsic versions are
> > >> tested in a couple of places? The intrinsic version is just a
> > >> wrapper for __builtin, so IMO testing intrinsic version should be
> enough.
> > > No strong reason. Just to check that intrinsic names are recognized
> > > and
> > processed correctly.
> > > The implementation could change and the test will catch inconsistency.
> > > I would also assume a user will use intrinsics that's why I add intrinsic
> check.
> > Should I remove it?
> >
> > Actually, these __builtins are considered as implementation detail,
> > and their use should be discouraged. They are deliberately not
> > documented, and users should use intrinsic headers instead. That said,
> > builtins won't change without a reason, since Ada needs them.
> >
> > It can happen that the test fails due to change of intrinsics, so I'd
> > recommend to remove them.
> Ok, I will remove intrinsic.
> 
> > >> diff --git a/gcc/testsuite/gcc.target/i386/cet-rdssp-1.c
> > >> b/gcc/testsuite/gcc.target/i386/cet-rdssp-1.c
> > >> new file mode 100644
> > >> index 000..f9223a5
> > >> --- /dev/null
> > >> +++ b/gcc/testsuite/gcc.target/i386/cet-rdssp-1.c
> > >> @@ -0,0 +1,39 @@
> > >> +/* { dg-do run { target cet } } */
> > >> +/* { dg-options "-O2 -finstrument-control-flow -mcet" } */
> > >>
> > >> The "target cet" directive just checks that CET instructions can be
> > compiled.
> > >> The test 

Increase base of profile probabilities

2017-10-12 Thread Jan Hubicka
Hi,
this patch makes profile probability use a more precise representation
and takes care of places where 32-bit arithmetic could possibly overflow.
This will make it possible to drop counts on edges next.

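To see why the widening matters, here is a standalone sketch (editorial
illustration only, not the GCC sources; the macro names are stand-ins for
max_probability and REG_BR_PROB_BASE): converting a probability of 1.0 to
the REG_BR_PROB_BASE scale already overflows a 32-bit product once the
base is 1 << 28.

#include <stdint.h>
#include <stdio.h>

#define MAX_PROBABILITY ((uint32_t) 1 << 28)   /* the new base */
#define BR_PROB_BASE 10000                     /* REG_BR_PROB_BASE */

int
main (void)
{
  uint32_t m_val = MAX_PROBABILITY;   /* probability of 1.0 */

  /* 32-bit product: 2^28 * 10000 is a multiple of 2^32, so it wraps to 0.  */
  uint32_t wrong = (m_val * BR_PROB_BASE + MAX_PROBABILITY / 2)
                   / MAX_PROBABILITY;

  /* Widening to 64 bits first, as the patch does, gives the expected 10000.  */
  uint32_t right = ((uint64_t) m_val * BR_PROB_BASE + MAX_PROBABILITY / 2)
                   / MAX_PROBABILITY;

  printf ("32-bit: %u, 64-bit: %u\n", (unsigned) wrong, (unsigned) right);
  return 0;
}
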
Bootstrapped/regtested x86_64-linux, committed.

Honza

* profile-count.h (safe_scale_64bit): Fix GCC4.x path.
(profile_probability): Set max_probability
to (uint32_t) 1 << (n_bits - 2) and update accessors to avoid overflows
in temporaries.
* profile-count.c (profile_probability::differs_from_p): Do not
rely on max_probability == 1

* gcc.dg/predict-13.c: Update template for probability change.
* gcc.dg/predict-8.c: Likewise.
Index: profile-count.c
===
--- profile-count.c (revision 253683)
+++ profile-count.c (working copy)
@@ -147,12 +147,12 @@ profile_probability::differs_from_p (pro
 {
   if (!initialized_p () || !other.initialized_p ())
 return false;
-  if ((uint64_t)m_val - (uint64_t)other.m_val < 10
-  || (uint64_t)other.m_val - (uint64_t)m_val < 10)
+  if ((uint64_t)m_val - (uint64_t)other.m_val < max_probability / 1000
+  || (uint64_t)other.m_val - (uint64_t)max_probability < 1000)
 return false;
   if (!other.m_val)
 return true;
-  int64_t ratio = m_val * 100 / other.m_val;
+  int64_t ratio = (int64_t)m_val * 100 / other.m_val;
   return ratio < 99 || ratio > 101;
 }
 
Index: profile-count.h
===
--- profile-count.h (revision 253683)
+++ profile-count.h (working copy)
@@ -67,7 +67,10 @@ safe_scale_64bit (uint64_t a, uint64_t b
   if (a < ((uint64_t)1 << 31)
   && b < ((uint64_t)1 << 31)
   && c < ((uint64_t)1 << 31))
-return (a * b + (c / 2)) / c;
+{
+  *res = (a * b + (c / 2)) / c;
+  return true;
+}
 #endif
   return slow_safe_scale_64bit (a, b, c, res);
 }
@@ -111,11 +114,10 @@ safe_scale_64bit (uint64_t a, uint64_t b
 
 class GTY((user)) profile_probability
 {
-  /* For now use values in range 0...REG_BR_PROB_BASE.  Later we can use full
- precision of 30 bits available.  */
-
   static const int n_bits = 30;
-  static const uint32_t max_probability = REG_BR_PROB_BASE;
+  /* We can technically use ((uint32_t) 1 << (n_bits - 1)) - 2 but that
+ will lead to harder multiplication sequences.  */
+  static const uint32_t max_probability = (uint32_t) 1 << (n_bits - 2);
   static const uint32_t uninitialized_probability
 = ((uint32_t) 1 << (n_bits - 1)) - 1;
 
@@ -210,14 +212,14 @@ public:
 {
   profile_probability ret;
   gcc_checking_assert (v >= 0 && v <= REG_BR_PROB_BASE);
-  ret.m_val = RDIV (v * max_probability, REG_BR_PROB_BASE);
+  ret.m_val = RDIV (v * (uint64_t) max_probability, REG_BR_PROB_BASE);
   ret.m_quality = profile_guessed;
   return ret;
 }
   int to_reg_br_prob_base () const
 {
   gcc_checking_assert (initialized_p ());
-  return RDIV (m_val * REG_BR_PROB_BASE, max_probability);
+  return RDIV (m_val * (uint64_t) REG_BR_PROB_BASE, max_probability);
 }
 
   /* Conversion to and from RTL representation of profile probabilities.  */
@@ -246,7 +248,12 @@ public:
   if (val1 > val2)
ret.m_val = max_probability;
   else
-   ret.m_val = RDIV (val1 * max_probability, val2);
+   {
+ uint64_t tmp;
+ safe_scale_64bit (val1, max_probability, val2, );
+ gcc_checking_assert (tmp <= max_probability);
+ ret.m_val = tmp;
+   }
   ret.m_quality = profile_precise;
   return ret;
 }
@@ -443,8 +450,9 @@ public:
   if (!initialized_p ())
return profile_probability::uninitialized ();
   profile_probability ret;
-  ret.m_val = MIN (RDIV (m_val * num, den),
-  max_probability);
+  uint64_t tmp;
+  safe_scale_64bit (m_val, num, den, );
+  ret.m_val = MIN (tmp, max_probability);
   ret.m_quality = MIN (m_quality, profile_adjusted);
   return ret;
 }
@@ -482,7 +490,7 @@ public:
   if (m_val == uninitialized_probability)
return m_quality == profile_guessed;
   else
-   return m_val <= REG_BR_PROB_BASE;
+   return m_val <= max_probability;
 }
 
   /* Comparsions are three-state and conservative.  False is returned if
@@ -781,8 +789,10 @@ public:
   if (!initialized_p ())
return profile_count::uninitialized ();
   profile_count ret;
-  ret.m_val = RDIV (m_val * prob.m_val,
-   profile_probability::max_probability);
+  uint64_t tmp;
+  safe_scale_64bit (m_val, prob.m_val, 
profile_probability::max_probability,
+   );
+  ret.m_val = tmp;
   ret.m_quality = MIN (m_quality, prob.m_quality);
   return ret;
 }
@@ -794,11 +804,11 @@ public:
   if (!initialized_p ())
return profile_count::uninitialized ();
   profile_count ret;

RE: 0004-Part-4.-Update-x86-backend-to-enable-Intel-CET

2017-10-12 Thread Tsimbalist, Igor V
Uros,

Attached is an updated patch. The main difference is the change of the
option name and attribute name. The other code is the same.

Igor


> -Original Message-
> From: Tsimbalist, Igor V
> Sent: Tuesday, September 19, 2017 5:06 PM
> To: Uros Bizjak ; gcc-patches@gcc.gnu.org
> Cc: Tsimbalist, Igor V 
> Subject: RE: 0004-Part-4.-Update-x86-backend-to-enable-Intel-CET
> 
> Uros, thank you for the approval. Based on the approval of the first 3 patches
> (I've submitted them today), I need to adjust option and attribute names. I
> will resubmit the patch when I fix option and attribute names.
> 
> Thanks,
> Igor
> 
> 
> > -Original Message-
> > From: Uros Bizjak [mailto:ubiz...@gmail.com]
> > Sent: Monday, September 18, 2017 11:58 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Tsimbalist, Igor V ; Tsimbalist, Igor
> > V 
> > Subject: Re: 0004-Part-4.-Update-x86-backend-to-enable-Intel-CET
> >
> > Hello!
> >
> > > gcc/
> > >
> > > * common/config/i386/i386-common.c (OPTION_MASK_ISA_IBT_SET):
> > New.
> > > (OPTION_MASK_ISA_SHSTK_SET): Likewise.
> > > (OPTION_MASK_ISA_IBT_UNSET): Likewise.
> > > (OPTION_MASK_ISA_SHSTK_UNSET): Likewise.
> > > (ix86_handle_option): Add -mibt, -mshstk, -mcet handling.
> > > * config.gcc (extra_headers): Add cetintrin.h for x86 targets.
> > > (extra_objs): Add cet.o for Linux/x86 targets.
> > > (tmake_file): Add i386/t-cet for Linux/x86 targets.
> > > * config/i386/cet.c: New file.
> > > * config/i386/cetintrin.h: Likewise.
> > > * config/i386/t-cet: Likewise.
> > > * config/i386/cpuid.h (bit_SHSTK): New.
> > > (bit_IBT): Likewise.
> > > * config/i386/driver-i386.c (host_detect_local_cpu): Detect and pass
> > > IBT and SHSTK bits.
> > > * config/i386/i386-builtin-types.def
> > > (VOID_FTYPE_UNSIGNED_PVOID): New.
> > > (VOID_FTYPE_UINT64_PVOID): Likewise.
> > > * config/i386/i386-builtin.def: Add CET intrinsics.
> > > * config/i386/i386-c.c (ix86_target_macros_internal): Add
> > > OPTION_MASK_ISA_IBT, OPTION_MASK_ISA_SHSTK handling.
> > > * config/i386/i386-passes.def: Add pass_insert_endbranch pass.
> > > * config/i386/i386-protos.h (make_pass_insert_endbranch): New
> > > prototype.
> > > * config/i386/i386.c (rest_of_insert_endbranch): New.
> > > (pass_data_insert_endbranch): Likewise.
> > > (pass_insert_endbranch): Likewise.
> > > (make_pass_insert_endbranch): Likewise.
> > > (ix86_notrack_prefixed_insn_p): Likewise.
> > > (ix86_target_string): Add -mibt, -mshstk flags.
> > > (ix86_option_override_internal): Add flag_instrument_control_flow
> > > processing.
> > > (ix86_valid_target_attribute_inner_p): Set OPT_mibt, OPT_mshstk.
> > > (ix86_print_operand): Add 'notrack' prefix output.
> > > (ix86_init_mmx_sse_builtins): Add CET intrinsics.
> > > (ix86_expand_builtin): Expand CET intrinsics.
> > > (x86_output_mi_thunk): Add 'endbranch' instruction.
> > > * config/i386/i386.h (TARGET_IBT): New.
> > > (TARGET_IBT_P): Likewise.
> > > (TARGET_SHSTK): Likewise.
> > > (TARGET_SHSTK_P): Likewise.
> > > * config/i386/i386.md (unspecv): Add UNSPECV_NOP_RDSSP,
> > > UNSPECV_INCSSP, UNSPECV_SAVEPREVSSP, UNSPECV_RSTORSSP,
> > UNSPECV_WRSS,
> > > UNSPECV_WRUSS, UNSPECV_SETSSBSY, UNSPECV_CLRSSBSY.
> > > (builtin_setjmp_setup): New pattern.
> > > (builtin_longjmp): Likewise.
> > > (rdssp): Likewise.
> > > (incssp): Likewise.
> > > (saveprevssp): Likewise.
> > > (rstorssp): Likewise.
> > > (wrss): Likewise.
> > > (wruss): Likewise.
> > > (setssbsy): Likewise.
> > > (clrssbsy): Likewise.
> > > (nop_endbr): Likewise.
> > > * config/i386/i386.opt: Add -mcet, -mibt, -mshstk and -mcet-switch
> > > options.
> > > * config/i386/immintrin.h: Include .
> > > * config/i386/linux-common.h
> > > (file_end_indicate_exec_stack_and_cet): New prototype.
> > > (TARGET_ASM_FILE_END): New.
> >
> > LGTM.
> >
> > OK for mainline.
> >
> > Thanks,
> > Uros.


0004-Update-x86-backend-to-enable-Intel-CET.PATCH
Description: 0004-Update-x86-backend-to-enable-Intel-CET.PATCH


[PATCH] C++: show location of unclosed extern "C" specifications (v2)

2017-10-12 Thread David Malcolm
On Wed, 2017-10-11 at 15:51 -0400, Jason Merrill wrote:
> On Tue, Sep 26, 2017 at 3:27 PM, David Malcolm 
> wrote:
> > * cp-tree.h (struct saved_scope): Add "location" field.
> 
> saved_scope seems like the wrong place for this; it's only
> interesting
> at parse time, so having it saved during template instantiation
> doesn't seem useful.  I'd prefer to put it in cp_parser and have
> maybe_show_extern_c_location look in the_parser.
> 
> Jason

Here's an updated version that moves it there.

Other changes in v2:
- handle nested linkage specifications
- put the note on the string-literal, rather than the extern:
note: 'extern "C"' linkage started here
 extern "C" {
^~~

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
OK for trunk?

Thanks
Dave


Blurb from v1:

If the user fails to close an extern "C" linkage specifier, and then
uses templates, they will run into "template with C linkage" errors.

From personal experience, it can be hard to tell where the
extern "C" began.  As of r251026 there will be a message highlighting
the unclosed '{', but this may be hard to spot at the very end of
the errors.

This patch adds a note to the various diagnostics that complain
about C linkage, showing the user where the extern "C" specification
began.

gcc/cp/ChangeLog:
* cp-tree.h (maybe_show_extern_c_location): New decl.
* decl.c (grokfndecl): When complaining about literal operators
with C linkage, issue a note giving the location of the
extern "C".
* parser.c (cp_parser_new): Initialize new field
"innermost_linkage_specification_location".
(cp_parser_linkage_specification): Store the location
of the string-literal token within the cp_parser.
(cp_parser_explicit_specialization): When complaining about
template specializations with C linkage, issue a note giving the
location of the extern "C".
(cp_parser_explicit_template_declaration): Likewise for templates.
(maybe_show_extern_c_location): New function.
* parser.h (struct cp_parser): New field
"innermost_linkage_specification_location".

gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/udlit-extern-c.C: New test case.
* g++.dg/diagnostic/unclosed-extern-c.C: Add example of a template
erroneously covered by an unclosed extern "C".
* g++.dg/template/extern-c.C: New test case.
---
 gcc/cp/cp-tree.h   |  1 +
 gcc/cp/decl.c  |  1 +
 gcc/cp/parser.c| 29 ++
 gcc/cp/parser.h|  4 ++
 gcc/testsuite/g++.dg/cpp0x/udlit-extern-c.C|  7 +++
 .../g++.dg/diagnostic/unclosed-extern-c.C  | 11 +++-
 gcc/testsuite/g++.dg/template/extern-c.C   | 66 ++
 7 files changed, 118 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/udlit-extern-c.C
 create mode 100644 gcc/testsuite/g++.dg/template/extern-c.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 82ebc28..df3d198 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6362,6 +6362,7 @@ extern bool parsing_nsdmi (void);
 extern bool parsing_default_capturing_generic_lambda_in_template (void);
 extern void inject_this_parameter (tree, cp_cv_quals);
 extern location_t defarg_location (tree);
+extern void maybe_show_extern_c_location (void);
 
 /* in pt.c */
 extern bool check_template_shadow  (tree);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 99f0af2..51a5305 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -8724,6 +8724,7 @@ grokfndecl (tree ctype,
   if (DECL_LANGUAGE (decl) == lang_c)
{
  error ("literal operator with C linkage");
+ maybe_show_extern_c_location ();
  return NULL_TREE;
}
 
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 28bc8e4..a556a21 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -3828,6 +3828,9 @@ cp_parser_new (void)
   /* Allow constrained-type-specifiers. */
   parser->prevent_constrained_type_specifiers = 0;
 
+  /* We haven't yet seen an 'extern "C"'.  */
+  parser->innermost_linkage_specification_location = UNKNOWN_LOCATION;
+
   return parser;
 }
 
@@ -13740,6 +13743,7 @@ cp_parser_linkage_specification (cp_parser* parser)
   cp_parser_require_keyword (parser, RID_EXTERN, RT_EXTERN);
 
   /* Look for the string-literal.  */
+  cp_token *string_token = cp_lexer_peek_token (parser->lexer);
   linkage = cp_parser_string_literal (parser, false, false);
 
   /* Transform the literal into an identifier.  If the literal is a
@@ -13758,6 +13762,13 @@ cp_parser_linkage_specification (cp_parser* parser)
   /* We're now using the new linkage.  */
   push_lang_context (linkage);
 
+  /* Preserve the location of the string-literal for the innermost linkage
+ specification, tracking the locations of nested specifications

Re: Prevent invalid register mode changes in combine

2017-10-12 Thread Jeff Law
On 09/18/2017 05:38 AM, Richard Sandiford wrote:
> This patch stops combine from changing the mode of an existing register
> in-place if doing so would change the size of the underlying register
> allocation size, as given by REGMODE_NATURAL_SIZE.  Without this,
> many tests fail in adjust_reg_mode after SVE is added.  One example
> is gcc.c-torture/compile/20090401-1.c.
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> Also tested by comparing the testsuite assembly output on at least one
> target per CPU directory.  OK to install?
> 
> Richard
> 
> 
> 2017-09-18  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * combine.c (can_change_dest_mode): Reject changes in
>   REGMODE_NATURAL_SIZE.
OK.
jeff


Re: Clobbers and Scratch Registers

2017-10-12 Thread Jeff Law
On 08/21/2017 10:11 PM, Alan Modra wrote:
> On Mon, Aug 21, 2017 at 06:33:09PM +0100, Richard Sandiford wrote:
>> I think it's worth emphasising that tying operands doesn't change
>> whether an output needs an earlyclobber or not.  E.g. for:
> 
> Thanks for noticing this.  It turns out that my OpenBLAS example
> actually ought to have an early-clobber on one of the tied outputs, so
> you've also alerted me to another bug in the power8 code.  (Well, only
> if the dgemv kernel was called directly from user code with a 16*N A
> matrix, or I suppose if LTO was used.)  So I now have a real-world
> example of the situation where you need an early-clobber on tied
> outputs, and also where an early-clobber is undesirable.
> 
> Revised and expanded.
> 
>   * doc/extend.texi (Extended Asm ): Rename to
>   "Clobbers and Scratch Registers".  Add paragraph on
>   alternative to clobbers for scratch registers and OpenBLAS
>   example.
Also OK.  I think you had a minor revision on this which is OK as well.
I never would have spotted the additional earlyclobber requirement.

jeff


Re: [PATCH] Asm memory constraints

2017-10-12 Thread Jeff Law
On 08/20/2017 06:59 PM, Alan Modra wrote:
> On Sun, Aug 20, 2017 at 08:00:53AM -0500, Segher Boessenkool wrote:
>> Hi Alan,
>>
>> On Sat, Aug 19, 2017 at 12:19:35AM +0930, Alan Modra wrote:
>>> +Flushing registers to memory has performance implications and may be
>>> +an issue for time-sensitive code.  You can provide better information
>>> +to GCC to avoid this, as shown in the following examples.  At a
>>> +minimum, aliasing rules allow GCC to know what memory @emph{doesn't}
>>> +need to be flushed.  Also, if GCC can prove that all of the outputs of
>>> +a non-volatile @code{asm} statement are unused, then the @code{asm}
>>> +may be deleted.  Removal of otherwise dead @code{asm} statements will
>>> +not happen if they clobber @code{"memory"}.
>>
>> void f(int x) { int z; asm("hcf %0,%1" : "=r"(z) : "r"(x) : "memory"); }
>> void g(int x) { int z; asm("hcf %0,%1" : "=r"(z) : "r"(x)); }
>>
>> Both f and g are completely removed by the first jump pass immediately
>> after expand (via delete_trivially_dead_insns).
>>
>> Do you have a testcase for the behaviour you saw?
> 
> Oh my.  I was sure that was how "memory" worked!  I see though that
> every gcc I have lying around, going all the way back to gcc-2.95,
> deletes the asm in your testcase.  I definitely don't want to put
> something in the docs that is plain wrong, or just my idea of how
> things ought to work, so the last two sentences quoted above need to
> go.  Thanks for the correction.
> 
> Fixed in this revised patch.  The only controversial aspect now should
> be whether those array casts ought to be officially blessed.  I've
> checked that "=m" (*(T (*)[]) ptr), "=m" (*(T (*)[n]) ptr), and
> "=m" (*(T (*)[10]) ptr), all generate reasonable MEM_ATTRS handled
> apparently properly by alias.c and other code.
> 
> For example, at -O3 the following shows gcc moving the read of "val"
> before the asm, while an asm using a "memory" clobber forces the read
> to occur after the asm.
> 
> static int
> f (double *x)
> {
>   int res;
>   asm ("#%0 %1 %2" : "=r" (res) : "r" (x), "m" (*(double (*)[]) x));
>   return res;
> }
> 
> int val = 123;
> double foo[10];
> 
> int
> main ()
> {
>   int b = f (foo);
>   __builtin_printf ("%d %d\n", val, b);
>   return 0;
> }
> 
> 
> I'm also encouraged by comments like the following by rth in 2004
> (gcc/c/c-typeck.c), which say that using non-kosher lvalues in memory
> output constraints must continue to be supported.
> 
>   /* ??? Really, this should not be here.  Users should be using a
>proper lvalue, dammit.  But there's a long history of using casts
>in the output operands.  In cases like longlong.h, this becomes a
>primitive form of typechecking -- if the cast can be removed, then
>the output operand had a type of the proper width; otherwise we'll
>get an error.  Gross, but ...  */
>   STRIP_NOPS (output);
> 
> 
>   * doc/extend.texi (Clobbers): Correct vax example.  Delete old
>   example of a memory input for a string of known length.  Move
>   commentary out of table.  Add a number of new examples
>   covering array memory inputs.
> testsuite/
>   * gcc.target/i386/asm-mem.c: New test.
OK.  Sorry about the long wait.

jeff


[patch, fortran] New take on PR 82373

2017-10-12 Thread Thomas Koenig

Hello world,

after some thought, I think the PR can be fixed by something
far less invasive than my previous patch.

The new version of the patch simply issues an error for a
non-printable character (which should never be legal).
Anything else should be caught by other error reporting
routines.

OK for trunk?

Regards

Thomas

2017-10-12  Thomas Koenig  

PR fortran/82372
* fortran/scanner.c (last_error_char):  New global variable.
(gfc_scanner_init_1): Set last_error_char to NULL.
(gfc_gobble_whitespace): If a character is not printable or
not a newline, issue an error.

2017-10-12  Thomas Koenig  

PR fortran/82372
* gfortran.dg/illegal_char.f90: New test.
Index: scanner.c
===
--- scanner.c	(Revision 253530)
+++ scanner.c	(Arbeitskopie)
@@ -80,6 +80,7 @@ static struct gfc_file_change
 size_t file_changes_cur, file_changes_count;
 size_t file_changes_allocated;
 
+static gfc_char_t *last_error_char;
 
 /* Functions dealing with our wide characters (gfc_char_t) and
sequences of such characters.  */
@@ -269,6 +270,7 @@ gfc_scanner_init_1 (void)
   continue_line = 0;
 
   end_flag = 0;
+  last_error_char = NULL;
 }
 
 
@@ -1700,6 +1702,14 @@ gfc_gobble_whitespace (void)
 }
   while (gfc_is_whitespace (c));
 
+  if (!ISPRINT(c) && c != '\n' && last_error_char != gfc_current_locus.nextc)
+{
+  char buf[20];
+  last_error_char = gfc_current_locus.nextc;
+  snprintf (buf, 20, "%2.2X", c);
+  gfc_error_now ("Invalid character 0x%s at %C", buf);
+}
+
   gfc_current_locus = old_loc;
 }
 
#include "illegal_char.f90"


Re: Make more use of subreg_size_lowpart_offset

2017-10-12 Thread Jeff Law
On 08/23/2017 04:52 AM, Richard Sandiford wrote:
> This patch uses subreg_size_lowpart_offset in places that open-coded
> the calculation.  The reload use (and the LRA one that was based on it)
> seemed to ignore the BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN case; it's not
> obvious whether that was deliberate or an oversight.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu, and by making sure
> that there were no differences in testsuite assembly output for one
> target per CPU.  OK to install?
> 
> Richard
> 
> 
> 2017-08-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * lra-spills.c (assign_mem_slot): Use subreg_size_lowpart_offset.
>   * regcprop.c (maybe_mode_change): Likewise.
>   * reload1.c (alter_reg): Likewise.
I'm pretty sure the lra/reload case is just an oversight.  It's just
trying to get an adjustment so that it can make a MEM that covers the slot.

It's not absolutely clear to me that the regcprop bits are just an
open-coded subreg_size_lowpart_offset.  But I'm willing to trust you on
that one.

jeff


Minor tree-ssa-dse.c bugfix

2017-10-12 Thread Jeff Law


While working with Martin L's bugfix/cleanup I came across a minor bug
in the DSE code.


Specifically it mis-handles references with negative offsets.  Example
code can be found in gcc.dg/pr48335-4.c.  Thankfully these are
relatively uncommon and we can just prune them from consideration
without having major missed-optimization concerns.

Bootstrapped & regression tested with and without Martin L's bugfixes.
Installing on the trunk.

Jeff
commit 261fc575195492afbedce5ebd872a2a25fe6efb1
Author: law 
Date:   Thu Oct 12 18:09:11 2017 +

* tree-ssa-dse.c (valid_ao_ref_for_dse): Reject ao_refs with
negative offsets.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@253691 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index dc17b705025..d5ee088e77f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2017-10-12  Jeff Law  
+
+   * tree-ssa-dse.c (valid_ao_ref_for_dse): Reject ao_refs with
+   negative offsets.
+
 2017-10-12  Martin Sebor  
 
PR other/82301
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 6f58fffc693..87e2fce9ac5 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -131,6 +131,7 @@ valid_ao_ref_for_dse (ao_ref *ref)
  && ref->max_size != -1
  && ref->size != 0
  && ref->max_size == ref->size
+ && ref->offset >= 0
  && (ref->offset % BITS_PER_UNIT) == 0
  && (ref->size % BITS_PER_UNIT) == 0
  && (ref->size != -1));


Re: [PATCH 1/2] C++: avoid partial duplicate implementation of cp_parser_error

2017-10-12 Thread David Malcolm
On Wed, 2017-10-11 at 17:18 -0400, Jason Merrill wrote:
> On Tue, Sep 26, 2017 at 9:56 AM, David Malcolm 
> wrote:
> > In r251026 (aka 3fe34694f0990d1d649711ede0326497f8a849dc,
> > "C/C++: show pertinent open token when missing a close token")
> > I copied part of cp_parser_error into cp_parser_required_error,
> > leading to duplication of code.
> > 
> > This patch eliminates this duplication by merging the two copies of
> > the
> > code into a new cp_parser_error_1 subroutine.
> > 
> > Doing so removes an indentation level, making the patch appear to
> > have
> > more churn than it really does.
> 
> FWIW, you could also attach a patch generated with -w to ignore
> whitespace changes.
> 
> The patch is OK.
> 
> Jason

Thanks; committed to trunk as r253686.

For reference, here's a version of the patch generated with -w.

gcc/cp/ChangeLog:
* parser.c (get_matching_symbol): Move to before...
(cp_parser_error): Split out into...
(cp_parser_error_1): ...this new function, merging in content
from...
(cp_parser_required_error): ...here.  Eliminate partial duplicate
of body of cp_parser_error in favor of a call to the new
cp_parser_error_1 helper function.

gcc/testsuite/ChangeLog:
* g++.dg/parse/pragma2.C: Update to reflect reinstatement of the
"#pragma is not allowed here" error.
---
 gcc/cp/parser.c  | 119 ---
 gcc/testsuite/g++.dg/parse/pragma2.C |   4 +-
 2 files changed, 72 insertions(+), 51 deletions(-)

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index a556a21..4e08032 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -2769,16 +2769,43 @@ cp_lexer_peek_conflict_marker (cp_lexer *lexer, enum 
cpp_ttype tok1_kind,
   return true;
 }
 
-/* If not parsing tentatively, issue a diagnostic of the form
+/* Get a description of the matching symbol to TOKEN_DESC e.g. "(" for
+   RT_CLOSE_PAREN.  */
+
+static const char *
+get_matching_symbol (required_token token_desc)
+{
+  switch (token_desc)
+{
+default:
+  gcc_unreachable ();
+  return "";
+case RT_CLOSE_BRACE:
+  return "{";
+case RT_CLOSE_PAREN:
+  return "(";
+}
+}
+
+/* Subroutine of cp_parser_error and cp_parser_required_error.
+
+   Issue a diagnostic of the form
   FILE:LINE: MESSAGE before TOKEN
where TOKEN is the next token in the input stream.  MESSAGE
(specified by the caller) is usually of the form "expected
-   OTHER-TOKEN".  */
+   OTHER-TOKEN".
+
+   This bypasses the check for tentative passing, and potentially
+   adds material needed by cp_parser_required_error.
+
+   If MISSING_TOKEN_DESC is not RT_NONE, and MATCHING_LOCATION is not
+   UNKNOWN_LOCATION, then we have an unmatched symbol at
+   MATCHING_LOCATION; highlight this secondary location.  */
 
 static void
-cp_parser_error (cp_parser* parser, const char* gmsgid)
-{
-  if (!cp_parser_simulate_error (parser))
+cp_parser_error_1 (cp_parser* parser, const char* gmsgid,
+  required_token missing_token_desc,
+  location_t matching_location)
 {
   cp_token *token = cp_lexer_peek_token (parser->lexer);
   /* This diagnostic makes more sense if it is tagged to the line
@@ -2806,16 +2833,52 @@ cp_parser_error (cp_parser* parser, const char* gmsgid)
}
 }
 
-  rich_location richloc (line_table, input_location);
+  gcc_rich_location richloc (input_location);
+
+  bool added_matching_location = false;
+
+  if (missing_token_desc != RT_NONE)
+{
+  /* If matching_location != UNKNOWN_LOCATION, highlight it.
+Attempt to consolidate diagnostics by printing it as a
+   secondary range within the main diagnostic.  */
+  if (matching_location != UNKNOWN_LOCATION)
+   added_matching_location
+ = richloc.add_location_if_nearby (matching_location);
+}
+
+  /* Actually emit the error.  */
   c_parse_error (gmsgid,
 /* Because c_parser_error does not understand
CPP_KEYWORD, keywords are treated like
identifiers.  */
 (token->type == CPP_KEYWORD ? CPP_NAME : token->type),
 token->u.value, token->flags, );
+
+  if (missing_token_desc != RT_NONE)
+{
+  /* If we weren't able to consolidate matching_location, then
+print it as a secondary diagnostic.  */
+  if (matching_location != UNKNOWN_LOCATION
+ && !added_matching_location)
+   inform (matching_location, "to match this %qs",
+   get_matching_symbol (missing_token_desc));
 }
 }
 
+/* If not parsing tentatively, issue a diagnostic of the form
+  FILE:LINE: MESSAGE before TOKEN
+   where TOKEN is the next token in the input stream.  MESSAGE
+   (specified by the caller) is usually of the form "expected
+   OTHER-TOKEN".  */
+
+static void
+cp_parser_error (cp_parser* parser, const char* gmsgid)
+{
+  if (!cp_parser_simulate_error (parser))
+  

Re: patch to fix PR82353

2017-10-12 Thread Jakub Jelinek
On Thu, Oct 12, 2017 at 01:05:21PM -0400, Vladimir Makarov wrote:
> 
> 
> On 10/12/2017 12:49 PM, Jakub Jelinek wrote:
> > Hi!
> > 
> > On Wed, Oct 11, 2017 at 06:41:05PM -0400, Vladimir Makarov wrote:
> > > > Tested on x86_64-linux -m32/-m64, and verified with cc1plus before your
> > > > change, ok for trunk?
> > BTW, I think it is quite fragile to scan for the reload messages, so I've
> > cooked up a runtime test that fails before your patch and succeeds with your
> > patch.  Tested on x86_64-linux with -m32/-m64 (both with your patch reverted
> > and without), ok for trunk?
> > 
> > 
> OK.

Thanks.

> FYI, I am going to revert the LRA part of the patch because it results in a
> bootstrap failure with go or ada.  I guess the new version of the patch will
> be ready tomorrow or on Monday.

Yeah, I've just noticed that too (my bootstrap failed in Ada).  I'll defer
committing the patch after you commit your new fix then.

Jakub


Re: patch to fix PR82353

2017-10-12 Thread Vladimir Makarov



On 10/12/2017 12:49 PM, Jakub Jelinek wrote:

Hi!

On Wed, Oct 11, 2017 at 06:41:05PM -0400, Vladimir Makarov wrote:

Tested on x86_64-linux -m32/-m64, and verified with cc1plus before your
change, ok for trunk?

BTW, I think it is quite fragile to scan for the reload messages, so I've
cooked up a runtime test that fails before your patch and succeeds with your
patch.  Tested on x86_64-linux with -m32/-m64 (both with your patch reverted
and without), ok for trunk?



OK.

FYI, I am going to revert the LRA part of the patch because it results in a
bootstrap failure with go or ada.  I guess the new version of the patch
will be ready tomorrow or on Monday.




Re: patch to fix PR82353

2017-10-12 Thread Jakub Jelinek
Hi!

On Wed, Oct 11, 2017 at 06:41:05PM -0400, Vladimir Makarov wrote:
> > Tested on x86_64-linux -m32/-m64, and verified with cc1plus before your
> > change, ok for trunk?

BTW, I think it is quite fragile to scan for the reload messages, so I've
cooked up a runtime test that fails before your patch and succeeds with your
patch.  Tested on x86_64-linux with -m32/-m64 (both with your patch reverted
and without), ok for trunk?

2017-10-12  Jakub Jelinek  

PR sanitizer/82353
* g++.dg/ubsan/pr82353-2.C: New test.
* g++.dg/ubsan/pr82353-2-aux.cc: New file.
* g++.dg/ubsan/pr82353-2.h: New file.

--- gcc/testsuite/g++.dg/ubsan/pr82353-2.C.jj   2017-10-12 18:39:43.712341186 +0200
+++ gcc/testsuite/g++.dg/ubsan/pr82353-2.C  2017-10-12 18:39:23.0 +0200
@@ -0,0 +1,20 @@
+// PR sanitizer/82353
+// { dg-do run }
// { dg-options "-fsanitize=undefined -fno-sanitize-recover=undefined -std=c++11 -O2 -w" }
+// { dg-additional-sources "pr82353-2-aux.cc" }
+
+#include "pr82353-2.h"
+
+unsigned long f, g;
+bool h, k, j, i;
+unsigned char l, m;
+short n;
+unsigned o;
+F p;
+
+int
+main ()
+{
+  foo ();
+  bar ();
+}
--- gcc/testsuite/g++.dg/ubsan/pr82353-2-aux.cc.jj  2017-10-12 18:39:39.375393412 +0200
+++ gcc/testsuite/g++.dg/ubsan/pr82353-2-aux.cc 2017-10-12 18:38:47.0 +0200
@@ -0,0 +1,32 @@
+// PR sanitizer/82353
+
+#include "pr82353-2.h"
+
+B a;
+E b;
+B C::c0;
+unsigned D::d0;
+
+void
+foo ()
+{
+  a.b1 = p.f2.e2.b1 = 5;
+}
+
+void
+bar ()
+{
+  int c = p.f2.e4.d1.a0 - -~p.f4 * 89;
+  q.c0.b0 = i > g * a.b0 * h - k % a.b1;
+  if ((~(m * j) && -~p.f4 * 90284000534361) % ~m * j)
+b.e2.b0 << l << f;
+  o = -~p.f4 * 89;
+  int d = p.f4;
+  if (b.e2.b0)
+b.e2.b1 = c;
+  bool e = ~-~p.f4;
+  a.b1 % e;
+  if (k / p.f2.e2.b1)
+b.e4.d0 = g * a.b0 * h;
+  n = j;
+}
--- gcc/testsuite/g++.dg/ubsan/pr82353-2.h.jj   2017-10-12 18:39:46.671305554 +0200
+++ gcc/testsuite/g++.dg/ubsan/pr82353-2.h  2017-10-12 18:32:04.0 +0200
@@ -0,0 +1,31 @@
+extern unsigned long f, g;
+extern bool h, i, j, k;
+extern unsigned char l, m;
+extern short n;
+extern unsigned o;
+struct B {
+  short b0 : 27;
+  long b1 : 10;
+};
+struct A {
+  int a0 : 5;
+};
+struct C {
+  static B c0;
+};
+struct D {
+  static unsigned d0;
+  A d1;
+};
+struct E {
+  B e2;
+  D e4;
+};
+struct F {
+  E f2;
+  short f4;
+};
+extern F p;
+extern C q;
+void foo ();
+void bar ();


Jakub


Zen tuning part 7: Fix ix86_adjust_cost

2017-10-12 Thread Jan Hubicka
Hi,
this patch fixes ix86_adjust_cost for Zen support.  In particular, the original
code was accounting for memory latencies incorrectly (3 for integer, 2 for the
FP unit) while they are 4 for integer and 7 for FP on this CPU.

Using lower latencies makes the scheduler overly pessimistic about the CPU's
ability to execute sequences involving loads effectively.

I have decided to split the code into a new switch case, even though it is
currently similar to the Athlon/Bulldozer tuning.  The reason is that some
extra special cases will appear here, and Zen is probably a good place to cut
away from sharing the implementation with older AMD designs.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

* x86-tune-sched.c (ix86_adjust_cost): Fix Zen support.
Index: config/i386/x86-tune-sched.c
===
--- config/i386/x86-tune-sched.c(revision 253651)
+++ config/i386/x86-tune-sched.c(working copy)
@@ -352,7 +352,6 @@ ix86_adjust_cost (rtx_insn *insn, int de
 case PROCESSOR_BDVER2:
 case PROCESSOR_BDVER3:
 case PROCESSOR_BDVER4:
-case PROCESSOR_ZNVER1:
 case PROCESSOR_BTVER1:
 case PROCESSOR_BTVER2:
 case PROCESSOR_GENERIC:
@@ -387,6 +386,35 @@ ix86_adjust_cost (rtx_insn *insn, int de
 
  if (cost >= loadcost)
cost -= loadcost;
+ else
+   cost = 0;
+   }
+  break;
+
+case PROCESSOR_ZNVER1:
+  /* Stack engine allows to execute push instructions in parall.  */
+  if ((insn_type == TYPE_PUSH || insn_type == TYPE_POP)
+ && (dep_insn_type == TYPE_PUSH || dep_insn_type == TYPE_POP))
+   return 0;
+
+  memory = get_attr_memory (insn);
+
+  /* Show ability of reorder buffer to hide latency of load by executing
+in parallel with previous instruction in case
+previous instruction is not needed to compute the address.  */
+  if ((memory == MEMORY_LOAD || memory == MEMORY_BOTH)
+ && !ix86_agi_dependent (dep_insn, insn))
+   {
+ enum attr_unit unit = get_attr_unit (insn);
+ int loadcost;
+
+ if (unit == UNIT_INTEGER || unit == UNIT_UNKNOWN)
+   loadcost = 4;
+ else
+   loadcost = 7;
+
+ if (cost >= loadcost)
+   cost -= loadcost;
  else
cost = 0;
}


RE: [PATCH][GCC][AArch64] Dot Product SIMD patterns [Patch (5/8)]

2017-10-12 Thread Tamar Christina


> -Original Message-
> From: Richard Earnshaw (lists) [mailto:richard.earns...@arm.com]
> Sent: 12 October 2017 13:58
> To: Tamar Christina; James Greenhalgh
> Cc: gcc-patches@gcc.gnu.org; nd; Marcus Shawcroft
> Subject: Re: [PATCH][GCC][AArch64] Dot Product SIMD patterns [Patch
> (5/8)]
> 
> On 06/10/17 13:45, Tamar Christina wrote:
> > Hi All,
> >
> > This is a respin with the feedback suggested.
> >
> > Regtested on arm-none-eabi, armeb-none-eabi, aarch64-none-elf and
> > aarch64_be-none-elf with no issues found.
> >
> > Ok for trunk?
> >
> > gcc/
> > 2017-10-06  Tamar Christina  
> >
> > * config/aarch64/aarch64-builtins.c
> > (aarch64_types_quadopu_lane_qualifiers): New.
> > (TYPES_QUADOPU_LANE): New.
> > * config/aarch64/aarch64-simd.md (aarch64_dot): New.
> > (dot_prod, aarch64_dot_lane): New.
> > (aarch64_dot_laneq): New.
> > * config/aarch64/aarch64-simd-builtins.def (sdot, udot): New.
> > (sdot_lane, udot_lane, sdot_laneq, udot_laneq): New.
> > * config/aarch64/iterators.md (sur): Add UNSPEC_SDOT,
> UNSPEC_UDOT.
> > (Vdottype, DOTPROD): New.
> > (sur): Add SDOT and UDOT.
> 
> OK if this passes a native bootstrap.

Bootstrapped on aarch64-none-linux-gnu with no issues.

Thanks,
Tamar

> 
> R.
> > 
> > From: Tamar Christina
> > Sent: Tuesday, September 5, 2017 7:42:40 PM
> > To: James Greenhalgh
> > Cc: gcc-patches@gcc.gnu.org; nd; Richard Earnshaw; Marcus Shawcroft
> > Subject: Re: [PATCH][GCC][AArch64] Dot Product SIMD patterns [Patch
> > (5/8)]
> >
> >>
> >> 
> >> From: James Greenhalgh 
> >> Sent: Monday, September 4, 2017 12:01 PM
> >> To: Tamar Christina
> >> Cc: gcc-patches@gcc.gnu.org; nd; Richard Earnshaw; Marcus Shawcroft
> >> Subject: Re: [PATCH][GCC][AArch64] Dot Product SIMD patterns [Patch
> >> (5/8)]
> >>
> >> On Fri, Sep 01, 2017 at 02:22:17PM +0100, Tamar Christina wrote:
> >>> Hi All,
> >>>
> >>> This patch adds the instructions for Dot Product to AArch64 along
> >>> with the intrinsics and vectorizer pattern.
> >>>
> >>> Armv8.2-a dot product supports 8-bit element values both signed and
> >>> unsigned.
> >>>
> >>> Dot product is available from Arm8.2-a and onwards.
> >>>
> >>> Regtested and bootstrapped on aarch64-none-elf and no issues.
> >>>
> >>> Ok for trunk?
> >>>
> >>> gcc/
> >>> 2017-09-01  Tamar Christina  
> >>>
> >>>   * config/aarch64/aarch64-builtins.c
> >>>   (aarch64_types_quadopu_lane_qualifiers): New.
> >>>   (TYPES_QUADOPU_LANE): New.
> >>>   * config/aarch64/aarch64-simd.md (aarch64_dot):
> New.
> >>>   (dot_prod, aarch64_dot_lane):
> New.
> >>>   (aarch64_dot_laneq): New.
> >>>   * config/aarch64/aarch64-simd-builtins.def (sdot, udot): New.
> >>>   (sdot_lane, udot_lane, sdot_laneq, udot_laneq): New.
> >>>   * config/aarch64/iterators.md (UNSPEC_SDOT, UNSPEC_UDOT):
> New.
> >>>   (DOT_MODE, dot_mode, Vdottype, DOTPROD): New.
> >>>   (sur): Add SDOT and UDOT.
> >>>
> >>> --
> >>
> >>> diff --git a/gcc/config/aarch64/aarch64-simd.md
> >>> b/gcc/config/aarch64/aarch64-simd.md
> >>> index
> >>>
> f3e084f8778d70c82823b92fa80ff96021ad26db..21d46c84ab317c2d62afdf8c48
> >>> 117886aaf483b0 100644
> >>> --- a/gcc/config/aarch64/aarch64-simd.md
> >>> +++ b/gcc/config/aarch64/aarch64-simd.md
> >>> @@ -386,6 +386,87 @@
> >>>  }
> >>>  )
> >>>
> >>> +;; These instructions map to the __builtins for the Dot Product
> operations.
> >>> +(define_insn "aarch64_dot"
> >>> +  [(set (match_operand:VS 0 "register_operand" "=w")
> >>> + (unspec:VS [(match_operand:VS 1 "register_operand" "0")
> >>> + (match_operand: 2 "register_operand" "w")
> >>> + (match_operand: 3 "register_operand" "w")]
> >>> + DOTPROD))]
> >>> +  "TARGET_DOTPROD"
> >>> +  "dot\\t%0., %2., %3."
> >>> +  [(set_attr "type" "neon_dot")]
> >>
> >> Would there be a small benefit in modelling this as:
> >>
> >>   [(set (match_operand:VS 0 "register_operand" "=w")
> >> (add:VS ((match_operand:VS 1 "register_operand" "0")
> >>  (unsepc:VS [(match_operand: 2
> "register_operand" "w")
> >> (match_operand: 3 "register_operand" "w")]
> >> DOTPROD)))]
> >>
> >
> > Maybe, I can't think of anything at the moment, but it certainly won't hurt.
> >
> >>
> >>> +)
> >>> +
> >>> +;; These expands map to the Dot Product optab the vectorizer checks
> for.
> >>> +;; The auto-vectorizer expects a dot product builtin that also does
> >>> +an ;; accumulation into the provided register.
> >>> +;; Given the following pattern
> >>> +;;
> >>> +;; for (i=0; i >>> +;; c = a[i] * b[i];
> >>> +;; r += c;
> >>> +;; }
> >>> +;; return result;
> >>> +;;
> >>> +;; This can be auto-vectorized to
> >>> +;; r  = a[0]*b[0] + a[1]*b[1] + 

Re: [PATCH] Add a warning for invalid function casts

2017-10-12 Thread Joseph Myers
On Thu, 12 Oct 2017, Martin Sebor wrote:

> Yes.  In light of this discussion I am thinking it might be
> worthwhile to bring up the issue of generic function pointers
> with WG14 for C2X.

I'm fine with the idea of having a standard solution that (unlike void (*) 
(void)) cannot be called at all without converting to another type.  I 
just maintain that void (*) (void) is the de facto idiom for this and so 
the warning should reflect this even in the future presence of such a 
standard solution (much as we e.g. handle trailing [1] arrays in structs 
as possibly being used as flexible array members in code using that as a 
C89/C++-compatible idiom rather than relying on C99 flexible array 
members).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Add a warning for invalid function casts

2017-10-12 Thread Martin Sebor

On 10/12/2017 05:52 AM, Pedro Alves wrote:

On 10/11/2017 03:57 AM, Martin Sebor wrote:



[X] This can be function that takes an argument of an incomplete
type, such as:

  struct Incomplete;
  typedef void Uncallable (struct Incomplete);

Any function can safely be converted to Uncallable* without
the risk of being called with the wrong arguments.  The only
way to use an Uncallable* is to explicitly convert it to
a pointer to a function that can be called.

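A minimal sketch of the idiom described above (editorial illustration
only, not from the patch; the helper names are made up) -- the pointer
can be stored as an Uncallable* without risk of a mismatched call, and
must be converted back before any call is possible:

struct Incomplete;
typedef void Uncallable (struct Incomplete);

static int add1 (int i) { return i + 1; }

int
main (void)
{
  Uncallable *p = (Uncallable *) add1;    /* store without being callable */
  int (*q) (int) = (int (*) (int)) p;     /* convert back to a callable type */
  return q (41) == 42 ? 0 : 1;
}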

OOC, did you consider trying to get something like that added
to C proper, to some standard header?  I don't imagine that it'd
be much objectionable, and having a standard type may help
tooling give better diagnostics and suggestions.


Yes.  In light of this discussion I am thinking it might be
worthwhile to bring up the issue of generic function pointers
with WG14 for C2X.

Martin



Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling

2017-10-12 Thread Jeff Law
On 10/12/2017 02:12 AM, Tsimbalist, Igor V wrote:
>> Seems reasonable.  As a result something like
>> check_missing_nocf_check_attribute is going to just go away along with the
>> code in *-typeck.c which called it, right?  If so that seems like a nice 
>> cleanup.
> Yes, you are right.
> 
> Updated patch is attached.
> 
> 
> High-level design.
> --
> 
> A proposal is to introduce a target-independent flag
> -fcf-protection=[none|branch|return|full] whose semantics are to
> instrument the code to check the validity or integrity of control-flow
> transfers using jump and call instructions. The main goal is to detect
> and block possible malware execution that transfers execution to an
> unknown target address. The implementation could be either software or
> target based. Any target platform can provide its own implementation
> of the instrumentation under this option.
> 
> When the -fcf-protection flag is set, each implementation has
> to check whether support exists for the target platform and report an error
> if no support is found.
> 
> The compiler should instrument any control-flow transfer points in a
> program (ex. call/jmp/ret) as well as any landing pads, which are
> targets of control-flow transfers.
> 
> A new 'nocf_check' attribute is introduced to provide hand tuning
> support. The attribute directs the compiler to exclude a call to a
> function and the function's landing pad from instrumentation. The
> attribute can be used for function and pointer to function types,
> otherwise it will be ignored. The attribute is saved in a type and
> propagated to a GIMPLE call statement and later to a call instruction.
> 
> Currently all platforms except i386 will report the error and do no
> instrumentation. i386 will provide the implementation based on a
> specification published by Intel for a new technology called
> Control-flow Enforcement Technology (CET).
> 
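A minimal usage sketch of the attribute described above (editorial
illustration only, not part of the patch; the identifiers are made up).
Calls through the annotated pointer type are not instrumented with an
indirect-branch check, while calls through an ordinary pointer are:

/* Sketch: opt a specific indirect call out of -fcf-protection checks.  */
typedef void (*checked_fn) (int);
typedef void (*unchecked_fn) (int) __attribute__ ((nocf_check));

void
dispatch (checked_fn f, unchecked_fn g)
{
  f (1);   /* instrumented indirect call */
  g (2);   /* 'notrack' call, skipped by the instrumentation */
}
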
> gcc/c-family/
>   * c-attribs.c (handle_nocf_check_attribute): New function.
>   (c_common_attribute_table): Add 'nocf_check' handling.
> 
> gcc/c/
>   * gimple-parser.c: Add second argument NULL to
>   gimple_build_call_from_tree.
> 
> gcc/
>   * attrib.c (comp_type_attributes): Check nocf_check attribute.
>   * cfgexpand.c (expand_call_stmt): Set REG_CALL_NOCF_CHECK for
>   call insn.
>   * combine.c (distribute_notes): Add REG_CALL_NOCF_CHECK handling.
>   * common.opt: Add fcf-protection flag.
>   * emit-rtl.c (try_split): Add REG_CALL_NOCF_CHECK handling.
>   * flag-types.h: Add enum cf_protection_level.
>   * gimple.c (gimple_build_call_from_tree): Add second parameter.
>   Add 'nocf_check' attribute propagation to gimple call.
>   * gimple.h (gf_mask): Add GF_CALL_NOCF_CHECK.
>   (gimple_build_call_from_tree): Update prototype.
>   (gimple_call_nocf_check_p): New function.
>   (gimple_call_set_nocf_check): Likewise.
>   * gimplify.c: Add second argument to gimple_build_call_from_tree.
>   * ipa-icf.c: Add nocf_check attribute in statement hash.
>   * recog.c (peep2_attempt): Add REG_CALL_NOCF_CHECK handling.
>   * reg-notes.def: Add REG_NOTE (CALL_NOCF_CHECK).
>   * toplev.c (process_options): Add flag_cf_protection handling.
OK.  Sorry about the long delays.

jeff
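
As a small illustration of the 'nocf_check' attribute described in the
patch above (a sketch only, with invented function names; it requires
-fcf-protection on a target that implements the instrumentation):

  /* Calls to handler and its landing pad are excluded from
     control-flow instrumentation.  */
  void handler (void) __attribute__ ((nocf_check));
  void handler (void) { }

  /* Indirect calls made through this pointer are not instrumented.  */
  void (*do_call) (void) __attribute__ ((nocf_check)) = handler;

  void
  dispatch (void)
  {
    do_call ();
  }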


Re: [PATCH][GRAPHITE] Fix PR82525

2017-10-12 Thread Sebastian Pop
On Oct 12, 2017 4:36 AM, "Richard Biener"  wrote:


The following avoids code-generation errors for modulo operations
resulting from our own constraints ending up as no-ops because
the type we code-generate in already imposes the modulo operation.

For the case in SPEC 2k6 triggering this we'd even know the
modulo constraint isn't necessary - we have

 int64_t lower, upper;
 if (lower < upper)
   uint64_t niter = (uint64_t)upper - (uint64_t)lower;

but there's no way to represent in GIMPLE that subtracting
the two signed values will yield a positive value fitting
in the corresponding unsigned type...  We'd need sth
like a MINUSU_EXPR (like the often proposed ABSU_EXPR or
the proposed POINTER_DIFF_EXPR).

This fixes the last code generation errors with SPEC CPU 2006
and -fgraphite-identity -floop-nest-optimize.

loop nest optimized: 483
loop nest not optimized, code generation error: 0
loop nest not optimized, optimized schedule is identical to original
schedule: 173
loop nest not optimized, optimization timed out: 60
loop nest not optimized, ISL signalled an error: 9
loop nest: 725

Note that we still (with and without this patch) get miscompares
in 465.tonto, 416.gamess and 403.gcc (we have those "wrong"
constraint thing leading to empty domains if you remember).

Bootstrap and regtest running on x86_64-unknown-linux-gnu, ok?

Thanks,
Richard.

2017-10-12  Richard Biener  

PR tree-optimization/82525
* graphite-isl-ast-to-gimple.c
(translate_isl_ast_to_gimple::widest_int_from_isl_expr_int): Split
out from ...
(translate_isl_ast_to_gimple::gcc_expression_from_isl_expr_int):
Here.
Fail code generation when we cannot represent the isl integer.
(binary_op_to_tree): Elide modulo operations that are no-ops
in the type we code generate.  Remove now superfluous code
generation errors.

* gcc.dg/graphite/id-30.c: New testcase.
* gfortran.dg/graphite/id-28.f90: Likewise.


Looks good.


Index: gcc/graphite-isl-ast-to-gimple.c
===
--- gcc/graphite-isl-ast-to-gimple.c(revision 253645)
+++ gcc/graphite-isl-ast-to-gimple.c(working copy)
@@ -177,6 +177,7 @@ class translate_isl_ast_to_gimple
   tree gcc_expression_from_isl_ast_expr_id (tree type,
__isl_keep isl_ast_expr
*expr_id,
ivs_params );
+  widest_int widest_int_from_isl_expr_int (__isl_keep isl_ast_expr *expr);
   tree gcc_expression_from_isl_expr_int (tree type,
 __isl_take isl_ast_expr *expr);
   tree gcc_expression_from_isl_expr_op (tree type,
@@ -265,29 +266,46 @@ gcc_expression_from_isl_ast_expr_id (tre
   return fold_convert (type, *val);
 }

-/* Converts an isl_ast_expr_int expression E to a GCC expression tree of
-   type TYPE.  */
+/* Converts an isl_ast_expr_int expression E to a widest_int.
+   Raises a code generation error when the constant doesn't fit.  */

-tree translate_isl_ast_to_gimple::
-gcc_expression_from_isl_expr_int (tree type, __isl_take isl_ast_expr *expr)
+widest_int translate_isl_ast_to_gimple::
+widest_int_from_isl_expr_int (__isl_keep isl_ast_expr *expr)
 {
   gcc_assert (isl_ast_expr_get_type (expr) == isl_ast_expr_int);
   isl_val *val = isl_ast_expr_get_val (expr);
   size_t n = isl_val_n_abs_num_chunks (val, sizeof (HOST_WIDE_INT));
   HOST_WIDE_INT *chunks = XALLOCAVEC (HOST_WIDE_INT, n);
-  tree res;
-  if (isl_val_get_abs_num_chunks (val, sizeof (HOST_WIDE_INT), chunks) ==
-1)
-res = NULL_TREE;
-  else
+  if (n > WIDE_INT_MAX_ELTS
+  || isl_val_get_abs_num_chunks (val, sizeof (HOST_WIDE_INT), chunks)
== -1)
 {
-  widest_int wi = widest_int::from_array (chunks, n, true);
-  if (isl_val_is_neg (val))
-   wi = -wi;
-  res = wide_int_to_tree (type, wi);
+  isl_val_free (val);
+  set_codegen_error ();
+  return 0;
 }
+  widest_int wi = widest_int::from_array (chunks, n, true);
+  if (isl_val_is_neg (val))
+wi = -wi;
   isl_val_free (val);
+  return wi;
+}
+
+/* Converts an isl_ast_expr_int expression E to a GCC expression tree of
+   type TYPE.  Raises a code generation error when the constant doesn't
fit.  */
+
+tree translate_isl_ast_to_gimple::
+gcc_expression_from_isl_expr_int (tree type, __isl_take isl_ast_expr *expr)
+{
+  widest_int wi = widest_int_from_isl_expr_int (expr);
   isl_ast_expr_free (expr);
-  return res;
+  if (codegen_error_p ())
+return NULL_TREE;
+  if (wi::min_precision (wi, TYPE_SIGN (type)) > TYPE_PRECISION (type))
+{
+  set_codegen_error ();
+  return NULL_TREE;
+}
+  return wide_int_to_tree (type, wi);
 }

 /* Converts a binary isl_ast_expr_op expression E to a GCC expression tree
of
@@ -296,14 +314,25 @@ gcc_expression_from_isl_expr_int (tree t
 tree translate_isl_ast_to_gimple::
 binary_op_to_tree (tree type, 

[PATCH, alpha]: Use std::swap some more.

2017-10-12 Thread Uros Bizjak
No functional changes.

2017-10-12  Uros Bizjak  

* config/alpha/alpha.c (alpha_split_conditional_move):
Use std::swap instead of manually swapping.
(alpha_stdarg_optimize_hook): Ditto.
(alpha_canonicalize_comparison): Ditto.

Bootstrapped and regression tested on alphaev68-linux-gnu.

Committed to mainline SVN.

Uros.
diff --git a/gcc/config/alpha/alpha.c b/gcc/config/alpha/alpha.c
index 41f3e3a..ece8879 100644
--- a/gcc/config/alpha/alpha.c
+++ b/gcc/config/alpha/alpha.c
@@ -2910,8 +2910,8 @@ alpha_split_conditional_move (enum rtx_code code, rtx 
dest, rtx cond,
   || (code == GE || code == GT))
 {
   code = reverse_condition (code);
-  diff = t, t = f, f = diff;
-  diff = t - f;
+  std::swap (t, f);
+  diff = -diff;
 }
 
   subtarget = target = dest;
@@ -6078,10 +6078,8 @@ alpha_stdarg_optimize_hook (struct stdarg_info *si, 
const gimple *stmt)
  else if (code2 == COMPONENT_REF
   && (code1 == MINUS_EXPR || code1 == PLUS_EXPR))
{
- gimple *tem = arg1_stmt;
+ std::swap (arg1_stmt, arg2_stmt);
  code2 = code1;
- arg1_stmt = arg2_stmt;
- arg2_stmt = tem;
}
  else
goto escapes;
@@ -9831,9 +9829,7 @@ alpha_canonicalize_comparison (int *code, rtx *op0, rtx 
*op1,
   && (*code == GE || *code == GT || *code == GEU || *code == GTU)
   && (REG_P (*op1) || *op1 == const0_rtx))
 {
-  rtx tem = *op0;
-  *op0 = *op1;
-  *op1 = tem;
+  std::swap (*op0, *op1);
   *code = (int)swap_condition ((enum rtx_code)*code);
 }
 


Re: [PATCH][GRAPHITE] Fix PR69728 in "another" way

2017-10-12 Thread Sebastian Pop
On Oct 12, 2017 9:08 AM, "Richard Biener"  wrote:


I made scheduling to fail when we end up with an empty domain but as
I forgot to actually check the return value of build_original_schedule
the fix was equivalent to just doing nothing to the schedule when
it has an empty domain.  I verified that for the testcase it DCEs
the relevant stmt and that this is a valid transform.

Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC 2k6 is also
happy with this.

Committed as obvious (since no functional change).

Richard.

2017-10-12  Richard Biener  

PR tree-optimization/69728
Revert
2017-09-19  Richard Biener  

PR tree-optimization/69728
* graphite-sese-to-poly.c (schedule_error): New global.
(add_loop_schedule): Handle empty domain by failing the
schedule.
(build_original_schedule): Handle schedule_error.

* graphite-sese-to-poly.c (add_loop_schedule): Handle empty
domain by returning an unchanged schedule.

* gcc.dg/graphite/pr69728.c: Adjust to reflect we can handle
the loop now.  Remove unrelated undefined behavior.


Looks good.


Index: gcc/graphite-sese-to-poly.c
===
--- gcc/graphite-sese-to-poly.c (revision 253645)
+++ gcc/graphite-sese-to-poly.c (working copy)
@@ -1066,8 +1051,6 @@ outer_projection_mupa (__isl_take isl_un
   return isl_multi_union_pw_aff_from_union_pw_multi_aff (data.res);
 }

-static bool schedule_error;
-
 /* Embed SCHEDULE in the constraints of the LOOP domain.  */

 static isl_schedule *
@@ -1082,11 +1065,9 @@ add_loop_schedule (__isl_take isl_schedu
 return empty < 0 ? isl_schedule_free (schedule) : schedule;

   isl_union_set *domain = isl_schedule_get_domain (schedule);
-  /* We cannot apply an empty domain to pbbs in this loop so fail.
- ??? Somehow drop pbbs in the loop instead.  */
+  /* We cannot apply an empty domain to pbbs in this loop so return
early.  */
   if (isl_union_set_is_empty (domain))
 {
-  schedule_error = true;
   isl_union_set_free (domain);
   return schedule;
 }
@@ -1216,8 +1197,6 @@ build_schedule_loop_nest (scop_p scop, i
 static bool
 build_original_schedule (scop_p scop)
 {
-  schedule_error = false;
-
   int i = 0;
   int n = scop->pbbs.length ();
   while (i < n)
@@ -1232,14 +1211,6 @@ build_original_schedule (scop_p scop)
   scop->original_schedule = add_in_sequence (scop->original_schedule,
s);
 }

-  if (schedule_error)
-{
-  if (dump_file)
-   fprintf (dump_file, "[sese-to-poly] failed to build "
-"original schedule\n");
-  return false;
-}
-
   if (dump_file)
 {
   fprintf (dump_file, "[sese-to-poly] original schedule:\n");
Index: gcc/testsuite/gcc.dg/graphite/pr69728.c
===
--- gcc/testsuite/gcc.dg/graphite/pr69728.c (revision 253645)
+++ gcc/testsuite/gcc.dg/graphite/pr69728.c (working copy)
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -floop-nest-optimize" } */
+/* { dg-options "-O3 -floop-nest-optimize -fdump-tree-graphite-details" }
*/

-int a[1];
+int a[9];
 int b, c, d, e;
 void
 fn1 ()
@@ -19,3 +19,9 @@ fn1 ()
}
 }
 }
+
+/* At the moment only ISL figures that if (d) is always true.  We've
+   run into scheduling issues before here, not being able to handle
+   empty domains.  */
+
+/* { dg-final { scan-tree-dump "loop nest optimized" "graphite" } }  */


Re: [PATCH][GRAPHITE] Lift some IV restrictions

2017-10-12 Thread Sebastian Pop
On Oct 12, 2017 9:29 AM, "Richard Biener"  wrote:


The type check seems premature (we're checking CHRECs already) and
we certainly can handle POINTER IVs just fine.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

SPEC CPU 2k6 sees ~100 more loop nest optimizations that way.

Ok?

[I'd rather have problematical testcases for those weird
restrictions]


Sounds good.
Thanks.


Thanks,
Richard.

2017-10-12  Richard Biener  

* graphite-scop-detection.c (loop_ivs_can_be_represented): Remove.
(scop_detection::harmful_loop_in_region): Remove premature
IV type restriction.
(scop_detection::graphite_can_represent_scev): We can handle
pointer IVs just fine.

Index: gcc/graphite-scop-detection.c
===
--- gcc/graphite-scop-detection.c   (revision 253676)
+++ gcc/graphite-scop-detection.c   (working copy)
@@ -254,28 +254,6 @@ dot_cfg ()
   scops.release ();
 }

-/* Can all ivs be represented by a signed integer?
-   As isl might generate negative values in its expressions, signed loop
ivs
-   are required in the backend.  */
-
-static bool
-loop_ivs_can_be_represented (loop_p loop)
-{
-  unsigned type_long_long = TYPE_PRECISION (long_long_integer_type_node);
-  for (gphi_iterator psi = gsi_start_phis (loop->header); !gsi_end_p (psi);
-   gsi_next (&psi))
-{
-  gphi *phi = psi.phi ();
-  tree res = PHI_RESULT (phi);
-  tree type = TREE_TYPE (res);
-
-  if (TYPE_UNSIGNED (type) && TYPE_PRECISION (type) >= type_long_long)
-   return false;
-}
-
-  return true;
-}
-
 /* Returns a COND_EXPR statement when BB has a single predecessor, the
edge between BB and its predecessor is not a loop exit edge, and
the last statement of the single predecessor is a COND_EXPR.  */
@@ -822,13 +800,6 @@ scop_detection::harmful_loop_in_region (
  return true;
}

-  if (! loop_ivs_can_be_represented (loop))
-   {
- DEBUG_PRINT (dp << "[scop-detection-fail] loop_" << loop->num
-  << "IV cannot be represented.\n");
- return true;
-   }
-
   /* Check if all loop nests have at least one data reference.
 ???  This check is expensive and loops premature at this point.
 If important to retain we can pre-compute this for all innermost
@@ -968,14 +939,6 @@ scop_detection::graphite_can_represent_s
   if (chrec_contains_undetermined (scev))
 return false;

-  /* We disable the handling of pointer types, because it’s currently not
- supported by Graphite with the isl AST generator. SSA_NAME nodes are
- the only nodes, which are disabled in case they are pointers to object
- types, but this can be changed.  */
-
-  if (POINTER_TYPE_P (TREE_TYPE (scev)) && TREE_CODE (scev) == SSA_NAME)
-return false;
-
   switch (TREE_CODE (scev))
 {
 case NEGATE_EXPR:


Re: [PATCH][GRAPHITE] Fix PR82451 (and PR82355 in a different way)

2017-10-12 Thread Sebastian Pop
On Oct 11, 2017 9:43 AM, "Richard Biener"  wrote:


For PR82355 I introduced a fake dimension to ISL to allow CHRECs
having an evolution in a loop that isn't fully part of the SESE
region we are processing.  That was easier than fending off those
CHRECs (without simply giving up on SESE regions with those).

But it didn't fully solve the issue as PR82451 shows where we run
into the issue that we eventually have to code-gen those
evolutions and thus in theory need a canonical IV of that containing loop.

So I decided (after Micha pressuring me a bit...) to revisit the
original issue and make SCEV analysis "properly" handle SE regions.
It turns out that it is mostly instantiate_scev lacking proper support
plus the necessary interfacing change (really just cosmetic in some sense)
from a instantiate_before basic-block to a instantiate_before edge.


Very nice.


data-ref interfaces have been similarly adjusted, here changing
the "loop nest" loop parameter to the entry edge for the SE region
and passing that down accordingly.

I've for now tried to keep other high-level loop-based interfaces the
same by simply using the loop preheader edge as entry where appropriate
(needing loop_preheader_edge to cope with the loop root tree for simplicity).

In the process I ran into issues with us too aggressively
instantiating random trees and thus I cut those down.  That part
doesn't successfully test separately (when I remove the strange
ARRAY_REF instantiation), so it's part of this patch.  I've also
run into an SSA verification fail (the id-27.f90 testcase) which
shows we _do_ need to clear the SCEV cache after introducing
the versioned CFG (and added a comment before it).

On the previously failing testcases I've verified we produce
sensible instantiations for those pesky refs residing in "no" loop
in the SCOP and that we get away with the result in terms of
optimizing.

SPEC 2k6 testing shows

loop nest optimized: 311
loop nest not optimized, code generation error: 0
loop nest not optimized, optimized schedule is identical to original
schedule: 173
loop nest not optimized, optimization timed out: 59
loop nest not optimized, ISL signalled an error: 9
loop nest: 552

for SPEC 2k6 and -floop-nest-optimize while adding -fgraphite-identity
still reveals some codegen errors:

loop nest optimized: 437
loop nest not optimized, code generation error: 25
loop nest not optimized, optimized schedule is identical to original
schedule: 169
loop nest not optimized, optimization timed out: 60
loop nest not optimized, ISL signalled an error: 9
loop nest: 700

Bootstrap and regtest in progress on x86_64-unknown-linux-gnu
(with and without -fgraphite-identity -floop-nest-optimize).

Ok?


Looks good to me.
Thanks.


Thanks,
Richard.

2017-10-11  Richard Biener  

PR tree-optimization/82451
Revert
2017-10-02  Richard Biener  

PR tree-optimization/82355
* graphite-isl-ast-to-gimple.c (build_iv_mapping): Also build
a mapping for the enclosing loop but avoid generating one for
the loop tree root.
(copy_bb_and_scalar_dependences): Remove premature codegen
error on PHIs in blocks duplicated into multiple places.
* graphite-scop-detection.c
(scop_detection::stmt_has_simple_data_refs_p): For a loop not
in the region use it as loop and nest to analyze the DR in.
(try_generate_gimple_bb): Likewise.
* graphite-sese-to-poly.c (extract_affine_chrec): Adjust.
(add_loop_constraints): For blocks in a loop not in the region
create a dimension with a single iteration.
* sese.h (gbb_loop_at_index): Remove assert.

* cfgloop.c (loop_preheader_edge): For the loop tree root
return the single successor of the entry block.
* graphite-isl-ast-to-gimple.c (graphite_regenerate_ast_isl):
Reset the SCEV hashtable and niters.
* graphite-scop-detection.c
(scop_detection::graphite_can_represent_scev): Add SCOP parameter,
assert that we only have POLYNOMIAL_CHREC that vary in loops
contained in the region.
(scop_detection::graphite_can_represent_expr): Adjust.
(scop_detection::stmt_has_simple_data_refs_p): For loops
not in the region set loop to NULL.  The nest is now the
entry edge to the region.
(try_generate_gimple_bb): Likewise.
* sese.c (scalar_evolution_in_region): Adjust for
instantiate_scev change.
* tree-data-ref.h (graphite_find_data_references_in_stmt):
Make nest parameter the edge into the region.
(create_data_ref): Likewise.
* tree-data-ref.c (dr_analyze_indices): Make nest parameter an
entry edge into a region and adjust instantiate_scev calls.
(create_data_ref): Likewise.
(graphite_find_data_references_in_stmt): Likewise.
(find_data_references_in_stmt): Pass the loop preheader 

[PATCH][GRAPHITE] Lift some IV restrictions

2017-10-12 Thread Richard Biener

The type check seems premature (we're checking CHRECs already) and
we certainly can handle POINTER IVs just fine.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

SPEC CPU 2k6 sees ~100 more loop nest optimizations that way.

Ok?

[I'd rather have problematical testcases for those weird
restrictions]

Thanks,
Richard.

2017-10-12  Richard Biener  

* graphite-scop-detection.c (loop_ivs_can_be_represented): Remove.
(scop_detection::harmful_loop_in_region): Remove premature
IV type restriction.
(scop_detection::graphite_can_represent_scev): We can handle
pointer IVs just fine.

Index: gcc/graphite-scop-detection.c
===
--- gcc/graphite-scop-detection.c   (revision 253676)
+++ gcc/graphite-scop-detection.c   (working copy)
@@ -254,28 +254,6 @@ dot_cfg ()
   scops.release ();
 }
 
-/* Can all ivs be represented by a signed integer?
-   As isl might generate negative values in its expressions, signed loop ivs
-   are required in the backend.  */
-
-static bool
-loop_ivs_can_be_represented (loop_p loop)
-{
-  unsigned type_long_long = TYPE_PRECISION (long_long_integer_type_node);
-  for (gphi_iterator psi = gsi_start_phis (loop->header); !gsi_end_p (psi);
-   gsi_next (&psi))
-{
-  gphi *phi = psi.phi ();
-  tree res = PHI_RESULT (phi);
-  tree type = TREE_TYPE (res);
-
-  if (TYPE_UNSIGNED (type) && TYPE_PRECISION (type) >= type_long_long)
-   return false;
-}
-
-  return true;
-}
-
 /* Returns a COND_EXPR statement when BB has a single predecessor, the
edge between BB and its predecessor is not a loop exit edge, and
the last statement of the single predecessor is a COND_EXPR.  */
@@ -822,13 +800,6 @@ scop_detection::harmful_loop_in_region (
  return true;
}
 
-  if (! loop_ivs_can_be_represented (loop))
-   {
- DEBUG_PRINT (dp << "[scop-detection-fail] loop_" << loop->num
-  << "IV cannot be represented.\n");
- return true;
-   }
-
   /* Check if all loop nests have at least one data reference.
 ???  This check is expensive and loops premature at this point.
 If important to retain we can pre-compute this for all innermost
@@ -968,14 +939,6 @@ scop_detection::graphite_can_represent_s
   if (chrec_contains_undetermined (scev))
 return false;
 
-  /* We disable the handling of pointer types, because it’s currently not
- supported by Graphite with the isl AST generator. SSA_NAME nodes are
- the only nodes, which are disabled in case they are pointers to object
- types, but this can be changed.  */
-
-  if (POINTER_TYPE_P (TREE_TYPE (scev)) && TREE_CODE (scev) == SSA_NAME)
-return false;
-
   switch (TREE_CODE (scev))
 {
 case NEGATE_EXPR:

Re: [PATCH GCC][6/7]Support loop nest distribution for builtin partition

2017-10-12 Thread Bin.Cheng
On Thu, Oct 12, 2017 at 2:32 PM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Thu, Oct 5, 2017 at 3:17 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>> Hi,
>> This patch rewrites classification part of builtin partition so that nested
>> builtin partitions are supported.  With this extension, below loop nest:
>> void
>> foo (void)
>> {
>>   for (unsigned i = 0; i < M; ++i)
>>     for (unsigned j = 0; j < N; ++j)
>>       arr[i][j] = 0;
>> }
>>
>> will be distributed into a single memset, rather than a loop of memset.
>> Bootstrap and test in patch set on x86_64 and AArch64, is it OK?
>
> +  tree access_size = fold_convert (sizetype, TYPE_SIZE_UNIT (TREE_TYPE 
> (ref)));
> +
>
> TYPE_SIZE_UNIT should be always sizetype.
Done.

>
> +  /* Classify the builtin kind.  */
> +  if (single_ld == NULL)
> +classify_builtin_1 (loop, partition, single_st);
> +  else
> +classify_builtin_2 (loop, rdg, partition, single_st, single_ld);
>
> maybe name those helpers classify_builtin_st and classify_builtin_ldst?
Done.  Patch updated in attachment, Will apply it later.

Thanks,
bin
2017-10-12  Bin Cheng  <bin.ch...@arm.com>

* tree-loop-distribution.c (struct builtin_info): New struct.
(struct partition): Refactor fields into struct builtin_info.
(partition_free): Free struct builtin_info.
(build_size_arg_loc, build_addr_arg_loc): Delete.
(generate_memset_builtin, generate_memcpy_builtin): Get memory range
information from struct builtin_info.
(find_single_drs): New function refactored from classify_partition.
Also moved builtin validity checks to this function.
(compute_access_range, alloc_builtin): New functions.
(classify_builtin_st, classify_builtin_ldst): New functions.
(classify_partition): Refactor code into functions find_single_drs,
classify_builtin_st and classify_builtin_ldst.
(distribute_loop): Don't do runtime alias check when distributing
loop nest.
(find_seed_stmts_for_distribution): New function.
(pass_loop_distribution::execute): Refactor code finding seed
stmts into above function.  Support distribution for the innermost
two-level loop nest.  Adjust dump information.

gcc/testsuite/ChangeLog
2017-10-12  Bin Cheng  <bin.ch...@arm.com>

* gcc.dg/tree-ssa/ldist-28.c: New test.
* gcc.dg/tree-ssa/ldist-29.c: New test.
* gcc.dg/tree-ssa/ldist-30.c: New test.
* gcc.dg/tree-ssa/ldist-31.c: New test.

>
> Ok with those changes.
>
> Thanks,
> Richard.
>
From 8271ce0851a60b38226e92558bca234774e5503e Mon Sep 17 00:00:00 2001
From: Bin Cheng <binch...@e108451-lin.cambridge.arm.com>
Date: Wed, 27 Sep 2017 13:00:59 +0100
Subject: [PATCH 6/7] loop_nest-builtin-pattern-20171012.txt

---
 gcc/testsuite/gcc.dg/tree-ssa/ldist-28.c |  16 +
 gcc/testsuite/gcc.dg/tree-ssa/ldist-29.c |  17 ++
 gcc/testsuite/gcc.dg/tree-ssa/ldist-30.c |  16 +
 gcc/testsuite/gcc.dg/tree-ssa/ldist-31.c |  19 ++
 gcc/tree-loop-distribution.c | 507 +++
 5 files changed, 377 insertions(+), 198 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ldist-28.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ldist-29.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ldist-30.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ldist-31.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-28.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-28.c
new file mode 100644
index 000..4420139
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-28.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-distribution -ftree-loop-distribute-patterns -fdump-tree-ldist-details" } */
+
+#define M (256)
+#define N (1024)
+int arr[M][N];
+
+void
+foo (void)
+{
+  for (unsigned i = 0; i < M; ++i)
+for (unsigned j = 0; j < N; ++j)
+  arr[i][j] = 0;
+}
+
+/* { dg-final { scan-tree-dump "Loop nest . distributed: split to 0 loops and 1 library" "ldist" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-29.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-29.c
new file mode 100644
index 000..9ce93e8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-29.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-distribution -ftree-loop-distribute-patterns -fdump-tree-ldist-details" } */
+
+#define M (256)
+#define N (512)
+int arr[M][N];
+
+void
+foo (void)
+{
+  for (unsigned i = 0; i < M; ++i)
+for (unsigned j = 0; j < N - 1; ++j)
+  arr[i][j] = 0;
+}
+
+/* { dg-final { scan-tree-dump-not "Loop nest . distributed: split to" "ldist" } } */
+/* { dg-final { scan-tree-dump-times "Loop . distributed: split to 0 loops and 1 library" 1 "ldist" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-30.c b/gcc/testsuite/g

[PATCH][GRAPHITE] Fix PR69728 in "another" way

2017-10-12 Thread Richard Biener

I made scheduling to fail when we end up with an empty domain but as
I forgot to actually check the return value of build_original_schedule
the fix was equivalent to just doing nothing to the schedule when
it has an empty domain.  I verified that for the testcase it DCEs
the relevant stmt and that this is a valid transform.

Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC 2k6 is also
happy with this.

Committed as obvious (since no functional change).

Richard.

2017-10-12  Richard Biener  

PR tree-optimization/69728
Revert
2017-09-19  Richard Biener  

PR tree-optimization/69728
* graphite-sese-to-poly.c (schedule_error): New global.
(add_loop_schedule): Handle empty domain by failing the
schedule.
(build_original_schedule): Handle schedule_error.

* graphite-sese-to-poly.c (add_loop_schedule): Handle empty
domain by returning an unchanged schedule.

* gcc.dg/graphite/pr69728.c: Adjust to reflect we can handle
the loop now.  Remove unrelated undefined behavior.

Index: gcc/graphite-sese-to-poly.c
===
--- gcc/graphite-sese-to-poly.c (revision 253645)
+++ gcc/graphite-sese-to-poly.c (working copy)
@@ -1066,8 +1051,6 @@ outer_projection_mupa (__isl_take isl_un
   return isl_multi_union_pw_aff_from_union_pw_multi_aff (data.res);
 }
 
-static bool schedule_error;
-
 /* Embed SCHEDULE in the constraints of the LOOP domain.  */
 
 static isl_schedule *
@@ -1082,11 +1065,9 @@ add_loop_schedule (__isl_take isl_schedu
 return empty < 0 ? isl_schedule_free (schedule) : schedule;
 
   isl_union_set *domain = isl_schedule_get_domain (schedule);
-  /* We cannot apply an empty domain to pbbs in this loop so fail.
- ??? Somehow drop pbbs in the loop instead.  */
+  /* We cannot apply an empty domain to pbbs in this loop so return early.  */
   if (isl_union_set_is_empty (domain))
 {
-  schedule_error = true;
   isl_union_set_free (domain);
   return schedule;
 }
@@ -1216,8 +1197,6 @@ build_schedule_loop_nest (scop_p scop, i
 static bool
 build_original_schedule (scop_p scop)
 {
-  schedule_error = false;
-
   int i = 0;
   int n = scop->pbbs.length ();
   while (i < n)
@@ -1232,14 +1211,6 @@ build_original_schedule (scop_p scop)
   scop->original_schedule = add_in_sequence (scop->original_schedule, s);
 }
 
-  if (schedule_error)
-{
-  if (dump_file)
-   fprintf (dump_file, "[sese-to-poly] failed to build "
-"original schedule\n");
-  return false;
-}
-
   if (dump_file)
 {
   fprintf (dump_file, "[sese-to-poly] original schedule:\n");
Index: gcc/testsuite/gcc.dg/graphite/pr69728.c
===
--- gcc/testsuite/gcc.dg/graphite/pr69728.c (revision 253645)
+++ gcc/testsuite/gcc.dg/graphite/pr69728.c (working copy)
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -floop-nest-optimize" } */
+/* { dg-options "-O3 -floop-nest-optimize -fdump-tree-graphite-details" } */
 
-int a[1];
+int a[9];
 int b, c, d, e;
 void
 fn1 ()
@@ -19,3 +19,9 @@ fn1 ()
}
 }
 }
+
+/* At the moment only ISL figures that if (d) is always true.  We've
+   run into scheduling issues before here, not being able to handle
+   empty domains.  */
+
+/* { dg-final { scan-tree-dump "loop nest optimized" "graphite" } }  */


Re: [PATCH 1/2] add unique_ptr header

2017-10-12 Thread Trevor Saunders
On Wed, Oct 11, 2017 at 02:16:38PM -0400, David Malcolm wrote:
> On Sat, 2017-08-05 at 01:39 -0400, Trevor Saunders wrote:
> > On Fri, Aug 04, 2017 at 08:55:50PM +0100, Jonathan Wakely wrote:
> > > On 01/08/17 23:09 -0400, Trevor Saunders wrote:
> > > > aiui C++03 is C++98 with a few additions to the stl.
> > > 
> > > Again, STL != C++ Standard Library.
> > > 
> > > C++03 made some important changes to the core language,
> > > particularly
> > > the value-initialization rules.
> > > 
> > > Most of the library changes were small bug fixes, and most were to
> > > locales (which didn't originate in the STL and aren't even
> > > templates)
> > > and iostreams (which didn't originate in the STL).
> > > 
> > > There were also changes to std::auto_ptr (also not from the STL)
> > > which
> > > this might rely on (I haven't checked).
> > 
> > I doubt it, as Pedro said in his email it originally copied code from
> > std::auto_ptr, but it doesn't use the standard libraries definition
> > of
> > std::auto_ptr anywhere.  However please do feel free to look at the
> > implementation.
> > 
> > Trev
> > 
> 
> Trevor: did this go anywhere?

I'm sorry, I've been pretty busy starting a new job, and haven't yet
finished the paperwork to get permission to work on gcc stuff.  So I
haven't done anything since my last message in the thread.

> Do we have an unique_ptr class we can use within gcc's own
> implementation?

Not yet :(

> (I was hoping to use it for
>   https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00439.html
>   "[PATCH 1/3] c-family: add name_hint/deferred_diagnostic";
> see the discussion at:
>   https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00123.html
> onwards)

I believe you should be good to take the last version of the patch I
submitted and sed the namespace name to whatever people can agree on.  I
think that was the only issue people had with the patch.  I'd really
like to see this get into gcc8, and will try to get stuff done on my
end, but I'd also be thankful if you felt like doing that work first.

Trev

> 
> Thanks
> Dave


RE: [PATCH][GCC][Testsuite][ARM][AArch64] Enable Dot Product for generic tests for ARM and AArch64 [Patch (7/8)]

2017-10-12 Thread Tamar Christina


> -Original Message-
> From: Richard Earnshaw (lists) [mailto:richard.earns...@arm.com]
> Sent: 12 October 2017 14:21
> To: Tamar Christina; James Greenhalgh
> Cc: gcc-patches@gcc.gnu.org; nd; Marcus Shawcroft
> Subject: Re: [PATCH][GCC][Testsuite][ARM][AArch64] Enable Dot Product
> for generic tests for ARM and AArch64 [Patch (7/8)]
> 
> On 06/10/17 13:45, Tamar Christina wrote:
> > Hi All,
> >
> > this is a respin with the changes suggested. Note that this patch is no 8/8 
> > in
> the series.
> >
> > Regtested on arm-none-eabi, armeb-none-eabi, aarch64-none-elf and
> > aarch64_be-none-elf with no issues found.
> >
> > Ok for trunk?
> >
> > gcc/testsuite
> > 2017-10-06  Tamar Christina  
> >
> > * gcc.dg/vect/vect-reduc-dot-s8a.c
> > (dg-additional-options, dg-require-effective-target): Add +dotprod.
> > * gcc.dg/vect/vect-reduc-dot-u8a.c
> > (dg-additional-options, dg-require-effective-target): Add +dotprod.
> > 
> > From: Tamar Christina
> > Sent: Monday, September 4, 2017 12:35:39 PM
> > To: James Greenhalgh
> > Cc: gcc-patches@gcc.gnu.org; nd; Richard Earnshaw; Marcus Shawcroft
> > Subject: RE: [PATCH][GCC][Testsuite][ARM][AArch64] Enable Dot Product
> > for generic tests for ARM and AArch64 [Patch (7/8)]
> >
> >> I'm surprised that this worked!
> >>
> >> It looks like you unconditionally add the -march=armv8.2-a+dotprod
> >> options, which should cause you to generate instructions which will
> >> not execute on targets which don't support this instruction. As far
> >> as I can see, this is an execute test, so that should cause undefined
> >> instruction exceptions on an Armv8-A target at the very least.
> >
> > It's not; there is no dg-do specified, which means it defaults to "compile".
> > This is a straight compilation test that checks whether the target
> > can do the reduction. There may be a main, but it's never executed,
> > which is why I don't have a hardware check for it.
> >
> > The unconditional armv8.2+dotprod is for this reason. It doesn't matter
> what hardware.
> >
> >>
> >> So, not OK in its current form.
> >>
> >> Thanks,
> >> James
> >>
> >>>
> >>> Ok for trunk?
> >>>
> >>> gcc/testsuite
> >>> 2017-09-01  Tamar Christina  
> >>>
> >>> * gcc.dg/vect/vect-reduc-dot-s8a.c
> >>> (dg-additional-options, dg-require-effective-target): Add +dotprod.
> >>> * gcc.dg/vect/vect-reduc-dot-u8a.c
> >>> (dg-additional-options, dg-require-effective-target): Add +dotprod.
> >>>
> >>> --
> >
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c
> b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c
> index
> dc4f52019d5435edbbc811b73dee0f98ff44c1b1..acb6862f8274fb954f69bd45e8
> edeedcdca4cbf7
> 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c
> @@ -1,4 +1,7 @@
>  /* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target {
> aarch64*-*-* || arm*-*-* } } } */
> 
> Why do you need hardware with dot-product if these are compile-only
> tests?  (presumably that's what the _hw at the end of the require means).

James was right in that vect.exp overrides the default from compile to run
for these tests, so they are execution tests.

> 
> R.


Re: [PATCH GCC]Refine comment and set type for partition merged from SCC

2017-10-12 Thread Richard Biener
On Wed, Oct 11, 2017 at 6:10 PM, Bin Cheng  wrote:
> Hi,
> When reading the code I found it's could be confusing without comment.
> This patch adds comment explaining why we want merge PARALLEL type
> partitions in a SCC, even though the result partition can no longer
> be executed in parallel.  It also sets type of the result partition
> to sequential.
> Bootstrap and test on x86_64 and AArch64.  Is it OK?

Ok.

Richard.

> Thanks,
> bin
> 2017-10-10  Bin Cheng  
>
> * tree-loop-distribution.c (break_alias_scc_partitions): Add comment
> and set PTYPE_SEQUENTIAL for merged partition.


Re: [PATCH GCC][7/7]Merge adjacent memset builtin partitions

2017-10-12 Thread Richard Biener
On Thu, Oct 5, 2017 at 3:17 PM, Bin Cheng  wrote:
> Hi,
> This patch merges adjacent memset builtin partitions if possible.  It is
> a useful special-case optimization that transforms the code below:
>
> #define M (256)
> #define N (512)
>
> struct st
> {
>   int a[M][N];
>   int c[M];
>   int b[M][N];
> };
>
> void
> foo (struct st *p)
> {
>   for (unsigned i = 0; i < M; ++i)
>     {
>       p->c[i] = 0;
>       for (unsigned j = N; j > 0; --j)
>         {
>           p->a[i][j - 1] = 0;
>           p->b[i][j - 1] = 0;
>         }
>     }
> }
>
> into a single memset function call, rather than three calls initializing
> the structure field by field.
>
> Bootstrap and test in patch set on x86_64 and AArch64, is it OK?

+  /* Insertion sort is good enough for the small sub-array.  */
+  for (k = i + 1; k < j; ++k)
+   {
+ part1 = (*partitions)[k];
+ for (l = k; l > i; --l)
+   {
+ part2 = (*partitions)[l - 1];
+ if (part1->builtin->dst_base_offset
+   >= part2->builtin->dst_base_offset)
+   break;
+
+ (*partitions)[l] = part2;
+   }
+ (*partitions)[l] = part1;
+   }

so we want to sort [i, j[ after dst_base_offset.  I realize you don't want
to write a qsort helper for this but I can't wrap my head around the above
in 5 minutes so ... please!
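
For illustration only, a qsort-style helper for this might look like the
following (the field names are taken from the snippet above; everything
else is assumed):

  /* Order builtin partitions by dst_base_offset.  Sketch only, not the
     committed code.  */
  static int
  offset_cmp (const void *vp1, const void *vp2)
  {
    struct partition *p1 = *(struct partition *const *) vp1;
    struct partition *p2 = *(struct partition *const *) vp2;
    if (p1->builtin->dst_base_offset < p2->builtin->dst_base_offset)
      return -1;
    if (p1->builtin->dst_base_offset > p2->builtin->dst_base_offset)
      return 1;
    return 0;
  }

  /* ... and the sub-range [i, j) would then be sorted in one call.  */
  qsort (&(*partitions)[i], j - i, sizeof (struct partition *), offset_cmp);

Note that qsort is not stable, so if partitions with equal offsets must
keep their original order a tie-breaking key would be needed.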

You don't seem to check the sizes of the memsets given that they
obviously cannot overlap(!?)

Also why handle memset and not memcpy/memmove or combinations of
them (for sorting)?

  for ()
   {
  p->a[i] = 0;
  p->c[i] = q->c[i];
  p->b[i] = 0;
   }

with a and b adjacent.  I suppose p->c could be computed by a
non-builtin partition as well.
So don't we want to see if dependences allow sorting all builtin
partitions next to each other
as much as possible?  (maybe we do that already?)

Thanks,
Richard.


> Thanks,
> bin
> 2017-10-04  Bin Cheng  
>
> * tree-loop-distribution.c (tree-ssa-loop-ivopts.h): New header file.
> (struct builtin_info): New fields.
> (classify_builtin_1): Compute and record base and offset parts for
> memset builtin partition by calling strip_offset.
> (fuse_memset_builtins): New function.
> (finalize_partitions): Fuse adjacent memset partitions by calling
> above function.
> * tree-ssa-loop-ivopts.c (strip_offset): Delete static declaration.
> Expose the interface.
> * tree-ssa-loop-ivopts.h (strip_offset): New declaration.
>
> gcc/testsuite/ChangeLog
> 2017-10-04  Bin Cheng  
>
> * gcc.dg/tree-ssa/ldist-17.c: Adjust test string.
> * gcc.dg/tree-ssa/ldist-32.c: New test.
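
For reference, the merged form of the struct st example above is
essentially a single call covering the whole object (a sketch of the
effect only; the actual transformation is done on GIMPLE):

  void
  foo (struct st *p)
  {
    __builtin_memset (p, 0, sizeof (struct st));
  }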


Re: [PATCH GCC][6/7]Support loop nest distribution for builtin partition

2017-10-12 Thread Richard Biener
On Thu, Oct 5, 2017 at 3:17 PM, Bin Cheng  wrote:
> Hi,
> This patch rewrites classification part of builtin partition so that nested
> builtin partitions are supported.  With this extension, below loop nest:
> void
> foo (void)
> {
>   for (unsigned i = 0; i < M; ++i)
>     for (unsigned j = 0; j < N; ++j)
>       arr[i][j] = 0;
> }
>
> will be distributed into a single memset, rather than a loop of memset.
> Bootstrap and test in patch set on x86_64 and AArch64, is it OK?

+  tree access_size = fold_convert (sizetype, TYPE_SIZE_UNIT (TREE_TYPE (ref)));
+

TYPE_SIZE_UNIT should be always sizetype.

+  /* Classify the builtin kind.  */
+  if (single_ld == NULL)
+classify_builtin_1 (loop, partition, single_st);
+  else
+classify_builtin_2 (loop, rdg, partition, single_st, single_ld);

maybe name those helpers classify_builtin_st and classify_builtin_ldst?

Ok with those changes.

Thanks,
Richard.

> Thanks,
> bin
> 2017-10-04  Bin Cheng  
>
> * tree-loop-distribution.c (struct builtin_info): New struct.
> (struct partition): Refactor fields into struct builtin_info.
> (partition_free): Free struct builtin_info.
> (build_size_arg_loc, build_addr_arg_loc): Delete.
> (generate_memset_builtin, generate_memcpy_builtin): Get memory range
> information from struct builtin_info.
> (find_single_drs): New function refactored from classify_partition.
> Also moved builtin validity checks to this function.
> (compute_access_range, alloc_builtin): New functions.
> (classify_builtin_1, classify_builtin_2): New functions.
> (classify_partition): Refactor code into functions find_single_drs,
> classify_builtin_1 and classify_builtin_2.
> (distribute_loop): Don't do runtime alias check when distributing
> loop nest.
> (find_seed_stmts_for_distribution): New function.
> (pass_loop_distribution::execute): Refactor code finding seed
> stmts into above function.  Support distribution for the innermost
> two-level loop nest.  Adjust dump information.
>
> gcc/testsuite/ChangeLog
> 2017-10-04  Bin Cheng  
>
> * gcc.dg/tree-ssa/ldist-28.c: New test.
> * gcc.dg/tree-ssa/ldist-29.c: New test.
> * gcc.dg/tree-ssa/ldist-30.c: New test.
> * gcc.dg/tree-ssa/ldist-31.c: New test.
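
In other words, for the arr[M][N] example above the distributed form
behaves like the single call below rather than one call per row (a sketch
of the effect only):

  /* Without loop-nest support: a loop of per-row calls.  */
  for (unsigned i = 0; i < M; ++i)
    __builtin_memset (arr[i], 0, sizeof (arr[i]));

  /* With loop-nest support: one call for the whole nest.  */
  __builtin_memset (arr, 0, sizeof (arr));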


Re: [PATCH][GCC][Testsuite][ARM][AArch64] Enable Dot Product for generic tests for ARM and AArch64 [Patch (7/8)]

2017-10-12 Thread Richard Earnshaw (lists)
On 06/10/17 13:45, Tamar Christina wrote:
> Hi All,
> 
> this is a respin with the changes suggested. Note that this patch is no 8/8 
> in the series.
> 
> Regtested on arm-none-eabi, armeb-none-eabi,
> aarch64-none-elf and aarch64_be-none-elf with no issues found.
> 
> Ok for trunk?
> 
> gcc/testsuite
> 2017-10-06  Tamar Christina  
> 
> * gcc.dg/vect/vect-reduc-dot-s8a.c
> (dg-additional-options, dg-require-effective-target): Add +dotprod.
> * gcc.dg/vect/vect-reduc-dot-u8a.c
> (dg-additional-options, dg-require-effective-target): Add +dotprod.
> 
> From: Tamar Christina
> Sent: Monday, September 4, 2017 12:35:39 PM
> To: James Greenhalgh
> Cc: gcc-patches@gcc.gnu.org; nd; Richard Earnshaw; Marcus Shawcroft
> Subject: RE: [PATCH][GCC][Testsuite][ARM][AArch64] Enable Dot Product for 
> generic tests for ARM and AArch64 [Patch (7/8)]
> 
>> I'm surprised that this worked!
>>
>> It looks like you unconditionally add the -march=armv8.2-a+dotprod options,
>> which should cause you to generate instructions which will not execute on
>> targets which don't support this instruction. As far as I can see, this is an
>> execute test, so that should cause undefined instruction exceptions on an
>> Armv8-A target at the very least.
> 
> It's not; there is no dg-do specified, which means it defaults to "compile".
> This is a straight compilation test that checks whether the target can do
> the reduction. There may be a main, but it's never executed, which is why
> I don't have a hardware check for it.
> 
> The unconditional armv8.2+dotprod is for this reason. It doesn't matter what 
> hardware.
> 
>>
>> So, not OK in its current form.
>>
>> Thanks,
>> James
>>
>>>
>>> Ok for trunk?
>>>
>>> gcc/testsuite
>>> 2017-09-01  Tamar Christina  
>>>
>>> * gcc.dg/vect/vect-reduc-dot-s8a.c
>>> (dg-additional-options, dg-require-effective-target): Add +dotprod.
>>> * gcc.dg/vect/vect-reduc-dot-u8a.c
>>> (dg-additional-options, dg-require-effective-target): Add +dotprod.
>>>
>>> --
> 

diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c
b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c
index
dc4f52019d5435edbbc811b73dee0f98ff44c1b1..acb6862f8274fb954f69bd45e8edeedcdca4cbf7
100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c
@@ -1,4 +1,7 @@
 /* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target {
aarch64*-*-* || arm*-*-* } } } */

Why do you need hardware with dot-product if these are compile-only
tests?  (presumably that's what the _hw at the end of the require means).

R.


Re: [PATCH][GCC][ARM] Dot Product NEON patterns [Patch (2/8)]

2017-10-12 Thread Richard Earnshaw (lists)
On 06/10/17 13:44, Tamar Christina wrote:
> Hi All,
> 
> this is a minor respin with changes echo'd from feedback from aarch64.
> I assume still OK for trunk.
> 
> Regtested on arm-none-eabi, armeb-none-eabi,
> aarch64-none-elf and aarch64_be-none-elf with no issues found.
> 
> Ok for trunk?
> 
> gcc/
> 2017-10-06  Tamar Christina  
> 
> * config/arm/arm-builtins.c (arm_unsigned_uternop_qualifiers): New.
> (UTERNOP_QUALIFIERS, arm_umac_lane_qualifiers, UMAC_LANE_QUALIFIERS): 
> New.
> * config/arm/arm_neon_builtins.def (sdot, udot, sdot_lane, 
> udot_lane): new.
> * config/arm/iterators.md (DOTPROD, VSI2QI, vsi2qi): New.
> (UNSPEC_DOT_S, UNSPEC_DOT_U, opsuffix): New.
> * config/arm/neon.md (neon_dot): New.
> (neon_dot_lane, dot_prod): New.
> * config/arm/types.md (neon_dot, neon_dot_q): New.
> * config/arm/unspecs.md (sup): Add UNSPEC_DOT_S, UNSPEC_DOT_U.

OK if this passes a native bootstrap.

R.

> 
> From: Kyrill Tkachov 
> Sent: Wednesday, September 13, 2017 10:36:38 AM
> To: Tamar Christina; gcc-patches@gcc.gnu.org
> Cc: nd; Ramana Radhakrishnan; Richard Earnshaw; ni...@redhat.com
> Subject: Re: [PATCH][GCC][ARM] Dot Product NEON patterns [Patch (2/8)]
> 
> Hi Tamar,
> 
> On 01/09/17 14:33, Tamar Christina wrote:
>> Hi All,
>>
>> This patch adds the instructions for Dot Product to ARM along
>> with the intrinsics and vectorizer pattern.
>>
>> Armv8.2-a dot product supports 8-bit element values both
>> signed and unsigned.
>>
>> Dot product is available from Armv8.2-a and onwards.
>>
>> Regtested and bootstrapped on arm-none-eabi and no issues.
>>
>> Ok for trunk?
> 
> This is ok once the prerequisites are approved with one ChangeLog nit.
> 
> Kyrill
> 
>> gcc/
>> 2017-09-01  Tamar Christina  
>>
>>   * config/arm/arm-builtins.c (arm_unsigned_uternop_qualifiers): New.
>>   (UTERNOP_QUALIFIERS, arm_umac_lane_qualifiers, UMAC_LANE_QUALIFIERS): 
>> New.
>>   * config/arm/arm_neon_builtins.def (sdot, udot, sdot_lane, udot_lane): 
>> new.
>>   * config/arm/iterators.md (DOTPROD, DOT_MODE, dot_mode): New.
>>   (UNSPEC_DOT_S, UNSPEC_DOT_U, opsuffix): New.
>>   * config/arm/neon.md (neon_dot): New.
>>   (neon_dot_lane, dot_prod): New.
>>   * config/arm/types.md (neon_dot, neon_dot_q): New.
>>   * config/arm/unspecs.md (UNSPEC_DOT_S, UNSPEC_DOT_U): New.
>>
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 
> 7acbaf1bb40a4f270e75968804546508f7839e49..139e09fd929e17216ad9383505f1453a73d071fb
>  100644
> --- a/gcc/config/arm/iterators.md
> 
> --snip---
> 
>  ;;----------------------------------------------------------------------------
>  ;; Code attributes
>  ;;----------------------------------------------------------------------------
> @@ -816,6 +822,7 @@
> (UNSPEC_VSRA_S_N "s") (UNSPEC_VSRA_U_N "u")
> (UNSPEC_VRSRA_S_N "s") (UNSPEC_VRSRA_U_N "u")
> (UNSPEC_VCVTH_S "s") (UNSPEC_VCVTH_U "u")
> +  (UNSPEC_DOT_S "s") (UNSPEC_DOT_U "u")
>   ])
> 
> In your ChangeLog you list this as "New" whereas your patch just adds them to 
> the "sup" int_attr.
> 



Re: [PATCH][RFC] Instrument function exit with __builtin_unreachable in C++.

2017-10-12 Thread Jason Merrill
On Thu, Oct 12, 2017 at 4:40 AM, Martin Liška  wrote:
> On 10/11/2017 04:59 PM, Jason Merrill wrote:
>> On Thu, Oct 5, 2017 at 12:53 PM, Martin Liška  wrote:
>>> On 10/05/2017 05:07 PM, Jason Merrill wrote:
 On Thu, Oct 5, 2017 at 6:31 AM, Martin Liška  wrote:
> As discussed 2 days ago on IRC with Jakub and Jonathan, C++ standard says
> that having a non-return
> function with missing return statement is undefined behavior. We've got
> -fsanitize=return check for
> that and we can in such case instrument __builtin_unreachable(). This
> patch does that.


 Great.

> And there's still some fallout:
>
> FAIL: g++.dg/cpp0x/constexpr-diag3.C  -std=c++11  (test for errors, line
> 7)
> FAIL: g++.dg/cpp0x/constexpr-neg3.C  -std=c++11  (test for errors, line
> 12)
> FAIL: g++.dg/cpp1y/constexpr-return2.C  -std=c++14  (test for errors,
> line 7)
> FAIL: g++.dg/cpp1y/constexpr-return2.C  -std=c++14 (test for excess
> errors)
> FAIL: g++.dg/cpp1y/pr63996.C  -std=c++14  (test for errors, line 9)
> FAIL: g++.dg/cpp1y/pr63996.C  -std=c++14 (test for excess errors)
>
> 1) there are causing:
>
> ./xg++ -B.
> /home/marxin/Programming/gcc/gcc/testsuite/g++.dg/cpp1y/pr63996.C
> /home/marxin/Programming/gcc/gcc/testsuite/g++.dg/cpp1y/pr63996.C:9:23:
> in constexpr expansion of ‘foo(1)’
> /home/marxin/Programming/gcc/gcc/testsuite/g++.dg/cpp1y/pr63996.C:4:1:
> error: ‘__builtin_unreachable()’ is not a constant expression
>   foo (int i)
>   ^~~
>
> Where we probably should not emit the built-in call. Am I right?


 Or constexpr.c could give a more friendly diagnostic for
 __builtin_unreachable.  It's correct to give a diagnostic here for
 this testcase.
>>>
>>>
>>> Hi.
>>>
>>> That's good idea, any suggestion different from:
>>>
>>> ./xg++ -B.
>>> /home/marxin/Programming/gcc2/gcc/testsuite/g++.dg/cpp1y/pr63996.C
>>> /home/marxin/Programming/gcc2/gcc/testsuite/g++.dg/cpp1y/pr63996.C:9:23:
>>> in constexpr expansion of ‘foo(1)’
>>> : error: constexpr can't contain call of a non-return function
>>> ‘__builtin_unreachable’
>>>
>>> which is probably misleading as it points to a function call that is
>>> actually missing in source code.
>>> Should we distinguish between implicit and explicit __builtin_unreachable?
>>
>> Probably without your change the constexpr code already diagnoses the
>> missing return as "constexpr call flows off the end of the function";
>> that same message seems appropriate.
>
> Hello.
>
> Ok, I've done that. Sending patch candidate.
>
>>
>>> So turning on the warning by default for c++, we get about 500 failing 
>>> test-cases. Uf :/
>>
>> Yes, we've been sloppy about this in the testsuite.  :(
>
> I'll fix it. To be sure, I'm sending a first part where I demonstrate how I
> plan to change the tests, with the following preference:
>
> 1) change the function to void type if possible
> 2) return a default value for functions not returning a value (for a complex
>    return value of a type A I do { static A a; return a; }; is it
>    appropriate?)

How about "return A();" ?

Jason
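
A small sketch of the two preferences discussed above, using an invented
type A:

  struct A { int i; };

  // Before: control flows off the end of a non-void function
  // (undefined behaviour, now instrumented with __builtin_unreachable).
  // A get () { }

  // Preference 1: make the function void when the value is unused.
  void get1 () { }

  // Preference 2: return a default value, as suggested above.
  A get2 () { return A (); }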


Re: [PATCH] Add further VEC_SELECT verification

2017-10-12 Thread Richard Biener
On Wed, 11 Oct 2017, Jakub Jelinek wrote:

> Hi!
> 
> This patch adds verification that vec_select in *.md files
> doesn't have any out of bounds indices in the selector.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, tested
> by building cc1 of aarch64, arm, powerpc64le and s390x cross,
> and tested by hacking up sse.md to have an out of bounds access there,
> ok for trunk?

Ok.

Richard.

> 2017-10-11  Jakub Jelinek  
> 
>   * genrecog.c (validate_pattern): For VEC_SELECT verify that
>   CONST_INT selectors are 0 to GET_MODE_NUNITS (imode) - 1.
> 
> --- gcc/genrecog.c.jj 2017-09-01 09:25:40.0 +0200
> +++ gcc/genrecog.c2017-10-11 17:53:20.107198630 +0200
> @@ -751,6 +751,21 @@ validate_pattern (rtx pattern, md_rtx_in
>   error_at (info->loc,
> "vec_select parallel with %d elements, expected %d",
> XVECLEN (XEXP (pattern, 1), 0), expected);
> +   else if (VECTOR_MODE_P (imode))
> + {
> +   unsigned int nelems = GET_MODE_NUNITS (imode);
> +   int i;
> +   for (i = 0; i < expected; ++i)
> + if (CONST_INT_P (XVECEXP (XEXP (pattern, 1), 0, i))
> + && (UINTVAL (XVECEXP (XEXP (pattern, 1), 0, i))
> + >= nelems))
> +   error_at (info->loc,
> + "out of bounds selector %u in vec_select, "
> + "expected at most %u",
> + (unsigned)
> + UINTVAL (XVECEXP (XEXP (pattern, 1), 0, i)),
> + nelems - 1);
> + }
>   }
> if (imode != VOIDmode && !VECTOR_MODE_P (imode))
>   error_at (info->loc, "%smode of first vec_select operand is not a "


Re: [PATCH] Add a warning for invalid function casts

2017-10-12 Thread Pedro Alves
On 10/11/2017 03:57 AM, Martin Sebor wrote:
> 
> 
> [X] This can be function that takes an argument of an incomplete
> type, such as:
> 
>   struct Incomplete;
>   typedef void Uncallable (struct Incomplete);
> 
> Any function can safely be converted to Uncallable* without
> the risk of being called with the wrong arguments.  The only
> way to use an Uncallable* is to explicitly convert it to
> a pointer to a function that can be called.

OOC, did you consider trying to get something like that added
to C proper, to some standard header?  I don't imagine that it'd
be much objectionable, and having a standard type may help
tooling give better diagnostics and suggestions.

Thanks,
Pedro Alves



Re: [PATCH] Add a warning for invalid function casts

2017-10-12 Thread Pedro Alves
On 10/11/2017 03:57 AM, Martin Sebor wrote:
> 
> 
> Incidentally, void(*)(void) in C++ is a poor choice for this
> use case also because of the language's default function
> arguments.  It's an easy mistake for a C++ programmer to make
> to assume that given, say:
> 
>   void foo (const char *s = "...");
> 
> or for any other function that provides default values for all
> its arguments, the function may be callable via void(*)(void):
> 
>   typedef void F (void);
> 
>   void (pf)(void) = (F*)foo;

I'd think it'd be much more common to write instead:

  typedef void F (void);

  F *pf = (F*)foo;

I.e., use the typedef on both sides of the assignment.

> 
> by having the (default) function argument value(s) magically
> substituted at the call site of:
> 
>pf ();
> 
> Bu since that's not the case it would be helpful for the new
> warning to detect this mistake.  By encouraging the use of
> 
>   typedef void F (...);
> 
> as the type of a pointer there is little chance of making such
> a mistake.

(and then) I don't think I understand this rationale.
If users follow the advice, they'll end up with:

  void foo (const char *s = "...");
  typedef void F (...);
  F *pf = (F *)foo;
  pf ();

which still compiles silently and calls the foo
function incorrectly.

Thanks,
Pedro Alves



Re: [PATCH] Add a warning for invalid function casts

2017-10-12 Thread Pedro Alves
On 10/11/2017 06:58 PM, Martin Sebor wrote:
> On 10/11/2017 11:26 AM, Joseph Myers wrote:
>> On Tue, 10 Oct 2017, Martin Sebor wrote:
>>
>>> The ideal solution for 1) would be a function pointer that can
>>> never be used to call a function (i.e., the void* equivalent
>>> for functions).[X]
>>
>> I don't think that's relevant.  The normal idiom for this in modern C
>> code, if not just using void *, is void (*) (void), and since the warning
>> is supposed to be avoiding excessive false positives and detecting the
>> cases that are likely to be used for ABI-incompatible calls, the warning
>> should allow void (*) (void) there.
> 
> I think we'll just have to agree to disagree.  I'm not convinced
> that using void(*)(void) for this is idiomatic or pervasive enough
> to drive design decisions.  Bernd mentioned just libgo and libffi
> as the code bases that do, and both you and I have noted that any
> pointer type works equally well for this purpose.  The problem
> with almost any type, including void(*) (void), is that they can
> be misused to call the incompatible function.  Since the sole
> purpose of this new warning is to help find those misuses,
> excluding void(*)(void) from the checking is directly at odds
> with its goal.

Switching type-erased function pointers from "void(*)(void)"
to "void(*)()" (in C) substantially increases the risk of code
calling the type-erased pointer without casting it back by accident:

 extern void foo (int, int);
 typedef void (type_erased_func) (void);
 type_erased_func *func = (type_erased_func *) foo;

 int main ()
 {
   func (1, 2); // error: too many arguments
 }

vs:

 extern void foo (int, int);
 typedef void (type_erased_func) (); // note: void dropped
 type_erased_func *func = (type_erased_func *) foo;

 int main ()
 {
   func (1, 2); // whoops, now silently compiles
 }

I think it'd be good if this were weighed in as well.  If 'void ()'
is picked as the special type, then maybe the above could
be at least addressed in the documentation, and/or
diagnostics/notes.

Thanks,
Pedro Alves



Re: [PATCH][GRAPHITE] Fix PR82451 (and PR82355 in a different way)

2017-10-12 Thread Bin.Cheng
On Thu, Oct 12, 2017 at 12:13 PM, Richard Biener  wrote:
> On Thu, 12 Oct 2017, Bin.Cheng wrote:
>
>> On Wed, Oct 11, 2017 at 3:43 PM, Richard Biener  wrote:
>> >
>> > For PR82355 I introduced a fake dimension to ISL to allow CHRECs
>> > having an evolution in a loop that isn't fully part of the SESE
>> > region we are processing.  That was easier than fending off those
>> > CHRECs (without simply giving up on SESE regions with those).
>> >
>> > But it didn't fully solve the issue as PR82451 shows where we run
>> > into the issue that we eventually have to code-gen those
>> > evolutions and thus in theory need a canonical IV of that containing loop.
>> >
>> > So I decided (after Micha pressuring me a bit...) to revisit the
>> > original issue and make SCEV analysis "properly" handle SE regions.
>> > It turns out that it is mostly instantiate_scev lacking proper support
>> > plus the necessary interfacing change (really just cosmetic in some sense)
>> > from a instantiate_before basic-block to a instantiate_before edge.
>> >
>> > data-ref interfaces have been similarly adjusted, here changing
>> > the "loop nest" loop parameter to the entry edge for the SE region
>> > and passing that down accordingly.
>> >
>> > I've for now tried to keep other high-level loop-based interfaces the
>> > same by simply using the loop preheader edge as entry where appropriate
>> > (needing loop_preheader_edge to cope with the loop root tree for simplicity).
>> >
>> > In the process I ran into issues with us too aggressively
>> > instantiating random trees and thus I cut those down.  That part
>> > doesn't successfully test separately (when I remove the strange
>> > ARRAY_REF instantiation), so it's part of this patch.  I've also
>> > run into an SSA verification fail (the id-27.f90 testcase) which
>> > shows we _do_ need to clear the SCEV cache after introducing
>> > the versioned CFG (and added a comment before it).
>> >
>> > On the previously failing testcases I've verified we produce
>> > sensible instantiations for those pesky refs residing in "no" loop
>> > in the SCOP and that we get away with the result in terms of
>> > optimizing.
>> >
>> > SPEC 2k6 testing shows
>> >
>> > loop nest optimized: 311
>> > loop nest not optimized, code generation error: 0
>> > loop nest not optimized, optimized schedule is identical to original
>> > schedule: 173
>> > loop nest not optimized, optimization timed out: 59
>> > loop nest not optimized, ISL signalled an error: 9
>> > loop nest: 552
>> >
>> > for SPEC 2k6 and -floop-nest-optimize while adding -fgraphite-identity
>> > still reveals some codegen errors:
>> >
>> > loop nest optimized: 437
>> > loop nest not optimized, code generation error: 25
>> > loop nest not optimized, optimized schedule is identical to original
>> > schedule: 169
>> > loop nest not optimized, optimization timed out: 60
>> > loop nest not optimized, ISL signalled an error: 9
>> > loop nest: 700
>> >
>> > Bootstrap and regtest in progress on x86_64-unknown-linux-gnu
>> > (with and without -fgraphite-identity -floop-nest-optimize).
>> >
>> > Ok?
>> >
>> > Thanks,
>> > Richard.
>> >
>>
>> > Index: gcc/tree-scalar-evolution.c
>> > ===
>> > --- gcc/tree-scalar-evolution.c (revision 253645)
>> > +++ gcc/tree-scalar-evolution.c (working copy)
>> > @@ -2344,7 +2348,7 @@ static tree instantiate_scev_r (basic_bl
>> > instantiated, and to stop if it exceeds some limit.  */
>> >
>> >  static tree
>> > -instantiate_scev_name (basic_block instantiate_below,
>> > +instantiate_scev_name (edge instantiate_below,
>> >struct loop *evolution_loop, struct loop 
>> > *inner_loop,
>> >tree chrec,
>> >bool *fold_conversions,
>> > @@ -2358,7 +2362,7 @@ instantiate_scev_name (basic_block insta
>> >   evolutions in outer loops), nothing to do.  */
>> >if (!def_bb
>> >|| loop_depth (def_bb->loop_father) == 0
>> > -  || dominated_by_p (CDI_DOMINATORS, instantiate_below, def_bb))
>> > +  || ! dominated_by_p (CDI_DOMINATORS, def_bb, 
>> > instantiate_below->dest))
>> >  return chrec;
>> >
>> >/* We cache the value of instantiated variable to avoid exponential
>> > @@ -2380,6 +2384,51 @@ instantiate_scev_name (basic_block insta
>> >
>> >def_loop = find_common_loop (evolution_loop, def_bb->loop_father);
>> >
>> > +  if (! dominated_by_p (CDI_DOMINATORS,
>> > +   def_loop->header, instantiate_below->dest))
>> > +{
>> > +  gimple *def = SSA_NAME_DEF_STMT (chrec);
>> > +  if (gassign *ass = dyn_cast  (def))
>> > +   {
>> > + switch (gimple_assign_rhs_class (ass))
>> > +   {
>> > +   case GIMPLE_UNARY_RHS:
>> > + {
>> > +   tree op0 = instantiate_scev_r (instantiate_below, 
>> > evolution_loop,
>> > +  

Re: [PATCH] Add -fsanitize=pointer-{compare,subtract}.

2017-10-12 Thread Jakub Jelinek
On Thu, Oct 12, 2017 at 01:30:34PM +0200, Martin Liška wrote:
> There's one false positive I've noticed:
> 
> $ cat /tmp/ptr-cmp.c
> int
> __attribute__((noinline))
> foo(char *p1, char *p2)
> {
>   if (p2 != 0 && p1 > p2)
> return 0;
> 
>   return 1;
> }

Guess that is an argument for instrumenting pointer-compare/pointer-subtract
earlier (in the FEs, perhaps into internal-fn).

Because then it will have side-effects and thus folding (generic as well as
during gimplification and on early gimple) will not do this kind of
optimization with it.
Of course you'd need to handle constexpr and folding in initializers then...

Jakub
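
A minimal illustration of why side effects matter here (an editorial sketch,
not code from any patch; __builtin_printf merely stands in for the proposed
front-end-emitted instrumentation call): GCC only folds a short-circuit
"a && b" into the non-short-circuiting "a & b" when evaluating "b"
unconditionally is safe and free of side effects, so an instrumentation call
attached to the right-hand side keeps the guard intact.

  extern char *p1, *p2;

  int
  guarded (void)
  {
    /* The comma expression gives the right-hand side a side effect, so the
       gimplifier has to keep the short-circuit: p1 > p2 is only evaluated
       when p2 != 0, and the sanitizer call would sit on that same path.  */
    if (p2 != 0 && (__builtin_printf (""), p1 > p2))
      return 0;
    return 1;
  }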


Re: patch to fix PR82353

2017-10-12 Thread Uros Bizjak
Hello!

> The following patch fixes
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82353
>
> LRA did not update hard reg liveness on bb borders for hard regs which are 
> part of insn patterns
> like CFLAGS reg. It was ok for inheritance in EBB which creates only moves 
> and they usually
> have no embedded hard regs in the patterns. But LRA rematerialization needs 
> this. So this patch
> implements such hard reg liveness updates.
>
> The patch was successfully bootstrapped and tested on x86-64.
>
> Committed as rev. 253656.

It looks like this patch caused a libgo bootstrap failure on
x86_64-linux-gnu:

../../../git/gcc/libgo/go/cmd/go/internal/load/pkg.go:377:1: error:
this is the insn:
(insn 1134 2835 2611 195 (parallel [
(set (reg:DI 492)
(const_int 0 [0]))
(set (reg/f:DI 1008 [orig:490  ] [490])
(plus:DI (ashift:DI (reg:DI 492)
(const_int 3 [0x3]))
(reg/f:DI 1008 [orig:490  ] [490])))
(set (reg/f:DI 1009 [880])
(plus:DI (ashift:DI (reg:DI 492)
(const_int 3 [0x3]))
(reg/f:DI 1009 [880])))
(set (mem:BLK (reg/f:DI 1008 [orig:490  ] [490])
[22 *_372+0 S1048 A64])
(mem/c:BLK (reg/f:DI 1009 [880]) [22 GOTMP.492+0 S1048 A128]))
(use (reg:DI 492))
]) "../../../git/gcc/libgo/go/cmd/go/internal/load/pkg.go":679
930 {*rep_movdi_rex64}
 (expr_list:REG_UNUSED (reg/f:DI 1009 [880])
(expr_list:REG_UNUSED (reg/f:DI 1008 [orig:490  ] [490])
(expr_list:REG_UNUSED (reg:DI 492)
(expr_list:REG_EH_REGION (const_int 1 [0x1])
(nil))
during RTL pass: reload
../../../git/gcc/libgo/go/cmd/go/internal/load/pkg.go:377:1: internal
compiler error: in assign_by_spills, at lra-assigns.c:1468
0x6b2709 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
../../git/gcc/gcc/rtl-error.c:108
0xbe87fa assign_by_spills
../../git/gcc/gcc/lra-assigns.c:1468
0xbe94a6 lra_assign()
../../git/gcc/gcc/lra-assigns.c:1662
0xbe48d4 lra(_IO_FILE*)
../../git/gcc/gcc/lra.c:2450
0xb9ec51 do_reload
../../git/gcc/gcc/ira.c:5440
0xb9ec51 execute
../../git/gcc/gcc/ira.c:5624
Please submit a full bug report,

Uros.


Re: [PATCH] Add -fsanitize=pointer-{compare,subtract}.

2017-10-12 Thread Jakub Jelinek
On Thu, Oct 12, 2017 at 01:13:56PM +0200, Martin Liška wrote:
> +  if (a1 == a2)
> +return;
> +
> +  uptr shadow_offset1, shadow_offset2;
> +  bool valid1, valid2;
> +  {
> +ThreadRegistryLock l(());
> +
> +valid1 = GetStackVariableBeginning(a1, _offset1);
> +valid2 = GetStackVariableBeginning(a2, _offset2);
> +  }
> +
> +  if (valid1 && valid2) {
> +if (shadow_offset1 == shadow_offset2)
> +  return;
>}
> +  else if (!valid1 && !valid2) {
> +AsanChunkView chunk1 = FindHeapChunkByAddress(a1);
> +AsanChunkView chunk2 = FindHeapChunkByAddress(a2);
> +valid1 = chunk1.IsAllocated();
> +valid2 = chunk2.IsAllocated();
> +
> +if (valid1 && valid2) {
> +  if (chunk1.Eq(chunk2))
> + return;
> +}
> +else if (!valid1 && !valid2) {
> +  uptr offset = a1 < a2 ? a2 - a1 : a1 - a2;
> +  uptr left = a1 < a2 ? a1 : a2;
> +  if (offset <= 2048) {
> + if (__asan_region_is_poisoned (left, offset) == 0)
> +   return;
> + else {
> +   GET_CALLER_PC_BP_SP;
> +   ReportInvalidPointerPair(pc, bp, sp, a1, a2);
> +   return;
> + }

My assumption was that this offset/left computation would be done
right after the if (a1 == a2) above, i.e. after the most common
case (equality comparison), so that pointers close to each other are handled
without any expensive locking or data-structure lookups (also, you only need
to compute left if offset is <= 2048).  Only for larger distances would you
go through the other cases, i.e. see whether one or both pointers point
into a stack variable, the heap, or a registered global variable.

For those 3 cases, I wonder if pairs of valid1 = ...; valid2 = ...;
is what is most efficient.  What we really want is to error out if
one pointer is valid in any of the categories, but the other is
not.  So, isn't the best general algorithm something like:
  if (a1 == a2)
return;
  if (distance <= 2048)
{
  if (not poisoned area)
return;
}
  else if (heap (a1))
{
  if (heap (a2) && same_heap_object)
return;
}
  else if (stackvar (a1))
{
  if (stackvar (a2) && same_stackvar)
return;
}
  else if (globalvar (a1))
{
  if (globalvar (a2) && same_globalvar)
return;
}
  else
return; /* ??? We don't know what the pointers point to, punt.
   Or perhaps try the not poisoned area check even for
   slightly larger distances (like 16K) and only punt
   for larger?  */
  error;

?  The order in which the a1 tests should be done depends on how expensive
those tests are, and perhaps on some data gathering on real-world apps
about which kinds of pointers are most common in the comparisons/subtracts.

Jakub
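
To make the ordering sketched above concrete, here is a rough C++ rendering in
terms of the runtime helpers quoted from Martin's patch
(FindHeapChunkByAddress, GetStackVariableBeginning, __asan_region_is_poisoned,
ReportInvalidPointerPair).  This is an editorial sketch of the proposed check
order, not committed code: the function name is made up, the global-variable
case and the thread-registry locking are elided, and it is assumed to live
inside the asan runtime where uptr and AsanChunkView are visible.

  static void CheckInvalidPointerPair (uptr pc, uptr bp, uptr sp,
                                       uptr a1, uptr a2) {
    if (a1 == a2)
      return;  // Most common case: equality comparison.
    uptr left = a1 < a2 ? a1 : a2;
    uptr offset = a1 < a2 ? a2 - a1 : a1 - a2;
    if (offset <= 2048) {
      // Cheap path: one shadow-memory scan, no locking or lookups.
      if (__asan_region_is_poisoned (left, offset) == 0)
        return;
    } else {
      AsanChunkView chunk1 = FindHeapChunkByAddress (a1);
      if (chunk1.IsAllocated ()) {
        AsanChunkView chunk2 = FindHeapChunkByAddress (a2);
        if (chunk2.IsAllocated () && chunk1.Eq (chunk2))
          return;  // Same heap object.
      } else {
        uptr begin1, begin2;
        if (GetStackVariableBeginning (a1, &begin1)) {
          if (GetStackVariableBeginning (a2, &begin2) && begin1 == begin2)
            return;  // Same stack variable.
        } else {
          // Neither heap nor stack (global case elided here): punt, as the
          // final branch of the pseudo code above does.
          return;
        }
      }
    }
    ReportInvalidPointerPair (pc, bp, sp, a1, a2);
  }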


Re: [PATCH] Add -fsanitize=pointer-{compare,subtract}.

2017-10-12 Thread Martin Liška
Hi.

There's one false positive I've noticed:

$ cat /tmp/ptr-cmp.c
int
__attribute__((noinline))
foo(char *p1, char *p2)
{
  if (p2 != 0 && p1 > p2)
return 0;

  return 1;
}

int main(int argc, char **argv)
{
  return foo(argv[0], 0);
}

$  gcc /tmp/ptr-cmp.c -fsanitize=address,pointer-compare -O2 
-fdump-tree-asan1=/dev/stdout && ./a.out

__attribute__((noinline))
foo (char * p1, char * p2)
{
  _Bool _1;
  _Bool _2;
  _Bool _3;
  _Bool _8;
  int _9;

   [100.00%] [count: INV]:
  _1 = p2_5(D) != 0B;
  __builtin___sanitizer_ptr_cmp (p2_5(D), p1_6(D));
  _2 = p2_5(D) < p1_6(D);
  _3 = _1 & _2;
  _8 = ~_3;
  _9 = (int) _8;
  return _9;

}

==31859==ERROR: AddressSanitizer: invalid-pointer-pair: 0x 
0x7ffccadb4ff9
#0 0x400756 in foo 
(/home/marxin/Programming/postgres/src/pl/plpgsql/src/a.out+0x400756)
#1 0x1513cde71f49 in __libc_start_main (/lib64/libc.so.6+0x20f49)
#2 0x400689 in _start 
(/home/marxin/Programming/postgres/src/pl/plpgsql/src/a.out+0x400689)

Reading the dump files, the folding has already happened in the gimple dump:
cat ptr-cmp.c.004t.gimple
__attribute__((noinline))
foo (char * p1, char * p2)
{
  int D.2181;

  _1 = p2 != 0B;
  _2 = p1 > p2;
  _3 = _1 & _2;
  if (_3 != 0) goto ; else goto ;
...

Thoughts?
Martin
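
In other words, by the time the dumps above are produced the short-circuit
condition has already been rewritten into a non-short-circuiting form, roughly
equivalent to the following C (an editorial paraphrase of the dump, not code
from the testcase), which is why the instrumented comparison is executed even
when p2 is null:

  int
  foo (char *p1, char *p2)
  {
    int not_null = p2 != 0;
    /* Evaluated unconditionally, so the __builtin___sanitizer_ptr_cmp call
       inserted in front of it fires for p2 == 0 as well.  */
    int greater = p1 > p2;
    return !(not_null & greater);
  }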


Re: [PATCH] Add -fsanitize=pointer-{compare,subtract}.

2017-10-12 Thread Martin Liška
On 10/11/2017 04:22 PM, Jakub Jelinek wrote:
> On Wed, Oct 11, 2017 at 03:36:40PM +0200, Martin Liška wrote:
>>> std::swap(addr1, addr2); ?  I don't see it used in any of libsanitizer
>>> though, so not sure if the corresponding STL header is included.
>>
>> They don't use it anywhere and I had some #include issues. That's why I did 
>> it manually.
> 
> Ok.
> 
>>> There are many kinds of shadow memory markings.  My thought was that it
>>> would start with a quick check, perhaps vectorized by hand (depending on if
>>> the arch has unaligned loads maybe without or with a short loop for
>>
>> Did that, but I have no experience how to make decision about prologue that 
>> will
>> align the pointer? Any examples?
> 
> First of all, why are you aligning anything?
>> +  uptr aligned_addr1 = addr1 & ~(SANITIZER_WORDSIZE/8 - 1);  // align addr.
>> +  uptr aligned_addr2 = addr2 & ~(SANITIZER_WORDSIZE/8 - 1);  // align addr.
> You want to compare what the pointers point to, not what the aligned pointer
> points to.
> 
> Actually, looking around, there already is __sanitizer::mem_is_zero
> which does what we want.
> 
> Or even __asan_region_is_poisoned(addr1, addr2 - addr1).

Hi.

Thanks, the function works fine. I added various tests for global variables and
found that my check with GetGlobalAddressInformation was wrong.  Thus I'm adding
bool GlobalAddressDescription::PointsInsideASameVariable.

Another change I did is:
gcc -fsanitize=pointer-compare /tmp/main.c
cc1: error: ‘-fsanitize=pointer-compare’ must be combined with 
‘-fsanitize=address’ or ‘-fsanitize=kernel-address’

That would make life easier in gcc.c, where one would otherwise have to distinguish
between -fsanitize=pointer-compare and -fsanitize=pointer-compare,kernel-address
and decide accordingly whether -lasan is added.  Would it be acceptable to require
it explicitly?

I'm adding some numbers for postgres:

  1 SUMMARY: AddressSanitizer: invalid-pointer-pair 
/home/marxin/Programming/postgres/src/backend/utils/mmgr/freepage.c:673 in 
FreePageBtreeCleanup
  1 SUMMARY: AddressSanitizer: invalid-pointer-pair 
/home/marxin/Programming/postgres/src/port/qsort_arg.c:168 in qsort_arg
  2 SUMMARY: AddressSanitizer: invalid-pointer-pair 
/home/marxin/Programming/postgres/src/backend/regex/rege_dfa.c:316 in matchuntil
  3 SUMMARY: AddressSanitizer: invalid-pointer-pair 
/home/marxin/Programming/postgres/src/backend/utils/mmgr/freepage.c:1543 in 
FreePageManagerPutInternal
  4 SUMMARY: AddressSanitizer: invalid-pointer-pair 
/home/marxin/Programming/postgres/src/backend/access/gin/gindatapage.c:155 in 
GinDataLeafPageGetItems
  4 SUMMARY: AddressSanitizer: invalid-pointer-pair 
/home/marxin/Programming/postgres/src/common/base64.c:160 in pg_b64_decode
  4 SUMMARY: AddressSanitizer: invalid-pointer-pair 
/home/marxin/Programming/postgres/src/common/keywords.c:103 in ScanKeywordLookup
 14 SUMMARY: AddressSanitizer: invalid-pointer-pair 
/home/marxin/Programming/postgres/src/backend/access/hash/hashovfl.c:768 in 
_hash_initbitmapbuffer
 18 SUMMARY: AddressSanitizer: invalid-pointer-pair 
/home/marxin/Programming/postgres/src/backend/utils/mmgr/freepage.c:187 in 
FreePageManagerInitialize
 26 SUMMARY: AddressSanitizer: invalid-pointer-pair 
/home/marxin/Programming/postgres/src/port/qsort.c:188 in pg_qsort
 40 SUMMARY: AddressSanitizer: invalid-pointer-pair 
/home/marxin/Programming/postgres/src/backend/utils/mmgr/dsa.c:1358 in init_span
 43 SUMMARY: AddressSanitizer: invalid-pointer-pair 
/home/marxin/Programming/postgres/src/backend/utils/mmgr/freepage.c:1388 in 
FreePageManagerGetInternal
 87 SUMMARY: AddressSanitizer: invalid-pointer-pair 
/home/marxin/Programming/postgres/src/pl/plpgsql/src/pl_scanner.c:666 in 
plpgsql_location_to_lineno
 92 SUMMARY: AddressSanitizer: invalid-pointer-pair 
/home/marxin/Programming/postgres/src/port/qsort.c:168 in pg_qsort
154 SUMMARY: AddressSanitizer: invalid-pointer-pair 
/home/marxin/Programming/postgres/src/backend/storage/buffer/bufmgr.c:385 in 
ForgetPrivateRefCountEntry
273 SUMMARY: AddressSanitizer: invalid-pointer-pair 
/home/marxin/Programming/postgres/src/backend/storage/ipc/shm_toc.c:182 in 
shm_toc_insert
   1158 SUMMARY: AddressSanitizer: invalid-pointer-pair 
/home/marxin/Programming/postgres/src/backend/regex/rege_dfa.c:153 in longest
   3906 SUMMARY: AddressSanitizer: invalid-pointer-pair 
/home/marxin/Programming/postgres/src/backend/nodes/tidbitmap.c:840 in 
tbm_prepare_shared_iterate
   6545 SUMMARY: AddressSanitizer: invalid-pointer-pair 
/home/marxin/Programming/postgres/src/backend/utils/adt/formatting.c:4582 in 
NUM_numpart_to_char

There are some comparisons with NULL:
==28245==ERROR: AddressSanitizer: invalid-pointer-pair: 0x625000e42139 
0x

and quite a few of:
Address 0x1465a178e5e0 is a wild pointer.

That is quite an interesting cause; I'll investigate where that memory comes from.

Thanks,
Martin

> 
>   Jakub
> 


Re: [PATCH][GRAPHITE] Fix PR82451 (and PR82355 in a different way)

2017-10-12 Thread Richard Biener
On Thu, 12 Oct 2017, Bin.Cheng wrote:

> On Wed, Oct 11, 2017 at 3:43 PM, Richard Biener  wrote:
> >
> > For PR82355 I introduced a fake dimension to ISL to allow CHRECs
> > having an evolution in a loop that isn't fully part of the SESE
> > region we are processing.  That was easier than fending off those
> > CHRECs (without simply giving up on SESE regions with those).
> >
> > But it didn't fully solve the issue as PR82451 shows where we run
> > into the issue that we eventually have to code-gen those
> > evolutions and thus in theory need a canonical IV of that containing loop.
> >
> > So I decided (after Micha pressuring me a bit...) to revisit the
> > original issue and make SCEV analysis "properly" handle SE regions.
> > It turns out that it is mostly instantiate_scev lacking proper support
> > plus the necessary interfacing change (really just cosmetic in some sense)
> > from a instantiate_before basic-block to a instantiate_before edge.
> >
> > data-ref interfaces have been similarly adjusted, here changing
> > the "loop nest" loop parameter to the entry edge for the SE region
> > and passing that down accordingly.
> >
> > I've for now tried to keep other high-level loop-based interfaces the
> > same by simply using the loop preheader edge as entry where appropriate
> > (needing loop_preheader_edge cope with the loop root tree for simplicity).
> >
> > In the process I ran into issues with us too overly aggressive
> > instantiating random trees and thus I cut those down.  That part
> > doesn't successfully test separately (when I remove the strange
> > ARRAY_REF instantiation), so it's part of this patch.  I've also
> > run into an SSA verification fail (the id-27.f90 testcase) which
> > shows we _do_ need to clear the SCEV cache after introducing
> > the versioned CFG (and added a comment before it).
> >
> > On the previously failing testcases I've verified we produce
> > sensible instantiations for those pesky refs residing in "no" loop
> > in the SCOP and that we get away with the result in terms of
> > optimizing.
> >
> > SPEC 2k6 testing shows
> >
> > loop nest optimized: 311
> > loop nest not optimized, code generation error: 0
> > loop nest not optimized, optimized schedule is identical to original
> > schedule: 173
> > loop nest not optimized, optimization timed out: 59
> > loop nest not optimized, ISL signalled an error: 9
> > loop nest: 552
> >
> > for SPEC 2k6 and -floop-nest-optimize while adding -fgraphite-identity
> > still reveals some codegen errors:
> >
> > loop nest optimized: 437
> > loop nest not optimized, code generation error: 25
> > loop nest not optimized, optimized schedule is identical to original
> > schedule: 169
> > loop nest not optimized, optimization timed out: 60
> > loop nest not optimized, ISL signalled an error: 9
> > loop nest: 700
> >
> > Bootstrap and regtest in progress on x86_64-unknown-linux-gnu
> > (with and without -fgraphite-identity -floop-nest-optimize).
> >
> > Ok?
> >
> > Thanks,
> > Richard.
> >
> 
> > Index: gcc/tree-scalar-evolution.c
> > ===
> > --- gcc/tree-scalar-evolution.c (revision 253645)
> > +++ gcc/tree-scalar-evolution.c (working copy)
> > @@ -2344,7 +2348,7 @@ static tree instantiate_scev_r (basic_bl
> > instantiated, and to stop if it exceeds some limit.  */
> >
> >  static tree
> > -instantiate_scev_name (basic_block instantiate_below,
> > +instantiate_scev_name (edge instantiate_below,
> >struct loop *evolution_loop, struct loop *inner_loop,
> >tree chrec,
> >bool *fold_conversions,
> > @@ -2358,7 +2362,7 @@ instantiate_scev_name (basic_block insta
> >   evolutions in outer loops), nothing to do.  */
> >if (!def_bb
> >|| loop_depth (def_bb->loop_father) == 0
> > -  || dominated_by_p (CDI_DOMINATORS, instantiate_below, def_bb))
> > +  || ! dominated_by_p (CDI_DOMINATORS, def_bb, 
> > instantiate_below->dest))
> >  return chrec;
> >
> >/* We cache the value of instantiated variable to avoid exponential
> > @@ -2380,6 +2384,51 @@ instantiate_scev_name (basic_block insta
> >
> >def_loop = find_common_loop (evolution_loop, def_bb->loop_father);
> >
> > +  if (! dominated_by_p (CDI_DOMINATORS,
> > +   def_loop->header, instantiate_below->dest))
> > +{
> > +  gimple *def = SSA_NAME_DEF_STMT (chrec);
> > +  if (gassign *ass = dyn_cast  (def))
> > +   {
> > + switch (gimple_assign_rhs_class (ass))
> > +   {
> > +   case GIMPLE_UNARY_RHS:
> > + {
> > +   tree op0 = instantiate_scev_r (instantiate_below, 
> > evolution_loop,
> > +  inner_loop, 
> > gimple_assign_rhs1 (ass),
> > +  fold_conversions, size_expr);
> > +   if (op0 == chrec_dont_know)
> > + 

Re: [PATCH 2/2] PR libgcc/59714 complex division is surprising on aarch64

2017-10-12 Thread Richard Earnshaw
On 12/10/17 06:21, vladimir.mezent...@oracle.com wrote:
> From: Vladimir Mezentsev 
> 
> FMA (floating-point multiply-add) instructions are supported on aarch64.
> These instructions can produce a different result than when the two operations
> are executed separately.
> -ffp-contract=off doesn't allow the FMA instructions.
> 
> Tested on aarch64-linux-gnu.
> No regression. Two failed tests now passed.
> 
> ChangeLog:
> 2017-10-11  Vladimir Mezentsev  
> 
> PR libgcc/59714
> * libgcc/config/aarch64/t-aarch64 (HOST_LIBGCC2_CFLAGS): Add -ffp-contract=off
> ---
>  libgcc/config/aarch64/t-aarch64 | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/libgcc/config/aarch64/t-aarch64 b/libgcc/config/aarch64/t-aarch64
> index 3af933c..e33bef0 100644
> --- a/libgcc/config/aarch64/t-aarch64
> +++ b/libgcc/config/aarch64/t-aarch64
> @@ -18,4 +18,5 @@
>  # along with GCC; see the file COPYING3.  If not see
>  # .
>  
> +HOST_LIBGCC2_CFLAGS += -ffp-contract=off
>  LIB2ADD += $(srcdir)/config/aarch64/sync-cache.c
> 

Why would we want to do this on AArch64 only?  If it's right for us,
then surely it would be right for everyone and the option should be
applied at the top level.

Hint: I'm not convinced by the evidence here that it is right across the
whole of libgcc.  Before moving forward on this particular PR I think we
need to understand what behaviour we do want out of the compiler.

R.
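
For readers unfamiliar with the underlying numerical issue, the classic
illustration of why contraction changes results is the following (an editorial
example, not taken from the PR; the complex-division failures in PR 59714 are
a variation on the same effect):

  /* With contraction enabled the compiler may turn a*a - p into
     fma (a, a, -p), which computes a*a exactly before subtracting, so the
     result is the rounding error of a*a rather than exactly 0.0.  With
     -ffp-contract=off both products round identically and the function
     always returns 0.0.  */
  double
  residue (double a)
  {
    double p = a * a;
    return a * a - p;
  }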


Re: [PATCH][GRAPHITE] Fix PR82451 (and PR82355 in a different way)

2017-10-12 Thread Bin.Cheng
On Wed, Oct 11, 2017 at 3:43 PM, Richard Biener  wrote:
>
> For PR82355 I introduced a fake dimension to ISL to allow CHRECs
> having an evolution in a loop that isn't fully part of the SESE
> region we are processing.  That was easier than fending off those
> CHRECs (without simply giving up on SESE regions with those).
>
> But it didn't fully solve the issue as PR82451 shows where we run
> into the issue that we eventually have to code-gen those
> evolutions and thus in theory need a canonical IV of that containing loop.
>
> So I decided (after Micha pressuring me a bit...) to revisit the
> original issue and make SCEV analysis "properly" handle SE regions.
> It turns out that it is mostly instantiate_scev lacking proper support
> plus the necessary interfacing change (really just cosmetic in some sense)
> from a instantiate_before basic-block to a instantiate_before edge.
>
> data-ref interfaces have been similarly adjusted, here changing
> the "loop nest" loop parameter to the entry edge for the SE region
> and passing that down accordingly.
>
> I've for now tried to keep other high-level loop-based interfaces the
> same by simply using the loop preheader edge as entry where appropriate
> (needing loop_preheader_edge cope with the loop root tree for simplicity).
>
> In the process I ran into issues with us too overly aggressive
> instantiating random trees and thus I cut those down.  That part
> doesn't successfully test separately (when I remove the strange
> ARRAY_REF instantiation), so it's part of this patch.  I've also
> run into an SSA verification fail (the id-27.f90 testcase) which
> shows we _do_ need to clear the SCEV cache after introducing
> the versioned CFG (and added a comment before it).
>
> On the previously failing testcases I've verified we produce
> sensible instantiations for those pesky refs residing in "no" loop
> in the SCOP and that we get away with the result in terms of
> optimizing.
>
> SPEC 2k6 testing shows
>
> loop nest optimized: 311
> loop nest not optimized, code generation error: 0
> loop nest not optimized, optimized schedule is identical to original
> schedule: 173
> loop nest not optimized, optimization timed out: 59
> loop nest not optimized, ISL signalled an error: 9
> loop nest: 552
>
> for SPEC 2k6 and -floop-nest-optimize while adding -fgraphite-identity
> still reveals some codegen errors:
>
> loop nest optimized: 437
> loop nest not optimized, code generation error: 25
> loop nest not optimized, optimized schedule is identical to original
> schedule: 169
> loop nest not optimized, optimization timed out: 60
> loop nest not optimized, ISL signalled an error: 9
> loop nest: 700
>
> Bootstrap and regtest in progress on x86_64-unknown-linux-gnu
> (with and without -fgraphite-identity -floop-nest-optimize).
>
> Ok?
>
> Thanks,
> Richard.
>

> Index: gcc/tree-scalar-evolution.c
> ===
> --- gcc/tree-scalar-evolution.c (revision 253645)
> +++ gcc/tree-scalar-evolution.c (working copy)
> @@ -2344,7 +2348,7 @@ static tree instantiate_scev_r (basic_bl
> instantiated, and to stop if it exceeds some limit.  */
>
>  static tree
> -instantiate_scev_name (basic_block instantiate_below,
> +instantiate_scev_name (edge instantiate_below,
>struct loop *evolution_loop, struct loop *inner_loop,
>tree chrec,
>bool *fold_conversions,
> @@ -2358,7 +2362,7 @@ instantiate_scev_name (basic_block insta
>   evolutions in outer loops), nothing to do.  */
>if (!def_bb
>|| loop_depth (def_bb->loop_father) == 0
> -  || dominated_by_p (CDI_DOMINATORS, instantiate_below, def_bb))
> +  || ! dominated_by_p (CDI_DOMINATORS, def_bb, instantiate_below->dest))
>  return chrec;
>
>/* We cache the value of instantiated variable to avoid exponential
> @@ -2380,6 +2384,51 @@ instantiate_scev_name (basic_block insta
>
>def_loop = find_common_loop (evolution_loop, def_bb->loop_father);
>
> +  if (! dominated_by_p (CDI_DOMINATORS,
> +   def_loop->header, instantiate_below->dest))
> +{
> +  gimple *def = SSA_NAME_DEF_STMT (chrec);
> +  if (gassign *ass = dyn_cast  (def))
> +   {
> + switch (gimple_assign_rhs_class (ass))
> +   {
> +   case GIMPLE_UNARY_RHS:
> + {
> +   tree op0 = instantiate_scev_r (instantiate_below, 
> evolution_loop,
> +  inner_loop, gimple_assign_rhs1 
> (ass),
> +  fold_conversions, size_expr);
> +   if (op0 == chrec_dont_know)
> + return chrec_dont_know;
> +   res = fold_build1 (gimple_assign_rhs_code (ass),
> +  TREE_TYPE (chrec), op0);
> +   break;
> + }
> +   case GIMPLE_BINARY_RHS:
> + {
> +   

[PATCH][GRAPHITE] Fix PR82525

2017-10-12 Thread Richard Biener

The following avoids code-generation errors for modulo operations that
result from our own constraints and that end up as no-ops because the
type we code-generate in already imposes the modulo operation.

For the case in SPEC 2k6 triggering this we'd even know the
modulo constraint isn't necessary - we have

 int64_t lower, upper;
 if (lower < upper)
   uint64_t niter = (uint64_t)upper - (uint64_t)lower;

but there's no way to represent in GIMPLE that subtracting
the two signed values will yield a positive value that fits
in the corresponding unsigned type...  We'd need something
like a MINUSU_EXPR (akin to the often-proposed ABSU_EXPR or
the proposed POINTER_DIFF_EXPR).
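
Spelled out as valid C, the fragment above amounts to the following range fact
(an editorial illustration, not code from the patch):

  #include <stdint.h>

  uint64_t
  niter (int64_t lower, int64_t upper)
  {
    if (lower < upper)
      /* Because lower < upper, the wrapping subtraction below equals the
         mathematical distance upper - lower, which always fits in uint64_t,
         so an additional "mod 2^64" constraint on it is a no-op -- exactly
         the fact GIMPLE cannot currently express without something like
         MINUSU_EXPR.  */
      return (uint64_t) upper - (uint64_t) lower;
    return 0;
  }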

This fixes the last code generation errors with SPEC CPU 2006
and -fgraphite-identity -floop-nest-optimize.

loop nest optimized: 483
loop nest not optimized, code generation error: 0
loop nest not optimized, optimized schedule is identical to original 
schedule: 173
loop nest not optimized, optimization timed out: 60
loop nest not optimized, ISL signalled an error: 9
loop nest: 725

Note that we still (with and without this patch) get miscompares
in 465.tonto, 416.gamess and 403.gcc (we still have that "wrong"
constraint issue leading to empty domains, if you remember).

Bootstrap and regtest running on x86_64-unknown-linux-gnu, ok?

Thanks,
Richard.

2017-10-12  Richard Biener  

PR tree-optimization/82525
* graphite-isl-ast-to-gimple.c
(translate_isl_ast_to_gimple::widest_int_from_isl_expr_int): Split
out from ...
(translate_isl_ast_to_gimple::gcc_expression_from_isl_expr_int): Here.
Fail code generation when we cannot represent the isl integer.
(binary_op_to_tree): Elide modulo operations that are no-ops
in the type we code generate.  Remove now superfluous code
generation errors.

* gcc.dg/graphite/id-30.c: New testcase.
* gfortran.dg/graphite/id-28.f90: Likewise.

Index: gcc/graphite-isl-ast-to-gimple.c
===
--- gcc/graphite-isl-ast-to-gimple.c(revision 253645)
+++ gcc/graphite-isl-ast-to-gimple.c(working copy)
@@ -177,6 +177,7 @@ class translate_isl_ast_to_gimple
   tree gcc_expression_from_isl_ast_expr_id (tree type,
__isl_keep isl_ast_expr *expr_id,
ivs_params );
+  widest_int widest_int_from_isl_expr_int (__isl_keep isl_ast_expr *expr);
   tree gcc_expression_from_isl_expr_int (tree type,
 __isl_take isl_ast_expr *expr);
   tree gcc_expression_from_isl_expr_op (tree type,
@@ -265,29 +266,46 @@ gcc_expression_from_isl_ast_expr_id (tre
   return fold_convert (type, *val);
 }
 
-/* Converts an isl_ast_expr_int expression E to a GCC expression tree of
-   type TYPE.  */
+/* Converts an isl_ast_expr_int expression E to a widest_int.
+   Raises a code generation error when the constant doesn't fit.  */
 
-tree translate_isl_ast_to_gimple::
-gcc_expression_from_isl_expr_int (tree type, __isl_take isl_ast_expr *expr)
+widest_int translate_isl_ast_to_gimple::
+widest_int_from_isl_expr_int (__isl_keep isl_ast_expr *expr)
 {
   gcc_assert (isl_ast_expr_get_type (expr) == isl_ast_expr_int);
   isl_val *val = isl_ast_expr_get_val (expr);
   size_t n = isl_val_n_abs_num_chunks (val, sizeof (HOST_WIDE_INT));
   HOST_WIDE_INT *chunks = XALLOCAVEC (HOST_WIDE_INT, n);
-  tree res;
-  if (isl_val_get_abs_num_chunks (val, sizeof (HOST_WIDE_INT), chunks) == -1)
-res = NULL_TREE;
-  else
+  if (n > WIDE_INT_MAX_ELTS
+  || isl_val_get_abs_num_chunks (val, sizeof (HOST_WIDE_INT), chunks) == -1)
 {
-  widest_int wi = widest_int::from_array (chunks, n, true);
-  if (isl_val_is_neg (val))
-   wi = -wi;
-  res = wide_int_to_tree (type, wi);
+  isl_val_free (val);
+  set_codegen_error ();
+  return 0;
 }
+  widest_int wi = widest_int::from_array (chunks, n, true);
+  if (isl_val_is_neg (val))
+wi = -wi;
   isl_val_free (val);
+  return wi;
+}
+
+/* Converts an isl_ast_expr_int expression E to a GCC expression tree of
+   type TYPE.  Raises a code generation error when the constant doesn't fit.  */
+
+tree translate_isl_ast_to_gimple::
+gcc_expression_from_isl_expr_int (tree type, __isl_take isl_ast_expr *expr)
+{
+  widest_int wi = widest_int_from_isl_expr_int (expr);
   isl_ast_expr_free (expr);
-  return res;
+  if (codegen_error_p ())
+return NULL_TREE;
+  if (wi::min_precision (wi, TYPE_SIGN (type)) > TYPE_PRECISION (type))
+{
+  set_codegen_error ();
+  return NULL_TREE;
+}
+  return wide_int_to_tree (type, wi);
 }
 
 /* Converts a binary isl_ast_expr_op expression E to a GCC expression tree of
@@ -296,14 +314,25 @@ gcc_expression_from_isl_expr_int (tree t
 tree translate_isl_ast_to_gimple::
 binary_op_to_tree (tree type, __isl_take isl_ast_expr *expr, ivs_params )
 {
+  enum isl_ast_op_type expr_type = 

Re: [PATCH][RFC] Instrument function exit with __builtin_unreachable in C++.

2017-10-12 Thread Jakub Jelinek
On Thu, Oct 12, 2017 at 10:40:42AM +0200, Martin Liška wrote:
> --- a/gcc/cp/constexpr.c
> +++ b/gcc/cp/constexpr.c
> @@ -1175,7 +1175,12 @@ cxx_eval_builtin_function_call (const constexpr_ctx 
> *ctx, tree t, tree fun,
>   {
> new_call = build_call_array_loc (EXPR_LOCATION (t), TREE_TYPE (t),
>  CALL_EXPR_FN (t), nargs, args);
> -   error ("%q+E is not a constant expression", new_call);
> +
> +   /* Do not allow__builtin_unreachable in constexpr function.  */
> +   if (DECL_FUNCTION_CODE (fun) == BUILT_IN_UNREACHABLE)

As I said earlier, I think it would be better to differentiate between
explicit __builtin_unreachable and the implicitly added one from the patch.
So this could be done as
if (DECL_FUNCTION_CODE (fun) == BUILT_IN_UNREACHABLE
&& EXPR_LOCATION (t) == BUILTINS_LOCATION)

> +  location_t loc = DECL_SOURCE_LOCATION (fndecl);
> +  if (sanitize_flags_p (SANITIZE_RETURN, fndecl))
> +t = ubsan_instrument_return (loc);
> +  else
> +t = build_call_expr_loc (loc, builtin_decl_explicit 
> (BUILT_IN_UNREACHABLE),

and here use BUILTINS_LOCATION instead of loc.
The code might be more readable by doing:
{
  tree fndecl = builtin_decl_explicit (BUILT_IN_UNREACHABLE);
  t = build_call_expr_loc (BUILTINS_LOCATION, fndecl, 0);
}

> +  0);
> +

Jakub


Re: [PATCH][RFC] Instrument function exit with __builtin_unreachable in C++.

2017-10-12 Thread Martin Liška
On 10/11/2017 04:59 PM, Jason Merrill wrote:
> On Thu, Oct 5, 2017 at 12:53 PM, Martin Liška  wrote:
>> On 10/05/2017 05:07 PM, Jason Merrill wrote:
>>> On Thu, Oct 5, 2017 at 6:31 AM, Martin Liška  wrote:
 As discussed 2 days ago on IRC with Jakub and Jonathan, the C++ standard says
 that flowing off the end of a non-void
 function without returning a value is undefined behavior. We've got
 -fsanitize=return check for
 that and we can in such case instrument __builtin_unreachable(). This
 patch does that.
>>>
>>>
>>> Great.
>>>
 And there's still some fallout:

 FAIL: g++.dg/cpp0x/constexpr-diag3.C  -std=c++11  (test for errors, line
 7)
 FAIL: g++.dg/cpp0x/constexpr-neg3.C  -std=c++11  (test for errors, line
 12)
 FAIL: g++.dg/cpp1y/constexpr-return2.C  -std=c++14  (test for errors,
 line 7)
 FAIL: g++.dg/cpp1y/constexpr-return2.C  -std=c++14 (test for excess
 errors)
 FAIL: g++.dg/cpp1y/pr63996.C  -std=c++14  (test for errors, line 9)
 FAIL: g++.dg/cpp1y/pr63996.C  -std=c++14 (test for excess errors)

 1) there are causing:

 ./xg++ -B.
 /home/marxin/Programming/gcc/gcc/testsuite/g++.dg/cpp1y/pr63996.C
 /home/marxin/Programming/gcc/gcc/testsuite/g++.dg/cpp1y/pr63996.C:9:23:
 in constexpr expansion of ‘foo(1)’
 /home/marxin/Programming/gcc/gcc/testsuite/g++.dg/cpp1y/pr63996.C:4:1:
 error: ‘__builtin_unreachable()’ is not a constant expression
   foo (int i)
   ^~~

 Where we probably should not emit the built-in call. Am I right?
>>>
>>>
>>> Or constexpr.c could give a more friendly diagnostic for
>>> __builtin_unreachable.  It's correct to give a diagnostic here for
>>> this testcase.
>>
>>
>> Hi.
>>
>> That's good idea, any suggestion different from:
>>
>> ./xg++ -B.
>> /home/marxin/Programming/gcc2/gcc/testsuite/g++.dg/cpp1y/pr63996.C
>> /home/marxin/Programming/gcc2/gcc/testsuite/g++.dg/cpp1y/pr63996.C:9:23:
>> in constexpr expansion of ‘foo(1)’
>> : error: constexpr can't contain call of a non-return function
>> ‘__builtin_unreachable’
>>
>> which is probably misleading as it points to a function call that is
>> actually missing in source code.
>> Should we distinguish between implicit and explicit __builtin_unreachable?
> 
> Probably without your change the constexpr code already diagnoses the
> missing return as "constexpr call flows off the end of the function";
> that same message seems appropriate.

Hello.

Ok, I've done that. Sending patch candidate.

> 
>> So turning on the warning by default for c++, we get about 500 failing 
>> test-cases. Uf :/
> 
> Yes, we've been sloppy about this in the testsuite.  :(

I'll fix it. To be sure, I'm sending a first part that demonstrates how I
plan to change the tests, with the following order of preference (a short
sketch of both styles follows below):

1) change the function to a void return type if possible
2) return a default value for functions not returning a value (for a complex
   return value of a type A I do { static A a; return a; } -- is that
   appropriate?)
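
As a concrete illustration of the two styles (an editorial sketch; A, f and
g are made-up names, not taken from the testsuite):

  struct A { int i; };

  /* Style 1: the return value was never used, so make the function void.  */
  void f (int x) { if (x) __builtin_abort (); }

  /* Style 2: the function must stay value-returning, so return a default
     value on the path that previously flowed off the end.  */
  A g (int x)
  {
    if (x)
      return A ();
    static A a;
    return a;
  }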

Thanks for the feedback. If this looks fine, I'll carry on.


Martin

> 
> Jason
> 

>From 56c6520d86cba32d04cdbd8873243a7ab801975f Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 12 Oct 2017 10:14:59 +0200
Subject: [PATCH] Instrument function exit with __builtin_unreachable in C++

gcc/c-family/ChangeLog:

2017-10-12  Martin Liska  

	PR middle-end/82404
	* c-opts.c (c_common_post_options): Set -Wreturn-type for C++
	FE.
	* c.opt: Set default value of warn_return_type.

gcc/cp/ChangeLog:

2017-10-12  Martin Liska  

	PR middle-end/82404
	* constexpr.c (cxx_eval_builtin_function_call): Handle
	__builtin_unreachable call.
	* cp-gimplify.c (cp_ubsan_maybe_instrument_return): Rename to
	...
	(cp_maybe_instrument_return): ... this.
	(cp_genericize): Call the function unconditionally.

gcc/fortran/ChangeLog:

2017-10-12  Martin Liska  

	PR middle-end/82404
	* options.c (gfc_post_options): Set default value of
	-Wreturn-type to false.
---
 gcc/c-family/c-opts.c |  3 +++
 gcc/c-family/c.opt|  2 +-
 gcc/cp/constexpr.c|  7 ++-
 gcc/cp/cp-gimplify.c  | 18 --
 gcc/fortran/options.c |  3 +++
 5 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index 3662aa37be6..afea6a3dca3 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -978,6 +978,9 @@ c_common_post_options (const char **pfilename)
 	flag_extern_tls_init = 1;
 }
 
+  if (warn_return_type == -1)
+warn_return_type = c_dialect_cxx () ? 1 : 0;
+
   if (num_in_fnames > 1)
 error ("too many filenames given.  Type %s --help for usage",
 	   progname);
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 13d2a59b8a5..e26fba734c0 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -960,7 +960,7 @@ C++ ObjC++ Var(warn_reorder) Warning LangEnabledBy(C++ ObjC++,Wall)
 Warn when the compiler reorders code.
 
 

Re: [PATCH] Improve x86 and + rotate (PR target/82498)

2017-10-12 Thread Uros Bizjak
On Thu, Oct 12, 2017 at 10:32 AM, Jakub Jelinek  wrote:
> On Thu, Oct 12, 2017 at 08:32:32AM +0200, Uros Bizjak wrote:
>> On Wed, Oct 11, 2017 at 10:59 PM, Jakub Jelinek  wrote:
>> > As can be seen on the testcase below, the *3_mask
>> > insn/splitter is able to optimize only the case when the and is
>> > performed in SImode and then the result subreged into QImode,
>> > while if the computation is already in QImode, we don't handle it.
>> >
>> > Fixed by adding another pattern, bootstrapped/regtested on x86_64-linux and
>> > i686-linux, ok for trunk?
>>
>> We probably want to add this variant to *all* *_mask splitters (there
>> are a few of them in i386.md, please grep for "Avoid useless
>
> Well, not all of them, just the 5 ones that have (subreg:QI (and:SI ...)),
> *jcc_bt_mask expects SImode argument, so doesn't have this kind of
> problem.
>
>> masking"). Which finally begs a question - should we implement this
>> simplification in a generic, target-independent way? OTOH, we already
>
> The target independent way is SHIFT_COUNT_TRUNCATED, but that isn't
> applicable to targets which aren't consistent in their behavior across
> the whole ISA.  x86 has some instructions that truncate the mask and others
> that saturate and others (b* with memory operand) that use something
> still different.
> So we need to apply it only to the instructions which really truncate
> the shift counts and the best infrastructure for that I'm aware is the
> combiner.
>
> In order to have just one set of the *_mask patterns we'd need to
> change simplify_truncation to canonicalize
> (subreg:QI (and:SI (something:SI) (const_int N)) low)
> into
> (and:QI (subreg:QI (something:SI) low) (const_int lowN))
> and change all these patterns; but I'm afraid that is going to break almost
> all WORD_REGISTER_OPERATIONS targets and various others, those actually
> perform all the operations in word mode and that is why the first form
> is actually preferred for them.  Except that if something is already
> QImode there is no need for the subreg at all, which is why we I'm afraid
> need the two sets of mask patterns.
>
> So, if you aren't against it, I can extend the patch to handle the 4
> other mask patterns; as for other modes, SImode is what is being handled
> already, DImode is not a problem, because the FEs truncate the shift counts
> to integer_type_node already, and for HImode I haven't seen problem
> probably because most tunings avoid HImode math and so it isn't worth
> optimizing.

OK, I think that we can live with 4 new patterns. Since these are all
written in the same way (as in the patch you posted), the amended
patch is pre-approved for mainline.

Thanks,
Uros.


Re: [PATCH] Improve x86 and + rotate (PR target/82498)

2017-10-12 Thread Jakub Jelinek
On Thu, Oct 12, 2017 at 08:32:32AM +0200, Uros Bizjak wrote:
> On Wed, Oct 11, 2017 at 10:59 PM, Jakub Jelinek  wrote:
> > As can be seen on the testcase below, the *3_mask
> > insn/splitter is able to optimize only the case when the and is
> > performed in SImode and then the result subreged into QImode,
> > while if the computation is already in QImode, we don't handle it.
> >
> > Fixed by adding another pattern, bootstrapped/regtested on x86_64-linux and
> > i686-linux, ok for trunk?
> 
> We probably want to add this variant to *all* *_mask splitters (there
> are a few of them in i386.md, please grep for "Avoid useless

Well, not all of them, just the 5 ones that have (subreg:QI (and:SI ...)),
*jcc_bt_mask expects SImode argument, so doesn't have this kind of
problem.

> masking"). Which finally begs a question - should we implement this
> simplification in a generic, target-independent way? OTOH, we already

The target independent way is SHIFT_COUNT_TRUNCATED, but that isn't
applicable to targets which aren't consistent in their behavior across
the whole ISA.  x86 has some instructions that truncate the mask and others
that saturate and others (b* with memory operand) that use something
still different.
So we need to apply it only to the instructions which really truncate
the shift counts and the best infrastructure for that I'm aware of is the
combiner.

In order to have just one set of the *_mask patterns we'd need to
change simplify_truncation to canonicalize
(subreg:QI (and:SI (something:SI) (const_int N)) low)
into
(and:QI (subreg:QI (something:SI) low) (const_int lowN))
and change all these patterns; but I'm afraid that is going to break almost
all WORD_REGISTER_OPERATIONS targets and various others, which actually
perform all the operations in word mode, and that is why the first form
is actually preferred for them.  Except that if something is already
QImode there is no need for the subreg at all, which is why I'm afraid we
need the two sets of mask patterns.

So, if you aren't against it, I can extend the patch to handle the 4
other mask patterns; as for other modes, SImode is what is being handled
already, DImode is not a problem, because the FEs truncate the shift counts
to integer_type_node already, and for HImode I haven't seen problem
probably because most tunings avoid HImode math and so it isn't worth
optimizing.

> have SHIFT_COUNT_TRUNCATED and shift_truncation_mask hooks, but last
> time I try the former, there were some problems in the testsuite on
> x86. I guess there are several targets that would benefit from
> removing useless masking of count operands.
> 
> Uros.
> 
> > 2017-10-11  Jakub Jelinek  
> >
> > PR target/82498
> > * config/i386/i386.md (*3_mask_1): New
> > define_insn_and_split.
> >
> > * gcc.target/i386/pr82498.c: New test.

Jakub


RE: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling

2017-10-12 Thread Tsimbalist, Igor V
> Seems reasonable.  As a result something like
> check_missing_nocf_check_attribute is going to just go away along with the
> code in *-typeck.c which called it, right?  If so that seems like a nice 
> cleanup.
Yes, you are right.

Updated patch is attached.

Igor

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Jeff Law
> Sent: Thursday, October 12, 2017 8:07 AM
> To: Tsimbalist, Igor V ; gcc-
> patc...@gcc.gnu.org
> Cc: richard.guent...@gmail.com
> Subject: Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
> 
> On 10/05/2017 04:20 AM, Tsimbalist, Igor V wrote:
> > I would like to implement the patch in a bit different way depending
> > on answers I will get for my following proposals:
> >
> > - I propose to make a type with 'nocf_check' attribute to be different from
> type w/o the attribute.
> >The reason is that the type with 'nocf_check' attribute implies different
> code generation. It will be
> >done by setting affects_type_identity field to true for the attribute. 
> > That
> in turn will trigger
> >needed or expected type checking;
> Seems reasonable.  As a result something like
> check_missing_nocf_check_attribute is going to just go away along with the
> code in *-typeck.c which called it, right?  If so that seems like a nice 
> cleanup.
> 
> 
> >
> > - I propose to ignore the 'nocf_check' attribute if 'fcf-protection' option 
> > is
> not specified and output
> >the warning about this. If there is no instrumentation the type with
> attribute should not be treated
> >differently from type w/o the attribute (see previous item) and should
> not be recorded into the
> >type.
> Seems reasonable.
> >
> > If it's ok, please ignore my previous patch (version#3) and I will post the
> updated patch shortly.
> OK.  FWIW, I think we're ready to ACK on this.  So get it posted :-)
> 
> jeff


0001-Add-generic-part-for-Intel-CET-enabling-fcf-protecti.patch
Description: 0001-Add-generic-part-for-Intel-CET-enabling-fcf-protecti.patch


Re: [PATCH] Improve x86 and + rotate (PR target/82498)

2017-10-12 Thread Uros Bizjak
On Thu, Oct 12, 2017 at 8:39 AM, Uros Bizjak  wrote:
> On Thu, Oct 12, 2017 at 8:32 AM, Uros Bizjak  wrote:
>> On Wed, Oct 11, 2017 at 10:59 PM, Jakub Jelinek  wrote:
>>> Hi!
>>>
>>> As can be seen on the testcase below, the *3_mask
>>> insn/splitter is able to optimize only the case when the and is
>>> performed in SImode and then the result subreged into QImode,
>>> while if the computation is already in QImode, we don't handle it.
>>>
>>> Fixed by adding another pattern, bootstrapped/regtested on x86_64-linux and
>>> i686-linux, ok for trunk?
>>
>> We probably want to add this variant to *all* *_mask splitters (there
>> are a few of them in i386.md, please grep for "Avoid useless
>> masking"). Which finally begs a question - should we implement this
>> simplification in a generic, target-independent way? OTOH, we already
>> have SHIFT_COUNT_TRUNCATED and shift_truncation_mask hooks, but last
>> time I try the former, there were some problems in the testsuite on
>> x86. I guess there are several targets that would benefit from
>> removing useless masking of count operands.
>
> Oh, and there is a strange x86 exception in the comment for
> SHIFT_COUNT_TRUNCATED. I'm not sure what "(real or pretended)
> bit-field operation" means, but variable-count BT instruction with
> non-memory operand (we never generate variable-count BTx with memory
> operand) masks its count operand as well.

I forgot that SSE shifts don't truncate their count operand. This is
the reason the removal of the mask is implemented in the *.md file,
but it would really be nice if the same functionality could be achieved
in a more generic way, without pattern explosion.

Uros.
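
The asymmetry is easy to demonstrate (an editorial illustration, not code from
the thread): scalar shifts use the count modulo the operand width, while the
SSE vector shifts treat any count of at least the element width as shifting
everything out.

  #include <emmintrin.h>

  /* Scalar SHL masks a variable count to 5 bits for 32-bit operands, so the
     explicit "& 31" is redundant at the instruction level -- which is what
     lets the *_mask splitters drop the AND.  */
  unsigned
  scalar_shift (unsigned x, unsigned n)
  {
    return x << (n & 31);
  }

  /* PSLLD does not truncate its count: a count of 33 shifts every element to
     zero, so the same AND-removal would be wrong for vector shifts.  */
  __m128i
  vector_shift (__m128i v)
  {
    return _mm_sll_epi32 (v, _mm_cvtsi32_si128 (33));
  }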


Re: [PATCH] Improve x86 and + rotate (PR target/82498)

2017-10-12 Thread Uros Bizjak
On Thu, Oct 12, 2017 at 8:32 AM, Uros Bizjak  wrote:
> On Wed, Oct 11, 2017 at 10:59 PM, Jakub Jelinek  wrote:
>> Hi!
>>
>> As can be seen on the testcase below, the *3_mask
>> insn/splitter is able to optimize only the case when the and is
>> performed in SImode and then the result subreged into QImode,
>> while if the computation is already in QImode, we don't handle it.
>>
>> Fixed by adding another pattern, bootstrapped/regtested on x86_64-linux and
>> i686-linux, ok for trunk?
>
> We probably want to add this variant to *all* *_mask splitters (there
> are a few of them in i386.md, please grep for "Avoid useless
> masking"). Which finally begs a question - should we implement this
> simplification in a generic, target-independent way? OTOH, we already
> have SHIFT_COUNT_TRUNCATED and shift_truncation_mask hooks, but last
> time I try the former, there were some problems in the testsuite on
> x86. I guess there are several targets that would benefit from
> removing useless masking of count operands.

Oh, and there is a strange x86 exception in the comment for
SHIFT_COUNT_TRUNCATED. I'm not sure what "(real or pretended)
bit-field operation" means, but the variable-count BT instruction with a
non-memory operand (we never generate variable-count BTx with a memory
operand) masks its count operand as well.

Uros.


Re: [PATCH] Improve x86 and + rotate (PR target/82498)

2017-10-12 Thread Uros Bizjak
On Wed, Oct 11, 2017 at 10:59 PM, Jakub Jelinek  wrote:
> Hi!
>
> As can be seen on the testcase below, the *3_mask
> insn/splitter is able to optimize only the case when the and is
> performed in SImode and then the result subreged into QImode,
> while if the computation is already in QImode, we don't handle it.
>
> Fixed by adding another pattern, bootstrapped/regtested on x86_64-linux and
> i686-linux, ok for trunk?

We probably want to add this variant to *all* *_mask splitters (there
are a few of them in i386.md, please grep for "Avoid useless
masking"). Which finally raises a question - should we implement this
simplification in a generic, target-independent way? OTOH, we already
have SHIFT_COUNT_TRUNCATED and shift_truncation_mask hooks, but the last
time I tried the former, there were some problems in the testsuite on
x86. I guess there are several targets that would benefit from
removing useless masking of count operands.

Uros.

> 2017-10-11  Jakub Jelinek  
>
> PR target/82498
> * config/i386/i386.md (*3_mask_1): New
> define_insn_and_split.
>
> * gcc.target/i386/pr82498.c: New test.
>
> --- gcc/config/i386/i386.md.jj  2017-10-10 11:54:11.0 +0200
> +++ gcc/config/i386/i386.md 2017-10-11 19:24:27.673606778 +0200
> @@ -11187,6 +11187,26 @@ (define_insn_and_split "*(clobber (reg:CC FLAGS_REG))])]
>"operands[2] = gen_lowpart (QImode, operands[2]);")
>
> +(define_insn_and_split "*3_mask_1"
> +  [(set (match_operand:SWI48 0 "nonimmediate_operand")
> +   (any_rotate:SWI48
> + (match_operand:SWI48 1 "nonimmediate_operand")
> + (and:QI
> +   (match_operand:QI 2 "register_operand")
> +   (match_operand:QI 3 "const_int_operand"
> +   (clobber (reg:CC FLAGS_REG))]
> +  "ix86_binary_operator_ok (, mode, operands)
> +   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1))
> +  == GET_MODE_BITSIZE (mode)-1
> +   && can_create_pseudo_p ()"
> +  "#"
> +  "&& 1"
> +  [(parallel
> + [(set (match_dup 0)
> +  (any_rotate:SWI48 (match_dup 1)
> +(match_dup 2)))
> +  (clobber (reg:CC FLAGS_REG))])])
> +
>  ;; Implement rotation using two double-precision
>  ;; shift instructions and a scratch register.
>
> --- gcc/testsuite/gcc.target/i386/pr82498.c.jj  2017-10-11 20:21:10.677088306 
> +0200
> +++ gcc/testsuite/gcc.target/i386/pr82498.c 2017-10-11 20:22:31.569101564 
> +0200
> @@ -0,0 +1,52 @@
> +/* PR target/82498 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mtune=generic -masm=att" } */
> +/* { dg-final { scan-assembler-not {\mand[bwlq]\M} } } */
> +
> +unsigned
> +f1 (unsigned x, unsigned char y)
> +{
> +  if (y == 0)
> +return x;
> +  y &= __CHAR_BIT__ * __SIZEOF_INT__ - 1;
> +  return (x << y) | (x >> (__CHAR_BIT__ * __SIZEOF_INT__ - y));
> +}
> +
> +unsigned
> +f2 (unsigned x, unsigned y)
> +{
> +  if (y == 0)
> +return x;
> +  y &= __CHAR_BIT__ * __SIZEOF_INT__ - 1;
> +  return (x << y) | (x >> (__CHAR_BIT__ * __SIZEOF_INT__ - y));
> +}
> +
> +unsigned
> +f3 (unsigned x, unsigned short y)
> +{
> +  if (y == 0)
> +return x;
> +  y &= __CHAR_BIT__ * __SIZEOF_INT__ - 1;
> +  return (x << y) | (x >> (__CHAR_BIT__ * __SIZEOF_INT__ - y));
> +}
> +
> +unsigned
> +f4 (unsigned x, unsigned char y)
> +{
> +  y &= __CHAR_BIT__ * __SIZEOF_INT__ - 1;
> +  return (x << y) | (x >> (-y & (__CHAR_BIT__ * __SIZEOF_INT__ - 1)));
> +}
> +
> +unsigned
> +f5 (unsigned x, unsigned int y)
> +{
> +  y &= __CHAR_BIT__ * __SIZEOF_INT__ - 1;
> +  return (x << y) | (x >> (-y & (__CHAR_BIT__ * __SIZEOF_INT__ - 1)));
> +}
> +
> +unsigned
> +f6 (unsigned x, unsigned short y)
> +{
> +  y &= __CHAR_BIT__ * __SIZEOF_INT__ - 1;
> +  return (x << y) | (x >> (-y & (__CHAR_BIT__ * __SIZEOF_INT__ - 1)));
> +}
>
> Jakub


Re: Make more use of subreg_lowpart_offset

2017-10-12 Thread Jeff Law
On 08/23/2017 04:51 AM, Richard Sandiford wrote:
> This patch uses subreg_lowpart_offset in places that open-coded
> the calculation.  It also uses it in regcprop.c to test whether,
> after a mode change, the first register in a multi-register group
> is still the right one.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu, and by making sure
> that there were no differences in testsuite assembly output for one
> target per CPU.  OK to install?
> 
> Richard
> 
> 
> 2017-08-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * calls.c (expand_call): Use subreg_lowpart_offset.
>   * cse.c (cse_insn): Likewise.
>   * regcprop.c (copy_value): Likewise.
>   (copyprop_hardreg_forward_1): Likewise.
It was far from obvious that the open coded versions actually
corresponded to subreg_lowpart_offset.  Sorry it took so long.

OK.
jeff


Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling

2017-10-12 Thread Jeff Law
On 10/05/2017 04:20 AM, Tsimbalist, Igor V wrote:
> I would like to implement the patch in a bit different way depending on 
> answers I will get for
> my following proposals:
> 
> - I propose to make a type with 'nocf_check' attribute to be different from 
> type w/o the attribute.
>The reason is that the type with 'nocf_check' attribute implies different 
> code generation. It will be
>done by setting affects_type_identity field to true for the attribute. 
> That in turn will trigger
>needed or expected type checking;
Seems reasonable.  As a result something like
check_missing_nocf_check_attribute is going to just go away along with
the code in *-typeck.c which called it, right?  If so that seems like a
nice cleanup.


> 
> - I propose to ignore the 'nocf_check' attribute if 'fcf-protection' option 
> is not specified and output
>the warning about this. If there is no instrumentation the type with 
> attribute should not be treated
>differently from type w/o the attribute (see previous item) and should not 
> be recorded into the
>type.
Seems reasonable.
> 
> If it's ok, please ignore my previous patch (version#3) and I will post the 
> updated patch shortly.
OK.  FWIW, I think we're ready to ACK on this.  So get it posted :-)

jeff