Re: [PATCH] PR fortran/83998 -- fix dot_product on 0-sized arrays

2018-01-24 Thread Thomas Koenig

Hi Steve,

I have a couple of questions before I have to hurry off to work:

First, why is


@@ -2253,22 +2253,19 @@ gfc_simplify_dim (gfc_expr *x, gfc_expr *y)
  gfc_expr*
  gfc_simplify_dot_product (gfc_expr *vector_a, gfc_expr *vector_b)
  {
+  /* If vector_a is a zero-sized array, the result is 0 for INTEGER,
+ REAL, and COMPLEX types and .false. for LOGICAL.  */
+  if (vector_a->shape && mpz_get_si (vector_a->shape[0]) == 0)
+{
+  if (vector_a->ts.type == BT_LOGICAL)
+   return gfc_get_logical_expr (gfc_default_logical_kind, NULL, false);
+  else
+   return gfc_get_int_expr (gfc_default_integer_kind, NULL, 0);
+}


in front of


-  gfc_expr temp;
-
if (!is_constant_array_expr (vector_a)
|| !is_constant_array_expr (vector_b))
  return NULL;


and / or why is the test only done for one variable?

Second, why do you remove this


-  temp.value.op.op = INTRINSIC_NONE;
-  temp.value.op.op1 = vector_a;
-  temp.value.op.op2 = vector_b;
-  gfc_type_convert_binary (&temp, 1);


block of code? What would happen for code like

  integer, dimension(2), parameter :: a = [ 1,2]
  real, dimension(2), parameter :: b = [1.0,2.0]
  real, parameter :: c = dot_product(a,b)

?

Regards

Thomas



Re: New istreambuf_iterator debug check

2018-01-24 Thread Petr Ovtchenkov
On Wed, 24 Jan 2018 21:34:48 +0100
François Dumont  wrote:

> On 24/01/2018 18:53, Petr Ovtchenkov wrote:
> > On Wed, 24 Jan 2018 17:39:59 +0100
> > François Dumont  wrote:
> >
> >> Hi
> >>
> >>       I'd like to propose this new debug check. Comparing with non-eos
> >> istreambuf_iterator sounds like an obvious coding mistake.
> >>
> >>       I propose it despite the stage 1 as it is just a new debug check,
> >> it doesn't impact the lib in normal mode.
> >>
> >>       Tested under Linux x86_64, ok to commit ?
> >>
> >> François
> >>
> > bool
> > equal(const istreambuf_iterator& __b) const
> > -  { return _M_at_eof() == __b._M_at_eof(); }
> > +  {
> > +   bool __this_at_eof = _M_at_eof();
> > +   bool __b_at_eof = __b._M_at_eof();
> > +
> > +   __glibcxx_requires_cond(__this_at_eof || __b_at_eof, _M_message(
> > + "Abnormal comparison to non-end-of-stream istreambuf_iterator"));
> > +   return __this_at_eof == __b_at_eof;
> > +  }
> >
> > Looks strange to me. It is legal and possible that istreambuf_iterator
> > will be in EOF state.
> >
> Sure, but consider rather the associated 3_neg.cc showing the debug 
> check purpose:
> 
>    cistreambuf_iter it1(istrs), it2(istrs);
>    it1 == it2; // No sense
> 

This is what the author wants to say.

Nevertheless, __glibcxx_requires_cond(__this_at_eof || __b_at_eof, ...
in equal looks bogus to me.

--

  - ptr


libbacktrace patch committed: Fix setting str_size on PE/COFF

2018-01-24 Thread Ian Lance Taylor
This libbacktrace patch fixes the setting of str_size on PE/COFF to
not leave some bytes uninitialized on a 64-bit host.  Committed to
mainline.
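
As a minimal sketch of the problem (not the libbacktrace code itself):
copying 4 bytes into a wider size_t overwrites only part of the object,
so the remaining bytes keep whatever was there before.  Reading through
a 32-bit temporary and then widening avoids that.  The helper names
read4 and read_size_field below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value in host byte order from a
   possibly unaligned buffer.  */
static uint32_t
read4 (const unsigned char *p)
{
  uint32_t res;

  memcpy (&res, p, 4);
  return res;
}

/* Widen the 32-bit field to size_t.  Every byte of the result is
   defined, whatever sizeof (size_t) is on the host.  */
static size_t
read_size_field (const unsigned char *p)
{
  return (size_t) read4 (p);
}
```

With a direct memcpy of 4 bytes into an uninitialized 8-byte size_t,
the other 4 bytes would be indeterminate; read_size_field has no such
gap.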

Ian

2018-01-24  Ian Lance Taylor  

* pecoff.c (coff_add): Use coff_read4, not memcpy.
Index: pecoff.c
===
--- pecoff.c(revision 257038)
+++ pecoff.c(working copy)
@@ -727,7 +727,7 @@ coff_add (struct backtrace_state *state,
goto fail;
   syms_view_valid = 1;
 
-  memcpy (&str_size, syms_view.data + syms_size, 4);
+  str_size = coff_read4 (syms_view.data + syms_size);
 
   str_off = syms_off + syms_size;
 


libbacktrace patch committed: Only keep 16 entries on free list

2018-01-24 Thread Ian Lance Taylor
PR 68239 points out that libbacktrace can sometimes take a long time
scanning the list of free memory blocks looking for one that is large
enough.  Since the libbacktrace memory allocator does not have to be
perfect in practice, only keep the 16 largest entries on the free
list.  Bootstrapped and ran libbacktrace and libgo tests on
x86_64-pc-linux-gnu.  Committed to mainline.
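
The idea can be sketched in isolation as follows.  This is only an
illustration under assumptions: struct block and freelist_add are
hypothetical stand-ins, and the limit is passed as a parameter rather
than fixed at 16.

```c
#include <stddef.h>

/* Hypothetical free-list node, standing in for
   struct backtrace_freelist_struct.  */
struct block
{
  struct block *next;
  size_t size;
};

/* Push P onto the list at *HEAD, keeping at most MAX_ENTRIES (>= 1)
   nodes.  When the list is full, the smallest of the existing entries
   and P is dropped.  Returns 1 if P was added, 0 if P was dropped.  */
static int
freelist_add (struct block **head, struct block *p, size_t max_entries)
{
  struct block **pp;
  struct block **ppsmall = NULL;
  size_t count = 0;

  /* One pass: count the entries and remember the link that points at
     the smallest one.  */
  for (pp = head; *pp != NULL; pp = &(*pp)->next)
    {
      if (ppsmall == NULL || (*pp)->size < (*ppsmall)->size)
        ppsmall = pp;
      ++count;
    }

  if (count >= max_entries)
    {
      /* P is no larger than anything already kept: drop P.  */
      if (p->size <= (*ppsmall)->size)
        return 0;
      /* Otherwise unlink the smallest entry to make room.  */
      *ppsmall = (*ppsmall)->next;
    }

  p->next = *head;
  *head = p;
  return 1;
}
```

The scan is linear in the list length, but since the list never grows
past the limit, both freeing and the later search during allocation
stay bounded.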

Ian

2018-01-24  Ian Lance Taylor  

PR other/68239
* mmap.c (backtrace_free_locked): Don't put more than 16 entries
on the free list.
Index: mmap.c
===
--- mmap.c  (revision 257038)
+++ mmap.c  (working copy)
@@ -69,11 +69,33 @@ struct backtrace_freelist_struct
 static void
 backtrace_free_locked (struct backtrace_state *state, void *addr, size_t size)
 {
-  /* Just leak small blocks.  We don't have to be perfect.  */
+  /* Just leak small blocks.  We don't have to be perfect.  Don't put
+ more than 16 entries on the free list, to avoid wasting time
+ searching when allocating a block.  If we have more than 16
+ entries, leak the smallest entry.  */
+
   if (size >= sizeof (struct backtrace_freelist_struct))
 {
+  size_t c;
+  struct backtrace_freelist_struct **ppsmall;
+  struct backtrace_freelist_struct **pp;
   struct backtrace_freelist_struct *p;
 
+  c = 0;
+  ppsmall = NULL;
+  for (pp = &state->freelist; *pp != NULL; pp = &(*pp)->next)
+   {
+ if (ppsmall == NULL || (*pp)->size < (*ppsmall)->size)
+   ppsmall = pp;
+ ++c;
+   }
+  if (c >= 16)
+   {
+ if (size <= (*ppsmall)->size)
+   return;
+ *ppsmall = (*ppsmall)->next;
+   }
+
   p = (struct backtrace_freelist_struct *) addr;
   p->next = state->freelist;
   p->size = size;


[PATCH] PR fortran/83998 -- fix dot_product on 0-sized arrays

2018-01-24 Thread Steve Kargl
All,

The attached patch fixes a regression with dot_product and
zero-sized arrays.  I bootstrapped and regression tested
the patch on x86_64-*-freebsd.  OK to commit?
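
For readers less familiar with the expected result, here is a C
analogue (an illustration only, not gfortran code): a dot product over
zero elements is an empty sum, so it is 0 for numeric types, and the
logical variant is an empty "any", so it is false.  The function names
are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Numeric dot product: the sum over zero elements is 0.  */
static double
dot_real (const double *a, const double *b, size_t n)
{
  double sum = 0.0;
  size_t i;

  for (i = 0; i < n; i++)
    sum += a[i] * b[i];
  return sum;
}

/* Logical dot product, i.e. any (a .and. b): false when n == 0.  */
static bool
dot_logical (const bool *a, const bool *b, size_t n)
{
  bool any = false;
  size_t i;

  for (i = 0; i < n; i++)
    any = any || (a[i] && b[i]);
  return any;
}
```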

2018-01-23  Steven G. Kargl  

PR fortran/83998
* simplify.c (gfc_simplify_dot_product): Deal with zero-sized arrays.

2018-01-23  Steven G. Kargl  

PR fortran/83998
* gfortran.dg/dot_product_4.f90:

-- 
Steve
Index: gcc/fortran/simplify.c
===
--- gcc/fortran/simplify.c	(revision 256953)
+++ gcc/fortran/simplify.c	(working copy)
@@ -2253,22 +2253,19 @@ gfc_simplify_dim (gfc_expr *x, gfc_expr *y)
 gfc_expr*
 gfc_simplify_dot_product (gfc_expr *vector_a, gfc_expr *vector_b)
 {
+  /* If vector_a is a zero-sized array, the result is 0 for INTEGER, 
+ REAL, and COMPLEX types and .false. for LOGICAL.  */
+  if (vector_a->shape && mpz_get_si (vector_a->shape[0]) == 0)
+{
+  if (vector_a->ts.type == BT_LOGICAL)
+	return gfc_get_logical_expr (gfc_default_logical_kind, NULL, false);
+  else
+	return gfc_get_int_expr (gfc_default_integer_kind, NULL, 0);
+}
 
-  gfc_expr temp;
-
   if (!is_constant_array_expr (vector_a)
   || !is_constant_array_expr (vector_b))
 return NULL;
-
-  gcc_assert (vector_a->rank == 1);
-  gcc_assert (vector_b->rank == 1);
-
-  temp.expr_type = EXPR_OP;
-  gfc_clear_ts (&temp.ts);
-  temp.value.op.op = INTRINSIC_NONE;
-  temp.value.op.op1 = vector_a;
-  temp.value.op.op2 = vector_b;
-  gfc_type_convert_binary (&temp, 1);
 
   return compute_dot_product (vector_a, 1, 0, vector_b, 1, 0, true);
 }
Index: gcc/testsuite/gfortran.dg/dot_product_4.f90
===
--- gcc/testsuite/gfortran.dg/dot_product_4.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/dot_product_4.f90	(working copy)
@@ -0,0 +1,13 @@
+! { dg-do run }
+! PR fortran/83998
+program p
+   integer, parameter :: a(0) = 1
+   real, parameter :: b(0) = 1
+   complex, parameter :: c(0) = 1
+   logical, parameter :: d(0) = .true.
+   if (dot_product(a,a) /= 0) call abort
+   if (dot_product(b,b) /= 0) call abort
+   if (dot_product(c,c) /= 0) call abort
+   if (dot_product(d,d) .neqv. .false.) call abort
+end
+


Re: [PATCH], PR target/81550, Rewrite PowerPC loop_align test so it still tests the original target hook

2018-01-24 Thread Segher Boessenkool
On Wed, Jan 24, 2018 at 05:00:39PM -0500, Michael Meissner wrote:
> Replacing 'int' with 'unsigned long' allows the test to succeed once again.  I
> have checked this on a big endian power8 (both 32-bit and 64-bit) and on a
> little endian power8 (64-bit only), and it passes in all three environments.
> Can I install this on the trunk?

Yes, this is OK.  Please install on trunk.  Thanks!


Segher


> [gcc/testsuite]
> 2018-01-24  Michael Meissner  
> 
>   PR target/81550
>   * gcc.target/powerpc/loop_align.c: Use unsigned long for the loop
>   index instead of int, which allows IVOPTs to properly optimize the
>   loop.


[committed] Fix jit.dg/test-alignment* (PR jit/82846)

2018-01-24 Thread David Malcolm
These testcases jit-compile functions that return char, but
were erroneously calling them as if they returned int.

This led to errors for certain target configurations (e.g.
reading from %eax (32-bit) in the harness when only %al (8-bit)
had been written to in the jit-compiled function).
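
A minimal sketch of the corrected pattern, under assumptions: struct
sample, get_c and call_compiled are hypothetical stand-ins, and a
generic function pointer plays the role of the untyped code address
that a JIT lookup hands back.  The point is only that the typedef used
for the call names the same return type as the callee.

```c
/* Hypothetical struct, standing in for the aligned structs in the
   tests.  */
struct sample
{
  char c;
  int pad;
};

/* Stand-in for a jit-compiled function: it returns char.  */
static char
get_c (struct sample *s)
{
  return s->c;
}

/* The pointer type used for the call matches the callee's return
   type.  Declaring it as returning int would make the call read a
   wider value than the callee is required to write.  */
typedef char (*fn_type) (struct sample *);

/* Generic function pointer type used to carry the code address.  */
typedef void (*generic_fn) (void);

static char
call_compiled (generic_fn code, struct sample *s)
{
  fn_type fn = (fn_type) code;  /* Convert back to the real type.  */

  return fn (s);
}
```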

Regression tested on x86_64-pc-linux-gnu, and smoke tested with
"--with-arch=haswell --with-cpu=haswell".

Committed to trunk as r257037.

gcc/testsuite/ChangeLog:
PR jit/82846
* jit.dg/test-alignment.c (create_aligned_code): Fix return type
of "fn_type" typedef.
* jit.dg/test-alignment.cc (verify_aligned_code): Likewise.
---
 gcc/testsuite/jit.dg/test-alignment.c  | 2 +-
 gcc/testsuite/jit.dg/test-alignment.cc | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/jit.dg/test-alignment.c b/gcc/testsuite/jit.dg/test-alignment.c
index 686d981..82328d5 100644
--- a/gcc/testsuite/jit.dg/test-alignment.c
+++ b/gcc/testsuite/jit.dg/test-alignment.c
@@ -166,7 +166,7 @@ create_aligned_code (gcc_jit_context *ctxt, const char *struct_name,
   gcc_jit_result *result,  \
   const char *writer_fn_name)  \
   {\
-  typedef int (*fn_type) (struct TYPENAME *);  \
+  typedef char (*fn_type) (struct TYPENAME *); \
   CHECK_NON_NULL (result); \
\
   struct TYPENAME tmp; \
diff --git a/gcc/testsuite/jit.dg/test-alignment.cc b/gcc/testsuite/jit.dg/test-alignment.cc
index 3e99209..9a09a41 100644
--- a/gcc/testsuite/jit.dg/test-alignment.cc
+++ b/gcc/testsuite/jit.dg/test-alignment.cc
@@ -126,7 +126,7 @@ verify_aligned_code (gcc_jit_context *ctxt,
  gcc_jit_result *result,
  const char *writer_fn_name)
 {
-  typedef int (*fn_type) (T *);
+  typedef char (*fn_type) (T *);
   CHECK_NON_NULL (result);
 
   T tmp;
-- 
1.8.5.3



Re: [PATCH], PR target/81550, Rewrite PowerPC loop_align test so it still tests the original target hook

2018-01-24 Thread Segher Boessenkool
On Wed, Jan 24, 2018 at 03:19:00PM -0500, Michael Meissner wrote:
> On Wed, Jan 24, 2018 at 12:35:38PM -0600, Segher Boessenkool wrote:
> > Although, hrm, in your patch you also change "int i" to "long i"; that
> > alone seems to be enough to fix everything?  Could you check that please?
> 
> Changing i and n to either 'long' or 'long unsigned' makes the test work.
> 
> It is interesting that -mcpu=power7 -mbig does not seem to be able to create
> LFDU and STFDU, but either setting cpu to power8/power9 or setting -mbig to
> -mlittle or -m32 it can generate those instructions.

Yeah, dunno...  I suspect we have some target costs a bit wrong,
influencing the ivopts etc. decisions.

The auto_inc_dec pass says (-mcpu=power7 -mabi=elfv2 -mbig):

   23: r147:DF=[r126:DI]
found mem(23) *(r[126]+0)
trying SIMPLE_PRE_INC
cost failure old=16 new=408

(where -mlittle thinks it is fine; does not say what costs, but the 408
for -mbig looks suspicious of course -- sounds like the call to
rs6000_slow_unaligned_access in rs6000_rtx_costs misfired.  Yet another
reason to use insn_cost instead ;-) )


Segher


Go patch committed: Rationalize external symbol names

2018-01-24 Thread Ian Lance Taylor
This patch to the Go frontend rationalizes the external symbol names
that appear in assembler code.  It changes from the ad hoc mechanisms
used to date to produce a set of names that are at least somewhat more
coherent.  They are also more readable, after applying a simple
demangling algorithm outlined in the long comment in names.cc.  The
new names use only ASCII alphanumeric characters, underscore, and dot
(which fixes AIX by avoiding the use of dollar sign).  If we really
had to we could replace dot with underscore at the cost of forbidding
some uses of underscore in Go identifier names.
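
The character-set constraint itself is easy to state in code.  The
check below is only a sketch of that constraint, not of the mangling
scheme in names.cc; symbol_chars_ok is a hypothetical helper.

```c
#include <ctype.h>
#include <stdbool.h>

/* Return true if NAME is non-empty and uses only ASCII alphanumeric
   characters, underscore, and dot.  */
static bool
symbol_chars_ok (const char *name)
{
  if (*name == '\0')
    return false;

  for (; *name != '\0'; ++name)
    {
      unsigned char ch = (unsigned char) *name;
      bool alnum = ch < 128 && isalnum (ch);

      if (!alnum && ch != '_' && ch != '.')
        return false;
    }
  return true;
}
```

A name containing a dollar sign is rejected by this check, which is
the property that matters for AIX.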

A minor cleanup discovered during this was that we were treating
function types as different if one had a NULL parameters_ field and
another had a non-NULL parameters_ field that has no parameters.  This
worked because we mangled them slightly differently.  We now mangle
them the same, so we treat them as equal, as we should anyhow.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian


2018-01-24  Ian Lance Taylor  

* go.go-torture/execute/names-1.go: New test.
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 256971)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-3488a401e50835de5de5c4f153772ac2798d0e71
+0bbc03f81c862fb35be3edee9824698a7892a17e
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/escape.cc
===
--- gcc/go/gofrontend/escape.cc (revision 256835)
+++ gcc/go/gofrontend/escape.cc (working copy)
@@ -686,42 +686,33 @@ debug_function_name(Named_object* fn)
 
   if (!fn->is_function())
 return Gogo::unpack_hidden_name(fn->name());
-  if (fn->func_value()->enclosing() == NULL)
+
+  std::string fnname = Gogo::unpack_hidden_name(fn->name());
+  if (fn->func_value()->is_method())
 {
-  std::string fnname = Gogo::unpack_hidden_name(fn->name());
-  if (fn->func_value()->is_method())
-{
-  // Methods in gc compiler are named "T.m" or "(*T).m" where
-  // T is the receiver type. Add the receiver here.
-  Type* rt = fn->func_value()->type()->receiver()->type();
-  switch (rt->classification())
-{
-  case Type::TYPE_NAMED:
-fnname = rt->named_type()->name() + "." + fnname;
-break;
-
-  case Type::TYPE_POINTER:
-{
-  Named_type* nt = rt->points_to()->named_type();
-  if (nt != NULL)
-fnname = "(*" + nt->name() + ")." + fnname;
-  break;
-}
-
-  default:
-break;
-}
-}
-  return fnname;
+  // Methods in gc compiler are named "T.m" or "(*T).m" where
+  // T is the receiver type. Add the receiver here.
+  Type* rt = fn->func_value()->type()->receiver()->type();
+  switch (rt->classification())
+   {
+   case Type::TYPE_NAMED:
+ fnname = rt->named_type()->name() + "." + fnname;
+ break;
+
+   case Type::TYPE_POINTER:
+ {
+   Named_type* nt = rt->points_to()->named_type();
+   if (nt != NULL)
+ fnname = "(*" + nt->name() + ")." + fnname;
+   break;
+ }
+
+   default:
+ break;
+   }
 }
 
-  // Closures are named ".$nested#" where # is a global counter. Add outer
-  // function name for better distinguishing. This is also closer to what
-  // gc compiler prints, "outer.func#".
-  Named_object* enclosing = fn->func_value()->enclosing();
-  std::string name = Gogo::unpack_hidden_name(fn->name());
-  std::string outer_name = Gogo::unpack_hidden_name(enclosing->name());
-  return outer_name + "." + name;
+  return fnname;
 }
 
 // Return the name of the current function.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 256835)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -1310,6 +1310,16 @@ Func_descriptor_expression::do_get_backe
   && Linemap::is_predeclared_location(no->location()))
 is_descriptor = true;
 
+  // The runtime package implements some functions defined in the
+  // syscall package.  Let the syscall package define the descriptor
+  // in this case.
+  if (gogo->compiling_runtime()
+  && gogo->package_name() == "runtime"
+  && no->is_function()
+  && !no->func_value()->asm_name().empty()
+  && no->func_value()->asm_name().compare(0, 8, "syscall.") == 0)
+is_descriptor = true;
+
   Btype* btype = this->type()->get_backend(gogo);
 
   Bvariable* bvar;
@@ -6845,7 +6855,8 @@ Bound_method_expression::create_thunk(Go
 
   if (orig_fntype == NULL || !orig_fntype->is_method())
 {

[C++ PATCH] Don't clear TREE_CONSTANT on ADDR_EXPRs (PR c++/83993)

2018-01-24 Thread Jakub Jelinek
Hi!

cxx_eval_outermost_constant_expr clears TREE_CONSTANT on ADDR_EXPRs that
aren't considered by C++ constant expressions, but that breaks middle-end
which relies on TREE_CONSTANT being set on ADDR_EXPR where the address
is constant.

The following patch just special cases ADDR_EXPR not to clear it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

As I wrote in the PR, another option would be to restore TREE_CONSTANT
during genericization if it was cleared earlier in the FE.

2018-01-24  Jakub Jelinek  

PR c++/83993
* constexpr.c (cxx_eval_outermost_constant_expr): Don't clear
TREE_CONSTANT on ADDR_EXPRs.

* g++.dg/init/pr83993-2.C: New test.

--- gcc/cp/constexpr.c.jj   2018-01-24 13:38:40.572913190 +0100
+++ gcc/cp/constexpr.c  2018-01-24 17:03:16.821440047 +0100
@@ -4832,7 +4832,7 @@ cxx_eval_outermost_constant_expr (tree t
 
   if (non_constant_p && !allow_non_constant)
 return error_mark_node;
-  else if (non_constant_p && TREE_CONSTANT (r))
+  else if (non_constant_p && TREE_CONSTANT (r) && TREE_CODE (r) != ADDR_EXPR)
 {
   /* This isn't actually constant, so unset TREE_CONSTANT.  */
   if (EXPR_P (r))
--- gcc/testsuite/g++.dg/init/pr83993-2.C.jj   2018-01-24 17:04:18.823456178 +0100
+++ gcc/testsuite/g++.dg/init/pr83993-2.C   2018-01-24 17:04:39.593454636 +0100
@@ -0,0 +1,14 @@
+// PR c++/83993
+// { dg-do compile }
+// { dg-options "-w" }
+
+int a[5];
+extern int b[];
+int *const c = &a[6];
+int *const d = &b[1];
+
+int
+foo ()
+{
+  return c[-4] + d[-1];
+}

Jakub


[C++ PATCH] Fix constexpr handling of arrays with unknown bound (PR c++/83993)

2018-01-24 Thread Jakub Jelinek
Hi!

In constexpr evaluation of array references for arrays with unknown bounds,
we need to diagnose out of bounds accesses, but really don't know the bounds
at compile time.  Right now GCC will see nelts as error_mark_node + 1 and
will not consider them a constant expression at all.
From the clang commit message it seems that CWG is leaning towards allowing
&array_with_unknown_bound[0] and array_with_unknown_bound, but disallowing
any other indexes (i.e. assume the array could have zero elements).
The following patch implements that.  Bootstrapped/regtested on x86_64-linux
and i686-linux, ok for trunk?
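
As a rough model of the rule for taking the address of an element (a
sketch that mirrors the expectations in the new tests, not the
constexpr.c code): an array with unknown bound is treated as if it had
zero elements, and an index is accepted up to and including one past
the end.  struct array_info and both functions are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an array type.  HAS_BOUND is false for a
   declaration such as "extern const int a[];".  */
struct array_info
{
  bool has_bound;
  size_t nelts;
};

/* Number of elements assumed during evaluation: zero when the bound
   is unknown.  */
static size_t
effective_nelts (const struct array_info *a)
{
  return a->has_bound ? a->nelts : 0;
}

/* Forming &array[idx] is accepted for idx in [0, nelts], where nelts
   itself gives the one-past-the-end address.  */
static bool
address_of_element_ok (const struct array_info *a, size_t idx)
{
  return idx <= effective_nelts (a);
}
```

Under this model index 0 is accepted and index 1 is rejected for an
array with unknown bound, while for an array of 5 elements index 5 is
accepted and index 6 is rejected.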

2018-01-24  Jakub Jelinek  

PR c++/83993
* constexpr.c (diag_array_subscript): Emit different diagnostics
if TYPE_DOMAIN (arraytype) is NULL.
(cxx_eval_array_reference, cxx_eval_store_expression): For arrays
with NULL TYPE_DOMAIN use size_zero_node as nelts.

* g++.dg/init/pr83993-1.C: New test.
* g++.dg/cpp0x/pr83993.C: New test.

--- gcc/cp/constexpr.c.jj   2018-01-19 23:34:04.897278768 +0100
+++ gcc/cp/constexpr.c  2018-01-24 13:38:40.572913190 +0100
@@ -2270,13 +2270,20 @@ diag_array_subscript (const constexpr_ct
   tree sidx = fold_convert (ssizetype, index);
   if (DECL_P (array))
{
- error ("array subscript value %qE is outside the bounds "
-"of array %qD of type %qT", sidx, array, arraytype);
+ if (TYPE_DOMAIN (arraytype))
+   error ("array subscript value %qE is outside the bounds "
+  "of array %qD of type %qT", sidx, array, arraytype);
+ else
+   error ("array subscript value %qE used on array %qD of "
+  "type %qT with unknown bounds", sidx, array, arraytype);
  inform (DECL_SOURCE_LOCATION (array), "declared here");
}
-  else
+  else if (TYPE_DOMAIN (arraytype))
error ("array subscript value %qE is outside the bounds "
   "of array type %qT", sidx, arraytype);
+  else
+   error ("array subscript value %qE used on array of type %qT "
+  "with unknown bounds", sidx, arraytype);
 }
 }
 
@@ -2361,7 +2368,12 @@ cxx_eval_array_reference (const constexp
 
   tree nelts;
   if (TREE_CODE (TREE_TYPE (ary)) == ARRAY_TYPE)
-nelts = array_type_nelts_top (TREE_TYPE (ary));
+{
+  if (TYPE_DOMAIN (TREE_TYPE (ary)))
+   nelts = array_type_nelts_top (TREE_TYPE (ary));
+  else
+   nelts = size_zero_node;
+}
   else if (VECTOR_TYPE_P (TREE_TYPE (ary)))
 nelts = size_int (TYPE_VECTOR_SUBPARTS (TREE_TYPE (ary)));
   else
@@ -3439,7 +3451,12 @@ cxx_eval_store_expression (const constex
  tree nelts, ary;
  ary = TREE_OPERAND (probe, 0);
  if (TREE_CODE (TREE_TYPE (ary)) == ARRAY_TYPE)
-   nelts = array_type_nelts_top (TREE_TYPE (ary));
+   {
+ if (TYPE_DOMAIN (TREE_TYPE (ary)))
+   nelts = array_type_nelts_top (TREE_TYPE (ary));
+ else
+   nelts = size_zero_node;
+   }
  else if (VECTOR_TYPE_P (TREE_TYPE (ary)))
nelts = size_int (TYPE_VECTOR_SUBPARTS (TREE_TYPE (ary)));
  else
--- gcc/testsuite/g++.dg/init/pr83993-1.C.jj   2018-01-24 13:45:43.430864528 +0100
+++ gcc/testsuite/g++.dg/init/pr83993-1.C   2018-01-24 13:44:59.352869530 +0100
@@ -0,0 +1,11 @@
+// PR c++/83993
+// { dg-do compile }
+
+extern const int a[];
+const int *const b = &a[0];
+
+int
+foo ()
+{
+  return b[0];
+}
--- gcc/testsuite/g++.dg/cpp0x/pr83993.C.jj   2018-01-24 14:09:01.846716177 +0100
+++ gcc/testsuite/g++.dg/cpp0x/pr83993.C   2018-01-24 14:08:41.246718212 +0100
@@ -0,0 +1,49 @@
+// PR c++/83993
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+extern const int a[];
+const int b[5] = { 1, 2, 3, 4, 5 };
+extern const int c[4];
+constexpr const int *d = &a[0];
+constexpr const int *d2 = a;
+constexpr const int *e = &a[1]; // { dg-error "array subscript value '1' used on array 'a' of type 'const int \\\[\\\]' with unknown bounds" }
+constexpr const int *f = &b[0];
+constexpr const int *f2 = b;
+constexpr const int *g = &b[5];
+constexpr const int *h = &b[6]; // { dg-error "array subscript value '6' is outside the bounds of array 'b' of type 'const int \\\[5\\\]'" }
+constexpr const int *i = &c[0];
+constexpr const int *i2 = c;
+constexpr const int *j = &c[4];
+constexpr const int *k = &c[5]; // { dg-error "array subscript value '5' is outside the bounds of array 'c' of type 'const int \\\[4\\\]'" }
+extern const int l[];
+
+void
+foo ()
+{
+  extern const int l[3];
+  constexpr const int *m = &l[0];
+  constexpr const int *m2 = l;
+  constexpr const int *n = &l[1];
+  static_assert (m == m2, "");
+}
+
+constexpr const int *m = &l[0];
+constexpr const int *m2 = l;
+constexpr const int *n = [1];// { dg-error "array subscript 
value '1' used on array 'l' of type 'const int \\\[\\\]' 

Re: Fix m68k-linux-gnu libgcc build for ColdFire (PR target/68467)

2018-01-24 Thread Jeff Law
On 01/24/2018 03:24 PM, Joseph Myers wrote:
> PR target/68467 is libgcc failing to build for m68k-linux-gnu
> configured for ColdFire.
> 
> Jeff has an analysis in the PR identifying the problem as resulting
> from the callers of libcalls with 1-byte or 2-byte arguments wanting
> to push just 1 or 2 bytes on the stack, while the libcall
> implementations have the normal C ABI and expect 4-byte arguments.
> For normal C functions, I believe the TARGET_PROMOTE_PROTOTYPES
> definition would ensure such arguments get passed as 4-byte, but that
> does not apply for libcalls.
> 
> This patch fixes the issue by defining TARGET_PROMOTE_FUNCTION_MODE
> for m68k.  The definition is conservative, only applying promotions in
> the case of arguments to libcalls; otherwise it returns the unpromoted
> type, which I believe matches what the default implementation of the
> hook would have done on m68k.
> 
> I have tested that this fixes the libgcc build for ColdFire, and, in
> conjunction with one glibc patch, this enables glibc to build cleanly
> for ColdFire and to pass the compilation parts of the glibc testsuite
> except for one test unrelated to this patch (while glibc and the
> compilation parts of the testsuite continue to build OK for
> non-ColdFire m68k, as expected).  I have *not* run any GCC tests for
> this patch, or any execution tests for m68k.
> 
> OK to commit?
> 
> 2018-01-24  Joseph Myers  
> 
>   PR target/68467
>   * config/m68k/m68k.c (m68k_promote_function_mode): New function.
>   (TARGET_PROMOTE_FUNCTION_MODE): New macro.
So assuming my analysis in the BZ was correct the remaining concern
would be any libcalls that were implemented in assembly code which had
char or short arguments.  Wandering through libgcc, I don't see anything
which fits those constraints (there's lots of stuff that accepts a 32bit
float, but those aren't changed by your patch).

So I think the verdict is this should be OK.

Jeff


Fix m68k-linux-gnu libgcc build for ColdFire (PR target/68467)

2018-01-24 Thread Joseph Myers
PR target/68467 is libgcc failing to build for m68k-linux-gnu
configured for ColdFire.

Jeff has an analysis in the PR identifying the problem as resulting
from the callers of libcalls with 1-byte or 2-byte arguments wanting
to push just 1 or 2 bytes on the stack, while the libcall
implementations have the normal C ABI and expect 4-byte arguments.
For normal C functions, I believe the TARGET_PROMOTE_PROTOTYPES
definition would ensure such arguments get passed as 4-byte, but that
does not apply for libcalls.

This patch fixes the issue by defining TARGET_PROMOTE_FUNCTION_MODE
for m68k.  The definition is conservative, only applying promotions in
the case of arguments to libcalls; otherwise it returns the unpromoted
type, which I believe matches what the default implementation of the
hook would have done on m68k.

I have tested that this fixes the libgcc build for ColdFire, and, in
conjunction with one glibc patch, this enables glibc to build cleanly
for ColdFire and to pass the compilation parts of the glibc testsuite
except for one test unrelated to this patch (while glibc and the
compilation parts of the testsuite continue to build OK for
non-ColdFire m68k, as expected).  I have *not* run any GCC tests for
this patch, or any execution tests for m68k.

OK to commit?

2018-01-24  Joseph Myers  

PR target/68467
* config/m68k/m68k.c (m68k_promote_function_mode): New function.
(TARGET_PROMOTE_FUNCTION_MODE): New macro.

Index: gcc/config/m68k/m68k.c
===
--- gcc/config/m68k/m68k.c  (revision 257030)
+++ gcc/config/m68k/m68k.c  (working copy)
@@ -192,6 +192,8 @@
 static unsigned int m68k_hard_regno_nregs (unsigned int, machine_mode);
 static bool m68k_hard_regno_mode_ok (unsigned int, machine_mode);
 static bool m68k_modes_tieable_p (machine_mode, machine_mode);
+static machine_mode m68k_promote_function_mode (const_tree, machine_mode,
+   int *, const_tree, int);
 
 /* Initialize the GCC target structure.  */
 
@@ -347,6 +349,9 @@
 #undef TARGET_MODES_TIEABLE_P
 #define TARGET_MODES_TIEABLE_P m68k_modes_tieable_p
 
+#undef TARGET_PROMOTE_FUNCTION_MODE
+#define TARGET_PROMOTE_FUNCTION_MODE m68k_promote_function_mode
+
 static const struct attribute_spec m68k_attribute_table[] =
 {
   /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
@@ -6621,4 +6626,20 @@
   return (bytes + 1) & ~1;
 }
 
+/* Implement TARGET_PROMOTE_FUNCTION_MODE.  */
+
+static machine_mode
+m68k_promote_function_mode (const_tree type, machine_mode mode,
+int *punsignedp ATTRIBUTE_UNUSED,
+const_tree fntype ATTRIBUTE_UNUSED,
+int for_return)
+{
+  /* Promote libcall arguments narrower than int to match the normal C
+ ABI (for which promotions are handled via
+ TARGET_PROMOTE_PROTOTYPES).  */
+  if (type == NULL_TREE && !for_return && (mode == QImode || mode == HImode))
+return SImode;
+  return mode;
+}
+
 #include "gt-m68k.h"

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, fortran] Support Fortran 2018 teams

2018-01-24 Thread Jakub Jelinek
On Wed, Jan 24, 2018 at 08:19:58PM +, Paul Richard Thomas wrote:
> (Jakub, This is all hidden behind the -fcoarray option. To my mind
> this is safe for release.)

Ok from RM POV.

Jakub


Re: [PATCH][PR target/83994] Fix stack-clash-protection code generation on x86

2018-01-24 Thread Jeff Law
On 01/24/2018 12:11 AM, Uros Bizjak wrote:
> On Wed, Jan 24, 2018 at 12:15 AM, Jeff Law  wrote:
>>
>> pr83994 is a code generation bug in the stack-clash support that affects
>> openssl (we've turned on stack-clash-protection by default for the F28
>> builds).
>>
>> The core problem is stack-clash (like stack-check) will emit a probing
>> loop if the prologue allocates enough stack space.  When emitting a loop
>> both implementations will need a scratch register.
>>
>> They use get_scratch_register_at_entry to find a suitable scratch
>> register.  This routine assumes that callee registers saves are
>> completed at the point where the scratch register is going to be used.
>>
>> In this particular testcase we select %ebx because ax,cx,dx are used for
>> parameter passing.  That's fine.  The problem is %ebx hasn't been saved yet!
>>
>> -fstack-check has a bit of code in the frame setup/layout code which
>> forces the prologue to use pushes rather than reg->mem moves for saving
>> registers.  There's a gcc_assert in the prologue expander to catch any
>> case where the registers aren't saved.
>>
>> -fstack-clash-protection doesn't have that same bit of magic in the
>> frame setup/layout code and it bypasses the assertion due to a change I
>> made back in Nov 2017 due to not being aware of this particular issue.
>>
>> This patch reverts the assertion bypass I added back in Nov 2017 and
>> adds clarifying comments.  The patch also forces use of push to save
>> integer registers for a stack-clash protected prologue if probes are
>> going to be needed.
>>
>> Bootstrapped and regression tested on x86_64.
>>
>> While the bug is not marked as a regression, ISTM this needs to be fixed
>> for gcc-8.
>>
>> OK for the trunk?
>>
>> Jeff
>>
>> * i386.c (get_probe_interval): Move to earlier point.
>> (ix86_compute_frame_layout): If -fstack-clash-protection and
>> the frame is larger than the probe interval, then use pushes
>> to save registers rather than reg->mem moves.
>> (ix86_expand_prologue): Remove conditional for int_registers_saved
>> assertion.
>>
>> * gcc.target/i386/pr83994.c: New test.
> 
> OK with the fixed testcase (see below).
> 
> Thanks,
> Uros.

[ ... snip ... ]

>> diff --git a/gcc/testsuite/gcc.target/i386/pr83994.c b/gcc/testsuite/gcc.target/i386/pr83994.c
>> new file mode 100644
>> index 000..b57b04b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/pr83994.c
>> @@ -0,0 +1,15 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -m32 -march=i686 -fpic -fstack-clash-protection" } */
> 
> Please use
> 
> /* { dg-require-effective-target ia32 } */
> 
> and remove "-m32" from dg-options.
Done.  Thanks for the quick turnaround.

jeff


Re: [PATCH], PR target/81550, Rewrite PowerPC loop_align test so it still tests the original target hook

2018-01-24 Thread Michael Meissner
On Wed, Jan 24, 2018 at 12:35:38PM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Jan 24, 2018 at 12:27:55AM -0500, Michael Meissner wrote:
> > 
> > As Segher and I were discussing over private IRC, the root cause of this bug is
> > the compiler no longer generates the BDNZ instruction for a count down loop,
> > instead it decrements the index in a GPR and does a branch/comparison on it.
> 
> Yes, ivopts makes a bad decision (it uses stride 8 for all IVs, it should
> keep one with stride -1 for the loop counter, for optimal code; it also
> does three separate increments for the three memory accesses, which is
> a bit excessive here).
> 
> > In doing so, it now unrolls the loop twice, and the resulting loop is too
> > big for the target hook TARGET_ASM_LOOP_ALIGN_MAX_SKIP.  This means the loop
> > isn't aligned to a 32 byte boundary.
> 
> It's not really unrolling, it is bb-reorder copying an RTL block.  However,
> even if you disable it you still get 9 insns on some configurations, so
> your patch does not hide the problem :-(
> 
> Although, hrm, in your patch you also change "int i" to "long i"; that
> alone seems to be enough to fix everything?  Could you check that please?

Replacing 'int' with 'unsigned long' allows the test to succeed once again.  I
have checked this on a big endian power8 (both 32-bit and 64-bit) and on a
little endian power8 (64-bit only), and it passes in all three environments.
Can I install this on the trunk?

[gcc/testsuite]
2018-01-24  Michael Meissner  

PR target/81550
* gcc.target/powerpc/loop_align.c: Use unsigned long for the loop
index instead of int, which allows IVOPTs to properly optimize the
loop.

Index: gcc/testsuite/gcc.target/powerpc/loop_align.c
===
--- gcc/testsuite/gcc.target/powerpc/loop_align.c   (revision 256992)
+++ gcc/testsuite/gcc.target/powerpc/loop_align.c   (working copy)
@@ -4,8 +4,8 @@
 /* { dg-options "-O2 -mcpu=power7 -falign-functions=16" } */
 /* { dg-final { scan-assembler ".p2align 5,,31" } } */
 
-void f(double *a, double *b, double *c, int n) {
-  int i;
+void f(double *a, double *b, double *c, unsigned long n) {
+  unsigned long i;
   for (i=0; i < n; i++)
 a[i] = b[i] + c[i];
 }

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



[RFC][PR82479] missing popcount builtin detection

2018-01-24 Thread Kugan Vivekanandarajah
Hi All,

Here is a patch for popcount builtin detection similar to LLVM. I
would like to queue this for review for next stage 1.

1. This is done as part of loop distribution and is effective at -O3 and above.
2. This does not distribute the loop to detect popcount (as is done for
memcpy/memmove). I don't think that happens in practice. Please correct
me if I am wrong.
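
As a small illustration of the idiom being detected, here is a sketch of the loop next to the builtin it would be replaced with. The function names are hypothetical, and unsigned long is used here (the test in the patch uses long) so that b - 1 cannot overflow. __builtin_popcountl is the GCC/Clang builtin for unsigned long.

```c
#include <assert.h>
#include <stddef.h>

/* Loop form: each iteration clears the lowest set bit of B, so the
   number of iterations equals the number of set bits.  */
int
popcount_loop (unsigned long b)
{
  int c = 0;
  while (b)
    {
      b &= b - 1;
      c++;
    }
  return c;
}

/* Builtin form that the loop is equivalent to.  */
int
popcount_builtin (unsigned long b)
{
  return __builtin_popcountl (b);
}

/* Hypothetical helper: assert that both forms agree on a few samples
   that fit in unsigned long on both 32-bit and 64-bit targets.  */
void
check_popcount_forms (void)
{
  static const unsigned long samples[]
    = { 0UL, 1UL, 2UL, 0xffUL, 0x80000000UL, ~0UL };

  for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (popcount_loop (samples[i]) == popcount_builtin (samples[i]));
}
```

The loop runs once per set bit, which is why replacing it with a single builtin call can pay off when the target has a population-count instruction.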

Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

Thanks,
Kugan

gcc/ChangeLog:

2018-01-25  Kugan Vivekanandarajah  

PR middle-end/82479
* tree-loop-distribution.c (handle_popcount): New.
(pass_loop_distribution::execute): Use handle_popcount.

gcc/testsuite/ChangeLog:

2018-01-25  Kugan Vivekanandarajah  

PR middle-end/82479
* gcc.dg/tree-ssa/popcount.c: New test.
From 9fa09af4b7013c6207e59a4920c82f089bfe45c2 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Wed, 24 Jan 2018 08:50:08 +1100
Subject: [PATCH] popcount builtin detection

Change-Id: Ic6e175f9cc9a69bd417936a4845c2c046fd446b4

Change-Id: I680eb107445660c60a5d38f5d7300ab1a3243bf5

Change-Id: Ia9f0df89e05520091dc7797195098118768c7ac2
---
 gcc/testsuite/gcc.dg/tree-ssa/popcount.c |  41 +
 gcc/tree-loop-distribution.c | 145 +++
 2 files changed, 186 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
new file mode 100644
index 000..86a66cb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+extern int foo (int);
+
+int PopCount (long b) {
+int c = 0;
+b++;
+
+while (b) {
+	b &= b - 1;
+	c++;
+}
+return c;
+}
+int PopCount2 (long b) {
+int c = 0;
+
+while (b) {
+	b &= b - 1;
+	c++;
+}
+foo (c);
+return foo (c);
+}
+
+void PopCount3 (long b1) {
+
+for (long i = 0; i < b1; ++i)
+  {
+	long b = i;
+	int c = 0;
+	while (b) {
+	b &= b - 1;
+	c++;
+	}
+	foo (c);
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 3 "optimized" } } */
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index a3d76e4..1060700 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -1585,6 +1585,148 @@ classify_builtin_ldst (loop_p loop, struct graph *rdg, partition *partition,
   return;
 }
 
+/* See if loop is a popcount implementation of the form
+
+int c = 0;
+while (b) {
+	b = b & (b - 1);
+	c++;
+}
+
+If so, convert this into c = __builtin_popcount (b).
+Return true if we did, false otherwise.  */
+
+
+static bool
+handle_popcount (loop_p loop)
+{
+  tree lhs, rhs;
+  tree dest, src;
+  gimple *and_minus_one;
+  int count = 0;
+  gphi *count_phi;
+  gimple *fn_call;
+  gimple *use_stmt;
+  use_operand_p use_p;
+  imm_use_iterator iter;
+  gimple_stmt_iterator gsi;
+
+  /* Check loop terminating branch is like
+ if (b != 0).  */
+  gimple *stmt = last_stmt (loop->header);
+  if (!stmt
+  || gimple_code (stmt) != GIMPLE_COND
+  || !zerop (gimple_cond_rhs (stmt)))
+return false;
+
+  /* Check "b = b & (b - 1)" is calculated.  */
+  lhs = gimple_cond_lhs (stmt);
+  gimple *and_stmt = SSA_NAME_DEF_STMT (lhs);
+  if (gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)
+return false;
+  lhs = gimple_assign_rhs1 (and_stmt);
+  rhs = gimple_assign_rhs2 (and_stmt);
+  if (TREE_CODE (lhs) == SSA_NAME
+  && (and_minus_one = SSA_NAME_DEF_STMT (lhs))
+  && is_gimple_assign (and_minus_one)
+  && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR)
+  && integer_minus_onep (gimple_assign_rhs2 (and_minus_one)))
+  lhs = rhs;
+  else if (TREE_CODE (rhs) == SSA_NAME
+  && (and_minus_one = SSA_NAME_DEF_STMT (rhs))
+  && is_gimple_assign (and_minus_one)
+  && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR)
+  && integer_minus_onep (gimple_assign_rhs2 (and_minus_one)))
+  ;
+  else
+return false;
+  if ((gimple_assign_rhs1 (and_stmt) != gimple_assign_rhs1 (and_minus_one))
+  && (gimple_assign_rhs2 (and_stmt) != gimple_assign_rhs1 (and_minus_one)))
+return false;
+
+  /* Check the recurrence.  */
+  gimple *phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one));
+  gimple *src_phi = SSA_NAME_DEF_STMT (lhs);
+  if (gimple_code (phi) != GIMPLE_PHI
+  || gimple_code (src_phi) != GIMPLE_PHI)
+return false;
+
+  /* Check the loop closed SSA definition for just the variable c defined in
+ loop.  */
+  src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx);
+  basic_block bb = single_exit (loop)->dest;
+  for (gphi_iterator gpi = gsi_start_phis (bb);
+   !gsi_end_p (gpi); gsi_next (&gpi))
+{
+  count_phi = gpi.phi ();
+  count++;
+}
+  if (count != 1)
+return false;
+
+  /* 

[PATCH, rs6000] Fix PR56010 and PR83743, -mcpu=native use wrong names

2018-01-24 Thread Peter Bergner
The following patch fixes both PR56010 and PR83743.  PR56010 is fixed by
adding an extra altname field to the RS6000_CPU table which matches the
cases where the Linux kernel's AT_PLATFORM name differs from the name GCC
expects.  If we match on the altname, then we return the canonical name.

PR83743 is fixed by catching the case where we do not recognize at all
the AT_PLATFORM value returned by the kernel.  In that case, we emit an
error message and request the user use an explicit cpu name rather than
using "native".
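
The lookup described above can be sketched roughly as follows. This is not the code from the patch: struct cpu_name_entry, cpu_names and canonical_cpu_name are hypothetical names, and the table holds only a handful of the name/altname pairs that appear in the diff below.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one row of the RS6000_CPU table.  */
struct cpu_name_entry
{
  const char *name;     /* Canonical name accepted by GCC.  */
  const char *altname;  /* Kernel AT_PLATFORM spelling, or NULL.  */
};

/* A few entries copied from the posted diff.  */
static const struct cpu_name_entry cpu_names[] = {
  { "401", NULL },
  { "403", "ppc403" },
  { "405", "ppc405" },
  { "440", "ppc440" },
  { "476", "ppc470" },
  { "601", "ppc601" },
  { "603", "ppc603" },
};

/* Map PLATFORM to the canonical cpu name.  A match on either the
   canonical name or the alternate name yields the canonical name.
   NULL means the platform string is unknown, which is the case where
   the driver would emit an error instead of passing the name on.  */
const char *
canonical_cpu_name (const char *platform)
{
  assert (platform != NULL);
  for (size_t i = 0; i < sizeof cpu_names / sizeof cpu_names[0]; i++)
    {
      if (strcmp (platform, cpu_names[i].name) == 0)
	return cpu_names[i].name;
      if (cpu_names[i].altname != NULL
	  && strcmp (platform, cpu_names[i].altname) == 0)
	return cpu_names[i].name;
    }
  return NULL;
}
```

With this table, "ppc470" maps to "476", "405" maps to itself, and an unrecognized string yields NULL.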

I have tested this by forcing use of nonexistent names and kernel alternate
names and verifying that we either call cc1/cc1plus with the correct canonical
cpu names or emit an error message and quit.

This has bootstrapped and regtested with no errors.  Ok for mainline?

This patch applies fairly easily to the release branches.  Do we want this
fix there as well?

Peter

PR target/56010
PR target/83743
* config/rs6000/rs6000-cpus.def (RS6000_CPU table): Add alternate cpu
names.
* config/rs6000/driver-rs6000.c: #include "diagnostic.h".
(struct rs6000_ptt): Define new structure.
(rs6000_supported_cpu_names): New static variable.
(elf_platform) : Define new static variable and use it.
Translate kernel AT_PLATFORM name to canonical name if needed.
Error if platform name is unknown.
* config/rs6000/rs6000.c: Handle extra field in RS6000_CPU table.
* config/rs6000/default64.h: Likewise.

Index: gcc/config/rs6000/rs6000-cpus.def
===
--- gcc/config/rs6000/rs6000-cpus.def   (revision 256364)
+++ gcc/config/rs6000/rs6000-cpus.def   (working copy)
@@ -150,82 +150,82 @@
 
Before including this file, define a macro:
 
-   RS6000_CPU (NAME, CPU, FLAGS)
+   RS6000_CPU (NAME, ALTNAME, CPU, FLAGS)
 
where the arguments are the fields of struct rs6000_ptt.  */
 
-RS6000_CPU ("401", PROCESSOR_PPC403, MASK_SOFT_FLOAT)
-RS6000_CPU ("403", PROCESSOR_PPC403, MASK_SOFT_FLOAT | MASK_STRICT_ALIGN)
-RS6000_CPU ("405", PROCESSOR_PPC405, MASK_SOFT_FLOAT | MASK_MULHW | MASK_DLMZB)
-RS6000_CPU ("405fp", PROCESSOR_PPC405, MASK_MULHW | MASK_DLMZB)
-RS6000_CPU ("440", PROCESSOR_PPC440, MASK_SOFT_FLOAT | MASK_MULHW | MASK_DLMZB)
-RS6000_CPU ("440fp", PROCESSOR_PPC440, MASK_MULHW | MASK_DLMZB)
-RS6000_CPU ("464", PROCESSOR_PPC440, MASK_SOFT_FLOAT | MASK_MULHW | MASK_DLMZB)
-RS6000_CPU ("464fp", PROCESSOR_PPC440, MASK_MULHW | MASK_DLMZB)
-RS6000_CPU ("476", PROCESSOR_PPC476,
+RS6000_CPU ("401", NULL, PROCESSOR_PPC403, MASK_SOFT_FLOAT)
+RS6000_CPU ("403", "ppc403", PROCESSOR_PPC403, MASK_SOFT_FLOAT | MASK_STRICT_ALIGN)
+RS6000_CPU ("405", "ppc405", PROCESSOR_PPC405, MASK_SOFT_FLOAT | MASK_MULHW | MASK_DLMZB)
+RS6000_CPU ("405fp", NULL, PROCESSOR_PPC405, MASK_MULHW | MASK_DLMZB)
+RS6000_CPU ("440", "ppc440", PROCESSOR_PPC440, MASK_SOFT_FLOAT | MASK_MULHW | MASK_DLMZB)
+RS6000_CPU ("440fp", NULL, PROCESSOR_PPC440, MASK_MULHW | MASK_DLMZB)
+RS6000_CPU ("464", NULL, PROCESSOR_PPC440, MASK_SOFT_FLOAT | MASK_MULHW | MASK_DLMZB)
+RS6000_CPU ("464fp", NULL, PROCESSOR_PPC440, MASK_MULHW | MASK_DLMZB)
+RS6000_CPU ("476", "ppc470", PROCESSOR_PPC476,
MASK_SOFT_FLOAT | MASK_PPC_GFXOPT | MASK_MFCRF | MASK_POPCNTB
| MASK_FPRND | MASK_CMPB | MASK_MULHW | MASK_DLMZB)
-RS6000_CPU ("476fp", PROCESSOR_PPC476,
+RS6000_CPU ("476fp", NULL, PROCESSOR_PPC476,
MASK_PPC_GFXOPT | MASK_MFCRF | MASK_POPCNTB | MASK_FPRND
| MASK_CMPB | MASK_MULHW | MASK_DLMZB)
-RS6000_CPU ("505", PROCESSOR_MPCCORE, 0)
-RS6000_CPU ("601", PROCESSOR_PPC601, MASK_MULTIPLE | MASK_STRING)
-RS6000_CPU ("602", PROCESSOR_PPC603, MASK_PPC_GFXOPT)
-RS6000_CPU ("603", PROCESSOR_PPC603, MASK_PPC_GFXOPT)
-RS6000_CPU ("603e", PROCESSOR_PPC603, MASK_PPC_GFXOPT)
-RS6000_CPU ("604", PROCESSOR_PPC604, MASK_PPC_GFXOPT)
-RS6000_CPU ("604e", PROCESSOR_PPC604e, MASK_PPC_GFXOPT)
-RS6000_CPU ("620", PROCESSOR_PPC620, MASK_PPC_GFXOPT | MASK_POWERPC64)
-RS6000_CPU ("630", PROCESSOR_PPC630, MASK_PPC_GFXOPT | MASK_POWERPC64)
-RS6000_CPU ("740", PROCESSOR_PPC750, MASK_PPC_GFXOPT)
-RS6000_CPU ("7400", PROCESSOR_PPC7400, POWERPC_7400_MASK)
-RS6000_CPU ("7450", PROCESSOR_PPC7450, POWERPC_7400_MASK)
-RS6000_CPU ("750", PROCESSOR_PPC750, MASK_PPC_GFXOPT)
-RS6000_CPU ("801", PROCESSOR_MPCCORE, MASK_SOFT_FLOAT)
-RS6000_CPU ("821", PROCESSOR_MPCCORE, MASK_SOFT_FLOAT)
-RS6000_CPU ("823", PROCESSOR_MPCCORE, MASK_SOFT_FLOAT)
-RS6000_CPU ("8540", PROCESSOR_PPC8540, MASK_STRICT_ALIGN | MASK_ISEL)
-RS6000_CPU ("8548", PROCESSOR_PPC8548, MASK_STRICT_ALIGN | MASK_ISEL)
-RS6000_CPU ("a2", PROCESSOR_PPCA2,
+RS6000_CPU ("505", NULL, PROCESSOR_MPCCORE, 0)
+RS6000_CPU ("601", "ppc601", PROCESSOR_PPC601, MASK_MULTIPLE |MASK_STRING)
+RS6000_CPU ("602", NULL, PROCESSOR_PPC603, MASK_PPC_GFXOPT)
+RS6000_CPU ("603", "ppc603", PROCESSOR_PPC603, MASK_PPC_GFXOPT)
+RS6000_CPU ("603e", NULL, PROCESSOR_PPC603, 

Re: [PATCH, fortran] Support Fortran 2018 teams

2018-01-24 Thread Damian Rouson

On January 24, 2018 at 1:29:12 PM, Steve Kargl 
(s...@troutmask.apl.washington.edu) wrote:


Yes, thanks, Paul. Unfortunately, I've run out of time.  
Damian, GCC is in stage 3, we need to wait for approval  
from the release manager (aka Jakub) before committing  
the patch.  


Will do.  

Damian

Re: [PATCH, fortran] Support Fortran 2018 teams

2018-01-24 Thread Steve Kargl
On Wed, Jan 24, 2018 at 01:25:51PM -0800, Damian Rouson wrote:
> Thank you, Paul.  I think Alessandro has commit rights.
> If so, then I’ll ask him to make the requested edits and commit it.
> 
> Damian
> 

Yes, thanks, Paul.  Unfortunately, I've run out of time.
Damian, GCC is in stage 3, we need to wait for approval
from the release manager (aka Jakub) before committing
the patch.

-- 
Steve


Re: [PATCH, fortran] Support Fortran 2018 teams

2018-01-24 Thread Damian Rouson
Thank you, Paul.   I think Alessandro has commit rights.  If so, then I’ll ask 
him to make the requested edits and commit it.

Damian

On January 24, 2018 at 12:19:58 PM, Paul Richard Thomas 
(paul.richard.tho...@gmail.com) wrote:

Hi All,  

Given the delay relative to the start of stage 3, I thought that I had  
better deal with this asap:  


+ /* TODO: this works on any derived type when  
+ it should only work with team_type. */  
+ if (team->ts.type != BT_DERIVED)  
Why don't you give the team_type derived type an attribute 'team_type'  
and test that?  

- code node is passed. The result type and library subroutine name  
+ code ndoe is passed. The result type and library subroutine name  
typo  

+! Tests if team_number intrinsic fucntion works  

It's just as well that there is an 'n' in there, although it gives me  
an idea for a new type of fortran procedure that does what it says :-)  

Together with the change that Steve identified, this seems to me to be  
ready to go.  

(Jakub, This is all hidden behind the -fcoarray option. To my mind  
this is safe for release.)  

OK for trunk.  

Many thanks for this patch.  

Paul  


On 23 January 2018 at 05:45, Steve Kargl  
 wrote:  
> I'm heading out of town for a meeting at the end of  
> week, so gfortran patches/reviews are on hold at the  
> moment. If someone else wants to step up to review  
> the patch, I won't object.  
>  
> --  
> steve  
>  
> On Mon, Jan 22, 2018 at 08:29:41PM -0800, Damian Rouson wrote:  
>> Is Fortran 2018 teams patch ok for trunk?  
>>  
>> Damian  
>>  
>> On January 19, 2018 at 2:47:39 PM, Alessandro Fanfarillo (elfa...@ucar.edu) 
>> wrote:  
>>  
>> I can confirm that the little change suggested by Steve passes the  
>> regtests (on x86_64-pc-linux-gnu) and the regular tests using  
>> OpenCoarrays.  
>>  
>> On Fri, Jan 19, 2018 at 10:33 AM, Steve Kargl  
>>  wrote:  
>> > On Fri, Jan 19, 2018 at 09:18:14AM -0800, Damian Rouson wrote:  
>> >> Thanks for catching that, Steve, and for responding, Alessandro.  
>> >>  
>> >> Anything else?  
>> >>  
>> >  
>> > I've only just started to look at the patch. Unfortunately,  
>> > I know zero about teams, so need to read the patch and F2018  
>> > standard simultaneously.  
>> >  
>> > --  
>> > Steve  
>>  
>>  
>>  
>> --  
>>  
>> Alessandro Fanfarillo, Ph.D.  
>> Postdoctoral Researcher  
>> National Center for Atmospheric Research  
>> Mesa Lab, Boulder, CO, USA  
>> 303-497-1229  
>  
> --  
> Steve  
> 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4  
> 20161221 https://www.youtube.com/watch?v=IbCHE-hONow  



--  
"If you can't explain it simply, you don't understand it well enough"  
- Albert Einstein  


Re: New istreambuf_iterator debug check

2018-01-24 Thread François Dumont

On 24/01/2018 18:53, Petr Ovtchenkov wrote:

On Wed, 24 Jan 2018 17:39:59 +0100
François Dumont  wrote:


Hi

      I'd like to propose this new debug check. Comparing with non-eos
istreambuf_iterator sounds like an obvious coding mistake.

      I propose it despite the stage 1 as it is just a new debug check,
it doesn't impact the lib in normal mode.

      Tested under Linux x86_64, ok to commit ?

François


bool
equal(const istreambuf_iterator& __b) const
-  { return _M_at_eof() == __b._M_at_eof(); }
+  {
+   bool __this_at_eof = _M_at_eof();
+   bool __b_at_eof = __b._M_at_eof();
+
+   __glibcxx_requires_cond(__this_at_eof || __b_at_eof, _M_message(
+ "Abnormal comparison to non-end-of-stream istreambuf_iterator"));
+   return __this_at_eof == __b_at_eof;
+  }

Looks strange for me. It is legal and possible that istreambuf_iterator
will be in EOF state.

Sure, but consider instead the associated 3_neg.cc, which shows the purpose
of the debug check:


  cistreambuf_iter it1(istrs), it2(istrs);
  it1 == it2; // No sense



Re: Add support for bitwise reductions

2018-01-24 Thread Rainer Orth
Jeff Law  writes:

> On 11/22/2017 11:12 AM, Richard Sandiford wrote:
>> Richard Sandiford  writes:
>>> This patch adds support for the SVE bitwise reduction instructions
>>> (ANDV, ORV and EORV).  It's a fairly mechanical extension of existing
>>> REDUC_* operators.
>>>
>>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>>> and powerpc64le-linux-gnu.
>> 
>> Here's an updated version that applies on top of the recent
>> removal of REDUC_*_EXPR.  Tested as before.
>> 
>> Thanks,
>> Richard
>> 
>> 
>> 2017-11-22  Richard Sandiford  
>>  Alan Hayward  
>>  David Sherwood  
>> 
>> gcc/
>>  * optabs.def (reduc_and_scal_optab, reduc_ior_scal_optab)
>>  (reduc_xor_scal_optab): New optabs.
>>  * doc/md.texi (reduc_and_scal_@var{m}, reduc_ior_scal_@var{m})
>>  (reduc_xor_scal_@var{m}): Document.
>>  * doc/sourcebuild.texi (vect_logical_reduc): Likewise.
>>  * internal-fn.def (IFN_REDUC_AND, IFN_REDUC_IOR, IFN_REDUC_XOR): New
>>  internal functions.
>>  * fold-const-call.c (fold_const_call): Handle them.
>>  * tree-vect-loop.c (reduction_fn_for_scalar_code): Return the new
>>  internal functions for BIT_AND_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR.
>>  * config/aarch64/aarch64-sve.md (reduc__scal_):
>>  (*reduc__scal_): New patterns.
>>  * config/aarch64/iterators.md (UNSPEC_ANDV, UNSPEC_ORV)
>>  (UNSPEC_XORV): New unspecs.
>>  (optab): Add entries for them.
>>  (BITWISEV): New int iterator.
>>  (bit_reduc_op): New int attributes.
>> 
>> gcc/testsuite/
>>  * lib/target-supports.exp (check_effective_target_vect_logical_reduc):
>>  New proc.
>>  * gcc.dg/vect/vect-reduc-or_1.c: Also run for vect_logical_reduc
>>  and add an associated scan-dump test.  Prevent vectorization
>>  of the first two loops.
>>  * gcc.dg/vect/vect-reduc-or_2.c: Likewise.
>>  * gcc.target/aarch64/sve_reduc_1.c: Add AND, IOR and XOR reductions.
>>  * gcc.target/aarch64/sve_reduc_2.c: Likewise.
>>  * gcc.target/aarch64/sve_reduc_1_run.c: Likewise.
>>  (INIT_VECTOR): Tweak initial value so that some bits are always set.
>>  * gcc.target/aarch64/sve_reduc_2_run.c: Likewise.
> OK.
> Jeff

Two tests have regressed on sparc-sun-solaris2.*:

+FAIL: gcc.dg/vect/vect-reduc-or_1.c -flto -ffat-lto-objects  scan-tree-dump vect "Reduce using vector shifts"
+FAIL: gcc.dg/vect/vect-reduc-or_1.c scan-tree-dump vect "Reduce using vector shifts"
+FAIL: gcc.dg/vect/vect-reduc-or_2.c -flto -ffat-lto-objects  scan-tree-dump vect "Reduce using vector shifts"
+FAIL: gcc.dg/vect/vect-reduc-or_2.c scan-tree-dump vect "Reduce using vector shifts"

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH, fortran] Support Fortran 2018 teams

2018-01-24 Thread Paul Richard Thomas
Hi All,

Given the delay relative to the start of stage 3, I thought that I had
better deal with this asap:


+  /* TODO: this works on any derived type when
+ it should only work with team_type.  */
+  if (team->ts.type != BT_DERIVED)
Why don't you give the team_type derived type an attribute 'team_type'
and test that?

-   code node is passed.  The result type and library subroutine name
+   code ndoe is passed.  The result type and library subroutine name
typo

+! Tests if team_number intrinsic fucntion works

It's just as well that there is an 'n' in there, although it gives me
an idea for a new type of fortran procedure that does what it says :-)

Together with the change that Steve identified, this seems to me to be
ready to go.

(Jakub, This is all hidden behind the -fcoarray option. To my mind
this is safe for release.)

OK for trunk.

Many thanks for this patch.

Paul


On 23 January 2018 at 05:45, Steve Kargl
 wrote:
> I'm heading out of town for a meeting at the end of
> week, so gfortran patches/reviews are on hold at the
> moment.  If someone else wants to step up to review
> the patch, I won't object.
>
> --
> steve
>
> On Mon, Jan 22, 2018 at 08:29:41PM -0800, Damian Rouson wrote:
>> Is Fortran 2018 teams patch ok for trunk?
>>
>> Damian
>>
>> On January 19, 2018 at 2:47:39 PM, Alessandro Fanfarillo (elfa...@ucar.edu) 
>> wrote:
>>
>> I can confirm that the little change suggested by Steve passes the
>> regtests (on x86_64-pc-linux-gnu) and the regular tests using
>> OpenCoarrays.
>>
>> On Fri, Jan 19, 2018 at 10:33 AM, Steve Kargl
>>  wrote:
>> > On Fri, Jan 19, 2018 at 09:18:14AM -0800, Damian Rouson wrote:
>> >> Thanks for catching that, Steve, and for responding, Alessandro.
>> >>
>> >> Anything else?
>> >>
>> >
>> > I've only just started to look at the patch. Unfortunately,
>> > I know zero about teams, so need to read the patch and F2018
>> > standard simultaneously.
>> >
>> > --
>> > Steve
>>
>>
>>
>> --
>>
>> Alessandro Fanfarillo, Ph.D.
>> Postdoctoral Researcher
>> National Center for Atmospheric Research
>> Mesa Lab, Boulder, CO, USA
>> 303-497-1229
>
> --
> Steve
> 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
> 20161221 https://www.youtube.com/watch?v=IbCHE-hONow



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


Re: [PATCH], PR target/81550, Rewrite PowerPC loop_align test so it still tests the original target hook

2018-01-24 Thread Michael Meissner
On Wed, Jan 24, 2018 at 12:35:38PM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Jan 24, 2018 at 12:27:55AM -0500, Michael Meissner wrote:
> > 
> > As Segher and I were discussing over private IRC, the root cause of this
> > bug is that the compiler no longer generates the BDNZ instruction for a
> > count-down loop; instead it decrements the index in a GPR and does a
> > branch/comparison on it.
> 
> Yes, ivopts makes a bad decision (it uses stride 8 for all IVs, it should
> keep one with stride -1 for the loop counter, for optimal code; it also
> does three separate increments for the three memory accesses, which is
> a bit excessive here).
> 
> > In doing so, it now unrolls the loop twice, and the resulting loop is too
> > big for the target hook TARGET_ASM_LOOP_ALIGN_MAX_SKIP.  This means the loop
> > isn't aligned to a 32-byte boundary.
> 
> It's not really unrolling, it is bb-reorder copying an RTL block.  However,
> even if you disable it you still get 9 insns on some configurations, so
> your patch does not hide the problem :-(
> 
> Although, hrm, in your patch you also change "int i" to "long i"; that
> alone seems to be enough to fix everything?  Could you check that please?

Changing i and n to either 'long' or 'long unsigned' makes the test work.

Interestingly, -mcpu=power7 -mbig does not seem to be able to create LFDU and
STFDU, but after either setting the cpu to power8/power9 or changing -mbig to
-mlittle or -m32, the compiler can generate those instructions.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] Fix gcc.target/aarch64/sve/peel_ind_1.c for -mcmodel=tiny

2018-01-24 Thread Richard Sandiford
Szabolcs Nagy  writes:
> Fix test failures with -mcmodel=tiny when adr is generated instead of adrp.
>
> FAIL: gcc.target/aarch64/sve/peel_ind_1.c -march=armv8.2-a+sve
> scan-assembler \\tadrp\\tx[0-9]+, x\\n
> FAIL: gcc.target/aarch64/sve/peel_ind_2.c -march=armv8.2-a+sve
> scan-assembler \\tadrp\\tx[0-9]+, x\\n
> FAIL: gcc.target/aarch64/sve/peel_ind_3.c -march=armv8.2-a+sve
> scan-assembler \\tadrp\\tx[0-9]+, x\\n
>
> gcc/testsuite/ChangeLog:
>
> 2018-01-24  Szabolcs Nagy  
>
>  * gcc.target/aarch64/sve/peel_ind_1.c: Match (adrp|adr) in 
> scan-assembler.
>  * gcc.target/aarch64/sve/peel_ind_2.c: Likewise.
>  * gcc.target/aarch64/sve/peel_ind_3.c: Likewise.

LGTM FWIW.  Thanks for fixing this!

Richard


patch to fix PR84014

2018-01-24 Thread Vladimir Makarov

The following patch fixes

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84014

The patch was tested on powerpc64 and bootstrapped on x86-64.

Committed as rev. 257029.
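
One way to read the values assigned in the ira-build.c hunk below (MAX = 0, MIN = 1) is as an empty interval, since MIN ends up greater than MAX. That reading is an assumption, not something stated in the message. The sketch below models only the interval arithmetic; struct range_bounds and the three functions are hypothetical and are not IRA code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of an object's [min, max] live-range bounds.  */
struct range_bounds
{
  int min;
  int max;
};

/* Assign the same values the patch gives a never-referenced object.  */
void
mark_never_live (struct range_bounds *r)
{
  assert (r != NULL);
  r->max = 0;
  r->min = 1;
}

/* A closed interval with min > max contains no point.  */
int
range_empty_p (const struct range_bounds *r)
{
  return r->min > r->max;
}

/* Two non-empty closed intervals overlap when each one starts no later
   than the other one ends.  An empty interval overlaps nothing.  */
int
ranges_overlap_p (const struct range_bounds *a, const struct range_bounds *b)
{
  if (range_empty_p (a) || range_empty_p (b))
    return 0;
  return a->min <= b->max && b->min <= a->max;
}
```

In this model a never-live object cannot overlap any other range, which matches the comment in the patch that the object "does not live".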

Index: ChangeLog
===
--- ChangeLog	(revision 257028)
+++ ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2018-01-24  Vladimir Makarov  
+
+	PR target/84014
+	* ira-build.c (setup_min_max_allocno_live_range_point): Set up
+	min/max for never referenced object.
+
 2018-01-24  Jakub Jelinek  
 
 	PR middle-end/83977
Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog	(revision 257028)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2018-01-24  Vladimir Makarov  
+
+	PR target/84014
+	* gcc.target/powerpc/pr84014.c: New.
+
 2018-01-24  Jakub Jelinek  
 
 	PR middle-end/83977
Index: ira-build.c
===
--- ira-build.c	(revision 256891)
+++ ira-build.c	(working copy)
@@ -2728,7 +2728,13 @@ setup_min_max_allocno_live_range_point (
 	ira_object_t parent_obj;
 
 	if (OBJECT_MAX (obj) < 0)
-	  continue;
+	  {
+		/* The object is not used and hence does not live.  */
+		ira_assert (OBJECT_LIVE_RANGES (obj) == NULL);
+		OBJECT_MAX (obj) = 0;
+		OBJECT_MIN (obj) = 1;
+		continue;
+	  }
 	ira_assert (ALLOCNO_CAP_MEMBER (a) == NULL);
 	/* Accumulation of range info.  */
 	if (ALLOCNO_CAP (a) != NULL)
Index: testsuite/gcc.target/powerpc/pr84014.c
===
--- testsuite/gcc.target/powerpc/pr84014.c	(nonexistent)
+++ testsuite/gcc.target/powerpc/pr84014.c	(working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile { target powerpc*-*-* } }*/
+/* { dg-options "-O1 -fno-split-wide-types -m32 -mcpu=e300c3" } */
+
+int
+nh (void)
+{
+}
+
+long long int
+si (void)
+{
+}
+
+int
+xf (int fg)
+{
+  int y5 = nh ();
+  fg += !!y5 ? y5 : si ();
+  return fg;
+}


Re: [PATCH], PR target/81550, Rewrite PowerPC loop_align test so it still tests the original target hook

2018-01-24 Thread Segher Boessenkool
Hi!

On Wed, Jan 24, 2018 at 12:27:55AM -0500, Michael Meissner wrote:
> 
> As Segher and I were discussing over private IRC, the root cause of this bug
> is that the compiler no longer generates the BDNZ instruction for a count-down
> loop; instead it decrements the index in a GPR and does a branch/comparison
> on it.

Yes, ivopts makes a bad decision (it uses stride 8 for all IVs, it should
keep one with stride -1 for the loop counter, for optimal code; it also
does three separate increments for the three memory accesses, which is
a bit excessive here).

> In doing so, it now unrolls the loop twice, and the resulting loop is too
> big for the target hook TARGET_ASM_LOOP_ALIGN_MAX_SKIP.  This means the loop
> isn't aligned to a 32-byte boundary.

It's not really unrolling, it is bb-reorder copying an RTL block.  However,
even if you disable it you still get 9 insns on some configurations, so
your patch does not hide the problem :-(

Although, hrm, in your patch you also change "int i" to "long i"; that
alone seems to be enough to fix everything?  Could you check that please?


Segher


Re: [PATCH] Fix ICEs with "omp declare simd" attribute on versioned fns or omp_fn* (PR middle-end/83977)

2018-01-24 Thread Richard Biener
On January 24, 2018 6:51:54 PM GMT+01:00, Richard Biener  
wrote:
>On January 24, 2018 6:40:25 PM GMT+01:00, Jakub Jelinek
> wrote:
>>On Wed, Jan 24, 2018 at 06:36:02PM +0100, Martin Jambor wrote:
>>> > I think there's already a set of attributes that prevent cloning
>>and
>>> > or are adjusted by the IPA param machinery. The Martins or Honza
>>> > should know better.
>>> 
>>> I am not sure I understand the problem but if
>>> tree_versionable_function_p returns false, the local.versionable bit
>>is
>>> not set and no cloning for that function happens.
>>> 
>>> If it is sufficient that IPA-CP and other IPA passes do not change
>>the
>>> function type in any way (in practice that they don't remove
>>> parameters), it is sufficient to clear the
>local.can_change_signature
>>> cgraph flag in compute_fn_summary() in ipa-fnsummary.c.  That is how
>>we
>>> handle, or rather avoid handling, fnspec attributes.
>>
>>Well, "omp declare simd" is a part of the ABI just for the original
>>exported
>>functions, for everything else it is a pure optimization, but I'm not
>>sure
>>if we want to deoptimize e.g. callers of these functions outside of
>>loops
>>by disabling the signature changing cloning for those.  For calls from
>>within OpenMP simd regions or other loops where we try hard to
>>vectorize
>>them, it might make sense not to change those callers, for callers
>from
>>other loops, a question.
>
>Until we can distinguish the cases I think not changing the signature
>by default might be a good thing. 

Otoh cloning for ipa cp and then inlining is OK. Not sure how the mitigation 
mechanism works. 

Richard. 

>Richard. 
>
>>  Jakub



Re: New istreambuf_iterator debug check

2018-01-24 Thread Petr Ovtchenkov
On Wed, 24 Jan 2018 17:39:59 +0100
François Dumont  wrote:

> Hi
> 
>      I'd like to propose this new debug check. Comparing with non-eos 
> istreambuf_iterator sounds like an obvious coding mistake.
> 
>      I propose it despite the stage 1 as it is just a new debug check, 
> it doesn't impact the lib in normal mode.
> 
>      Tested under Linux x86_64, ok to commit ?
> 
> François
> 

   bool
   equal(const istreambuf_iterator& __b) const
-  { return _M_at_eof() == __b._M_at_eof(); }
+  {
+   bool __this_at_eof = _M_at_eof();
+   bool __b_at_eof = __b._M_at_eof();
+
+   __glibcxx_requires_cond(__this_at_eof || __b_at_eof, _M_message(
+ "Abnormal comparison to non-end-of-stream istreambuf_iterator"));
+   return __this_at_eof == __b_at_eof;
+  }

Looks strange to me. It is legal and possible for an istreambuf_iterator
to be in the EOF state.

--

  - ptr


Re: [PATCH] Fix ICEs with "omp declare simd" attribute on versioned fns or omp_fn* (PR middle-end/83977)

2018-01-24 Thread Richard Biener
On January 24, 2018 6:40:25 PM GMT+01:00, Jakub Jelinek  
wrote:
>On Wed, Jan 24, 2018 at 06:36:02PM +0100, Martin Jambor wrote:
>> > I think there's already a set of attributes that prevent cloning
>and
>> > or are adjusted by the IPA param machinery. The Martins or Honza
>> > should know better.
>> 
>> I am not sure I understand the problem but if
>> tree_versionable_function_p returns false, the local.versionable bit
>is
>> not set and no cloning for that function happens.
>> 
>> If it is sufficient that IPA-CP and other IPA passes do not change
>the
>> function type in any way (in practice that they don't remove
>> parameters), it is sufficient to clear the local.can_change_signature
>> cgraph flag in compute_fn_summary() in ipa-fnsummary.c.  That is how
>we
>> handle, or rather avoid handling, fnspec attributes.
>
>Well, "omp declare simd" is a part of the ABI just for the original
>exported
>functions, for everything else it is a pure optimization, but I'm not
>sure
>if we want to deoptimize e.g. callers of these functions outside of
>loops
>by disabling the signature changing cloning for those.  For calls from
>within OpenMP simd regions or other loops where we try hard to
>vectorize
>them, it might make sense not to change those callers, for callers from
>other loops, a question.

Until we can distinguish the cases I think not changing the signature by 
default might be a good thing. 

Richard. 

>   Jakub




[build] Configure USE_HIDDEN_LINKONCE on Solaris/x86

2018-01-24 Thread Rainer Orth
Prompted by PR target/83838 (Many gcc.target/i386/indirect-thunk*.c
tests FAIL), which is caused by this snippet in i386/sol2.h

/* Only recent versions of Solaris 11 ld properly support hidden .gnu.linkonce
   sections, so don't use them.  */
#ifndef USE_GLD
#define USE_HIDDEN_LINKONCE 0
#endif

I had a fresh look at enabling it at configure time if possible.  A
first test on Solaris 10 and 11/x86 showed that even the latest Solaris
10/x86 ld doesn't work, with a couple of testsuite failures like

ld: fatal: symbol '__x86.get_pc_thunk.ax' is multiply-defined:
(file cp_lto_20081118_0.o type=FUNC; file cp_lto_20081118_1.o type=FUNC);

which matches the fact that full support for .gnu.linkonce sections and
comdat as emitted by gcc only appeared in Solaris 11.

However, AFAICT this happened in the early OpenSolaris days, so
USE_HIDDEN_LINKONCE can safely be enabled on Solaris 11/x86.

This is what the following patch does, being careful to only affect
Solaris/x86 targets.  Two testcases had to be adjusted not to XPASS now.
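
The decision made by the new configure test can be modelled roughly as below. This is a sketch under stated assumptions: use_hidden_linkonce and solaris_11_or_later_p are hypothetical names, the gnu_ld flag stands for "in-tree ld or GNU ld detected", and the substring search only approximates the shell pattern *-*-solaris2.1[1-9]* used in configure.ac.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if TARGET contains "solaris2.1" followed by a digit 1-9,
   approximating the shell pattern *-*-solaris2.1[1-9]*.  */
int
solaris_11_or_later_p (const char *target)
{
  static const char key[] = "solaris2.1";
  const char *p;
  char next;

  assert (target != NULL);
  p = strstr (target, key);
  if (p == NULL)
    return 0;
  /* Character right after "solaris2.1"; '\0' if the string ends there.  */
  next = p[sizeof key - 1];
  return next >= '1' && next <= '9';
}

/* Value that would be defined for USE_HIDDEN_LINKONCE: 1 with GNU ld,
   otherwise 1 only on Solaris 11 or later.  */
int
use_hidden_linkonce (int gnu_ld, const char *target)
{
  if (gnu_ld)
    return 1;
  return solaris_11_or_later_p (target);
}
```

For example, with the native linker i386-pc-solaris2.10 yields 0 and i386-pc-solaris2.11 yields 1, while GNU ld yields 1 on both.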

Bootstrapped without regressions on i386-pc-solaris2.10,
i386-pc-solaris2.11 (both with as/ld and gas/gld), and
x86_64-pc-linux-gnu.  Now, the PR target/83838 failures are gone on
Solaris 11/x86, as expected.

Even though we're late in the GCC 8 release cycle, I believe the patch
is safe enough to be applied at this stage, so I'll do so in a day or
two unless someone objects.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2018-01-21  Rainer Orth  

gcc/testsuite:
* gcc.target/i386/mcount_pic.c: Only xfail get_pc_thunk scan on
Solaris 10.
* gcc.target/i386/pr63620.c: Likewise.

gcc:
* config/i386/sol2.h (USE_HIDDEN_LINKONCE): Remove.
* configure.ac (hidden_linkonce): New test.
* configure: Regenerate.
* config.in: Regenerate.

# HG changeset patch
# Parent  c7f14a8b12c25c407e4379959d7ecf27040d9ca1
Configure USE_HIDDEN_LINKONCE on Solaris/x86

diff --git a/gcc/config/i386/sol2.h b/gcc/config/i386/sol2.h
--- a/gcc/config/i386/sol2.h
+++ b/gcc/config/i386/sol2.h
@@ -253,9 +253,3 @@ along with GCC; see the file COPYING3.  
 /* We do not need NT_VERSION notes.  */
 #undef X86_FILE_START_VERSION_DIRECTIVE
 #define X86_FILE_START_VERSION_DIRECTIVE false
-
-/* Only recent versions of Solaris 11 ld properly support hidden .gnu.linkonce
-   sections, so don't use them.  */
-#ifndef USE_GLD
-#define USE_HIDDEN_LINKONCE 0
-#endif
diff --git a/gcc/configure.ac b/gcc/configure.ac
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -3070,6 +3070,31 @@ AC_DEFINE_UNQUOTED(HAVE_COMDAT_GROUP,
 || test $gcc_cv_as_comdat_group_group = yes; then echo 1; else echo 0; fi`],
 [Define 0/1 if your assembler and linker support COMDAT groups.])
 
+# Restrict this test to Solaris/x86: other targets define this statically.
+case "${target}" in
+  i?86-*-solaris2* | x86_64-*-solaris2*)
+AC_MSG_CHECKING(support for hidden thunks in linkonce sections)
+if test $in_tree_ld = yes || echo "$ld_ver" | grep GNU > /dev/null; then
+  hidden_linkonce=yes
+else
+  case "${target}" in
+	# Full support for hidden thunks in linkonce sections only appeared in
+	# Solaris 11/OpenSolaris.
+*-*-solaris2.1[[1-9]]*)
+	  hidden_linkonce=yes
+	  ;;
+	*)
+	  hidden_linkonce=no
+	  ;;
+  esac
+fi
+AC_MSG_RESULT($hidden_linkonce)
+AC_DEFINE_UNQUOTED(USE_HIDDEN_LINKONCE,
+  [`if test $hidden_linkonce = yes; then echo 1; else echo 0; fi`],
+[Define 0/1 if your linker supports hidden thunks in linkonce sections.])
+  ;;
+esac
+
 gcc_GAS_CHECK_FEATURE([line table discriminator support],
  gcc_cv_as_discriminator,
  [2,19,51],,
diff --git a/gcc/testsuite/gcc.target/i386/mcount_pic.c b/gcc/testsuite/gcc.target/i386/mcount_pic.c
--- a/gcc/testsuite/gcc.target/i386/mcount_pic.c
+++ b/gcc/testsuite/gcc.target/i386/mcount_pic.c
@@ -11,5 +11,5 @@ int main ()
 }
 
 /* { dg-final { scan-assembler "mcount" } } */
-/* { dg-final { scan-assembler "get_pc_thunk" { xfail { *-*-solaris* && { ! gld } } } } } */
+/* { dg-final { scan-assembler "get_pc_thunk" { xfail { *-*-solaris2.10* && { ! gld } } } } } */
 /* { dg-final { cleanup-profile-file } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr63620.c b/gcc/testsuite/gcc.target/i386/pr63620.c
--- a/gcc/testsuite/gcc.target/i386/pr63620.c
+++ b/gcc/testsuite/gcc.target/i386/pr63620.c
@@ -17,4 +17,4 @@ test (__float128 x, int p, func f)
   return x;
 }
 
-/* { dg-final { scan-assembler "get_pc_thunk" { xfail { *-*-solaris* && { ! gld } } } } } */
+/* { dg-final { scan-assembler "get_pc_thunk" { xfail { *-*-solaris2.10* && { ! gld } } } } } */


Re: [PATCH] Fix ICEs with "omp declare simd" attribute on versioned fns or omp_fn* (PR middle-end/83977)

2018-01-24 Thread Jakub Jelinek
On Wed, Jan 24, 2018 at 06:36:02PM +0100, Martin Jambor wrote:
> > I think there's already a set of attributes that prevent cloning and
> > or are adjusted by the IPA param machinery. The Martins or Honza
> > should know better.
> 
> I am not sure I understand the problem but if
> tree_versionable_function_p returns false, the local.versionable bit is
> not set and no cloning for that function happens.
> 
> If it is sufficient that IPA-CP and other IPA passes do not change the
> function type in any way (in practice that they don't remove
> parameters), it is sufficient to clear the local.can_change_signature
> cgraph flag in compute_fn_summary() in ipa-fnsummary.c.  That is how we
> handle, or rather avoid handling, fnspec attributes.

Well, "omp declare simd" is a part of the ABI just for the original exported
functions, for everything else it is a pure optimization, but I'm not sure
if we want to deoptimize e.g. callers of these functions outside of loops
by disabling the signature changing cloning for those.  For calls from
within OpenMP simd regions or other loops where we try hard to vectorize
them, it might make sense not to change those callers; for callers from
other loops, it is an open question.

Jakub


Re: [PATCH] Fix ICEs with "omp declare simd" attribute on versioned fns or omp_fn* (PR middle-end/83977)

2018-01-24 Thread Martin Jambor
On Wed, Jan 24 2018, Richard Biener wrote:
> On January 24, 2018 5:16:45 PM GMT+01:00, Jakub Jelinek wrote:
>>On Wed, Jan 24, 2018 at 05:08:10PM +0100, Richard Biener wrote:
>>> >The "omp declare simd" attribute refers to argument numbers of the
>>> >functions, so trying to apply it on versioned functions that can
>>> >perhaps
>>> >have different number and types of arguments results in ICEs or
>>> >wrong-code.
>>> >Unfortunately, if simd attribute or #pragma omp declare simd is used
>>> >on C++ ctors or dtors, those have DECL_ABSTRACT_ORIGIN of something
>>> >that
>>> >really doesn't exist, abstract ctor or dtor, so checking if
>>> >the types of node->decl and its DECL_ABSTRACT_ORIGIN are compatible
>>> >function
>>> >types doesn't work.
>>> 
>>> So if the attribute is on the decl we clone it should be in the list
>>of things that cloning adjusts or blocks cloning. 
>>
>>Yeah, guess I could move the attribute removal code from omp-low.c to
>>some
>>helper function and in tree_function_versioning check if the attribute
>>is
>>present and if yes and the old/new function types don't match, drop the
>>attribute.  Or, add some flag to cgraph whether the attribute should be
>>honored or not, and clear it in tree_function_versioning and
>>omp_create_child_function instead of removing the attribute.
>
> I think there's already a set of attributes that prevent cloning and
> or are adjusted by the IPA param machinery. The Martins or Honza
> should know better.

I am not sure I understand the problem but if
tree_versionable_function_p returns false, the local.versionable bit is
not set and no cloning for that function happens.

If it is sufficient that IPA-CP and other IPA passes do not change the
function type in any way (in practice that they don't remove
parameters), it is sufficient to clear the local.can_change_signature
cgraph flag in compute_fn_summary() in ipa-fnsummary.c.  That is how we
handle, or rather avoid handling, fnspec attributes.
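
To make the concern from the quoted text concrete: the clauses of
"omp declare simd" name individual parameters of the function, so a clone
whose parameter list differs no longer matches them.  A minimal sketch,
with made-up function names, of the kind of source involved:

```cpp
// Build with -fopenmp or -fopenmp-simd for the pragmas to take effect;
// without either option they are ignored and the code still compiles.

// The uniform and linear clauses refer to parameters by name, which the
// attribute then tracks by argument position.
#pragma omp declare simd uniform(data, scale) linear(i : 1)
float
scaled_at (const float *data, int i, float scale)
{
  return data[i] * scale;
}

float
scaled_sum (const float *data, int n, float scale)
{
  float sum = 0.0f;
#pragma omp simd reduction(+ : sum)
  for (int i = 0; i < n; ++i)
    sum += scaled_at (data, i, scale);
  return sum;
}

// Small driver: (1 + 2 + 3) * 2.
float
demo ()
{
  const float data[] = { 1.0f, 2.0f, 3.0f };
  return scaled_sum (data, 3, 2.0f);
}
```

If an IPA pass produced a version of scaled_at without the scale
parameter, for instance because every caller passes the same constant,
the uniform(scale) clause would presumably describe a parameter that the
clone no longer has.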

Martin



Re: [SFN+LVU+IEPM v4 9/9] [IEPM] Introduce inline entry point markers

2018-01-24 Thread Jakub Jelinek
On Tue, Dec 12, 2017 at 12:54:13AM -0200, Alexandre Oliva wrote:
> +/* Check whether BLOCK, a lexical block, is nested within OUTER, or is
> +   OUTER itself.  If BOTHWAYS, check not only that BLOCK can reach
> +   OUTER through BLOCK_SUPERCONTEXT links, but also that there is a
> +   path from OUTER to BLOCK through BLOCK_SUBBLOCKs and
> +   BLOCK_FRAGMENT_ORIGIN links.  */
> +static bool
> +block_within_block_p (tree block, tree outer, bool bothways)
> +{
> +  if (block == outer)
> +return true;
> +
> +  /* Quickly check that OUTER is up BLOCK's supercontext chain.  */
> +  for (tree context = BLOCK_SUPERCONTEXT (block);
> +   context != outer;
> +   context = BLOCK_SUPERCONTEXT (context))
> +if (!context || TREE_CODE (context) != BLOCK)
> +  return false;
> +
> +  if (!bothways)
> +return true;
> +
> +  /* Now check that each block is actually referenced by its
> + parent.  */
> +  for (tree context = BLOCK_SUPERCONTEXT (block); ;
> +   context = BLOCK_SUPERCONTEXT (context))
> +{
> +  if (BLOCK_FRAGMENT_ORIGIN (context))
> + {
> +   gcc_assert (!BLOCK_SUBBLOCKS (context));
> +   context = BLOCK_FRAGMENT_ORIGIN (context);
> + }
> +  for (tree sub = BLOCK_SUBBLOCKS (context);
> +sub != block;
> +sub = BLOCK_CHAIN (sub))
> + if (!sub)
> +   return false;
> +  if (context == outer)
> + return true;
> +  else
> + block = context;
> +}
> +}
> +
> +/* Called during final while assembling the marker of the entry point
> +   for an inlined function.  */
> +
> +static void
> +dwarf2out_inline_entry (tree block)
> +{
> +  gcc_assert (DECL_P (block_ultimate_origin (block)));
> +
> +  /* Sanity check the block tree.  This would catch a case in which
> + BLOCK got removed from the tree reachable from the outermost
> + lexical block, but got retained in markers.  It would still link
> + back to its parents, but some ancestor would be missing a link
> + down the path to the sub BLOCK.  If the block got removed, its
> + BLOCK_NUMBER will not be a usable value.  */
> +  gcc_checking_assert (block_within_block_p (block,
> +  DECL_INITIAL
> +  (current_function_decl),
> +  true));

I think this asks for
  if (flag_checking)
    gcc_assert (block_within_block_p (block,
                                      DECL_INITIAL (current_function_decl),
                                      true));
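
As an aside for readers following the quoted block_within_block_p, here
is a simplified standalone model of the two-way check.  It is only a
sketch: the Block struct is a hypothetical stand-in for GCC's BLOCK trees
(parent for BLOCK_SUPERCONTEXT, children for the BLOCK_SUBBLOCKS chain),
it assumes outer is non-null, and it leaves out BLOCK_FRAGMENT_ORIGIN
handling entirely.

```cpp
#include <vector>

// Hypothetical stand-in for a lexical BLOCK node.
struct Block
{
  Block *parent = nullptr;         // plays the role of BLOCK_SUPERCONTEXT
  std::vector<Block *> children;   // plays the role of BLOCK_SUBBLOCKS
};

bool
block_within_block (const Block *block, const Block *outer, bool bothways)
{
  if (block == outer)
    return true;

  // Upward pass: OUTER must be reachable through parent links.
  for (const Block *ctx = block->parent; ctx != outer; ctx = ctx->parent)
    if (!ctx)
      return false;

  if (!bothways)
    return true;

  // Downward pass: every ancestor up to OUTER must still list the block
  // below it among its children.
  for (const Block *ctx = block->parent;; ctx = ctx->parent)
    {
      bool found = false;
      for (const Block *sub : ctx->children)
        if (sub == block)
          {
            found = true;
            break;
          }
      if (!found)
        return false;
      if (ctx == outer)
        return true;
      block = ctx;
    }
}

// Builds outer -> mid -> leaf, but never registers mid as a child of
// outer, modelling a block that was dropped from the tree yet still
// links back to its parents.
bool
demo_detects_stale_block ()
{
  Block outer, mid, leaf;
  mid.parent = &outer;
  leaf.parent = &mid;
  mid.children.push_back (&leaf);
  return block_within_block (&leaf, &outer, false)
         && !block_within_block (&leaf, &outer, true);
}
```

The one-way check accepts the stale block and only the bothways variant
rejects it, which is the situation the sanity check in
dwarf2out_inline_entry is meant to catch.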

> --- a/gcc/tree-ssa-live.c
> +++ b/gcc/tree-ssa-live.c
> @@ -520,6 +520,11 @@ remove_unused_scope_block_p (tree scope, bool in_ctor_dtor_block)
> else if (!BLOCK_SUPERCONTEXT (scope)
>  || TREE_CODE (BLOCK_SUPERCONTEXT (scope)) == FUNCTION_DECL)
>   unused = false;
> +   /* Preserve the block, it is referenced by at least the inline
> +  entry point marker.  */
> +   else if (debug_nonbind_markers_p
> + && inlined_function_outer_scope_p (scope))
> + unused = false;
> /* Innermost blocks with no live variables nor statements can be always
>eliminated.  */
> else if (!nsubblocks)
> @@ -548,11 +553,13 @@ remove_unused_scope_block_p (tree scope, bool in_ctor_dtor_block)
>   }
> else if (BLOCK_VARS (scope) || BLOCK_NUM_NONLOCALIZED_VARS (scope))
>   unused = false;
> -   /* See if this block is important for representation of inlined function.
> -  Inlined functions are always represented by block with
> -  block_ultimate_origin being set to FUNCTION_DECL and DECL_SOURCE_LOCATION
> -  set...  */
> -   else if (inlined_function_outer_scope_p (scope))
> +   /* See if this block is important for representation of inlined
> +  function.  Inlined functions are always represented by block
> +  with block_ultimate_origin being set to FUNCTION_DECL and
> +  DECL_SOURCE_LOCATION set, unless they expand to nothing...  But
> +  see above for the case of statement frontiers.  */
> +   else if (!debug_nonbind_markers_p
> + && inlined_function_outer_scope_p (scope))
>   unused = false;

Wonder what the above hunks will do for LTO memory consumption.  We'll see.

Otherwise the patch looks reasonable to me, but I think it depends on the
7/9.

Jakub


Re: [SFN+LVU+IEPM v4 7/9] [LVU] Introduce location views

2018-01-24 Thread Jakub Jelinek
On Tue, Dec 12, 2017 at 12:52:18AM -0200, Alexandre Oliva wrote:
> --- a/include/dwarf2.h
> +++ b/include/dwarf2.h
> @@ -298,6 +298,14 @@ enum dwarf_location_list_entry_type
>  DW_LLE_start_end = 0x07,
>  DW_LLE_start_length = 0x08,
>  
> +/* 
> 
> +   has the proposal for now; only available to list members.
> +
> +   A (possibly updated) copy of the proposal is available at
> +   .  */
> +DW_LLE_GNU_view_pair = 0x09,
> +#define DW_LLE_view_pair DW_LLE_GNU_view_pair
> +

This looks wrong.  The proposal has not been accepted yet, so you
really can't know if DW_LLE_view_pair will be like that or whether it
will have value of 9.  Unfortunately, the DWARF standard doesn't specify a
vendor range for DW_LLE_* values.  I'd use 0xf0 or so, and not pretend
there is a DW_LLE_view_pair at all; just use DW_LLE_GNU_view_pair everywhere.
Jason, what do you think?

> --- a/gcc/dwarf2asm.c
> +++ b/gcc/dwarf2asm.c
> @@ -767,6 +767,33 @@ dw2_asm_output_data_sleb128 (HOST_WIDE_INT value,
>va_end (ap);
>  }
>  
> +/* output symbol LAB1 as an unsigned LEB128 quantity.  */

Capital O in Output please.

> +static inline bool
> +dwarf2out_locviews_in_attribute ()
> +{
> +  return debug_variable_location_views
> +&& dwarf_version <= 5;

Formatting, should be
  return debug_variable_location_views && dwarf_version <= 5;
or
  return (debug_variable_location_views
          && dwarf_version <= 5);
if it wouldn't fit (but it does).

> +static inline bool
> +dwarf2out_locviews_in_loclist ()
> +{
> +#ifndef DW_LLE_view_pair
> +  return false;
> +#else
> +  return debug_variable_location_views
> +&& dwarf_version >= 6;

Likewise.

> +
> +static bool
> +output_asm_line_debug_info (void)
> +{
> +  return DWARF2_ASM_VIEW_DEBUG_INFO
> +|| (DWARF2_ASM_LINE_DEBUG_INFO && !debug_variable_location_views);

Likewise.

> +  dw2_asm_output_data (1, DW_LLE_view_pair,
> +"DW_LLE_view_pair");

This also fits on a single line.

> +/* Output the dwarf version number.  */
> +
> +static void
> +output_dwarf_version ()
> +{
> +  /* ??? For now, if -gdwarf-6 is specified, we output version 5 with
> + views in loclist.  That will change eventually.  */
> +  if (dwarf_version == 6)
> +{
> +  static bool once;
> +  if (!once)
> + {
> +   warning (0,
> +"-gdwarf-6 is output as version 5 with incompatibilities");
> +   once = true;
> + }
> +  dw2_asm_output_data (2, 5, "DWARF version number");
> +}

Do we really need to introduce -gdwarf-6 at this point?
-gdwarf-5 -gvariable-location-views should be sufficient, shouldn't it?
We don't know at all what it will look like in 3 or however many years.
My preference would be to keep all those dwarf_version == 6 related changes
out, including this output_dwarf_version function etc.

> +  const char *label = NOTE_DURING_CALL_P (loc_note)
> + ? last_postcall_label : last_label;

Again wrong formatting,
  const char *label
    = NOTE_DURING_CALL_P (loc_note) ? last_postcall_label : last_label;
is better.

> +  return !DECL_IGNORED_P (current_function_decl)
> +&& debug_variable_location_views
> +&& insn && GET_CODE (insn) == NOTE
> +&& NOTE_KIND (insn) == NOTE_INSN_VAR_LOCATION;

Formatting.

Jakub


Re: [aarch64][PATCH v2] Disable reg offset in quad-word store for Falkor

2018-01-24 Thread Siddhesh Poyarekar
On Wednesday 24 January 2018 06:29 PM, Siddhesh Poyarekar wrote:
>>> +  /* Avoid register indexing for 128-bit stores when the
>>> + AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE option is set.  */
>>> +  if (!optimize_size
>>> +  && type == ADDR_QUERY_STR
>>> +  && (aarch64_tune_params.extra_tuning_flags
>>> + & AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE)
>>> +  && (mode == TImode || mode == TFmode
>>> + || aarch64_vector_data_mode_p (mode)))
>>> +    allow_reg_index_p = false;
>>
>> The aarch64_classify_vector_mode code has been reworked recently for SVE
>> so I'm not entirely
>> up to date with its logic, but I believe that
>> "aarch64_classify_vector_mode (mode)" will
>> allow 64-bit vector modes, which would not be using the 128-bit Q
>> register, so you may be disabling
>> register indexing for D-register memory stores.
> 
> I will check this and fix the condition if necessary.

Looking back at the patch I remember why I used
aarch64_vector_data_mode_p; this is to catch the pattern
aarch64_simd_mov which optimizes a 64-bit store pair into a
single quad word store.  It should not avoid register indexing for any
other vector modes since their patterns won't pass ADDR_QUERY_STR.  In
any case, I will be doing the CPU2017 run without -mcpu=falkor, so I'll
report results from that.

Siddhesh


[PATCH, rs6000] Updates for vec_cmp_*() gimple-folding vector testcases

2018-01-24 Thread Will Schmidt
Hi,
  Assorted updates for the vector intrinsics / gimple folding vec_cmp()
testcases to handle codegen variations as seen between P8,P9 targets.
This breaks apart the testcases into a #included header that contains the
testcase contents, and fold-vec-cmp-*.c tests that contain the target specific
stanzas. 

sniff-tested across power systems.
This cleans up a handful of errors seen on P9.

OK for trunk?
Thanks,
-Will

[testsuite]

2018-01-24  Will Schmidt  

* gcc.target/powerpc/fold-vec-cmp-int.c:  Delete.
* gcc.target/powerpc/fold-vec-cmp-int.h:  New.
* gcc.target/powerpc/fold-vec-cmp-int.p7.c:  New.
* gcc.target/powerpc/fold-vec-cmp-int.p8.c:  New.
* gcc.target/powerpc/fold-vec-cmp-int.p9.c:  New.
* gcc.target/powerpc/fold-vec-cmp-short.c:  Delete.
* gcc.target/powerpc/fold-vec-cmp-short.h:  New.
* gcc.target/powerpc/fold-vec-cmp-short.p8.c:  New.
* gcc.target/powerpc/fold-vec-cmp-short.p9.c:  New.
* gcc.target/powerpc/fold-vec-cmp-char.c:  Delete.
* gcc.target/powerpc/fold-vec-cmp-char.h:  New.
* gcc.target/powerpc/fold-vec-cmp-char.p8.c:  New.
* gcc.target/powerpc/fold-vec-cmp-char.p9.c:  New.

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-cmp-char.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-cmp-char.c
deleted file mode 100644
index 3a1aa60..000
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-cmp-char.c
+++ /dev/null
@@ -1,86 +0,0 @@
-/* Verify that overloaded built-ins for vec_cmp{eq,ge,gt,le,lt,ne} with
-   char inputs produce the right code.  */
-
-/* { dg-do compile } */
-/* { dg-require-effective-target powerpc_p8vector_ok } */
-/* { dg-options "-mpower8-vector -O2" } */
-
-#include 
-
-vector bool char
-test3_eq (vector signed char x, vector signed char y)
-{
-  return vec_cmpeq (x, y);
-}
-
-vector bool char
-test6_eq (vector unsigned char x, vector unsigned char y)
-{
-  return vec_cmpeq (x, y);
-}
-
-vector bool char
-test3_ge (vector signed char x, vector signed char y)
-{
-  return vec_cmpge (x, y);
-}
-
-vector bool char
-test6_ge (vector unsigned char x, vector unsigned char y)
-{
-  return vec_cmpge (x, y);
-}
-
-vector bool char
-test3_gt (vector signed char x, vector signed char y)
-{
-  return vec_cmpgt (x, y);
-}
-
-vector bool char
-test6_gt (vector unsigned char x, vector unsigned char y)
-{
-  return vec_cmpgt (x, y);
-}
-
-vector bool char
-test3_le (vector signed char x, vector signed char y)
-{
-  return vec_cmple (x, y);
-}
-
-vector bool char
-test6_le (vector unsigned char x, vector unsigned char y)
-{
-  return vec_cmple (x, y);
-}
-
-vector bool char
-test3_lt (vector signed char x, vector signed char y)
-{
-  return vec_cmplt (x, y);
-}
-
-vector bool char
-test6_lt (vector unsigned char x, vector unsigned char y)
-{
-  return vec_cmplt (x, y);
-}
-
-vector bool char
-test3_ne (vector signed char x, vector signed char y)
-{
-  return vec_cmpne (x, y);
-}
-
-vector bool char
-test6_ne (vector unsigned char x, vector unsigned char y)
-{
-  return vec_cmpne (x, y);
-}
-
-/* { dg-final { scan-assembler-times "vcmpequb" 4 } } */
-/* { dg-final { scan-assembler-times "vcmpgtsb" 4 } } */
-/* { dg-final { scan-assembler-times "vcmpgtub" 4 } } */
-/* { dg-final { scan-assembler-times "xxlnor" 6 } } */
-
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-cmp-char.h b/gcc/testsuite/gcc.target/powerpc/fold-vec-cmp-char.h
new file mode 100644
index 000..5316121
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-cmp-char.h
@@ -0,0 +1,77 @@
+/* Header file for fold-vec-cmp-char*.c tests.  Used to verify codegen results
+   for vec_cmp{eq,ge,gt,le,lt,ne} builtins.  */
+
+#include 
+
+vector bool char
+test3_eq (vector signed char x, vector signed char y)
+{
+  return vec_cmpeq (x, y);
+}
+
+vector bool char
+test6_eq (vector unsigned char x, vector unsigned char y)
+{
+  return vec_cmpeq (x, y);
+}
+
+vector bool char
+test3_ge (vector signed char x, vector signed char y)
+{
+  return vec_cmpge (x, y);
+}
+
+vector bool char
+test6_ge (vector unsigned char x, vector unsigned char y)
+{
+  return vec_cmpge (x, y);
+}
+
+vector bool char
+test3_gt (vector signed char x, vector signed char y)
+{
+  return vec_cmpgt (x, y);
+}
+
+vector bool char
+test6_gt (vector unsigned char x, vector unsigned char y)
+{
+  return vec_cmpgt (x, y);
+}
+
+vector bool char
+test3_le (vector signed char x, vector signed char y)
+{
+  return vec_cmple (x, y);
+}
+
+vector bool char
+test6_le (vector unsigned char x, vector unsigned char y)
+{
+  return vec_cmple (x, y);
+}
+
+vector bool char
+test3_lt (vector signed char x, vector signed char y)
+{
+  return vec_cmplt (x, y);
+}
+
+vector bool char
+test6_lt (vector unsigned char x, vector unsigned char y)
+{
+  return vec_cmplt (x, y);
+}
+
+vector bool char
+test3_ne (vector signed char x, vector signed char y)
+{
+  return vec_cmpne (x, y);
+}
+
+vector bool char
+test6_ne (vector unsigned 

[PATCH, rs6000] Updates for vec_abs() gimple-folding vector tests

2018-01-24 Thread Will Schmidt

Hi,

Assorted testcase updates to handle codegen variations between P7, P8, and P9.
This breaks out the tests into P7-, P8-, and P9-specific versions of the same.

Sniff-tested on multiple systems, this clears up multiple errors
currently seen on P9.

OK for trunk?
Thanks
-Will


[testsuite]

2018-01-24  Will Schmidt  

* gcc.target/powerpc/fold-vec-abs-int.c:  Remove scan-assembler stanzas.
* gcc.target/powerpc/fold-vec-abs-int-fwrapv.c:  Same.
* gcc.target/powerpc/fold-vec-abs-int.p7.c: New.
* gcc.target/powerpc/fold-vec-abs-int.p8.c: New.
* gcc.target/powerpc/fold-vec-abs-int.p9.c: New.
* gcc.target/powerpc/fold-vec-abs-int-fwrapv.p7.c: New.
* gcc.target/powerpc/fold-vec-abs-int-fwrapv.p8.c: New.
* gcc.target/powerpc/fold-vec-abs-int-fwrapv.p9.c: New.
* gcc.target/powerpc/fold-vec-abs-longlong.c:  Remove scan-assembler stanzas.
* gcc.target/powerpc/fold-vec-abs-longlong-fwrapv.c:  Same.
* gcc.target/powerpc/fold-vec-abs-longlong.p7.c: New.
* gcc.target/powerpc/fold-vec-abs-longlong.p8.c: New.
* gcc.target/powerpc/fold-vec-abs-longlong.p9.c: New.
* gcc.target/powerpc/fold-vec-abs-longlong-fwrapv.p7.c: New.
* gcc.target/powerpc/fold-vec-abs-longlong-fwrapv.p8.c: New.
* gcc.target/powerpc/fold-vec-abs-longlong-fwrapv.p9.c: New.
* gcc.target/powerpc/fold-vec-abs-short.c:  Add xxspltib to valid instruction list.
* gcc.target/powerpc/fold-vec-abs-short-fwrapv.c:  Same.

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-abs-int-fwrapv.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-abs-int-fwrapv.c
index 34dead4..22eec38 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-abs-int-fwrapv.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-abs-int-fwrapv.c
@@ -11,8 +11,6 @@ vector signed int
 test1 (vector signed int x)
 {
   return vec_abs (x);
 }
 
-/* { dg-final { scan-assembler-times "vspltisw|vxor" 1 } } */
-/* { dg-final { scan-assembler-times "vsubuwm" 1 } } */
-/* { dg-final { scan-assembler-times "vmaxsw" 1 } } */
+/* scan-assembler stanzas moved to fold-vec-abs-int-fwrapv.p*.c tests.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-abs-int-fwrapv.p7.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-abs-int-fwrapv.p7.c
new file mode 100644
index 000..739f1c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-abs-int-fwrapv.p7.c
@@ -0,0 +1,20 @@
+/* Verify that overloaded built-ins for vec_abs with int
+   inputs produce the right results when -mcpu=power7 is specified.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec -O2 -mcpu=power7 -fwrapv" } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+
+#include 
+
+vector signed int
+test1 (vector signed int x)
+{
+  return vec_abs (x);
+}
+
+/* { dg-final { scan-assembler-times "vspltisw|vxor" 1 } } */
+/* { dg-final { scan-assembler-times "vsubuwm" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxsw" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-abs-int-fwrapv.p8.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-abs-int-fwrapv.p8.c
new file mode 100644
index 000..8c284ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-abs-int-fwrapv.p8.c
@@ -0,0 +1,20 @@
+/* Verify that overloaded built-ins for vec_abs with int
+   inputs produce the right results when -mcpu=power8 is specified.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec -O2 -mcpu=power8 -fwrapv" } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+
+#include 
+
+vector signed int
+test1 (vector signed int x)
+{
+  return vec_abs (x);
+}
+
+/* { dg-final { scan-assembler-times "vspltisw|vxor" 1 } } */
+/* { dg-final { scan-assembler-times "vsubuwm" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxsw" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-abs-int-fwrapv.p9.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-abs-int-fwrapv.p9.c
new file mode 100644
index 000..cde86b8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-abs-int-fwrapv.p9.c
@@ -0,0 +1,19 @@
+/* Verify that overloaded built-ins for vec_abs with int
+   inputs produce the right results when -mcpu=power9 is specified.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec -O2 -mcpu=power9 -fwrapv" } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+
+#include 
+
+vector signed int
+test1 (vector signed int x)
+{
+  return vec_abs (x);
+}
+
+/* { dg-final { scan-assembler-times "vnegw" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxsw" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-abs-int.c 

[PATCH, rs6000] Updates for vec_neg() gimple-folding vector tests

2018-01-24 Thread Will Schmidt
Hi,

Update the vec-neg-longlong folding tests to handle codegen variations
as seen between p8 and p9 targets.
This breaks out the tests into p7,p8,p9 versions of the same, while moving the
common testcase content into a #included header.

sniff-tested across power systems (P6,P8,P9)
This cleans up a handful of errors seen on P9.

OK for trunk?
Thanks,
-Will

[testsuite]

2018-01-24  Will Schmidt  

* gcc.target/powerpc/fold-vec-neg-longlong.h:  New.
* gcc.target/powerpc/fold-vec-neg-longlong.p8.c:  New.
* gcc.target/powerpc/fold-vec-neg-longlong.p9.c:  New.
* gcc.target/powerpc/fold-vec-neg-longlong.c:  Delete.
* gcc.target/powerpc/fold-vec-neg-int.c: Remove scan-assembler stanzas.
* gcc.target/powerpc/fold-vec-neg-int.p7.c: New.
* gcc.target/powerpc/fold-vec-neg-int.p8.c: New.
* gcc.target/powerpc/fold-vec-neg-int.p9.c: New.

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-neg-int.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-neg-int.c
index d6ca128..4f35856 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-neg-int.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-neg-int.c
@@ -11,8 +11,6 @@ vector signed int
 test1 (vector signed int x)
 {
   return vec_neg (x);
 }
 
-/* { dg-final { scan-assembler-times "xxspltib|vspltisw|vxor" 1 } } */
-/* { dg-final { scan-assembler-times "vsubuwm" 1 } } */
-/* { dg-final { scan-assembler-times "vmaxsw" 0 } } */
+/* Scan-assembler stanzas have been moved to fold-vec-neg-int.p*.c tests. */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-neg-int.p7.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-neg-int.p7.c
new file mode 100644
index 000..8e99de3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-neg-int.p7.c
@@ -0,0 +1,19 @@
+/* Verify that overloaded built-ins for vec_neg with int
+   inputs produce the right code when -mcpu=power7 is specified.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec -O2 -mcpu=power7" } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+
+#include 
+
+vector signed int
+test1 (vector signed int x)
+{
+  return vec_neg (x);
+}
+
+/* { dg-final { scan-assembler-times "xxspltib|vspltisw|vxor" 1 } } */
+/* { dg-final { scan-assembler-times "vsubuwm" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxsw" 0 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-neg-int.p8.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-neg-int.p8.c
new file mode 100644
index 000..91067ab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-neg-int.p8.c
@@ -0,0 +1,19 @@
+/* Verify that overloaded built-ins for vec_neg with int
+   inputs produce the right code when -mcpu=power8 is specified.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec -O2 -mcpu=power8" } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+
+#include 
+
+vector signed int
+test1 (vector signed int x)
+{
+  return vec_neg (x);
+}
+
+/* { dg-final { scan-assembler-times "xxspltib|vspltisw|vxor" 1 } } */
+/* { dg-final { scan-assembler-times "vsubuwm" 1 } } */
+/* { dg-final { scan-assembler-times "vmaxsw" 0 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-neg-int.p9.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-neg-int.p9.c
new file mode 100644
index 000..44732c8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-neg-int.p9.c
@@ -0,0 +1,18 @@
+/* Verify that overloaded built-ins for vec_neg with int
+   inputs produce the right code when -mcpu=power9 is specified.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec -O2 -mcpu=power9" } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+
+#include 
+
+vector signed int
+test1 (vector signed int x)
+{
+  return vec_neg (x);
+}
+
+/* { dg-final { scan-assembler-times "vnegw" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-neg-longlong.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-neg-longlong.c
deleted file mode 100644
index 48f7178..000
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-neg-longlong.c
+++ /dev/null
@@ -1,18 +0,0 @@
-/* Verify that overloaded built-ins for vec_neg with long long
-   inputs produce the right code.  */
-
-/* { dg-do compile } */
-/* { dg-require-effective-target powerpc_p8vector_ok } */
-/* { dg-options "-mpower8-vector -O2" } */
-
-#include 
-
-vector signed long long
-test3 (vector signed long long x)
-{
-  return vec_neg (x);
-}
-
-/* { dg-final { scan-assembler-times "xxspltib|vspltisw" 1 } } */
-/* { dg-final { scan-assembler-times "vsubudm" 1 } } */
-/* { dg-final { scan-assembler-times "vmaxsd" 0 } } */
diff --git 

New istreambuf_iterator debug check

2018-01-24 Thread François Dumont

Hi

    I'd like to propose this new debug check. Comparing with non-eos 
istreambuf_iterator sounds like an obvious coding mistake.
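
    To illustrate what the check is about, here is a small sketch (the
function names are made up for this example).  Under the standard
semantics of equal(), two iterators that are both not at end-of-stream
compare equal even when they read from unrelated streams, which is
rarely what the caller meant:

```cpp
#include <iterator>
#include <sstream>

typedef std::istreambuf_iterator<char> cistreambuf_iter;

// Two iterators over different streams, neither at end-of-stream.
// equal() only compares the end-of-stream state, so this returns true
// in normal mode.
bool
non_eof_iterators_compare_equal ()
{
  std::istringstream first ("abc");
  std::istringstream second ("xyz");
  cistreambuf_iter a (first);
  cistreambuf_iter b (second);
  return a.equal (b);
}

// The usual, meaningful use: comparing against the end-of-stream
// iterator in either direction.
bool
eof_comparison_distinguishes ()
{
  std::istringstream in ("abc");
  cistreambuf_iter it (in);
  cistreambuf_iter eof;
  return it != eof && !eof.equal (it);
}
```

    With the proposed check, the first function would be reported in
debug mode while the second would stay valid.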


    I propose it even though we are not in stage 1, as it is just a new
debug check; it doesn't impact the lib in normal mode.


    Tested under Linux x86_64, ok to commit ?

François

diff --git a/libstdc++-v3/include/bits/streambuf_iterator.h b/libstdc++-v3/include/bits/streambuf_iterator.h
index 292ef3a..2e46771 100644
--- a/libstdc++-v3/include/bits/streambuf_iterator.h
+++ b/libstdc++-v3/include/bits/streambuf_iterator.h
@@ -174,7 +174,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// Return true both iterators are end or both are not end.
   bool
   equal(const istreambuf_iterator& __b) const
-  { return _M_at_eof() == __b._M_at_eof(); }
+  {
+	bool __this_at_eof = _M_at_eof();
+	bool __b_at_eof = __b._M_at_eof();
+
+	__glibcxx_requires_cond(__this_at_eof || __b_at_eof, _M_message(
+	  "Abnormal comparison to non-end-of-stream istreambuf_iterator"));
+	return __this_at_eof == __b_at_eof;
+  }
 
 private:
   int_type
diff --git a/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/1.cc b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/1.cc
new file mode 100644
index 000..bb2e6f8
--- /dev/null
+++ b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/1.cc
@@ -0,0 +1,50 @@
+// Copyright (C) 2018 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+#include 
+#include 
+
+#include 
+
+void test01()
+{
+  std::istringstream istrs("123456789");
+  typedef std::istreambuf_iterator cistreambuf_iter;
+
+  cistreambuf_iter it(istrs);
+  cistreambuf_iter eof;
+  VERIFY( !it.equal(eof) );
+  VERIFY( !eof.equal(it) );
+  VERIFY( it != eof );
+  VERIFY( eof != it );
+}
+
+void test02()
+{
+  typedef std::istreambuf_iterator cistreambuf_iter;
+
+  cistreambuf_iter eof;
+  VERIFY( eof.equal(eof) );
+  VERIFY( eof == eof );
+}
+
+int main()
+{
+  test01();
+  test02();
+  return 0;
+}
diff --git a/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc
index 3fe1cf1..47a1a00 100644
--- a/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc
+++ b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/2.cc
@@ -47,7 +47,10 @@ void test02(void)
 
   cistreambuf_iter istrb_it05(istrs01);
   cistreambuf_iter istrb_it06(istrs01.rdbuf());
+
+#ifndef _GLIBCXX_DEBUG
   VERIFY( istrb_it05 == istrb_it06 );
+#endif
   
   // bool equal(istreambuf_iter& b)
   cistreambuf_iter istrb_it07(0);
@@ -57,12 +60,14 @@ void test02(void)
   cistreambuf_iter istrb_it10;
   VERIFY( istrb_it10.equal(istrb_it09) );
 
+#ifndef _GLIBCXX_DEBUG
   cistreambuf_iter istrb_it11(istrs01);
   cistreambuf_iter istrb_it12(istrs01.rdbuf());
   VERIFY( istrb_it11.equal(istrb_it12) );
   cistreambuf_iter istrb_it13(istrs01);
   cistreambuf_iter istrb_it14(istrs01.rdbuf());
   VERIFY( istrb_it14.equal(istrb_it13) );
+#endif
 
   cistreambuf_iter istrb_it15(istrs01);
   cistreambuf_iter istrb_it16;
@@ -77,9 +82,11 @@ void test02(void)
   cistreambuf_iter istrb_it20;
   VERIFY( istrb_it19 == istrb_it20 );
 
+#ifndef _GLIBCXX_DEBUG
   cistreambuf_iter istrb_it21(istrs01);
   cistreambuf_iter istrb_it22(istrs01.rdbuf());
   VERIFY( istrb_it22 == istrb_it21 );
+#endif
 
   cistreambuf_iter istrb_it23(istrs01);
   cistreambuf_iter istrb_it24;
diff --git a/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/debug/3_neg.cc b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/debug/3_neg.cc
new file mode 100644
index 000..5e75704
--- /dev/null
+++ b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/debug/3_neg.cc
@@ -0,0 +1,37 @@
+// Copyright (C) 2018 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more 

Re: [PATCH] Fix ICEs with "omp declare simd" attribute on versioned fns or omp_fn* (PR middle-end/83977)

2018-01-24 Thread Richard Biener
On January 24, 2018 5:16:45 PM GMT+01:00, Jakub Jelinek wrote:
>On Wed, Jan 24, 2018 at 05:08:10PM +0100, Richard Biener wrote:
>> >The "omp declare simd" attribute refers to argument numbers of the
>> >functions, so trying to apply it on versioned functions that can
>> >perhaps
>> >have different number and types of arguments results in ICEs or
>> >wrong-code.
>> >Unfortunately, if simd attribute or #pragma omp declare simd is used
>> >on C++ ctors or dtors, those have DECL_ABSTRACT_ORIGIN of something
>> >that
>> >really doesn't exist, abstract ctor or dtor, so checking if
>> >the types of node->decl and its DECL_ABSTRACT_ORIGIN are compatible
>> >function
>> >types doesn't work.
>> 
>> So if the attribute is on the decl we clone it should be in the list
>of things that cloning adjusts or blocks cloning. 
>
>Yeah, guess I could move the attribute removal code from omp-low.c to
>some
>helper function and in tree_function_versioning check if the attribute
>is
>present and if yes and the old/new function types don't match, drop the
>attribute.  Or, add some flag to cgraph whether the attribute should be
>honored or not, and clear it in tree_function_versioning and
>omp_create_child_function instead of removing the attribute.

I think there's already a set of attributes that prevent cloning and/or are
adjusted by the IPA param machinery. The Martins or Honza should know better.

>I'd prefer to defer that to GCC9 though at this point.

Sure. 

>   Jakub



[PATCH] Fix gcc.target/aarch64/sve/peel_ind_1.c for -mcmodel=tiny

2018-01-24 Thread Szabolcs Nagy

Fix test failures with -mcmodel=tiny when adr is generated instead of adrp.

FAIL: gcc.target/aarch64/sve/peel_ind_1.c -march=armv8.2-a+sve scan-assembler 
\\tadrp\\tx[0-9]+, x\\n
FAIL: gcc.target/aarch64/sve/peel_ind_2.c -march=armv8.2-a+sve scan-assembler 
\\tadrp\\tx[0-9]+, x\\n
FAIL: gcc.target/aarch64/sve/peel_ind_3.c -march=armv8.2-a+sve scan-assembler 
\\tadrp\\tx[0-9]+, x\\n
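
As a quick sanity check of the relaxed pattern, here is a minimal sketch using C++ std::regex. It is only an approximation: scan-assembler uses Tcl regular expressions rather than ECMAScript, and the helper name and the two sample assembly lines are made up for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: does ASM_TEXT contain something matching PATTERN?
static bool scan(const std::string &asm_text, const std::string &pattern)
{
  return std::regex_search(asm_text, std::regex(pattern));
}

// Old and new patterns from the patch, written as C++ raw strings.
static const char old_pattern[] = R"(\tadrp\tx[0-9]+, x\n)";
static const char new_pattern[] = R"(\t(adrp|adr)\tx[0-9]+, x\n)";

// Made-up sample lines: adrp for the default code model, adr for tiny.
static const char default_model[] = "\tadrp\tx0, x\n";
static const char tiny_model[] = "\tadr\tx0, x\n";

// The old pattern misses the adr form; the new one accepts both.
static void check_patterns()
{
  assert(scan(default_model, old_pattern));
  assert(!scan(tiny_model, old_pattern));
  assert(scan(default_model, new_pattern));
  assert(scan(tiny_model, new_pattern));
}
```

The alternation is kept inside one group so that the surrounding tab and register operand are still required for both mnemonics.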

gcc/testsuite/ChangeLog:

2018-01-24  Szabolcs Nagy  

* gcc.target/aarch64/sve/peel_ind_1.c: Match (adrp|adr) in scan-assembler.
* gcc.target/aarch64/sve/peel_ind_2.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_3.c: Likewise.
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_1.c b/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_1.c
index 864026499cd..a064c337b67 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_1.c
@@ -21,7 +21,7 @@ foo (void)
 }
 
 /* We should operate on aligned vectors.  */
-/* { dg-final { scan-assembler {\tadrp\tx[0-9]+, x\n} } } */
+/* { dg-final { scan-assembler {\t(adrp|adr)\tx[0-9]+, x\n} } } */
 /* We should use an induction that starts at -5, with only the last
7 elements of the first iteration being active.  */
 /* { dg-final { scan-assembler {\tindex\tz[0-9]+\.s, #-5, #5\n} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_2.c b/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_2.c
index 2bfc09a7602..f2113be90a7 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_2.c
@@ -17,6 +17,6 @@ foo (void)
 }
 
 /* We should operate on aligned vectors.  */
-/* { dg-final { scan-assembler {\tadrp\tx[0-9]+, x\n} } } */
+/* { dg-final { scan-assembler {\t(adrp|adr)\tx[0-9]+, x\n} } } */
 /* We should unroll the loop three times.  */
 /* { dg-final { scan-assembler-times "\tst1w\t" 3 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_3.c b/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_3.c
index 8364dc6107a..441589eef60 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/peel_ind_3.c
@@ -17,5 +17,5 @@ foo (int start)
 }
 
 /* We should operate on aligned vectors.  */
-/* { dg-final { scan-assembler {\tadrp\tx[0-9]+, x\n} } } */
+/* { dg-final { scan-assembler {\t(adrp|adr)\tx[0-9]+, x\n} } } */
 /* { dg-final { scan-assembler {\tubfx\t} } } */


Fix function level hot/cold partitioning

2018-01-24 Thread Jan Hubicka
Hi,
this patch fixes another issue found by Martin Liska's patch to trap in the
unlikely section (the last one I need in order to bootstrap).

Here we confused local and global counts; this code was not updated correctly
when I introduced them.  Bootstrapped/regtested x86_64-linux.

Honza

* ipa-profile.c (ipa_propagate_frequency_1): Fix logic skipping calls
with zero counts.
Index: ipa-profile.c
===
--- ipa-profile.c   (revision 257011)
+++ ipa-profile.c   (working copy)
@@ -331,16 +331,14 @@ ipa_propagate_frequency_1 (struct cgraph
 it is executed by the train run.  Transfer the function only if all
 callers are unlikely executed.  */
   if (profile_info
- && edge->callee->count.initialized_p ()
- /* Thunks are not profiled.  This is more or less implementation
-bug.  */
- && !d->function_symbol->thunk.thunk_p
+ && !(edge->callee->count.ipa () == profile_count::zero ())
  && (edge->caller->frequency != NODE_FREQUENCY_UNLIKELY_EXECUTED
  || (edge->caller->global.inlined_to
  && edge->caller->global.inlined_to->frequency
 != NODE_FREQUENCY_UNLIKELY_EXECUTED)))
  d->maybe_unlikely_executed = false;
-  if (edge->count.initialized_p () && !edge->count.nonzero_p ())
+  if (edge->count.ipa ().initialized_p ()
+ && !edge->count.ipa ().nonzero_p ())
continue;
   switch (edge->caller->frequency)
 {


Re: [PATCH] Fix ICEs with "omp declare simd" attribute on versioned fns or omp_fn* (PR middle-end/83977)

2018-01-24 Thread Jakub Jelinek
On Wed, Jan 24, 2018 at 05:08:10PM +0100, Richard Biener wrote:
> >The "omp declare simd" attribute refers to argument numbers of the
> >functions, so trying to apply it on versioned functions that can
> >perhaps
> >have different number and types of arguments results in ICEs or
> >wrong-code.
> >Unfortunately, if simd attribute or #pragma omp declare simd is used
> >on C++ ctors or dtors, those have DECL_ABSTRACT_ORIGIN of something
> >that
> >really doesn't exist, abstract ctor or dtor, so checking if
> >the types of node->decl and its DECL_ABSTRACT_ORIGIN are compatible
> >function
> >types doesn't work.
> 
> So if the attribute is on the decl we clone it should be in the list of 
> things that cloning adjusts or blocks cloning. 

Yeah, guess I could move the attribute removal code from omp-low.c to some
helper function and in tree_function_versioning check if the attribute is
present and if yes and the old/new function types don't match, drop the
attribute.  Or, add some flag to cgraph whether the attribute should be
honored or not, and clear it in tree_function_versioning and
omp_create_child_function instead of removing the attribute.

I'd prefer to defer that to GCC9 though at this point.

Jakub


Re: [PATCH] Fix ICEs with "omp declare simd" attribute on versioned fns or omp_fn* (PR middle-end/83977)

2018-01-24 Thread Richard Biener
On January 24, 2018 4:47:06 PM GMT+01:00, Jakub Jelinek wrote:
>Hi!
>
>The "omp declare simd" attribute refers to argument numbers of the
>functions, so trying to apply it on versioned functions that can
>perhaps
>have different number and types of arguments results in ICEs or
>wrong-code.
>Unfortunately, if simd attribute or #pragma omp declare simd is used
>on C++ ctors or dtors, those have DECL_ABSTRACT_ORIGIN of something
>that
>really doesn't exist, abstract ctor or dtor, so checking if
>the types of node->decl and its DECL_ABSTRACT_ORIGIN are compatible
>function
>types doesn't work.

So if the attribute is on the decl we clone it should be in the list of things 
that cloning adjusts or blocks cloning. 

>The following patch just keeps optimizing only the original functions,
>not
>any versioned copies of them, but still allows simd attribute handling
>on
>e.g. __builtin_sin.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux.
>
>Richard, is the first hunk ok, the rest is OpenMP related and I can ack
>myself.

Yes. 

>2018-01-24  Jakub Jelinek  
>
>   PR middle-end/83977
>   * tree.c (free_lang_data_in_decl): Don't clear DECL_ABSTRACT_ORIGIN
>   here.
>   * omp-low.c (create_omp_child_function): Remove "omp declare simd"
>   attributes from DECL_ATTRIBUTES (decl) without affecting
>   DECL_ATTRIBUTES (current_function_decl).
>   * omp-simd-clone.c (expand_simd_clones): Ignore DECL_ARTIFICIAL
>   functions with non-NULL DECL_ABSTRACT_ORIGIN.
>
>   * c-c++-common/gomp/pr83977-1.c: New test.
>   * c-c++-common/gomp/pr83977-2.c: New test.
>   * c-c++-common/gomp/pr83977-3.c: New test.
>   * gfortran.dg/gomp/pr83977.f90: New test.
>
>--- gcc/tree.c.jj  2018-01-23 14:48:50.216265866 +0100
>+++ gcc/tree.c 2018-01-24 11:40:30.845519905 +0100
>@@ -5329,16 +5329,6 @@ free_lang_data_in_decl (tree decl)
>At this point, it is not needed anymore.  */
>   DECL_SAVED_TREE (decl) = NULL_TREE;
> 
>-  /* Clear the abstract origin if it refers to a method.
>- Otherwise dwarf2out.c will ICE as we splice functions out of
>- TYPE_FIELDS and thus the origin will not be output
>- correctly.  */
>-  if (DECL_ABSTRACT_ORIGIN (decl)
>-&& DECL_CONTEXT (DECL_ABSTRACT_ORIGIN (decl))
>-&& RECORD_OR_UNION_TYPE_P
>- (DECL_CONTEXT (DECL_ABSTRACT_ORIGIN (decl))))
>-  DECL_ABSTRACT_ORIGIN (decl) = NULL_TREE;
>-
>  /* Sometimes the C++ frontend doesn't manage to transform a temporary
>DECL_VINDEX referring to itself into a vtable slot number as it
>should.  Happens with functions that are copied and then forgotten
>--- gcc/omp-low.c.jj   2018-01-04 00:43:16.106702767 +0100
>+++ gcc/omp-low.c  2018-01-24 12:59:37.566218901 +0100
>@@ -1585,6 +1585,23 @@ create_omp_child_function (omp_context *
>   DECL_INITIAL (decl) = make_node (BLOCK);
>   BLOCK_SUPERCONTEXT (DECL_INITIAL (decl)) = decl;
>   DECL_ATTRIBUTES (decl) = DECL_ATTRIBUTES (current_function_decl);
>+  /* Remove omp declare simd attribute from the new attributes.  */
>+  if (tree a = lookup_attribute ("omp declare simd", DECL_ATTRIBUTES
>(decl)))
>+{
>+  while (tree a2 = lookup_attribute ("omp declare simd",
>TREE_CHAIN (a)))
>+  a = a2;
>+  a = TREE_CHAIN (a);
>+  for (tree *p = &DECL_ATTRIBUTES (decl); *p != a;)
>+  if (is_attribute_p ("omp declare simd", get_attribute_name (*p)))
>+*p = TREE_CHAIN (*p);
>+  else
>+{
>+  tree chain = TREE_CHAIN (*p);
>+  *p = copy_node (*p);
>+  p = &TREE_CHAIN (*p);
>+  *p = chain;
>+}
>+}
>   DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl)
> = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (current_function_decl);
>   DECL_FUNCTION_SPECIFIC_TARGET (decl)
>--- gcc/omp-simd-clone.c.jj2018-01-23 14:48:47.275261492 +0100
>+++ gcc/omp-simd-clone.c   2018-01-24 11:45:28.484494749 +0100
>@@ -1574,6 +1574,10 @@ expand_simd_clones (struct cgraph_node *
>   tree attr = lookup_attribute ("omp declare simd",
>   DECL_ATTRIBUTES (node->decl));
>   if (attr == NULL_TREE
>+  /* Ignore artificial decls with an abstract origin, results of
>function
>+   cloning, versioning etc.  We want to handle certain builtins
>+   with simd attribute, like __builtin_sin.  */
>+  || (DECL_ARTIFICIAL (node->decl) && DECL_ABSTRACT_ORIGIN
>(node->decl))
>   || node->global.inlined_to
>   || lookup_attribute ("noclone", DECL_ATTRIBUTES (node->decl)))
> return;
>--- gcc/testsuite/c-c++-common/gomp/pr83977-1.c.jj 2018-01-24
>11:46:00.946492004 +0100
>+++ gcc/testsuite/c-c++-common/gomp/pr83977-1.c2018-01-24
>11:46:29.181489615 +0100
>@@ -0,0 +1,19 @@
>+/* PR middle-end/83977 */
>+/* { dg-do compile } */
>+/* { dg-additional-options "-O2" } */
>+
>+struct S { int a, b, c; };
>+
>+#pragma omp declare simd uniform(z) linear(v:1)

[C++ Patch/RFC] PR 83796 ("[6/7/8 Regression] Abstract classes allowed to be instantiated when initialised as default parameter to function or constructor")

2018-01-24 Thread Paolo Carlini

Hi,

I'm looking into this rather mild regression, which should be relatively
easy to fix. In short, Jason's fix for c++/54325 moved an
abstract_virtuals_error_sfinae check from build_aggr_init_expr to
build_cplus_new. As a result, the testcase in this new bug is no longer
rejected: a special conditional for value-initialization from { } in
convert_like_real simply calls build_value_init and returns early, so
build_cplus_new is never involved. I'm therefore working on the best way
to add the check back. The patch below, which also uses
cp_unevaluated_operand, appears to work. Something similar inside
build_value_init itself also appears to work, but that seems too generic
to me (build_value_init is called in many other cases). I'm also not sure about
cp_unevaluated_operand, whether we need something more precise.
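
For reference, a small standalone sketch of the behaviour the testcase expects, independent of the patch: an abstract class cannot be value-initialized, so the standard type traits should report it as not default-constructible. The Concrete type and the check function are hypothetical additions for contrast, and I have not checked whether the traits exercise the same convert_like_real path, so this only shows what a conforming compiler should report.

```cpp
#include <cassert>
#include <type_traits>

struct MyAbstractClass
{
  virtual int foo() const = 0;
};

// Hypothetical concrete subclass, not part of the testcase.
struct Concrete : MyAbstractClass
{
  int foo() const override { return 42; }
};

static_assert(std::is_abstract<MyAbstractClass>::value,
              "MyAbstractClass has a pure virtual function");
static_assert(!std::is_default_constructible<MyAbstractClass>::value,
              "an abstract class cannot be value-initialized");
static_assert(std::is_default_constructible<Concrete>::value,
              "the concrete subclass can be");

// A valid default argument has to name a concrete object instead of {}.
inline int TestFunction(const MyAbstractClass& m = Concrete())
{
  return m.foo();
}

inline void check_default_argument()
{
  assert(TestFunction() == 42);
}
```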


Thanks! Paolo.

//

Index: cp/call.c
===
--- cp/call.c   (revision 257013)
+++ cp/call.c   (working copy)
@@ -6765,6 +6765,9 @@ convert_like_real (conversion *convs, tree expr, t
&& TYPE_HAS_DEFAULT_CONSTRUCTOR (totype))
  {
bool direct = CONSTRUCTOR_IS_DIRECT_INIT (expr);
+   if (cp_unevaluated_operand
+   && abstract_virtuals_error_sfinae (NULL_TREE, totype, complain))
+ return error_mark_node;
expr = build_value_init (totype, complain);
expr = get_target_expr_sfinae (expr, complain);
if (expr != error_mark_node)
Index: testsuite/g++.dg/cpp0x/abstract-default1.C
===
--- testsuite/g++.dg/cpp0x/abstract-default1.C  (nonexistent)
+++ testsuite/g++.dg/cpp0x/abstract-default1.C  (working copy)
@@ -0,0 +1,26 @@
+// PR c++/83796
+// { dg-do compile { target c++11 } }
+
+struct MyAbstractClass
+{
+  virtual int foo() const = 0;
+};
+
+struct TestClass
+{
+  TestClass(const MyAbstractClass& m = {})  // { dg-error "abstract type" }
+  : value_(m.foo()) {}
+
+  int value_;
+};
+
+int TestFunction(const MyAbstractClass& m = {})  // { dg-error "abstract type" }
+{
+  return m.foo();
+}
+
+int main()
+{
+  TestClass testInstance;
+  TestFunction();
+}


[PATCH] Fix ICEs with "omp declare simd" attribute on versioned fns or omp_fn* (PR middle-end/83977)

2018-01-24 Thread Jakub Jelinek
Hi!

The "omp declare simd" attribute refers to argument numbers of the
functions, so trying to apply it on versioned functions that can perhaps
have different number and types of arguments results in ICEs or wrong-code.
Unfortunately, if simd attribute or #pragma omp declare simd is used
on C++ ctors or dtors, those have DECL_ABSTRACT_ORIGIN of something that
really doesn't exist, abstract ctor or dtor, so checking if
the types of node->decl and its DECL_ABSTRACT_ORIGIN are compatible function
types doesn't work.

The following patch just keeps optimizing only the original functions, not
any versioned copies of them, but still allows simd attribute handling on
e.g. __builtin_sin.

Bootstrapped/regtested on x86_64-linux and i686-linux.

Richard, is the first hunk ok, the rest is OpenMP related and I can ack
myself.

2018-01-24  Jakub Jelinek  

PR middle-end/83977
* tree.c (free_lang_data_in_decl): Don't clear DECL_ABSTRACT_ORIGIN
here.
* omp-low.c (create_omp_child_function): Remove "omp declare simd"
attributes from DECL_ATTRIBUTES (decl) without affecting
DECL_ATTRIBUTES (current_function_decl).
* omp-simd-clone.c (expand_simd_clones): Ignore DECL_ARTIFICIAL
functions with non-NULL DECL_ABSTRACT_ORIGIN.

* c-c++-common/gomp/pr83977-1.c: New test.
* c-c++-common/gomp/pr83977-2.c: New test.
* c-c++-common/gomp/pr83977-3.c: New test.
* gfortran.dg/gomp/pr83977.f90: New test.
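
The omp-low.c hunk has to drop the attribute from the child function without touching the parent's chain, since both start out sharing the same list. Below is a minimal standalone sketch of that copy-on-write removal on a plain singly linked list. The Attr type and the function names are hypothetical stand-ins, not GCC's tree API, and the copies are simply leaked to keep the sketch short.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for one node of an attribute chain.
struct Attr
{
  std::string name;
  Attr *next;
};

// Return a chain equal to HEAD minus the nodes named NAME.  Nodes of the
// original chain are never modified: everything before the last match is
// copied, and the tail after the last match is shared.
Attr *remove_without_sharing(Attr *head, const std::string &name)
{
  Attr *last = nullptr;
  for (Attr *a = head; a; a = a->next)
    if (a->name == name)
      last = a;
  if (!last)
    return head;              // Nothing to remove, share the whole chain.

  Attr *stop = last->next;    // First node of the shared tail.
  Attr *result = head;
  for (Attr **p = &result; *p != stop;)
    if ((*p)->name == name)
      *p = (*p)->next;        // Skip the match; only our own link changes.
    else
      {
        Attr *copy = new Attr(**p);  // Private copy (leaked in this sketch).
        *p = copy;
        p = &copy->next;
      }
  return result;
}

// Parent chain: simd -> used -> simd -> noinline.
void check_remove_without_sharing()
{
  Attr c = { "noinline", nullptr };
  Attr s2 = { "omp declare simd", &c };
  Attr b = { "used", &s2 };
  Attr s1 = { "omp declare simd", &b };

  Attr *child = remove_without_sharing(&s1, "omp declare simd");

  // The child sees a copy of "used" followed by the shared tail.
  assert(child != &b && child->name == "used");
  assert(child->next == &c);
  // The parent chain is unchanged.
  assert(s1.next == &b && b.next == &s2 && s2.next == &c);
}
```

Copying only the prefix keeps the common case cheap: when the attribute is absent, the child keeps sharing the parent's chain unchanged.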

--- gcc/tree.c.jj   2018-01-23 14:48:50.216265866 +0100
+++ gcc/tree.c  2018-01-24 11:40:30.845519905 +0100
@@ -5329,16 +5329,6 @@ free_lang_data_in_decl (tree decl)
 At this point, it is not needed anymore.  */
   DECL_SAVED_TREE (decl) = NULL_TREE;
 
-  /* Clear the abstract origin if it refers to a method.
- Otherwise dwarf2out.c will ICE as we splice functions out of
- TYPE_FIELDS and thus the origin will not be output
- correctly.  */
-  if (DECL_ABSTRACT_ORIGIN (decl)
- && DECL_CONTEXT (DECL_ABSTRACT_ORIGIN (decl))
- && RECORD_OR_UNION_TYPE_P
-  (DECL_CONTEXT (DECL_ABSTRACT_ORIGIN (decl))))
-   DECL_ABSTRACT_ORIGIN (decl) = NULL_TREE;
-
   /* Sometimes the C++ frontend doesn't manage to transform a temporary
  DECL_VINDEX referring to itself into a vtable slot number as it
 should.  Happens with functions that are copied and then forgotten
--- gcc/omp-low.c.jj2018-01-04 00:43:16.106702767 +0100
+++ gcc/omp-low.c   2018-01-24 12:59:37.566218901 +0100
@@ -1585,6 +1585,23 @@ create_omp_child_function (omp_context *
   DECL_INITIAL (decl) = make_node (BLOCK);
   BLOCK_SUPERCONTEXT (DECL_INITIAL (decl)) = decl;
   DECL_ATTRIBUTES (decl) = DECL_ATTRIBUTES (current_function_decl);
+  /* Remove omp declare simd attribute from the new attributes.  */
+  if (tree a = lookup_attribute ("omp declare simd", DECL_ATTRIBUTES (decl)))
+{
+  while (tree a2 = lookup_attribute ("omp declare simd", TREE_CHAIN (a)))
+   a = a2;
+  a = TREE_CHAIN (a);
+  for (tree *p = &DECL_ATTRIBUTES (decl); *p != a;)
+   if (is_attribute_p ("omp declare simd", get_attribute_name (*p)))
+ *p = TREE_CHAIN (*p);
+   else
+ {
+   tree chain = TREE_CHAIN (*p);
+   *p = copy_node (*p);
+   p = &TREE_CHAIN (*p);
+   *p = chain;
+ }
+}
   DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl)
 = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (current_function_decl);
   DECL_FUNCTION_SPECIFIC_TARGET (decl)
--- gcc/omp-simd-clone.c.jj 2018-01-23 14:48:47.275261492 +0100
+++ gcc/omp-simd-clone.c2018-01-24 11:45:28.484494749 +0100
@@ -1574,6 +1574,10 @@ expand_simd_clones (struct cgraph_node *
   tree attr = lookup_attribute ("omp declare simd",
DECL_ATTRIBUTES (node->decl));
   if (attr == NULL_TREE
+  /* Ignore artificial decls with an abstract origin, results of function
+cloning, versioning etc.  We want to handle certain builtins
+with simd attribute, like __builtin_sin.  */
+  || (DECL_ARTIFICIAL (node->decl) && DECL_ABSTRACT_ORIGIN (node->decl))
   || node->global.inlined_to
   || lookup_attribute ("noclone", DECL_ATTRIBUTES (node->decl)))
 return;
--- gcc/testsuite/c-c++-common/gomp/pr83977-1.c.jj  2018-01-24 11:46:00.946492004 +0100
+++ gcc/testsuite/c-c++-common/gomp/pr83977-1.c 2018-01-24 11:46:29.181489615 +0100
@@ -0,0 +1,19 @@
+/* PR middle-end/83977 */
+/* { dg-do compile } */
+/* { dg-additional-options "-O2" } */
+
+struct S { int a, b, c; };
+
+#pragma omp declare simd uniform(z) linear(v:1)
+__attribute__((noinline)) static int
+foo (int x, int y, struct S z, int u, int v)
+{
+  return x + y + z.a;
+}
+
+int
+bar (int x, int y, int z)
+{
+  struct S s = { z, 1, 1 };
+  return foo (x, y, s, 0, 0);
+}
--- gcc/testsuite/c-c++-common/gomp/pr83977-2.c.jj  2018-01-24 
11:46:42.259488509 +0100
+++ 

Re: [PATCH, 2/2][nvptx, PR83589] Workaround for branch-around-nothing JIT bug

2018-01-24 Thread Tom de Vries

On 01/24/2018 03:07 PM, Jakub Jelinek wrote:
> On Wed, Jan 24, 2018 at 02:56:28PM +0100, Tom de Vries wrote:
>> +#if WORKAROUND_PTXJIT_BUG_2
>> +/* Variant of pc_set that only requires JUMP_P (INSN) if STRICT.  This variant
>> +   is needed in the nvptx target because the branches generated for
>> +   partitioning are NONJUMP_INSN_P, not JUMP_P.  */
>> +
>> +static rtx
>> +nvptx_pc_set (const rtx_insn *insn, bool strict = true)
>> +{
>> +  rtx pat;
>> +  if ((strict && !JUMP_P (insn))
>> +  || (!strict && !INSN_P (insn)))
>> +return NULL_RTX;
>> +  pat = PATTERN (insn);
>> +
>> +  /* The set is allowed to appear either as the insn pattern or
>> + the first set in a PARALLEL.  */
>> +  if (GET_CODE (pat) == PARALLEL)
>> +pat = XVECEXP (pat, 0, 0);
>
> This could have been single_set.



This is just a copy of pc_set in jump.c, with the strict parameter added.

It's possible that we can use single_set in pc_set in jump.c. But there 
are subtle differences:

- current pc_set allows a second non-dead set in parallel
- single_set doesn't allow second non-dead set in parallel

I don't know whether this difference is significant or not.
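
To make the difference concrete, here is a toy model rather than GCC's RTL API: a PARALLEL is modelled as a vector of sets, each flagged as setting the pc and as dead or live. All types and function names are hypothetical, and the real pc_set and single_set check more than this.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a SET inside a PARALLEL.
struct ToySet
{
  bool sets_pc;  // Destination is the program counter.
  bool dead;     // Result is unused.
};

typedef std::vector<ToySet> ToyParallel;

// pc_set-like: only look at the first element, ignore the rest.
const ToySet *first_set_if_pc(const ToyParallel &pat)
{
  if (pat.empty() || !pat[0].sets_pc)
    return nullptr;
  return &pat[0];
}

// single_set-like: succeed only if exactly one set is live.
const ToySet *only_live_set(const ToyParallel &pat)
{
  const ToySet *found = nullptr;
  for (const ToySet &s : pat)
    {
      if (s.dead)
        continue;
      if (found)
        return nullptr;  // A second live set: give up.
      found = &s;
    }
  return found;
}

void check_toy_sets()
{
  // A jump plus a second live set: the two helpers disagree.
  ToyParallel jump_and_live = { { true, false }, { false, false } };
  assert(first_set_if_pc(jump_and_live) == &jump_and_live[0]);
  assert(only_live_set(jump_and_live) == nullptr);

  // A jump plus a dead set: both find the jump.
  ToyParallel jump_and_dead = { { true, false }, { false, true } };
  assert(first_set_if_pc(jump_and_dead) == &jump_and_dead[0]);
  assert(only_live_set(jump_and_dead) == &jump_and_dead[0]);
}
```

In this model the two helpers only disagree when the jump shares its PARALLEL with another live set, which is the case listed above.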


>> +  if (!x)
>> +return NULL_RTX;
>> +  x = SET_SRC (x);
>> +  if (GET_CODE (x) == LABEL_REF)
>> +return x;
>> +  if (GET_CODE (x) != IF_THEN_ELSE)
>> +return NULL_RTX;
>> +  if (XEXP (x, 2) == pc_rtx && GET_CODE (XEXP (x, 1)) == LABEL_REF)
>> +return XEXP (x, 1);
>> +  if (XEXP (x, 1) == pc_rtx && GET_CODE (XEXP (x, 2)) == LABEL_REF)
>> +return XEXP (x, 2);
>> +  return NULL_RTX;
>
> And this looks like condjump_label.


This is just a copy of condjump_label in jump.c, with the strict 
parameter added.



> What are the nvptx conditional jumps
> that aren't JUMP_INSN and why?  That looks like a bad idea.


OpenACC has different execution modes:
- gang redundant vs gang partitioned
- worker single vs worker partitioned
- vector single vs vector partitioned

The transitions between the different modes are represented by:
- nvptx_fork
- nvptx_forked
- nvptx_join
- nvptx_joined
until pass_machine_reorg.

In pass_machine_reorg, they are expanded into more detailed operations 
implementing state propagation and neutering code for single mode.


The neutering code consists of branch-around code, which uses these 
conditional jumps that are not JUMP_INSN.


My assumption is that this is done in order to make the compiler behave 
conservatively with these jumps.  I'm not sure if this is related to one 
or more passes after reorg, or if this is just defensive programming.


I could try to change these into JUMP_INSN in stage1, and see how that goes.


> Otherwise, there is also JUMP_LABEL (insn)...


Right, that one requires a JUMP_INSN.

Thanks,
- Tom


[PATCH][testsuite] Fix arm options in gcc.dg/lto/20110201-1_0.c

2018-01-24 Thread Kyrill Tkachov

Hi all,

This test fails on arm hardfloat targets because it sets an explicit 
-mfloat-abi=softfp.
The usual approach to setting the NEON options is to use dg-add-options 
arm_neon.
But in the lto tests we don't have that framework, we can only set them 
explicitly with dg-lto-options.

The solution is to remove the explicit -mfloat-abi=softfp and instead add an 
effective target check
for arm_neon_ok_no_float_abi that makes sure we only run this test if 
-mfpu=neon is enough to get NEON
without any -mfloat-abi options. In fact, this is what the comment above 
check_effective_target_arm_neon_ok_no_float_abi_nocache
recommends for lto tests.

That way on my hardfloat toolchain the test doesn't try to link the softfp 
binary against a hard-float runtime/test glue
and all is good. I've tested that the test is appropriately skipped when 
testing a --with-float=soft toolchain.

Committing to trunk (as the patch only touches arm options)

Thanks,
Kyrill

2018-01-24  Kyrylo Tkachov  

* gcc.dg/lto/20110201-1_0.c: Remove explicit -mfloat-abi=softfp
option.  Add arm_neon_ok_no_float_abi check.
diff --git a/gcc/testsuite/gcc.dg/lto/20110201-1_0.c b/gcc/testsuite/gcc.dg/lto/20110201-1_0.c
index 2144f0714804ea1003d58f37ca5d16ced722a3a7..871a49fe1897511e39827b96c7c57931d689ce92 100644
--- a/gcc/testsuite/gcc.dg/lto/20110201-1_0.c
+++ b/gcc/testsuite/gcc.dg/lto/20110201-1_0.c
@@ -1,6 +1,7 @@
 /* { dg-lto-do run } */
 /* { dg-lto-options { { -O0 -flto -fno-math-errno } } } */
-/* { dg-lto-options { "-O0 -flto -fno-math-errno -mfloat-abi=softfp -mfpu=neon-vfpv4" } { target arm*-*-* } } */
+/* { dg-lto-options { "-O0 -flto -fno-math-errno -mfpu=neon-vfpv4" } { target arm*-*-* } } */
+/* { dg-require-effective-target arm_neon_ok_no_float_abi { target arm*-*-* } } */
 /* { dg-require-linker-plugin "" } */
 /* { dg-require-effective-target sqrt_insn } */
 


Re: Fix use of boolean_true/false_node (PR 83979)

2018-01-24 Thread Richard Biener
On Tue, Jan 23, 2018 at 12:25 PM, Richard Sandiford wrote:
> r255913 changed some constant_boolean_node calls to boolean_true_node
> and boolean_false_node, which meant that the returned tree didn't
> always have the right type.
>
> Tested on aarch64-linux-gnu.  Probably bordering on obvious, but just
> in case: OK to install?

Ok.

Richard.

> Richard
>
>
> 2018-01-23  Richard Sandiford  
>
> gcc/
> PR tree-optimization/83979
> * fold-const.c (fold_comparison): Use constant_boolean_node
> instead of boolean_{true,false}_node.
>
> gcc/testsuite/
> PR tree-optimization/83979
> * g++.dg/pr83979.c: New test.
>
> Index: gcc/fold-const.c
> ===
> --- gcc/fold-const.c2018-01-16 15:13:19.643832679 +
> +++ gcc/fold-const.c2018-01-23 11:23:59.982555852 +
> @@ -8572,39 +8572,39 @@ fold_comparison (location_t loc, enum tr
> {
> case EQ_EXPR:
>   if (known_eq (bitpos0, bitpos1))
> -   return boolean_true_node;
> +   return constant_boolean_node (true, type);
>   if (known_ne (bitpos0, bitpos1))
> -   return boolean_false_node;
> +   return constant_boolean_node (false, type);
>   break;
> case NE_EXPR:
>   if (known_ne (bitpos0, bitpos1))
> -   return boolean_true_node;
> +   return constant_boolean_node (true, type);
>   if (known_eq (bitpos0, bitpos1))
> -   return boolean_false_node;
> +   return constant_boolean_node (false, type);
>   break;
> case LT_EXPR:
>   if (known_lt (bitpos0, bitpos1))
> -   return boolean_true_node;
> +   return constant_boolean_node (true, type);
>   if (known_ge (bitpos0, bitpos1))
> -   return boolean_false_node;
> +   return constant_boolean_node (false, type);
>   break;
> case LE_EXPR:
>   if (known_le (bitpos0, bitpos1))
> -   return boolean_true_node;
> +   return constant_boolean_node (true, type);
>   if (known_gt (bitpos0, bitpos1))
> -   return boolean_false_node;
> +   return constant_boolean_node (false, type);
>   break;
> case GE_EXPR:
>   if (known_ge (bitpos0, bitpos1))
> -   return boolean_true_node;
> +   return constant_boolean_node (true, type);
>   if (known_lt (bitpos0, bitpos1))
> -   return boolean_false_node;
> +   return constant_boolean_node (false, type);
>   break;
> case GT_EXPR:
>   if (known_gt (bitpos0, bitpos1))
> -   return boolean_true_node;
> +   return constant_boolean_node (true, type);
>   if (known_le (bitpos0, bitpos1))
> -   return boolean_false_node;
> +   return constant_boolean_node (false, type);
>   break;
> default:;
> }
> Index: gcc/testsuite/g++.dg/pr83979.c
> ===
> --- /dev/null   2018-01-22 18:46:35.983712806 +
> +++ gcc/testsuite/g++.dg/pr83979.c  2018-01-23 11:23:59.982555852 +
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +
> +int
> +foo (char* p)
> +{
> +  return p + 1000 < p;
> +}


Re: [PR81611] improve auto-inc

2018-01-24 Thread Richard Biener
On Wed, Jan 24, 2018 at 4:42 AM, Alexandre Oliva  wrote:
> These two patches fix PR81611.
>
> The first one improves forwprop so that we avoid adding SSA conflicting
> by forwpropping the iv increment, which may cause both the incremented
> and the original value to be live, even when the iv is copied between
> the PHI node and the increment.  We already handled the case in which
> there aren't any such copies.
>
> Alas, this is not enough to address the problem on avr, even though it
> fixes it on e.g. ppc.  The reason is that avr rejects var+offset
> addresses, and this prevents the memory access in a post-inc code
> sequence from being adjusted to an address that auto-inc-dec can
> recognize.
>
> The second patch adjusts auto-inc-dec to recognize a code sequence in
> which the original, unincremented pseudo is used in an address after
> it's incremented into another pseudo, and turn that into a post-inc
> address, leaving the copying for subsequent passes to eliminate.
>
> Regstrapped on x86_64-linux-gnu, i686-linux-gnu, ppc64-linux-gnu and
> aarch64-linux-gnu.  Ok to install?
>
>
> I'd appreciate suggestions on how to turn the submitted testcase into a
> regression test; I suppose an avr-specific test that requires the
> auto-inc transformation is a possibility, but that feels a bit too
> limited, doesn't it?  Thoughts?  Thanks in advance,
>
>
> [PR81611] accept copies in simple_iv_increment_p
>
> If there are copies between the GIMPLE_PHI at the loop body and the
> increment that reaches it (presumably through a back edge), still
> regard it as a simple_iv_increment, so that we won't consider the
> value in the back edge eligible for forwprop.  Doing so would risk
> making the phi node and the incremented conflicting value live
> within the loop, and the phi node to be preserved for propagated
> uses after the loop.
>
> for  gcc/ChangeLog
>
> PR tree-optimization/81611
> * tree-ssa-dom.c (simple_iv_increment_p): Skip intervening
> copies.
> ---
>  gcc/tree-ssa-dom.c |   21 +
>  1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
> index 2b371667253a..3c0ff9458342 100644
> --- a/gcc/tree-ssa-dom.c
> +++ b/gcc/tree-ssa-dom.c
> @@ -1276,8 +1276,11 @@ record_equality (tree x, tree y, class 
> const_and_copies *const_and_copies)
>  /* Returns true when STMT is a simple iv increment.  It detects the
> following situation:
>
> -   i_1 = phi (..., i_2)
> -   i_2 = i_1 +/- ...  */
> +   i_1 = phi (..., i_k)
> +   [...]
> +   i_j = i_{j-1}  for each j : 2 <= j <= k-1
> +   [...]
> +   i_k = i_{k-1} +/- ...  */
>
>  bool
>  simple_iv_increment_p (gimple *stmt)
> @@ -1305,8 +1308,18 @@ simple_iv_increment_p (gimple *stmt)
>  return false;
>
>phi = SSA_NAME_DEF_STMT (preinc);
> -  if (gimple_code (phi) != GIMPLE_PHI)
> -return false;
> +  while (gimple_code (phi) != GIMPLE_PHI)
> +{
> +  /* Follow trivial copies, but not the DEF used in a back edge,
> +so that we don't prevent coalescing.  */
> +  if (gimple_code (phi) != GIMPLE_ASSIGN
> + || gimple_assign_lhs (phi) != preinc
> + || !gimple_assign_ssa_name_copy_p (phi))

Given that gimple_assign_ssa_name_copy_p already checks that the statement
is an assign, just do

   if (!gimple_assign_ssa_name_copy_p (phi))

The lhs != preinc check is always false, given that you got to phi via
SSA_NAME_DEF_STMT of preinc.

The simple_iv_increment_p change is ok with that change.  The other
change is RTL which I
defer to somebody else.
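
With that simplification the loop only needs one test per step. Here is a toy model of the walk, not GIMPLE: names are indices into a table of defining statements, and every type and function name is hypothetical.

```cpp
#include <cassert>
#include <vector>

enum class DefKind { Phi, Copy, Other };

// Toy defining statement of one SSA-like name.
struct ToyDef
{
  DefKind kind;
  int rhs;  // Index of the copied name; only meaningful for Copy.
};

// defs[i] defines name i.  Follow trivial copies starting at PREINC and
// report whether the walk ends at a PHI.  Assumes copies never form a cycle,
// as in SSA form.
bool reaches_phi_through_copies(const std::vector<ToyDef> &defs, int preinc)
{
  const ToyDef *def = &defs[preinc];
  while (def->kind != DefKind::Phi)
    {
      if (def->kind != DefKind::Copy)
        return false;
      preinc = def->rhs;
      def = &defs[preinc];
    }
  return true;
}

void check_copy_walk()
{
  // name0 = PHI, name1 = name0, name2 = name1, name3 = something else.
  std::vector<ToyDef> defs = {
    { DefKind::Phi, -1 },
    { DefKind::Copy, 0 },
    { DefKind::Copy, 1 },
    { DefKind::Other, -1 },
  };
  assert(reaches_phi_through_copies(defs, 0));
  assert(reaches_phi_through_copies(defs, 2));
  assert(!reaches_phi_through_copies(defs, 3));
}
```

No separate lhs comparison appears in the loop because each definition is looked up from the name itself, which mirrors the point about SSA_NAME_DEF_STMT above.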

Richard.

> +   return false;
> +
> +  preinc = gimple_assign_rhs1 (phi);
> +  phi = SSA_NAME_DEF_STMT (preinc);
> +}
>
>for (i = 0; i < gimple_phi_num_args (phi); i++)
>  if (gimple_phi_arg_def (phi, i) == lhs)
>
>
> [PR81611] turn inc-and-use-of-dead-orig into auto-inc
>
> When the addressing modes available on the machine don't allow offsets
> in addresses, odds are that post-increments will be represented in
> trees and RTL as:
>
>   y <= x + 1
>   ... *(x) ...
>   x <= y
>
> so deal with this form so as to create auto-inc addresses that we'd
> otherwise miss.
>
>
> for  gcc/ChangeLog
>
> PR rtl-optimization/81611
> * auto-inc-dec.c (attempt_change): Move dead note from
> mem_insn if it's the next use of regno
> (find_address): Take address use of reg holding
> non-incremented value.
> (merge_in_block): Attempt to use a mem insn that is the next
> use of the original regno.
> ---
>  gcc/auto-inc-dec.c |   46 --
>  1 file changed, 44 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/auto-inc-dec.c b/gcc/auto-inc-dec.c
> index d02fa9d081c7..4ffbcf56a456 100644
> --- a/gcc/auto-inc-dec.c
> +++ b/gcc/auto-inc-dec.c
> @@ -508,7 +508,11 @@ attempt_change (rtx new_addr, rtx inc_reg)
>  before the memory reference.  */
>gcc_assert (mov_insn);
>emit_insn_before (mov_insn, 

Re: Remove explicit dg-do runs from gcc.dg/vect (PR 83889)

2018-01-24 Thread Richard Biener
On Tue, Jan 23, 2018 at 12:32 PM, Richard Sandiford
 wrote:
> The failures in this PR were from forcing { dg-do run } even when
> vect.exp chooses options that are incompatible with the runtime.
> The default vect.exp behaviour is to execute when possible, so there's
> no need for a dg-do at all.
>
> The patch removes other unconditional { dg-do run }s too.  Many of them
> were already failing in the same way.
>
> Also, the dg-do run condition in vect-reduc-or* seems unnecessary:
> the test should run correctly whatever happens, and the scan tests
> are already guarded properly.
>
> Tested on aarch64-linux-gnu, arm-none-eabi, x86_64-linux-gnu and
> powerpc64le-linux-gnu.  OK to install?

Ok.

Richard.

> Richard
>
>
> 2018-01-23  Richard Sandiford  
>
> gcc/testsuite/
> PR testsuite/83889
> * gcc.dg/vect/pr79920.c: Remove explicit dg-do run.
> * gcc.dg/vect/pr80631-1.c: Likewise.
> * gcc.dg/vect/pr80631-2.c: Likewise.
> * gcc.dg/vect/pr81410.c: Likewise.
> * gcc.dg/vect/pr81633.c: Likewise.
> * gcc.dg/vect/pr81815.c: Likewise.
> * gcc.dg/vect/pr82108.c: Likewise.
> * gcc.dg/vect/pr83857.c: Likewise.
> * gcc.dg/vect/vect-alias-check-8.c: Likewise.
> * gcc.dg/vect/vect-alias-check-9.c: Likewise.
> * gcc.dg/vect/vect-alias-check-10.c: Likewise.
> * gcc.dg/vect/vect-alias-check-11.c: Likewise.
> * gcc.dg/vect/vect-alias-check-12.c: Likewise.
> * gcc.dg/vect/vect-reduc-11.c: Likewise.
> * gcc.dg/vect/vect-tail-nomask-1.c: Likewise.
> * gcc.dg/vect/vect-reduc-in-order-1.c: Remove dg-do run and use
> dg-xfail-run-if instead.
> * gcc.dg/vect/vect-reduc-in-order-2.c: Likewise.
> * gcc.dg/vect/vect-reduc-in-order-3.c: Likewise.
> * gcc.dg/vect/vect-reduc-in-order-4.c: Likewise.
> * gcc.dg/vect/vect-reduc-or_1.c: Remove conditional dg-do run.
> * gcc.dg/vect/vect-reduc-or_2.c: Likewise.
>
> Index: gcc/testsuite/gcc.dg/vect/pr79920.c
> ===
> --- gcc/testsuite/gcc.dg/vect/pr79920.c 2018-01-15 12:38:45.039094423 +
> +++ gcc/testsuite/gcc.dg/vect/pr79920.c 2018-01-23 11:29:38.977575495 +
> @@ -1,4 +1,3 @@
> -/* { dg-do run } */
>  /* { dg-additional-options "-O3 -fno-fast-math" } */
>
>  #include "tree-vect.h"
> Index: gcc/testsuite/gcc.dg/vect/pr80631-1.c
> ===
> --- gcc/testsuite/gcc.dg/vect/pr80631-1.c   2018-01-13 17:59:52.122334084 
> +
> +++ gcc/testsuite/gcc.dg/vect/pr80631-1.c   2018-01-23 11:29:38.977575495 
> +
> @@ -1,5 +1,4 @@
>  /* PR tree-optimization/80631 */
> -/* { dg-do run } */
>
>  #include "tree-vect.h"
>
> Index: gcc/testsuite/gcc.dg/vect/pr80631-2.c
> ===
> --- gcc/testsuite/gcc.dg/vect/pr80631-2.c   2017-12-14 00:04:52.323446529 
> +
> +++ gcc/testsuite/gcc.dg/vect/pr80631-2.c   2018-01-23 11:29:38.977575495 
> +
> @@ -1,5 +1,4 @@
>  /* PR tree-optimization/80631 */
> -/* { dg-do run } */
>
>  #include "tree-vect.h"
>
> Index: gcc/testsuite/gcc.dg/vect/pr81410.c
> ===
> --- gcc/testsuite/gcc.dg/vect/pr81410.c 2017-07-27 10:37:55.334036950 +0100
> +++ gcc/testsuite/gcc.dg/vect/pr81410.c 2018-01-23 11:29:38.977575495 +
> @@ -1,4 +1,3 @@
> -/* { dg-do run } */
>  /* { dg-require-effective-target vect_long_long } */
>
>  #include "tree-vect.h"
> Index: gcc/testsuite/gcc.dg/vect/pr81633.c
> ===
> --- gcc/testsuite/gcc.dg/vect/pr81633.c 2017-08-03 10:40:54.014105333 +0100
> +++ gcc/testsuite/gcc.dg/vect/pr81633.c 2018-01-23 11:29:38.977575495 +
> @@ -1,5 +1,3 @@
> -/* { dg-do run } */
> -
>  static double identity[4][4] = {{1, 0, 0, 0},
>  {0, 1, 0, 0},
>  {0, 0, 1, 0},
> Index: gcc/testsuite/gcc.dg/vect/pr81815.c
> ===
> --- gcc/testsuite/gcc.dg/vect/pr81815.c 2017-08-16 08:50:54.197549943 +0100
> +++ gcc/testsuite/gcc.dg/vect/pr81815.c 2018-01-23 11:29:38.978575453 +
> @@ -1,5 +1,3 @@
> -/* { dg-do run } */
> -
>  int __attribute__ ((noinline, noclone))
>  f (int *x, int n)
>  {
> Index: gcc/testsuite/gcc.dg/vect/pr82108.c
> ===
> --- gcc/testsuite/gcc.dg/vect/pr82108.c 2017-09-06 20:47:38.380589062 +0100
> +++ gcc/testsuite/gcc.dg/vect/pr82108.c 2018-01-23 11:29:38.978575453 +
> @@ -1,4 +1,3 @@
> -/* { dg-do run } */
>  /* { dg-require-effective-target vect_float } */
>
>  #include "tree-vect.h"
> Index: gcc/testsuite/gcc.dg/vect/pr83857.c
> 

Re: [PATCH, 2/2][nvptx, PR83589] Workaround for branch-around-nothing JIT bug

2018-01-24 Thread Jakub Jelinek
On Wed, Jan 24, 2018 at 02:56:28PM +0100, Tom de Vries wrote:
> +#if WORKAROUND_PTXJIT_BUG_2
> +/* Variant of pc_set that only requires JUMP_P (INSN) if STRICT.  This 
> variant
> +   is needed in the nvptx target because the branches generated for
> +   parititioning are NONJUMP_INSN_P, not JUMP_P.  */
> +
> +static rtx
> +nvptx_pc_set (const rtx_insn *insn, bool strict = true)
> +{
> +  rtx pat;
> +  if ((strict && !JUMP_P (insn))
> +  || (!strict && !INSN_P (insn)))
> +return NULL_RTX;
> +  pat = PATTERN (insn);
> +
> +  /* The set is allowed to appear either as the insn pattern or
> + the first set in a PARALLEL.  */
> +  if (GET_CODE (pat) == PARALLEL)
> +pat = XVECEXP (pat, 0, 0);

This could have been single_set.

> +  if (!x)
> +return NULL_RTX;
> +  x = SET_SRC (x);
> +  if (GET_CODE (x) == LABEL_REF)
> +return x;
> +  if (GET_CODE (x) != IF_THEN_ELSE)
> +return NULL_RTX;
> +  if (XEXP (x, 2) == pc_rtx && GET_CODE (XEXP (x, 1)) == LABEL_REF)
> +return XEXP (x, 1);
> +  if (XEXP (x, 1) == pc_rtx && GET_CODE (XEXP (x, 2)) == LABEL_REF)
> +return XEXP (x, 2);
> +  return NULL_RTX;

And this looks like condjump_label.  What are the nvptx conditional jumps
that aren't JUMP_INSN and why?  That looks like a bad idea.
Otherwise, there is also JUMP_LABEL (insn)...

Jakub


C++ PATCH for c++/82249, wrong mismatched pack length error

2018-01-24 Thread Jason Merrill
tsubst_pack_expansion already knows how to deal with partial
instantiation of a pack expansion, where we end up with arguments for
some packs but not others, but my recent lambda work made this come up
in a new situation: within a function, where we need to deal with
function parameter packs as well.  When we don't have arguments for a
template parameter pack we just don't have an argument pack, but for a
function parameter pack we instead end up with an argument pack
containing just an expansion of the parameter pack itself, so we need
to recognize that case and treat it like the usual unsubstituted case.
We then also need to handle the result of tsubst_expr in this case.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 329e2e39748f6f60630f247ecd75a6556f9f72e9
Author: Jason Merrill 
Date:   Tue Jan 23 17:04:56 2018 -0500

PR c++/82249 - wrong mismatched pack length error.

* pt.c (extract_fnparm_pack, tsubst_pack_expansion): Handle
unsubstituted function parameter pack.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index d39b54ed408..abfdbd96ae8 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -10961,7 +10961,12 @@ extract_fnparm_pack (tree tmpl_parm, tree *spec_p)
   parmvec = make_tree_vec (len);
   spec_parm = *spec_p;
   for (i = 0; i < len; i++, spec_parm = DECL_CHAIN (spec_parm))
-TREE_VEC_ELT (parmvec, i) = spec_parm;
+{
+  tree elt = spec_parm;
+  if (DECL_PACK_P (elt))
+   elt = make_pack_expansion (elt);
+  TREE_VEC_ELT (parmvec, i) = elt;
+}
 
   /* Build the argument packs.  */
   SET_ARGUMENT_PACK_ARGS (argpack, parmvec);
@@ -11414,6 +11419,7 @@ tsubst_pack_expansion (tree t, tree args, 
tsubst_flags_t complain,
   tree pattern;
   tree pack, packs = NULL_TREE;
   bool unsubstituted_packs = false;
+  bool unsubstituted_fn_pack = false;
   int i, len = -1;
   tree result;
   hash_map *saved_local_specializations = NULL;
@@ -11484,6 +11490,13 @@ tsubst_pack_expansion (tree t, tree args, 
tsubst_flags_t complain,
  else
arg_pack = make_fnparm_pack (arg_pack);
}
+ else if (argument_pack_element_is_expansion_p (arg_pack, 0))
+   /* This argument pack isn't fully instantiated yet.  We set this
+  flag rather than clear arg_pack because we do want to do the
+  optimization below, and we don't want to substitute directly
+  into the pattern (as that would expose a NONTYPE_ARGUMENT_PACK
+  where it isn't expected).  */
+   unsubstituted_fn_pack = true;
}
   else if (TREE_CODE (parm_pack) == FIELD_DECL)
arg_pack = tsubst_copy (parm_pack, args, complain, in_decl);
@@ -11521,7 +11534,8 @@ tsubst_pack_expansion (tree t, tree args, 
tsubst_flags_t complain,
 
   if (len < 0)
len = my_len;
-  else if (len != my_len)
+  else if (len != my_len
+  && !unsubstituted_fn_pack)
 {
  if (!(complain & tf_error))
/* Fail quietly.  */;
@@ -11574,7 +11588,8 @@ tsubst_pack_expansion (tree t, tree args, 
tsubst_flags_t complain,
 
   /* We cannot expand this expansion expression, because we don't have
  all of the argument packs we need.  */
-  if (use_pack_expansion_extra_args_p (packs, len, unsubstituted_packs))
+  if (use_pack_expansion_extra_args_p (packs, len, (unsubstituted_packs
+   || unsubstituted_fn_pack)))
 {
   /* We got some full packs, but we can't substitute them in until we
 have values for all the packs.  So remember these until then.  */
diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-generic-variadic7.C 
b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-variadic7.C
new file mode 100644
index 000..5c5af1441c0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-variadic7.C
@@ -0,0 +1,19 @@
+// PR c++/82249
+// { dg-do compile { target c++14 } }
+
+template<class T, class U> T calc (T t, U u) { return t; }
+template <class... Ts> void sink(Ts...);
+
+template < typename ... Ds >
+void f(Ds ...) {
+  [](auto ... n){
+sink (calc(n, Ds{}) ...);
+  }(Ds{} ...);
+}
+
+
+int main(){
+  f();  // Wrong error
+  f(0, 0);  // Wrong error
+  f(0); // ICE
+}


RE: [PATCH] Fix various x86 avx512{bitalg, vpopcntdq, vbmi2} issues (PR target/83488)

2018-01-24 Thread Koval, Julia
Hi,
Fixed it. Ok for trunk?

gcc/
* config/i386/avx512bitalgintrin.h (_mm512_bitshuffle_epi64_mask,
_mm512_mask_bitshuffle_epi64_mask, _mm256_bitshuffle_epi64_mask,
_mm256_mask_bitshuffle_epi64_mask, _mm_bitshuffle_epi64_mask,
_mm_mask_bitshuffle_epi64_mask): Fix type.
* config/i386/i386-builtin-types.def (UHI_FTYPE_V2DI_V2DI_UHI,
USI_FTYPE_V4DI_V4DI_USI): Remove.
* config/i386/i386-builtin.def (__builtin_ia32_vpshufbitqmb512_mask,
__builtin_ia32_vpshufbitqmb256_mask,
__builtin_ia32_vpshufbitqmb128_mask): Fix types.
* config/i386/i386.c (ix86_expand_args_builtin): Remove old types.
* config/i386/sse.md (VI1_AVX512VLBW): Change types.

gcc/testsuite/
* gcc.target/i386/avx512bitalg-vpshufbitqmb-1.c: Add -mavx512f 
-mavx512bw.
* gcc.target/i386/avx512bitalgvl-vpshufbitqmb-1.c: Add -mavx512bw.
* gcc.target/i386/i386.exp: Fix types.

Thanks,
Julia

> -Original Message-
> From: Kirill Yukhin [mailto:kirill.yuk...@gmail.com]
> Sent: Saturday, January 20, 2018 11:49 AM
> To: Koval, Julia 
> Cc: 'Jakub Jelinek' ; 'Uros Bizjak' ;
> 'GCC Patches' 
> Subject: Re: [PATCH] Fix various x86 avx512{bitalg, vpopcntdq, vbmi2} issues 
> (PR
> target/83488)
> 
> Hello Julia,
> On 12 Jan 08:55, Koval, Julia wrote:
> > Changelog
> >
> > gcc/
> > * config/i386/avx512bitalgintrin.h (_mm512_bitshuffle_epi64_mask,
> > _mm512_mask_bitshuffle_epi64_mask,
> _mm256_bitshuffle_epi64_mask,
> > _mm256_mask_bitshuffle_epi64_mask, _mm_bitshuffle_epi64_mask,
> > _mm_mask_bitshuffle_epi64_mask): Fix type.
> > * config/i386/i386-builtin-types.def (UHI_FTYPE_V2DI_V2DI_UHI,
> > USI_FTYPE_V4DI_V4DI_USI): Remove.
> > * config/i386/i386-builtin.def (__builtin_ia32_vpshufbitqmb512_mask,
> > __builtin_ia32_vpshufbitqmb256_mask,
> > __builtin_ia32_vpshufbitqmb128_mask): Fix types.
> > * config/i386/i386.c (ix86_expand_args_builtin): Remove old types.
> > * config/i386/sse.md (VI48_AVX512VLBW): Change types.
> >
> > gcc/testsuite/
> > * gcc.target/i386/avx512bitalg-vpshufbitqmb-1.c: Add -mavx512f -
> mavx512bw.
> > * gcc.target/i386/avx512bitalgvl-vpshufbitqmb-1.c: Add -mavx512bw.
> > * gcc.target/i386/i386.exp: Fix types.
> 
>  (define_mode_iterator VI48_AVX512VLBW
> -  [(V8DI "TARGET_AVX512BW") (V4DI  "TARGET_AVX512VL")
> -   (V2DI  "TARGET_AVX512VL")])
> +  [(V64QI "TARGET_AVX512BW") (V32QI  "TARGET_AVX512VL")
> +   (V16QI  "TARGET_AVX512VL")])
> I'd call this iterator VI1_AVX512VLBW.
> 
> --
> Thanks, K



0001-bitalg-fix.patch


Re: [PATCH, 2/2][nvptx, PR83589] Workaround for branch-around-nothing JIT bug

2018-01-24 Thread Tom de Vries

On 01/24/2018 12:03 PM, Jakub Jelinek wrote:

On Wed, Jan 24, 2018 at 11:41:45AM +0100, Tom de Vries wrote:

+/* Insert a dummy ptx insn when encountering a branch to a label with no ptx
+   insn inbetween the branch and the label.  This works around a JIT bug
+   observed at driver version 384.111, at -O0 for sm_50.  */
+
+static void
+prevent_branch_around_nothing (void)
+{
+  rtx_insn *seen_label = 0;
+for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
+  {
+   if (seen_label == 0)
+ {
+   if (INSN_P (insn) && condjump_p (insn))
+ seen_label = label_ref_label (nvptx_condjump_label (insn, false));
+
+   continue;
+ }
+
+   if (NOTE_P (insn))
+ continue;


I'm afraid for review I don't know the backend enough.  I'd just suggest
using NULL instead of 0 for pointers, i.e. clearing
seen_label or comparisons of seen_label against NULL,


Done.


and wonder if
DEBUG_INSNs are guaranteed not to appear here.  If not, you'd need to
skip them too.



Done.

Retested and committed as attached.

Thanks,
- Tom
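
For readers without the nvptx backend at hand, the scan can be sketched
on a toy instruction list.  This is only a model under stated
assumptions: Op, Insn, toy_prevent_branch_around_nothing and count_nops
are hypothetical names, Note stands for anything that emits no ptx
(notes, debug insns, fork/join markers), and Nop stands for the dummy
insn.

```cpp
#include <cassert>
#include <vector>

// Toy instruction kinds; not GCC rtx codes.
enum class Op { CondBranch, Label, Note, Real, Nop };

struct Insn {
  Op op;
  int label;  // branch target or label id, -1 when unused
};

// Insert a Nop before a label when a conditional branch to that label
// is separated from it only by insns that emit no ptx code.
std::vector<Insn> toy_prevent_branch_around_nothing(
    const std::vector<Insn>& in) {
  std::vector<Insn> out;
  int seen_label = -1;  // target of the pending branch, -1 if none
  for (const Insn& insn : in) {
    if (seen_label < 0) {
      if (insn.op == Op::CondBranch)
        seen_label = insn.label;
    } else if (insn.op != Op::Note) {
      // Anything other than a note ends the search; only the branch
      // target itself triggers the workaround.
      if (insn.op == Op::Label && insn.label == seen_label)
        out.push_back({Op::Nop, -1});
      seen_label = -1;
    }
    out.push_back(insn);
  }
  return out;
}

// Helper for checking how many dummy insns were inserted.
int count_nops(const std::vector<Insn>& insns) {
  int n = 0;
  for (const Insn& insn : insns)
    if (insn.op == Op::Nop)
      ++n;
  return n;
}
```

A branch followed only by a note and then its own target gets one dummy
insn; a real insn in between, or a different label, gets none.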
[nvptx, PR83589] Workaround for branch-around-nothing JIT bug

2018-01-23  Tom de Vries  

	PR target/83589
	* config/nvptx/nvptx.c (WORKAROUND_PTXJIT_BUG_2): Define to 1.
	(nvptx_pc_set, nvptx_condjump_label): New function. Copy from jump.c.
	Add strict parameter.
	(prevent_branch_around_nothing): Insert dummy insn between branch to
	label and label with no ptx insn inbetween.
	* config/nvptx/nvptx.md (define_insn "fake_nop"): New insn.

	* testsuite/libgomp.oacc-c-c++-common/pr83589.c: New test.

---
 gcc/config/nvptx/nvptx.c   | 92 ++
 gcc/config/nvptx/nvptx.md  |  9 +++
 .../testsuite/libgomp.oacc-c-c++-common/pr83589.c  | 21 +
 3 files changed, 122 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 3516740..d848412 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -78,6 +78,7 @@
 #include "target-def.h"
 
 #define WORKAROUND_PTXJIT_BUG 1
+#define WORKAROUND_PTXJIT_BUG_2 1
 
 /* The various PTX memory areas an object might reside in.  */
 enum nvptx_data_area
@@ -4363,6 +4364,93 @@ nvptx_neuter_pars (parallel *par, unsigned modes, unsigned outer)
 nvptx_neuter_pars (par->next, modes, outer);
 }
 
+#if WORKAROUND_PTXJIT_BUG_2
+/* Variant of pc_set that only requires JUMP_P (INSN) if STRICT.  This variant
+   is needed in the nvptx target because the branches generated for
+   parititioning are NONJUMP_INSN_P, not JUMP_P.  */
+
+static rtx
+nvptx_pc_set (const rtx_insn *insn, bool strict = true)
+{
+  rtx pat;
+  if ((strict && !JUMP_P (insn))
+  || (!strict && !INSN_P (insn)))
+return NULL_RTX;
+  pat = PATTERN (insn);
+
+  /* The set is allowed to appear either as the insn pattern or
+ the first set in a PARALLEL.  */
+  if (GET_CODE (pat) == PARALLEL)
+pat = XVECEXP (pat, 0, 0);
+  if (GET_CODE (pat) == SET && GET_CODE (SET_DEST (pat)) == PC)
+return pat;
+
+  return NULL_RTX;
+}
+
+/* Variant of condjump_label that only requires JUMP_P (INSN) if STRICT.  */
+
+static rtx
+nvptx_condjump_label (const rtx_insn *insn, bool strict = true)
+{
+  rtx x = nvptx_pc_set (insn, strict);
+
+  if (!x)
+return NULL_RTX;
+  x = SET_SRC (x);
+  if (GET_CODE (x) == LABEL_REF)
+return x;
+  if (GET_CODE (x) != IF_THEN_ELSE)
+return NULL_RTX;
+  if (XEXP (x, 2) == pc_rtx && GET_CODE (XEXP (x, 1)) == LABEL_REF)
+return XEXP (x, 1);
+  if (XEXP (x, 1) == pc_rtx && GET_CODE (XEXP (x, 2)) == LABEL_REF)
+return XEXP (x, 2);
+  return NULL_RTX;
+}
+
+/* Insert a dummy ptx insn when encountering a branch to a label with no ptx
+   insn inbetween the branch and the label.  This works around a JIT bug
+   observed at driver version 384.111, at -O0 for sm_50.  */
+
+static void
+prevent_branch_around_nothing (void)
+{
+  rtx_insn *seen_label = NULL;
+for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
+  {
+	if (seen_label == NULL)
+	  {
+	if (INSN_P (insn) && condjump_p (insn))
+	  seen_label = label_ref_label (nvptx_condjump_label (insn, false));
+
+	continue;
+	  }
+
+	if (NOTE_P (insn) || DEBUG_INSN_P (insn))
+	  continue;
+
+	if (INSN_P (insn))
+	  switch (recog_memoized (insn))
+	{
+	case CODE_FOR_nvptx_fork:
+	case CODE_FOR_nvptx_forked:
+	case CODE_FOR_nvptx_joining:
+	case CODE_FOR_nvptx_join:
+	  continue;
+	default:
+	  seen_label = NULL;
+	  continue;
+	}
+
+	if (LABEL_P (insn) && insn == seen_label)
+	  emit_insn_before (gen_fake_nop (), insn);
+
+	seen_label = NULL;
+  }
+  }
+#endif
+
 /* PTX-specific reorganization
- Split blocks at fork and join instructions
- Compute live registers
@@ -4442,6 +4530,10 @@ nvptx_reorg (void)
   if (TARGET_UNIFORM_SIMT)
 nvptx_reorg_uniform_simt ();
 
+#if WORKAROUND_PTXJIT_BUG_2
+  prevent_branch_around_nothing ();

Re: [nvptx, PR81352] Add exit insn after noreturn call for neutered threads in warp

2018-01-24 Thread Tom de Vries

On 01/24/2018 12:53 PM, Richard Biener wrote:

On Wed, 24 Jan 2018, Tom de Vries wrote:

I'll commit this shortly for stage4. Strictly speaking, this is not an 8
regression, but a wrong code bug. But I think that the code generation error
seems fundamental enough, and the fix simple and localized enough, that it's
stage4 permissible.


wrong-code bugs qualify for stage4 if a fix isn't too invasive.  Target
maintainers have an extra say to override stage4 rules anyway and for
non-primary/secondary targets nobody cares anyway.


Maybe we should then be clearer in the formulation of the stage 4 criteria?

Thanks,
- Tom

[ Change validated as XHTML 1.0 Transitional ]

Index: htdocs/develop.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/develop.html,v
retrieving revision 1.178
diff -u -r1.178 develop.html
--- htdocs/develop.html 15 Jan 2018 08:23:26 -  1.178
+++ htdocs/develop.html 24 Jan 2018 13:40:30 -
@@ -130,10 +130,10 @@
 Stage 4

 During this period, the only (non-documentation) changes that may
-be made are changes that fix regressions.  Other changes may not be
-done during this period.  Note that the same constraints apply
-to release branches.  This period lasts until stage 1 opens for
-the next release.
+be made are changes that fix regressions, or that fix wrong-code bugs
+in a non-invasive way.  Other changes may not be done during this
+period.  Note that the same constraints apply to release branches.
+This period lasts until stage 1 opens for the next release.

 Rationale



Re: [aarch64][PATCH v2] Disable reg offset in quad-word store for Falkor

2018-01-24 Thread Siddhesh Poyarekar
On Wednesday 24 January 2018 05:50 PM, Kyrill Tkachov wrote:
> I would tend towards making the costs usage more intelligent and
> differentiating
> between loads and stores but I agree that is definitely GCC 9 material.
> Whether this approach is an acceptable stopgap for GCC 8 is up to the
> aarch64 maintainers
> and will depend, among other things, on the impact it has on generic
> (non-Falkor) codegen.
> A good experiment to help this approach would be to compile a large
> codebase (for example CPU2017)
> with a non-Falkor -mcpu setting and make sure that there are no assembly
> changes (or minimal).
> This would help justify the aarch64.md constraint changes.

Thanks, I'll verify with CPU2017.

> file paths don't need the gcc/ because the ChangeLog file is already in
> the gcc/ directory
> 
>>     gcc/testsuite/
>>     * gcc/testsuite/gcc.target/aarch64/pr82533.c: New test case.
> 
> Similarly, you don't need the gcc/testsuite/ prefix.
> Also, since you have a bugzilla PR entry please reference it in the
> ChangeLog
> right above the file list:
> PR target/82533
> 
> That way when the patch is committed the SVN hooks will pick it up
> automagically and update bugzilla.

Ugh, sorry, I was tardy about this - I'll fix it up.

>> @@ -5530,6 +5530,16 @@ aarch64_classify_address (struct
>> aarch64_address_info *info,
>>  || vec_flags == VEC_ADVSIMD
>>  || vec_flags == VEC_SVE_DATA));
>>
>> +  /* Avoid register indexing for 128-bit stores when the
>> + AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE option is set.  */
>> +  if (!optimize_size
>> +  && type == ADDR_QUERY_STR
>> +  && (aarch64_tune_params.extra_tuning_flags
>> + & AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE)
>> +  && (mode == TImode || mode == TFmode
>> + || aarch64_vector_data_mode_p (mode)))
>> +    allow_reg_index_p = false;
> 
> The aarch64_classify_vector_mode code has been reworked recently for SVE
> so I'm not entirely
> up to date with its logic, but I believe that
> "aarch64_classify_vector_mode (mode)" will
> allow 64-bit vector modes, which would not be using the 128-bit Q
> register, so you may be disabling
> register indexing for D-register memory stores.

I'll check this and fix the condition if necessary.
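
As a rough illustration of what a tightened condition might look like,
here is a toy predicate.  It is a sketch only, not the patch:
ToyStoreQuery and toy_allow_reg_index are hypothetical names, and the
mode is reduced to a bit width on the assumption that only 128-bit
stores (the Q register case) are slow.

```cpp
#include <cassert>

// Hypothetical flattened view of the inputs the check looks at.
struct ToyStoreQuery {
  bool optimize_size;                  // models optimize_size
  bool is_store;                       // models type == ADDR_QUERY_STR
  bool slow_regoffset_quadword_store;  // models the tuning flag
  unsigned mode_bits;                  // width of the stored mode
};

// Returns false only for 128-bit stores on a tuning that marks
// register-offset quad-word stores as slow, and only when not
// optimizing for size.  64-bit vector stores keep reg+reg addressing.
bool toy_allow_reg_index(const ToyStoreQuery& q) {
  const bool quad_word = q.mode_bits == 128;
  if (!q.optimize_size && q.is_store && q.slow_regoffset_quadword_store
      && quad_word)
    return false;
  return true;
}
```

The point of the width test is that a D-register store such as a 64-bit
vector mode would still be allowed to use a register index.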

Thanks,
Siddhesh


Re: [aarch64][PATCH v2] Disable reg offset in quad-word store for Falkor

2018-01-24 Thread Kyrill Tkachov

Hi Siddhesh,

On 23/01/18 15:41, Siddhesh Poyarekar wrote:

Hi,

Here's v2 of the patch to disable register offset addressing mode for
stores of 128-bit values on Falkor because they're very costly.
Differences from the last version:

 - Incorporated changes Jim made to his patch earlier that I missed,
   i.e. adding an extra tuning parameter called
   SLOW_REGOFFSET_QUADWORD_STORE instead of making it conditional on
   TUNE_FALKOR.

 - Added a new query type ADDR_QUERY_STR to indicate the queried
   address is used for a store.  This way I can use it for other
   scenarios where stores are significantly more expensive than loads,
   such as pre/post modify addressing modes.

 - Incorporated the constraint functionality into
   aarch64_legitimate_address_p and aarch64_classify_address.

I evaluated the suggestion of using costs to do this but it's not
possible with the current costs as they do not differentiate between
loads and stores.  If modifying all passes that use these costs to
identify loads vs stores is considered OK (ivopts seems to be the
biggest user) then I can volunteer to do that work for gcc9 and
eventually replace this.



I would tend towards making the costs usage more intelligent and differentiating
between loads and stores but I agree that is definitely GCC 9 material.
Whether this approach is an acceptable stopgap for GCC 8 is up to the aarch64 
maintainers
and will depend, among other things, on the impact it has on generic 
(non-Falkor) codegen.
A good experiment to help this approach would be to compile a large codebase 
(for example CPU2017)
with a non-Falkor -mcpu setting and make sure that there are no assembly 
changes (or minimal).
This would help justify the aarch64.md constraint changes.

I have a couple of comments on the patch inline.

Thanks,
Kyrill




On Falkor, because of an idiosyncrasy of how the pipelines are designed, a
quad-word store using a reg+reg addressing mode is almost twice as slow as an
add followed by a quad-word store with a single reg addressing mode.  So we
get better performance if we disallow addressing modes using register offsets
with quad-word stores.

This patch improves fpspeed by 0.3% and intspeed by 0.22% in CPU2017,
with omnetpp_s (4.3%) and pop2_s (2.6%) being the biggest winners.

2018-xx-xx  Jim Wilson  
Kugan Vivenakandarajah 
Siddhesh Poyarekar 

gcc/
* gcc/config/aarch64/aarch64-protos.h (aarch64_addr_query_type):
New member ADDR_QUERY_STR.
* gcc/config/aarch64/aarch64-tuning-flags.def
(SLOW_REGOFFSET_QUADWORD_STORE): New.
* gcc/config/aarch64/aarch64.c (qdf24xx_tunings): Add
SLOW_REGOFFSET_QUADWORD_STORE to tuning flags.
(aarch64_classify_address): Avoid register indexing for quad
mode stores when SLOW_REGOFFSET_QUADWORD_STORE is set.
* gcc/config/aarch64/constraints.md (Uts): New constraint.
* gcc/config/aarch64/aarch64.md (movti_aarch64, movtf_aarch64):
Use it.
* gcc/config/aarch64/aarch64-simd.md (aarch64_simd_mov):
Likewise.



file paths don't need the gcc/ because the ChangeLog file is already in the 
gcc/ directory


gcc/testsuite/
* gcc/testsuite/gcc.target/aarch64/pr82533.c: New test case.


Similarly, you don't need the gcc/testsuite/ prefix.
Also, since you have a bugzilla PR entry please reference it in the ChangeLog
right above the file list:
PR target/82533

That way when the patch is committed the SVN hooks will pick it up 
automagically and update bugzilla.


---
 gcc/config/aarch64/aarch64-protos.h |  4 
 gcc/config/aarch64/aarch64-simd.md  |  4 ++--
 gcc/config/aarch64/aarch64-tuning-flags.def |  4 
 gcc/config/aarch64/aarch64.c| 12 +++-
 gcc/config/aarch64/aarch64.md   |  6 +++---
 gcc/config/aarch64/constraints.md   |  7 +++
 gcc/testsuite/gcc.target/aarch64/pr82533.c  | 11 +++
 7 files changed, 42 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr82533.c

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index ef1b0bc8e28..5fedc85f283 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -120,6 +120,9 @@ enum aarch64_symbol_type
ADDR_QUERY_LDP_STP
   Query what is valid for a load/store pair.

+   ADDR_QUERY_STR
+  Query what is valid for a store.
+
ADDR_QUERY_ANY
   Query what is valid for at least one memory constraint, which may
   allow things that "m" doesn't.  For example, the SVE LDR and STR
@@ -128,6 +131,7 @@ enum aarch64_symbol_type
 enum aarch64_addr_query_type {
   ADDR_QUERY_M,
   ADDR_QUERY_LDP_STP,
+  ADDR_QUERY_STR,
   ADDR_QUERY_ANY
 };

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 

Re: [nvptx, PR81352] Add exit insn after noreturn call for neutered threads in warp

2018-01-24 Thread Richard Biener
On Wed, 24 Jan 2018, Tom de Vries wrote:

> Hi,
> 
> atm the test-case contained in this patch hangs.
> 
> For the test-case we generate:
> ...
>   @ %r79 bra $L18;
>   {
> call _gfortran_abort;
> trap;
> exit;
>   }
>  $L18:
> ...
> 
> which results in SASS code (at GOMP_NVPTX_JIT=-O4):
> ...
> /*05d8*/   @P0 BRA `(.L_18);
> /*05e8*/   JCAL `(_gfortran_abort);
> /*05f0*/   BPT.TRAP 0x1;
> /*05f8*/   EXIT;
> .L_18:
> ...
> There's no convergence point generated for the diverging branch, so we may end
> up executing random code after .L18 (a problem I long suspected could happen,
> but never observed until now).
> 
> The patch adds an exit on the other path, making sure that all threads in the
> warp reach exit, and indeed fixing the hang:
> ...
>   @ %r79 bra $L18;
>   {
> call _gfortran_abort;
> trap;
> exit;
>   }
>  $L18:
>  exit;
> ...
> 
> Build and reg-tested on x86_64 with nvptx accelerator.
> 
> I'll commit this shortly for stage4. Strictly speaking, this is not an 8
> regression, but a wrong code bug. But I think that the code generation error
> seems fundamental enough, and the fix simple and localized enough, that it's
> stage4 permissible.

wrong-code bugs qualify for stage4 if a fix isn't too invasive.  Target
maintainers have an extra say to override stage4 rules anyway and for
non-primary/secondary targets nobody cares anyway.

Richard.

> Thanks,
> - Tom
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2018-01-24 Thread Alan Hayward
Ping.
Any comments on this?
The one line summary is that using self sets instead of clobber high would 
result in a
patch roughly the same, but with different condition checks.
It depends if people think it really is useful for self sets to not be live.

Given that we are at stage 4 now, and this can’t go in until stage 1, I’m happy 
to
leave the discussion until stage 1, but would appreciate any suggestions before 
then.

Thanks,
Alan.

> On 12 Jan 2018, at 11:58, Alan Hayward  wrote:
> 
> 
> 
>> On 19 Dec 2017, at 16:27, Jeff Law  wrote:
>> 
>> On 12/19/2017 03:12 AM, Alan Hayward wrote:
>>> Ping ping.
>>> I think there should be enough information in the first test to show that 
>>> any "set to self”
>>> registers become live. Let me know if there’s anything I’ve missed.
>> I think that both Richi and I would like you investigate fixing the df
>> infrastructure so that a self-set doesn't make things go live.  Sorry if
>> we weren't clear about that.  ie, level of complexity and likely fallout
>> so that we can evaluate that vs CLOBBER_HIGH.
>> 
>> 
>> 
>> jeff
>> 
> 
> 
> Right, sorry, I misunderstood. Ok, so I’ve been looking at trying to do this.
> 
> To summarise: To do this we need to check for 1) All reg sets to self where 
> 2) the existing value of the register fits within the mode of the new set. In 
> these cases we want then to (in effect) pretend the set doesn’t exist.
> To test this, I add a set to self in a tls_desc call for every V register ( 
> eg: (set (reg:TI V0_REGNUM) (reg:TI V0_REGNUM)), (set (reg:TI V1_REGNUM) 
> (reg:TI V1_REGNUM))  etc etc). If the patch works, then these registers will 
> not be backed up around a tls call.
> 
> First added some checks into df-scan and successfully stopped the sets to 
> self from being added to the live and uses lists.  [ To simplify the issue 
> for now I ignored the existing value of the register, and just assumed it 
> fits ].
> However, running my test case, the code still results in my vector registers 
> being backed up around tls. Even though the dumps show that my vector 
> registers are no longer live.
> 
> Debugging further, I find that the register backing up is happening as part of 
> combine.c. Ok, I can add a check for reg set to self in here too…..
> But as part of my clobber_high patch I already had code in combine.c that 
> checked for CLOBBER_HIGH and then checked the mode of the previous value in 
> the register. My new code ends up looking very similar to my clobber high 
> patch.
> 
> Instead of:
> 
> if (GET_CODE (setter) == CLOBBER_HIGH
>&& reg_is_clobbered_by_clobber_high(REGNO(dest), GET_MODE 
> (rsp->last_set_value))
> 
> Now becomes something like:
> 
> if (GET_CODE (setter) == SET
>&& REG_P (dest) && HARD_REGISTER_P (dest) && REG_P (src) && REGNO(dest) == 
> REGNO(src)
>&& reg_is_clobbered_by_self_set(REGNO(dest), GET_MODE 
> (rsp->last_set_value))
> 
> I then need to find the next pass that has similar checks…. and again it’ll 
> be the same places I already have clobber high code.
> 
> I suspect in the end I’ll be able to remove the df-scan changes, because as I 
> effectively found out with clobber high, they aren’t causing any register 
> backups to happen.
> 
> Ok, in the new patch we do save a bit of code because there is no new 
> expression to add. But that was a small part of the full patch.
> 
> I could rewrite the patch in this way, but personally I feel it’s now 
> exploiting a side effect of a set to self, rather than being explicit in what 
> it is trying to do. 
> 
> 
> Alan.
> 



[PATCH][testsuite] XFAIL gcc.dg/tree-ssa/ssa-dom-cse-2.c on non-NEON arm targets

2018-01-24 Thread Kyrill Tkachov

Hi all,

This test fails to optimise away the PLUS reduction in the loop on arm targets
when vectorisation is not enabled due to absence of SIMD instructions.
From reading the logs and the PR I gather that the presence or absence of SIMD
affects the passing of this test on other targets as well, as evidenced by the
long list of xfail targets.
This list looks quite unwieldy to me, but here is a patch adding non-NEON arm
to that list.
Is this ok?

Or should we always force -mfpu=neon for arm targets instead? That's what we do
for System Z, but from a purist perspective the loop has nothing vector-specific
in it so a compiler should be able to reduce it regardless...

Thanks,
Kyrill

2018-01-24  Kyrylo Tkachov  

* gcc.dg/tree-ssa/ssa-dom-cse-2.c: XFAIL on !arm_neon arm targets.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
index 7e77a6a0a226262541674bec1d7bdf081e916215..8606969e0940c36d6ba23f8e213b2fd4f5c1d52d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
@@ -25,4 +25,4 @@ foo ()
but the loop reads only one element at a time, and DOM cannot resolve these.
The same happens on powerpc depending on the SIMD support available.  */
 
-/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail { { alpha*-*-* hppa*64*-*-* nvptx*-*-* } || { { lp64 && { powerpc*-*-* sparc*-*-* riscv*-*-* } } || aarch64_sve } } } } } */
+/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail { { alpha*-*-* hppa*64*-*-* nvptx*-*-* } || { { { lp64 && { powerpc*-*-* sparc*-*-* riscv*-*-* } } || aarch64_sve } || { arm*-*-* && { ! arm_neon } } } } } } } */


[nvptx, PR81352] Add exit insn after noreturn call for neutered threads in warp

2018-01-24 Thread Tom de Vries

Hi,

atm the test-case contained in this patch hangs.

For the test-case we generate:
...
  @ %r79 bra $L18;
  {
call _gfortran_abort;
trap;
exit;
  }
 $L18:
...

which results in SASS code (at GOMP_NVPTX_JIT=-O4):
...
/*05d8*/   @P0 BRA `(.L_18);
/*05e8*/   JCAL `(_gfortran_abort);
/*05f0*/   BPT.TRAP 0x1;
/*05f8*/   EXIT;
.L_18:
...
There's no convergence point generated for the diverging branch, so we 
may end up executing random code after .L_18 (a problem I long suspected 
could happen, but never observed until now).


The patch adds an exit on the other path, making sure that all threads 
in the warp reach exit, and indeed fixing the hang:

...
  @ %r79 bra $L18;
  {
call _gfortran_abort;
trap;
exit;
  }
 $L18:
 exit;
...

Built and reg-tested on x86_64 with nvptx accelerator.

I'll commit this shortly for stage4. Strictly speaking, this is not a GCC 8 
regression, but a wrong-code bug. But I think that the code generation 
error is fundamental enough, and the fix simple and localized enough, 
that it's permissible in stage4.


Thanks,
- Tom
[nvptx, PR81352] Add exit insn after noreturn call for neutered threads in warp

2018-01-23  Tom de Vries  

	PR target/81352
	* config/nvptx/nvptx.c (nvptx_single): Add exit insn after noreturn call
	for neutered threads in warp.
	* config/nvptx/nvptx.md (define_insn "exit"): New insn.

	* testsuite/libgomp.oacc-fortran/pr81352.f90: New test.

---
 gcc/config/nvptx/nvptx.c   |  7 ++-
 gcc/config/nvptx/nvptx.md  |  5 +
 libgomp/testsuite/libgomp.oacc-fortran/pr81352.f90 | 20 
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index f5bb438..3516740 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4062,7 +4062,12 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
 	if (tail_branch)
 	  before = emit_label_before (label, before);
 	else
-	  emit_label_after (label, tail);
+	  {
+	rtx_insn *label_insn = emit_label_after (label, tail);
+	if (mode == GOMP_DIM_VECTOR && CALL_P (tail)
+		&& find_reg_note (tail, REG_NORETURN, NULL))
+	  emit_insn_after (gen_exit (), label_insn);
+	  }
   }
 
   /* Now deal with propagating the branch condition.  */
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index f9c087b..135479b 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -994,6 +994,11 @@
   ""
   "")
 
+(define_insn "exit"
+  [(const_int 1)]
+  ""
+  "exit;")
+
 (define_insn "return"
   [(return)]
   ""
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/pr81352.f90 b/libgomp/testsuite/libgomp.oacc-fortran/pr81352.f90
new file mode 100644
index 000..f6969c8
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/pr81352.f90
@@ -0,0 +1,20 @@
+! { dg-do run }
+
+program foo
+  integer :: a(3,3), l, ll
+  a = 0
+
+  !$acc parallel num_gangs (1) num_workers(1)
+
+  do l=1,3
+ !$acc loop vector
+ do ll=1,3
+a(l,ll) = 2
+ enddo
+  enddo
+
+  if (any(a(1:3,1:3).ne.2)) call abort
+
+  !$acc end parallel
+
+end program foo


RE: [patch][x86] -march=icelake

2018-01-24 Thread Richard Biener
On Wed, 24 Jan 2018, Koval, Julia wrote:

> I think we may want to extend it to more than 2 ints someday, when we run out 
> of bits again. It won't break the existing functionality if 3rd int will be 
> zero by default. That's why I tried to avoid "two" in the name.
> 
> Julia
> 
> > -Original Message-
> > From: Jakub Jelinek [mailto:ja...@redhat.com]
> > Sent: Wednesday, January 24, 2018 12:06 PM
> > To: Uros Bizjak ; Richard Biener 
> > Cc: Koval, Julia ; GCC Patches  > patc...@gcc.gnu.org>; Kirill Yukhin 
> > Subject: Re: [patch][x86] -march=icelake
> > 
> > On Wed, Jan 24, 2018 at 12:00:26PM +0100, Uros Bizjak wrote:
> > > On Mon, Jan 22, 2018 at 3:44 PM, Koval, Julia  
> > > wrote:
> > > > Yes, you are right, any() is not required. Here is the patch.
> > >
> > > Please also attach ChangeLog.
> > >
> > > The patch is OK for x86 target, it needs global reviewer approval
> > > (Maybe Jakub, as the patch touches OMP part).
> > 
> > I don't like the new class name nor header name, bit_mask is way too generic
> > name for something very specialized (double hwi bitmask).
> > 
> > Richard, any suggestions for this?

Maybe wide_int_bitmask?  You could then even use fixed_wide_int <> as
"implementation".

Richard.


RE: [patch][x86] -march=icelake

2018-01-24 Thread Koval, Julia
I think we may want to extend it to more than 2 ints someday, when we run out 
of bits again. It won't break the existing functionality if 3rd int will be 
zero by default. That's why I tried to avoid "two" in the name.
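
As a rough illustration of that idea, here is a minimal sketch in plain C of a mask built from an array of 64-bit words, where a third word can be present but stay zero. The names wide_mask, MASK_WORDS and the helper functions are hypothetical and are not taken from the patch; the real class is C++ and may look quite different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical word count: two words in use today, a third one reserved.  */
#define MASK_WORDS 3

struct wide_mask
{
  uint64_t w[MASK_WORDS];
};

/* Return a mask with only bit number BIT set.  */
static struct wide_mask
wide_mask_bit (unsigned bit)
{
  assert (bit < 64u * MASK_WORDS);
  struct wide_mask m = { { 0 } };
  m.w[bit / 64] = UINT64_C (1) << (bit % 64);
  return m;
}

/* Return the union of A and B, word by word.  */
static struct wide_mask
wide_mask_or (struct wide_mask a, struct wide_mask b)
{
  for (int i = 0; i < MASK_WORDS; i++)
    a.w[i] |= b.w[i];
  return a;
}

/* Return true if A and B share at least one set bit.  This plays the role
   of a "(mask & other) != 0" test on a single-word mask.  */
static bool
wide_mask_intersects (struct wide_mask a, struct wide_mask b)
{
  for (int i = 0; i < MASK_WORDS; i++)
    if (a.w[i] & b.w[i])
      return true;
  return false;
}
```

Callers that only ever set bits below 128 never touch the last word, so growing MASK_WORDS later would not change their results.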

Julia

> -Original Message-
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Wednesday, January 24, 2018 12:06 PM
> To: Uros Bizjak ; Richard Biener 
> Cc: Koval, Julia ; GCC Patches  patc...@gcc.gnu.org>; Kirill Yukhin 
> Subject: Re: [patch][x86] -march=icelake
> 
> On Wed, Jan 24, 2018 at 12:00:26PM +0100, Uros Bizjak wrote:
> > On Mon, Jan 22, 2018 at 3:44 PM, Koval, Julia  wrote:
> > > Yes, you are right, any() is not required. Here is the patch.
> >
> > Please also attach ChangeLog.
> >
> > The patch is OK for x86 target, it needs global reviewer approval
> > (Maybe Jakub, as the patch touches OMP part).
> 
> I don't like the new class name nor header name, bit_mask is way too generic
> name for something very specialized (double hwi bitmask).
> 
> Richard, any suggestions for this?
> 
>   Jakub


Re: [patch][x86] -march=icelake

2018-01-24 Thread Jakub Jelinek
On Wed, Jan 24, 2018 at 12:00:26PM +0100, Uros Bizjak wrote:
> On Mon, Jan 22, 2018 at 3:44 PM, Koval, Julia  wrote:
> > Yes, you are right, any() is not required. Here is the patch.
> 
> Please also attach ChangeLog.
> 
> The patch is OK for x86 target, it needs global reviewer approval
> (Maybe Jakub, as the patch touches OMP part).

I don't like the new class name nor header name, bit_mask is way too generic
name for something very specialized (double hwi bitmask).

Richard, any suggestions for this?

Jakub


Re: [PATCH, 2/2][nvptx, PR83589] Workaround for branch-around-nothing JIT bug

2018-01-24 Thread Jakub Jelinek
On Wed, Jan 24, 2018 at 11:41:45AM +0100, Tom de Vries wrote:
> +/* Insert a dummy ptx insn when encountering a branch to a label with no ptx
> +   insn inbetween the branch and the label.  This works around a JIT bug
> +   observed at driver version 384.111, at -O0 for sm_50.  */
> +
> +static void
> +prevent_branch_around_nothing (void)
> +{
> +  rtx_insn *seen_label = 0;
> +for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
> +  {
> + if (seen_label == 0)
> +   {
> + if (INSN_P (insn) && condjump_p (insn))
> +   seen_label = label_ref_label (nvptx_condjump_label (insn, false));
> +
> + continue;
> +   }
> +
> + if (NOTE_P (insn))
> +   continue;

I'm afraid for review I don't know the backend enough.
I'd just suggest using NULL instead of 0 for pointers, i.e. clearing
seen_label or comparisons of seen_label against NULL, and wonder if
DEBUG_INSNs are guaranteed not to appear here.  If not, you'd need to
skip them too.

Jakub


Re: [patch][x86] -march=icelake

2018-01-24 Thread Uros Bizjak
On Mon, Jan 22, 2018 at 3:44 PM, Koval, Julia  wrote:
> Yes, you are right, any() is not required. Here is the patch.

Please also attach ChangeLog.

The patch is OK for x86 target, it needs global reviewer approval
(Maybe Jakub, as the patch touches OMP part).

Uros.

> Thanks,
> Julia
>
>> -Original Message-
>> From: Jakub Jelinek [mailto:ja...@redhat.com]
>> Sent: Monday, January 22, 2018 12:36 PM
>> To: Koval, Julia 
>> Cc: Richard Biener ; Uros Bizjak
>> ; GCC Patches ; Kirill Yukhin
>> 
>> Subject: Re: [patch][x86] -march=icelake
>>
>> On Mon, Jan 22, 2018 at 11:30:10AM +, Koval, Julia wrote:
>> > Hi, I tried omp_clause_mask and it looks ok.  But it lacks a check for
>> > whether any bit is set or none.  With the addition of it (as proposed or in
>> > some other way) it should work.  What do you think about this approach
>> > (patch attached)?
>>
>> Well, I certainly didn't mean to use omp_clause_mask for something
>> completely unrelated to OpenMP, the reason I've mentioned it is that it is a
>> class that deals with a similar problem.
>>
>> So, if you want to use the same class, it would need to be moved to some
>> generic header, renamed and then c-common.h would typedef that_class
>> omp_clause_mask.
>>
>> I'm surprised you need any, doesn't (mask & (...)) != 0 already handle
>> that?
>>
>>   Jakub
>


Re: [PATCH, 2/2][nvptx, PR83589] Workaround for branch-around-nothing JIT bug

2018-01-24 Thread Richard Biener
On Wed, 24 Jan 2018, Tom de Vries wrote:

> Hi,
> 
> this patch adds a workaround for the nvptx target JIT bug PR83589 - "[nvptx]
> mode-transitions.c and private-variables.{c,f90} execution FAILs at
> GOMP_NVPTX_JIT=-O0".
> 
> 
> When compiling a branch-around-nothing (where the branch is warp neutering, so
> it's a divergent branch):
> ...
>   .reg .pred %r36;
>   {
> .reg .u32 %x;
> mov.u32 %x,%tid.x;
> setp.ne.u32 %r36,%x,0;
>   }
> 
>   @ %r36 bra $L5;
>   $L5:
> ...
> 
> The JIT fails to generate a convergence point here:
> ...
>  /*0128*/   @P0 BRA `(.L_1);
> .L_1:
> ...
> 
> Consequently, we execute subsequent code in divergent mode, and when executing
> a shfl.idx a bit later we run into the undefined behaviour that shfl.idx has
> when executing in divergent mode.
> 
> The workaround detects branch-around-nothing, and inserts a ptx operation that
> does nothing (I'm calling it a fake nop, I haven't been able to come up with a
> better term yet):
> ...
>   @ %r36 bra $L5;
> {
>   .reg .u32 %nop_src;
>   .reg .u32 %nop_dst;
>   mov.u32 %nop_dst, %nop_src;
> }
>   $L5:
> ...
> which makes the test pass, because then we generate a convergence point here
> at .L_1:
> ...
> /*0128*/   SSY `(.L_1);
> /*0130*/   @P0 SYNC (*"TARGET= .L_1 "*);
> /*0138*/   SYNC (*"TARGET= .L_1 "*);
> .L_1:
> ...
> 
> The workaround is not minimal given that it inserts the fake nop in all
> branch-around-nothings it detects, not just the warp neutering ones, but I
> think this is more robust than trying to identify the warp neutering branches.
> Furthermore, I'm not going for optimality here anyway. The optimal way to fix
> this is making sure we don't generate branch-around-nothing, but that's for
> stage1.
> 
> Build and reg-tested on x86_64 with nvptx accelerator.
> 
> I'd like to commit in stage4, but I'd appreciate a review of the code. Does
> the patch look OK?

Ok for stage4, but this isn't a review ;)

Richard.

> Thanks,
> - Tom
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PATCH, 2/2][nvptx, PR83589] Workaround for branch-around-nothing JIT bug

2018-01-24 Thread Tom de Vries

Hi,

this patch adds a workaround for the nvptx target JIT bug PR83589 - 
"[nvptx] mode-transitions.c and private-variables.{c,f90} execution 
FAILs at GOMP_NVPTX_JIT=-O0".



When compiling a branch-around-nothing (where the branch is warp 
neutering, so it's a divergent branch):

...
  .reg .pred %r36;
  {
.reg .u32 %x;
mov.u32 %x,%tid.x;
setp.ne.u32 %r36,%x,0;
  }

  @ %r36 bra $L5;
  $L5:
...

The JIT fails to generate a convergence point here:
...
 /*0128*/   @P0 BRA `(.L_1);
.L_1:
...

Consequently, we execute subsequent code in divergent mode, and when 
executing a shfl.idx a bit later we run into the undefined behaviour 
that shfl.idx has when executing in divergent mode.


The workaround detects branch-around-nothing, and inserts a ptx 
operation that does nothing (I'm calling it a fake nop, I haven't been 
able to come up with a better term yet):

...
  @ %r36 bra $L5;
{
  .reg .u32 %nop_src;
  .reg .u32 %nop_dst;
  mov.u32 %nop_dst, %nop_src;
}
  $L5:
...
which makes the test pass, because then we generate a convergence point 
here at .L_1:

...
/*0128*/   SSY `(.L_1);
/*0130*/   @P0 SYNC (*"TARGET= .L_1 "*);
/*0138*/   SYNC (*"TARGET= .L_1 "*);
.L_1:
...

The workaround is not minimal given that it inserts the fake nop in all 
branch-around-nothings it detects, not just the warp neutering ones, but 
I think this is more robust than trying to identify the warp neutering 
branches. Furthermore, I'm not going for optimality here anyway. The 
optimal way to fix this is making sure we don't generate 
branch-around-nothing, but that's for stage1.
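
To make the detection rule concrete, here is a toy model in plain C. It does not use GCC's RTL types: toy_insn, toy_kind and count_branch_around_nothing are hypothetical names, and the sketch only counts the places where a fake nop would be needed rather than inserting one. It assumes that notes are the only thing that may sit between the branch and its label without producing ptx.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for insn kinds; not GCC's RTL codes.  */
enum toy_kind { TOY_NOTE, TOY_INSN, TOY_COND_BRANCH, TOY_LABEL };

struct toy_insn
{
  enum toy_kind kind;
  int label;  /* Label id for TOY_LABEL, target id for TOY_COND_BRANCH;
		 ids are assumed to be non-negative.  */
};

/* Count conditional branches whose target label follows them with nothing
   but notes in between.  */
static size_t
count_branch_around_nothing (const struct toy_insn *insns, size_t n)
{
  assert (insns != NULL || n == 0);
  size_t found = 0;
  int pending = -1;  /* Target of the branch just seen, or -1.  */

  for (size_t i = 0; i < n; i++)
    {
      const struct toy_insn *insn = &insns[i];

      if (insn->kind == TOY_NOTE)
	continue;  /* Notes emit no ptx, so they do not break the pattern.  */

      if (pending >= 0 && insn->kind == TOY_LABEL && insn->label == pending)
	found++;

      /* Anything that is not a note ends the current candidate; a new
	 conditional branch starts the next one.  */
      pending = insn->kind == TOY_COND_BRANCH ? insn->label : -1;
    }
  return found;
}
```

In this model any real instruction between the branch and the label clears the candidate, which matches the intent of only patching branches that skip over nothing.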


Built and reg-tested on x86_64 with nvptx accelerator.

I'd like to commit in stage4, but I'd appreciate a review of the code. 
Does the patch look OK?


Thanks,
- Tom
[nvptx, PR83589] Workaround for branch-around-nothing JIT bug

2018-01-23  Tom de Vries  

	PR target/83589
	* config/nvptx/nvptx.c (WORKAROUND_PTXJIT_BUG_2): Define to 1.
	(nvptx_pc_set, nvptx_condjump_label): New functions. Copy from jump.c.
	Add strict parameter.
	(prevent_branch_around_nothing): Insert dummy insn between branch to
	label and label with no ptx insn inbetween.
	* config/nvptx/nvptx.md (define_insn "fake_nop"): New insn.

	* testsuite/libgomp.oacc-c-c++-common/pr83589.c: New test.

---
 gcc/config/nvptx/nvptx.c   | 92 ++
 gcc/config/nvptx/nvptx.md  |  9 +++
 .../testsuite/libgomp.oacc-c-c++-common/pr83589.c  | 21 +
 3 files changed, 122 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 3516740..e55b426 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -78,6 +78,7 @@
 #include "target-def.h"
 
 #define WORKAROUND_PTXJIT_BUG 1
+#define WORKAROUND_PTXJIT_BUG_2 1
 
 /* The various PTX memory areas an object might reside in.  */
 enum nvptx_data_area
@@ -4363,6 +4364,93 @@ nvptx_neuter_pars (parallel *par, unsigned modes, unsigned outer)
 nvptx_neuter_pars (par->next, modes, outer);
 }
 
+#if WORKAROUND_PTXJIT_BUG_2
+/* Variant of pc_set that only requires JUMP_P (INSN) if STRICT.  This variant
+   is needed in the nvptx target because the branches generated for
+   partitioning are NONJUMP_INSN_P, not JUMP_P.  */
+
+static rtx
+nvptx_pc_set (const rtx_insn *insn, bool strict = true)
+{
+  rtx pat;
+  if ((strict && !JUMP_P (insn))
+  || (!strict && !INSN_P (insn)))
+return NULL_RTX;
+  pat = PATTERN (insn);
+
+  /* The set is allowed to appear either as the insn pattern or
+ the first set in a PARALLEL.  */
+  if (GET_CODE (pat) == PARALLEL)
+pat = XVECEXP (pat, 0, 0);
+  if (GET_CODE (pat) == SET && GET_CODE (SET_DEST (pat)) == PC)
+return pat;
+
+  return NULL_RTX;
+}
+
+/* Variant of condjump_label that only requires JUMP_P (INSN) if STRICT.  */
+
+static rtx
+nvptx_condjump_label (const rtx_insn *insn, bool strict = true)
+{
+  rtx x = nvptx_pc_set (insn, strict);
+
+  if (!x)
+return NULL_RTX;
+  x = SET_SRC (x);
+  if (GET_CODE (x) == LABEL_REF)
+return x;
+  if (GET_CODE (x) != IF_THEN_ELSE)
+return NULL_RTX;
+  if (XEXP (x, 2) == pc_rtx && GET_CODE (XEXP (x, 1)) == LABEL_REF)
+return XEXP (x, 1);
+  if (XEXP (x, 1) == pc_rtx && GET_CODE (XEXP (x, 2)) == LABEL_REF)
+return XEXP (x, 2);
+  return NULL_RTX;
+}
+
+/* Insert a dummy ptx insn when encountering a branch to a label with no ptx
+   insn inbetween the branch and the label.  This works around a JIT bug
+   observed at driver version 384.111, at -O0 for sm_50.  */
+
+static void
+prevent_branch_around_nothing (void)
+{
+  rtx_insn *seen_label = 0;
+for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
+  {
+	if (seen_label == 0)
+	  {
+	if (INSN_P (insn) && condjump_p (insn))
+	  seen_label = label_ref_label (nvptx_condjump_label (insn, false));
+
+	continue;
+	  }
+
+	

Re: [PATCH, 1/2][nvptx, libgomp] Add GOMP_NVPTX_JIT=-O[0-4] in nvptx libgomp plugin

2018-01-24 Thread Jakub Jelinek
On Wed, Jan 24, 2018 at 10:58:57AM +0100, Tom de Vries wrote:
> I realize this is not really a stage4 patch, so: OK for stage1?

Ok.

Jakub


[PATCH, 1/2][nvptx, libgomp] Add GOMP_NVPTX_JIT=-O[0-4] in nvptx libgomp plugin

2018-01-24 Thread Tom de Vries

Hi,

The nvptx target PR83589 - "[nvptx] mode-transitions.c and 
private-variables.{c,f90} execution FAILs at GOMP_NVPTX_JIT=-O0" is a 
JIT bug.


I've written a workaround for this JIT bug (the second patch in this 
series).


I've only managed to reproduce the JIT bug at JIT optimization level 
-O0. But given that the JIT is a black box, I have no way of knowing if 
it only can occur at -O0, so I have to assume it can also occur at the 
current JIT optimization level used in libgomp (where we don't set it 
explicitly, so we use the default, which supposedly maps onto -O4). So, I 
think we need this workaround in trunk.


But, in order to test the workaround I need a means to run libgomp 
tests with JIT optimization level -O0.


This patch is a pruned-down and standalone version of "Handle 
GOMP_NVPTX_JIT={-O[0-4],-ori,-arch=} in libgomp nvptx plugin"  ( 
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00172.html ), modified to 
contain just the support for changing the optimization level of the JIT.
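
For readers who want to see the accepted syntax in isolation, the following is a standalone sketch of the kind of parsing involved, written as plain C without any libgomp dependencies. The function name parse_jit_opt_level and its return convention are hypothetical; the patch itself uses secure_getenv and GOMP_PLUGIN_error. Unlike the loop in the patch, this sketch also tolerates trailing spaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical standalone helper, not part of libgomp: parse a string made
   of space-separated "-O<n>" tokens with n in 0..4.  On success return true
   with *LEVEL set to the last level seen (or left untouched if there was no
   token).  On a malformed token return false; *LEVEL may already have been
   updated by earlier valid tokens.  */
static bool
parse_jit_opt_level (const char *s, int *level)
{
  assert (level != NULL);
  if (s == NULL)
    return true;  /* Variable unset: keep the caller's default.  */

  const char *c = s;
  for (;;)
    {
      while (*c == ' ')
	c++;
      if (*c == '\0')
	return true;

      /* Each check only reads c[i + 1] after c[i] was seen to be non-NUL,
	 so the reads never go past the terminator.  */
      if (c[0] == '-' && c[1] == 'O'
	  && c[2] >= '0' && c[2] <= '4'
	  && (c[3] == '\0' || c[3] == ' '))
	{
	  *level = c[2] - '0';
	  c += 3;
	  continue;
	}
      return false;
    }
}
```

A caller would typically start with a sentinel such as -1 in the level variable and only pass the optimization level on to the JIT when the sentinel was overwritten.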


Bootstrapped and reg-tested on x86_64.
Built and reg-tested on x86_64 with nvptx accelerator.

I realize this is not really a stage4 patch, so: OK for stage1?

[ I will commit the workaround in stage4. Without this patch, it works 
fine, it's just that the test-case does not function as a regression 
test, given that it will not trigger the JIT bug if you disable the 
workaround. ]


Thanks,
- Tom
[nvptx, libgomp] Add GOMP_NVPTX_JIT=-O[0-4] in nvptx libgomp plugin

2018-01-24  Tom de Vries  

	* plugin/cuda/cuda.h (CUjit_option): Add CU_JIT_OPTIMIZATION_LEVEL.
	* plugin/plugin-nvptx.c (_GNU_SOURCE): Define.
	(process_GOMP_NVPTX_JIT): New function.
	(link_ptx): Use process_GOMP_NVPTX_JIT.

---
 libgomp/plugin/cuda/cuda.h|  1 +
 libgomp/plugin/plugin-nvptx.c | 56 ---
 2 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/libgomp/plugin/cuda/cuda.h b/libgomp/plugin/cuda/cuda.h
index edad4c6..4799825 100644
--- a/libgomp/plugin/cuda/cuda.h
+++ b/libgomp/plugin/cuda/cuda.h
@@ -88,6 +88,7 @@ typedef enum {
   CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES = 4,
   CU_JIT_ERROR_LOG_BUFFER = 5,
   CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES = 6,
+  CU_JIT_OPTIMIZATION_LEVEL = 7,
   CU_JIT_LOG_VERBOSE = 12
 } CUjit_option;
 
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 9ae6095..2b875ae 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -31,6 +31,7 @@
is not clear as to what that state might be.  Or how one might
propagate it from one thread to another.  */
 
+#define _GNU_SOURCE
 #include "openacc.h"
 #include "config.h"
 #include "libgomp-plugin.h"
@@ -138,6 +139,8 @@ init_cuda_lib (void)
 # define init_cuda_lib() true
 #endif
 
+#include "secure_getenv.h"
+
 /* Convenience macros for the frequently used CUDA library call and
error handling sequence as well as CUDA library calls that
do the error checking themselves or don't do it at all.  */
@@ -876,12 +879,42 @@ notify_var (const char *var_name, const char *env_var)
 GOMP_PLUGIN_debug (0, "%s: '%s'\n", var_name, env_var);
 }
 
+static void
+process_GOMP_NVPTX_JIT (intptr_t *gomp_nvptx_o)
+{
+  const char *var_name = "GOMP_NVPTX_JIT";
+  const char *env_var = secure_getenv (var_name);
+  notify_var (var_name, env_var);
+
+  if (env_var == NULL)
+return;
+
+  const char *c = env_var;
+  while (*c != '\0')
+{
+  while (*c == ' ')
+	c++;
+
+  if (c[0] == '-' && c[1] == 'O'
+	  && '0' <= c[2] && c[2] <= '4'
+	  && (c[3] == '\0' || c[3] == ' '))
+	{
+	  *gomp_nvptx_o = c[2] - '0';
+	  c += 3;
+	  continue;
+	}
+
+  GOMP_PLUGIN_error ("Error parsing %s", var_name);
+  break;
+}
+}
+
 static bool
 link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,
 	  unsigned num_objs)
 {
-  CUjit_option opts[6];
-  void *optvals[6];
+  CUjit_option opts[7];
+  void *optvals[7];
   float elapsed = 0.0;
   char elog[1024];
   char ilog[16384];
@@ -908,7 +941,24 @@ link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,
   opts[5] = CU_JIT_LOG_VERBOSE;
   optvals[5] = (void *) 1;
 
-  CUDA_CALL (cuLinkCreate, 6, opts, optvals, );
+  static intptr_t gomp_nvptx_o = -1;
+
+  static bool init_done = false;
+  if (!init_done)
+{
+  process_GOMP_NVPTX_JIT (&gomp_nvptx_o);
+  init_done = true;
+  }
+
+  int nopts = 6;
+  if (gomp_nvptx_o != -1)
+{
+  opts[nopts] = CU_JIT_OPTIMIZATION_LEVEL;
+  optvals[nopts] = (void *) gomp_nvptx_o;
+  nopts++;
+}
+
+  CUDA_CALL (cuLinkCreate, nopts, opts, optvals, );
 
   for (; num_objs--; ptx_objs++)
 {


[PATCH] Fix PR83176

2018-01-24 Thread Richard Biener

The following enhances chrec_fold_plus_1 to handle a case to avoid
ICEing during GRAPHITE code-gen.  It teaches us to handle
(signed T) { (T) x, +, y } + (signed T) z as
(signed T) ( { (T) x, +, y } + (T) (signed T) z ) instead of
failing with chrec_dont_know.
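
The arithmetic fact this relies on can be checked outside the compiler. The sketch below is plain C, not chrec code, and add_via_unsigned is a hypothetical helper: it performs a signed 32-bit addition in the unsigned type of the same width and converts the result back. The conversion back to the signed type for values above INT32_MAX is implementation-defined in ISO C before C23; GCC documents it as reduction modulo 2^32, which is what the sketch assumes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: add A and B by doing the arithmetic in uint32_t.
   Unsigned addition wraps modulo 2^32 and is always defined, so the sum is
   computed without any signed overflow.  */
static int32_t
add_via_unsigned (int32_t a, int32_t b)
{
  /* For this illustration, require that the mathematical sum fits.  */
  int64_t wide = (int64_t) a + (int64_t) b;
  assert (wide >= INT32_MIN && wide <= INT32_MAX);

  uint32_t ua = (uint32_t) a;  /* Same width, so only the signedness changes.  */
  uint32_t ub = (uint32_t) b;
  uint32_t sum = ua + ub;

  /* Assumes the modulo 2^32 conversion that GCC documents.  */
  return (int32_t) sum;
}
```

Because both conversions keep the width, the unsigned detour yields the same value as the signed addition whenever the signed result is representable.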

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2018-01-24  Richard Biener  

PR tree-optimization/83176
* tree-chrec.c (chrec_fold_plus_1): Handle (signed T){(T) .. }
operands.

* gcc.dg/graphite/pr83176.c: New testcase.

Index: gcc/tree-chrec.c
===
--- gcc/tree-chrec.c(revision 256977)
+++ gcc/tree-chrec.c(working copy)
@@ -295,8 +295,23 @@ chrec_fold_plus_1 (enum tree_code code,
  return chrec_fold_plus_poly_poly (code, type, op0, op1);
 
CASE_CONVERT:
- if (tree_contains_chrecs (op1, NULL))
-   return chrec_dont_know;
+ {
+   /* We can strip sign-conversions to signed by performing the
+  operation in unsigned.  */
+   tree optype = TREE_TYPE (TREE_OPERAND (op1, 0));
+   if (INTEGRAL_TYPE_P (type)
+   && INTEGRAL_TYPE_P (optype)
+   && tree_nop_conversion_p (type, optype)
+   && TYPE_UNSIGNED (optype))
+ return chrec_convert (type,
+   chrec_fold_plus_1 (code, optype,
+  chrec_convert (optype,
+ op0, NULL),
+  TREE_OPERAND (op1, 0)),
+   NULL);
+   if (tree_contains_chrecs (op1, NULL))
+ return chrec_dont_know;
+ }
  /* FALLTHRU */
 
default:
@@ -313,8 +328,23 @@ chrec_fold_plus_1 (enum tree_code code,
}
 
 CASE_CONVERT:
-  if (tree_contains_chrecs (op0, NULL))
-   return chrec_dont_know;
+  {
+   /* We can strip sign-conversions to signed by performing the
+  operation in unsigned.  */
+   tree optype = TREE_TYPE (TREE_OPERAND (op0, 0));
+   if (INTEGRAL_TYPE_P (type)
+   && INTEGRAL_TYPE_P (optype)
+   && tree_nop_conversion_p (type, optype)
+   && TYPE_UNSIGNED (optype))
+ return chrec_convert (type,
+   chrec_fold_plus_1 (code, optype,
+  TREE_OPERAND (op0, 0),
+  chrec_convert (optype,
+ op1, NULL)),
+   NULL);
+   if (tree_contains_chrecs (op0, NULL))
+ return chrec_dont_know;
+  }
   /* FALLTHRU */
 
 default:
Index: gcc/testsuite/gcc.dg/graphite/pr83176.c
===
--- gcc/testsuite/gcc.dg/graphite/pr83176.c (nonexistent)
+++ gcc/testsuite/gcc.dg/graphite/pr83176.c (working copy)
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -floop-nest-optimize" } */
+
+int wx, qi;
+
+void
+yj (int gw)
+{
+  int *ak = 
+
+  while (wx != 0)
+{
+  int k2 = (__INTPTR_TYPE__)
+  int **xq = (int **)
+
+ja:
+  *xq = 
+
+  while (qi < 1)
+   {
+ unsigned short int ey;
+
+be:
+ for (ey = 0; ey < 251; ++ey)
+   {
+ for (wx = 0; wx < 2; ++wx)
+   {
+   }
+
+ *ak += 8555712;
+ k2 += *ak;
+   }
+ ++qi;
+   }
+}
+
+  gw = 1;
+  if (gw != 0)
+goto ja;
+  else
+goto be;
+}


Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15

2018-01-24 Thread Paul Richard Thomas
Hi Jakub,

The lateness is indeed embarrassing but couldn't be helped.

> Given above my preference would be to keep version an int, and
> change rank and type to signed char and attribute to signed short.
> That way there will be no padding on either 32-bit or 64-bit targets,
> and the structure will be at least a little bit smaller.
> How much work would that be to change it in the patch?  I'd expect
> just a few lines in gcc/fortran and few lines in libgfortran...
>
> Jakub

I can do that this afternoon. It's little more work than changing a
couple of typedefs. There are one or two other, cosmetic changes to
make too.

Thanks

Paul


[PATCH] Fix PR84000

2018-01-24 Thread Richard Biener

Committed as obvious.

Richard.

2018-01-24  Richard Biener  

PR middle-end/84000
* tree-cfg.c (replace_loop_annotate): Handle annot_expr_parallel_kind.

Index: gcc/tree-cfg.c
===
--- gcc/tree-cfg.c  (revision 256977)
+++ gcc/tree-cfg.c  (working copy)
@@ -347,6 +347,7 @@ replace_loop_annotate (void)
case annot_expr_unroll_kind:
case annot_expr_no_vector_kind:
case annot_expr_vector_kind:
+   case annot_expr_parallel_kind:
  break;
default:
  gcc_unreachable ();


Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15

2018-01-24 Thread Jakub Jelinek
On Wed, Jan 24, 2018 at 10:30:41AM +0200, Janne Blomqvist wrote:
> On Tue, Jan 23, 2018 at 10:30 PM, Jakub Jelinek  wrote:
> > On Tue, Jan 23, 2018 at 07:30:49PM +, Paul Richard Thomas wrote:
> >> Janne, Thanks.
> >>
> >> Jakub, is this OK with you?
> >
> > It is indeed quite late for such large ABI changes, some distributions are
> > about to start using the compiler by now.
> 
> Yes, we're (painfully) aware that it's quite late. The motivation is
> that the ABI for GCC 8 has already been broken, and if we get this
> patch in, we hope to avoid having to break it again for the next
> release. See also the discussion thread starting at
> 
> https://gcc.gnu.org/ml/fortran/2018-01/msg00150.html

Ok.

> >  How much was it tested (on which
> > targets)?
> 
> Dominique tested on Darwin, both -m32 and -m64:
> https://gcc.gnu.org/ml/fortran/2018-01/msg00156.html
> 
> Thomas tested on a "big-endian target" (I guess gcc110, that is,
> powerpc64-unknown-linux-gnu.):
> https://gcc.gnu.org/ml/fortran/2018-01/msg00163.html

Good.

> In addition to bumping up the max number of dimensions, another
> motivation is to bring the descriptor closer to the F2018 C
> interoperability descriptor. See
> https://gcc.gnu.org/wiki/ArrayDescriptorUpdate for an overview. One
> idea has been to use this C descriptor as the native GFortran
> descriptor; that might or might not ever happen.
> 
> But yes, the rank, type, and attribute fields could be of type
> (signed/unsigned) char, which would reduce the sizeof(dtype_type) from
> 24 to 16 bytes on a 64-bit target. For the F2018 C descriptor the type
> field must be signed, but currently we're not using the type field in
> the same way. The others can be unsigned or signed.

Given above my preference would be to keep version an int, and
change rank and type to signed char and attribute to signed short.
That way there will be no padding on either 32-bit or 64-bit targets,
and the structure will be at least a little bit smaller.
How much work would that be to change it in the patch?  I'd expect
just a few lines in gcc/fortran and few lines in libgfortran...
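
To put numbers on the padding argument, here is a small C sketch that compares the layout as posted (four int fields after elem_len) with the narrower field types suggested above. The names dtype_wide, dtype_packed and dtype_packed_has_no_padding are hypothetical and only serve the comparison; the real structure is called dtype_type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Layout as posted in the patch: every small field is an int.  */
typedef struct dtype_wide
{
  size_t elem_len;
  int version;
  int rank;
  int type;
  int attribute;
} dtype_wide;

/* Layout with the narrower types suggested above.  */
typedef struct dtype_packed
{
  size_t elem_len;
  int version;
  signed char rank;
  signed char type;
  signed short attribute;
} dtype_packed;

/* The narrower layout should never be larger than the posted one.  */
static_assert (sizeof (dtype_packed) <= sizeof (dtype_wide),
	       "narrower fields must not grow the descriptor");

/* Return true if the struct size equals the sum of its member sizes,
   that is, if the compiler inserted no padding.  */
static bool
dtype_packed_has_no_padding (void)
{
  return sizeof (dtype_packed)
	 == sizeof (size_t) + sizeof (int) + 2 * sizeof (signed char)
	    + sizeof (signed short);
}
```

On a typical LP64 target this gives 24 bytes for the posted layout and 16 bytes for the narrower one; on a typical ILP32 target the narrower one is 12 bytes.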

Jakub


Re: [Patch, fortran] PR37577 - [meta-bug] change internal array descriptor format for better syntax, C interop TR, rank 15

2018-01-24 Thread Janne Blomqvist
On Tue, Jan 23, 2018 at 10:30 PM, Jakub Jelinek  wrote:
> On Tue, Jan 23, 2018 at 07:30:49PM +, Paul Richard Thomas wrote:
>> Janne, Thanks.
>>
>> Jakub, is this OK with you?
>
> It is indeed quite late for such large ABI changes, some distributions are
> about to start using the compiler by now.

Yes, we're (painfully) aware that it's quite late. The motivation is
that the ABI for GCC 8 has already been broken, and if we get this
patch in, we hope to avoid having to break it again for the next
release. See also the discussion thread starting at

https://gcc.gnu.org/ml/fortran/2018-01/msg00150.html

>  How much was it tested (on which
> targets)?

Dominique tested on Darwin, both -m32 and -m64:
https://gcc.gnu.org/ml/fortran/2018-01/msg00156.html

Thomas tested on a "big-endian target" (I guess gcc110, that is,
powerpc64-unknown-linux-gnu.):
https://gcc.gnu.org/ml/fortran/2018-01/msg00163.html

>  Has the debug info side of things been adjusted too (the
> get_array_descr_info langhook)?

The patch at least modifies this function.

>> >> I took the design choice to replace the dtype with a structure:
>> >> typedef struct dtype_type
>> >> {
>> >>   size_t elem_len;
>> >>   int version;
>> >>   int rank;
>> >>   int type;
>> >>   int attribute;
>> >> }
>> >> dtype_type;
>
> Isn't this too wasteful?  rank will be just 0-15, right?
> What values can version have?  What type?  Do you need negative values for
> any of those?
> I think using unsigned char or unsigned short for some of those fields
> should be sufficient, yeah, they don't necessarily need to be bitfields.

In addition to bumping up the max number of dimensions, another
motivation is to bring the descriptor closer to the F2018 C
interoperability descriptor. See
https://gcc.gnu.org/wiki/ArrayDescriptorUpdate for an overview. One
idea has been to use this C descriptor as the native GFortran
descriptor; that might or might not ever happen.

But yes, the rank, type, and attribute fields could be of type
(signed/unsigned) char, which would reduce the sizeof(dtype_type) from
24 to 16 bytes on a 64-bit target. For the F2018 C descriptor the type
field must be signed, but currently we're not using the type field in
the same way. The others can be unsigned or signed.




-- 
Janne Blomqvist


Fix more of cases where block is incorrectly marked as cold

2018-01-24 Thread Jan Hubicka
Hi,
this patch fixes another issue where a basic block is incorrectly marked as
unlikely, which is caught by Martin's hack to bb-reorder that inserts a trap
into all blocks in the cold partition.

The problem solved here is that I missed the logic to set the quality of
probabilities to adjusted when doing basic arithmetic on them. While looking
into this I also noticed that there is a remaining FIXME in cfgcleanup, and
because combine_with_freq was also wrong, I merged the RTL and tree
tail-merging logic.

Finally I noticed that we do not put into the cold section functions which
have a locally guessed profile but are globally known to be executed 0
times.  This is the case for all functions not executed in the train run
with profile feedback, which definitely are supposed to land in the cold
section.  This is fixed in the probably_never_executed predicate.
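
A rough model of the difference, assuming a count can be described by a
local value plus a flag saying the function was seen to run zero times
globally. The type count_sketch and all function names are hypothetical;
ipa_view() only stands in for what count.ipa () provides in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a locally guessed count plus a flag recording that
// profile feedback saw the enclosing function execute zero times.
struct count_sketch
{
  std::uint64_t local_value;
  bool globally_zero;

  // Whole-program view of the count.
  std::uint64_t ipa_view () const
  { return globally_zero ? 0 : local_value; }
};

// Check before the patch: only the local value is inspected.
inline bool
never_executed_old (const count_sketch &c)
{ return c.local_value == 0; }

// Check after the patch: the global view decides.
inline bool
never_executed_new (const count_sketch &c)
{ return c.ipa_view () == 0; }

// Usage: a function never reached in the train run still has nonzero
// locally guessed counts, so only the new check treats it as cold.
inline void
demo ()
{
  const count_sketch untrained = { 1000, true };
  assert (!never_executed_old (untrained));
  assert (never_executed_new (untrained));

  const count_sketch hot = { 1000, false };
  assert (!never_executed_new (hot));
}
```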

Bootstrapped/regtested x86_64-linux, committed.

Honza

* cfgcleanup.c (try_crossjump_to_edge): Use combine_with_count
to merge probabilities.
* predict.c (probably_never_executed): Also mark as cold functions
with global 0 profile and guessed local profile.
* profile-count.c (profile_probability::combine_with_count): New
member function.
* profile-count.h (profile_probability::operator*,
profile_probability::operator*=, profile_probability::operator/,
profile_probability::operator/=): Reduce precision to adjusted
and set value to guessed on contradictory divisions.
(profile_probability::combine_with_freq): Remove.
(profile_probability::combine_with_count): Declare.
(profile_count::force_nonzero): Set to adjusted.
(profile_count::probability_in): Set quality to adjusted.
* tree-ssa-tail-merge.c (replace_block_by): Use
combine_with_count.
Index: cfgcleanup.c
===
--- cfgcleanup.c(revision 256987)
+++ cfgcleanup.c(working copy)
@@ -2130,11 +2130,9 @@ try_crossjump_to_edge (int mode, edge e1
   if (FORWARDER_BLOCK_P (s2->dest))
s2->dest->count -= s->count ();
 
-  /* FIXME: Is this correct? Should be rewritten to count API.  */
-  if (redirect_edges_to->count.nonzero_p () && src1->count.nonzero_p ())
-   s->probability = s->probability.combine_with_freq
-  (redirect_edges_to->count.to_frequency (cfun),
-   s2->probability, src1->count.to_frequency (cfun));
+  s->probability = s->probability.combine_with_count
+ (redirect_edges_to->count,
+  s2->probability, src1->count);
 }
 
   /* Adjust count for the block.  An earlier jump
Index: predict.c
===
--- predict.c   (revision 256987)
+++ predict.c   (working copy)
@@ -210,7 +210,7 @@ probably_never_executed (struct function
  profile_count count)
 {
   gcc_checking_assert (fun);
-  if (count == profile_count::zero ())
+  if (count.ipa () == profile_count::zero ())
 return true;
   /* Do not trust adjusted counts.  This will make us to drop int cold section
  code with low execution count as a result of inlining. These low counts
Index: profile-count.c
===
--- profile-count.c (revision 256987)
+++ profile-count.c (working copy)
@@ -345,3 +345,29 @@ profile_count::from_gcov_type (gcov_type
 return ret;
   }
 
+
+/* COUNT1 times event happens with *THIS probability, COUNT2 times OTHER
+   happens with COUNT2 probability. Return probability that either *THIS or
+   OTHER happens.  */
+
+profile_probability
+profile_probability::combine_with_count (profile_count count1,
+profile_probability other,
+profile_count count2) const
+{
+  /* If probabilities are same, we are done.
+ If counts are nonzero we can distribute accordingly. In remaining
+ cases just average the values and hope for the best.  */
+  if (*this == other || count1 == count2
+  || (count2 == profile_count::zero ()
+ && !(count1 == profile_count::zero ())))
+return *this;
+  if (count1 == profile_count::zero () && !(count2 == profile_count::zero ()))
+return other;
+  else if (count1.nonzero_p () || count2.nonzero_p ())
+return *this * count1.probability_in (count1 + count2)
+  + other * count2.probability_in (count1 + count2);
+  else
+return *this * profile_probability::even ()
+  + other * profile_probability::even ();
+}
Index: profile-count.h
===
--- profile-count.h (revision 256987)
+++ profile-count.h (working copy)
@@ -22,6 +22,7 @@ along with GCC; see the file COPYING3.
 #define GCC_PROFILE_COUNT_H
 
 struct function;
+class profile_count;
 
 /* Quality of the profile count.  Because