Re: -fstrict-aliasing fixes 5/6: make type system independent of flag_strict_aliasing

2015-12-09 Thread Richard Biener
On Wed, 9 Dec 2015, Jan Hubicka wrote:

> Hi
> this patch implements the trik for punting if we get too many nested pointers.
> This fixes the ada tstcases. Curiously enough I would like to replace 
> safe_push
> by quick_push but doing so I get weird error about freeing non-heap object
> in the auto_vec desructor...

That's odd but a sign of you doing sth wrong ;)  Like somehow
clobbering m_using_auto_storage of the vec.

But of course if you use a maximum length of 8 there
is no need to use a vec<>, just use a fixed bool[8] array
(or an integer mask).

> Bootstraping/regtesting x86_64-linux. Ok if it passes?

Ok.

Thanks,
Richard.

> Honza
> 
>   * alias.c (get_alias_set): Punt after getting 8 nested pointers.
> 
> Index: alias.c
> ===
> --- alias.c   (revision 231439)
> +++ alias.c   (working copy)
> @@ -990,6 +990,14 @@ get_alias_set (tree t)
>  || TREE_CODE (p) == VECTOR_TYPE;
>  p = TREE_TYPE (p))
>   {
> +   /* Ada supports recusive pointers.  Instead of doing recrusion check
> +  just give up once the preallocated space of 8 elements is up.
> +  In this case just punt to void * alias set.  */
> +   if (reference.length () == 8)
> + {
> +   p = ptr_type_node;
> +   break;
> + }
> if (TREE_CODE (p) == REFERENCE_TYPE)
>   /* In LTO we want languages that use references to be compatible
>  with languages that use pointers.  */


Re: [patch] Fix PR middle-end/68291 & 68292

2015-12-09 Thread Christophe Lyon
On 8 December 2015 at 19:48, Eric Botcazou  wrote:
>> I think that is ok if the testing passes.
>
> Thanks.  It did on the 3 architectures so I have applied the patch.
>

Thanks, I confirm it also fixes the regressions I noticed on ARM.

Christophe.

> --
> Eric Botcazou


Re: -fstrict-aliasing fixes 5/6: make type system independent of flag_strict_aliasing

2015-12-09 Thread Richard Biener
On Wed, 9 Dec 2015, Arnaud Charlet wrote:

> > Hi
> > this patch implements the trik for punting if we get too many nested
> > pointers.
> > This fixes the ada tstcases. Curiously enough I would like to replace
> > safe_push
> > by quick_push but doing so I get weird error about freeing non-heap object
> > in the auto_vec desructor...
> > 
> > Bootstraping/regtesting x86_64-linux. Ok if it passes?
> > 
> > Honza
> > 
> > * alias.c (get_alias_set): Punt after getting 8 nested pointers.
> > 
> > Index: alias.c
> > ===
> > 
> > --- alias.c (revision 231439)
> > +++ alias.c (working copy)
> > @@ -990,6 +990,14 @@ get_alias_set (tree t)
> >|| TREE_CODE (p) == VECTOR_TYPE;
> >p = TREE_TYPE (p))
> > {
> > + /* Ada supports recusive pointers.  Instead of doing recrusion check
> 
> typo above: recrusion -> recursion
> 
> > +just give up once the preallocated space of 8 elements is up.
> > +In this case just punt to void * alias set.  */
> > + if (reference.length () == 8)
> 
> We don't use magic numbers in general, can you please replace by a named
> constant instead?

Instead of a magic constant you could also use sth like TREE_VISITED
(or a pointer-map).  Of course that's a lot more expensive.

> > +   {
> > + p = ptr_type_node;
> > + break;
> > +   }
> >   if (TREE_CODE (p) == REFERENCE_TYPE)
> > /* In LTO we want languages that use references to be compatible
> >with languages that use pointers.  */
> 
> I'll let others comment on the general idea.
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PATCH, testsuite] Fix PR68629: attr-simd-3.c failure on arm-none-eabi targets

2015-12-09 Thread Thomas Preud'homme
c-c++-common/attr-simd-3.c fails to compile on arm-none-eabi targets due to 
-fcilkplus needing -pthread which is not available for those targets. This 
patch solves this issue by adding a condition to the cilkplus effective target 
that compiling with -fcilkplus succeeds and requires cilkplus as an effective 
target for attr-simd-3.c testcase.

ChangeLog entry is as follows:


*** gcc/testsuite/ChangeLog ***

2015-12-08  Thomas Preud'homme  

PR testsuite/68629
* lib/target-supports.exp (check_effective_target_cilkplus): Also
check that compiling with -fcilkplus does not give an error.
* c-c++-common/attr-simd-3.c: Require cilkplus effective target.


diff --git a/gcc/testsuite/c-c++-common/attr-simd-3.c 
b/gcc/testsuite/c-c++-common/attr-simd-3.c
index d61ba82..1970c67 100644
--- a/gcc/testsuite/c-c++-common/attr-simd-3.c
+++ b/gcc/testsuite/c-c++-common/attr-simd-3.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target "cilkplus" } */
 /* { dg-options "-fcilkplus" } */
 /* { dg-prune-output "undeclared here \\(not in a function\\)|\[^\n\r\]* was 
not declared in this scope" } */
 
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 4e349e9..95b903c 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1432,7 +1432,12 @@ proc check_effective_target_cilkplus { } {
 if { [istarget avr-*-*] } {
return 0;
 }
-return 1
+return [ check_no_compiler_messages_nocache fcilkplus_available executable 
{
+   #ifdef __cplusplus
+   extern "C"
+   #endif
+   int dummy;
+   } "-fcilkplus" ]
 }
 
 proc check_linker_plugin_available { } {


Testsuite shows no regression when run with
  + an arm-none-eabi GCC cross-compiler targeting Cortex-M3
  + a bootstrapped x86_64-linux-gnu GCC native compiler

Is this ok for trunk?

Best regards,

Thomas




Re: [PATCH, PR68716] Fix GOMP/GOACC_parallel handling in find_func_clobbers

2015-12-09 Thread Jakub Jelinek
On Wed, Dec 09, 2015 at 11:01:31AM +0100, Tom de Vries wrote:
>   PR tree-optimization/68716
>   * tree-ssa-structalias.c (find_func_clobbers): Fix handling of
>   BUILT_IN_GOMP_PARALLEL and BUILT_IN_GOMP_PARALLEL.

Pasto in ChangeLog entry?

Jakub


[PATCH] Fix vectorizer memory leak

2015-12-09 Thread Richard Biener

This fixes the observed memory leak in the correct way, asserting that
overwriting of a vinfo doesn't happen.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-12-09  Richard Biener  

* tree-vect-stmts.c (vectorizable_load): Set new vinfo only
if it was not yet set.
* tree-vectorizer.h (set_vinfo_for_stmt): Assert we don't
overwrite an existing entry.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 231355)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -7276,6 +7337,9 @@ vectorizable_load (gimple *stmt, gimple_
  unshare_expr
(gimple_assign_rhs1 (stmt;
  new_temp = vect_init_vector (stmt, tem, vectype, NULL);
+ new_stmt = SSA_NAME_DEF_STMT (new_temp);
+ set_vinfo_for_stmt (new_stmt,
+ new_stmt_vec_info (new_stmt, vinfo));
}
  else
{
@@ -7283,10 +7347,8 @@ vectorizable_load (gimple *stmt, gimple_
  gsi_next ();
  new_temp = vect_init_vector (stmt, scalar_dest,
   vectype, );
+ new_stmt = SSA_NAME_DEF_STMT (new_temp);
}
- new_stmt = SSA_NAME_DEF_STMT (new_temp);
- set_vinfo_for_stmt (new_stmt,
- new_stmt_vec_info (new_stmt, vinfo));
}
 
  if (negative)
Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   (revision 231355)
+++ gcc/tree-vectorizer.h   (working copy)
@@ -715,7 +715,10 @@ set_vinfo_for_stmt (gimple *stmt, stmt_v
   stmt_vec_info_vec.safe_push (info);
 }
   else
-stmt_vec_info_vec[uid - 1] = info;
+{
+  gcc_checking_assert (info == NULL);
+  stmt_vec_info_vec[uid - 1] = info;
+}
 }
 
 /* Return the earlier statement between STMT1 and STMT2.  */


Re: [RFA] [PATCH] [PR tree-optimization/68619] Avoid direct cfg cleanups in tree-ssa-dom.c [1/3]

2015-12-09 Thread Jeff Law

On 12/08/2015 07:27 AM, Richard Biener wrote:


I wonder if it makes more sense to integrate this with the
domwalker itself.  Add a constructor flag to it and do everything
in itself.  By letting the before_dom_children return a taken edge
(or NULL if unknown) it can drive the outgoing edge marking. And
the domwalk worker would simply not recurse to dom children for
unreachable blocks.


So interface-wise do

[ ... ]
Close :-)

If skip_unreachable_blocks is true, then we want the walker to 
initialize EDGE_EXECUTABLE automatically.  So we drop the member 
initialization and constructor body from domwalk.h and instead have a 
ctor in domwalk.c where we can initialize the private members and set 
EDGE_EXECUTABLE as needed.


My first iteration let the clients clear EDGE_EXECUTABLE as they found 
conditionals that could be optimized.  That was pretty clean and 
localized in sccvn & dom.


If we have the before_dom_children return the taken edge, then we have 
to twiddle all the clients due to the api change in before_dom_children. 
 .  There's ~18 in total, so it's not too bad.


2 of the 18 clearly need to use the skip_unreachable_blocks capability 
(dom and sccvn).  2 others might be able to use it (tree-ssa-pre.c and 
tree-ssa-propagate.c)  I converted dom and sccvn, but not pre and the 
generic propagation engine.


I can submit the iteration which lets clients clear EDGE_EXECUTABLE, or 
the iteration where the clients return the taken edge (or NULL) from the 
before_dom_children callback.


Either is fine with me, so if you have a preference, let me know.

jeff


Re: [RFA] Transparent alias suport part 10: Fix base+offset alias analysis oracle WRT aliases

2015-12-09 Thread Richard Biener
On Wed, 9 Dec 2015, Jan Hubicka wrote:

> Hi,
> this patch fixes base+offset alias oracles to consider variable aliases.
> With this patch and one extra hack I can LTO bootstrap with variale symbol
> merging disabled in lto-symtab.c
> 
> The basic idea is simple - there is new comparer compare_base_decls which 
> knows
> how to handle aliases via symbol table's address_equal_to predicate.
> I went through the sources and tried to find all places where we compare bases
> for equality and replace the copmarsion by this new predicate.
> 
> Bootstrapped/regtested x86_64-linux, looks sane?
> 
> Honza
> 
>   * tree-ssa-alias.c (ptr_deref_may_alias_decl_p): Use compare_base_decls
>   (nonoverlapping_component_refs_of_decl_p): Update sanity check.
>   (decl_refs_may_alias_p): Use compare_base_decls.
>   * alias.c: Include cgraph.h
>   (rtx_equal_for_memref_p): Use rtx_equal_for_memref_p.
>   (compare_base_decls): New function.
>   (base_alias_check): Likewise.
>   (memrefs_conflict_p): Likewise.
>   (nonoverlapping_memrefs_p): Likewise.
>   * alias.h (compare_base_decls): Declare.
> 
>   * gcc.c-torture/execute/alias-2.c: New testcase.
> Index: tree-ssa-alias.c
> ===
> --- tree-ssa-alias.c  (revision 231438)
> +++ tree-ssa-alias.c  (working copy)
> @@ -194,7 +194,7 @@ ptr_deref_may_alias_decl_p (tree ptr, tr
>   ptr = TREE_OPERAND (base, 0);
>else if (base
>  && DECL_P (base))
> - return base == decl;
> + return compare_base_decls (base, decl) != 0;
>else if (base
>  && CONSTANT_CLASS_P (base))
>   return false;
> @@ -805,8 +805,10 @@ nonoverlapping_component_refs_of_decl_p
>ref2 = TREE_OPERAND (TREE_OPERAND (ref2, 0), 0);
>  }
>  
> -  /* We must have the same base DECL.  */
> -  gcc_assert (ref1 == ref2);
> +  /* Bases must be either same or uncomparable.  */
> +  gcc_checking_assert (ref1 == ref2
> +|| (DECL_P (ref1) && DECL_P (ref2)
> +&& compare_base_decls (ref1, ref2)));

I prefer explicit != 0 here.

>  
>/* Pop the stacks in parallel and examine the COMPONENT_REFs of the same
>   rank.  This is sufficient because we start from the same DECL and you
> @@ -989,7 +991,7 @@ decl_refs_may_alias_p (tree ref1, tree b
>gcc_checking_assert (DECL_P (base1) && DECL_P (base2));
>  
>/* If both references are based on different variables, they cannot alias. 
>  */
> -  if (base1 != base2)
> +  if (compare_base_decls (base1, base2) == 0)
>  return false;
>  
>/* If both references are based on the same variable, they cannot alias if
> Index: testsuite/gcc.c-torture/execute/alias-2.c
> ===
> --- testsuite/gcc.c-torture/execute/alias-2.c (revision 0)
> +++ testsuite/gcc.c-torture/execute/alias-2.c (revision 0)
> @@ -0,0 +1,12 @@
> +/* { dg-require-alias "" } */
> +int a[10]={};
> +extern int b[10] __attribute__ ((alias("a")));
> +int off;
> +main()
> +{
> +  b[off]=1;
> +  a[off]=2;
> +  if (b[off]!=2)
> +   __builtin_abort ();
> +  return 0;
> +}
> Index: alias.c
> ===
> --- alias.c   (revision 231438)
> +++ alias.c   (working copy)
> @@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.
>  #include "langhooks.h"
>  #include "cfganal.h"
>  #include "rtl-iter.h"
> +#include "cgraph.h"
>  
>  /* The aliasing API provided here solves related but different problems:
>  
> @@ -1747,7 +1748,15 @@ rtx_equal_for_memref_p (const_rtx x, con
>return LABEL_REF_LABEL (x) == LABEL_REF_LABEL (y);
>  
>  case SYMBOL_REF:
> -  return XSTR (x, 0) == XSTR (y, 0);
> +  {
> + tree x_decl = SYMBOL_REF_DECL (x);
> + tree y_decl = SYMBOL_REF_DECL (y);
> +
> + if (!x_decl || !y_decl)
> +   return XSTR (x, 0) == XSTR (y, 0);
> + else
> +   return compare_base_decls (x_decl, y_decl) == 1;
> +  }
>  
>  case ENTRY_VALUE:
>/* This is magic, don't go through canonicalization et al.  */
> @@ -2010,6 +2019,31 @@ may_be_sp_based_p (rtx x)
>return !base || base == static_reg_base_value[STACK_POINTER_REGNUM];
>  }
>  
> +/* BASE1 and BASE2 are decls.  Return 1 if they refer to same object, 0
> +   if they refer to different objects and -1 if we can not decide.  */
> +
> +int
> +compare_base_decls (tree base1, tree base2)
> +{
> +  int ret;
> +  gcc_checking_assert (DECL_P (base1) && DECL_P (base2));
> +  if (base1 == base2)
> +return 1;
> +
> +  bool in_symtab1 = decl_in_symtab_p (base1);
> +  bool in_symtab2 = decl_in_symtab_p (base2);
> +
> +  /* Declarations of non-automatic variables may have aliases.  All other
> + decls are unique.  */
> +  if (in_symtab1 != in_symtab2 || !in_symtab1)
> +return 0;
> +  ret = symtab_node::get_create (base1)->equal_address_to
> +  (symtab_node::get_create 

Re: [PATCH, PR68716] Fix GOMP/GOACC_parallel handling in find_func_clobbers

2015-12-09 Thread Tom de Vries

On 09/12/15 11:03, Jakub Jelinek wrote:

On Wed, Dec 09, 2015 at 11:01:31AM +0100, Tom de Vries wrote:

PR tree-optimization/68716
* tree-ssa-structalias.c (find_func_clobbers): Fix handling of
BUILT_IN_GOMP_PARALLEL and BUILT_IN_GOMP_PARALLEL.


Pasto in ChangeLog entry?



Indeed, thanks for noticing.

- Tom

Fix GOMP/GOACC_parallel handling in find_func_clobbers

2015-12-08  Tom de Vries  

	PR tree-optimization/68716
	* tree-ssa-structalias.c (find_func_clobbers): Fix handling of
	BUILT_IN_GOMP_PARALLEL and BUILT_IN_GOACC_PARALLEL.

	* testsuite/libgomp.c/omp-nested-2.c: New test.

---
 gcc/tree-ssa-structalias.c | 47 +-
 libgomp/testsuite/libgomp.c/omp-nested-2.c |  4 +++
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index 060ff3e..15e351e 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -5082,7 +5082,52 @@ find_func_clobbers (struct function *fn, gimple *origt)
 	return;
 	  case BUILT_IN_GOMP_PARALLEL:
 	  case BUILT_IN_GOACC_PARALLEL:
-	return;
+	{
+	  unsigned int fnpos, argpos;
+	  switch (DECL_FUNCTION_CODE (decl))
+		{
+		case BUILT_IN_GOMP_PARALLEL:
+		  /* __builtin_GOMP_parallel (fn, data, num_threads, flags).  */
+		  fnpos = 0;
+		  argpos = 1;
+		  break;
+		case BUILT_IN_GOACC_PARALLEL:
+		  /* __builtin_GOACC_parallel (device, fn, mapnum, hostaddrs,
+	   sizes, kinds, ...).  */
+		  fnpos = 1;
+		  argpos = 3;
+		  break;
+		default:
+		  gcc_unreachable ();
+		}
+
+	  tree fnarg = gimple_call_arg (t, fnpos);
+	  gcc_assert (TREE_CODE (fnarg) == ADDR_EXPR);
+	  tree fndecl = TREE_OPERAND (fnarg, 0);
+	  varinfo_t cfi = get_vi_for_tree (fndecl);
+
+	  tree arg = gimple_call_arg (t, argpos);
+
+	  /* Parameter passed by value is used.  */
+	  lhs = get_function_part_constraint (fi, fi_uses);
+	  struct constraint_expr *rhsp;
+	  get_constraint_for (arg, );
+	  FOR_EACH_VEC_ELT (rhsc, j, rhsp)
+		process_constraint (new_constraint (lhs, *rhsp));
+	  rhsc.truncate (0);
+
+	  /* The caller clobbers what the callee does.  */
+	  lhs = get_function_part_constraint (fi, fi_clobbers);
+	  rhs = get_function_part_constraint (cfi, fi_clobbers);
+	  process_constraint (new_constraint (lhs, rhs));
+
+	  /* The caller uses what the callee does.  */
+	  lhs = get_function_part_constraint (fi, fi_uses);
+	  rhs = get_function_part_constraint (cfi, fi_uses);
+	  process_constraint (new_constraint (lhs, rhs));
+
+	  return;
+	}
 	  /* printf-style functions may have hooks to set pointers to
 	 point to somewhere into the generated string.  Leave them
 	 for a later exercise...  */
diff --git a/libgomp/testsuite/libgomp.c/omp-nested-2.c b/libgomp/testsuite/libgomp.c/omp-nested-2.c
new file mode 100644
index 000..7495afb
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/omp-nested-2.c
@@ -0,0 +1,4 @@
+// { dg-do run }
+// { dg-additional-options "-fipa-pta" }
+
+#include "omp-nested-1.c"


Re: Wrap fewer refs for LTO

2015-12-09 Thread Richard Biener
On Wed, 9 Dec 2015, Jan Hubicka wrote:

> Hi, while debugging an renaming issue I run into ADDR_EXPR of MEM_REF of 
> ADDR_EXPR which seemed somewhat odd + we don't really get the CONSTANT 
> and other flags right because recompute_tree_invariant_for_addr_expr 
> punts on ADDR_EXPR inside ADDR_EXPR.

That needs to be fixed then - [ + CST] is a valid invariant
address (we actually create this form quite often during propagation).

> The expression is created by wrap_refs. While looking at the code I 
> noitced that we wrap all VAR_DECLs while I think we only need to wrap 
> ones that can be merged by lto-symtab.

I think this comes before the times of "sane" type merging thus we
had to prepare for the type to get munged to sth incompatible here.

> I wonder if the function could not stop the recursive walk on ADDR_EXPR 
> and MEM_REFS? Is there a way we can use the inner MEM_REF for something 
> useful?

No, but the walkign would also never reach a handled-component inside
a MEM_REF.  A MEM_REF[] is not valid gimple (see 
is_gimple_mem_ref_addr).  So we don't ever re-write those "recursively".
But yeah, we'd save a few recursions for adding an extra test
against MEM_REF.

Index: lto-streamer-out.c
===
--- lto-streamer-out.c  (revision 231355)
+++ lto-streamer-out.c  (working copy)
@@ -2244,7 +2244,8 @@ wrap_refs (tree *tp, int *ws, void *)
 }
   else if (TREE_CODE (t) == CONSTRUCTOR)
 ;
-  else if (!EXPR_P (t))
+  else if (!EXPR_P (t)
+  || TREE_CODE (t) == MEM_REF)
 *ws = 0;
   return NULL_TREE;
 }

> Bootstrapped/regtested x86_64-linux, OK?

Ok.

Thanks,
Richard.

> 
> Honza
> 
>   * lto-streamer-out.c (wrap_refs): Only wrap public variables.
> Index: lto-streamer-out.c
> ===
> --- lto-streamer-out.c(revision 231438)
> +++ lto-streamer-out.c(working copy)
> @@ -2236,7 +2237,8 @@ wrap_refs (tree *tp, int *ws, void *)
>  {
>tree t = *tp;
>if (handled_component_p (t)
> -  && TREE_CODE (TREE_OPERAND (t, 0)) == VAR_DECL)
> +  && TREE_CODE (TREE_OPERAND (t, 0)) == VAR_DECL
> +  && TREE_PUBLIC (TREE_OPERAND (t, 0)))
>  {
>tree decl = TREE_OPERAND (t, 0);
>tree ptrtype = build_pointer_type (TREE_TYPE (decl));
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: Wrap fewer refs for LTO

2015-12-09 Thread Richard Biener
On Wed, 9 Dec 2015, Jan Hubicka wrote:

> > Hi,
> > while debugging an renaming issue I run into ADDR_EXPR of MEM_REF of 
> > ADDR_EXPR
> > which seemed somewhat odd + we don't really get the CONSTANT and other flags
> > right because recompute_tree_invariant_for_addr_expr punts on ADDR_EXPR 
> > inside
> > ADDR_EXPR.
> > 
> > The expression is created by wrap_refs. While looking at the code I noitced 
> > that
> > we wrap all VAR_DECLs while I think we only need to wrap ones that can be 
> > merged
> > by lto-symtab.
> > 
> > I wonder if the function could not stop the recursive walk on ADDR_EXPR and 
> > MEM_REFS?
> > Is there a way we can use the inner MEM_REF for something useful?
> > 
> > Bootstrapped/regtested x86_64-linux, OK?
> 
> Uh, sorry, actually I forgot about DECL_EXTERNAL. Here is updated version I 
> am re-testing.

OK.

>   * lto-streamer-out.c (wrap_refs): Only wrap public variables.
> Index: lto-streamer-out.c
> ===
> --- lto-streamer-out.c(revision 231438)
> +++ lto-streamer-out.c(working copy)
> @@ -2236,7 +2237,9 @@ wrap_refs (tree *tp, int *ws, void *)
>  {
>tree t = *tp;
>if (handled_component_p (t)
> -  && TREE_CODE (TREE_OPERAND (t, 0)) == VAR_DECL)
> +  && TREE_CODE (TREE_OPERAND (t, 0)) == VAR_DECL
> +  && (TREE_PUBLIC (TREE_OPERAND (t, 0))
> +|| DECL_EXTERNAL (TREE_OPERAND (t, 0
>  {
>tree decl = TREE_OPERAND (t, 0);
>tree ptrtype = build_pointer_type (TREE_TYPE (decl));
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PATCH, testsuite] Fix PR68632: gcc.target/arm/lto/pr65837 failure on M profile ARM targets

2015-12-09 Thread Thomas Preud'homme
gcc.target/arm/lto/pr65837 fails on M profile ARM targets because of lack of 
neon instructions. This patch adds the necessary arm_neon_ok effective target 
requirement to avoid running this test for such targets.

ChangeLog entry is as follows:


* gcc/testsuite/ChangeLog ***

2015-12-08  Thomas Preud'homme  

PR testsuite/68632
* gcc.target/arm/lto/pr65837_0.c: Require arm_neon_ok effective
target.


diff --git a/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c 
b/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c
index 000fc2a..fcc26a1 100644
--- a/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c
+++ b/gcc/testsuite/gcc.target/arm/lto/pr65837_0.c
@@ -1,4 +1,5 @@
 /* { dg-lto-do run } */
+/* { dg-require-effective-target arm_neon_ok } */
 /* { dg-lto-options {{-flto -mfpu=neon}} } */
 /* { dg-suppress-ld-options {-mfpu=neon} } */



Testcase fails without the patch and succeeds with.

Is this ok for trunk?

Best regards,

Thomas




Re: [PATCH, PR68716] Fix GOMP/GOACC_parallel handling in find_func_clobbers

2015-12-09 Thread Richard Biener
On Wed, 9 Dec 2015, Jakub Jelinek wrote:

> On Wed, Dec 09, 2015 at 11:01:31AM +0100, Tom de Vries wrote:
> > PR tree-optimization/68716
> > * tree-ssa-structalias.c (find_func_clobbers): Fix handling of
> > BUILT_IN_GOMP_PARALLEL and BUILT_IN_GOMP_PARALLEL.
> 
> Pasto in ChangeLog entry?

Ok with that fixed.

Richard.


Re: [DOC, PATCH] Mention --enable-valgrind-annotations in install.texi

2015-12-09 Thread Martin Liška
On 12/09/2015 12:47 AM, Gerald Pfeifer wrote:
> On Tue, 8 Dec 2015, Martin Liška wrote:
>> I would like to add a missing configure option.
> 
> I saw that Jeff approved while I was mulling over the patch,
> but still have two questons:
> 
> 1. Why did you sort this in where you did?  (It's not alphabetic.)

Hi.

Well, it's not easy to find a place as the options are not well ordered :)
I tried to pick up a better place in v2.

> 
> +Specify that the compiler should interact with valgrind runtime, where
> +selected invalid memory reads are marked as false positives and
> +garbage collected memory is properly marked for proper interaction.
> 
> 2. "interact...for proper interaction" and "properly marked for
> proper" feels a little confusing to me.  That is, I had to think
> twice.  Any chance you can make this a little simpler to understand?

Sure, please read v2.

Martin

> 
> Gerald
> 

>From 3531cab4de82bc0899c44d5cac19f3541027be8b Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 8 Dec 2015 16:54:43 +0100
Subject: [PATCH] Mention --enable-valgrind-annotations in install.texi

gcc/ChangeLog:

2015-12-08  Martin Liska  

	* doc/install.texi (--enable-valgrind-annotations): Mention
	the configure option in configure page.
---
 gcc/doc/install.texi | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 0b71bef..652b936 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1748,6 +1748,12 @@ When this option is specified more detailed information on memory
 allocation is gathered.  This information is printed when using
 @option{-fmem-report}.
 
+@item --enable-valgrind-annotations
+Specify that the compiler should interact with valgrind runtime,
+where selected memory-related operations are marked as valid.
+There operations are safe and should not become a candidate
+of an undefined behavior.
+
 @item --enable-nls
 @itemx --disable-nls
 The @option{--enable-nls} option enables Native Language Support (NLS),
-- 
2.6.3



Re: [RFA] [PATCH] [PR tree-optimization/68619] Avoid direct cfg cleanups in tree-ssa-dom.c [1/3]

2015-12-09 Thread Richard Biener
On Wed, Dec 9, 2015 at 9:31 AM, Jeff Law  wrote:
> On 12/08/2015 07:27 AM, Richard Biener wrote:
>>>
>>>
>>> I wonder if it makes more sense to integrate this with the
>>> domwalker itself.  Add a constructor flag to it and do everything
>>> in itself.  By letting the before_dom_children return a taken edge
>>> (or NULL if unknown) it can drive the outgoing edge marking. And
>>> the domwalk worker would simply not recurse to dom children for
>>> unreachable blocks.
>>
>>
>> So interface-wise do
>
> [ ... ]
> Close :-)
>
> If skip_unreachable_blocks is true, then we want the walker to initialize
> EDGE_EXECUTABLE automatically.  So we drop the member initialization and
> constructor body from domwalk.h and instead have a ctor in domwalk.c where
> we can initialize the private members and set EDGE_EXECUTABLE as needed.
>
> My first iteration let the clients clear EDGE_EXECUTABLE as they found
> conditionals that could be optimized.  That was pretty clean and localized
> in sccvn & dom.
>
> If we have the before_dom_children return the taken edge, then we have to
> twiddle all the clients due to the api change in before_dom_children.  .
> There's ~18 in total, so it's not too bad.
>
> 2 of the 18 clearly need to use the skip_unreachable_blocks capability (dom
> and sccvn).  2 others might be able to use it (tree-ssa-pre.c and
> tree-ssa-propagate.c)  I converted dom and sccvn, but not pre and the
> generic propagation engine.
>
> I can submit the iteration which lets clients clear EDGE_EXECUTABLE, or the
> iteration where the clients return the taken edge (or NULL) from the
> before_dom_children callback.
>
> Either is fine with me, so if you have a preference, let me know.

Return the taken edge.

Richard.

> jeff


[PATCH, PR68716] Fix GOMP/GOACC_parallel handling in find_func_clobbers

2015-12-09 Thread Tom de Vries

[ was: Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta ]

On 30/11/15 13:11, Tom de Vries wrote:

On 30/11/15 10:16, Richard Biener wrote:

On Mon, 30 Nov 2015, Tom de Vries wrote:


Hi,

this patch fixes PR46032.

It handles a call:
...
   __builtin_GOMP_parallel (fn, data, num_threads, flags)
...
as:
...
   fn (data)
...
in ipa-pta.

This improves ipa-pta alias analysis in the parallelized function fn,
and
allows vectorization in the testcase without a runtime alias test.

Bootstrapped and reg-tested on x86_64.

OK for stage3 trunk?


+ /* Assign the passed argument to the appropriate incoming
+parameter of the function.  */
+ struct constraint_expr lhs ;
+ lhs = get_function_part_constraint (fi, fi_parm_base + 0);
+ auto_vec rhsc;
+ struct constraint_expr *rhsp;
+ get_constraint_for_rhs (arg, );
+ while (rhsc.length () != 0)
+   {
+ rhsp =  ();
+ process_constraint (new_constraint (lhs, *rhsp));
+ rhsc.pop ();
+   }

please use style used elsewhere with

  FOR_EACH_VEC_ELT (rhsc, j, rhsp)
process_constraint (new_constraint (lhs, *rhsp));
  rhsc.truncate (0);



That code was copied from find_func_aliases_for_call.
I've factored out the bit that I copied as
find_func_aliases_for_call_arg, and fixed the style there (and dropped
'rhsc.truncate (0)' since AFAIU it's redundant at the end of a function).


+ /* Parameter passed by value is used.  */
+ lhs = get_function_part_constraint (fi, fi_uses);
+ struct constraint_expr *rhsp;
+ get_constraint_for_address_of (arg, );

This isn't correct - you want to use get_constraint_for (arg, ).
After all rhs is already an ADDR_EXPR.



Can we add an assert somewhere to detect this incorrect usage?


+ FOR_EACH_VEC_ELT (rhsc, j, rhsp)
+   process_constraint (new_constraint (lhs, *rhsp));
+ rhsc.truncate (0);
+
+ /* The caller clobbers what the callee does.  */
+ lhs = get_function_part_constraint (fi, fi_clobbers);
+ rhs = get_function_part_constraint (cfi, fi_clobbers);
+ process_constraint (new_constraint (lhs, rhs));
+
+ /* The caller uses what the callee does.  */
+ lhs = get_function_part_constraint (fi, fi_uses);
+ rhs = get_function_part_constraint (cfi, fi_uses);
+ process_constraint (new_constraint (lhs, rhs));

I don't see why you need those.  The solver should compute these
in even better precision (context sensitive on the call side).

The same is true for the function parameter.  That is, the only
needed part of the patch should be that making sure we see
the "direct" call and assign parameters correctly.



Dropped this bit.


Dropping the find_func_clobbers bit (in particular, the part copying the 
clobbers) caused PR68716.


Basically the find_func_clobbers bit tries to emulate the generic 
handling of calls in find_func_clobbers, but pretends the call is

  fn (data)
when handling
  __builtin_GOMP_parallel (fn, data, num_threads, flags)
.

I used the same comments as in the generic call handling to make it 
clear what part of the generic call handling is emulated.


Compared to the originally posted patch, the changes are:
- removed 'gcc_assert (TREE_CODE (arg) == ADDR_EXPR)'. Ran into a case
  where arg is NULL.
- use get_constraint_for instead of get_constraint_for_address_of as
  requested in a review comment above.
- handle GOACC_parallel as well

Bootstrapped and reg-tested on x86_64.

OK for stage3 trunk?

Thanks,
- Tom

Fix GOMP/GOACC_parallel handling in find_func_clobbers

2015-12-08  Tom de Vries  

	PR tree-optimization/68716
	* tree-ssa-structalias.c (find_func_clobbers): Fix handling of
	BUILT_IN_GOMP_PARALLEL and BUILT_IN_GOMP_PARALLEL.

	* testsuite/libgomp.c/omp-nested-2.c: New test.

---
 gcc/tree-ssa-structalias.c | 47 +-
 libgomp/testsuite/libgomp.c/omp-nested-2.c |  4 +++
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index 060ff3e..15e351e 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -5082,7 +5082,52 @@ find_func_clobbers (struct function *fn, gimple *origt)
 	return;
 	  case BUILT_IN_GOMP_PARALLEL:
 	  case BUILT_IN_GOACC_PARALLEL:
-	return;
+	{
+	  unsigned int fnpos, argpos;
+	  switch (DECL_FUNCTION_CODE (decl))
+		{
+		case BUILT_IN_GOMP_PARALLEL:
+		  /* __builtin_GOMP_parallel (fn, data, num_threads, flags).  */
+		  fnpos = 0;
+		  argpos = 1;
+		  break;
+		case BUILT_IN_GOACC_PARALLEL:
+		  /* __builtin_GOACC_parallel (device, fn, mapnum, hostaddrs,
+	   sizes, kinds, ...).  */
+		  fnpos = 1;
+		  argpos = 3;
+		  

Re: [PATCH] Fix phiopt ICE in Factor conversion in COND_EXPR (PR tree-optimization/66949)

2015-12-09 Thread Richard Biener
On Tue, Dec 8, 2015 at 5:21 PM, Marek Polacek  wrote:
> The following is a conservative fix for this PR.  This is an ICE transpiring
> in the new "Factor conversion in COND_EXPR" optimization added in r225722.
>
> Before this optimization kicks in, we have
>   :
>   ...
>   p1_32 = (short unsigned int) _20;
>
>   :
>   ...
>   iftmp.0_18 = (short unsigned int) _20;
>
>   :
>   ...
>   # iftmp.0_19 = PHI 
>
> after factor_out_conditional_conversion does its work, we end up with those 
> two
> def stmts removed and instead of the PHI we'll have
>
>   # _35 = PHI <_20(3), _20(2)>
>   iftmp.0_19 = (short unsigned int) _35;
>
> That itself looks like a fine optimization, but after 
> factor_out_conditional_conversion
> there's
>  320   phis = phi_nodes (bb2);
>  321   phi = single_non_singleton_phi_for_edges (phis, e1, e2);
>  322   gcc_assert (phi);
> and phis look like
>   b.2_38 = PHI 
>   _35 = PHI <_20(3), _20(2)>
> so single_non_singleton_phi_for_edges returns NULL and the subsequent assert 
> triggers.
>
> With this patch we won't ICE (and PRE should clean this up anyway), but I 
> don't know,
> maybe I should try harder to optimize even this problematical case (not sure 
> how hard
> it would be...)?
>
> pr66949-2.c only ICEd on powerpc64le and I have verified that this patch 
> fixes it too.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

Hmm.

Yes, there is an opportunity for optimization.

But I don't see why we have the asserts at all - they seem to be originated from
development.  IMHO factor_out_conditonal_conversion should simply return
the new PHI it generates instead of relying on
signle_non_signelton_phi_for_edges
to return it.

Richard.

> 2015-12-08  Marek Polacek  
>
> PR tree-optimization/66949
> * tree-ssa-phiopt.c (factor_out_conditional_conversion): Return false 
> if
> NEW_ARG0 and NEW_ARG1 are equal.
>
> * gcc.dg/torture/pr66949-1.c: New test.
> * gcc.dg/torture/pr66949-2.c: New test.
>
> diff --git gcc/testsuite/gcc.dg/torture/pr66949-1.c 
> gcc/testsuite/gcc.dg/torture/pr66949-1.c
> index e69de29..1b765bc 100644
> --- gcc/testsuite/gcc.dg/torture/pr66949-1.c
> +++ gcc/testsuite/gcc.dg/torture/pr66949-1.c
> @@ -0,0 +1,28 @@
> +/* PR tree-optimization/66949 */
> +/* { dg-do compile } */
> +
> +int a, *b = , c;
> +
> +unsigned short
> +fn1 (unsigned short p1, unsigned int p2)
> +{
> +  return p2 > 1 || p1 >> p2 ? p1 : p1 << p2;
> +}
> +
> +void
> +fn2 ()
> +{
> +  int *d = 
> +  for (a = 0; a < -1; a = 1)
> +;
> +  if (a < 0)
> +c = 0;
> +  *b = fn1 (*d || c, *d);
> +}
> +
> +int
> +main ()
> +{
> +  fn2 ();
> +  return 0;
> +}
> diff --git gcc/testsuite/gcc.dg/torture/pr66949-2.c 
> gcc/testsuite/gcc.dg/torture/pr66949-2.c
> index e69de29..e6250a3 100644
> --- gcc/testsuite/gcc.dg/torture/pr66949-2.c
> +++ gcc/testsuite/gcc.dg/torture/pr66949-2.c
> @@ -0,0 +1,23 @@
> +/* PR tree-optimization/66949 */
> +/* { dg-do compile } */
> +
> +char a;
> +int b, c, d;
> +extern int fn2 (void);
> +
> +short
> +fn1 (short p1, short p2)
> +{
> +  return p2 == 0 ? p1 : p1 / p2;
> +}
> +
> +int
> +main (void)
> +{
> +  char e = 1;
> +  int f = 7;
> +  c = a >> f;
> +  b = fn1 (c, 0 < d <= e && fn2 ());
> +
> +  return 0;
> +}
> diff --git gcc/tree-ssa-phiopt.c gcc/tree-ssa-phiopt.c
> index 344cd2f..caac5d5 100644
> --- gcc/tree-ssa-phiopt.c
> +++ gcc/tree-ssa-phiopt.c
> @@ -477,6 +477,11 @@ factor_out_conditional_conversion (edge e0, edge e1, 
> gphi *phi,
> return false;
>  }
>
> +  /* If we were to continue, we'd create a PHI with same arguments for edges
> + E0 and E1.  That could get us in trouble later, so punt.  */
> +  if (operand_equal_for_phi_arg_p (new_arg0, new_arg1))
> +return false;
> +
>/*  If arg0/arg1 have > 1 use, then this transformation actually increases
>the number of expressions evaluated at runtime.  */
>if (!has_single_use (arg0)
>
> Marek


Re: -fstrict-aliasing fixes 5/6: make type system independent of flag_strict_aliasing

2015-12-09 Thread Arnaud Charlet
> Hi
> this patch implements the trik for punting if we get too many nested
> pointers.
> This fixes the ada tstcases. Curiously enough I would like to replace
> safe_push
> by quick_push but doing so I get weird error about freeing non-heap object
> in the auto_vec desructor...
> 
> Bootstraping/regtesting x86_64-linux. Ok if it passes?
> 
> Honza
> 
>   * alias.c (get_alias_set): Punt after getting 8 nested pointers.
> 
> Index: alias.c
> ===
> 
> --- alias.c   (revision 231439)
> +++ alias.c   (working copy)
> @@ -990,6 +990,14 @@ get_alias_set (tree t)
>  || TREE_CODE (p) == VECTOR_TYPE;
>  p = TREE_TYPE (p))
>   {
> +   /* Ada supports recusive pointers.  Instead of doing recrusion check

typo above: recrusion -> recursion

> +  just give up once the preallocated space of 8 elements is up.
> +  In this case just punt to void * alias set.  */
> +   if (reference.length () == 8)

We don't use magic numbers in general, can you please replace by a named
constant instead?

> + {
> +   p = ptr_type_node;
> +   break;
> + }
> if (TREE_CODE (p) == REFERENCE_TYPE)
>   /* In LTO we want languages that use references to be compatible
>  with languages that use pointers.  */

I'll let others comment on the general idea.


[PATCH] Fix PR68583

2015-12-09 Thread Richard Biener

Well, not really in the end - the actual testcase would need code
hoisting to make the refs equal enough for ifcvt.  Making the testcase
simpler doesn't seem to capture the issue so for now without one
(I'll keep trying).

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2015-12-09  Richard Biener  

PR tree-optimization/68583
* tree-if-conv.c (ifc_dr): Make flags bool, add w_unconditionally
flag and rename predicates to w_predicate, rw_predicate and
base_w_predicate.
(DR_WRITTEN_AT_LEAST_ONCE): Rename to ...
(DR_BASE_W_UNCONDITIONALLY): ... this.
(DR_W_UNCONDITIONALLY): Add.
(hash_memrefs_baserefs_and_store_DRs_read): Adjust.  Compute
unconditionally written separately from read or written.
(ifcvt_memrefs_wont_trap): Properly treat reads.
(ifcvt_could_trap_p): Inline ...
(if_convertible_gimple_assign_stmt_p): ... here.  Refactor
to avoid code duplication.
(if_convertible_loop_p_1): Adjust and properly initialize
predicates.

Index: gcc/tree-if-conv.c
===
*** gcc/tree-if-conv.c  (revision 231396)
--- gcc/tree-if-conv.c  (working copy)
*** if_convertible_phi_p (struct loop *loop,
*** 582,601 
 each DR->aux field.  */
  
  struct ifc_dr {
!   /* -1 when not initialized, 0 when false, 1 when true.  */
!   int written_at_least_once;
! 
!   /* -1 when not initialized, 0 when false, 1 when true.  */
!   int rw_unconditionally;
! 
!   tree predicate;
! 
!   tree base_predicate;
  };
  
  #define IFC_DR(DR) ((struct ifc_dr *) (DR)->aux)
! #define DR_WRITTEN_AT_LEAST_ONCE(DR) (IFC_DR (DR)->written_at_least_once)
  #define DR_RW_UNCONDITIONALLY(DR) (IFC_DR (DR)->rw_unconditionally)
  
  /* Iterates over DR's and stores refs, DR and base refs, DR pairs in
 HASH tables.  While storing them in HASH table, it checks if the
--- 582,600 
 each DR->aux field.  */
  
  struct ifc_dr {
!   bool rw_unconditionally;
!   bool w_unconditionally;
!   bool written_at_least_once;
! 
!   tree rw_predicate;
!   tree w_predicate;
!   tree base_w_predicate;
  };
  
  #define IFC_DR(DR) ((struct ifc_dr *) (DR)->aux)
! #define DR_BASE_W_UNCONDITIONALLY(DR) (IFC_DR (DR)->written_at_least_once)
  #define DR_RW_UNCONDITIONALLY(DR) (IFC_DR (DR)->rw_unconditionally)
+ #define DR_W_UNCONDITIONALLY(DR) (IFC_DR (DR)->w_unconditionally)
  
  /* Iterates over DR's and stores refs, DR and base refs, DR pairs in
 HASH tables.  While storing them in HASH table, it checks if the
*** hash_memrefs_baserefs_and_store_DRs_read
*** 623,660 
  ref = TREE_OPERAND (ref, 0);
  
master_dr = _DR_map->get_or_insert (ref, );
- 
if (!exist1)
! {
!   IFC_DR (a)->predicate = ca;
!   *master_dr = a;
! }
!   else
! IFC_DR (*master_dr)->predicate
!   = fold_or_predicates
!   (EXPR_LOCATION (IFC_DR (*master_dr)->predicate),
!ca, IFC_DR (*master_dr)->predicate);
  
!   if (is_true_predicate (IFC_DR (*master_dr)->predicate))
! DR_RW_UNCONDITIONALLY (*master_dr) = 1;
  
if (DR_IS_WRITE (a))
  {
base_master_dr = _DR_map->get_or_insert (base_ref, );
- 
if (!exist2)
!   {
! IFC_DR (a)->base_predicate = ca;
! *base_master_dr = a;
!   }
!   else
!   IFC_DR (*base_master_dr)->base_predicate
! = fold_or_predicates
! (EXPR_LOCATION (IFC_DR (*base_master_dr)->base_predicate),
!  ca, IFC_DR (*base_master_dr)->base_predicate);
! 
!   if (is_true_predicate (IFC_DR (*base_master_dr)->base_predicate))
!   DR_WRITTEN_AT_LEAST_ONCE (*base_master_dr) = 1;
  }
  }
  
--- 622,654 
  ref = TREE_OPERAND (ref, 0);
  
master_dr = _DR_map->get_or_insert (ref, );
if (!exist1)
! *master_dr = a;
  
!   if (DR_IS_WRITE (a))
! {
!   IFC_DR (*master_dr)->w_predicate
!   = fold_or_predicates (UNKNOWN_LOCATION, ca,
! IFC_DR (*master_dr)->w_predicate);
!   if (is_true_predicate (IFC_DR (*master_dr)->w_predicate))
!   DR_W_UNCONDITIONALLY (*master_dr) = true;
! }
!   IFC_DR (*master_dr)->rw_predicate
! = fold_or_predicates (UNKNOWN_LOCATION, ca,
! IFC_DR (*master_dr)->rw_predicate);
!   if (is_true_predicate (IFC_DR (*master_dr)->rw_predicate))
! DR_RW_UNCONDITIONALLY (*master_dr) = true;
  
if (DR_IS_WRITE (a))
  {
base_master_dr = _DR_map->get_or_insert (base_ref, );
if (!exist2)
!   *base_master_dr = a;
!   IFC_DR (*base_master_dr)->base_w_predicate
!   = fold_or_predicates (UNKNOWN_LOCATION, ca,
! IFC_DR (*base_master_dr)->base_w_predicate);
!   if (is_true_predicate (IFC_DR (*base_master_dr)->base_w_predicate))
!   DR_BASE_W_UNCONDITIONALLY (*base_master_dr) = true;
  }
  }
  
*** 

Re: [DOC, PATCH] Mention --enable-valgrind-annotations in install.texi

2015-12-09 Thread Markus Trippelsdorf
On 2015.12.09 at 10:33 +0100, Martin Liška wrote:
> On 12/09/2015 12:47 AM, Gerald Pfeifer wrote:
> > On Tue, 8 Dec 2015, Martin Liška wrote:
> >> I would like to add a missing configure option.
> > 
> > +Specify that the compiler should interact with valgrind runtime, where
> > +selected invalid memory reads are marked as false positives and
> > +garbage collected memory is properly marked for proper interaction.
> > 
> > 2. "interact...for proper interaction" and "properly marked for
> > proper" feels a little confusing to me.  That is, I had to think
> > twice.  Any chance you can make this a little simpler to understand?
> 
> Sure, please read v2.
> 
> +@item --enable-valgrind-annotations
> +Specify that the compiler should interact with valgrind runtime,
> +where selected memory-related operations are marked as valid.
> +There operations are safe and should not become a candidate
> +of an undefined behavior.

Sorry, but this is simply awful ;-). How about:

Mark selected memory related operations in the compiler when run under
valgrind to suppress false positives.


-- 
Markus


Re: [RFC] Request for comments on ivopts patch

2015-12-09 Thread Richard Biener
On Tue, Dec 8, 2015 at 7:56 PM, Steve Ellcey  wrote:
> I have an ivopts optimization question/proposal.  When compiling the
> attached program the ivopts pass prefers the original ivs over new ivs
> and that causes us to generate less efficient code on MIPS.  It may
> affect other platforms too.
>
> The Source code is a C strcmp:
>
> int strcmp (const char *p1, const char *p2)
> {
>   const unsigned char *s1 = (const unsigned char *) p1;
>   const unsigned char *s2 = (const unsigned char *) p2;
>   unsigned char c1, c2;
>   do {
>   c1 = (unsigned char) *s1++;
>   c2 = (unsigned char) *s2++;
>   if (c1 == '\0') return c1 - c2;
>   } while (c1 == c2);
>   return c1 - c2;
> }
>
> Currently the code prefers the original ivs and so it generates
> code that increments s1 and s2 before doing the loads (and uses
> a -1 offset):
>
>   :
>   # s1_1 = PHI 
>   # s2_2 = PHI 
>   s1_6 = s1_1 + 1;
>   c1_8 = MEM[base: s1_6, offset: 4294967295B];
>   s2_9 = s2_2 + 1;
>   c2_10 = MEM[base: s2_9, offset: 4294967295B];
>   if (c1_8 == 0)
> goto ;
>   else
> goto ;
>
> If I remove the cost increment for non-original ivs then GCC
> does the loads before the increments:
>
>  :
>   # ivtmp.6_17 = PHI 
>   # ivtmp.7_21 = PHI 
>   _25 = (void *) ivtmp.6_17;
>   c1_8 = MEM[base: _25, offset: 0B];
>   _26 = (void *) ivtmp.7_21;
>   c2_10 = MEM[base: _26, offset: 0B];
>   if (c1_8 == 0)
> goto ;
>   else
> goto ;
> .
> .
>   :
>   ivtmp.6_14 = ivtmp.6_17 + 1;
>   ivtmp.7_23 = ivtmp.7_21 + 1;
>   if (c1_8 == c2_10)
> goto ;
>   else
> goto ;
>
>
> This second case (without the preference for the original IV)
> generates better code on MIPS because the final assembly
> has the increment instructions between the loads and the tests
> of the values being loaded and so there is no delay (or less delay)
> between the load and use.  It seems like this could easily be
> the case for other platforms too so I was wondering what people
> thought of this patch:

You don't comment on the comment you remove ... debugging
programs is also important!

So if then the cost of both cases should be distinguished
somewhere else, like granting a bonus for increment before
exit test or so.

Richard.

> 2015-12-08  Steve Ellcey  
>
> * tree-ssa-loop-ivopts.c (determine_iv_cost): Remove preference for
> original ivs.
>
>
> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
> index 98dc451..26daabc 100644
> --- a/gcc/tree-ssa-loop-ivopts.c
> +++ b/gcc/tree-ssa-loop-ivopts.c
> @@ -5818,14 +5818,6 @@ determine_iv_cost (struct ivopts_data *data, struct 
> iv_cand *cand)
>
>cost = cost_step + adjust_setup_cost (data, cost_base.cost);
>
> -  /* Prefer the original ivs unless we may gain something by replacing it.
> - The reason is to make debugging simpler; so this is not relevant for
> - artificial ivs created by other optimization passes.  */
> -  if (cand->pos != IP_ORIGINAL
> -  || !SSA_NAME_VAR (cand->var_before)
> -  || DECL_ARTIFICIAL (SSA_NAME_VAR (cand->var_before)))
> -cost++;
> -
>/* Prefer not to insert statements into latch unless there are some
>   already (so that we do not create unnecessary jumps).  */
>if (cand->pos == IP_END


Re: [PATCH] Encode alignment info in MASK_{LOAD,STORE} ifns (PR tree-optimization/68786)

2015-12-09 Thread Richard Biener
On Tue, 8 Dec 2015, Jakub Jelinek wrote:

> Hi!
> 
> As written in the PR, with MASK_{LOAD,STORE} ifns we lose info on the
> alignment of the point, unless the referenced SSA_NAME has pointer info
> with alignment mask, but that is just an optimization issue rather than
> requirement.
> 
> As the second argument to these builtins is always 0 (used just to hold
> the pointer type for aliasing purposes), this patch uses the value of
> the second arg to hold alignment info.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2015-12-08  Jakub Jelinek  
> 
>   PR tree-optimization/68786
>   * tree-if-conv.c: Include builtins.h.
>   (predicate_mem_writes): Put result of get_object_alignment (ref)
>   into second argument's value.
>   * tree-vect-stmts.c (vectorizable_mask_load_store): Put minimum
>   pointer alignment into second argument's value.
>   * tree-data-ref.c (get_references_in_stmt): Use value of second
>   argument for build_aligned_type, and only the type to build
>   a zero second argument for MEM_REF.
>   * internal-fn.c (expand_mask_load_optab_fn,
>   expand_mask_store_optab_fn): Likewise.
> 
> --- gcc/tree-if-conv.c.jj 2015-11-30 13:40:38.0 +0100
> +++ gcc/tree-if-conv.c2015-12-08 17:29:18.139186787 +0100
> @@ -111,6 +111,7 @@ along with GCC; see the file COPYING3.
>  #include "dbgcnt.h"
>  #include "tree-hash-traits.h"
>  #include "varasm.h"
> +#include "builtins.h"
>  
>  /* List of basic blocks in if-conversion-suitable order.  */
>  static basic_block *ifc_bbs;
> @@ -2093,7 +2094,8 @@ predicate_mem_writes (loop_p loop)
>   vect_sizes.safe_push (bitsize);
>   vect_masks.safe_push (mask);
> }
> - ptr = build_int_cst (reference_alias_ptr_type (ref), 0);
> + ptr = build_int_cst (reference_alias_ptr_type (ref),
> +  get_object_alignment (ref));
>   /* Copy points-to info if possible.  */
>   if (TREE_CODE (addr) == SSA_NAME && !SSA_NAME_PTR_INFO (addr))
> copy_ref_info (build2 (MEM_REF, TREE_TYPE (ref), addr, ptr),
> --- gcc/tree-vect-stmts.c.jj  2015-12-02 20:26:59.0 +0100
> +++ gcc/tree-vect-stmts.c 2015-12-08 17:10:47.226768560 +0100
> @@ -2058,10 +2058,11 @@ vectorizable_mask_load_store (gimple *st
>   misalign = DR_MISALIGNMENT (dr);
> set_ptr_info_alignment (get_ptr_info (dataref_ptr), align,
> misalign);
> +   tree ptr = build_int_cst (TREE_TYPE (gimple_call_arg (stmt, 1)),
> + misalign ? misalign & -misalign : align);
> new_stmt
>   = gimple_build_call_internal (IFN_MASK_STORE, 4, dataref_ptr,
> -   gimple_call_arg (stmt, 1),
> -   vec_mask, vec_rhs);
> +   ptr, vec_mask, vec_rhs);
> vect_finish_stmt_generation (stmt, new_stmt, gsi);
> if (i == 0)
>   STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
> @@ -2107,10 +2108,11 @@ vectorizable_mask_load_store (gimple *st
>   misalign = DR_MISALIGNMENT (dr);
> set_ptr_info_alignment (get_ptr_info (dataref_ptr), align,
> misalign);
> +   tree ptr = build_int_cst (TREE_TYPE (gimple_call_arg (stmt, 1)),
> + misalign ? misalign & -misalign : align);
> new_stmt
>   = gimple_build_call_internal (IFN_MASK_LOAD, 3, dataref_ptr,
> -   gimple_call_arg (stmt, 1),
> -   vec_mask);
> +   ptr, vec_mask);
> gimple_call_set_lhs (new_stmt, make_ssa_name (vec_dest));
> vect_finish_stmt_generation (stmt, new_stmt, gsi);
> if (i == 0)
> --- gcc/tree-data-ref.c.jj2015-11-09 13:39:32.0 +0100
> +++ gcc/tree-data-ref.c   2015-12-08 17:38:55.199094723 +0100
> @@ -3872,6 +3872,8 @@ get_references_in_stmt (gimple *stmt, ve
>else if (stmt_code == GIMPLE_CALL)
>  {
>unsigned i, n;
> +  tree ptr, type;
> +  unsigned int align;
>  
>ref.is_read = false;
>if (gimple_call_internal_p (stmt))
> @@ -3882,12 +3884,16 @@ get_references_in_stmt (gimple *stmt, ve
> break;
>   ref.is_read = true;
> case IFN_MASK_STORE:
> - ref.ref = fold_build2 (MEM_REF,
> -ref.is_read
> -? TREE_TYPE (gimple_call_lhs (stmt))
> -: TREE_TYPE (gimple_call_arg (stmt, 3)),
> -gimple_call_arg (stmt, 0),
> -gimple_call_arg (stmt, 1));
> + ptr = build_int_cst (TREE_TYPE (gimple_call_arg (stmt, 1)), 0);
> + align = tree_to_shwi 

Re: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion.

2015-12-09 Thread Richard Biener
On Tue, Dec 8, 2015 at 11:24 PM, Ajit Kumar Agarwal
 wrote:
> Based on the comments on RFC patch this patch incorporates all the comments 
> from Jeff. Thanks Jeff for the valuable feedback.
>
> This patch enables the better register pressure estimate for Loop Invariant 
> code motion. This patch Calculate the Loop Liveness used for regs_used
>  used to calculate the register pressure  used in the cost estimation.
>
> Loop Liveness is based on the following properties.  we only require to 
> calculate the set of objects that are live at the birth or the header of the 
> loop.  We
> don't need to calculate the live through the Loop considering Live in and 
> Live out of all the basic blocks of the Loop. This is because the set of 
> objects. That
> are live-in at the birth or header of the loop will be live-in at every node 
> in the Loop.
>
> If a v live out at the header of the loop then the variable is live-in at 
> every node in the Loop. To prove this, Consider a Loop L with header h such 
> that The
> variable v defined at d is live-in at h. Since v is live at h, d is not part 
> of L. This follows from the dominance property, i.e. h is strictly dominated 
> by d.
> Furthermore, there exists a path from h to a use of v which does not go 
> through d. For every node of the loop, p, since the loop is strongly connected
> Component of the CFG, there exists a path, consisting only of nodes of L from 
> p to h. Concatenating those two paths prove that v is live-in and live-out Of 
> p.
>
> Also Calculate the Live-Out and Live-In for the exit edge of the loop. This 
> considers liveness for not only the Loop latch but also the liveness outside 
> the Loops.
>
> Bootstrapped and Regtested for x86_64 and microblaze target.
>
> There is an extra failure for the testcase gcc.dg/loop-invariant.c  with the 
> change that looks correct to me.
>
> This is because the available_regs = 6 and the regs_needed = 1 and new_regs = 
> 0 and the regs_used = 10.  As the reg_used that are based on the Liveness
> given above is greater than the available_regs, then it's candidate of spill 
> and the estimate_register_pressure calculates the spill cost. This spill cost 
> is greater
> than inv_cost and gain comes to be negative. The disables the loop invariant 
> for the above testcase.
>
> Disabling of the Loop invariant for the testcase loop-invariant.c with this 
> patch  looks correct to me considering the calculation of available_regs in 
> cfgloopanal.c
> is correct.

You keep a lot of data (bitmaps) live just to use the number of bits
set in the end.

I'll note that this also increases the complexity of the pass which is
enabled at -O1+ where
-O1 should limit itself to O(n*log n) algorithms but live is quadratic.

So I don't think doing this unconditionally is a good idea (and we
have -fira-loop-pressure
after all).

Please watch not making -O1 worse again after we spent so much time
making it suitable
for all the weird autogenerated code.

Richard.

>  ChangeLog:
>  2015-12-09  Ajit Agarwal  
>
> * loop-invariant.c
> (calculate_loop_liveness): New.
> (move_loop_invariants): Use of calculate_loop_liveness.
>
> Signed-off-by:Ajit Agarwal ajit...@xilinx.com
>
> ---
>  gcc/loop-invariant.c |   77 
> ++
>  1 files changed, 65 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> index 53d1377..ac08594 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1464,18 +1464,7 @@ find_invariants_to_move (bool speed, bool call_p)
> registers used; we put some initial bound here to stand for
> induction variables etc.  that we do not detect.  */
>  {
> -  unsigned int n_regs = DF_REG_SIZE (df);
> -
> -  regs_used = 2;
> -
> -  for (i = 0; i < n_regs; i++)
> -   {
> - if (!DF_REGNO_FIRST_DEF (i) && DF_REGNO_LAST_USE (i))
> -   {
> - /* This is a value that is used but not changed inside loop.  */
> - regs_used++;
> -   }
> -   }
> +  regs_used = bitmap_count_bits (_DATA (curr_loop)->regs_live) + 2;
>  }
>
>if (! flag_ira_loop_pressure)
> @@ -1966,7 +1955,63 @@ mark_ref_regs (rtx x)
>}
>  }
>
> +/* Calculate the Loop Liveness used for regs_used used in
> +   heuristics to calculate the register pressure.  */
> +
> +static void
> +calculate_loop_liveness (void)
> +{
> +  struct loop *loop;
> +
> +  FOR_EACH_LOOP (loop, 0)
> +if (loop->aux == NULL)
> +  {
> +loop->aux = xcalloc (1, sizeof (struct loop_data));
> +bitmap_initialize (_DATA (loop)->regs_live, _obstack);
> + }
> +
> +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> +  {
> +int  i;
> +edge e;
> +vec edges;
> +edges = get_loop_exit_edges (loop);
> +
> +/* Loop Liveness is based on the following proprties.
> +   we only require to calculate the 

Re: [Patch AArch64] Reinstate CANNOT_CHANGE_MODE_CLASS to fix pr67609

2015-12-09 Thread James Greenhalgh
On Fri, Nov 27, 2015 at 01:01:01PM +, James Greenhalgh wrote:
> This patch follow Richard Henderson's advice to tighten up
> CANNOT_CHANGE_MODE_CLASS for AArch64 to avoid a simplification bug in
> the middle-end.
> 
> There is nothing AArch64-specific about the testcase which triggers this,
> so I'll put it in the testcase for other targets. If you see a regression,
> the explanation in the PR is much more thorough and correct than I can
> reproduce here, so I'd recommend starting there. In short, target
> maintainers need to:
> 
> > forbid BITS_PER_WORD (64-bit) subregs of hard registers >
> > BITS_PER_WORD.  See the verbiage I added to the i386 backend for this.
> 
> We removed the CANNOT_CHANGE_MODE_CLASS macro back in January 2015. Before
> then, we used it to workaround bugs in big-endian vector support
> ( https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01216.html ). Ideally,
> we'd not need to bring this macro back, but if we can't fix the middle-end
> bug this exposes, we need the workaround.
> 
> For AArch64, doing this runs in to some trouble with two of our
> instruction patterns - we end up with:
> 
>   (truncate:DI (reg:TF))
> 
> Which fails if it ever make it through to the simplify routines with
> something nasty like:
> 
>   (and:DI (truncate:DI (reg:TF 32 v0 [ a ]))
>   (const_int 2305843009213693951 [0x1fff]))
> 
> The simplify routines want to turn this around to look like:
> 
>   (truncate:DI (and:TF (reg:TF 32 v0 [ a ])
>   (const_int 2305843009213693951 [0x1fff])))
> 
> Which then wants to further simplify the expression by first building
> the constant in TF mode, and trunc_int_for_mode barfs:
> 
>   0x7a38a5 trunc_int_for_mode(long, machine_mode)
> .../gcc/explow.c:53
> 
> We can fix that by changing the patterns to use a zero_extract, which seems
> more in line with what they actually express (extracting the two 64-bit
> halves of a 128-bit value).
> 
> Bootstrapped on aarch64-none-linux-gnu, and tested on aarch64-none-elf and
> aarch64_be-none-elf without seeing any correctness regressions.
> 
> OK?

*ping*

Thanks,
James


> 
> If so, we ought to get this backported to the release branches, the gcc-5
> backport applies clean (testing ongoing but looks OK so far) if the release
> managers and AArch64 maintainers agree this is something that should be
> backported this late in the 5.3 release cycle.
> 
> Thanks,
> James
> 
> ---
> 2015-11-27  James Greenhalgh  
> 
>   * config/aarch64/aarch64-protos.h
>   (aarch64_cannot_change_mode_class): Bring back.
>   * config/aarch64/aarch64.c
>   (aarch64_cannot_change_mode_class): Likewise.
>   * config/aarch64/aarch64.h (CANNOT_CHANGE_MODE_CLASS): Likewise.
>   * config/aarch64/aarch64.md (aarch64_movdi_low): Use
>   zero_extract rather than truncate.
>   (aarch64_movdi_high): Likewise.
> 
> 2015-11-27  James Greenhalgh  
> 
>   * gcc.dg/torture/pr67609.c: New.
> 

> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index e0a050c..59d3da4 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -269,6 +269,9 @@ int aarch64_get_condition_code (rtx);
>  bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
>  int aarch64_branch_cost (bool, bool);
>  enum aarch64_symbol_type aarch64_classify_symbolic_expression (rtx);
> +bool aarch64_cannot_change_mode_class (machine_mode,
> +machine_mode,
> +enum reg_class);
>  bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
>  bool aarch64_constant_address_p (rtx);
>  bool aarch64_expand_movmem (rtx *);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 3fe2f0f..fadb716 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -12408,6 +12408,24 @@ aarch64_vectorize_vec_perm_const_ok (machine_mode 
> vmode,
>return ret;
>  }
>  
> +/* Implement target hook CANNOT_CHANGE_MODE_CLASS.  */
> +bool
> +aarch64_cannot_change_mode_class (machine_mode from,
> +   machine_mode to,
> +   enum reg_class rclass)
> +{
> +  /* We cannot allow word_mode subregs of full vector modes.
> + Otherwise the middle-end will assume it's ok to store to
> + (subreg:DI (reg:TI 100) 0) in order to modify only the low 64 bits
> + of the 128-bit register.  However, after reload the subreg will
> + be dropped leaving a plain DImode store.  See PR67609 for a more
> + detailed dicussion.  In all other cases, we want to be premissive
> + and return false.  */
> +  return (reg_classes_intersect_p (FP_REGS, rclass)
> +   && GET_MODE_SIZE (to) == UNITS_PER_WORD
> +   && GET_MODE_SIZE (from) > UNITS_PER_WORD);
> +}
> +
>  rtx
>  aarch64_reverse_mask (enum machine_mode mode)
>  {
> diff --git 

Re: [hsa 4/10] Merge of HSA branch

2015-12-09 Thread Jakub Jelinek
On Mon, Dec 07, 2015 at 12:21:22PM +0100, Martin Jambor wrote:
> Subject: Make copy_gimple_seq_and_replace_locals copy seqs in omp clauses
> 
> Hi,
> 
> this is https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00477.html with
> the early return requested by Jakub.  Please refer to that previous
> email for explanation why it is necessary.
> 
> Thanks,
> 
> 2015-12-03  Martin Jambor  
> 
>   * tree-inline.c (duplicate_remap_omp_clause_seq): New function.
>   (replace_locals_op): Duplicate gimple sequences in OMP clauses.

Ok, thanks.

Jakub


[PATCH] Fix PR68806

2015-12-09 Thread Richard Biener

This fixes vectorization re-start without SLP when SLP analysis
removed an SLP instance - I was relying on the reduction chains
still being in the SLP instances.  The following properly
detects this case and also handles SLP reductions properly
(they can be vectorized non-SLP as well).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2015-12-09  Richard Biener  

PR tree-optimization/68806
* tree-vect-loop.c (vect_analyze_loop_2): Properly detect
reduction chains and ignore SLP reductions.

* gcc.dg/torture/pr68806.c: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 231451)
+++ gcc/tree-vect-loop.c(working copy)
@@ -2123,9 +2123,12 @@ again:
   if (!slp)
 return false;
 
+  /* If there are reduction chains re-trying will fail anyway.  */
+  if (! LOOP_VINFO_REDUCTION_CHAINS (loop_vinfo).is_empty ())
+return false;
+
   /* Likewise if the grouped loads or stores in the SLP cannot be handled
- via interleaving or lane instructions or if there were any SLP
- reductions.  */
+ via interleaving or lane instructions.  */
   slp_instance instance;
   slp_tree node;
   unsigned i, j;
@@ -2135,7 +2138,7 @@ again:
   vinfo = vinfo_for_stmt
  (SLP_TREE_SCALAR_STMTS (SLP_INSTANCE_TREE (instance))[0]);
   if (! STMT_VINFO_GROUPED_ACCESS (vinfo))
-   return false;
+   continue;
   vinfo = vinfo_for_stmt (STMT_VINFO_GROUP_FIRST_ELEMENT (vinfo));
   unsigned int size = STMT_VINFO_GROUP_SIZE (vinfo);
   tree vectype = STMT_VINFO_VECTYPE (vinfo);
Index: gcc/testsuite/gcc.dg/torture/pr68806.c
===
--- gcc/testsuite/gcc.dg/torture/pr68806.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr68806.c  (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+
+int sad(const unsigned char *p1, long p2)
+{
+  int a = 0;
+  for (int y = 0; y < 16; y++)
+{
+  for (int x = 0; x < 12; x++)
+   a += p1[x];
+  p1 += p2;
+}
+  return a;
+}


Re: [PATCH] Do not sanitize left shifts for -fwrapv (PR68418)

2015-12-09 Thread Paolo Bonzini


On 04/12/2015 23:48, Jeff Law wrote:
>>
>> Why would pointer types be shifted at all (at the ubsan level,
>> which is basically the AST)?
> BTW, if you argument is that we can never get into this code with a
> shift of a pointer object, I'd like to see some kind of analysis to back
> up that assertion -- which could be as simple as pointing to FE code
> that issues an error if the user tried to do something like shift a
> pointer object.

You're right, I should have qualified that better.  And actually there
is an issue with this patch, though it is not pointers.

There are only two call sites for ubsan_instrument_shift.
In c/c-typeck.c:

  /* Remember whether we're doing << or >>.  */
  bool doing_shift = false;

  /* The expression codes of the data types of the arguments tell us
 whether the arguments are integers, floating, pointers, etc.  */
  code0 = TREE_CODE (type0);
  code1 = TREE_CODE (type1);

  switch (code)
{
...
case RSHIFT_EXPR:
  ...
  else if ((code0 == INTEGER_TYPE || code0 == FIXED_POINT_TYPE)
   && code1 == INTEGER_TYPE)
{
  doing_shift = true;
  ...
}
  ...
case LSHIFT_EXPR:
  ...
  else if ((code0 == INTEGER_TYPE || code0 == FIXED_POINT_TYPE)
   && code1 == INTEGER_TYPE)
{
  doing_shift = true;
  ...
}
  ...
}
  ...
  if ((flag_sanitize & (SANITIZE_SHIFT | SANITIZE_DIVIDE
| SANITIZE_FLOAT_DIVIDE))
  && do_ubsan_in_current_function ()
  && (doing_div_or_mod || doing_shift)
  && !require_constant_value)
{
  /* OP0 and/or OP1 might have side-effects.  */
  op0 = c_save_expr (op0);
  op1 = c_save_expr (op1);
  op0 = c_fully_fold (op0, false, NULL);
  op1 = c_fully_fold (op1, false, NULL);
  ...
  else if (doing_shift && (flag_sanitize & SANITIZE_SHIFT))
instrument_expr = ubsan_instrument_shift (location, code, op0, op1);
}

cp/typeck.c is the same but it doesn't handle code0 == FIXED_POINT_TYPE.

But FIXED_POINT_TYPE is not an integral type, and thus it would fail the
TYPE_OVERFLOW_WRAPS check with my patch.  I'll post an updated patch that
also removes all instrumentation in the case of fixed point types, similar
to instrument_si_overflow.

Thanks for the careful review!

Paolo


Re: [PATCH] Make basic asm implicitly clobber memory, pr24414

2015-12-09 Thread Bernd Schmidt

On 12/09/2015 03:18 AM, Bernd Edlinger wrote:

Furthermore there is a documented use for asm(""): The empty assembler string 
is used to make a function
volatile, thus calls can not be optimized away.  But I think it is not 
necessary to make this clobber anything,
nor should it be an instruction scheduling barrier, as it used to be in the 
past.


Making that change seems risky; best not to make assumptions about how 
these are used. In any case,



... or should I wait for the next stage1?


I think that would be better.


Bernd


Fix PR ada/66526

2015-12-09 Thread Eric Botcazou
This is about the 3 -Wuninitialized warnings for the build of the Ada library:

g-expect.adb: In function 'GNAT.EXPECT.SET_UP_CHILD_COMMUNICATIONS':
g-expect.adb:1356:7: warning: 'INPUT' is used uninitialized in this function 
[-Wuninitialized]
g-expect.adb:1357:7: warning: 'OUTPUT' is used uninitialized in this function 
[-Wuninitialized]
g-expect.adb:1358:7: warning: 'ERROR' is used uninitialized in this function 
[-Wuninitialized]

They are actually false positives, but the compiler will never be able to 
understand that because the uses are past a fork-like function call on Unix 
systems.

Tested on x86-64/Linux, applied on the mainline.


2015-12-09  Eric Botcazou  

PR ada/66526
* g-expect.adb (Set_Up_Child_Communications): Add matching condition
for uses of Input, Ouput and Error variables after the Execvp call.

-- 
Eric BotcazouIndex: g-expect.adb
===
--- g-expect.adb	(revision 231440)
+++ g-expect.adb	(working copy)
@@ -1348,17 +1348,22 @@ package body GNAT.Expect is
 
   Portable_Execvp (Pid.Pid'Access, Cmd & ASCII.NUL, Args);
 
-  --  The following commands are not executed on Unix systems, and are only
-  --  required for Windows systems. We are now in the parent process.
+  --  The following lines are only required for Windows systems and will
+  --  not be executed on Unix systems, but we use the same condition as
+  --  above to avoid warnings on uninitialized variables on Unix systems.
+  --  We are now in the parent process.
 
-  --  Restore the old descriptors
+  if No_Fork_On_Target then
 
-  Dup2 (Input,  GNAT.OS_Lib.Standin);
-  Dup2 (Output, GNAT.OS_Lib.Standout);
-  Dup2 (Error,  GNAT.OS_Lib.Standerr);
-  Close (Input);
-  Close (Output);
-  Close (Error);
+ --  Restore the old descriptors
+
+ Dup2 (Input,  GNAT.OS_Lib.Standin);
+ Dup2 (Output, GNAT.OS_Lib.Standout);
+ Dup2 (Error,  GNAT.OS_Lib.Standerr);
+ Close (Input);
+ Close (Output);
+ Close (Error);
+  end if;
end Set_Up_Child_Communications;
 
---


[PATCH] Fix memory leaks in tree-vect-data-refs.c

2015-12-09 Thread Martin Liška
Hi.

This is simple follow-up of the previous patch that fixes last remaining
leak in vectorizer.

Patch can regbootstrap on x64_64-linux-gnu.

Ready for trunk?
Martin
>From 0d61420eb49dec0f5d14108373a546a8f1b52571 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 9 Dec 2015 10:14:00 +0100
Subject: [PATCH] Fix memory leaks in tree-vect-data-refs.c

gcc/ChangeLog:

2015-12-09  Martin Liska  

	* tree-vect-data-refs.c: Free an overwritten dataref.
---
 gcc/tree-vect-data-refs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 8810af1..4c566c8 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -3847,6 +3847,7 @@ again:
 	  return false;
 	}
 
+	  free_data_ref (datarefs[i]);
 	  datarefs[i] = dr;
 	  STMT_VINFO_GATHER_SCATTER_P (stmt_info) = gatherscatter;
 	}
-- 
2.6.3



[PATCH] GCC-5 backport of PR lto/65948

2015-12-09 Thread Martin Liška
Hi.

I would like to backport forgotten patch to GCC-5 branch.
Bootstrap and regression tests have been running, ready after it finishes?

Thanks,
Martin
>From 626df2a92db87f7c0668c7f2902f314ed6a9bc4c Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 9 Dec 2015 12:49:22 +0100
Subject: [PATCH] Backport PR lto/65948

---
 gcc/ChangeLog| 9 +
 gcc/ipa-devirt.c | 1 +
 2 files changed, 10 insertions(+)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index ce6f865..067f261 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2015-12-09  Martin Liska  
+
+	Backport from mainline
+	2015-04-30  Jan Hubicka  
+
+	PR lto/65948
+	* ipa-devirt.c (odr_types_equivalent_p): NULLPTR_TYPE is equivalent
+	to itself.
+
 2015-12-07  Jakub Jelinek  
 
 	Backport from mainline
diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
index e319785..c165bce 100644
--- a/gcc/ipa-devirt.c
+++ b/gcc/ipa-devirt.c
@@ -1536,6 +1536,7 @@ odr_types_equivalent_p (tree t1, tree t2, bool warn, bool *warned,
 	break;
   }
 case VOID_TYPE:
+case NULLPTR_TYPE:
   break;
 
 default:
-- 
2.6.3



[PATCH] Fix PR68583 some more

2015-12-09 Thread Richard Biener

This time with a simplified testcase - the remaining issue with
the original one is if-conversion looking at DR_REF and nothing
hoisting the address compute of a[i+1] in both arms making it
see they are the same (yeah, SCEV to the rescue, but not in this
patch).

This patch re-organizes things so that we apply if conversion
of stores when it is safe (and not only when -ftree-loop-if-convert-stores
is given).  And when it is not safe (aka store-data-races) we rely
on -ftree-loop-if-convert-stores - which I'm going to change in a followup
patch to use --param allow-store-data-races like we do for LIM
(and incidentially enable for -Ofast).

The patch changes the flow throughout if-conversion to remember
if we if-converted a store (because that needs to trigger some
different code-paths) and re-uses any_mask_load_store for that
which also means we apply versioning if we if-convert a store.
IMHO a good thing to avoid spurious store-data-races (if requested)
in non-vectorized code.  if-conversion of stores in scalar
code should remain to RTL if-conversion.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

I've verified that SPEC CPU 2006 still builds and runs properly
on a AVX2 machine with flag_tree_loop_if_convert_stores changed
to PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES) and -Ofast -march=native.

AFAICS if-conversion still has some unguarded store data races
with respect to the handled-components it strips from the
accesses before determining them as "same".  So I suppose
that strictly speaking we need to split the hash map into
two when we don't allow store data races (one for reads
doing the stripping and one for stores not doing it).  We could
also simply keep a flag "all DR_REFs same".

That said, I'd like to get rid of the separate if-conversion flag
for stores.

Richard.

2015-12-09  Richard Biener  

PR tree-optimization/68583
* tree-if-conv.c (if_convertible_phi_p): Drop
flag_tree_loop_if_convert_stores check in favor of the
existing any_mask_load_store check.
(insert_gimplified_predicates): Likewise.
(combine_blocks): Likewise.
(tree_if_conversion): Likewise.
(ifcvt_memrefs_wont_trap): Properly check
flag_tree_loop_if_convert_stores in all places that can end
up introducing store-data-races.
(if_convertible_gimple_assign_stmt_p): Remove restriction
on flag_tree_loop_if_convert_stores for stores we can if-convert
without introducing store-data-races.  Force versioning for
all if-converted stores.

* gcc.dg/tree-ssa/ifc-pr68583.c: New testcase.
* gcc.dg/vect/vect-72.c: Adjust.
* gcc.dg/vect/vect-cselim-2.c: Likewise.
* gcc.dg/vect/vect-strided-store-a-u8-i2.c: Likewise.

Index: gcc/tree-if-conv.c
===
*** gcc/tree-if-conv.c  (revision 231444)
--- gcc/tree-if-conv.c  (working copy)
*** bb_with_exit_edge_p (struct loop *loop,
*** 517,523 
 PHI is not if-convertible if:
 - it has more than 2 arguments.
  
!When the flag_tree_loop_if_convert_stores is not set, PHI is not
 if-convertible if:
 - a virtual PHI is immediately used in another PHI node,
 - there is a virtual PHI in a BB other than the loop->header.
--- 517,523 
 PHI is not if-convertible if:
 - it has more than 2 arguments.
  
!When we didn't see if-convertible stores, PHI is not
 if-convertible if:
 - a virtual PHI is immediately used in another PHI node,
 - there is a virtual PHI in a BB other than the loop->header.
*** if_convertible_phi_p (struct loop *loop,
*** 545,554 
  }
  }
  
!   if (flag_tree_loop_if_convert_stores || any_mask_load_store)
  return true;
  
!   /* When the flag_tree_loop_if_convert_stores is not set, check
   that there are no memory writes in the branches of the loop to be
   if-converted.  */
if (virtual_operand_p (gimple_phi_result (phi)))
--- 545,554 
  }
  }
  
!   if (any_mask_load_store)
  return true;
  
!   /* When there were no if-convertible stores, check
   that there are no memory writes in the branches of the loop to be
   if-converted.  */
if (virtual_operand_p (gimple_phi_result (phi)))
*** ifcvt_memrefs_wont_trap (gimple *stmt, v
*** 713,728 
   to unconditionally.  */
if (base_master_dr
  && DR_BASE_W_UNCONDITIONALLY (*base_master_dr))
!   return true;
else
{
  /* or the base is know to be not readonly.  */
  tree base_tree = get_base_address (DR_REF (a));
  if (DECL_P (base_tree)
  && decl_binds_to_current_def_p (base_tree)
! && flag_tree_loop_if_convert_stores
! && !TREE_READONLY (base_tree))
!   return true;
}
  }
return false;
--- 713,727 
   to unconditionally. 

Re: [Patch AArch64] Reinstate CANNOT_CHANGE_MODE_CLASS to fix pr67609

2015-12-09 Thread Marcus Shawcroft
On 27 November 2015 at 13:01, James Greenhalgh  wrote:

> 2015-11-27  James Greenhalgh  
>
> * config/aarch64/aarch64-protos.h
> (aarch64_cannot_change_mode_class): Bring back.
> * config/aarch64/aarch64.c
> (aarch64_cannot_change_mode_class): Likewise.
> * config/aarch64/aarch64.h (CANNOT_CHANGE_MODE_CLASS): Likewise.
> * config/aarch64/aarch64.md (aarch64_movdi_low): Use
> zero_extract rather than truncate.
> (aarch64_movdi_high): Likewise.
>
> 2015-11-27  James Greenhalgh  
>
> * gcc.dg/torture/pr67609.c: New.
>

+ detailed dicussion.  In all other cases, we want to be premissive

s/premissive/permissive/

OK /Marcus


Re: [hsa 5/10] OpenMP lowering/expansion changes (gridification)

2015-12-09 Thread Jakub Jelinek
On Mon, Dec 07, 2015 at 12:22:43PM +0100, Martin Jambor wrote:
> it creates a copy of the entire target body and expands it slightly
> differently for concurrent execution on a GPU.  Note that both teams
> and distribute constructs are mandatory.  Moreover, currently the
> distribute has to be in a combined statement with the inner for
> construct.  And there are quite a few other restrictions which I hope

The standard calls those composite constructs, and I bet for gridification
you want that restriction always, without composite distribute parallel for
there are two different unrelated loops.

>   * builtin-types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
>   (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
>   (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.
>   * fortran/types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
>   (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
>   (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.

Fortran has its own ChangeLog file.

> @@ -556,9 +558,9 @@ DEF_FUNCTION_TYPE_9 
> (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
>BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
>BT_BOOL, BT_UINT, BT_PTR, BT_INT)
>  
> -DEF_FUNCTION_TYPE_10 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT,
> -   BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
> -   BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_INT, BT_INT)
> +DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
> +  BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
> +  BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)

There shouldn't be an empty line in between this DEF_FUNCTION_TYPE_9 and the
previous one.

> @@ -221,9 +223,9 @@ DEF_FUNCTION_TYPE_9 
> (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
>BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
>BT_BOOL, BT_UINT, BT_PTR, BT_INT)
>  
> -DEF_FUNCTION_TYPE_10 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT,
> +DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
> BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
> -   BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_INT, BT_INT)
> +   BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)
>  
>  DEF_FUNCTION_TYPE_11 
> (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_UINT_LONG_INT_LONG_LONG_LONG,
> BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,

Ditto.

> --- a/gcc/gimple.def
> +++ b/gcc/gimple.def
> @@ -369,13 +369,17 @@ DEFGSCODE(GIMPLE_OMP_TARGET, "gimple_omp_target", 
> GSS_OMP_PARALLEL_LAYOUT)
>  /* GIMPLE_OMP_TEAMS  represents #pragma omp teams
> BODY is the sequence of statements inside the single section.
> CLAUSES is an OMP_CLAUSE chain holding the associated clauses.  */
> -DEFGSCODE(GIMPLE_OMP_TEAMS, "gimple_omp_teams", GSS_OMP_SINGLE_LAYOUT)
> +DEFGSCODE(GIMPLE_OMP_TEAMS, "gimple_omp_teams", GSS_OMP_TEAMS_LAYOUT)

Why?

> +/* GIMPLE_OMP_GPUKERNEL  represents a parallel loop lowered for 
> execution
> +   on a GPU.  It is an artificial statement created by omp lowering.  */
> +DEFGSCODE(GIMPLE_OMP_GPUKERNEL, "gimple_omp_gpukernel", GSS_OMP)

Why do you call it GPUKERNEL or KERNEL_BODY when you really mean gridified
body and gridified loop?  I mean, what is GPU specific about it?  PTX is
unlikely going to use that.  And kernel is a wide term.

> @@ -622,8 +623,14 @@ struct GTY((tag("GSS_OMP_FOR")))
>/* [ WORD 11 ]
>   Pre-body evaluated before the loop body begins.  */
>gimple_seq pre_body;
> +
> +  /* [ WORD 12 ]
> + If set, this statement is part of a gridified kernel, its clauses need 
> to
> + be scanned and lowered but the statement should be discarded after
> + lowering.  */
> +  bool kernel_phony;

Ugh no, flags should go into GF_OMP_*.

> @@ -643,6 +660,12 @@ struct GTY((tag("GSS_OMP_PARALLEL_LAYOUT")))
>/* [ WORD 10 ]
>   Shared data argument.  */
>tree data_arg;
> +
> +  /* [ WORD 11 ] */
> +  /* If set, this statement is part of a gridified kernel, its clauses need 
> to
> + be scanned and lowered but the statement should be discarded after
> + lowering.  */
> +  bool kernel_phony;
>  };

Likewise.

As for omp-low.c changes, the file is already large enough that it would be
nice if it is easy to find out what routines are for gridification purposes
only, use some special prefix (grid_*, ompgrid_*, ...) for all such
functions?

> @@ -1761,6 +1786,8 @@ fixup_child_record_type (omp_context *ctx)
>  {
>tree f, type = ctx->record_type;
>  
> +  if (!ctx->receiver_decl)
> +return;

So when is receiver_decl NULL?

> @@ -2113,6 +2140,14 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
>   }
> break;
>  
> + case OMP_CLAUSE__GRIDDIM_:
> +   if (ctx->outer)
> + {
> +   scan_omp_op 

RE: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion.

2015-12-09 Thread Ajit Kumar Agarwal


-Original Message-
From: Richard Biener [mailto:richard.guent...@gmail.com] 
Sent: Wednesday, December 09, 2015 4:06 PM
To: Ajit Kumar Agarwal
Cc: Jeff Law; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,rtl Optimization]: Better register pressure estimate for 
Loop Invariant Code Motion.

On Tue, Dec 8, 2015 at 11:24 PM, Ajit Kumar Agarwal 
 wrote:
> Based on the comments on RFC patch this patch incorporates all the comments 
> from Jeff. Thanks Jeff for the valuable feedback.
>
> This patch enables the better register pressure estimate for Loop 
> Invariant code motion. This patch Calculate the Loop Liveness used for 
> regs_used  used to calculate the register pressure  used in the cost 
> estimation.
>
> Loop Liveness is based on the following properties.  we only require 
> to calculate the set of objects that are live at the birth or the 
> header of the loop.  We don't need to calculate the live through the Loop 
> considering Live in and Live out of all the basic blocks of the Loop. This is 
> because the set of objects. That are live-in at the birth or header of the 
> loop will be live-in at every node in the Loop.
>
> If a v live out at the header of the loop then the variable is live-in 
> at every node in the Loop. To prove this, Consider a Loop L with header h 
> such that The variable v defined at d is live-in at h. Since v is live at h, 
> d is not part of L. This follows from the dominance property, i.e. h is 
> strictly dominated by d.
> Furthermore, there exists a path from h to a use of v which does not 
> go through d. For every node of the loop, p, since the loop is strongly 
> connected Component of the CFG, there exists a path, consisting only of nodes 
> of L from p to h. Concatenating those two paths prove that v is live-in and 
> live-out Of p.
>
> Also Calculate the Live-Out and Live-In for the exit edge of the loop. This 
> considers liveness for not only the Loop latch but also the liveness outside 
> the Loops.
>
> Bootstrapped and Regtested for x86_64 and microblaze target.
>
> There is an extra failure for the testcase gcc.dg/loop-invariant.c  with the 
> change that looks correct to me.
>
> This is because the available_regs = 6 and the regs_needed = 1 and 
> new_regs = 0 and the regs_used = 10.  As the reg_used that are based 
> on the Liveness given above is greater than the available_regs, then it's 
> candidate of spill and the estimate_register_pressure calculates the spill 
> cost. This spill cost is greater than inv_cost and gain comes to be negative. 
> The disables the loop invariant for the above testcase.
>
> Disabling of the Loop invariant for the testcase loop-invariant.c with 
> this patch  looks correct to me considering the calculation of available_regs 
> in cfgloopanal.c is correct.

>>You keep a lot of data (bitmaps) live just to use the number of bits set in 
>>the end.

>>I'll note that this also increases the complexity of the pass which is 
>>enabled at -O1+ where
>>-O1 should limit itself to O(n*log n) algorithms but live is quadratic.

>.So I don't think doing this unconditionally is a good idea (and we have 
>-fira-loop-pressure after all).

>>Please watch not making -O1 worse again after we spent so much time making it 
>>suitable for all the weird autogenerated code.

I can also implement without keeping the bitmaps live data  and would not 
require to set the bits in the end.  I am interested in the
Liveness in the loop header and the loop exit which I can calculate where I am 
setting the regs_used for the curr_loop.

This will not require to set and keep the bitmaps for liveness and also make it 
available for the curr_loop that are candidates of loop
Invariant. 

Thanks & Regards
Ajit

Richard.

>  ChangeLog:
>  2015-12-09  Ajit Agarwal  
>
> * loop-invariant.c
> (calculate_loop_liveness): New.
> (move_loop_invariants): Use of calculate_loop_liveness.
>
> Signed-off-by:Ajit Agarwal ajit...@xilinx.com
>
> ---
>  gcc/loop-invariant.c |   77 
> ++
>  1 files changed, 65 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index 
> 53d1377..ac08594 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1464,18 +1464,7 @@ find_invariants_to_move (bool speed, bool call_p)
> registers used; we put some initial bound here to stand for
> induction variables etc.  that we do not detect.  */
>  {
> -  unsigned int n_regs = DF_REG_SIZE (df);
> -
> -  regs_used = 2;
> -
> -  for (i = 0; i < n_regs; i++)
> -   {
> - if (!DF_REGNO_FIRST_DEF (i) && DF_REGNO_LAST_USE (i))
> -   {
> - /* This is a value that is used but not changed inside loop.  */
> - regs_used++;
> -   }
> -   }
> +  regs_used = bitmap_count_bits 

Re: [PATCH] Fix PR68583

2015-12-09 Thread Richard Biener
On Wed, 9 Dec 2015, Richard Biener wrote:

> 
> Well, not really in the end - the actual testcase would need code
> hoisting to make the refs equal enough for ifcvt.  Making the testcase
> simpler doesn't seem to capture the issue so for now without one
> (I'll keep trying).
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.

This fixed PR68417, I committed the testcase below.

Richard.

2015-12-09  Richard Biener  

PR tree-optimization/68417
* gcc.dg/vect/pr68417.c: New testcase.

Index: gcc/testsuite/gcc.dg/vect/pr68417.c
===
--- gcc/testsuite/gcc.dg/vect/pr68417.c (revision 0)
+++ gcc/testsuite/gcc.dg/vect/pr68417.c (working copy)
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_condition } */
+
+#define N 16
+
+typedef struct {
+  int x;
+  int y;
+} Point;
+
+void
+foo(Point *p1, Point *p2, Point *p3, int *data)
+{
+  int m, m1, m2;
+  int i;
+
+  for (i = 0; i < N; i++) {
+m = *data++;
+
+m1 = p1->x - m;
+m2 = p2->x + m;
+
+p3->y = (m1 >= m2) ? p1->y : p2->y;
+
+p1++;
+p2++;
+p3++;
+  }
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */


Re: [PATCH] GCC-5 backport of PR lto/65948

2015-12-09 Thread Bernd Schmidt

On 12/09/2015 12:50 PM, Martin Liška wrote:

I would like to backport forgotten patch to GCC-5 branch.
Bootstrap and regression tests have been running, ready after it finishes?


Ok.


Bernd



Re: [hsa 6/10] Pass manager changes

2015-12-09 Thread Jakub Jelinek
On Mon, Dec 07, 2015 at 12:24:05PM +0100, Martin Jambor wrote:
> the pass manager changes required for HSA have already been committed
> to trunk so all that remains are these additions to the pass pipeline.
> 
> Thanks,
> 
> Martin
> 
> 
> 2015-12-04  Martin Jambor  
>   Martin Liska  
> 
>   * passes.def: Schedule pass_ipa_hsa and pass_gen_hsail.
>   * tree-pass.h (make_pass_gen_hsail): Declare.
>   (make_pass_ipa_hsa): Likewise.

I'll defer this to richi.

Jakub


Re: [hsa 2/10] Modifications to libgomp proper

2015-12-09 Thread Jakub Jelinek
On Mon, Dec 07, 2015 at 12:19:57PM +0100, Martin Jambor wrote:
> +/* Flag set when the subsequent element in the device-specific argument
> +   values.  */
> +#define GOMP_TARGET_ARG_SUBSEQUENT_PARAM (1 << 7)
> +
> +/* Bitmask to apply to a target argument to find out the value identifier.  
> */
> +#define GOMP_TARGET_ARG_ID_MASK  (((1 << 8) - 1) << 8)
> +/* Target argument index of NUM_TEAMS.  */
> +#define GOMP_TARGET_ARG_NUM_TEAMS(1 << 8)
> +/* Target argument index of THREAD_LIMIT.  */
> +#define GOMP_TARGET_ARG_THREAD_LIMIT (2 << 8)

I meant that these two would be just special, passed as the first two
pointers in the array, without the markup.  Because, otherwise you either
need to use GOMP_TARGET_ARG_SUBSEQUENT_PARAM for these always, or for 32-bit
arches and for 64-bit ones shift often at runtime.  Having the markup even
for them is perhaps cleaner, but less efficient, so if you really want to go
that way, please make sure you handle it properly for 32-bit pointers
architectures though.  num_teams or thread_limit could be > 32767 or >
65535.

> -static void
> -gomp_target_fallback_firstprivate (void (*fn) (void *), size_t mapnum,
> -void **hostaddrs, size_t *sizes,
> -unsigned short *kinds)
> +static void *
> +gomp_target_unshare_firstprivate (size_t mapnum, void **hostaddrs,
> +   size_t *sizes, unsigned short *kinds)
>  {
>size_t i, tgt_align = 0, tgt_size = 0;
>char *tgt = NULL;
> @@ -1281,7 +1282,7 @@ gomp_target_fallback_firstprivate (void (*fn) (void *), 
> size_t mapnum,
>}
>if (tgt_align)
>  {
> -  tgt = gomp_alloca (tgt_size + tgt_align - 1);
> +  tgt = gomp_malloc (tgt_size + tgt_align - 1);

I don't like using gomp_malloc here, either copy/paste the function, or
create separate inline functions for the two loops, one for the first loop
which returns you tgt_align and tgt_size, and another for the stuff after
the allocation.  Then you can use those two inline functions to implement
both gomp_target_fallback_firstprivate which will use alloca, and
gomp_target_unshare_firstprivate which will use gomp_malloc instead.

> @@ -1356,6 +1377,11 @@ GOMP_target (int device, void (*fn) (void *), const 
> void *unused,
> and several arguments have been added:
> FLAGS is a bitmask, see GOMP_TARGET_FLAG_* in gomp-constants.h.
> DEPEND is array of dependencies, see GOMP_task for details.
> +   ARGS is a pointer to an array consisting of NUM_TEAMS, THREAD_LIMIT and a
> +   variable number of device-specific arguments, which always take two 
> elements
> +   where the first specifies the type and the second the actual value.  The
> +   last element of the array is a single NULL.

Note, here you document NUM_TEAMS and THREAD_LIMIT as special values, not
encoded.

> @@ -1473,6 +1508,7 @@ GOMP_target_data (int device, const void *unused, 
> size_t mapnum,
>struct gomp_device_descr *devicep = resolve_device (device);
>  
>if (devicep == NULL
> +  || (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
>|| !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))

Would be nice to have some consistency in the order of capabilities checks.
Usually you check SHARED_MEM after OPENMP_400, so perhaps do it this way
here too.

> @@ -1741,23 +1784,38 @@ gomp_target_task_fn (void *data)
>  
>if (ttask->state == GOMP_TARGET_TASK_FINISHED)
>   {
> -   gomp_unmap_vars (ttask->tgt, true);
> +   if (ttask->tgt)
> + gomp_unmap_vars (ttask->tgt, true);
> return false;
>   }

This doesn't make sense.  For the GOMP_OFFLOAD_CAP_SHARED_MEM case, unless
you want to run the free (ttask->firstprivate_copies); as a task, you
shouldn't be queing the target task again for further execution, instead
it should just be handled like a finished task at that point.  The reason
why for XeonPhi or PTX gomp_target_task_fn is run with
GOMP_TARGET_TASK_FINISHED state is that gomp_unmap_vars can perform IO
actions and doing it with the tasking lock held is highly undesirable.
>  
> -  void *fn_addr = gomp_get_target_fn_addr (devicep, ttask->fn);
> -  ttask->tgt
> - = gomp_map_vars (devicep, ttask->mapnum, ttask->hostaddrs, NULL,
> -  ttask->sizes, ttask->kinds, true,
> -  GOMP_MAP_VARS_TARGET);
> +  bool shared_mem;
> +  if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
> + {
> +   shared_mem = true;
> +   ttask->tgt = NULL;
> +   ttask->firstprivate_copies
> + = gomp_target_unshare_firstprivate (ttask->mapnum, ttask->hostaddrs,
> + ttask->sizes, ttask->kinds);
> + }
> +  else
> + {
> +   shared_mem = false;
> +   ttask->tgt = gomp_map_vars (devicep, ttask->mapnum, ttask->hostaddrs,
> +   NULL, ttask->sizes, ttask->kinds, 

Re: [hsa 3/10] HSA libgomp plugin

2015-12-09 Thread Jakub Jelinek
On Mon, Dec 07, 2015 at 12:20:49PM +0100, Martin Jambor wrote:
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "libgomp-plugin.h"
> +#include "gomp-constants.h"
> +#include "hsa.h"
> +#include "hsa_ext_finalize.h"

If these 2 headers are coming from outside of gcc, better use <> for them
instead of "", and better place them above the libgomp specific ones.

> +#include "dlfcn.h"

Ditto.

> +/* Flag to decide whether print to stderr information about what is going on.
> +   Set in init_debug depending on environment variables.  */
> +
> +static bool debug;
> +
> +/* Flag to decide if the runtime should suppress a possible fallback to host
> +   execution.  */
> +
> +static bool suppress_host_fallback;
> +
> +/* Initialize debug and suppress_host_fallback according to the environment. 
>  */
> +
> +static void
> +init_enviroment_variables (void)
> +{
> +  if (getenv ("HSA_DEBUG"))
> +debug = true;
> +  else
> +debug = false;
> +
> +  if (getenv ("HSA_SUPPRESS_HOST_FALLBACK"))
> +suppress_host_fallback = true;
> +  else
> +suppress_host_fallback = false;

Wonder whether we want these custom env vars in suid programs, allowing
users to change behavior might be undesirable.  So perhaps use secure_getenv
instead of getenv if found by configure?
> +
> +  const char* hsa_error;

Space before * instead of after it (multiple spots).

> +  hsa_status_string (status, _error);
> +
> +  unsigned l = strlen (hsa_error);
> +
> +  char *err = GOMP_PLUGIN_malloc (sizeof (char) * l);

sizeof (char) is 1, please don't multiply by it, that is just confusing.
It makes sense in macros where you don't know what the type really is, but
not here.

> +  memcpy (err, hsa_error, l - 1);
> +  err[l] = '\0';
> +
> +  fprintf (stderr, "HSA warning: %s (%s)\n", str, err);

Why do you allocate err at all, if you free it right away?  Can't you use
hsa_error directly instead?
> +
> +  free (err);

> +static void
> +hsa_fatal (const char *str, hsa_status_t status)
> +{
> +  const char* hsa_error;
> +  hsa_status_string (status, _error);
> +  GOMP_PLUGIN_fatal ("HSA fatal error: %s (%s)", str, hsa_error);

Like you do e.g. here?

> +/* Callback of dispatch queues to report errors.  */
> +
> +static void
> +queue_callback(hsa_status_t status, hsa_queue_t* queue __attribute__ 
> ((unused)),

Missing space before (.
> +/* Callback of hsa_agent_iterate_regions.  Determine if a memory REGION can 
> be
> +   used for kernarg allocations and if so write it to the memory pointed to 
> by
> +   DATA and break the query.  */
> +
> +static hsa_status_t get_kernarg_memory_region (hsa_region_t region, void* 
> data)

Missing newline between return type and function name.

> +  hsa_region_t* ret = (hsa_region_t*) data;

Two spots with wrong formatting above.
> +{
> +  if (agent->first_module)
> +  agent->first_module->prev = module;

Wrong indentation.

> +  unsigned size = end - start;
> +  char *buf = (char *) malloc (size);
> +  memcpy (buf, start, size);

malloc could return NULL and you crash.  Should this use GOMP_PLUGIN_malloc
instead?

> +  struct GOMP_hsa_kernel_dispatch *shadow = GOMP_PLUGIN_malloc_cleared
> +(sizeof (struct GOMP_hsa_kernel_dispatch));

Formatting, = should go on the next line, and if even that doesn't help,
maybe use sizeof (*shadow)?
> +
> +  shadow->queue = agent->command_q;
> +  shadow->omp_data_memory = omp_data_size > 0
> +? GOMP_PLUGIN_malloc (omp_data_size) : NULL;

= should go on the next line.

> +  unsigned dispatch_count = kernel->dependencies_count;
> +  shadow->kernel_dispatch_count = dispatch_count;
> +
> +  shadow->children_dispatches = GOMP_PLUGIN_malloc
> +(dispatch_count * sizeof (struct GOMP_hsa_kernel_dispatch *));

Likewise.
> +indent_stream (FILE *f, unsigned indent)
> +{
> +  for (int i = 0; i < indent; i++)
> +fputc (' ', f);

Perhaps fprintf (f, "%*s", indent, "");
instead?

> +  struct GOMP_hsa_kernel_dispatch *shadow = create_single_kernel_dispatch
> +(kernel, omp_data_size);

Formatting issues again.

> +  shadow->omp_num_threads = 64;
> +  shadow->debug = 0;
> +  shadow->omp_level = kernel->gridified_kernel_p ? 1 : 0;
> +
> +  /* Create kernel dispatch data structures.  We do not allow to have
> + a kernel dispatch with depth bigger than one.  */
> +  for (unsigned i = 0; i < kernel->dependencies_count; i++)
> +{
> +  struct kernel_info *dependency = get_kernel_for_agent
> + (kernel->agent, kernel->dependencies[i]);
> +  shadow->children_dispatches[i] = create_single_kernel_dispatch
> + (dependency, omp_data_size);
> +  shadow->children_dispatches[i]->queue =

Never put = at the end of line.
> + kernel->agent->kernel_dispatch_command_q;

Jakub


Re: [PATCH] Fix memory leaks in tree-vect-data-refs.c

2015-12-09 Thread Richard Biener
On Wed, Dec 9, 2015 at 12:14 PM, Martin Liška  wrote:
> Hi.
>
> This is simple follow-up of the previous patch that fixes last remaining
> leak in vectorizer.
>
> Patch can regbootstrap on x64_64-linux-gnu.
>
> Ready for trunk?

Ok.

Richard.

> Martin


Re: [C++ Patch] PR 60218

2015-12-09 Thread Jason Merrill

OK


RE: [PATCH] [ARC] Add support for atomic memory built-in.

2015-12-09 Thread Claudiu Zissulescu
I will add this text before  "*memory_barrier" pattern:

;; For ARCHS, we use a hardware data memory barrier that waits for
;; completion of current data memory operations before initiating
;; similar data memory operations.

Once done, I will commit it.

Thanks,
Claudiu

>
> Tested with dg.exp (when passing -matomic to gcc compiler line, the atomic 
> tests are also successfully executed).
The comment before "*memory_barrier" could use some elaboration on what it does 
for TARGET_HS.
Otherwise, this is OK.


Re: [PATCH] Fix phiopt ICE in Factor conversion in COND_EXPR (PR tree-optimization/66949)

2015-12-09 Thread Richard Biener
On Wed, Dec 9, 2015 at 3:22 PM, Marek Polacek  wrote:
> On Wed, Dec 09, 2015 at 11:41:44AM +0100, Richard Biener wrote:
>> But I don't see why we have the asserts at all - they seem to be originated 
>> from
>> development.  IMHO factor_out_conditonal_conversion should simply return
>> the new PHI it generates instead of relying on
>> signle_non_signelton_phi_for_edges
>> to return it.
>
> Ok, that seems to work.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2015-12-09  Marek Polacek  
>
> PR tree-optimization/66949
> * tree-ssa-phiopt.c (tree_ssa_phiopt_worker): Don't call
> single_non_singleton_phi_for_edges to get the PHI from
> factor_out_conditional_conversion.  Use NULL_TREE instead of NULL.
> (factor_out_conditional_conversion): Adjust declaration.  Make it
> return the newly-created PHI.
>
> * gcc.dg/torture/pr66949-1.c: New test.
> * gcc.dg/torture/pr66949-2.c: New test.
>
> diff --git gcc/testsuite/gcc.dg/torture/pr66949-1.c 
> gcc/testsuite/gcc.dg/torture/pr66949-1.c
> index e69de29..1b765bc 100644
> --- gcc/testsuite/gcc.dg/torture/pr66949-1.c
> +++ gcc/testsuite/gcc.dg/torture/pr66949-1.c
> @@ -0,0 +1,28 @@
> +/* PR tree-optimization/66949 */
> +/* { dg-do compile } */
> +
> +int a, *b = , c;
> +
> +unsigned short
> +fn1 (unsigned short p1, unsigned int p2)
> +{
> +  return p2 > 1 || p1 >> p2 ? p1 : p1 << p2;
> +}
> +
> +void
> +fn2 ()
> +{
> +  int *d = 
> +  for (a = 0; a < -1; a = 1)
> +;
> +  if (a < 0)
> +c = 0;
> +  *b = fn1 (*d || c, *d);
> +}
> +
> +int
> +main ()
> +{
> +  fn2 ();
> +  return 0;
> +}
> diff --git gcc/testsuite/gcc.dg/torture/pr66949-2.c 
> gcc/testsuite/gcc.dg/torture/pr66949-2.c
> index e69de29..e6250a3 100644
> --- gcc/testsuite/gcc.dg/torture/pr66949-2.c
> +++ gcc/testsuite/gcc.dg/torture/pr66949-2.c
> @@ -0,0 +1,23 @@
> +/* PR tree-optimization/66949 */
> +/* { dg-do compile } */
> +
> +char a;
> +int b, c, d;
> +extern int fn2 (void);
> +
> +short
> +fn1 (short p1, short p2)
> +{
> +  return p2 == 0 ? p1 : p1 / p2;
> +}
> +
> +int
> +main (void)
> +{
> +  char e = 1;
> +  int f = 7;
> +  c = a >> f;
> +  b = fn1 (c, 0 < d <= e && fn2 ());
> +
> +  return 0;
> +}
> diff --git gcc/tree-ssa-phiopt.c gcc/tree-ssa-phiopt.c
> index 344cd2f..d7e8aa0 100644
> --- gcc/tree-ssa-phiopt.c
> +++ gcc/tree-ssa-phiopt.c
> @@ -49,7 +49,7 @@ along with GCC; see the file COPYING3.  If not see
>  static unsigned int tree_ssa_phiopt_worker (bool, bool);
>  static bool conditional_replacement (basic_block, basic_block,
>  edge, edge, gphi *, tree, tree);
> -static bool factor_out_conditional_conversion (edge, edge, gphi *, tree, 
> tree);
> +static gphi *factor_out_conditional_conversion (edge, edge, gphi *, tree, 
> tree);
>  static int value_replacement (basic_block, basic_block,
>   edge, edge, gimple *, tree, tree);
>  static bool minmax_replacement (basic_block, basic_block,
> @@ -310,19 +310,19 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
> do_hoist_loads)
>
>   /* Something is wrong if we cannot find the arguments in the PHI
>  node.  */
> - gcc_assert (arg0 != NULL && arg1 != NULL);
> + gcc_assert (arg0 != NULL_TREE && arg1 != NULL_TREE);
>
> - if (factor_out_conditional_conversion (e1, e2, phi, arg0, arg1))
> + gphi *newphi = factor_out_conditional_conversion (e1, e2, phi,
> +   arg0, arg1);
> + if (newphi != NULL)
> {
> + phi = newphi;
>   /* factor_out_conditional_conversion may create a new PHI in
>  BB2 and eliminate an existing PHI in BB2.  Recompute values
>  that may be affected by that change.  */
> - phis = phi_nodes (bb2);
> - phi = single_non_singleton_phi_for_edges (phis, e1, e2);
> - gcc_assert (phi);
>   arg0 = gimple_phi_arg_def (phi, e1->dest_idx);
>   arg1 = gimple_phi_arg_def (phi, e2->dest_idx);
> - gcc_assert (arg0 != NULL && arg1 != NULL);
> + gcc_assert (arg0 != NULL_TREE && arg1 != NULL_TREE);
> }
>
>   /* Do the replacement of conditional if it can be done.  */
> @@ -402,9 +402,9 @@ replace_phi_edge_with_variable (basic_block cond_block,
>
>  /* PR66726: Factor conversion out of COND_EXPR.  If the arguments of the PHI
> stmt are CONVERT_STMT, factor out the conversion and perform the 
> conversion
> -   to the result of PHI stmt.  */
> +   to the result of PHI stmt.  Return the newly-created PHI, if any.  */
>
> -static bool
> +static gphi *
>  factor_out_conditional_conversion (edge e0, edge e1, gphi *phi,
>tree arg0, tree arg1)
>  {
> @@ -421,7 +421,7 @@ factor_out_conditional_conversion (edge e0, edge e1, 

Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment

2015-12-09 Thread Matthew Wahab

On 08/12/15 09:25, Tobias Burnus wrote:

On Mon, Dec 07, 2015 at 02:09:22PM +, Matthew Wahab wrote:

I wonder whether using
__asm__ __volatile__ ("":::"memory");
would be sufficient as it has a way lower overhead than
__sync_synchronize().


I don't know anything about Fortran or coarrays and I'm curious
whether this affects architectures with weak memory models. Is the
barrier only needed to stop reordering by the compiler or is does it
also need to stop reordering by the hardware?


Short answer: I think no mfence is needed as either the communication
is local (to the thread/process) - in which case the hardware will act
correctly - or the communication is remote (different thread, process,
communication to different computer via interlink [ethernet, infiniband,
...]); and in the later case, the communication library has to deal with
it.


Thanks for explaining this, it made things clear. Based on your description, I agree 
that hardware reordering shouldn't be a problem.



and the (main) program code (slightly trimmed):

   static void * restrict caf_token.0;
   static integer(kind=4) * restrict var;
   void _caf_init.1 (void);

   *var = 4;

   desc.3.data = 42;
   _gfortran_caf_send (caf_token.0, 0B /* offset */ var,
   _gfortran_caf_this_image (0), , 0B, , 4, 
4, 0);
   __asm__ __volatile__("":::"memory");  // new
   tmp = *var;

The problem is that in that case the compiler does not know that
"_gfortran_caf_send (caf_token.0," can modify "*var".



Is the restrict attribute on var correct? From what you say, it sounds like *var 
could be accessed through other pointers (assuming restrict has the same meaning as 
in C).


Matthew


RE: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion.

2015-12-09 Thread Ajit Kumar Agarwal


-Original Message-
From: Bernd Schmidt [mailto:bschm...@redhat.com] 
Sent: Wednesday, December 09, 2015 7:34 PM
To: Ajit Kumar Agarwal; Richard Biener
Cc: Jeff Law; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,rtl Optimization]: Better register pressure estimate for 
Loop Invariant Code Motion.

On 12/09/2015 12:22 PM, Ajit Kumar Agarwal wrote:
>
> This is because the available_regs = 6 and the regs_needed = 1 and 
> new_regs = 0 and the regs_used = 10.  As the reg_used that are based 
> on the Liveness given above is greater than the available_regs, then
 > it's candidate of spill and the estimate_register_pressure calculates  > the 
 > spill cost. This spill cost is greater than inv_cost and gain  > comes to be 
 > negative. The disables the loop invariant for the above  > testcase.

>>As far as I can tell this loop does not lead to a spill. Hence, failure of 
>>this testcase would suggest there is something wrong with your idea.

As far as I can see there is nothing wrong with the idea, the existing 
implementation of the calculation of available_regs results to 6 which is not 
the case for this
testcase. The estimate of regs_used based on the Liveness ( idea behind this 
patch)  is correct, but the incorrect estimate of the available_regs (6) 
results in the gain to be negative. 

>> +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)  {

>>Formatting.

>> +/* Loop Liveness is based on the following proprties.

>>"properties"

I will incorporate the change.

>> +   we only require to calculate the set of objects that are live at
>> +   the birth or the header of the loop.
>> +   We don't need to calculate the live through the Loop considering
>> +   Live in and Live out of all the basic blocks of the Loop. This is
>> +   because the set of objects. That are live-in at the birth or header
>> +   of the loop will be live-in at every node in the Loop.
>> +   If a v live out at the header of the loop then the variable is 
>> live-in
>> +   at every node in the Loop. To prove this, Consider a Loop L with 
>> header
>> +   h such that The variable v defined at d is live-in at h. Since v is 
>> live
>> +   at h, d is not part of L. This follows from the dominance property, 
>> i.e.
>> +   h is strictly dominated by d. Furthermore, there exists a path from 
>> h to
>> +   a use of v which does not go through d. For every node of the loop, 
>> p,
>> +   since the loop is strongly connected Component of the CFG, there 
>> exists
>> +   a path, consisting only of nodes of L from p to h. Concatenating 
>> those
>> +   two paths prove that v is live-in and live-out Of p.  */

>>Please Refrain From Randomly Capitalizing Words, and please also fix the 
>>grammar ("set of objects. That are live-in"). These problems make this 
>>comment (and also your emails) >>extremely hard to read.

>>Partly for that reason, I can't quite make up my mind whether this patch 
>>makes things better or just different. The testcase failure makes me think 
>>it's probably not an improvement.

I'll correct it. I am seeing the gains for Mibench/EEMBC benchmarks for 
Microblaze target with this patch. I will run SPEC CPU 2000  benchmarks for 
i386 with this patch.

>> +bitmap_ior_into (_DATA (loop)->regs_live, DF_LR_IN (loop->header));
>> +bitmap_ior_into (_DATA (loop)->regs_live, DF_LR_OUT 
>> + (loop->header));

>>Formatting.

I'll incorporate the change.

Thanks & Regards
Ajit

Bernd


Re: [hsa 6/10] Pass manager changes

2015-12-09 Thread Richard Biener
On Wed, Dec 9, 2015 at 2:20 PM, Jakub Jelinek  wrote:
> On Mon, Dec 07, 2015 at 12:24:05PM +0100, Martin Jambor wrote:
>> the pass manager changes required for HSA have already been committed
>> to trunk so all that remains are these additions to the pass pipeline.
>>
>> Thanks,
>>
>> Martin
>>
>>
>> 2015-12-04  Martin Jambor  
>>   Martin Liska  
>>
>>   * passes.def: Schedule pass_ipa_hsa and pass_gen_hsail.
>>   * tree-pass.h (make_pass_gen_hsail): Declare.
>>   (make_pass_ipa_hsa): Likewise.
>
> I'll defer this to richi.

Ok.

Richard.

> Jakub


[PATCH] Add levels to -Wmisleading-indentation; add level 1 to -Wall

2015-12-09 Thread David Malcolm
On Mon, 2015-11-02 at 16:41 -0700, Jeff Law wrote:
> On 11/02/2015 12:35 PM, David Malcolm wrote:
>
> >
> >> diff --git a/gdb/ada-lang.c b/gdb/ada-lang.c
> >> index fff4862..2559a36 100644
> >> --- a/gdb/ada-lang.c
> >> +++ b/gdb/ada-lang.c
> >> @@ -11359,9 +11359,11 @@ ada_evaluate_subexp (struct type *expect_type, 
> >> struct expression *exp,
> >>  return value_zero (ada_aligned_type (type), lval_memory);
> >>}
> >>  else
> >> -arg1 = ada_value_struct_elt (arg1, >elts[pc + 2].string, 0);
> >> -arg1 = unwrap_value (arg1);
> >> -return ada_to_fixed_value (arg1);
> >> +  {
> >> +arg1 = ada_value_struct_elt (arg1, >elts[pc + 2].string, 0);
> >> +arg1 = unwrap_value (arg1);
> >> +return ada_to_fixed_value (arg1);
> >> +  }
> >>
> >>case OP_TYPE:
> >>  /* The value is not supposed to be used.  This is here to make it
> >
> > Interesting.  It's not technically a bug, since the "if true" clause has
> > an unconditional return, but it looks like a time-bomb to me.  I'm happy
> > that we warn for it.
> Agreed.
>
>
> >
> > These three are all of the form:
> > if (record_debug)
> >   fprint (...multiple lines...);
> >   return -1;
> >
> > It seems reasonable to me to complain about the indentation of the
> > return -1;
> Agreed.
>
>
> >
> >> diff --git a/sysdeps/ieee754/flt-32/k_rem_pio2f.c 
> >> b/sysdeps/ieee754/flt-32/k_rem_pio2f.c
> >> index 0c7685c..a0d844c 100644
> >> --- a/sysdeps/ieee754/flt-32/k_rem_pio2f.c
> >> +++ b/sysdeps/ieee754/flt-32/k_rem_pio2f.c
> >> @@ -65,7 +65,8 @@ int __kernel_rem_pio2f(float *x, float *y, int e0, int 
> >> nx, int prec, const int32
> >>
> >>/* compute q[0],q[1],...q[jk] */
> >>   for (i=0;i<=jk;i++) {
> >> -   for(j=0,fw=0.0;j<=jx;j++) fw += x[j]*f[jx+i-j]; q[i] = fw;
> >> +   for(j=0,fw=0.0;j<=jx;j++) fw += x[j]*f[jx+i-j];
> >> +   q[i] = fw;
> >>   }
> >>
> >>   jz = jk;
> >
> > I think it's very reasonable to complain about the above code.
> Agreed.
>
> >
> > So if I've counted things right the tally is:
> > * 5 dubious-looking though correct places, where it's reasonable to
> > issue a warning
> > * 5 places where there's a blank line between guarded and non-guarded
> > stmt, where patch 1 of the kit would have suppressed the warning
> > * one bug (PR 68187)
> >
> > I think we want the first kind of thing at -Wall, but I'm not so keen on
> > the second kind at -Wall.  Is there precedent for "levels" of a warning?
> > (so e.g. pedantry level 1 at -Wall, level 2 at -Wextra, and have patch 1
> > be the difference between levels 1 and 2?)
> >
> > Sorry for harping on about patch 1; thanks again for posting this
> No worries about harping.  These are real world cases.
>
> And yes, there's definitely precedent for different levels of a warning.
>   If you wanted to add a patch to have different levels and select one
> for -Wall, I'd look favorably on that.

The attached patch implements this idea.

It's a revised version of:
  "[PATCH 1/4] -Wmisleading-indentation: don't warn in presence of entirely 
blank lines"
(https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03225.html)
(which wasn't approved, since you *did* want warnings for them),
and of:
  "[PATCH 4/4] Add -Wmisleading-indentation to -Wall"
   (https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03223.html)
which you approved, but which Richi was unhappy with
(I haven't committed it).

The attached patch adds the idea of a strictness level to
-Wmisleading-indentation, where
  -Wmisleading-indentation
becomes equivalent to
  -Wmisleading-indentation=1

The heuristic from patch 1 becomes the difference between
-Wmisleading-indentation=1 and -Wmisleading-indentation=2.

It adds -Wmisleading-indentation=1 to -Wall, with
-Wmisleading-indentation=2 available for users who want
extra strictness.

I think this gives us a big improvement in the signal:noise ratio of
the "-Wall" form of the warning for real code (see e.g. the stats for
gcc itself above), relative to the earlier form of the -Wall patch.

Bootstrapped with x86_64-pc-linux-gnu.

Adds 33 PASS results to g++.sum; adds 11 PASS results to gcc.sum.

OK for trunk?

gcc/c-family/ChangeLog:
* c-indentation.c (line_is_blank_p): New function.
(separated_by_blank_lines_p): New function.
(should_warn_for_misleading_indentation): Add param
strictness_level.  Move early reject from
warn_for_misleading_indentation to here.  At strictness
level 1, don't warn if the guarded and unguarded code are
separated by one or more entirely blank lines.
(warn_for_misleading_indentation): Pass value of
warn_misleading_indentation to
should_warn_for_misleading_indentation and move early
rejection case for disabled aka level 0 to there.
* c.opt (Wmisleading-indentation): Convert to an alias for...
(Wmisleading-indentation=): ...this new version of
  

[PATCH] Do not compute dependences in if-conversion

2015-12-09 Thread Richard Biener

This removes the dubious dependence computation in if-conversion
just for the sake of testing if that might fail (given the vectorizer
itself doesn't fail that easily).  It's a quadratic operation after all
which we should avoid at all cost.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2015-12-09  Richard Biener  

* tree-if-conv.c (if_convertible_loop_p_1): Do not compute
dependences.
(if_convertible_loop_p): Adjust.

Index: gcc/tree-if-conv.c
===
*** gcc/tree-if-conv.c  (revision 231453)
--- gcc/tree-if-conv.c  (working copy)
*** predicate_bbs (loop_p loop)
*** 1172,1189 
  
  static bool
  if_convertible_loop_p_1 (struct loop *loop,
-vec *loop_nest,
 vec *refs,
!vec *ddrs, bool *any_mask_load_store)
  {
-   bool res;
unsigned int i;
basic_block exit_bb = NULL;
  
!   /* Don't if-convert the loop when the data dependences cannot be
!  computed: the loop won't be vectorized in that case.  */
!   res = compute_data_dependences_for_loop (loop, true, loop_nest, refs, ddrs);
!   if (!res)
  return false;
  
calculate_dominance_info (CDI_DOMINATORS);
--- 1172,1184 
  
  static bool
  if_convertible_loop_p_1 (struct loop *loop,
 vec *refs,
!bool *any_mask_load_store)
  {
unsigned int i;
basic_block exit_bb = NULL;
  
!   if (find_data_references_in_loop (loop, refs) == chrec_dont_know)
  return false;
  
calculate_dominance_info (CDI_DOMINATORS);
*** if_convertible_loop_p (struct loop *loop
*** 1300,1306 
edge_iterator ei;
bool res = false;
vec refs;
-   vec ddrs;
  
/* Handle only innermost loop.  */
if (!loop || loop->inner)
--- 1295,1300 
*** if_convertible_loop_p (struct loop *loop
*** 1333,1342 
return false;
  
refs.create (5);
!   ddrs.create (25);
!   auto_vec loop_nest;
!   res = if_convertible_loop_p_1 (loop, _nest, , ,
!any_mask_load_store);
  
data_reference_p dr;
unsigned int i;
--- 1327,1333 
return false;
  
refs.create (5);
!   res = if_convertible_loop_p_1 (loop, , any_mask_load_store);
  
data_reference_p dr;
unsigned int i;
*** if_convertible_loop_p (struct loop *loop
*** 1344,1350 
  free (dr->aux);
  
free_data_refs (refs);
-   free_dependence_relations (ddrs);
  
delete ref_DR_map;
ref_DR_map = NULL;
--- 1335,1340 


[PATCH, c++] Add testcase from PR68348 to the testsuite

2015-12-09 Thread Uros Bizjak
This patch adds the test for the fixed PR.

2015-12-09  Uros Bizjak  

PR c++/68348
PR c++/68464
* g++.dg/pr68348.C: New test.

Tested on x86_64-linux-gnu.

OK for mainline?

Uros.
Index: g++.dg/pr68348.C
===
--- g++.dg/pr68348.C(revision 0)
+++ g++.dg/pr68348.C(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c++11 -O2" } */
+
+struct C {
+  constexpr C ():w (), x (), y () {}
+  constexpr double fn () const noexcept;
+  double w;
+  double x;
+  double y;
+};
+
+constexpr double C::fn () const noexcept { return w; }
+
+C foo ()
+{
+  C c;
+  c.fn ();
+  return c;
+}


Re: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion.

2015-12-09 Thread David Malcolm
On Wed, 2015-12-09 at 11:35 +0100, Richard Biener wrote:
> On Tue, Dec 8, 2015 at 11:24 PM, Ajit Kumar Agarwal
>  wrote:
> > Based on the comments on RFC patch this patch incorporates all the comments 
> > from Jeff. Thanks Jeff for the valuable feedback.
> >
> > This patch enables the better register pressure estimate for Loop Invariant 
> > code motion. This patch Calculate the Loop Liveness used for regs_used
> >  used to calculate the register pressure  used in the cost estimation.
> >
> > Loop Liveness is based on the following properties.  we only require to 
> > calculate the set of objects that are live at the birth or the header of 
> > the loop.  We
> > don't need to calculate the live through the Loop considering Live in and 
> > Live out of all the basic blocks of the Loop. This is because the set of 
> > objects. That
> > are live-in at the birth or header of the loop will be live-in at every 
> > node in the Loop.
> >
> > If a v live out at the header of the loop then the variable is live-in at 
> > every node in the Loop. To prove this, Consider a Loop L with header h such 
> > that The
> > variable v defined at d is live-in at h. Since v is live at h, d is not 
> > part of L. This follows from the dominance property, i.e. h is strictly 
> > dominated by d.
> > Furthermore, there exists a path from h to a use of v which does not go 
> > through d. For every node of the loop, p, since the loop is strongly 
> > connected
> > Component of the CFG, there exists a path, consisting only of nodes of L 
> > from p to h. Concatenating those two paths prove that v is live-in and 
> > live-out Of p.
> >
> > Also Calculate the Live-Out and Live-In for the exit edge of the loop. This 
> > considers liveness for not only the Loop latch but also the liveness 
> > outside the Loops.
> >
> > Bootstrapped and Regtested for x86_64 and microblaze target.
> >
> > There is an extra failure for the testcase gcc.dg/loop-invariant.c  with 
> > the change that looks correct to me.
> >
> > This is because the available_regs = 6 and the regs_needed = 1 and new_regs 
> > = 0 and the regs_used = 10.  As the reg_used that are based on the Liveness
> > given above is greater than the available_regs, then it's candidate of 
> > spill and the estimate_register_pressure calculates the spill cost. This 
> > spill cost is greater
> > than inv_cost and gain comes to be negative. The disables the loop 
> > invariant for the above testcase.
> >
> > Disabling of the Loop invariant for the testcase loop-invariant.c with this 
> > patch  looks correct to me considering the calculation of available_regs in 
> > cfgloopanal.c
> > is correct.
> 
> You keep a lot of data (bitmaps) live just to use the number of bits
> set in the end.
> 
> I'll note that this also increases the complexity of the pass which is
> enabled at -O1+ where
> -O1 should limit itself to O(n*log n) algorithms but live is quadratic.
> 
> So I don't think doing this unconditionally is a good idea (and we
> have -fira-loop-pressure
> after all).
> 
> Please watch not making -O1 worse again after we spent so much time
> making it suitable
> for all the weird autogenerated code.
> 
> Richard.
> 
> >  ChangeLog:
> >  2015-12-09  Ajit Agarwal  
> >
> > * loop-invariant.c
> > (calculate_loop_liveness): New.
> > (move_loop_invariants): Use of calculate_loop_liveness.
> >
> > Signed-off-by:Ajit Agarwal ajit...@xilinx.com
> >
> > ---
> >  gcc/loop-invariant.c |   77 
> > ++
> >  1 files changed, 65 insertions(+), 12 deletions(-)
> >
> > diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> > index 53d1377..ac08594 100644
> > --- a/gcc/loop-invariant.c
> > +++ b/gcc/loop-invariant.c

[...snip...]

> > @@ -1966,7 +1955,63 @@ mark_ref_regs (rtx x)
> >}
> >  }
> >
> > +/* Calculate the Loop Liveness used for regs_used used in
> > +   heuristics to calculate the register pressure.  */
> > +
> > +static void
> > +calculate_loop_liveness (void)
> > +{
> > +  struct loop *loop;
> > +
> > +  FOR_EACH_LOOP (loop, 0)
> > +if (loop->aux == NULL)
> > +  {
> > +loop->aux = xcalloc (1, sizeof (struct loop_data));
> > +bitmap_initialize (_DATA (loop)->regs_live, _obstack);
> > + }
> > +
> > +  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> > +  {
> > +int  i;
> > +edge e;
> > +vec edges;
> > +edges = get_loop_exit_edges (loop);

Doesn't this leak "edges"?  (Could/should it be an auto_vec?)

> > +
> > +/* Loop Liveness is based on the following proprties.
> > +   we only require to calculate the set of objects that are live at
> > +   the birth or the header of the loop.
> > +   We don't need to calculate the live through the Loop considering
> > +   Live in and Live out of all the basic blocks of the Loop. This is
> > +   because the set of objects. That are live-in at the birth or header
> 

Re: PING^1: [PATCH] Add TYPE_EMPTY_RECORD for C++ empty class

2015-12-09 Thread Richard Biener
On Tue, Dec 8, 2015 at 5:22 PM, H.J. Lu  wrote:
> On Mon, Nov 23, 2015 at 12:53 PM, H.J. Lu  wrote:
>> On Mon, Nov 23, 2015 at 1:57 AM, Richard Biener
>>  wrote:
>>> On Sat, Nov 21, 2015 at 12:46 AM, H.J. Lu  wrote:
 On Fri, Nov 20, 2015 at 2:17 PM, Jason Merrill  wrote:
> On 11/20/2015 01:52 PM, H.J. Lu wrote:
>>
>> On Tue, Nov 17, 2015 at 4:22 AM, Richard Biener
>>  wrote:
>>>
>>> On Tue, Nov 17, 2015 at 12:01 PM, H.J. Lu  wrote:

 Empty record should be returned and passed the same way in C and C++.
 This patch adds LANG_HOOKS_EMPTY_RECORD_P for C++ empty class, which
 defaults to return false.  For C++, LANG_HOOKS_EMPTY_RECORD_P is 
 defined
 to is_really_empty_class, which returns true for C++ empty classes.  
 For
 LTO, we stream out a bit to indicate if a record is empty and we store
 it in TYPE_LANG_FLAG_0 when streaming in.  get_ref_base_and_extent is
 changed to set bitsize to 0 for empty records.  Middle-end and x86
 backend are updated to ignore empty records for parameter passing and
 function value return.  Other targets may need similar changes.
>>>
>>>
>>> Please avoid a new langhook for this and instead claim a bit in
>>> tree_type_common
>>> like for example restrict_flag (double-check it is unused for
>>> non-pointers).
>>
>>
>> There is no bit in tree_type_common I can overload.  restrict_flag is
>> checked for non-pointers to issue an error when it is used on
>> non-pointers:
>>
>>
>> /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/template/qualttp20.C:19:38:
>> error: ‘__restrict__’ qualifiers cannot be applied to ‘AS::L’
>> typedef typename T::L __restrict__ r;// { dg-error "'__restrict__'
>> qualifiers cannot" "" }
>
>
> The C++ front end only needs to check TYPE_RESTRICT for this purpose on
> front-end-specific type codes like TEMPLATE_TYPE_PARM; cp_type_quals could
> handle that specifically if you change TYPE_RESTRICT to only apply to
> pointers.
>

 restrict_flag is also checked in this case:

 [hjl@gnu-6 gcc]$ cat x.i
 struct dummy { };

 struct dummy
 foo (struct dummy __restrict__ i)
 {
   return i;
 }
 [hjl@gnu-6 gcc]$ gcc -S x.i -Wall
 x.i:4:13: error: invalid use of ‘restrict’
  foo (struct dummy __restrict__ i)
  ^
 x.i:4:13: error: invalid use of ‘restrict’
 [hjl@gnu-6 gcc]$

 restrict_flag can't also be used to indicate `i' is an empty record.
>>>
>>> I'm sure this error can be done during parsing w/o relying on TYPE_RESTRICT.
>>>
>>> But well, use any other free bit (but do not enlarge
>>> tree_type_common).  Eventually
>>> you can free up a bit by putting sth into type_lang_specific currently
>>> using bits
>>> in tree_type_common.
>>
>> There are no bits in tree_type_common I can move.  Instead,
>> this patch overloads side_effects_flag in tree_base.  Tested on
>> Linux/x86-64.  OK for trunk?

I'm fine with using side_effects_flag for this.

I miss an explanation of where this detail of the ABI is documented and wonder
if the term "empty record" is also used in that document and how it is
documented.

Thus

+/* Nonzero in a type considered an empty record.  */
+#define TYPE_EMPTY_RECORD(NODE) \
+  (TYPE_CHECK (NODE)->base.side_effects_flag)

should refer to the ABI where is defined what an "empty record" is and how
it is handled by the backend(s).

+/* Return true if type T is an empty record.  */
+
+static inline bool
+type_is_empty_record_p (const_tree t)
+{
+  return TYPE_EMPTY_RECORD (TYPE_MAIN_VARIANT (t));

the type checker should probably check the bit is consistent across
variants so it can be tested on any of them.

You fail to adjust other targets gimplification hooks which suggests
this is a detail of the x86 psABI and not the C++ ABI?  If so I miss
a -Wpsabi warning for this change.

Did you investigate the effect of this patch on a larger code base?
More specifically does it only affect variadic functions?

An entry for changes.html is warranted as well.

Thanks,
Richard.

>> Thanks.
>>
>> --
>> H.J.
>
>
>
> --
> H.J.


Re: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion.

2015-12-09 Thread Bernd Schmidt

On 12/09/2015 12:22 PM, Ajit Kumar Agarwal wrote:


This is because the available_regs = 6 and the regs_needed = 1 and
new_regs = 0 and the regs_used = 10.  As the reg_used that are based
on the Liveness given above is greater than the available_regs, then

> it's candidate of spill and the estimate_register_pressure calculates
> the spill cost. This spill cost is greater than inv_cost and gain
> comes to be negative. The disables the loop invariant for the above
> testcase.

As far as I can tell this loop does not lead to a spill. Hence, failure 
of this testcase would suggest there is something wrong with your idea.



+  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)  {


Formatting.


+/* Loop Liveness is based on the following proprties.


"properties"


+   we only require to calculate the set of objects that are live at
+   the birth or the header of the loop.
+   We don't need to calculate the live through the Loop considering
+   Live in and Live out of all the basic blocks of the Loop. This is
+   because the set of objects. That are live-in at the birth or header
+   of the loop will be live-in at every node in the Loop.
+   If a v live out at the header of the loop then the variable is live-in
+   at every node in the Loop. To prove this, Consider a Loop L with header
+   h such that The variable v defined at d is live-in at h. Since v is live
+   at h, d is not part of L. This follows from the dominance property, i.e.
+   h is strictly dominated by d. Furthermore, there exists a path from h to
+   a use of v which does not go through d. For every node of the loop, p,
+   since the loop is strongly connected Component of the CFG, there exists
+   a path, consisting only of nodes of L from p to h. Concatenating those
+   two paths prove that v is live-in and live-out Of p.  */


Please Refrain From Randomly Capitalizing Words, and please also fix the 
grammar ("set of objects. That are live-in"). These problems make this 
comment (and also your emails) extremely hard to read.


Partly for that reason, I can't quite make up my mind whether this patch 
makes things better or just different. The testcase failure makes me 
think it's probably not an improvement.



+bitmap_ior_into (_DATA (loop)->regs_live, DF_LR_IN (loop->header));
+bitmap_ior_into (_DATA (loop)->regs_live, DF_LR_OUT
+ (loop->header));


Formatting.


Bernd


Re: [AArch64] Emit square root using the Newton series

2015-12-09 Thread Marcus Shawcroft
On 8 December 2015 at 21:35, Evandro Menezes  wrote:
>Emit square root using the Newton series
>
>2015-12-03  Evandro Menezes  
>
>gcc/
> * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt):
>Declare new
> function.
> * config/aarch64/aarch64-simd.md (sqrt2): New
>expansion and
> insn definitions.
> * config/aarch64/aarch64-tuning-flags.def
> (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro.
> * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define
>new function.
> * config/aarch64/aarch64.md (sqrt2): New expansion
>and insn
> definitions.
> * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt):
>Expand option
> description.
> * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.
>
> This patch extends the patch that added support for implementing x^-1/2
> using the Newton series by adding support for x^1/2 as well.

Hi Evandro, What benchmarking have you done on this patch?
/M


Re: [PATCH 5/7][Testsuite] Support ARMv8.1 ARM tests.

2015-12-09 Thread Christophe Lyon
On 7 December 2015 at 17:10, Matthew Wahab  wrote:
> On 27/11/15 17:11, Matthew Wahab wrote:
>>
>> On 27/11/15 13:44, Christophe Lyon wrote:

 On 26/11/15 16:02, Matthew Wahab wrote:
>>
>>
> This patch adds ARMv8.1 support to GCC Dejagnu, to allow ARM
> tests to specify targest and to set up command line options.
> It builds on the ARMv8.1 target support added for AArch64 tests, partly
> reworking that support to take into account the different
> configurations
> that tests may be run under.
>>
>>
>>> I may be mistaken, but -mfpu=neon-fp-armv8 and -mfloat-abi=softfp are not
>>> supported by aarch64-gcc. So it seems to me that
>>> check_effective_target_arm_v8_1a_neon_ok_nocache will not always work
>>> for aarch64 after your patch.
>>
>>
>>> Or does it work because no option is needed and thus "" always
>>> matches and thus the loop always exits after the first iteration
>>> on aarch64?
>>
>>
>> Yes, the idea is that the empty string will make the function first try
>> '-march=armv8.1-a' without any other flag. That will work for AArch64
>> because it
>> doesn't need any other option.
>>
>>> Maybe a more accurate comment would help remembering that, in case
>>> -mfpu option becomes necessary for aarch64.
>>>
>>
>> Agreed, it's worth having a comment to explain what the 'foreach'
>> construct is doing.
>>
>> Matthew
>
>
> I've added a comment to the foreach construct, to make it clearer what
> it's doing.
>

I only have a minor typo to report in the other comment you added before
check_effective_target_arm_v8_1a_neon_ok_nocache:
"Record the command line options that needed."
("that" doesn't sound right to my non-native ears :-)

Otherwise OK for me, but I can't approve.

Christophe.


> Matthew
>
> testsuite/
> 2015-12-07  Matthew Wahab  
>
>
> * lib/target-supports.exp (add_options_for_arm_v8_1a_neon): Update
> comment.  Use check_effetive_target_arm_v8_1a_neon_ok to select
> the command line options.
> (check_effective_target_arm_v8_1a_neon_ok_nocache): Update initial
> test to allow ARM targets.  Select and record a working set of
> command line options.
> (check_effective_target_arm_v8_1a_neon_hw): Add tests for ARM
> targets.
>


Re: [PATCH] Fix phiopt ICE in Factor conversion in COND_EXPR (PR tree-optimization/66949)

2015-12-09 Thread Marek Polacek
On Wed, Dec 09, 2015 at 11:41:44AM +0100, Richard Biener wrote:
> But I don't see why we have the asserts at all - they seem to be originated 
> from
> development.  IMHO factor_out_conditonal_conversion should simply return
> the new PHI it generates instead of relying on
> signle_non_signelton_phi_for_edges
> to return it.

Ok, that seems to work.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-12-09  Marek Polacek  

PR tree-optimization/66949
* tree-ssa-phiopt.c (tree_ssa_phiopt_worker): Don't call 
single_non_singleton_phi_for_edges to get the PHI from
factor_out_conditional_conversion.  Use NULL_TREE instead of NULL.
(factor_out_conditional_conversion): Adjust declaration.  Make it
return the newly-created PHI.

* gcc.dg/torture/pr66949-1.c: New test.
* gcc.dg/torture/pr66949-2.c: New test.

diff --git gcc/testsuite/gcc.dg/torture/pr66949-1.c 
gcc/testsuite/gcc.dg/torture/pr66949-1.c
index e69de29..1b765bc 100644
--- gcc/testsuite/gcc.dg/torture/pr66949-1.c
+++ gcc/testsuite/gcc.dg/torture/pr66949-1.c
@@ -0,0 +1,28 @@
+/* PR tree-optimization/66949 */
+/* { dg-do compile } */
+
+int a, *b = , c;
+
+unsigned short
+fn1 (unsigned short p1, unsigned int p2)
+{
+  return p2 > 1 || p1 >> p2 ? p1 : p1 << p2;
+}
+
+void
+fn2 ()
+{
+  int *d = 
+  for (a = 0; a < -1; a = 1)
+;
+  if (a < 0)
+c = 0;
+  *b = fn1 (*d || c, *d);
+}
+
+int
+main ()
+{
+  fn2 ();
+  return 0;
+}
diff --git gcc/testsuite/gcc.dg/torture/pr66949-2.c 
gcc/testsuite/gcc.dg/torture/pr66949-2.c
index e69de29..e6250a3 100644
--- gcc/testsuite/gcc.dg/torture/pr66949-2.c
+++ gcc/testsuite/gcc.dg/torture/pr66949-2.c
@@ -0,0 +1,23 @@
+/* PR tree-optimization/66949 */
+/* { dg-do compile } */
+
+char a;
+int b, c, d;
+extern int fn2 (void);
+
+short
+fn1 (short p1, short p2)
+{
+  return p2 == 0 ? p1 : p1 / p2;
+}
+
+int
+main (void)
+{
+  char e = 1;
+  int f = 7;
+  c = a >> f;
+  b = fn1 (c, 0 < d <= e && fn2 ());
+
+  return 0;
+}
diff --git gcc/tree-ssa-phiopt.c gcc/tree-ssa-phiopt.c
index 344cd2f..d7e8aa0 100644
--- gcc/tree-ssa-phiopt.c
+++ gcc/tree-ssa-phiopt.c
@@ -49,7 +49,7 @@ along with GCC; see the file COPYING3.  If not see
 static unsigned int tree_ssa_phiopt_worker (bool, bool);
 static bool conditional_replacement (basic_block, basic_block,
 edge, edge, gphi *, tree, tree);
-static bool factor_out_conditional_conversion (edge, edge, gphi *, tree, tree);
+static gphi *factor_out_conditional_conversion (edge, edge, gphi *, tree, 
tree);
 static int value_replacement (basic_block, basic_block,
  edge, edge, gimple *, tree, tree);
 static bool minmax_replacement (basic_block, basic_block,
@@ -310,19 +310,19 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
do_hoist_loads)
 
  /* Something is wrong if we cannot find the arguments in the PHI
 node.  */
- gcc_assert (arg0 != NULL && arg1 != NULL);
+ gcc_assert (arg0 != NULL_TREE && arg1 != NULL_TREE);
 
- if (factor_out_conditional_conversion (e1, e2, phi, arg0, arg1))
+ gphi *newphi = factor_out_conditional_conversion (e1, e2, phi,
+   arg0, arg1);
+ if (newphi != NULL)
{
+ phi = newphi;
  /* factor_out_conditional_conversion may create a new PHI in
 BB2 and eliminate an existing PHI in BB2.  Recompute values
 that may be affected by that change.  */
- phis = phi_nodes (bb2);
- phi = single_non_singleton_phi_for_edges (phis, e1, e2);
- gcc_assert (phi);
  arg0 = gimple_phi_arg_def (phi, e1->dest_idx);
  arg1 = gimple_phi_arg_def (phi, e2->dest_idx);
- gcc_assert (arg0 != NULL && arg1 != NULL);
+ gcc_assert (arg0 != NULL_TREE && arg1 != NULL_TREE);
}
 
  /* Do the replacement of conditional if it can be done.  */
@@ -402,9 +402,9 @@ replace_phi_edge_with_variable (basic_block cond_block,
 
 /* PR66726: Factor conversion out of COND_EXPR.  If the arguments of the PHI
stmt are CONVERT_STMT, factor out the conversion and perform the conversion
-   to the result of PHI stmt.  */
+   to the result of PHI stmt.  Return the newly-created PHI, if any.  */
 
-static bool
+static gphi *
 factor_out_conditional_conversion (edge e0, edge e1, gphi *phi,
   tree arg0, tree arg1)
 {
@@ -421,7 +421,7 @@ factor_out_conditional_conversion (edge e0, edge e1, gphi 
*phi,
  statement have the same unary operation, we can handle more
  than two arguments too.  */
   if (gimple_phi_num_args (phi) != 2)
-return false;
+return NULL;
 
   /* First canonicalize to simplify tests.  */
   if (TREE_CODE (arg0) != SSA_NAME)
@@ -433,14 +433,14 @@ factor_out_conditional_conversion (edge e0, edge e1, gphi 
*phi,
   

Re: [PATCH] Make basic asm implicitly clobber memory, pr24414

2015-12-09 Thread Bernd Edlinger
Hi,

On 09.12.2015 12:06 Bernd Schmidt wrote:
> On 12/09/2015 03:18 AM, Bernd Edlinger wrote:
>> Furthermore there is a documented use for asm(""): The empty 
>> assembler string is used to make a function
>> volatile, thus calls can not be optimized away.  But I think it is 
>> not necessary to make this clobber anything,
>> nor should it be an instruction scheduling barrier, as it used to be 
>> in the past.
>
> Making that change seems risky; best not to make assumptions about how 
> these are used. In any case,
>

So would you agree on the general direction of the patch,
if I drop the hunk in sched-deps.c ?


Thanks
Bernd.



[PATCH] Add testcase for c++/68348

2015-12-09 Thread Marek Polacek
This adds a testcase for the already fixed PR68348.

Tested on x86_64-linux, ok for trunk?

2015-12-09  Marek Polacek  

PR c++/68348
* g++.dg/cpp0x/pr68348.C: New test.

diff --git gcc/testsuite/g++.dg/cpp0x/pr68348.C 
gcc/testsuite/g++.dg/cpp0x/pr68348.C
index e69de29..9033bba 100644
--- gcc/testsuite/g++.dg/cpp0x/pr68348.C
+++ gcc/testsuite/g++.dg/cpp0x/pr68348.C
@@ -0,0 +1,18 @@
+// PR c++/68348
+// { dg-do compile { target c++11 } }
+
+struct C {
+  constexpr C() : w(), x(), y() {}
+  constexpr double fn() const noexcept;
+  double w;
+  double x;
+  double y;
+};
+
+constexpr double C::fn() const noexcept { return w; }
+C foo()
+{
+  C c;
+  c.fn ();
+  return c;
+}

Marek


Re: [PATCH] Add testcase for c++/68348

2015-12-09 Thread Kyrill Tkachov

On 09/12/15 15:38, Marek Polacek wrote:

This adds a testcase for the already fixed PR68348.

Tested on x86_64-linux, ok for trunk?

2015-12-09  Marek Polacek  

PR c++/68348
* g++.dg/cpp0x/pr68348.C: New test.

diff --git gcc/testsuite/g++.dg/cpp0x/pr68348.C 
gcc/testsuite/g++.dg/cpp0x/pr68348.C
index e69de29..9033bba 100644
--- gcc/testsuite/g++.dg/cpp0x/pr68348.C
+++ gcc/testsuite/g++.dg/cpp0x/pr68348.C
@@ -0,0 +1,18 @@
+// PR c++/68348
+// { dg-do compile { target c++11 } }
+
+struct C {
+  constexpr C() : w(), x(), y() {}
+  constexpr double fn() const noexcept;
+  double w;
+  double x;
+  double y;
+};
+
+constexpr double C::fn() const noexcept { return w; }
+C foo()
+{
+  C c;
+  c.fn ();
+  return c;
+}

Marek



Same as:
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01001.html
:)

Kyrill



[Patch, Fortran] PR68815 - replace '%s' quotes by %< ... %>

2015-12-09 Thread Tobias Burnus
This patch replaces some of the '%s' quotes of diagnostic strings
by the nicer quotes.


First, it replaces some leftovers of '%s' -> %qs in directly used
error strings.


It then also converts some (well: resolve.c only) '%s' to %%<%s%%>,
which are using with sprintf(), but which are still passed as
fmt string to the diagnostic functions.

There is still code using '%s' - but those bits have the form
  sprintf (buffer, " '%s' ... ");
  gfc_error ("Bla ... : %s", ..., buffer);
which prevents the use of %< ... %>. (See last comment in the PR.)


In principle, %<%c%> and %<%d%> should be convertable to %qc and
%qd (as the code is more readable), but the current function
annotation prevent this, telling that the q flag is not valid for
%c and %d. As %< is fine, I didn't dig into it.


Build and regtested on x86-64-gnu-linux.
OK for the trunk?

Tobias
	PR fortran/68815
	* decl.c (gfc_verify_c_interop_param, variable_decl): Use
	%< ... %> for quoting in diagnostics.
	* io.c (check_format): Ditto.
	* resolve.c (resolve_operator): Ditto.
	* symbol.c (check_conflict): Ditto.
	* trans-common.c (translate_common): Ditto.

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index bff23e1..b03dadf 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -1194,7 +1194,7 @@ gfc_verify_c_interop_param (gfc_symbol *sym)
 	  if (sym->as != NULL && sym->as->type == AS_ASSUMED_SHAPE
 	  && !gfc_notify_std (GFC_STD_F2008_TS, "Assumed-shape array %qs "
   "at %L as dummy argument to the BIND(C) "
-  "procedure '%s' at %L", sym->name, 
+  "procedure %qs at %L", sym->name,
   &(sym->declared_at), 
   sym->ns->proc_name->name, 
   &(sym->ns->proc_name->declared_at)))
@@ -2023,9 +2023,9 @@ variable_decl (int elem)
   if (sym != NULL && (sym->attr.dummy || sym->attr.result))
 	{
 	  m = MATCH_ERROR;
-	  gfc_error ("'%s' at %C is a redefinition of the declaration "
+	  gfc_error ("%qs at %C is a redefinition of the declaration "
 		 "in the corresponding interface for MODULE "
-		 "PROCEDURE '%s'", sym->name,
+		 "PROCEDURE %qs", sym->name,
 		 gfc_current_ns->proc_name->name);
 	  goto cleanup;
 	}
diff --git a/gcc/fortran/io.c b/gcc/fortran/io.c
index 8cf952f..9a77234 100644
--- a/gcc/fortran/io.c
+++ b/gcc/fortran/io.c
@@ -549,7 +549,7 @@ check_format (bool is_input)
 {
   const char *posint_required	  = _("Positive width required");
   const char *nonneg_required	  = _("Nonnegative width required");
-  const char *unexpected_element  = _("Unexpected element %<%c%> in format "
+  const char *unexpected_element  = _("Unexpected element %qc in format "
   "string at %L");
   const char *unexpected_end	  = _("Unexpected end of format string");
   const char *zero_width	  = _("Zero width in format descriptor");
diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 10add62..65a2b7f 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -3560,7 +3560,7 @@ resolve_operator (gfc_expr *e)
 	  break;
 	}
 
-  sprintf (msg, _("Operand of unary numeric operator '%s' at %%L is %s"),
+  sprintf (msg, _("Operand of unary numeric operator %%<%s%%> at %%L is %s"),
 	   gfc_op2string (e->value.op.op), gfc_typename (>ts));
   goto bad_op;
 
@@ -3576,7 +3576,7 @@ resolve_operator (gfc_expr *e)
 	}
 
   sprintf (msg,
-	   _("Operands of binary numeric operator '%s' at %%L are %s/%s"),
+	   _("Operands of binary numeric operator %%<%s%%> at %%L are %s/%s"),
 	   gfc_op2string (e->value.op.op), gfc_typename (>ts),
 	   gfc_typename (>ts));
   goto bad_op;
@@ -3610,7 +3610,7 @@ resolve_operator (gfc_expr *e)
 	  break;
 	}
 
-  sprintf (msg, _("Operands of logical operator '%s' at %%L are %s/%s"),
+  sprintf (msg, _("Operands of logical operator %%<%s%%> at %%L are %s/%s"),
 	   gfc_op2string (e->value.op.op), gfc_typename (>ts),
 	   gfc_typename (>ts));
 
@@ -3695,7 +3695,7 @@ resolve_operator (gfc_expr *e)
 		 ? ".eqv." : ".neqv.", gfc_op2string (e->value.op.op));
   else
 	sprintf (msg,
-		 _("Operands of comparison operator '%s' at %%L are %s/%s"),
+		 _("Operands of comparison operator %%<%s%%> at %%L are %s/%s"),
 		 gfc_op2string (e->value.op.op), gfc_typename (>ts),
 		 gfc_typename (>ts));
 
@@ -3703,13 +3703,14 @@ resolve_operator (gfc_expr *e)
 
 case INTRINSIC_USER:
   if (e->value.op.uop->op == NULL)
-	sprintf (msg, _("Unknown operator '%s' at %%L"), e->value.op.uop->name);
+	sprintf (msg, _("Unknown operator %%<%s%%> at %%L"),
+		 e->value.op.uop->name);
   else if (op2 == NULL)
-	sprintf (msg, _("Operand of user operator '%s' at %%L is %s"),
+	sprintf (msg, _("Operand of user operator %%<%s%%> at %%L is %s"),
 		 e->value.op.uop->name, gfc_typename (>ts));
   else
 	{
-	  sprintf (msg, _("Operands of user operator '%s' at %%L are %s/%s"),
+	  sprintf (msg, _("Operands of user operator %%<%s%%> at %%L are %s/%s"),
 		   e->value.op.uop->name, gfc_typename (>ts),
 		   

Splitting up gcc/omp-low.c? (was: [hsa 5/10] OpenMP lowering/expansion changes (gridification))

2015-12-09 Thread Thomas Schwinge
Hi!

I've been meaning to suggest this for some time already:

On Wed, 9 Dec 2015 14:19:30 +0100, Jakub Jelinek  wrote:
> As for omp-low.c changes, the file is already large enough that it would be
> nice if it is easy to find out what routines are for gridification purposes
> only, use some special prefix (grid_*, ompgrid_*, ...) for all such
> functions?

In addition to that, how about we split up gcc/omp-low.c into several
files?  Would it make sense (I have not yet looked in detail) to do so
along the borders of the several passes defined therein?  Or, can you
tell already that there would be too many cross-references between the
several files to make this infeasible?

I'd suggest to do this shortly before GCC 6 is released, so that
backports from trunk to gcc-6-branch will be easy.  (I assume we don't
have to care for gcc-5-branch backports too much any longer.)

$ wc -l gcc/*.c | sort -r -n | head
  879881 total
   25770 gcc/dwarf2out.c
   19834 gcc/omp-low.c
   14419 gcc/fold-const.c
   14357 gcc/combine.c
   14003 gcc/tree.c
   11622 gcc/expr.c
   11610 gcc/gimplify.c
   10417 gcc/tree-vrp.c
   10328 gcc/var-tracking.c


Grüße
 Thomas


signature.asc
Description: PGP signature


RE: [AArch64] Emit square root using the Newton series

2015-12-09 Thread Evandro Menezes
Hi, Marcus.

I've run Geekbench, SPEC CPU2000 and synthetic benchmarks.

I can share these results iterating an array with values between 1 and 100 
and taking their square root:

Million Operations/sJuno
A53 @850MHz A57 @1100MHz
X^½ DP  Canon31  37 
Newton   13  39 
%Δ  -57%6%
SP  Canon48  144 
Newton   18  62 
%Δ  -63%-57%
X^-½DP  Canon17  16 
Newton   14  42 
%Δ  -17%155%
SP  Canon28  70 
Newton   20  62 
%Δ  -30%-11%

As you can see, it's a mixed result for A57 and a definite regression for A53.  
In mnost benchmarks overall, this is not a good optimization for A57.  That's 
why I left it as a target-specific tuning.

Thank you,

-- 
Evandro Menezes  Austin, TX


> -Original Message-
> From: Marcus Shawcroft [mailto:marcus.shawcr...@gmail.com]
> Sent: Wednesday, December 09, 2015 8:06
> To: Evandro Menezes
> Cc: GCC Patches; Marcus Shawcroft; James Greenhalgh; Andrew Pinski; Benedikt
> Huber; philipp.toms...@theobroma-systems.com
> Subject: Re: [AArch64] Emit square root using the Newton series
> 
> On 8 December 2015 at 21:35, Evandro Menezes  wrote:
> >Emit square root using the Newton series
> >
> >2015-12-03  Evandro Menezes  
> >
> >gcc/
> > * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt):
> >Declare new
> > function.
> > * config/aarch64/aarch64-simd.md (sqrt2): New
> >expansion and
> > insn definitions.
> > * config/aarch64/aarch64-tuning-flags.def
> > (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro.
> > * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define
> >new function.
> > * config/aarch64/aarch64.md (sqrt2): New expansion
> >and insn
> > definitions.
> > * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt):
> >Expand option
> > description.
> > * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.
> >
> > This patch extends the patch that added support for implementing
> > x^-1/2 using the Newton series by adding support for x^1/2 as well.
> 
> Hi Evandro, What benchmarking have you done on this patch?
> /M



Re: [PATCH] Make basic asm implicitly clobber memory, pr24414

2015-12-09 Thread Bernd Edlinger
Hi,

On 09.12.2015 16:48 Bernd Schmidt wrote:
> On 12/09/2015 04:09 PM, Bernd Edlinger wrote:
>
>> So would you agree on the general direction of the patch,
>> if I drop the hunk in sched-deps.c ?
>
> I'm not sure there was any consensus in that other thread, but I think 
> assuming that basic asms clobber memory and CC, can be justified. That 
> certainly seems like a safer default. Ideally though I think it would 
> be best if we could deprecate basic asms in functions, or at least 
> warn about them in -Wall.
>
>

Well no, we did not get to a consensus on the warning issue.

My personal gut feeling on that warning is a bit mixed...

If we have a -Wall-enabled warning on asm("..."), people who know next 
to nothing
about assembler will be encouraged to "fix" this warning in a part of 
the code which
they probably do not understand at all. This frightens me a bit.

Because I know they will soon find out, that adding a few colons fixes 
the warning, but
asm("...":::) is not any better IMHO.

For me, it is just very very unlikely that any piece of assembler really 
clobbers nothing
and has no inputs and no outputs at the same time, even it it looks so 
at first sight...

It is much more likely that someone forgot to fill in the clobber section.

So for me it would also be good to warn on asm("...":::) and require 
that, if they want
to fix this warning, they must at least write something in the clobber
section, like asm ("...":::"nothing"); that would be a new clobber name 
which can only
stand alone and, which can get stripped after the warning processing 
took place in the FE.

So I think a warning should warn on something that is so unusual that it 
is likely a bug.


Bernd.


Re: [gomp4] Fix Fortran deviceptr

2015-12-09 Thread James Norris

Cesar,

On 12/08/2015 11:10 AM, Cesar Philippidis wrote:

On 12/08/2015 08:22 AM, James Norris wrote:


   2. It appears that deviceptr code in GOACC_parallel_keyed is mostly
  identical to GOACC_data_start. Can you put that duplicate code into
  a function? That would be easier to maintain in the long run.



Fixed.


Where? I don't see a patch.


Opp... Now attached.




Since you're working on fortran, can you take a look at how
gfc_match_omp_clauses is handling OMP_CLAUSE_DEVICEPTR. It seems overly
confusing to me. Could it be done in a similar way as OMP_CLAUSE_COPYIN,
i.e., using gfc_match_omp_map_clause?



Confusion removed and replaced with code similar to how
the other clauses are handled.

Patch to be committed after testing completes.

Thanks!
Jim
diff --git a/gcc/fortran/ChangeLog.gomp b/gcc/fortran/ChangeLog.gomp
index 00e5746..d05326d 100644
--- a/gcc/fortran/ChangeLog.gomp
+++ b/gcc/fortran/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2015-12-09  James Norris  
+
+	* openmp.c (gfc_match_omp_clauses): Re-write handling of the
+	deviceptr clause.
+
 2015-12-08  Thomas Schwinge  
 
 	* gfortran.h (symbol_attribute): Add oacc_function_nohost member.
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index b59528be..78d2e1e 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -818,19 +818,10 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
    OMP_MAP_ALLOC))
 	continue;
   if ((mask & OMP_CLAUSE_DEVICEPTR)
-	  && gfc_match ("deviceptr ( ") == MATCH_YES)
-	{
-	  gfc_omp_namelist **list = >lists[OMP_LIST_MAP];
-	  gfc_omp_namelist **head = NULL;
-	  if (gfc_match_omp_variable_list ("", list, true, NULL, , false)
-	  == MATCH_YES)
-	{
-	  gfc_omp_namelist *n;
-	  for (n = *head; n; n = n->next)
-		n->u.map_op = OMP_MAP_FORCE_DEVICEPTR;
-	  continue;
-	}
-	}
+	  && gfc_match ("deviceptr ( ") == MATCH_YES
+	  && gfc_match_omp_map_clause (>lists[OMP_LIST_MAP],
+   OMP_MAP_FORCE_DEVICEPTR))
+	continue;
   if ((mask & OMP_CLAUSE_BIND) && c->routine_bind == NULL
 	  && gfc_match ("bind ( %s )", >routine_bind) == MATCH_YES)
 	{
diff --git a/libgomp/ChangeLog.gomp b/libgomp/ChangeLog.gomp
index d88cb94..ec133a7 100644
--- a/libgomp/ChangeLog.gomp
+++ b/libgomp/ChangeLog.gomp
@@ -1,3 +1,11 @@
+2015-12-09  James Norris  
+
+	* oacc-parallel.c (handle_ftn_pointers): New function.
+	(GOACC_parallel_keyed, GOACC_data_start): Factor out Fortran pointer
+	handling.
+	* testsuite/libgomp.oacc-fortran/declare-1.f90: Add comment.
+	* testsuite/libgomp.oacc-fortran/deviceptr-1.f90: Fix comment.
+
 2015-12-08  James Norris  
 
 	* testsuite/libgomp.oacc-fortran/kernels-map-1.f90: Add new test.
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index a606152..f68db78 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -57,6 +57,51 @@ find_pointer (int pos, size_t mapnum, unsigned short *kinds)
   return 0;
 }
 
+/* Handle the mapping pair that are presented when a
+   deviceptr clause is used with Fortran.  */
+
+static void
+handle_ftn_pointers (size_t mapnum, void **hostaddrs, size_t *sizes,
+		 unsigned short *kinds)
+{
+  int i;
+
+  for (i = 0; i < mapnum; i++)
+{
+  unsigned short kind1 = kinds[i] & 0xff;
+
+  /* Handle Fortran deviceptr clause.  */
+  if (kind1 == GOMP_MAP_FORCE_DEVICEPTR)
+	{
+	  unsigned short kind2;
+
+	  if (i < (signed)mapnum - 1)
+	kind2 = kinds[i + 1] & 0xff;
+	  else
+	kind2 = 0x;
+
+	  if (sizes[i] == sizeof (void *))
+	continue;
+
+	  /* At this point, we're dealing with a Fortran deviceptr.
+	 If the next element is not what we're expecting, then
+	 this is an instance of where the deviceptr variable was
+	 not used within the region and the pointer was removed
+	 by the gimplifier.  */
+	  if (kind2 == GOMP_MAP_POINTER
+	  && sizes[i + 1] == 0
+	  && hostaddrs[i] == *(void **)hostaddrs[i + 1])
+	{
+	  kinds[i+1] = kinds[i];
+	  sizes[i+1] = sizeof (void *);
+	}
+
+	  /* Invalidate the entry.  */
+	  hostaddrs[i] = NULL;
+	}
+}
+}
+
 static void goacc_wait (int async, int num_waits, va_list *ap);
 
 
@@ -99,40 +144,7 @@ GOACC_parallel_keyed (int device, void (*fn) (void *),
   thr = goacc_thread ();
   acc_dev = thr->dev;
 
-  for (i = 0; i < mapnum; i++)
-{
-  unsigned short kind1 = kinds[i] & 0xff;
-
-  /* Handle Fortran deviceptr clause.  */
-  if (kind1 == GOMP_MAP_FORCE_DEVICEPTR)
-	{
-	  unsigned short kind2;
-
-	  if (i < (signed)mapnum - 1)
-	kind2 = kinds[i + 1] & 0xff;
-	  else
-	kind2 = 0x;
-
-	  if (sizes[i] == sizeof (void *))
-	continue;
-
-	  /* At this point, we're dealing with a Fortran deviceptr.
-	 If the next element is not what we're expecting, then
-	 this is an instance of where the deviceptr variable was
-	 not used within the 

RE: [PATCH][ARC] Refurbish emitting DWARF2 for epilogue.

2015-12-09 Thread Claudiu Zissulescu
> The main point of having rtl epilogues is to allow such optimizations.
> Traditionally, we have said that at -O1, it is OK to curb optimizations for 
> the
> sake of having programs that are saner to debug, while -O2 and above should
> just optimize, and the debug info generation is just thrown in afterwards as a
> best-effort attempt.
> 
> Additionally, we also have -Og now.

Well, it seems to me that we prefer to disable optimizations when talking about 
debug related information (see PR target/60598 git-svn-id: 
svn+ssh://gcc.gnu.org/svn/gcc/trunk@208749 138bc75d-0d04-0410-961f-82ee72b054a4 
commit).  

> 
> You can also consider having separate options to control optimizations that
> affect debugging.
> If leaving out epilogue cfi is what it takes to allow epilogue scheduling
> without the compiler crashing, then that is what should be done by default at
> -O2, but if someone finds that particularly vexing, they might appreciate a -
> mepilogue-cfi option to change that default, and instead disable some
> scheduling (delay slot and otherwise).

Unfortunately, dwarf2cfi checks the paths for consistency (dwarf2cfi.c:2284), 
throwing an error if those paths are not ok. Also, with ARC gcc we focus on 
Linux type of application, hence, the unwinding info needs to be consistent.  
As far as I can see, having a sort of mno-epilogue-cfi option will just inhibit 
or not the emitting of the blockage instruction, and the option will be valid 
only when compiling without emitting dwarf information (i.e., the 
mno-epilogue-cfi is incompatible with -g). 
Personally, I do not see the benefit of having such an option, as one may lose 
like 1 cycle per function call (HS/EM cpus) in very particular cases. Running 
the dg.exp, compile.exp, execute.exp, and builing Linux with some default apps, 
we found only 4 cases in which the branch scheduler slot needs the blockage 
mechanism. Also, adding blockage before generating any prologue instruction 
seems to be a wide spread practice in gcc backends (see SH for example). 

Best,
Claudiu


Re: [PATCH] Add testcase for c++/68348

2015-12-09 Thread Marek Polacek
On Wed, Dec 09, 2015 at 03:39:50PM +, Kyrill Tkachov wrote:
> On 09/12/15 15:38, Marek Polacek wrote:
> >This adds a testcase for the already fixed PR68348.
> >
> >Tested on x86_64-linux, ok for trunk?
> >
> >2015-12-09  Marek Polacek  
> >
> > PR c++/68348
> > * g++.dg/cpp0x/pr68348.C: New test.
> >
> >diff --git gcc/testsuite/g++.dg/cpp0x/pr68348.C 
> >gcc/testsuite/g++.dg/cpp0x/pr68348.C
> >index e69de29..9033bba 100644
> >--- gcc/testsuite/g++.dg/cpp0x/pr68348.C
> >+++ gcc/testsuite/g++.dg/cpp0x/pr68348.C
> >@@ -0,0 +1,18 @@
> >+// PR c++/68348
> >+// { dg-do compile { target c++11 } }
> >+
> >+struct C {
> >+  constexpr C() : w(), x(), y() {}
> >+  constexpr double fn() const noexcept;
> >+  double w;
> >+  double x;
> >+  double y;
> >+};
> >+
> >+constexpr double C::fn() const noexcept { return w; }
> >+C foo()
> >+{
> >+  C c;
> >+  c.fn ();
> >+  return c;
> >+}
> >
> > Marek
> >
> 
> Same as:
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01001.html
> :)

Ah!  The PR wasn't assigned to anyone.  But I think my version is better
because I think we prefer
// { dg-do compile { target c++11 } }
to
/* { dg-options "-std=c++11 -O2" } */
;)

Marek


Re: [PATCH, testsuite] Fix sse4_1-round* inline asm statements

2015-12-09 Thread Uros Bizjak
On Wed, Dec 9, 2015 at 8:18 AM, Uros Bizjak  wrote:

> Saying that, I see we don't need to define ASM_SUFFIX anymore. I'll
> prepare the patch that removes these #defines.

2015-12-09  Uros Bizjak  

* gcc.target/i386/sse4_1-roundps-1.c: Remove ASM_SUFFIX define.
* gcc.target/i386/sse4_1-roundps-2.c: Ditto.
* gcc.target/i386/sse4_1-roundps-3.c: Ditto.
* gcc.target/i386/sse4_1-roundsd-1.c: Ditto.
* gcc.target/i386/sse4_1-roundsd-2.c: Ditto.
* gcc.target/i386/sse4_1-roundsd-3.c: Ditto.
* gcc.target/i386/sse4_1-roundss-1.c: Ditto.
* gcc.target/i386/sse4_1-roundss-2.c: Ditto.
* gcc.target/i386/sse4_1-roundss-3.c: Ditto.

Tested on x86_64-linux-gnu {,-m32} and committed to mainline SVN.
patch will be backported to all release branches.

Uros.
Index: gcc.target/i386/sse4_1-roundps-1.c
===
--- gcc.target/i386/sse4_1-roundps-1.c  (revision 231440)
+++ gcc.target/i386/sse4_1-roundps-1.c  (working copy)
@@ -7,7 +7,6 @@
 
 #define VEC_T __m128
 #define FP_T float
-#define ASM_SUFFIX "s"
 
 #define ROUND_INTRIN(x, mode) _mm_ceil_ps(x)
 #define ROUND_MODE _MM_FROUND_CEIL
Index: gcc.target/i386/sse4_1-roundss-3.c
===
--- gcc.target/i386/sse4_1-roundss-3.c  (revision 231440)
+++ gcc.target/i386/sse4_1-roundss-3.c  (working copy)
@@ -7,7 +7,6 @@
 
 #define VEC_T __m128
 #define FP_T float
-#define ASM_SUFFIX "s"
 
 #define ROUND_INTRIN(x, mode) _mm_floor_ss(x, x)
 #define ROUND_MODE _MM_FROUND_FLOOR
Index: gcc.target/i386/sse4_1-roundps-2.c
===
--- gcc.target/i386/sse4_1-roundps-2.c  (revision 231440)
+++ gcc.target/i386/sse4_1-roundps-2.c  (working copy)
@@ -7,7 +7,6 @@
 
 #define VEC_T __m128
 #define FP_T float
-#define ASM_SUFFIX "s"
 
 #define ROUND_INTRIN _mm_round_ps
 #define ROUND_MODE _MM_FROUND_NINT
Index: gcc.target/i386/sse4_1-roundps-3.c
===
--- gcc.target/i386/sse4_1-roundps-3.c  (revision 231440)
+++ gcc.target/i386/sse4_1-roundps-3.c  (working copy)
@@ -7,7 +7,6 @@
 
 #define VEC_T __m128
 #define FP_T float
-#define ASM_SUFFIX "s"
 
 #define ROUND_INTRIN(x, mode) _mm_floor_ps(x)
 #define ROUND_MODE _MM_FROUND_FLOOR
Index: gcc.target/i386/sse4_1-roundsd-1.c
===
--- gcc.target/i386/sse4_1-roundsd-1.c  (revision 231440)
+++ gcc.target/i386/sse4_1-roundsd-1.c  (working copy)
@@ -7,7 +7,6 @@
 
 #define VEC_T __m128d
 #define FP_T double
-#define ASM_SUFFIX "l"
 
 #define ROUND_INTRIN(x, mode) _mm_ceil_sd(x, x)
 #define ROUND_MODE _MM_FROUND_CEIL
Index: gcc.target/i386/sse4_1-roundsd-2.c
===
--- gcc.target/i386/sse4_1-roundsd-2.c  (revision 231440)
+++ gcc.target/i386/sse4_1-roundsd-2.c  (working copy)
@@ -7,7 +7,6 @@
 
 #define VEC_T __m128d
 #define FP_T double
-#define ASM_SUFFIX "l"
 
 #define ROUND_INTRIN(x, mode) _mm_round_sd(x, x, mode)
 #define ROUND_MODE _MM_FROUND_NINT
Index: gcc.target/i386/sse4_1-roundsd-3.c
===
--- gcc.target/i386/sse4_1-roundsd-3.c  (revision 231440)
+++ gcc.target/i386/sse4_1-roundsd-3.c  (working copy)
@@ -7,7 +7,6 @@
 
 #define VEC_T __m128d
 #define FP_T double
-#define ASM_SUFFIX "l"
 
 #define ROUND_INTRIN(x, mode) _mm_floor_sd(x, x)
 #define ROUND_MODE _MM_FROUND_FLOOR
Index: gcc.target/i386/sse4_1-roundss-1.c
===
--- gcc.target/i386/sse4_1-roundss-1.c  (revision 231440)
+++ gcc.target/i386/sse4_1-roundss-1.c  (working copy)
@@ -7,7 +7,6 @@
 
 #define VEC_T __m128
 #define FP_T float
-#define ASM_SUFFIX "s"
 
 #define ROUND_INTRIN(x, mode) _mm_ceil_ss(x, x)
 #define ROUND_MODE _MM_FROUND_CEIL
Index: gcc.target/i386/sse4_1-roundss-2.c
===
--- gcc.target/i386/sse4_1-roundss-2.c  (revision 231440)
+++ gcc.target/i386/sse4_1-roundss-2.c  (working copy)
@@ -7,7 +7,6 @@
 
 #define VEC_T __m128
 #define FP_T float
-#define ASM_SUFFIX "s"
 
 #define ROUND_INTRIN(x, mode) _mm_round_ss(x, x, mode)
 #define ROUND_MODE _MM_FROUND_NINT


Re: [PATCH] Add levels to -Wmisleading-indentation; add level 1 to -Wall

2015-12-09 Thread Bernd Schmidt

On 12/09/2015 04:38 PM, David Malcolm wrote:

+/* The following function contains examples of bad indentation that's
+   arguably not misleading, due to a blank line between the guarded and the
+   non-guarded code.  Some of the blank lines deliberately contain
+   redundant whitespace, to verify that this is handled.
+   Based on examples seen in gcc/fortran/io.c, gcc/function.c, and
+   gcc/regrename.c.
+
+   At -Wmisleading-indentation (implying level 1) we shouldn't emit
+   warnings for this function.  Compare with
+   fn_40_level_1 in Wmisleading-indentation-level-1.c and
+   fn_40_level_2 in Wmisleading-indentation-level-2.c.  */
+
+void
+fn_40_implicit_level_1 (int arg)
+{
+if (flagA)
+  foo (0);
+
+  foo (1);
+


I personally see no use for the blank line heuristic, in fact I think it 
is misguided. To my eyes a warning is clearly warranted for the examples 
in this testcase. Do we actually have any users calling for this heuristic?



Bernd


Re: [PATCH] Make basic asm implicitly clobber memory, pr24414

2015-12-09 Thread Bernd Schmidt

On 12/09/2015 04:09 PM, Bernd Edlinger wrote:


So would you agree on the general direction of the patch,
if I drop the hunk in sched-deps.c ?


I'm not sure there was any consensus in that other thread, but I think 
assuming that basic asms clobber memory and CC, can be justified. That 
certainly seems like a safer default. Ideally though I think it would be 
best if we could deprecate basic asms in functions, or at least warn 
about them in -Wall.



Bernd


[committed] Avoid undefined behavior in test.

2015-12-09 Thread Alexander Monakov
I have committed the following testsuite patch as obvious.

The test calls a variadic function that extracts pointers with va_arg, but the
terminating NULL is passed as an int, not a pointer.  This wouldn't trip on
32-bit architectures, and even on 64-bit the test simply iterates until it
gets a NULL stack slot (in any containing frame).

Noticed while testing NVPTX with -mgomp.

2015-12-09  Alexander Monakov  

* gcc.c-torture/execute/980716-1.c: Avoid undefined behavior due to
passing terminating NULL as int rather than pointer.

--- gcc.c-torture/execute/980716-1.c(revision 231457)
+++ gcc.c-torture/execute/980716-1.c(working copy)
@@ -20,7 +20,7 @@
 int
 main()
 {
-stub(1, "ab", "bc", "cx", 0);
+stub(1, "ab", "bc", "cx", (char *)0);
 exit (0);
 }



Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment

2015-12-09 Thread Alessandro Fanfarillo
Done.

I have permission for contributing but I don't have write permission
on the repository.

2015-12-09 8:23 GMT+01:00 Tobias Burnus :
> Alessandro Fanfarillo wrote:
>>
>> in attachment the new patch. I also checked the behavior with
>> move_alloc: it synchronizes right after the deregistration of the
>> destination.
>> I also noticed that __asm__ __volatile__ ("":::"memory") is called
>> before sync all and not after. It shouldn't be a problem, right?
>
>
> The patch looks mostly good to me, except:
>
>> @@ -1222,6 +1230,15 @@ gfc_conv_intrinsic_caf_get (gfc_se *se, gfc_expr
>> *expr, tree lhs, tree lhs_kind,
>> se->expr = res_var;
>> if (array_expr->ts.type == BT_CHARACTER)
>>   se->string_length = argse.string_length;
>> +
>> +  /* It guarantees memory consistency within the same segment */
>> +  /* tmp = gfc_build_string_const (strlen ("memory")+1, "memory"), */
>> +  /* tmp = build5_loc (input_location, ASM_EXPR, void_type_node, */
>> +  /*   gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
>> */
>> +  /*   tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE); */
>> +  /* ASM_VOLATILE_P (tmp) = 1; */
>> +  /* gfc_add_expr_to_block (>pre, tmp); */
>> +
>>   }
>
>
> Do not add out-commented code. Please remove.
>
>
>>  gfc_trans_critical (gfc_code *code)
>>  {
>>stmtblock_t block;
>>tree tmp, token = NULL_TREE;
>>
>>gfc_start_block ();
>>
>>if (flag_coarray == GFC_FCOARRAY_LIB)
>>  {
>>token = gfc_get_symbol_decl (code->resolved_sym);
>>token = GFC_TYPE_ARRAY_CAF_TOKEN (TREE_TYPE (token));
>>tmp = build_call_expr_loc (input_location, gfor_fndecl_caf_lock, 7,
>>  token, integer_zero_node,
>> integer_one_node,
>>  null_pointer_node, null_pointer_node,
>>  null_pointer_node, integer_zero_node);
>>gfc_add_expr_to_block (, tmp);
>>  }
>>
>> +  /* It guarantees memory consistency within the same segment */
>> +  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
>> +tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
>> + gfc_build_string_const (1, ""), NULL_TREE,
>> NULL_TREE,
>> + tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
>> +  ASM_VOLATILE_P (tmp) = 1;
>> +
>> +  gfc_add_expr_to_block (, tmp);
>> +
>>tmp = gfc_trans_code (code->block->next);
>>gfc_add_expr_to_block (, tmp);
>>
>>if (flag_coarray == GFC_FCOARRAY_LIB)
>>  {
>>tmp = build_call_expr_loc (input_location, gfor_fndecl_caf_unlock,
>> 6,
>>  token, integer_zero_node,
>> integer_one_node,
>>  null_pointer_node, null_pointer_node,
>>  integer_zero_node);
>>gfc_add_expr_to_block (, tmp);
>>  }
>>
>> +  /* It guarantees memory consistency within the same segment */
>> +  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
>> +  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
>> +   gfc_build_string_const (1, ""), NULL_TREE,
>> NULL_TREE,
>> +   tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
>> +  ASM_VOLATILE_P (tmp) = 1;
>> +
>> +  gfc_add_expr_to_block (, tmp);
>>
>>return gfc_finish_block ();
>>  }
>
>
> Please move the two new code additions into the associated "if (flag_coarray
> == GFC_FCOARRAY_LIB)" blocks - and keep an eye on the indentation: for the
> current code location, the second block is wrongly idented.
>
>
> With the two issues fixed: LGTM.
>
> Cheers,
>
> Tobias
>
> PS: I assume you can still commit yourself. I am asking because Steve did
> commit the recent EVENT patch for you.
2015-12-09  Tobias Burnus  
Alessandro Fanfarillo 

* trans.c (gfc_allocate_using_lib,gfc_deallocate_with_status):
Introducing __asm__ __volatile__ ("":::"memory")
after image control statements.
* trans-stmt.c  (gfc_trans_sync, gfc_trans_event_post_wait,
gfc_trans_lock_unlock, gfc_trans_critical): Ditto.
* trans-intrinsic.c (gfc_conv_intrinsic_caf_get,
conv_caf_send): Introducing __asm__ __volatile__ ("":::"memory")
after send, before get and around sendget.

2015-12-09  Tobias Burnus  
Alessandro Fanfarillo 

* gfortran.dg/coarray_40.f90: New.

commit 96975d401c4b66c0362d853bf6b604cda171552b
Author: Alessandro Fanfarillo 
Date:   Wed Dec 9 17:05:28 2015 +0100

Introducing __asm__ __volatile__ (:::memory) after image control 
statements, send, get and sendget

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 21efe44..31bad35 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -1211,6 +1211,14 

[PATCH] Better error recovery for merge-conflict markers (v4)

2015-12-09 Thread David Malcolm
On Wed, 2015-11-04 at 14:56 +0100, Bernd Schmidt wrote:
> On 10/30/2015 04:16 PM, David Malcolm wrote:
> > The idea is to more gracefully handle merger conflict markers
> > in the source code being compiled.  Specifically, in the C and
> > C++ frontends, if we're about to emit an error, see if the
> > source code is at a merger conflict marker, and if so, emit
> > a more specific message, so that the user gets this:
> >
> > foo.c:1:1: error: source file contains patch conflict marker
> >   <<< HEAD
> >   ^
> >
> > It's something of a "fit and finish" cosmetic item, but these
> > things add up.
>
> This seems like fairly low impact but also low cost, so I'm fine with it
> in principle. I wonder whether the length of the marker is the same
> across all versions of patch (and VC tools)?

It's hardcoded for GNU patch:
  http://git.savannah.gnu.org/cgit/patch.git/tree/src/merge.c
which hardcodes e.g.:
  fputs (outstate->after_newline + "\n<<<\n", fp);

I don't know if it's hardcoded for CVS or Subversion, but both have
documentation showing that format:
 ftp://ftp.gnu.org/old-gnu/Manuals/cvs/html_node/cvs_38.html
 http://svnbook.red-bean.com/en/1.7/svn.tour.cycle.html#svn.tour.cycle.resolve

It's the default of git:
  http://git.kernel.org/cgit/git/git.git/tree/Documentation/merge-config.txt
(config option "merge.conflictStyle")
This git commit (to git) seems to have generalized it to support a
"conflict-marker-size" attribute:
  https://github.com/git/git/commit/8588567c96490b8d236b1bc13f9bcb0dfa118efe

Mercurial uses them; the format appears to be a keyword-argument in:
  https://selenic.com/hg/file/tip/mercurial/simplemerge.py#l91
but it's hardcoded in this regex in filemerge.py:
if re.search("^(<<< .*|===|>>> .*)$", fcd.data(),

Bazaar uses them; see e.g.:
  
http://bazaar.launchpad.net/~bzr-pqm/bzr/bzr.dev/view/head:/bzrlib/tests/test_merge3.py
(I couldn't easily tell if they're configurable)

FWIW, Perforce appears to use a different format;
http://www.perforce.com/perforce/doc.current/manuals/p4guide/chapter.resolve.html
has an example showing:

 ORIGINAL file#n
(text from the original version)
 THEIR file#m
(text from their file)
 YOURS file
(text from your file)


>From what I can tell, Perforce is the outlier here.

> > +static bool
> > +c_parser_peek_conflict_marker (c_parser *parser, enum cpp_ttype tok1_kind)
> > +{
> > +  c_token *token2 = c_parser_peek_2nd_token (parser);
> > +  if (token2->type != tok1_kind)
> > +return false;
> > +  c_token *token3 = c_parser_peek_nth_token (parser, 3);
> > +  if (token3->type != tok1_kind)
> > +return false;
> > +  c_token *token4 = c_parser_peek_nth_token (parser, 4);
> > +  if (token4->type != conflict_marker_get_final_tok_kind (tok1_kind))
> > +return false;
> > +  return true;
> > +}
>
> Just thinking out loud - I guess it would be too much to hope for to
> share lexers between frontends so that we need only one copy of this?

Probably :(

> > +extern short some_var; /* this line would lead to a warning */
>
> Would or does? I don't see anything suppressing it?

It's skipped in error-handling.

c_parser_declaration_or_fndef has:
1794  declarator = c_parser_declarator (parser,
1795specs->typespec_kind != 
ctsk_none,
1796C_DTR_NORMAL, );
1797  if (declarator == NULL)
1798{
[...snip...]
1807  c_parser_skip_to_end_of_block_or_statement (parser);

The call to c_parser_declarator fails, and when issuing:
3465  c_parser_error (parser, "expected identifier or %<(%>");
we emit the "conflict marker" wording error for the error.

Then at line 1807 we skip, discarding everything up to the ";" in that
decl.

Would a better wording be:

extern short some_var; /* This line would lead to a warning due to the
  duplicate name, but it is skipped when handling
  the conflict marker.  */

> There seems to be no testcase verifying what happens if the marker is
> not at the start of the line (IMO it should not be interpreted as a marker).

The v3 patch actually reported them as markers regardless of
location.
The v4 patch now verifies that they are at the start of the line;
I've added test coverage for this (patch-conflict-markers-11.c).

That said, it's not clear they're always at the beginning of a line;
this bazaar bug indicates that CVS (and bazaar) can emit them
mid-line:
  https://bugs.launchpad.net/bzr/+bug/36399

I noticed a visual glitch with the v3 patch now that we have range
information for tokens:  with caret-printing, we get just the first
token within the marker underlined:

 <<< HEAD
 ^~

which looks strange (especially with the underlined chars colorized).

Hence in the v4 patch I've added a location tweak so that it
underline/colorizes *all* of the marker:

 <<< HEAD
 ^~~

Wording-wise, should it be 

[PATCH v2] Do not sanitize left shifts for -fwrapv (PR68418)

2015-12-09 Thread Paolo Bonzini
Left shifts into the sign bit is a kind of overflow, and the
standard chooses to treat left shifts of negative values the
same way.

However, the -fwrapv option modifies the language to one where
integers are defined as two's complement---which also defines
entirely the behavior of shifts.  Disable sanitization of left
shifts when -fwrapv is in effect, using the same logic as
instrument_si_overflow.  The same change was proposed
for LLVM at https://llvm.org/bugs/show_bug.cgi?id=25552.

Bootstrapped/regtested x86_64-pc-linux-gnu.  Ok for trunk, and for
GCC 5 branch after 5.3 is released?

Thanks,

Paolo

gcc:
PR sanitizer/68418
* c-family/c-ubsan.c (ubsan_instrument_shift): Disable
sanitization of left shifts for wrapping signed types as well.

gcc/testsuite:
PR sanitizer/68418
* gcc.dg/ubsan/c99-wrapv-shift-1.c,
gcc.dg/ubsan/c99-wrapv-shift-2.c: New testcases.

Index: c-family/c-ubsan.c
===
--- c-family/c-ubsan.c  (revision 231289)
+++ c-family/c-ubsan.c  (working copy)
@@ -124,12 +124,17 @@ ubsan_instrument_shift (location_t loc, enum tree_
   t = fold_convert_loc (loc, op1_utype, op1);
   t = fold_build2 (GT_EXPR, boolean_type_node, t, uprecm1);
 
+  /* If this is not a signed operation, don't perform overflow checks.
+ Also punt on bit-fields.  */
+  if (!INTEGRAL_TYPE_P (type0)
+  || TYPE_OVERFLOW_WRAPS (type0)
+  || GET_MODE_BITSIZE (TYPE_MODE (type0)) != TYPE_PRECISION (type0))
+;
+
   /* For signed x << y, in C99/C11, the following:
  (unsigned) x >> (uprecm1 - y)
  if non-zero, is undefined.  */
-  if (code == LSHIFT_EXPR
-  && !TYPE_UNSIGNED (type0)
-  && flag_isoc99)
+  else if (code == LSHIFT_EXPR && flag_isoc99 && cxx_dialect < cxx11)
 {
   tree x = fold_build2 (MINUS_EXPR, op1_utype, uprecm1,
fold_convert (op1_utype, unshare_expr (op1)));
@@ -142,9 +147,7 @@ ubsan_instrument_shift (location_t loc, enum tree_
   /* For signed x << y, in C++11 and later, the following:
  x < 0 || ((unsigned) x >> (uprecm1 - y))
  if > 1, is undefined.  */
-  if (code == LSHIFT_EXPR
-  && !TYPE_UNSIGNED (type0)
-  && (cxx_dialect >= cxx11))
+  else if (code == LSHIFT_EXPR && cxx_dialect >= cxx11)
 {
   tree x = fold_build2 (MINUS_EXPR, op1_utype, uprecm1,
fold_convert (op1_utype, unshare_expr (op1)));
Index: testsuite/gcc.dg/ubsan/c99-wrapv-shift-1.c
===
--- testsuite/gcc.dg/ubsan/c99-wrapv-shift-1.c  (revision 0)
+++ testsuite/gcc.dg/ubsan/c99-wrapv-shift-1.c  (working copy)
@@ -0,0 +1,9 @@
+/* { dg-do run } */
+/* { dg-options "-fsanitize=shift -fwrapv -w -std=c99" } */
+
+int
+main (void)
+{
+  int a = -42;
+  a << 1;
+}
Index: testsuite/gcc.dg/ubsan/c99-wrapv-shift-2.c
===
--- testsuite/gcc.dg/ubsan/c99-wrapv-shift-2.c  (revision 0)
+++ testsuite/gcc.dg/ubsan/c99-wrapv-shift-2.c  (working copy)
@@ -0,0 +1,9 @@
+/* { dg-do run } */
+/* { dg-options "-fsanitize=shift -fwrapv -w -std=c99" } */
+
+int
+main (void)
+{
+  int a = 1;
+  a <<= 31;
+}


Re: Splitting up gcc/omp-low.c?

2015-12-09 Thread Bernd Schmidt

On 12/09/2015 05:24 PM, Thomas Schwinge wrote:


In addition to that, how about we split up gcc/omp-low.c into several
files?  Would it make sense (I have not yet looked in detail) to do so
along the borders of the several passes defined therein?  Or, can you
tell already that there would be too many cross-references between the
several files to make this infeasible?


It would be nice to get rid of all the code duplication in that file. 
That alone could reduce the size by quite a bit, and hopefully make it 
easier to read.


I suspect a split along the ompexp/omplow boundary would be quite easy 
to achieve.



I'd suggest to do this shortly before GCC 6 is released, so that
backports from trunk to gcc-6-branch will be easy.  (I assume we don't
have to care for gcc-5-branch backports too much any longer.)


I'll declare myself agnostic as to whether such a change is appropriate 
for gcc-6 at this stage. I guess it kind of depends on the specifics.



Bernd


Re: [PATCH] Better error recovery for merge-conflict markers (v4)

2015-12-09 Thread Bernd Schmidt

On 12/09/2015 05:58 PM, David Malcolm wrote:

On Wed, 2015-11-04 at 14:56 +0100, Bernd Schmidt wrote:


This seems like fairly low impact but also low cost, so I'm fine with it
in principle. I wonder whether the length of the marker is the same
across all versions of patch (and VC tools)?


It's hardcoded for GNU patch:

[...]
>From what I can tell, Perforce is the outlier here.

Thanks for checking all that.


Just thinking out loud - I guess it would be too much to hope for to
share lexers between frontends so that we need only one copy of this?


Probably :(


Someone slap sense into me, I just thought of deriving C and C++ parsers 
from a common base class... (no this is not a suggestion for this patch).



Would a better wording be:

extern short some_var; /* This line would lead to a warning due to the
   duplicate name, but it is skipped when handling
   the conflict marker.  */


I think so, yes.


That said, it's not clear they're always at the beginning of a line;
this bazaar bug indicates that CVS (and bazaar) can emit them
mid-line:
   https://bugs.launchpad.net/bzr/+bug/36399


Ok. CVS I think we shouldn't worry about, and it looks like this is one 
particular bug/corner case where the conflict end marker is the last 
thing in the file. I think on the whole it's best to check for beginning 
of the line as you've done.



Wording-wise, should it be "merge conflict marker", rather
than "patch conflict marker"?

Clang spells it:
"error: version control conflict marker in file"
http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html#merge_conflicts


Yeah, if another compiler has a similar/identical diagnostic I think we 
should just copy that unless there's a very good reason not to.



Rebased on top of r231445 (from yesterday).
Successfully bootstrapped on x86_64-pc-linux-gnu.
Adds 82 new PASSes to g++.sum and 27 new PASSes to gcc.sum.

OK for trunk?


I'm inclined to say yes since it was originally submitted in time and 
it's hard to imagine how the change could be risky (I'll point out right 
away that there are one or two other patches in the queue that were also 
submitted in time which I feel should not be considered for gcc-6 at 
this point due to risk).


Let's wait until the end of the week for objections, commit then.


Bernd


Re: [PATCH] enable loop fusion on isl-15

2015-12-09 Thread Sebastian Pop
Richard Biener wrote:
> On Fri, Dec 4, 2015 at 8:59 PM, Sebastian Paul Pop  wrote:
> > I would highly recommend updating the required version of ISL to isl-0.15:
> > that would simplify the existing code, removing a lot of code under "#ifdef
> > old ISL",
> > and allow us to fully transition to schedule_trees instead of dealing with
> > the
> > old antiquated union_maps in the scheudler.  The result is faster
> > compilation time.
> 
> Hmm.  I think we agreed to raise the requirement to ISL 0.14.  OTOH the plan
> was to make graphite enabled by -O3 [-fprofile-use] by default which would
> mean making ISL a hard host requirement.  That raises the barrier on making
> the version requirement stricter ...
> 
> Sebastian, were quite into stage3 already - what's your plans / progress with
> the defaulting of GRPAHITE?  (compile-time / performance numbers though
> I see ICEs still popping up - a good thing in some sense as it looks like
> GRAPHITE gets testing).

Richard,

to enable Graphite by default in -O3 we had two requirements:
- reduce compilation time when compiling with Graphite,
- show performance improvements when using Graphite.

We did a good work in reducing the overall compilation time used by Graphite
with the rewrite of the SCoP detection algorithm.  Overall compilation time
spent in SCoP detection on tramp3d-v4 was reduced from 7% to 0.3%.  During a
bootstrap of GCC the slowest SCoP detection was 0.3%, and the overall percentage
overhead in terms of number of instructions executed by the SCoP detection was
reduced from 0.3% to 0.01% (measured with counting the number of instructions
with valgrind.)  Details of algorithmic changes and experiments are in a paper
that we submitted (and just got accepted) in the polyhedral workshop IMPACT'16:
https://gcc.gnu.org/wiki/Graphite?action=AttachFile=view=scop-detection-submitted-for-review.pdf

On the code generation side, we made sure to keep the representation in SSA all
the time, such that when Graphite does not transform the code, the original code
is left in place with no change.  When Graphite does a transform, the generated
code is also in SSA form, and we fixed several scalar performance issues linked
to the loop level where scalar definitions were inserted: now we compute the
insertion place such that there is no need for other code motion passes like
LICM to optimize the code generated by Graphite.

We are still working on tuning Graphite's fusion, tiling, and interchange and we
will report improvements on the Polybench loop kernels soon.

Thanks,
Sebastian

> 
> Thanks,
> Richard.
> 
> > Thanks,
> > Sebastian
> >
> > -Original Message-
> > From: Mike Stump [mailto:mikest...@comcast.net]
> > Sent: Friday, December 04, 2015 12:03 PM
> > To: Alan Lawrence
> > Cc: Sebastian Pop; seb...@gmail.com; gcc-patches@gcc.gnu.org;
> > hiradi...@msn.com
> > Subject: Re: [PATCH] enable loop fusion on isl-15
> >
> > On Dec 4, 2015, at 5:13 AM, Alan Lawrence  wrote:
> >> On 05/11/15 21:43, Sebastian Pop wrote:
> >>>* graphite-optimize-isl.c (optimize_isl): Call
> >>>isl_options_set_schedule_maximize_band_depth.
> >>>
> >>>* gcc.dg/graphite/fuse-1.c: New.
> >>>* gcc.dg/graphite/fuse-2.c: New.
> >>>* gcc.dg/graphite/interchange-13.c: Remove bogus check.
> >>
> >> I note that the test
> >>
> >> scan-tree-dump-times forwprop4 "gimple_simplified to[^\\n]*\\^ 12" 1
> >>
> >> FAILs under isl-0.14, with which GCC can still be built and generally
> > claims to work.
> >>
> >> Is it worth trying to detect this in the testsuite, so we can XFAIL it? By
> > which I mean, is there a reasonable testsuite mechanism by which we could do
> > that?
> >
> > You can permanently ignore it by updating to 0.15?  I don't see the
> > advantage of bothering to finesse this too much.  I don't know of a way to
> > detect 14 v 15 other than this test case, but, if you do that, you can't use
> > that result to gate this test case.  If one wanted to engineer in a way, one
> > would expose the isl version via a preprocessor symbol (built in), and then
> > the test case would use that to gate it.  If we had to fix it, I think I'd
> > prefer we just raise the isl version to 15 or later and be done with it.
> >


Re: [PATCH] Add levels to -Wmisleading-indentation; add level 1 to -Wall

2015-12-09 Thread David Malcolm
On Wed, 2015-12-09 at 16:40 +0100, Bernd Schmidt wrote:
> On 12/09/2015 04:38 PM, David Malcolm wrote:
> > +/* The following function contains examples of bad indentation that's
> > +   arguably not misleading, due to a blank line between the guarded and the
> > +   non-guarded code.  Some of the blank lines deliberately contain
> > +   redundant whitespace, to verify that this is handled.
> > +   Based on examples seen in gcc/fortran/io.c, gcc/function.c, and
> > +   gcc/regrename.c.
> > +
> > +   At -Wmisleading-indentation (implying level 1) we shouldn't emit
> > +   warnings for this function.  Compare with
> > +   fn_40_level_1 in Wmisleading-indentation-level-1.c and
> > +   fn_40_level_2 in Wmisleading-indentation-level-2.c.  */
> > +
> > +void
> > +fn_40_implicit_level_1 (int arg)
> > +{
> > +if (flagA)
> > +  foo (0);
> > +
> > +  foo (1);
> > +
> 
> I personally see no use for the blank line heuristic, in fact I think it 
> is misguided. To my eyes a warning is clearly warranted for the examples 
> in this testcase. 

This goes back to the examples given in:
  https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03225.html

This is about managing the signal:noise ratio for something in -Wall.

The distinction I want to make here is between badly indented code vs
misleadingly indented code.  Yes, the code is badly indented, but to my
eyes the code is sufficiently badly indented that it has become *not*
misleading: I can immediately see that something is awry.  I want to
preserve the -Wall level of the warning for the cases when it's not
immediately clear that there's a problem.

The levels idea means that you can turn off the blank-lines heuristic by
setting -Wmisleading-indentation=2.

> Do we actually have any users calling for this heuristic?

FWIW, gcc trunk is clean with the blank-lines heuristic, but not without
it  (see the URL above for the 3 places I know of that fire without the
heuristic).

Dave




Re: [RFA] Transparent alias suport part 10: Fix base+offset alias analysis oracle WRT aliases

2015-12-09 Thread Jan Hubicka
> > +  bool in_symtab1 = decl_in_symtab_p (base1);
> > +  bool in_symtab2 = decl_in_symtab_p (base2);
> > +
> > +  /* Declarations of non-automatic variables may have aliases.  All other
> > + decls are unique.  */
> > +  if (in_symtab1 != in_symtab2 || !in_symtab1)
> > +return 0;
> > +  ret = symtab_node::get_create (base1)->equal_address_to
> > +(symtab_node::get_create (base2), true);
> 
> And I don't like ::get_create too much, why's ::get not enough here?
> Predicates really shouldn't end up changing global state.

I will re-test with ::get here.  I think I only copied the code from folding.
One case we can get decls w/o corresponding symbol table entries during late
compilation is when we introduce a new builtin call.  We may just revisit
all those place and be sure we inser that to symbol table, but that is for next
stage1.

> 
> please refactor to avoid doing this twice WRT !x_decl / !y_decl
> 
> Ok with these changes.

Thanks!
I think I also need to update IPA-PTA code to work on alias targets same way as
I did for ipa-reference (need to construct a testcase) and we sould be down to
missed optimizations and not wrongcodes with extra aliases
(GVN/CSE/operand_equal_p should know about aliases to make them completely
no-op)

Honza
> 
> Thanks,
> Richard.
> 
> > +  /* If decls are different or we know by offsets that there is no 
> > overlap,
> > +we win.  */
> > +  if (!cmp || !offset_overlap_p (c, xsize, ysize))
> > +   return 0;
> > +  /* Decls may or may not be different and offsets overlap*/
> > +  return -1;
> > +}
> > +  else if (rtx_equal_for_memref_p (x, y))
> >  {
> >return offset_overlap_p (c, xsize, ysize);
> >  }
> >  
> >/* This code used to check for conflicts involving stack references and
> >   globals but the base address alias code now handles these cases.  */
> >  
> > @@ -2636,7 +2703,7 @@ nonoverlapping_memrefs_p (const_rtx x, c
> >   are constants or if one is a constant and the other a pointer into the
> >   stack frame.  Otherwise a different base means we can't tell if they
> >   overlap or not.  */
> > -  if (! rtx_equal_p (basex, basey))
> > +  if (! compare_base_decls (exprx, expry))
> >  return ((CONSTANT_P (basex) && CONSTANT_P (basey))
> > || (CONSTANT_P (basex) && REG_P (basey)
> > && REGNO_PTR_FRAME_P (REGNO (basey)))
> > @@ -2647,6 +2714,10 @@ nonoverlapping_memrefs_p (const_rtx x, c
> >if (loop_invariant)
> >  return 0;  
> >  
> > +  /* Offset based disambiguation is OK even if we do not know that the
> > + declarations are necessarily different
> > +(i.e. compare_base_decls (exprx, expry) == 2)  */
> > +
> >sizex = (!MEM_P (rtlx) ? (int) GET_MODE_SIZE (GET_MODE (rtlx))
> >: MEM_SIZE_KNOWN_P (rtlx) ? MEM_SIZE (rtlx)
> >: -1);
> > Index: alias.h
> > ===
> > --- alias.h (revision 231438)
> > +++ alias.h (working copy)
> > @@ -36,6 +36,7 @@ extern int nonoverlapping_memrefs_p (con
> >  extern void dump_alias_stats_in_alias_c (FILE *s);
> >  tree reference_alias_ptr_type (tree);
> >  bool alias_ptr_types_compatible_p (tree, tree);
> > +int compare_base_decls (tree, tree);
> >  
> >  /* This alias set can be used to force a memory to conflict with all
> > other memories, creating a barrier across which no memory reference
> > 
> > 
> 
> -- 
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)


Re: [AArch64] Emit square root using the Newton series

2015-12-09 Thread Kyrill Tkachov

Hi Evandro,

On 08/12/15 21:35, Evandro Menezes wrote:

Emit square root using the Newton series

   2015-12-03  Evandro Menezes  

   gcc/
* config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt):
   Declare new
function.
* config/aarch64/aarch64-simd.md (sqrt2): New
   expansion and
insn definitions.
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro.
* config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define
   new function.
* config/aarch64/aarch64.md (sqrt2): New expansion
   and insn
definitions.
* config/aarch64/aarch64.opt (mlow-precision-recip-sqrt):
   Expand option
description.
* doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.

This patch extends the patch that added support for implementing x^-1/2 using 
the Newton series by adding support for x^1/2 as well.

Is it OK at this point of stage 3?



A comment on the patch itself from me...

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
b/gcc/config/aarch64/aarch64-tuning-flags.def
index 6f7dbce..11c6c9a 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -30,4 +30,4 @@
 
 AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)

 AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT)
-
+AARCH64_EXTRA_TUNING_OPTION ("fast_sqrt", FAST_SQRT)

That seems like a misleading name to me.
If we're doing this, that means that the sqrt instruction is not faster
than doing the inverse sqrt estimation followed by a multiply.
I think a name like "synth_sqrt" or "estimate_sqrt" or something along those 
lines
is more appropriate.

Thanks,
Kyrill



Re: [AArch64] Emit square root using the Newton series

2015-12-09 Thread Kyrill Tkachov


On 09/12/15 17:02, Kyrill Tkachov wrote:


On 09/12/15 16:59, Evandro Menezes wrote:

On 12/09/2015 10:52 AM, Kyrill Tkachov wrote:

Hi Evandro,

On 08/12/15 21:35, Evandro Menezes wrote:

Emit square root using the Newton series

   2015-12-03  Evandro Menezes 

   gcc/
* config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt):
   Declare new
function.
* config/aarch64/aarch64-simd.md (sqrt2): New
   expansion and
insn definitions.
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro.
* config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define
   new function.
* config/aarch64/aarch64.md (sqrt2): New expansion
   and insn
definitions.
* config/aarch64/aarch64.opt (mlow-precision-recip-sqrt):
   Expand option
description.
* doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.

This patch extends the patch that added support for implementing x^-1/2 using 
the Newton series by adding support for x^1/2 as well.

Is it OK at this point of stage 3?



A comment on the patch itself from me...

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
b/gcc/config/aarch64/aarch64-tuning-flags.def
index 6f7dbce..11c6c9a 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -30,4 +30,4 @@

 AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
 AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT)
-
+AARCH64_EXTRA_TUNING_OPTION ("fast_sqrt", FAST_SQRT)

That seems like a misleading name to me.
If we're doing this, that means that the sqrt instruction is not faster
than doing the inverse sqrt estimation followed by a multiply.
I think a name like "synth_sqrt" or "estimate_sqrt" or something along those 
lines
is more appropriate.


Unfortunately, this is the case on Exynos M1: the series is faster than the instruction. 
:-(  So, other targets when this is also true, using the "fast_sqrt" option 
might make sense.



Sure, but the way your patch is written, we will emit the series when 
"fast_sqrt" is set, rather
than the other way around, unless I'm misreading the logic in:



Sorry, what I meant to say is it would be clearer, IMO, to describe the 
compiler action that is being taken
(e.g. the rename_fma_regs tuning flag), in this case it's estimating sqrt using 
a series.

Kyrill


diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 030a101..f6d2da4 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4280,7 +4280,23 @@

 ;; sqrt

-(define_insn "sqrt2"
+(define_expand "sqrt2"
+  [(set (match_operand:VDQF 0 "register_operand")
+(sqrt:VDQF (match_operand:VDQF 1 "register_operand")))]
+  "TARGET_SIMD"
+{
+  if ((AARCH64_EXTRA_TUNE_FAST_SQRT & aarch64_tune_params.extra_tuning_flags)
+  && !optimize_function_for_size_p (cfun)
+  && flag_finite_math_only
+  && !flag_trapping_math
+  && flag_unsafe_math_optimizations)
+{
+  aarch64_emit_swsqrt (operands[0], operands[1]);
+  DONE;
+}
+})


Thanks,
Kyrill






Re: [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching

2015-12-09 Thread Kyrill Tkachov

Hi Christian,

On 08/12/15 12:53, Christian Bruel wrote:

Hi,

The order of the NEON builtins construction has led to complications since the attribute target support. This was not a problem when driven from the command line, but was causing various issues when the builtins was mixed between fpu 
configurations or when used with LTO.


Firstly the builtin functions was not initialized before the parsing of 
functions, leading to wrong type initializations.

Then error catching code when a builtin was used without the proper fpu flags 
was incomprehensible for the user, for instance

#include "arm_neon.h"

int8x8_t a, b;
int16x8_t e;

void
main()
{
  e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
}

compiled with default options (without -mfpu=neon -mfloat-abi=hard) gave pages 
of

/arm-none-eabi/6.0.0/include/arm_neon.h:39:9: error: unknown type name 
'__simd64_int8_t'
 typedef __simd64_int8_t int8x8_t;
...
...
arm_neon.h:4724:3: error: can't convert a vector of type 'poly64x2_t {aka 
__vector(4) int}' to type 'int' which has different size
   return (poly64x2_t)__builtin_neon_vsli_nv2di ((int64x2_t) __a, (int64x2_t) 
__b, __c);
   ^~
...
... and one for each arm_neon.h lines..

by postponing the check into arm_expand_builtin, we now emit something more 
useful:

testo.c: In function 'main':
testo.c:9:7: error: '__builtin_neon_vaddlsv8qi' neon builtin is not supported 
in this configuration.
   e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
   ^~~

One small side effect to note: The total memory allocated is 370k bigger when neon is not used, so this support will have a follow-up to make their initialization lazy. But I'd like first to stabilize the stuff for stage3 (or get it 
pre-approved if the memory is an issue)


tested without new failures with {,-mfpu=vfp,-mfpu=neon}{,-march=armv7-a\}
(a few tests that was fail are now unsupported)



I agree, the vector types (re)initialisation is a tricky part.
I've seen similar issues in the aarch64 work for target attributes

 bool
 arm_vector_mode_supported_p (machine_mode mode)
 {
-  /* Neon also supports V2SImode, etc. listed in the clause below.  */
-  if (TARGET_NEON && (mode == V2SFmode || mode == V4SImode || mode == V8HImode
+  if (mode == V2SFmode || mode == V4SImode || mode == V8HImode
   || mode == V4HFmode || mode == V16QImode || mode == V4SFmode
-  || mode == V2DImode || mode == V8HFmode))
-return true;
-
-  if ((TARGET_NEON || TARGET_IWMMXT)
-  && ((mode == V2SImode)
- || (mode == V4HImode)
- || (mode == V8QImode)))
+  || mode == V2DImode || mode == V8HFmode
+  || mode == V2SImode || mode == V4HImode || mode == V8QImode)
 return true;
 


So this allows vector modes unconditionally for all targets/fpu configurations?
I was tempted to do that in aarch64 when I was encountering similar issues.
In the end what worked for me was re-laying out the vector types in 
SET_CURRENT_FUNCTION
if necessary (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01084.html)

Kyrill



[gomp4, committed] backport: "Fix oacc kernels default mapping for scalars"

2015-12-09 Thread Tom de Vries

Hi,

I've backported the patch "Fix oacc kernels default mapping for scalars" 
from trunk to gomp-4_0-branch.


Committed to gomp-4_0-branch.

Thanks,
- Tom
backport: "Fix oacc kernels default mapping for scalars"

2015-12-02  Tom de Vries  

	backport from trunk:
	* gimplify.c (enum gimplify_omp_var_data): Add enum value
	GOVD_MAP_FORCE.
	(oacc_default_clause): Fix default for scalars in oacc kernels.
	(gimplify_adjust_omp_clauses_1): Handle GOVD_MAP_FORCE.

	* c-c++-common/goacc/kernels-default-2.c: New test.
	* c-c++-common/goacc/kernels-default.c: New test.

---
 gcc/gimplify.c  | 21 +++--
 .../c-c++-common/goacc/kernels-default-2.c  | 17 +
 gcc/testsuite/c-c++-common/goacc/kernels-default.c  | 14 ++
 gcc/testsuite/gfortran.dg/goacc/reduction-2.f95 |  2 +-
 4 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index e8964c6..d305165 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -90,8 +90,11 @@ enum gimplify_omp_var_data
   /* Flag for shared vars that are or might be stored to in the region.  */
   GOVD_WRITTEN = 131072,
 
+  /* Flag for GOVD_MAP, if it is a forced mapping.  */
+  GOVD_MAP_FORCE = 262144,
+
   /* OpenACC deviceptr clause.  */
-  GOVD_USE_DEVPTR = 1 << 18,
+  GOVD_USE_DEVPTR = 1 << 19,
 
   GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE
 			   | GOVD_LASTPRIVATE | GOVD_REDUCTION | GOVD_LINEAR
@@ -5988,8 +5991,12 @@ oacc_default_clause (struct gimplify_omp_ctx *ctx, tree decl, unsigned flags)
   gcc_unreachable ();
 
 case ORT_ACC_KERNELS:
-  /* Everything under kernels are default 'present_or_copy'.  */
+  /* Scalars are default 'copy' under kernels, non-scalars are default
+	 'present_or_copy'.  */
   flags |= GOVD_MAP;
+  if (!AGGREGATE_TYPE_P (TREE_TYPE (decl)))
+	flags |= GOVD_MAP_FORCE;
+
   rkind = "kernels";
   break;
 
@@ -7700,10 +7707,12 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data)
 }
   else if (code == OMP_CLAUSE_MAP)
 {
-  OMP_CLAUSE_SET_MAP_KIND (clause,
-			   flags & GOVD_MAP_TO_ONLY
-			   ? GOMP_MAP_TO
-			   : GOMP_MAP_TOFROM);
+  int kind = (flags & GOVD_MAP_TO_ONLY
+		  ? GOMP_MAP_TO
+		  : GOMP_MAP_TOFROM);
+  if (flags & GOVD_MAP_FORCE)
+	kind |= GOMP_MAP_FLAG_FORCE;
+  OMP_CLAUSE_SET_MAP_KIND (clause, kind);
   if (DECL_SIZE (decl)
 	  && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
 	{
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-default-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-default-2.c
new file mode 100644
index 000..232b123
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-default-2.c
@@ -0,0 +1,17 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-gimple" } */
+
+#define N 2
+
+void
+foo (void)
+{
+  unsigned int a[N];
+
+#pragma acc kernels
+  {
+a[0]++;
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "map\\(tofrom" 1 "gimple" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-default.c b/gcc/testsuite/c-c++-common/goacc/kernels-default.c
new file mode 100644
index 000..58cd5e1
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-default.c
@@ -0,0 +1,14 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-gimple" } */
+
+void
+foo (void)
+{
+  unsigned int i;
+#pragma acc kernels
+  {
+i++;
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "map\\(force_tofrom" 1 "gimple" } } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95 b/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95
index 3f46eac..9d34cbd 100644
--- a/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95
@@ -17,6 +17,6 @@ end subroutine
 
 ! { dg-final { scan-tree-dump-times "target oacc_parallel firstprivate.a." 1 "gimple" } }
 ! { dg-final { scan-tree-dump-times "acc loop reduction..:a. private.p." 1 "gimple" } }
-! { dg-final { scan-tree-dump-times "target oacc_kernels map.tofrom:a .len: 4.." 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "target oacc_kernels map.force_tofrom:a .len: 4.." 1 "gimple" } }
 ! { dg-final { scan-tree-dump-times "acc loop reduction..:a. private.k." 1 "gimple" } }
 


Re: [patch] remove WCHAR_TYPE definition for FreeBSD PowerPC64

2015-12-09 Thread Andreas Tobler

On 06.12.15 23:54, Andreas Tobler wrote:

Hi,

I'm going to commit this patch to trunk, 5.4 and 4.9 branch if there are
no objections.

The redefinition of WCHAR_TYPE for PowerPC64 is wrong since its
beginning. My fault.

We use the definition from freebsd.h.



Committed to trunk. But I left the #undef WCHAR_TYPE. I deleted to fast..

For the diff see:

https://gcc.gnu.org/ml/gcc-cvs/2015-12/msg00379.html

I'll apply the same fix later to gcc-5 and gcc-4.9 tree.

Andreas


2015-12-06  Andreas Tobler  

* config/rs6000/freebsd64.h: Remove the redefinition of WCHAR_TYPE.

Index: freebsd64.h
===
--- freebsd64.h (revision 231129)
+++ freebsd64.h (working copy)
@@ -316,8 +316,6 @@
   #define PTRDIFF_TYPE (TARGET_64BIT ? "long int" : "int")

   /* rs6000.h gets this wrong for FreeBSD.  We use the GCC defaults
instead.  */
-#undef WCHAR_TYPE
-#defineWCHAR_TYPE  (TARGET_64BIT ? "int" : "long int")
   #undef  WCHAR_TYPE_SIZE
   #define WCHAR_TYPE_SIZE 32





Re: [PATCH] enable loop fusion on isl-15

2015-12-09 Thread Richard Biener
On December 9, 2015 6:45:42 PM GMT+01:00, Sebastian Pop  
wrote:
>Richard Biener wrote:
>> On Fri, Dec 4, 2015 at 8:59 PM, Sebastian Paul Pop
> wrote:
>> > I would highly recommend updating the required version of ISL to
>isl-0.15:
>> > that would simplify the existing code, removing a lot of code under
>"#ifdef
>> > old ISL",
>> > and allow us to fully transition to schedule_trees instead of
>dealing with
>> > the
>> > old antiquated union_maps in the scheudler.  The result is faster
>> > compilation time.
>> 
>> Hmm.  I think we agreed to raise the requirement to ISL 0.14.  OTOH
>the plan
>> was to make graphite enabled by -O3 [-fprofile-use] by default which
>would
>> mean making ISL a hard host requirement.  That raises the barrier on
>making
>> the version requirement stricter ...
>> 
>> Sebastian, were quite into stage3 already - what's your plans /
>progress with
>> the defaulting of GRPAHITE?  (compile-time / performance numbers
>though
>> I see ICEs still popping up - a good thing in some sense as it looks
>like
>> GRAPHITE gets testing).
>
>Richard,
>
>to enable Graphite by default in -O3 we had two requirements:
>- reduce compilation time when compiling with Graphite,
>- show performance improvements when using Graphite.
>
>We did a good work in reducing the overall compilation time used by
>Graphite
>with the rewrite of the SCoP detection algorithm.  Overall compilation
>time
>spent in SCoP detection on tramp3d-v4 was reduced from 7% to 0.3%. 
>During a
>bootstrap of GCC the slowest SCoP detection was 0.3%, and the overall
>percentage
>overhead in terms of number of instructions executed by the SCoP
>detection was
>reduced from 0.3% to 0.01% (measured with counting the number of
>instructions
>with valgrind.)  Details of algorithmic changes and experiments are in
>a paper
>that we submitted (and just got accepted) in the polyhedral workshop
>IMPACT'16:
>https://gcc.gnu.org/wiki/Graphite?action=AttachFile=view=scop-detection-submitted-for-review.pdf
>
>On the code generation side, we made sure to keep the representation in
>SSA all
>the time, such that when Graphite does not transform the code, the
>original code
>is left in place with no change.  When Graphite does a transform, the
>generated
>code is also in SSA form, and we fixed several scalar performance
>issues linked
>to the loop level where scalar definitions were inserted: now we
>compute the
>insertion place such that there is no need for other code motion passes
>like
>LICM to optimize the code generated by Graphite.
>
>We are still working on tuning Graphite's fusion, tiling, and
>interchange and we
>will report improvements on the Polybench loop kernels soon.

Thanks for the progress report!

Let's raise the ISL requirement to 0.14 now to see whether it imposes problems 
on some hosts.

Richard.

>Thanks,
>Sebastian
>
>> 
>> Thanks,
>> Richard.
>> 
>> > Thanks,
>> > Sebastian
>> >
>> > -Original Message-
>> > From: Mike Stump [mailto:mikest...@comcast.net]
>> > Sent: Friday, December 04, 2015 12:03 PM
>> > To: Alan Lawrence
>> > Cc: Sebastian Pop; seb...@gmail.com; gcc-patches@gcc.gnu.org;
>> > hiradi...@msn.com
>> > Subject: Re: [PATCH] enable loop fusion on isl-15
>> >
>> > On Dec 4, 2015, at 5:13 AM, Alan Lawrence 
>wrote:
>> >> On 05/11/15 21:43, Sebastian Pop wrote:
>> >>>* graphite-optimize-isl.c (optimize_isl): Call
>> >>>isl_options_set_schedule_maximize_band_depth.
>> >>>
>> >>>* gcc.dg/graphite/fuse-1.c: New.
>> >>>* gcc.dg/graphite/fuse-2.c: New.
>> >>>* gcc.dg/graphite/interchange-13.c: Remove bogus check.
>> >>
>> >> I note that the test
>> >>
>> >> scan-tree-dump-times forwprop4 "gimple_simplified to[^\\n]*\\^ 12"
>1
>> >>
>> >> FAILs under isl-0.14, with which GCC can still be built and
>generally
>> > claims to work.
>> >>
>> >> Is it worth trying to detect this in the testsuite, so we can
>XFAIL it? By
>> > which I mean, is there a reasonable testsuite mechanism by which we
>could do
>> > that?
>> >
>> > You can permanently ignore it by updating to 0.15?  I don't see the
>> > advantage of bothering to finesse this too much.  I don't know of a
>way to
>> > detect 14 v 15 other than this test case, but, if you do that, you
>can't use
>> > that result to gate this test case.  If one wanted to engineer in a
>way, one
>> > would expose the isl version via a preprocessor symbol (built in),
>and then
>> > the test case would use that to gate it.  If we had to fix it, I
>think I'd
>> > prefer we just raise the isl version to 15 or later and be done
>with it.
>> >




[PATCH 2/3] [graphite] add array access function in the right order

2015-12-09 Thread Sebastian Pop
we used to add the access functions in the wrong order, Fortran style, leading 
to unprofitable interchanges.
---
 gcc/graphite-sese-to-poly.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 887c212..480c552 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -912,7 +912,7 @@ pdr_add_memory_accesses (isl_map *acc, dr_info )
   for (i = 0; i < nb_subscripts; i++)
 {
   isl_pw_aff *aff;
-  tree afn = DR_ACCESS_FN (dr, nb_subscripts - 1 - i);
+  tree afn = DR_ACCESS_FN (dr, i);
 
   aff = extract_affine (scop, afn,
isl_space_domain (isl_map_get_space (acc)));
-- 
1.9.1



[Patch] Fix for MIPS PR target/65604

2015-12-09 Thread Steve Ellcey
This is a MIPS patch to make mips_output_division obey the
-fno-delayed-branch flag.  Right now, with mips1 and -mcheck-zero-division,
the division instruction is put into the bne delay slot even when
-fno-delayed-branch is specified.  This change uses a similar strategy
to MIPS16 where we do the division first and then do the zero test
while the division is being calculated.  Tested with mips1 runs and by
inspecting the code that is output.

OK to checkin?

Steve Ellcey
sell...@imgtec.com


2015-12-09  Steve Ellcey  

PR target/65604
* config/mips/mips.c (mips_output_division): Check flag_delayed_branch.


diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 6145944..8444a91 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -13687,9 +13687,17 @@ mips_output_division (const char *division, rtx 
*operands)
}
   else
{
- output_asm_insn ("%(bne\t%2,%.,1f", operands);
- output_asm_insn (s, operands);
- s = "break\t7%)\n1:";
+ if (flag_delayed_branch)
+   {
+ output_asm_insn ("%(bne\t%2,%.,1f", operands);
+ output_asm_insn (s, operands);
+ s = "break\t7%)\n1:";
+   }
+ else
+   {
+ output_asm_insn (s, operands);
+ s = "bne\t%2,%.,1f\n\tnop\n\tbreak\t7\n1:";
+   }
}
 }
   return s;


Re: PATCH to shorten_compare -Wtype-limits handling

2015-12-09 Thread Manuel López-Ibáñez
On 20 November 2015 at 22:28, Jeff Law  wrote:
>>> That was the overall plan and he posted a patch for that.  But that patch
>>> didn't do the due diligence to verify that once the shortening code was
>>> made
>>> "pure" that we didn't regress on the quality of the code we generated.
>>
>>
>> I thought that the original plan was to make the warning code also use
>> match.pd. That is, that all folding, including FE folding, will be
>> match.pd-based. Has this changed?
>
> I don't think that's changed.
>
> Detangling the two is the first step. Once detangled we can then look to
> move the warning to a more suitable location -- right now it's in the C/C++
> front-ends, firing way too early.  Instead it ought to be checked in gimple
> form, after match.pd canonicalization and simplifications.

Perhaps worth noting is that the warnings in shorten_compare are
basically trying to figure out if X CMP_OP Y can be folded to
true/false. They do not care about shortening/optimizing the types and
conversions. Perhaps an intermediate step would be to duplicate the
function into a warning version and an optimization version, making
the warning version pure as you say, call the warning version first,
then the optimizing one. Note that this is not exactly what you want
to have at the end and it adds a bit of duplication, but it gives a
clear separation between what is needed for warning and what is needed
for optimizing.This is step one.

Step two makes the warning version fold using match.pd, fix any
warning regressions. Note that the warning version does not really
need to be pure. If the comparison can be folded to true/false, there
is no much more optimization to be done. But I'm not sure how the FEs
are handling results from folding for warnings. Otherwise, one is
probably duplicating a bit of analysis between the warning and
optimizing functions. Not sure if the overhead will be measurable at
all once the warning version uses match.pd.

Step three moves the optimizing version to match.pd. Identifying
regressions here may be harder, but no harder than any other move of
optimizations to match.pd.

The attached patch implements step 1. Feel free to do as you please with it.

Cheers,

Manuel.
Index: c-common.c
===
--- c-common.c  (revision 230753)
+++ c-common.c  (working copy)
@@ -4419,10 +4419,251 @@ expr_original_type (tree expr)
 {
   STRIP_SIGN_NOPS (expr);
   return TREE_TYPE (expr);
 }
 
+static void
+warn_shorten_compare (location_t loc, tree op0, tree op1,
+ tree restype, enum tree_code rescode)
+{
+  if (!warn_type_limits)
+return;
+
+  switch (rescode)
+{
+case NE_EXPR:
+case EQ_EXPR:
+case LT_EXPR:
+case GT_EXPR:
+case LE_EXPR:
+case GE_EXPR:
+  break;
+
+default:
+  return;
+}
+
+  /* Throw away any conversions to wider types
+ already present in the operands.  */
+  int unsignedp0, unsignedp1;
+  tree primop0 = c_common_get_narrower (op0, );
+  tree primop1 = c_common_get_narrower (op1, );
+
+  /* If primopN is first sign-extended from primopN's precision to opN's
+ precision, then zero-extended from opN's precision to
+ restype precision, shortenings might be invalid.  */
+  if (TYPE_PRECISION (TREE_TYPE (primop0)) < TYPE_PRECISION (TREE_TYPE (op0))
+  && TYPE_PRECISION (TREE_TYPE (op0)) < TYPE_PRECISION (restype)
+  && !unsignedp0
+  && TYPE_UNSIGNED (TREE_TYPE (op0)))
+primop0 = op0;
+  if (TYPE_PRECISION (TREE_TYPE (primop1)) < TYPE_PRECISION (TREE_TYPE (op1))
+  && TYPE_PRECISION (TREE_TYPE (op1)) < TYPE_PRECISION (restype)
+  && !unsignedp1
+  && TYPE_UNSIGNED (TREE_TYPE (op1)))
+primop1 = op1;
+
+  /* Handle the case that OP0 does not *contain* a conversion
+ but it *requires* conversion to FINAL_TYPE.  */
+  if (op0 == primop0 && TREE_TYPE (op0) != restype)
+unsignedp0 = TYPE_UNSIGNED (TREE_TYPE (op0));
+  if (op1 == primop1 && TREE_TYPE (op1) != restype)
+unsignedp1 = TYPE_UNSIGNED (TREE_TYPE (op1));
+
+  /* If one of the operands must be float, we do not warn.  */
+  if (TREE_CODE (TREE_TYPE (primop0)) == REAL_TYPE
+  || TREE_CODE (TREE_TYPE (primop1)) == REAL_TYPE)
+return;
+
+  /* If first arg is constant, swap the args (changing operation
+ so value is preserved), for canonicalization.  Don't do this if
+ the second arg is 0.  */
+
+  if (TREE_CONSTANT (primop0)
+  && !integer_zerop (primop1) && !real_zerop (primop1)
+  && !fixed_zerop (primop1))
+{
+  std::swap (primop0, primop1);
+  std::swap (op0, op1);
+  std::swap (unsignedp0, unsignedp1);
+
+  switch (rescode)
+   {
+   case LT_EXPR:
+ rescode = GT_EXPR;
+ break;
+   case GT_EXPR:
+ rescode = LT_EXPR;
+ break;
+   case LE_EXPR:
+ rescode = GE_EXPR;
+ break;
+   case GE_EXPR:
+ rescode = LE_EXPR;
+ break;
+  

Re: [AArch64] Emit square root using the Newton series

2015-12-09 Thread Evandro Menezes

On 12/09/2015 11:16 AM, Kyrill Tkachov wrote:


On 09/12/15 17:02, Kyrill Tkachov wrote:


On 09/12/15 16:59, Evandro Menezes wrote:

On 12/09/2015 10:52 AM, Kyrill Tkachov wrote:

Hi Evandro,

On 08/12/15 21:35, Evandro Menezes wrote:

Emit square root using the Newton series

   2015-12-03  Evandro Menezes 

   gcc/
* config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt):
   Declare new
function.
* config/aarch64/aarch64-simd.md (sqrt2): New
   expansion and
insn definitions.
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro.
* config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define
   new function.
* config/aarch64/aarch64.md (sqrt2): New expansion
   and insn
definitions.
* config/aarch64/aarch64.opt (mlow-precision-recip-sqrt):
   Expand option
description.
* doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.

This patch extends the patch that added support for implementing 
x^-1/2 using the Newton series by adding support for x^1/2 as well.


Is it OK at this point of stage 3?



A comment on the patch itself from me...

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
b/gcc/config/aarch64/aarch64-tuning-flags.def

index 6f7dbce..11c6c9a 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -30,4 +30,4 @@

 AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
 AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT)
-
+AARCH64_EXTRA_TUNING_OPTION ("fast_sqrt", FAST_SQRT)

That seems like a misleading name to me.
If we're doing this, that means that the sqrt instruction is not 
faster

than doing the inverse sqrt estimation followed by a multiply.
I think a name like "synth_sqrt" or "estimate_sqrt" or something 
along those lines

is more appropriate.


Unfortunately, this is the case on Exynos M1: the series is faster 
than the instruction. :-(  So, other targets when this is also true, 
using the "fast_sqrt" option might make sense.




Sure, but the way your patch is written, we will emit the series when 
"fast_sqrt" is set, rather

than the other way around, unless I'm misreading the logic in:



Sorry, what I meant to say is it would be clearer, IMO, to describe 
the compiler action that is being taken
(e.g. the rename_fma_regs tuning flag), in this case it's estimating 
sqrt using a series.


Kyrill

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md

index 030a101..f6d2da4 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4280,7 +4280,23 @@

 ;; sqrt

-(define_insn "sqrt2"
+(define_expand "sqrt2"
+  [(set (match_operand:VDQF 0 "register_operand")
+(sqrt:VDQF (match_operand:VDQF 1 "register_operand")))]
+  "TARGET_SIMD"
+{
+  if ((AARCH64_EXTRA_TUNE_FAST_SQRT & 
aarch64_tune_params.extra_tuning_flags)

+  && !optimize_function_for_size_p (cfun)
+  && flag_finite_math_only
+  && !flag_trapping_math
+  && flag_unsafe_math_optimizations)
+{
+  aarch64_emit_swsqrt (operands[0], operands[1]);
+  DONE;
+}
+})



Kyrill,

How about "approx_sqrt" for, you guessed it, approximate square root?  
The same adjective would perhaps describe "recip_sqrt" better too.


Thanks,

--
Evandro Menezes



[gomp-nvptx] git branch created

2015-12-09 Thread Alexander Monakov
Hello,

I have created a git-only branch "gomp-nvptx" for development of OpenMP
offloading on NVPTX.  Changelogs on the branch have the corresponding suffix,
and I'll use it to tag emails with patches for the branch.

The branch currently has 41 patch on top of yesterday's trunk.  Some of
those have not been posted yet (for various reasons like being temporary or
incomplete), but most important stuff has been posted.

Building correct -mgomp multilib for Newlib requires a patch, which has been
requested for inclusion here:
https://github.com/MentorEmbedded/nvptx-newlib/pull/2

The libgomp c/c++ testsuite has 6 failures, of those 2 due to missing alloca,
2 due to missing usleep, and 2 due to stubbed-out GOMP_OFFLOAD_async_run.
Fortran testsuite has several failures due to incomplete libgomp port.

Thanks.
Alexander


Re: PING^1: [PATCH] Add TYPE_EMPTY_RECORD for C++ empty class

2015-12-09 Thread H.J. Lu
On Wed, Dec 9, 2015 at 6:05 AM, Richard Biener
 wrote:
> On Tue, Dec 8, 2015 at 5:22 PM, H.J. Lu  wrote:
>> On Mon, Nov 23, 2015 at 12:53 PM, H.J. Lu  wrote:
>>> On Mon, Nov 23, 2015 at 1:57 AM, Richard Biener
>>>  wrote:
 On Sat, Nov 21, 2015 at 12:46 AM, H.J. Lu  wrote:
> On Fri, Nov 20, 2015 at 2:17 PM, Jason Merrill  wrote:
>> On 11/20/2015 01:52 PM, H.J. Lu wrote:
>>>
>>> On Tue, Nov 17, 2015 at 4:22 AM, Richard Biener
>>>  wrote:

 On Tue, Nov 17, 2015 at 12:01 PM, H.J. Lu  wrote:
>
> Empty record should be returned and passed the same way in C and C++.
> This patch adds LANG_HOOKS_EMPTY_RECORD_P for C++ empty class, which
> defaults to return false.  For C++, LANG_HOOKS_EMPTY_RECORD_P is 
> defined
> to is_really_empty_class, which returns true for C++ empty classes.  
> For
> LTO, we stream out a bit to indicate if a record is empty and we store
> it in TYPE_LANG_FLAG_0 when streaming in.  get_ref_base_and_extent is
> changed to set bitsize to 0 for empty records.  Middle-end and x86
> backend are updated to ignore empty records for parameter passing and
> function value return.  Other targets may need similar changes.


 Please avoid a new langhook for this and instead claim a bit in
 tree_type_common
 like for example restrict_flag (double-check it is unused for
 non-pointers).
>>>
>>>
>>> There is no bit in tree_type_common I can overload.  restrict_flag is
>>> checked for non-pointers to issue an error when it is used on
>>> non-pointers:
>>>
>>>
>>> /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/template/qualttp20.C:19:38:
>>> error: ‘__restrict__’ qualifiers cannot be applied to ‘AS::L’
>>> typedef typename T::L __restrict__ r;// { dg-error "'__restrict__'
>>> qualifiers cannot" "" }
>>
>>
>> The C++ front end only needs to check TYPE_RESTRICT for this purpose on
>> front-end-specific type codes like TEMPLATE_TYPE_PARM; cp_type_quals 
>> could
>> handle that specifically if you change TYPE_RESTRICT to only apply to
>> pointers.
>>
>
> restrict_flag is also checked in this case:
>
> [hjl@gnu-6 gcc]$ cat x.i
> struct dummy { };
>
> struct dummy
> foo (struct dummy __restrict__ i)
> {
>   return i;
> }
> [hjl@gnu-6 gcc]$ gcc -S x.i -Wall
> x.i:4:13: error: invalid use of ‘restrict’
>  foo (struct dummy __restrict__ i)
>  ^
> x.i:4:13: error: invalid use of ‘restrict’
> [hjl@gnu-6 gcc]$
>
> restrict_flag can't also be used to indicate `i' is an empty record.

 I'm sure this error can be done during parsing w/o relying on 
 TYPE_RESTRICT.

 But well, use any other free bit (but do not enlarge
 tree_type_common).  Eventually
 you can free up a bit by putting sth into type_lang_specific currently
 using bits
 in tree_type_common.
>>>
>>> There are no bits in tree_type_common I can move.  Instead,
>>> this patch overloads side_effects_flag in tree_base.  Tested on
>>> Linux/x86-64.  OK for trunk?
>
> I'm fine with using side_effects_flag for this.
>
> I miss an explanation of where this detail of the ABI is documented and wonder
> if the term "empty record" is also used in that document and how it is
> documented.
>
> Thus
>
> +/* Nonzero in a type considered an empty record.  */
> +#define TYPE_EMPTY_RECORD(NODE) \
> +  (TYPE_CHECK (NODE)->base.side_effects_flag)
>
> should refer to the ABI where is defined what an "empty record" is and how
> it is handled by the backend(s).

Empty C++ class is a corner case which isn't covered in psABI nor C++ ABI.
There is no mention of "empty record" in GCC documentation.  But there are
plenty of "empty class" in gcc/cp.  This change affects all targets.  C++ ABI
should specify how it should be passed.

> +/* Return true if type T is an empty record.  */
> +
> +static inline bool
> +type_is_empty_record_p (const_tree t)
> +{
> +  return TYPE_EMPTY_RECORD (TYPE_MAIN_VARIANT (t));
>
> the type checker should probably check the bit is consistent across
> variants so it can be tested on any of them.

TYPE_EMPTY_RECORD is only relevant for parameter passing.  For

struct foo;
typedef foo bar;

TYPE_EMPTY_RECORD has no impact.  Since call only uses
the main variant type, checking TYPE_MAIN_VARIANT is sufficient.

> You fail to adjust other targets gimplification hooks which suggests
> this is a detail of the x86 psABI and not the C++ ABI?  If so I miss
> a -Wpsabi warning for this change.

My change is generic.  The only x86 specific change is

diff --git a/gcc/config/i386/i386.c 

Re: [Patch, Fortran] PR68815 - replace '%s' quotes by %< ... %>

2015-12-09 Thread Steve Kargl
On Wed, Dec 09, 2015 at 04:53:37PM +0100, Tobias Burnus wrote:
> 
> Build and regtested on x86-64-gnu-linux.
> OK for the trunk?
> 

OK.

-- 
Steve


[PATCH 3/3] [graphite] specify more isl codegen options

2015-12-09 Thread Sebastian Pop
---
 gcc/graphite-optimize-isl.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index f90fcfd..50f2b3c 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
 #include 
 #ifdef HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
 #include 
+#include 
 #endif
 
 #include "graphite.h"
@@ -405,7 +406,14 @@ optimize_isl (scop_p scop)
   isl_options_set_schedule_maximize_band_depth (scop->isl_context, 1);
 #ifdef HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
   /* ISL-0.15 or later.  */
+  isl_options_set_schedule_serialize_sccs (scop->isl_context, 0);
   isl_options_set_schedule_maximize_band_depth (scop->isl_context, 1);
+  isl_options_set_schedule_max_constant_term (scop->isl_context, 20);
+  isl_options_set_schedule_max_coefficient (scop->isl_context, 20);
+  isl_options_set_tile_scale_tile_loops (scop->isl_context, 0);
+  isl_options_set_coalesce_bounded_wrapping (scop->isl_context, 1);
+  isl_options_set_ast_build_exploit_nested_bounds (scop->isl_context, 1);
+  isl_options_set_ast_build_atomic_upper_bound (scop->isl_context, 1);
 #else
   isl_options_set_schedule_fuse (scop->isl_context, ISL_SCHEDULE_FUSE_MIN);
 #endif
-- 
1.9.1



[PATCH 1/3] [graphite] dump param name when timing out

2015-12-09 Thread Sebastian Pop
---
 gcc/graphite-optimize-isl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 8727e39..f90fcfd 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -426,7 +426,7 @@ optimize_isl (scop_p scop)
   if (!schedule || isl_ctx_last_error (scop->isl_context) == isl_error_quota)
 {
   if (dump_file && dump_flags)
-   fprintf (dump_file, "ISL timed out at %d operations\n",
+   fprintf (dump_file, "ISL timed out --param max-isl-operations=%d\n",
 max_operations);
   if (schedule)
isl_schedule_free (schedule);
-- 
1.9.1



Re: [PATCH][ARC] Refurbish emitting DWARF2 for epilogue.

2015-12-09 Thread Joern Wolfgang Rennecke


On 09/12/15 15:34, Claudiu Zissulescu wrote:


Well, it seems to me that we prefer to disable optimizations when talking about 
debug related information (see PR target/60598 git-svn-id: 
svn+ssh://gcc.gnu.org/svn/gcc/trunk@208749 138bc75d-0d04-0410-961f-82ee72b054a4 
commit).
Actually, unwind information might also be needed for exception 
handling, depending on the target.

(sjlj exception handling or dwarf2)
In which case it becomes a matter of program correctness if any 
exception might be raised that

could lead to incorrect unwinding due to incorrect/incomplete unwind info.

So were both wrong: this is not merely about debugging.

OTOH, the example you give also shows a much more nuanced approach to 
throttling optimization:
the patch doesn't dead all epilogue scheduling, but specifically tests 
for the presence of a frame related

insn at the point where it could cause trouble.
The equivalent would be to reject a frame related insn in the delay slot 
of a conditional branch.
For arc.md, that would likely involve splitting four of the define_delay 
statements into two each.
If that seems a bit much, a compromize would be to patch in_delay_slot, 
thus affecting unconditional branches too.
More ambitious approaches would be to move the note to a (zero-sized, 
pseudo-)placeholder insn, or
finding a way to make the dwarf2cfi cope with speculated frame related 
insns.  Eg.g even if the delay slot

is not anulled, could we mark the note as anulled?

Another approach would be to fix fill_eager_delay_slots not to stuff 
unconditional frame related insns into

non-anulled delay slots of conditional branches.
Or have some target hook to make it not even bother filling delay slots 
speculatively; for targets that can
fully unexpose the delay slot, like SH and ARC >= ARC700, this aspect of 
fill_eager_delay_slots only mucks up

schedules and increases code size.


Unfortunately, dwarf2cfi checks the paths for consistency (dwarf2cfi.c:2284), 
throwing an error if those paths are not ok. Also, with ARC gcc we focus on 
Linux type of application, hence, the unwinding info needs to be consistent.
As far as I can see, having a sort of mno-epilogue-cfi option will just inhibit 
or not the emitting of the blockage instruction, and the option will be valid 
only when compiling without emitting dwarf information (i.e., the 
mno-epilogue-cfi is incompatible with -g).
Personally, I do not see the benefit of having such an option, as one may lose 
like 1 cycle per function call (HS/EM cpus) in very particular cases. Running 
the dg.exp, compile.exp, execute.exp, and builing Linux with some default apps, 
we found only 4 cases in which the branch scheduler slot needs the blockage 
mechanism.


The number of problems you found needs not bear any relation to the 
number of optimizations
suppressed by the blockage instruction.  Consider the case when there 
are high-latency unconditional
instructions just before a lengthy epilogue.  sched2 scheduling some 
epilogue instructions into the scheduling bubbles can hide the latencies.
More relevant ways to get data would be comparing the object files (from 
a whole toolchain library set and/or one or more big application(s)) 
built with/without the blockage insn emitted, or to benchmark it.

  Also, adding blockage before generating any prologue instruction seems to be 
a wide spread practice in gcc backends (see SH for example).
It is a quick way to 'fix' (actually, hide) bugs.  At the expense of 
optimization.
It can be justified if the resources to keep a port in working order are 
limited, and so are the forgone

optimization benefits.
But at a minimum, you should add a comment to explain what problem you 
are papering over.
That works best if you actually file a bug report in bugzilla first 
(about the interaction of fill_eager_delay_slots and dwarf2cfi) so that 
you can name the bug.


Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment

2015-12-09 Thread Tobias Burnus

Alessandro Fanfarillo wrote:

Done.

Thanks. Committed as r231476.

Do we need to do anything about GCC 5 or is this only a GCC 6 issue?


I have permission for contributing but I don't have write permission
on the repository.


That can be changed: Simply fill out the form and list me (burnus (at] 
gcc.gnu.org) as sponsor: https://sourceware.org/cgi-bin/pdw/ps_form.cgi 
– and see https://gcc.gnu.org/svnwrite.html


Tobias


Re: [PATCH] rtlanal: Fix bits/bytes confusion in set_noop_p (PR68814)

2015-12-09 Thread Segher Boessenkool
On Wed, Dec 09, 2015 at 10:13:34PM +0100, Eric Botcazou wrote:
> > The meaning of ZERO_EXTRACT depends on BITS_BIG_ENDIAN, not on
> > BYTES_BIG_ENDIAN.
> 
> That's correct.

> > --- a/gcc/rtlanal.c
> > +++ b/gcc/rtlanal.c
> > @@ -1534,7 +1534,7 @@ set_noop_p (const_rtx set)
> > 
> >if (GET_CODE (dst) == ZERO_EXTRACT)
> >  return rtx_equal_p (XEXP (dst, 0), src)
> > -  && ! BYTES_BIG_ENDIAN && XEXP (dst, 2) == const0_rtx
> > +  && !BITS_BIG_ENDIAN && XEXP (dst, 2) == const0_rtx
> >&& !side_effects_p (src);
> > 
> >if (GET_CODE (dst) == STRICT_LOW_PART)
> 
> I have a hard time figuring out what a SET verifying the condition would 
> really mean.  Could you post it and explain where it comes from?

Sure!

The condition is meant to say "if you set the low part of a reg to
be equal to that reg".  And it only handles little-endian numbering
(many things with zero_extract do).

My case comes from the gcc.dg/pr65492-2.c, the "test1int2" case.
combine has made an insn
modifying insn i321: zero_extract(%3:DI,0x20,0)=%3:DI
which is splatting the SImode parameter to both the high and low halves
of the dest reg.  Then, it tries to combine it with the USE of that reg
at the end of the function, giving

Failed to match this instruction:
(parallel [
(use (ior:DI (and:DI (reg/i:DI 3 3)
(const_int 4294967295 [0x]))
(ashift:DI (reg:DI 3 3 [ c ])
(const_int 32 [0x20]
(set (zero_extract:DI (reg/i:DI 3 3)
(const_int 32 [0x20])
(const_int 0 [0]))
(reg:DI 3 3 [ c ]))
])

and if it has a parallel of two which doesn't match, it sees if it
just needs one arm because the other is a noop set, and that ends
up with
deleting noop move 21
because of the wrong test, making the testcase fail.
(powerpc64le has BITS_BIG_ENDIAN set, a bit unusual).


Segher


[PATCH] Fix up noce_try_abs again (PR rtl-optimization/68670)

2015-12-09 Thread Jakub Jelinek
Hi!

Not sure what I've been thinking when writing the previous noce_try_abs fix.
I thought that the optimization can be applied for all the conditions,
and whether it can be applied depends on if it is cond ? ~x : x or
cond ? x : ~x.  But that is not the case, the optimization can be only
applied to a subset of conditions, and when it can be applied, it can be
applied to both the cond ? ~x : x and cond ? x : ~x cases (depending on
the condition one is one_cmpl_abs (x) and the other ~one_cmpl_abs (x)).
So, the previous patch fixed some cases (the ones triggered in the
testcases), kept other cases broken (as can be seen in the new testcases),
pessimized some cases (stopped applying one_cmpl_abs when it could be
applied) and kept optimizing other cases.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk/5.4?

2015-12-09  Jakub Jelinek  

PR rtl-optimization/68376
PR rtl-optimization/68670
* ifcvt.c (noce_try_abs): For one_cmpl allow < 0, >= 0
or > -1 conditions regardless of negate, and disallow
all other conditions.

* gcc.c-torture/execute/pr68376-2.c (f5, f6, f7, f8): New
tests.
(main): Call them.
* gcc.dg/pr68670-1.c: New test.
* gcc.dg/pr68670-2.c: New test.

--- gcc/ifcvt.c.jj  2015-12-03 16:39:36.0 +0100
+++ gcc/ifcvt.c 2015-12-09 17:18:35.418052077 +0100
@@ -2601,17 +2601,14 @@ noce_try_abs (struct noce_if_info *if_in
  Note that these rtx constants are known to be CONST_INT, and
  therefore imply integer comparisons.
  The one_cmpl case is more complicated, as we want to handle
- only x < 0 ? ~x : x or x >= 0 ? ~x : x but not
- x <= 0 ? ~x : x or x > 0 ? ~x : x, as the latter two
- have different result for x == 0.  */
+ only x < 0 ? ~x : x or x >= 0 ? x : ~x to one_cmpl_abs (x)
+ and x < 0 ? x : ~x or x >= 0 ? ~x : x to ~one_cmpl_abs (x),
+ but not other cases (x > -1 is equivalent of x >= 0).  */
   if (c == constm1_rtx && GET_CODE (cond) == GT)
-{
-  if (one_cmpl && negate)
-   return FALSE;
-}
+;
   else if (c == const1_rtx && GET_CODE (cond) == LT)
 {
-  if (one_cmpl && !negate)
+  if (one_cmpl)
return FALSE;
 }
   else if (c == CONST0_RTX (GET_MODE (b)))
@@ -2619,23 +2616,8 @@ noce_try_abs (struct noce_if_info *if_in
   if (one_cmpl)
switch (GET_CODE (cond))
  {
- case GT:
-   if (!negate)
- return FALSE;
-   break;
  case GE:
-   /* >= 0 is the same case as above > -1.  */
-   if (negate)
- return FALSE;
-   break;
  case LT:
-   if (negate)
- return FALSE;
-   break;
- case LE:
-   /* <= 0 is the same case as above < 1.  */
-   if (!negate)
- return FALSE;
break;
  default:
return FALSE;
--- gcc/testsuite/gcc.c-torture/execute/pr68376-2.c.jj  2015-11-19 
09:48:05.0 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr68376-2.c 2015-12-09 
17:58:57.0 +0100
@@ -26,6 +26,30 @@ f4 (int x)
   return x <= 0 ? x : ~x;
 }
 
+__attribute__((noinline, noclone)) int
+f5 (int x)
+{
+  return x >= 0 ? ~x : x;
+}
+
+__attribute__((noinline, noclone)) int
+f6 (int x)
+{
+  return x >= 0 ? x : ~x;
+}
+
+__attribute__((noinline, noclone)) int
+f7 (int x)
+{
+  return x > 0 ? ~x : x;
+}
+
+__attribute__((noinline, noclone)) int
+f8 (int x)
+{
+  return x > 0 ? x : ~x;
+}
+
 int
 main ()
 {
@@ -37,5 +61,13 @@ main ()
 abort ();
   if (f4 (5) != -6 || f4 (-5) != -5 || f4 (0) != 0)
 abort ();
+  if (f5 (5) != -6 || f5 (-5) != -5 || f5 (0) != -1)
+abort ();
+  if (f6 (5) != 5 || f6 (-5) != 4 || f6 (0) != 0)
+abort ();
+  if (f7 (5) != -6 || f7 (-5) != -5 || f7 (0) != 0)
+abort ();
+  if (f8 (5) != 5 || f8 (-5) != 4 || f8 (0) != -1)
+abort ();
   return 0;
 }
--- gcc/testsuite/gcc.dg/pr68670-1.c.jj 2015-12-09 17:22:41.465621589 +0100
+++ gcc/testsuite/gcc.dg/pr68670-1.c2015-12-09 17:22:05.251126555 +0100
@@ -0,0 +1,5 @@
+/* PR rtl-optimization/68670 */
+/* { dg-do run } */
+/* { dg-options "-O2 -ftracer" } */
+
+#include "../gcc.c-torture/execute/pr68376-1.c"
--- gcc/testsuite/gcc.dg/pr68670-2.c.jj 2015-12-09 17:22:44.528578868 +0100
+++ gcc/testsuite/gcc.dg/pr68670-2.c2015-12-09 17:22:31.641758606 +0100
@@ -0,0 +1,5 @@
+/* PR rtl-optimization/68670 */
+/* { dg-do run } */
+/* { dg-options "-O2 -ftracer" } */
+
+#include "../gcc.c-torture/execute/pr68376-2.c"

Jakub


Re: [PATCH] rtlanal: Fix bits/bytes confusion in set_noop_p (PR68814)

2015-12-09 Thread Eric Botcazou
> The meaning of ZERO_EXTRACT depends on BITS_BIG_ENDIAN, not on
> BYTES_BIG_ENDIAN.

That's correct.

> Testing in progress on powerpc64le-linux; if it passes, is this
> okay for trunk?
> 
> 
> Segher
> 
> 
> 2015-12-09  Segher Boessenkool  
> 
>   PR rtl-optimization/68814
>   * rtlanal.c (set_noop_p): Use BITS_BIG_ENDIAN instead of
>   BYTES_BIG_ENDIAN.
> ---
>  gcc/rtlanal.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> index 67098e5..f893bca 100644
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -1534,7 +1534,7 @@ set_noop_p (const_rtx set)
> 
>if (GET_CODE (dst) == ZERO_EXTRACT)
>  return rtx_equal_p (XEXP (dst, 0), src)
> -&& ! BYTES_BIG_ENDIAN && XEXP (dst, 2) == const0_rtx
> +&& !BITS_BIG_ENDIAN && XEXP (dst, 2) == const0_rtx
>  && !side_effects_p (src);
> 
>if (GET_CODE (dst) == STRICT_LOW_PART)

I have a hard time figuring out what a SET verifying the condition would 
really mean.  Could you post it and explain where it comes from?

-- 
Eric Botcazou


[PATCH] Fix up fold_ctor_reference and fully_constant_vn_reference_p (PR tree-optimization/68785)

2015-12-09 Thread Jakub Jelinek
Hi!

On a testcase like below which would trigger UB at runtime we trigger
UB in the compiler, by reading uninitialized bytes.

The VCE folding for which native_{encode,interpret}_expr has been originally
written passes the length from the first one to the second one, so that
the latter can return NULL_TREE (not fold) if not enough bytes in the buffer
were filled.  I believe this is the shortest fix for this issue and makes
the code consistent with what is used in VCE folding.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2015-12-09  Jakub Jelinek  

PR tree-optimization/68785
* gimple-fold.c (fold_ctor_reference): Pass return value from
native_encode_expr to native_interpret_expr.
* tree-ssa-sccvn.c (fully_constant_vn_reference_p): Likewise.

* gcc.dg/pr68785.c: New test.

--- gcc/gimple-fold.c.jj2015-11-24 11:43:35.0 +0100
+++ gcc/gimple-fold.c   2015-12-09 10:48:06.824975709 +0100
@@ -5495,9 +5495,10 @@ fold_ctor_reference (tree type, tree cto
   && size <= MAX_BITSIZE_MODE_ANY_MODE)
 {
   unsigned char buf[MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT];
-  if (native_encode_expr (ctor, buf, size / BITS_PER_UNIT,
- offset / BITS_PER_UNIT) > 0)
-   return native_interpret_expr (type, buf, size / BITS_PER_UNIT);
+  int len = native_encode_expr (ctor, buf, size / BITS_PER_UNIT,
+   offset / BITS_PER_UNIT);
+  if (len > 0)
+   return native_interpret_expr (type, buf, len);
 }
   if (TREE_CODE (ctor) == CONSTRUCTOR)
 {
--- gcc/tree-ssa-sccvn.c.jj 2015-12-04 17:19:12.0 +0100
+++ gcc/tree-ssa-sccvn.c2015-12-09 10:50:30.329960789 +0100
@@ -1370,8 +1370,9 @@ fully_constant_vn_reference_p (vn_refere
  else
{
  unsigned char buf[MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT];
- if (native_encode_expr (ctor, buf, size, off) > 0)
-   return native_interpret_expr (ref->type, buf, size);
+ int len = native_encode_expr (ctor, buf, size, off);
+ if (len > 0)
+   return native_interpret_expr (ref->type, buf, len);
}
}
 }
--- gcc/testsuite/gcc.dg/pr68785.c.jj   2015-12-09 10:52:00.232698487 +0100
+++ gcc/testsuite/gcc.dg/pr68785.c  2015-12-09 10:50:54.0 +0100
@@ -0,0 +1,9 @@
+/* PR tree-optimization/68785 */
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+int
+foo (void)
+{
+  return *(int *) "";
+}

Jakub


Re: [PATCH] Better error recovery for merge-conflict markers (v4)

2015-12-09 Thread Jeff Law

On 12/09/2015 10:44 AM, Bernd Schmidt wrote:

Just thinking out loud - I guess it would be too much to hope for to
share lexers between frontends so that we need only one copy of this?


Probably :(


Someone slap sense into me, I just thought of deriving C and C++ parsers
from a common base class... (no this is not a suggestion for this patch).

It'd be nice.  But I don't even think it's on anyone's TODO list.


Wording-wise, should it be "merge conflict marker", rather
than "patch conflict marker"?

Clang spells it:
"error: version control conflict marker in file"
http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html#merge_conflicts



Yeah, if another compiler has a similar/identical diagnostic I think we
should just copy that unless there's a very good reason not to.

Agreed.




Rebased on top of r231445 (from yesterday).
Successfully bootstrapped on x86_64-pc-linux-gnu.
Adds 82 new PASSes to g++.sum and 27 new PASSes to gcc.sum.

OK for trunk?


I'm inclined to say yes since it was originally submitted in time and
it's hard to imagine how the change could be risky (I'll point out right
away that there are one or two other patches in the queue that were also
submitted in time which I feel should not be considered for gcc-6 at
this point due to risk).

Let's wait until the end of the week for objections, commit then.
As the person who has come the closest to objecting, I'll go on the 
record as "not objecting".


jeff



  1   2   >