Re: [PATCH] #52291 - clarify sync_fetch_and_OP for pointers

2016-01-20 Thread Jeff Law

On 01/20/2016 05:11 PM, Martin Sebor wrote:

The bug points out that while the __sync_fetch_and_OP intrinsics are
documented to have semantics equivalent to the "x OP= y" compound
assignment expressions, when used with pointer operands they actually
behave as if they operated on integers.  I.e., they are not scaled by
the size of the pointed-to type.
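
To make the unscaled behaviour concrete, here is a minimal sketch. It is
not part of the patch; the two function names are made up for
illustration, and it assumes GCC or a compatible compiler that provides
__atomic_fetch_add.  It compares how far an int pointer moves under the
built-in and under ordinary compound assignment.

```c
#include <assert.h>
#include <stddef.h>

static int arr[8];

/* Bytes the pointer moves when N is added with the built-in.
   N is treated as a byte count, i.e. it is not scaled.  */
ptrdiff_t
bytes_moved_by_fetch_add (ptrdiff_t n)
{
  assert (n >= 0 && n % (ptrdiff_t) sizeof (int) == 0
          && n <= (ptrdiff_t) sizeof arr);
  int *p = arr;
  __atomic_fetch_add (&p, n, __ATOMIC_SEQ_CST);
  return (char *) p - (char *) arr;
}

/* Bytes the pointer moves under "p += n", which scales N by
   sizeof (int) as usual for pointer arithmetic.  */
ptrdiff_t
bytes_moved_by_compound_add (ptrdiff_t n)
{
  assert (n >= 0 && n <= 8);
  int *p = arr;
  p += n;
  return (char *) p - (char *) arr;
}
```

So a caller who wants to advance by one element has to pass
sizeof (int) to the built-in, not 1.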

The attached patch brings the documentation of both the __sync_ and
the __atomic_ intrinsics into alignment with their actual effects.

Martin

PS See also c/64843 for some additional background.

gcc-52291.patch


2016-01-20  Martin Sebor

PR c/52291
* extend.texi (__sync Builtins): Clarify the semantics of
__sync_fetch_and_OP built-ins on pointers.
(__atomic Builtins): Same.

OK
jeff


Re: Fixes for PR66178

2016-01-20 Thread Jeff Law

On 01/20/2016 07:25 AM, Bernd Schmidt wrote:

PR66178 has some testcases where we construct expressions involving
additions and subtractions of label addresses, and we crash when trying
to expand these. There are two different issues here, shown by various
testcases in the PR:

  * expand_expr_real_2 can drop EXPAND_INITIALIZER and then go into
a path where it wants to gen_reg_rtx.
  * simplify-rtx can turn subtractions into NOT operations in some
cases. That seems inadvisable for symbolic expressions.
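
For readers unfamiliar with this kind of expression, the following is a
hypothetical illustration, not one of the testcases from the PR.  It uses
the GNU C "labels as values" extension to build a static table of label
address differences; the names pick, ten and twenty are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Dispatch through a table of label-address differences.  noinline
   keeps a single copy of the labels.  */
__attribute__ ((noinline)) int
pick (int i)
{
  /* Each initializer subtracts one label address from another, so it
     is a symbolic constant rather than a plain number.  */
  static const ptrdiff_t offsets[] = { &&ten - &&ten, &&twenty - &&ten };

  assert (i == 0 || i == 1);
  goto *(&&ten + offsets[i]);

ten:
  return 10;
twenty:
  return 20;
}
```

The initializers have to survive as differences of two symbols all the
way to the assembler, which is why dropping EXPAND_INITIALIZER or
rewriting the arithmetic matters for them.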

The following was bootstrapped and tested on x86_64-linux. Ok for all
branches?

 PR middle-end/66178
 * expr.c (expand_expr_real_2) [PLUS_EXPR, MINUS_EXPR]: Don't
 drop EXPAND_INITIALIZER.
 * rtl.h (contains_symbolic_reference_p): Declare.
 * rtlanal.c (contains_symbolic_reference_p): New function.
 * simplify-rtx.c (simplify_binary_operation_1): Don't turn
 a subtraction into a NOT if symbolic constants are involved.

testsuite/
 PR middle-end/66178
 gcc.dg/torture/pr66178.c: New test.

pr66178.diff



Index: gcc/rtlanal.c
===
--- gcc/rtlanal.c   (revision 232555)
+++ gcc/rtlanal.c   (working copy)
@@ -6243,6 +6243,19 @@ contains_symbol_ref_p (const_rtx x)
return false;
  }

+/* Return true if RTL X contains a SYMBOL_REF or LABEL_REF.  */
+
+bool
+contains_symbolic_reference_p (const_rtx x)
+{
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (iter, array, x, ALL)
+if (SYMBOL_REF_P (*iter) || GET_CODE (*iter) == LABEL_REF)
+  return true;
+
+  return false;
+}
+
I thought we already had a routine to do this.  I thought we needed it 
in varasm.c, but couldn't find it.  Then I found contains_symbol_ref_p, 
but obviously that's not good enough because you need LABEL_REFs too.




  /* Return true if X contains a thread-local symbol.  */

  bool
Index: gcc/simplify-rtx.c
===
--- gcc/simplify-rtx.c  (revision 232555)
+++ gcc/simplify-rtx.c  (working copy)
@@ -2277,8 +2277,11 @@ simplify_binary_operation_1 (enum rtx_co
if (!HONOR_SIGNED_ZEROS (mode) && trueop0 == CONST0_RTX (mode))
return simplify_gen_unary (NEG, mode, op1, mode);

-  /* (-1 - a) is ~a.  */
-  if (trueop0 == constm1_rtx)
+  /* (-1 - a) is ~a, unless the expression avoids symbolic constants,
+in which case not retaining additions and subtractions could
+cause invalid assembly to be produced.  */

"avoids?" Somehow this isn't parsing well.

"unless the expression has symbolic constants" or something similar is 
what I think you want.


OK with the comment fixed.

jeff
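
For reference, the identity behind the "(-1 - a) is ~a" comment can be
checked in plain C.  This is only a sketch with made-up function names.
It uses unsigned 32-bit arithmetic, where UINT32_MAX plays the role of
-1, so nothing overflows.

```c
#include <assert.h>
#include <stdint.h>

/* (-1 - a), with -1 represented as the all-ones value.  */
uint32_t
minus_one_minus (uint32_t a)
{
  return UINT32_MAX - a;
}

/* ~a, the form simplify-rtx would normally prefer.  */
uint32_t
bitwise_not (uint32_t a)
{
  return ~a;
}

/* Abort if the two forms ever disagree for A.  */
void
check_identity (uint32_t a)
{
  assert (minus_one_minus (a) == bitwise_not (a));
}
```

The rewrite is fine for numbers.  My understanding of the concern in the
patch is that a symbol or label address has no numeric value until link
time, and a NOT of such an address is not something the assembler can
express the way it can express a sum or difference.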



Re: [SMS] Schedule normalization after scheduling branch

2016-01-20 Thread Jeff Law

On 01/18/2016 02:38 PM, Roman Zhuykov wrote:

Hello,

Four years ago, when I created some SMS patches, nobody cared about SMS
failures on ia64.  Now ia64 is even more dead, but at least one of the
bugs appears on powerpc: PR69252.
The proposed patch is here:
https://gcc.gnu.org/ml/gcc-patches/2011-12/txt00266.txt and it even
applies to current trunk without modification.

The powerpc PR69252 situation seems to be the same as the one I described
earlier on ia64; I can discuss it once again.

Actually, a pointer back to this is more helpful:

https://gcc.gnu.org/ml/gcc-patches/2011-12/msg01800.html

As that has more of the rationale for the change, including the key note 
that it's essentially duplicating a check/adjustment that Richard added 
in sms_schedule in a place that would make it apply after scheduling the 
branch instruction.


This is OK for the trunk.  Please go ahead and install the change.

Thanks,
jeff




TR29124 C++ Special Maths - Pull functions into global namespace.

2016-01-20 Thread Ed Smith-Rowland
Now that libstdc++ installs a proper math.h we can piggyback on that to 
put in the last bit of TR29124.


This patch adds the math special functions to c_compatibility/math.h in 
the global namespace.

I remove the XFAILs from the compile_2.cc tests.

This converts 21 XFAILs into 21 PASSes.

Tested on x86_64-linux.

I understand if this is too late.
I'll put it up on trunk and backport after stage 1 reopens.

Meanwhile I'll commit this to the tr29124 branch.

Ed

2016-01-20  Edward Smith-Rowland  <3dw...@verizon.net>

TR29124 C++ Special Math -  pulls funcs into global namespace.
* include/c_compatibility/math.h: Import the TR29124 functions
into the global namespace.
* testsuite/special_functions/01_assoc_laguerre/compile_2.cc: Remove
xfail and make compile-only.
* testsuite/special_functions/02_assoc_legendre/compile_2.cc: Ditto.
* testsuite/special_functions/03_beta/compile_2.cc: Ditto.
* testsuite/special_functions/04_comp_ellint_1/compile_2.cc: Ditto.
* testsuite/special_functions/05_comp_ellint_2/compile_2.cc: Ditto.
* testsuite/special_functions/06_comp_ellint_3/compile_2.cc: Ditto.
* testsuite/special_functions/07_cyl_bessel_i/compile_2.cc: Ditto.
* testsuite/special_functions/08_cyl_bessel_j/compile_2.cc: Ditto.
* testsuite/special_functions/09_cyl_bessel_k/compile_2.cc: Ditto.
* testsuite/special_functions/10_cyl_neumann/compile_2.cc: Ditto.
* testsuite/special_functions/11_ellint_1/compile_2.cc: Ditto.
* testsuite/special_functions/12_ellint_2/compile_2.cc: Ditto.
* testsuite/special_functions/13_ellint_3/compile_2.cc: Ditto.
* testsuite/special_functions/14_expint/compile_2.cc: Ditto.
* testsuite/special_functions/15_hermite/compile_2.cc: Ditto.
* testsuite/special_functions/16_laguerre/compile_2.cc: Ditto.
* testsuite/special_functions/17_legendre/compile_2.cc: Ditto.
* testsuite/special_functions/18_riemann_zeta/compile_2.cc: Ditto.
* testsuite/special_functions/19_sph_bessel/compile_2.cc: Ditto.
* testsuite/special_functions/20_sph_legendre/compile_2.cc: Ditto.
* testsuite/special_functions/21_sph_neumann/compile_2.cc: Ditto.
Index: include/c_compatibility/math.h
===
--- include/c_compatibility/math.h  (revision 232610)
+++ include/c_compatibility/math.h  (working copy)
@@ -75,70 +75,71 @@
 #endif
 
 #if __STDCPP_WANT_MATH_SPEC_FUNCS__ == 1
-using std::assoc_laguerref
-using std::assoc_laguerrel
-using std::assoc_laguerre
-using std::assoc_legendref
-using std::assoc_legendrel
-using std::assoc_legendre
-using std::betaf
-using std::betal
-using std::beta
-using std::comp_ellint_1f
-using std::comp_ellint_1l
-using std::comp_ellint_1
-using std::comp_ellint_2f
-using std::comp_ellint_2l
-using std::comp_ellint_2
-using std::comp_ellint_3f
-using std::comp_ellint_3l
-using std::comp_ellint_3
-using std::cyl_bessel_if
-using std::cyl_bessel_il
-using std::cyl_bessel_i
-using std::cyl_bessel_jf
-using std::cyl_bessel_jl
-using std::cyl_bessel_j
-using std::cyl_bessel_kf
-using std::cyl_bessel_kl
-using std::cyl_bessel_k
-using std::cyl_neumannf
-using std::cyl_neumannl
-using std::cyl_neumann
-using std::ellint_1f
-using std::ellint_1l
-using std::ellint_1
-using std::ellint_2f
-using std::ellint_2l
-using std::ellint_2
-using std::ellint_3f
-using std::ellint_3l
-using std::ellint_3
-using std::expintf
-using std::expintl
-using std::expint
-using std::hermitef
-using std::hermitel
-using std::hermite
-using std::laguerref
-using std::laguerrel
-using std::laguerre
-using std::legendref
-using std::legendrel
-using std::legendre
-using std::riemann_zetaf
-using std::riemann_zetal
-using std::riemann_zeta
-using std::sph_besself
-using std::sph_bessell
-using std::sph_bessel
-using std::sph_legendref
-using std::sph_legendrel
-using std::sph_legendre
-using std::sph_neumannf
-using std::sph_neumannl
-using std::sph_neumann
-#endif
-#endif
+using std::assoc_laguerref;
+using std::assoc_laguerrel;
+using std::assoc_laguerre;
+using std::assoc_legendref;
+using std::assoc_legendrel;
+using std::assoc_legendre;
+using std::betaf;
+using std::betal;
+using std::beta;
+using std::comp_ellint_1f;
+using std::comp_ellint_1l;
+using std::comp_ellint_1;
+using std::comp_ellint_2f;
+using std::comp_ellint_2l;
+using std::comp_ellint_2;
+using std::comp_ellint_3f;
+using std::comp_ellint_3l;
+using std::comp_ellint_3;
+using std::cyl_bessel_if;
+using std::cyl_bessel_il;
+using std::cyl_bessel_i;
+using std::cyl_bessel_jf;
+using std::cyl_bessel_jl;
+using std::cyl_bessel_j;
+using std::cyl_bessel_kf;
+using std::cyl_bessel_kl;
+using std::cyl_bessel_k;
+using std::cyl_neumannf;
+using std::cyl_neumannl;
+using std::cyl_neumann;
+using std::ellint_1f;
+using std::ellint_1l;
+using std::ellint_1;
+using std::ellint_2f;
+using 

[PATCH] fix #69405 - [6 Regression] ICE in c_tree_printer on an invalid __atomic_fetch_add

2016-01-20 Thread Martin Sebor

The attached patch avoids printing a diagnostic referencing the type
of an incompatible argument to the __atomic_xxx built-ins when the
argument is in error.  Doing otherwise causes an ICE as pointed out
in the bug, for both of which I am to blame.

Martin
gcc/testsuite/ChangeLog:
2016-01-20  Martin Sebor  

	PR c/69405
	* gcc.dg/sync-fetch.c: New test.

gcc/c-family/ChangeLog:
2016-01-20  Martin Sebor  

	PR c/69405
	* c-common.c (sync_resolve_size): Avoid printing diagnostic about
an incompatible argument when the argument isn't a valid tree
node.

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 1a2c21b..378afae 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -10704,8 +10704,11 @@ sync_resolve_size (tree function, vec<tree, va_gc> *params, bool fetch)
 return size;
 
  incompatible:
-  error ("operand type %qT is incompatible with argument %d of %qE",
-	 argtype, 1, function);
+  /* Issue the diagnostic only if the argument is valid, otherwise
+ it would be redundant at best and could be misleading.  */
+  if (argtype != error_mark_node)
+error ("operand type %qT is incompatible with argument %d of %qE",
+	   argtype, 1, function);
   return 0;
 }
 
diff --git a/gcc/testsuite/gcc.dg/sync-fetch.c b/gcc/testsuite/gcc.dg/sync-fetch.c
new file mode 100644
index 0000000..44b6cdc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/sync-fetch.c
@@ -0,0 +1,115 @@
+/* PR c/69405 - [6 Regression] ICE in c_tree_printer on an invalid
+   __atomic_fetch_add */
+/* Test to verify that the diagnostic doesn't cause an ICE when any
+   of the arguments to __atomic_fetch_OP is undeclared.  */
+/* { dg-do compile } */
+
+void test_add_undeclared_first_arg (void)
+{
+  int a = 0;
+  __atomic_fetch_add (&b, &a, 0);   /* { dg-error ".b. undeclared" } */
+}
+
+void test_sub_undeclared_first_arg (void)
+{
+  int a = 0;
+  __atomic_fetch_sub (&b, &a, 0);  /* { dg-error ".b. undeclared" } */
+}
+
+void test_or_undeclared_first_arg (void)
+{
+  int a = 0;
+  __atomic_fetch_or (&b, &a, 0);  /* { dg-error ".b. undeclared" } */
+}
+
+void test_and_undeclared_first_arg (void)
+{
+  int a = 0;
+  __atomic_fetch_and (&b, &a, 0);  /* { dg-error ".b. undeclared" } */
+}
+
+void test_xor_undeclared_first_arg (void)
+{
+  int a = 0;
+  __atomic_fetch_xor (&b, &a, 0);  /* { dg-error ".b. undeclared" } */
+}
+
+void test_nand_undeclared_first_arg (void)
+{
+  int a = 0;
+  __atomic_fetch_nand (&b, &a, 0);  /* { dg-error ".b. undeclared" } */
+}
+
+
+void test_add_undeclared_second_arg (void)
+{
+  int b = 0;
+  __atomic_fetch_add (&b, &a, 0);   /* { dg-error ".a. undeclared" } */
+}
+
+void test_sub_undeclared_second_arg (void)
+{
+  int b = 0;
+  __atomic_fetch_sub (&b, &a, 0);  /* { dg-error ".a. undeclared" } */
+}
+
+void test_or_undeclared_second_arg (void)
+{
+  int b = 0;
+  __atomic_fetch_or (&b, &a, 0);  /* { dg-error ".a. undeclared" } */
+}
+
+void test_and_undeclared_second_arg (void)
+{
+  int b = 0;
+  __atomic_fetch_and (&b, &a, 0);  /* { dg-error ".a. undeclared" } */
+}
+
+void test_xor_undeclared_second_arg (void)
+{
+  int b = 0;
+  __atomic_fetch_xor (&b, &a, 0);  /* { dg-error ".a. undeclared" } */
+}
+
+void test_nand_undeclared_second_arg (void)
+{
+  int b = 0;
+  __atomic_fetch_nand (&b, &a, 0);  /* { dg-error ".a. undeclared" } */
+}
+
+
+void test_add_undeclared_third_arg (void)
+{
+  int a = 0, b = 0;
+  __atomic_fetch_add (&b, &a, m);   /* { dg-error ".m. undeclared" } */
+}
+
+void test_sub_undeclared_third_arg (void)
+{
+  int a = 0, b = 0;
+  __atomic_fetch_sub (&b, &a, m);  /* { dg-error ".m. undeclared" } */
+}
+
+void test_or_undeclared_third_arg (void)
+{
+  int a = 0, b = 0;
+  __atomic_fetch_or (&b, &a, m);  /* { dg-error ".m. undeclared" } */
+}
+
+void test_and_undeclared_third_arg (void)
+{
+  int a = 0, b = 0;
+  __atomic_fetch_and (&b, &a, m);  /* { dg-error ".m. undeclared" } */
+}
+
+void test_xor_undeclared_third_arg (void)
+{
+  int a = 0, b = 0;
+  __atomic_fetch_xor (&b, &a, m);  /* { dg-error ".m. undeclared" } */
+}
+
+void test_nand_undeclared_third_arg (void)
+{
+  int a = 0, b = 0;
+  __atomic_fetch_nand (&b, &a, m);  /* { dg-error ".m. undeclared" } */
+}
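
The shape of the fix, stripped of GCC internals, is a sentinel check
before formatting a diagnostic.  The sketch below is self-contained and
entirely hypothetical: struct type_node, error_mark and
report_incompatible are stand-ins invented here, not GCC's tree API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a type node.  */
struct type_node
{
  const char *name;
};

/* Hypothetical stand-in for error_mark_node: the one node that means
   "an error was already reported for this operand".  */
struct type_node error_mark = { "<error>" };

/* Report an incompatible operand type unless the operand is already
   erroneous.  Returns the number of diagnostics issued.  */
int
report_incompatible (const struct type_node *argtype, const char *function)
{
  assert (argtype != NULL && function != NULL);

  /* A second message about an erroneous operand would be redundant at
     best, so stay silent.  */
  if (argtype == &error_mark)
    return 0;

  fprintf (stderr, "operand type '%s' is incompatible with argument %d of '%s'\n",
           argtype->name, 1, function);
  return 1;
}
```

In the real patch the guarded call is error () with a %qT directive,
which is where printing the invalid node led to the ICE.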


Re: [PATCH] Revert an __int20ish get_ref_base_and_extent change (PR c++/69355)

2016-01-20 Thread Jeff Law

On 01/20/2016 05:03 PM, Jakub Jelinek wrote:

Hi!

Among the __int20 changes was one that made get_ref_base_and_extent behave
differently from similar get_inner_reference, e.g. for XFmode long double
or structures containing just a single long double member,
get_ref_base_and_extent now returns on x86 80 instead of 96 or 128 which
the type spans in memory.  For a function that returns the extent IMHO
the bitsize is right, it also seems to allow SRA from scalarizing such
accesses (which it doesn't otherwise), and on the testcase below also
triggers some SRA bug that causes even wrong-code.  Martin said he will have
a look at SRA, but this patch will make that bug just a latent issue.
DJ said his msp430 testing didn't reveal anything that would need the
precision there in this case (at least that is how I understood it).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-01-21  Jakub Jelinek  

PR c++/69355
* tree-dfa.c (get_ref_base_and_extent): Use GET_MODE_BITSIZE (mode)
for bitsize instead of GET_MODE_PRECISION (mode).

* g++.dg/torture/pr69355.C: New test.

Wasn't this discussed on IRC and approved there?  Isn't this patch just
making get_ref_base_and_extent consistent with get_inner_reference?
If so, approved.


Jeff
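
The gap between the 80 bits of value and the 96 or 128 bits of storage
can be seen from plain C.  This is a minimal sketch with made-up helper
names; it only observes the layout and does not touch any GCC internals.
Testing LDBL_MANT_DIG == 64 to detect the x87 extended format is an
assumption that holds on x86 Linux targets.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <stddef.h>

/* A structure containing just a single long double member, as in the
   case described above.  */
struct wrapper
{
  long double d;
};

/* Bits that a long double spans in memory.  */
size_t
long_double_storage_bits (void)
{
  return sizeof (long double) * CHAR_BIT;
}

/* Bits that the wrapping structure spans in memory.  */
size_t
wrapper_storage_bits (void)
{
  assert (sizeof (struct wrapper) >= sizeof (long double));
  return sizeof (struct wrapper) * CHAR_BIT;
}

/* Nonzero if long double looks like the x87 extended format, whose
   value occupies 1 + 15 + 64 = 80 bits.  */
int
is_x87_extended (void)
{
  return LDBL_MANT_DIG == 64;
}
```

Where is_x87_extended () is true, the storage size exceeds 80 bits, and
the extra bits are the padding that the precision-based extent left out.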



Re: [PATCH v2] PR48344: Fix unrecognizable insn error when gcc

2016-01-20 Thread Jeff Law

On 01/20/2016 04:51 PM, Bernd Schmidt wrote:



On 01/20/2016 10:49 PM, Kelvin Nilsen wrote:


 * toplev.c (do_compile): remove invocation of process_options ()
 from within do_compile ()
 (toplev::main): insert invocation of process_options () before
 invocation of handle_common_deferred_options ().


The ChangeLog seems badly formatted, but it could have been eaten by
your mailer. You might want to include it as part of the attachment to
avoid whitespace damage.

As for the patch itself, it makes me a little nervous - it's hard to
judge whether this could have unintended consequences where something
relies on the existing ordering. I'd much rather postpone the generation
of stack_limit_rtx until rtl initialization time. Maybe this needs to be
per-function anyway, can Pmode change with attribute target?

I couldn't ever get past my own nervousness about this patch either.
If there isn't a need for stack_limit_rtx early, then delaying that 
initialization seems wiser.


jeff


Re: [gomp4] fix c++ reference mappings in openacc

2016-01-20 Thread Cesar Philippidis
On 01/20/2016 07:46 PM, Cesar Philippidis wrote:
> I've applied this patch to gomp-4_0-branch which fixes two problems
> involving reference type variables in openacc data clauses. The first
> problem was, the c++ front end was incorrectly handling reference types
> in general in openacc. Instead of mapping the variable, it would map the
> pointer to the variable by itself. The second problem was, if the
> gimplifier saw a pointer mapping for a data clause, it would propagate
> it to omp-lower. That's bad because if you have something like this
> 
>   int &var = ...
> 
>   #pragma acc data copy (var)
>   {
>  ...var...
>   }
> 
> where the var inside the data region would have some uninitialized value
> because omp-lower installs a new variable for it. The gimplifier is
> already handling openmp target data regions properly, so this patch
> extends it to ignore pointer mappings in acc enter/exit and data constructs.
> 
> Ultimately this patch will need to go in trunk, but the c++ changes
> don't apply cleanly. I'll need to work on that later.

And here's the patch.

Cesar
2016-01-20  Cesar Philippidis  

	gcc/cp/
	* parser.c (cp_parser_oacc_all_clauses): Call finish_omp_clauses
	with allow_fields set to true.
	(cp_parser_oacc_cache): Likewise.
	(cp_parser_oacc_loop): Likewise.
	* semantics.c (finish_omp_clauses): Ensure that is_oacc is properly
	set when calling handle_omp_array_sections.

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses):  Consider OACC_{DATA,
	PARALLEL, KERNELS} when processing firstprivate pointers and
	references, and setting target_kind_firstprivatize_array_bases.

	libgomp/
	* testsuite/libgomp.oacc-c++/non-scalar-data.C: New test.


diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d88877a..4882b19 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -32324,7 +32324,7 @@ cp_parser_oacc_all_clauses (cp_parser *parser, omp_clause_mask mask,
   cp_parser_skip_to_pragma_eol (parser, pragma_tok);
 
   if (finish_p)
-return finish_omp_clauses (clauses, true, false);
+return finish_omp_clauses (clauses, true, true);
 
   return clauses;
 }
@@ -35140,7 +35140,7 @@ cp_parser_oacc_cache (cp_parser *parser, cp_token *pragma_tok)
   tree stmt, clauses;
 
   clauses = cp_parser_omp_var_list (parser, OMP_CLAUSE__CACHE_, NULL_TREE);
-  clauses = finish_omp_clauses (clauses, true, false);
+  clauses = finish_omp_clauses (clauses, true, true);
 
   cp_parser_require_pragma_eol (parser, cp_lexer_peek_token (parser->lexer));
 
@@ -35471,9 +35471,9 @@ cp_parser_oacc_loop (cp_parser *parser, cp_token *pragma_tok, char *p_name,
 {
   clauses = c_oacc_split_loop_clauses (clauses, cclauses);
   if (*cclauses)
-	finish_omp_clauses (*cclauses, true, false);
+	finish_omp_clauses (*cclauses, true, true);
   if (clauses)
-	finish_omp_clauses (clauses, true, false);
+	finish_omp_clauses (clauses, true, true);
 }
 
   tree block = begin_omp_structured_block ();
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 3ca6137..e161186 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -5807,7 +5807,7 @@ finish_omp_clauses (tree clauses, bool is_oacc, bool allow_fields,
 	  t = OMP_CLAUSE_DECL (c);
 	  if (TREE_CODE (t) == TREE_LIST)
 	{
-	  if (handle_omp_array_sections (c, allow_fields))
+	  if (handle_omp_array_sections (c, allow_fields && !is_oacc))
 		{
 		  remove = true;
 		  break;
@@ -6567,7 +6567,7 @@ finish_omp_clauses (tree clauses, bool is_oacc, bool allow_fields,
 	}
 	  if (TREE_CODE (t) == TREE_LIST)
 	{
-	  if (handle_omp_array_sections (c, allow_fields))
+	  if (handle_omp_array_sections (c, allow_fields && !is_oacc))
 		remove = true;
 	  break;
 	}
@@ -6601,7 +6601,7 @@ finish_omp_clauses (tree clauses, bool is_oacc, bool allow_fields,
 	  t = OMP_CLAUSE_DECL (c);
 	  if (TREE_CODE (t) == TREE_LIST)
 	{
-	  if (handle_omp_array_sections (c, allow_fields))
+	  if (handle_omp_array_sections (c, allow_fields && !is_oacc))
 		remove = true;
 	  else
 		{
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index cdb5b96..152942f 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -6092,7 +6092,7 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree decl, bool in_code)
 	{
 	  unsigned nflags = flags;
 	  if (ctx->target_map_pointers_as_0len_arrays
-	  || ctx->target_map_scalars_firstprivate)
+	   || ctx->target_map_scalars_firstprivate)
 	{
 	  bool is_declare_target = false;
 	  bool is_scalar = false;
@@ -6456,7 +6456,10 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
   case OMP_TARGET_DATA:
   case OMP_TARGET_ENTER_DATA:
   case OMP_TARGET_EXIT_DATA:
+  case OACC_DATA:
   case OACC_HOST_DATA:
+  case OACC_PARALLEL:
+  case OACC_KERNELS:
 	ctx->target_firstprivatize_array_bases = true;
   default:
 	break;
@@ -6726,7 +6729,10 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 	case OMP_TARGET_DATA:
 	

[gomp4] fix c++ reference mappings in openacc

2016-01-20 Thread Cesar Philippidis
I've applied this patch to gomp-4_0-branch which fixes two problems
involving reference type variables in openacc data clauses. The first
problem was, the c++ front end was incorrectly handling reference types
in general in openacc. Instead of mapping the variable, it would map the
pointer to the variable by itself. The second problem was, if the
gimplifier saw a pointer mapping for a data clause, it would propagate
it to omp-lower. That's bad because if you have something like this

  int &var = ...

  #pragma acc data copy (var)
  {
 ...var...
  }

where the var inside the data region would have some uninitialized value
because omp-lower installs a new variable for it. The gimplifier is
already handling openmp target data regions properly, so this patch
extends it to ignore pointer mappings in acc enter/exit and data constructs.

Ultimately this patch will need to go in trunk, but the c++ changes
don't apply cleanly. I'll need to work on that later.

Cesar


Re: gomp_target_fini

2016-01-20 Thread Thomas Schwinge
Hi!

Ping.

On Mon, 11 Jan 2016 11:39:46 +0100, I wrote:
> Ping.
> 
> On Wed, 23 Dec 2015 12:05:32 +0100, I wrote:
> > Ping.
> > 
> > On Wed, 16 Dec 2015 13:30:21 +0100, I wrote:
> > > On Mon, 14 Dec 2015 19:47:36 +0300, Ilya Verbin  wrote:
> > > > On Fri, Dec 11, 2015 at 18:27:13 +0100, Jakub Jelinek wrote:
> > > > > On Tue, Dec 08, 2015 at 05:45:59PM +0300, Ilya Verbin wrote:
> > > > > > +/* This function finalizes all initialized devices.  */
> > > > > > +
> > > > > > +static void
> > > > > > +gomp_target_fini (void)
> > > > > > +{
> > > > > > +  [...]
> > > > > 
> > > > > The question is what will this do if there are async target tasks 
> > > > > still
> > > > > running on some of the devices at this point (forgotten #pragma omp 
> > > > > taskwait
> > > > > or similar if target nowait regions are started outside of parallel 
> > > > > region,
> > > > > or exit inside of parallel, etc.  But perhaps it can be handled 
> > > > > incrementally.
> > > > > Also there is the question that the 
> > > > > So I think the patch is ok with the above mentioned changes.
> > > > 
> > > > Here is what I've committed to trunk.
> > > 
> > > > --- a/libgomp/libgomp.h
> > > > +++ b/libgomp/libgomp.h
> > > > @@ -888,6 +888,14 @@ typedef struct acc_dispatch_t
> > > >} cuda;
> > > >  } acc_dispatch_t;
> > > >  
> > > > +/* Various state of the accelerator device.  */
> > > > +enum gomp_device_state
> > > > +{
> > > > +  GOMP_DEVICE_UNINITIALIZED,
> > > > +  GOMP_DEVICE_INITIALIZED,
> > > > +  GOMP_DEVICE_FINALIZED
> > > > +};
> > > > +
> > > >  /* This structure describes accelerator device.
> > > > It contains name of the corresponding libgomp plugin, function 
> > > > handlers for
> > > > interaction with the device, ID-number of the device, and 
> > > > information about
> > > > @@ -933,8 +941,10 @@ struct gomp_device_descr
> > > >/* Mutex for the mutable data.  */
> > > >gomp_mutex_t lock;
> > > >  
> > > > -  /* Set to true when device is initialized.  */
> > > > -  bool is_initialized;
> > > > +  /* Current state of the device.  OpenACC allows to move from 
> > > > INITIALIZED state
> > > > + back to UNINITIALIZED state.  OpenMP allows only to move from 
> > > > INITIALIZED
> > > > + to FINALIZED state (at program shutdown).  */
> > > > +  enum gomp_device_state state;
> > > 
> > > (ACK, but I assume we'll want to make sure that an OpenACC device is
> > > never re-initialized if we're in/after the libgomp finalization phase.)
> > > 
> > > 
> > > The issue mentioned above: "exit inside of parallel" is actually a
> > > problem for nvptx offloading: the libgomp.oacc-c-c++-common/abort-1.c,
> > > libgomp.oacc-c-c++-common/abort-3.c, and libgomp.oacc-fortran/abort-1.f90
> > > test cases now run into annoying "WARNING: program timed out".  Here is
> > > what's happening, as I understand it: in
> > > libgomp/plugin/plugin-nvptx.c:nvptx_exec, the cuStreamSynchronize call
> > > returns CUDA_ERROR_LAUNCH_FAILED, upon which we call GOMP_PLUGIN_fatal.
> > > 
> > > > --- a/libgomp/target.c
> > > > +++ b/libgomp/target.c
> > > 
> > > > +/* This function finalizes all initialized devices.  */
> > > > +
> > > > +static void
> > > > +gomp_target_fini (void)
> > > > +{
> > > > +  int i;
> > > > +  for (i = 0; i < num_devices; i++)
> > > > +{
> > > > > +  struct gomp_device_descr *devicep = &devices[i];
> > > > > +  gomp_mutex_lock (&devicep->lock);
> > > > +  if (devicep->state == GOMP_DEVICE_INITIALIZED)
> > > > +   {
> > > > + devicep->fini_device_func (devicep->target_id);
> > > > + devicep->state = GOMP_DEVICE_FINALIZED;
> > > > +   }
> > > > > +  gomp_mutex_unlock (&devicep->lock);
> > > > +}
> > > > +}
> > > 
> > > > @@ -2387,6 +2433,9 @@ gomp_target_init (void)
> > > >if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
> > > > > goacc_register (&devices[i]);
> > > >  }
> > > > +
> > > > +  if (atexit (gomp_target_fini) != 0)
> > > > +gomp_fatal ("atexit failed");
> > > >  }
> > > 
> > > Now, with the above change installed, GOMP_PLUGIN_fatal will trigger the
> > > atexit handler, gomp_target_fini, which, with the device lock held, will
> > > call back into the plugin, GOMP_OFFLOAD_fini_device, which will try to
> > > clean up.
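
The state handling and the exit hook can be sketched in isolation.  The
code below is a simplified, hypothetical model: struct device,
plugin_fini, target_fini and target_init are names invented here, the
per-device lock is left out, and nothing in it is libgomp's actual code.

```c
#include <assert.h>
#include <stdlib.h>

enum device_state
{
  DEVICE_UNINITIALIZED,
  DEVICE_INITIALIZED,
  DEVICE_FINALIZED
};

struct device
{
  enum device_state state;
  int fini_calls;   /* how often the plugin hook ran */
};

enum { NUM_DEVICES = 3 };
struct device devices[NUM_DEVICES];

/* Stand-in for the plugin's fini_device_func.  */
static void
plugin_fini (struct device *d)
{
  assert (d->state == DEVICE_INITIALIZED);
  d->fini_calls++;
}

/* Finalize every initialized device exactly once.  */
void
target_fini (void)
{
  for (int i = 0; i < NUM_DEVICES; i++)
    {
      struct device *d = &devices[i];
      if (d->state == DEVICE_INITIALIZED)
        {
          plugin_fini (d);
          d->state = DEVICE_FINALIZED;
        }
    }
}

/* Mark device 0 as initialized and register the exit hook.
   Returns 0 on success, like atexit.  */
int
target_init (void)
{
  devices[0].state = DEVICE_INITIALIZED;
  return atexit (target_fini);
}
```

Since atexit handlers run on every call to exit, an error path that
ends in exit also goes through target_fini and back into the plugin,
which is the situation described above.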
> > > 
> > > Because of the earlier CUDA_ERROR_LAUNCH_FAILED, the associated CUDA
> > > context is now in an inconsistent state, see
> > > :
> > > 
> > > CUDA_ERROR_LAUNCH_FAILED = 719
> > > An exception occurred on the device while executing a
> > > kernel. Common causes include dereferencing an invalid device
> > > pointer and accessing out of bounds shared memory. The context
> > > cannot be used, so it must be destroyed (and a new one should be
> > > created). All existing device memory allocations from this
> > > context are invalid and must be reconstructed if the program is
> > > to 

Re: [PATCH] Fix up reduction-1{1,2} testcases (PR middle-end/68221)

2016-01-20 Thread Thomas Schwinge
Hi!

Ping.

On Mon, 11 Jan 2016 11:40:58 +0100, I wrote:
> Ping.
> 
> On Wed, 23 Dec 2015 12:03:48 +0100, I wrote:
> > Ping.
> > 
> > On Thu, 26 Nov 2015 14:31:56 +0100, I wrote:
> > > On Mon, 23 Nov 2015 12:13:07 +0100 (CET), Richard Biener 
> > >  wrote:
> > > > On Fri, 20 Nov 2015, Jakub Jelinek wrote:
> > > > > If C/C++ array section reductions have non-zero (positive) bias, it is
> > > > > implemented by declaring a smaller private array and subtracting the 
> > > > > bias
> > > > > from the start of the private array (because valid code may only 
> > > > > dereference
> > > > > elements from bias onwards).  But, this isn't something that is 
> > > > > kosher in
> > > > > C/C++ pointer arithmetics and the alias oracle seems to get upset on 
> > > > > that.
> > > > > So, the following patch fixes that by performing the subtraction on 
> > > > > integral
> > > > > type instead of p+ -bias.
> > > > 
> > > > So this still does use the biased pointer because you do not
> > > > re-write accesses (where you could have applied the biasing to
> > > > the indexes / offsets), right?  Thus the patch is merely obfuscation
> > > > for GCC rather than making it kosher for C/C++ (you still have a
> > > > pointer pointing outside of the private array object)?
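
The alternative mentioned in the quoted text, applying the bias to the
indexes rather than to the pointer, can be sketched as follows.  This is
a hypothetical illustration of the idea only, with a made-up function
name; it is not what omp-low.c generates.

```c
#include <assert.h>
#include <stddef.h>

/* Add DELTA to every element of the section a[bias .. bias+len-1]
   through a private, zero-based array of LEN elements.  */
void
reduce_section (int *a, size_t bias, size_t len, int delta, int *priv)
{
  assert (a != NULL && priv != NULL);

  /* Start from the identity value for "+".  */
  for (size_t k = 0; k < len; k++)
    priv[k] = 0;

  /* The loop body sees original indexes; each access is rebased by
     BIAS, so no pointer ever points outside PRIV.  */
  for (size_t i = bias; i < bias + len; i++)
    priv[i - bias] += delta;

  /* Merge the private copy back into the original section.  */
  for (size_t k = 0; k < len; k++)
    a[bias + k] += priv[k];
}
```

The biased-pointer scheme instead forms priv - bias once and indexes it
with i directly, which saves the subtraction on every access but creates
the out-of-object pointer discussed above.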
> > > > 
> > > > I still hope to have a look where the alias oracle gets things
> > > > wrong (well, if so by accident at least).
> > > 
> > > I understand this ("have a look where the alias oracle gets things
> > > wrong") to have happened in Richi's trunk r230793,
> > > ?
> > > 
> > > I've tested that with the original POINTER_PLUS_EXPR code restored and
> > > with Richi's r230793 applied, for x86_64 GNU/Linux there is no change for
> > > the libgomp.c/reduction-11.c, libgomp.c/reduction-12.c,
> > > libgomp.c++/reduction-11.C, libgomp.c++/reduction-12.C test cases
> > > (already PASSed), but for 32-bit x86, they now PASS instead of FAILing.
> > > That is, I tested with the following (part of r230672) reverted:
> > > 
> > > > > --- gcc/omp-low.c.jj  2015-11-20 12:56:17.0 +0100
> > > > > +++ gcc/omp-low.c 2015-11-20 13:44:29.080374051 +0100
> > > > > @@ -,11 +,13 @@ lower_rec_input_clauses (tree clauses, g
> > > > >  
> > > > > if (!integer_zerop (bias))
> > > > >   {
> > > > > -   bias = fold_convert_loc (clause_loc, sizetype, bias);
> > > > > -   bias = fold_build1_loc (clause_loc, NEGATE_EXPR,
> > > > > -   sizetype, bias);
> > > > > -   x = fold_build2_loc (clause_loc, POINTER_PLUS_EXPR,
> > > > > -TREE_TYPE (x), x, bias);
> > > > > +   bias = fold_convert_loc (clause_loc, 
> > > > > pointer_sized_int_node,
> > > > > +bias);
> > > > > +   yb = fold_convert_loc (clause_loc, 
> > > > > pointer_sized_int_node,
> > > > > +  x);
> > > > > +   yb = fold_build2_loc (clause_loc, MINUS_EXPR,
> > > > > + pointer_sized_int_node, yb, 
> > > > > bias);
> > > > > +   x = fold_convert_loc (clause_loc, TREE_TYPE (x), yb);
> > > > > yb = create_tmp_var (ptype, name);
> > > > > gimplify_assign (yb, x, ilist);
> > > > > x = yb;
> > > 
> > > OK to commit the following to trunk?
> > > 
> > > commit 92b0eebfcbe914d3addeb97d4bb33f76a44dbe60
> > > Author: Thomas Schwinge 
> > > Date:   Thu Nov 26 14:21:13 2015 +0100
> > > 
> > > Restore original POINTER_PLUS_EXPR code
> > > 
> > >   PR middle-end/68221
> > >   gcc/
> > >   * omp-low.c (lower_rec_input_clauses): If C/C++ array reduction
> > >   has non-zero bias, use pointer plus of negated bias instead of
> > >   subtracting it in integer type.
> > > ---
> > >  gcc/omp-low.c |   12 +---
> > >  1 file changed, 5 insertions(+), 7 deletions(-)
> > > 
> > > diff --git gcc/omp-low.c gcc/omp-low.c
> > > index 0b44588..927d9d9 100644
> > > --- gcc/omp-low.c
> > > +++ gcc/omp-low.c
> > > @@ -4451,13 +4451,11 @@ lower_rec_input_clauses (tree clauses, gimple_seq 
> > > *ilist, gimple_seq *dlist,
> > >  
> > > if (!integer_zerop (bias))
> > >   {
> > > -   bias = fold_convert_loc (clause_loc, pointer_sized_int_node,
> > > -bias);
> > > -   yb = fold_convert_loc (clause_loc, pointer_sized_int_node,
> > > -  x);
> > > -   yb = fold_build2_loc (clause_loc, MINUS_EXPR,
> > > - pointer_sized_int_node, yb, bias);
> > > -   x = fold_convert_loc (clause_loc, TREE_TYPE (x), yb);
> > > +   bias = fold_convert_loc (clause_loc, sizetype, bias);
> > > +   bias = fold_build1_loc 

Re: [PATCH 1/5] s390: Use proper read-only data section for literals.

2016-01-20 Thread Marcin Kościelnicki

On 20/01/16 14:11, Andreas Krebbel wrote:

On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:

Previously, .rodata was hardcoded.  For C++ vague linkage functions,
this resulted in needlessly duplicated literals.  With the new split
stack support, this resulted in link errors, due to .rodata containing
relocations to the discarded text sections.

gcc/ChangeLog:

* config/s390/s390.md (pool_section_start): Use switch_to_section
to select proper read-only data section instead of hardcoding .rodata.
(pool_section_end): Use switch_to_section to match the above.
---
  gcc/ChangeLog   |  6 ++
  gcc/config/s390/s390.md | 11 +--
  2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 23ce209..2c572a7 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2016-01-02  Marcin Kościelnicki  
+
+   * config/s390/s390.md (pool_section_start): Use switch_to_section
+   to select proper read-only data section instead of hardcoding .rodata.
+   (pool_section_end): Use switch_to_section to match the above.
+


This is ok if bootstrap and regression tests are clean. Thanks!

-Andreas-




The bootstrap and regression tests are indeed clean for this patch and 
#2.  I don't have commit access to the gcc repo; how do I get this pushed?


Marcin Kościelnicki


Re: [PATCH, GCC] Fix PR67781: wrong code generation for partial load on big endian targets

2016-01-20 Thread Thomas Preud'homme
On Friday, January 08, 2016 10:05:25 AM Richard Biener wrote:
> On Tue, 5 Jan 2016, Thomas Preud'homme wrote:
> > Hi,
> > 
> > The bswap optimization pass generates wrong code on big endian targets
> > when the result of a bit operation it analyzed is a partial load of the
> > range of memory accessed by the original expression (when one or more
> > bytes at the lowest address were lost in the computation). This is due to
> > the way cmpxchg and cmpnop are adjusted in find_bswap_or_nop before being
> > compared to the result of the symbolic expression. Part of the adjustment
> > is endian independent: it ignores the bytes that were not accessed by
> > the original gimple expression. However, when the result has fewer bytes
> > than the original expression, some more bytes need to be ignored, and
> > this is endian dependent.
> > 
> > The current code only supports loss of bytes at the highest addresses
> > because there is no code to adjust the address of the load. However, on
> > little and big endian targets the bytes at the highest address translate
> > into different byte significance in the result. This patch first separates
> > the cmpxchg and cmpnop adjustment into 2 steps and then deals with
> > endianness correctly in the second step.
> > 
> > ChangeLog entries are as follows:
> > 
> > 
> > *** gcc/ChangeLog ***
> > 
> > 2015-12-16  Thomas Preud'homme  
> > 
> > PR tree-optimization/67781
> > * tree-ssa-math-opts.c (find_bswap_or_nop): Zero out bytes in
> > cmpxchg and cmpnop in two steps: first the ones not accessed in the
> > original gimple expression in an endian-independent way and then the
> > ones not accessed in the final result in an endian-specific way.
> > 
> > *** gcc/testsuite/ChangeLog ***
> > 
> > 2015-12-16  Thomas Preud'homme  
> > 
> > PR tree-optimization/67781
> > * gcc.c-torture/execute/pr67781.c: New file.
> > 
> > diff --git a/gcc/testsuite/gcc.c-torture/execute/pr67781.c
> > b/gcc/testsuite/gcc.c-torture/execute/pr67781.c
> > new file mode 100644
> > index 000..bf50aa2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/execute/pr67781.c
> > @@ -0,0 +1,34 @@
> > +#ifdef __UINT32_TYPE__
> > +typedef __UINT32_TYPE__ uint32_t;
> > +#else
> > +typedef unsigned uint32_t;
> > +#endif
> > +
> > +#ifdef __UINT8_TYPE__
> > +typedef __UINT8_TYPE__ uint8_t;
> > +#else
> > +typedef unsigned char uint8_t;
> > +#endif
> > +
> > +struct
> > +{
> > +  uint32_t a;
> > +  uint8_t b;
> > +} s = { 0x123456, 0x78 };
> > +
> > +int pr67781()
> > +{
> > +  uint32_t c = (s.a << 8) | s.b;
> > +  return c;
> > +}
> > +
> > +int
> > +main ()
> > +{
> > +  if (sizeof (uint32_t) * __CHAR_BIT__ != 32)
> > +return 0;
> > +
> > +  if (pr67781 () != 0x12345678)
> > +__builtin_abort ();
> > +  return 0;
> > +}
> > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> > index b00f046..e5a185f 100644
> > --- a/gcc/tree-ssa-math-opts.c
> > +++ b/gcc/tree-ssa-math-opts.c
> > @@ -2441,6 +2441,8 @@ find_bswap_or_nop_1 (gimple *stmt, struct
> > symbolic_number *n, int limit)
> > 
> >  static gimple *
> >  find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
> >  {
> > 
> > +  unsigned rsize;
> > +  uint64_t tmpn, mask;
> > 
> >  /* The number which the find_bswap_or_nop_1 result should match in order
> >  
> > to have a full byte swap.  The number is shifted to the right
> > according to the size of the symbolic number before using it.  */
> > 
> > @@ -2464,24 +2466,38 @@ find_bswap_or_nop (gimple *stmt, struct
> > symbolic_number *n, bool *bswap)
> > 
> >/* Find real size of result (highest non-zero byte).  */
> >if (n->base_addr)
> > 
> > -{
> > -  int rsize;
> > -  uint64_t tmpn;
> > -
> > -  for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER,
> > rsize++);
> > -  n->range = rsize;
> > -}
> > +for (tmpn = n->n, rsize = 0; tmpn; tmpn >>= BITS_PER_MARKER,
> > rsize++);
> > +  else
> > +rsize = n->range;
> > 
> > -  /* Zero out the extra bits of N and CMP*.  */
> > +  /* Zero out the bits corresponding to untouched bytes in original
> > gimple
> > + expression.  */
> > 
> >if (n->range < (int) sizeof (int64_t))
> >
> >  {
> > 
> > -  uint64_t mask;
> > -
> > 
> >mask = ((uint64_t) 1 << (n->range * BITS_PER_MARKER)) - 1;
> >cmpxchg >>= (64 / BITS_PER_MARKER - n->range) * BITS_PER_MARKER;
> >cmpnop &= mask;
> >  
> >  }
> > 
> > +  /* Zero out the bits corresponding to unused bytes in the result of the
> > + gimple expression.  */
> > +  if (rsize < n->range)
> > +{
> > +  if (BYTES_BIG_ENDIAN)
> > +   {
> > + mask = ((uint64_t) 1 << (rsize * BITS_PER_MARKER)) - 1;
> > + cmpxchg &= mask;
> > + cmpnop >>= (n->range - rsize) * BITS_PER_MARKER;
> > +   }
> > +  else
> > +   {
> > + mask = ((uint64_t) 1 << (rsize * 
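
The endianness point in the description above can be illustrated outside the compiler.  The following is only a sketch, not code from the patch: load_be and load_le are hypothetical helpers that model how a big endian and a little endian target would interpret the same bytes in memory.  Dropping the bytes at the highest addresses removes the least significant bytes of the big endian value (a right shift) but the most significant bytes of the little endian value (a mask), which is why the second adjustment step has to be endian specific.

```c
#include <stddef.h>
#include <stdint.h>

/* Interpret the first LEN bytes at P the way a big endian target would:
   the byte at the lowest address is the most significant one.
   Assumes LEN <= 8.  */
static uint64_t
load_be (const unsigned char *p, size_t len)
{
  uint64_t v = 0;
  for (size_t i = 0; i < len; i++)
    v = (v << 8) | p[i];
  return v;
}

/* Interpret the first LEN bytes at P the way a little endian target
   would: the byte at the lowest address is the least significant one.
   Assumes LEN <= 8.  */
static uint64_t
load_le (const unsigned char *p, size_t len)
{
  uint64_t v = 0;
  for (size_t i = 0; i < len; i++)
    v |= (uint64_t) p[i] << (8 * i);
  return v;
}

/* Value left after losing the K bytes at the highest addresses of a
   big endian load: the low-order bytes disappear.  */
static uint64_t
drop_high_addr_be (uint64_t v, unsigned k)
{
  return v >> (8 * k);
}

/* Same loss for a little endian load of RANGE bytes: the high-order
   bytes disappear.  Assumes RANGE - K < 8.  */
static uint64_t
drop_high_addr_le (uint64_t v, unsigned range, unsigned k)
{
  uint64_t mask = ((uint64_t) 1 << (8 * (range - k))) - 1;
  return v & mask;
}
```

With the bytes 0x12 0x34 0x56 0x78 in memory, a 3-byte load equals the 4-byte load shifted right by one byte in the big endian model and the 4-byte load masked to its low three bytes in the little endian model.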

Re: [PATCH], PowerPC IEEE 128-bit fp, #11-rev4 (enable libgcc conversions)

2016-01-20 Thread Michael Meissner
This is revision 4 of the IEEE 128-bit floating point libgcc support.

Since revision 3, I have removed the gcc changes that broke AIX.  I rewrote the
IBM extended double pack/unpack support to use a union instead of the builtin
functions.  The libgcc code that I wrote tickles a bug in the pack
function.  While I would like to fix the pack function bug, I will need to make
sure I don't break AIX, so I didn't want to couple this library to getting
those bugs fixed.

I have also rewritten how the ifunc support is done, so that ifunc is only used
if the target assembler supports ISA 3.0 instructions AND the compiler supports
ifunc functions.  This way, if --enable-gnu-indirect-function is not specified,
the compiler can still build on 64-bit systems without the ifunc functions
being flagged.
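
For readers unfamiliar with the idea, the software/hardware selection can be sketched in portable C.  This is not the libgcc code and does not use the ifunc attribute; it swaps in a plain function pointer chosen on first use.  All names here (add_sw, add_hw, have_hw, resolve_add, add_dispatch) are hypothetical, and double stands in for the IEEE 128-bit type.

```c
#include <stdbool.h>

typedef double (*add_fn) (double, double);

/* Hypothetical stand-in for the software emulation routine.  */
static double
add_sw (double a, double b)
{
  return a + b;
}

/* Hypothetical stand-in for the routine using hardware instructions.  */
static double
add_hw (double a, double b)
{
  return a + b;
}

/* Hypothetical capability flag; a real resolver would query the
   hardware capabilities instead.  */
static bool have_hw = false;

/* Plays the role of the resolver: pick one implementation.  */
static add_fn
resolve_add (void)
{
  return have_hw ? add_hw : add_sw;
}

/* Public entry point: resolve once, then always call the chosen
   implementation.  */
double
add_dispatch (double a, double b)
{
  static add_fn impl;
  if (!impl)
    impl = resolve_add ();
  return impl (a, b);
}
```

With real ifunc support the selection happens when the symbol is bound rather than on first call, so callers pay no per-call check; the sketch only shows the selection logic.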

I have done bootstraps on both a big endian power7 system and a little endian
power8 with no regressions.  In addition, I have built a compiler explicitly
disabling ifunc support, and it built and ran the ieee 128-bit floating point
unit tests correctly.

Can I install this into libgcc?

Assuming I can install these changes, the one final change that I would like to
make is to enable float128 automatically on VSX powerpc Linux systems, but not
on other systems (AIX, *BSD, etc.) since those systems do not build float128
emulator functions.

2016-01-20  Michael Meissner  
Steven Munroe 
Tulio Magno Quites Machado Filho 

* config/rs6000/float128-sed: New files to convert TF names to KF
names for PowerPC IEEE 128-bit floating point support.
* config/rs6000/float128-sed-hw: Likewise.

* config/rs6000/float128-hw.c: New file for ISA 3.0 IEEE 128-bit
floating point hardware support.

* config/rs6000/float128-ifunc.c: New file to pick either IEEE
128-bit floating point software emulation or use ISA 3.0 hardware
support if it is available.

* config/rs6000/quad-float128.h: New file to support IEEE 128-bit
floating point.

* config/rs6000/extendkftf2-sw.c: New file, convert IEEE 128-bit
floating point to IBM extended double.

* config/rs6000/trunctfkf2-sw.c: New file, convert IBM extended
double to IEEE 128-bit floating point.

* config/rs6000/t-float128: New Makefile fragments to enable
building __float128 emulation support.
* config/rs6000/t-float128-hw: Likewise.

* config/rs6000/sfp-exceptions.c: New file to provide exception
support for IEEE 128-bit floating point.

* config/rs6000/floattikf.c: New files for converting between IEEE
128-bit floating point and signed/unsigned 128-bit integers.
* config/rs6000/fixunskfti.c: Likewise.
* config/rs6000/fixkfti.c: Likewise.
* config/rs6000/floatuntikf.c: Likewise.

* config/rs6000/sfp-machine.h (_FP_W_TYPE_SIZE): Use 64-bit types
when building on 64-bit systems, or when VSX is enabled.
(_FP_W_TYPE): Likewise.
(_FP_WS_TYPE): Likewise.
(_FP_I_TYPE): Likewise.
(TItype): Define on 64-bit systems.
(UTItype): Likewise.
(TI_BITS): Likewise.
(_FP_MUL_MEAT_D): Add support for using 64-bit types.
(_FP_MUL_MEAT_Q): Likewise.
(_FP_DIV_MEAT_D): Likewise.
(_FP_DIV_MEAT_Q): Likewise.
(_FP_NANFRAC_D): Likewise.
(_FP_NANFRAC_Q): Likewise.
(ISA_BIT): Add exception support if we are being compiled on a
machine with hardware floating point support to build the IEEE
128-bit emulation functions.
(FP_EX_INVALID): Likewise.
(FP_EX_OVERFLOW): Likewise.
(FP_EX_UNDERFLOW): Likewise.
(FP_EX_DIVZERO): Likewise.
(FP_EX_INEXACT): Likewise.
(FP_EX_ALL): Likewise.
(__sfp_handle_exceptions): Likewise.
(FP_HANDLE_EXCEPTIONS): Likewise.
(FP_RND_NEAREST): Likewise.
(FP_RND_ZERO): Likewise.
(FP_RND_PINF): Likewise.
(FP_RND_MINF): Likewise.
(FP_RND_MASK): Likewise.
(_FP_DECL_EX): Likewise.
(FP_INIT_ROUNDMODE): Likewise.
(FP_ROUNDMODE): Likewise.

* libgcc/config.host (powerpc*-*-linux*): If compiler can compile
VSX code, enable IEEE 128-bit floating point.  If the compiler can
compile IEEE 128-bit floating point code with ISA 3.0 IEEE 128-bit
floating point hardware instructions and it supports declaring
functions with the ifunc attribute, enable ifunc functions to
switch between software and hardware support.
* configure.ac (powerpc*-*-linux*): Likewise.
* configure: Regenerate.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: libgcc/config/rs6000/float128-sed-hw

Re: [doc, dwarf] Update bit-rotten DWARF option documentation

2016-01-20 Thread Sandra Loosemore

On 01/20/2016 06:52 AM, Jason Merrill wrote:

On 01/19/2016 11:31 PM, Sandra Loosemore wrote:

-@option{-gdwarf-2} does not accept a concatenated debug level, because
-GCC used to support an option @option{-gdwarf} that meant to generate
-debug information in version 1 of the DWARF format (which is very
-different from version 2), and it would have been too confusing.  That
-debug format is long obsolete, but the option cannot be changed now.


I think we should retain some mention of DWARF 1 here, perhaps as an
explanation of why {version} starts at 2.


 Ron Guilmette implemented the @command{protoize} and
@command{unprotoize}
-tools, the support for Dwarf symbolic debugging information, and much of
+tools, the support for DWARF symbolic debugging information, and much of
 the support for System V Release 4.  He has also worked heavily on the


Let's clarify that rfg's implementation was of DWARF version 1.


Thanks for the sanity check.  I've committed the attached version of the 
patch, which addresses your comments.


-Sandra

2016-01-20  Sandra Loosemore 

	gcc/
	* common.opt (feliminate-dwarf2-dups): Replace references to
	"DWARF 2" with just "DWARF".
	* config/ia64/ia64.opt (mdwarf2-asm): Likewise.
	* doc/extend.texi: Likewise.
	* doc/cpp.texi: Likewise.
	* doc/invoke.texi: Likewise.
	(Option Summary): Add -gdwarf to list of Debugging Options.
	(Debugging Options): Document -gdwarf.
	* doc/contrib.texi: Spell "DWARF" like that.
Index: gcc/common.opt
===
--- gcc/common.opt	(revision 232582)
+++ gcc/common.opt	(working copy)
@@ -1234,7 +1234,7 @@ Perform early inlining.
 
 feliminate-dwarf2-dups
 Common Report Var(flag_eliminate_dwarf2_dups)
-Perform DWARF2 duplicate elimination.
+Perform DWARF duplicate elimination.
 
 fipa-sra
 Common Report Var(flag_ipa_sra) Init(0) Optimization
Index: gcc/config/ia64/ia64.opt
===
--- gcc/config/ia64/ia64.opt	(revision 232582)
+++ gcc/config/ia64/ia64.opt	(working copy)
@@ -103,7 +103,7 @@ Do not inline square root.
 
 mdwarf2-asm
 Target Report Mask(DWARF2_ASM)
-Enable Dwarf 2 line debug info via GNU as.
+Enable DWARF line debug info via GNU as.
 
 mearly-stop-bits
 Target Report Mask(EARLY_STOP_BITS)
Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi	(revision 232582)
+++ gcc/doc/extend.texi	(working copy)
@@ -916,8 +916,8 @@ provided as built-in functions by GCC@.
 
 GCC can allocate complex automatic variables in a noncontiguous
 fashion; it's even possible for the real part to be in a register while
-the imaginary part is on the stack (or vice versa).  Only the DWARF 2
-debug info format can represent this, so use of DWARF 2 is recommended.
+the imaginary part is on the stack (or vice versa).  Only the DWARF
+debug info format can represent this, so use of DWARF is recommended.
 If you are using the stabs debug info format, GCC describes a noncontiguous
 complex variable as if it were two separate variables of noncomplex type.
 If the variable's actual name is @code{foo}, the two fictitious
@@ -1075,7 +1075,7 @@ the technical report.
 @end itemize
 
 Types @code{_Decimal32}, @code{_Decimal64}, and @code{_Decimal128}
-are supported by the DWARF 2 debug information format.
+are supported by the DWARF debug information format.
 
 @node Hex Floats
 @section Hex Floats
@@ -1249,7 +1249,7 @@ is incomplete:
 Pragmas to control overflow and rounding behaviors are not implemented.
 @end itemize
 
-Fixed-point types are supported by the DWARF 2 debug information format.
+Fixed-point types are supported by the DWARF debug information format.
 
 @node Named Address Spaces
 @section Named Address Spaces
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 232583)
+++ gcc/doc/invoke.texi	(working copy)
@@ -316,7 +316,7 @@ Objective-C and Objective-C++ Dialects}.
 
 @item Debugging Options
 @xref{Debugging Options,,Options for Debugging Your Program}.
-@gccoptlist{-g  -g@var{level}  -gcoff  -gdwarf-@var{version} @gol
+@gccoptlist{-g  -g@var{level}  -gcoff  -gdwarf -gdwarf-@var{version} @gol
 -ggdb  -grecord-gcc-switches  -gno-record-gcc-switches @gol
 -gstabs  -gstabs+  -gstrict-dwarf  -gno-strict-dwarf @gol
 -gvms  -gxcoff  -gxcoff+ -gz@r{[}=@var{type}@r{]} @gol
@@ -5727,7 +5727,7 @@ information useful for debugging do not 
 @item -g
 @opindex g
 Produce debugging information in the operating system's native format
-(stabs, COFF, XCOFF, or DWARF 2)@.  GDB can work with this debugging
+(stabs, COFF, XCOFF, or DWARF)@.  GDB can work with this debugging
 information.
 
 On most systems that use stabs format, @option{-g} enables use of extra
@@ -5741,12 +5741,13 @@ to generate the extra information, use @
 @item -ggdb
 @opindex ggdb
 Produce debugging information for use by GDB@.  

Re: [hsa merge 02/10] Modifications to libgomp proper

2016-01-20 Thread Jakub Jelinek
On Wed, Jan 20, 2016 at 05:47:59PM +0300, Ilya Verbin wrote:
> OK for trunk?
> 
> libgomp/
>   * task.c (gomp_create_target_task): Set firstprivate_copies to NULL.
> 
> diff --git a/libgomp/task.c b/libgomp/task.c
> index 0f45c44..38d4e9b 100644
> --- a/libgomp/task.c
> +++ b/libgomp/task.c
> @@ -683,6 +683,7 @@ gomp_create_target_task (struct gomp_device_descr 
> *devicep,
>ttask->state = state;
>ttask->task = task;
>ttask->team = team;
> +  ttask->firstprivate_copies = NULL;
>task->fn = NULL;
>task->fn_data = ttask;
>task->final_task = 0;

Ok (though eventually I'd prefer that free (ttask->firstprivate_copies) be
performed only for the shared mem async tasks and not for the other ones).

Jakub
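
A minimal sketch of why the initialization matters, using a hypothetical, much-reduced record (toy_target_task is not the libgomp structure): if the cleanup path frees the pointer unconditionally, every creation path has to set it, and NULL is safe because free (NULL) does nothing.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a target task record.  */
struct toy_target_task
{
  int state;
  void **firstprivate_copies;
};

/* Create a task that has no firstprivate copies.  */
struct toy_target_task *
toy_task_create (int state)
{
  struct toy_target_task *t = malloc (sizeof *t);
  if (!t)
    return NULL;
  t->state = state;
  /* malloc leaves this field indeterminate; without the assignment the
     unconditional free below would be passed a garbage pointer.  */
  t->firstprivate_copies = NULL;
  return t;
}

/* Release a task; the free is unconditional, so the field must always
   hold either NULL or a pointer obtained from malloc.  */
void
toy_task_destroy (struct toy_target_task *t)
{
  if (!t)
    return;
  free (t->firstprivate_copies);
  free (t);
}
```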


[PATCH] Fix PR69378

2016-01-20 Thread Richard Biener

I am currently testing the following patch to fix PR69378 (another
fallout of the PR69117 fix).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-01-20  Richard Biener  

PR tree-optimization/69378
* tree-ssa-sccvn.c (dominated_by_p_w_unex): New function.
(set_ssa_val_to): Use it for dominance checks taking into
account not executable edges.

Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c(revision 232603)
--- gcc/tree-ssa-sccvn.c(working copy)
*** print_scc (FILE *out, vec<tree> scc)
*** 2969,2974 
--- 2969,3055 
fprintf (out, "\n");
  }
  
+ /* Return true if BB1 is dominated by BB2 taking into account edges
+that are not executable.  */
+ 
+ static bool
+ dominated_by_p_w_unex (basic_block bb1, basic_block bb2)
+ {
+   edge_iterator ei;
+   edge e;
+ 
+   if (dominated_by_p (CDI_DOMINATORS, bb1, bb2))
+ return true;
+ 
+   /* Before iterating we'd like to know if there exists a
+  (executable) path from bb2 to bb1 at all, if not we can
+  directly return false.  For now simply iterate once.  */
+ 
+   /* Iterate to the single executable bb1 predecessor.  */
+   if (EDGE_COUNT (bb1->preds) > 1)
+ {
+   edge prede = NULL;
+   FOR_EACH_EDGE (e, ei, bb1->preds)
+   if (e->flags & EDGE_EXECUTABLE)
+ {
+   if (prede)
+ {
+   prede = NULL;
+   break;
+ }
+   prede = e;
+ }
+   if (prede)
+   {
+ bb1 = prede->src;
+ 
+ /* Re-do the dominance check with changed bb1.  */
+ if (dominated_by_p (CDI_DOMINATORS, bb1, bb2))
+   return true;
+   }
+ }
+ 
+   /* Iterate to the single executable bb2 successor.  */
+   edge succe = NULL;
+   FOR_EACH_EDGE (e, ei, bb2->succs)
+ if (e->flags & EDGE_EXECUTABLE)
+   {
+   if (succe)
+ {
+   succe = NULL;
+   break;
+ }
+   succe = e;
+   }
+   if (succe)
+ {
+   /* Verify the reached block is only reached through succe.
+If there is only one edge we can spare us the dominator
+check and iterate directly.  */
+   if (EDGE_COUNT (succe->dest->preds) > 1)
+   {
+ FOR_EACH_EDGE (e, ei, succe->dest->preds)
+   if (e != succe
+   && (e->flags & EDGE_EXECUTABLE))
+ {
+   succe = NULL;
+   break;
+ }
+   }
+   if (succe)
+   {
+ bb2 = succe->dest;
+ 
+ /* Re-do the dominance check with changed bb2.  */
+ if (dominated_by_p (CDI_DOMINATORS, bb1, bb2))
+   return true;
+   }
+ }
+ 
+   /* We could now iterate updating bb1 / bb2.  */
+   return false;
+ }
+ 
  /* Set the value number of FROM to TO, return true if it has changed
 as a result.  */
  
*** set_ssa_val_to (tree from, tree to)
*** 3046,3060 
  && SSA_NAME_RANGE_INFO (to))
{
  if (SSA_NAME_IS_DEFAULT_DEF (to)
! || dominated_by_p (CDI_DOMINATORS,
!gimple_bb (SSA_NAME_DEF_STMT (from)),
!gimple_bb (SSA_NAME_DEF_STMT (to))))
/* Keep the info from the dominator.  */
;
  else if (SSA_NAME_IS_DEFAULT_DEF (from)
!  || dominated_by_p (CDI_DOMINATORS,
! gimple_bb (SSA_NAME_DEF_STMT (to)),
! gimple_bb (SSA_NAME_DEF_STMT (from))))
{
  /* Save old info.  */
  if (! VN_INFO (to)->info.range_info)
--- 3127,3141 
  && SSA_NAME_RANGE_INFO (to))
{
  if (SSA_NAME_IS_DEFAULT_DEF (to)
! || dominated_by_p_w_unex
!   (gimple_bb (SSA_NAME_DEF_STMT (from)),
!gimple_bb (SSA_NAME_DEF_STMT (to))))
/* Keep the info from the dominator.  */
;
  else if (SSA_NAME_IS_DEFAULT_DEF (from)
!  || dominated_by_p_w_unex
!   (gimple_bb (SSA_NAME_DEF_STMT (to)),
!gimple_bb (SSA_NAME_DEF_STMT (from))))
{
  /* Save old info.  */
  if (! VN_INFO (to)->info.range_info)
*** set_ssa_val_to (tree from, tree to)
*** 3076,3090 
   && SSA_NAME_PTR_INFO (to))
{
  if (SSA_NAME_IS_DEFAULT_DEF (to)
! || dominated_by_p (CDI_DOMINATORS,
!gimple_bb (SSA_NAME_DEF_STMT (from)),
!gimple_bb (SSA_NAME_DEF_STMT (to))))
/* Keep the info from the dominator.  */
;
  else if 
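
The "single executable predecessor" step of the new function can be sketched on a toy edge list.  This is an illustration only: toy_edge, TOY_EDGE_EXECUTABLE and single_executable_pred are hypothetical names, not GCC's CFG API, and block indices are assumed to be non-negative.

```c
#include <stddef.h>

#define TOY_EDGE_EXECUTABLE 1

/* Hypothetical incoming edge: source block index plus flags.  */
struct toy_edge
{
  int src;
  int flags;
};

/* Return the source block of the only executable edge among the N
   incoming edges in PREDS, or -1 if there is no such edge or there is
   more than one.  */
static int
single_executable_pred (const struct toy_edge *preds, size_t n)
{
  int found = -1;
  for (size_t i = 0; i < n; i++)
    if (preds[i].flags & TOY_EDGE_EXECUTABLE)
      {
	if (found != -1)
	  return -1;	/* A second executable edge: give up.  */
	found = preds[i].src;
      }
  return found;
}
```

When the lookup succeeds, the dominance query can be repeated with the returned block in place of bb1, as the patch does once; the patch only tries this when the block has more than one predecessor.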

[gomp-nvptx 02/13] omp-low: extend SIMD lowering for SIMT execution

2016-01-20 Thread Alexander Monakov
This patch extends SIMD-via-SIMT lowering in omp-low.c to handle all loops,
lowering reduction/lastprivate/ordered appropriately (but it still chickens
out on collapsed loops, handling them as if safelen=1).  New SIMT lowering
snippets use new internal functions that are folded for non-SIMT targets in
omp_device_lower, allowing subsequent optimizations to clean up.

* internal-fn.c (expand_GOMP_SIMT_LANE): Update.
(expand_GOMP_SIMT_LAST_LANE): New.
(expand_GOMP_SIMT_ORDERED_PRED): New.
(expand_GOMP_SIMT_VOTE_ANY): New.
(expand_GOMP_SIMT_XCHG_BFLY): New.
(expand_GOMP_SIMT_XCHG_IDX): New.
* internal-fn.def (GOMP_SIMT_LAST_LANE): New.
(GOMP_SIMT_ORDERED_PRED): New.
(GOMP_SIMT_VOTE_ANY): New.
(GOMP_SIMT_XCHG_BFLY): New.
(GOMP_SIMT_XCHG_IDX): New.
* omp-low.c (omp_maybe_offloaded_ctx): New, outlined from...
(create_omp_child_function): ...here.  Simplify.
(omp_max_simt_vf): New.  Use it...
(omp_max_vf): ...here.
(lower_rec_input_clauses): Add reduction lowering for SIMT execution.
(lower_lastprivate_clauses): Likewise, for lastprivate lowering.
(lower_omp_ordered): Likewise, for "ordered" lowering.
(expand_omp_simd): Update SIMT transforms.
(execute_omp_device_lower): Update.  Fold SIMD ifns on SIMT targets.
---
 gcc/ChangeLog.gomp-nvptx |  23 +++
 gcc/internal-fn.c| 110 -
 gcc/internal-fn.def  |   5 +
 gcc/omp-low.c| 399 ++-
 4 files changed, 427 insertions(+), 110 deletions(-)

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index f730548..6eba12f 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -161,11 +161,12 @@ static void
 expand_GOMP_SIMT_LANE (internal_fn, gcall *stmt)
 {
   tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+return;
 
   rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
-  /* FIXME: use a separate pattern for OpenMP?  */
-  gcc_assert (targetm.have_oacc_dim_pos ());
-  emit_insn (targetm.gen_oacc_dim_pos (target, const2_rtx));
+  gcc_assert (targetm.have_omp_simt_lane ());
+  emit_insn (targetm.gen_omp_simt_lane (target));
 }
 
 /* This should get expanded in omp_device_lower pass.  */
@@ -176,6 +177,109 @@ expand_GOMP_SIMT_VF (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+/* Lane index of the first SIMT lane that supplies a non-zero argument.
+   This is a SIMT counterpart to GOMP_SIMD_LAST_LANE, used to represent the
+   lane that executed the last iteration for handling OpenMP lastprivate.  */
+
+static void
+expand_GOMP_SIMT_LAST_LANE (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx cond = expand_normal (gimple_call_arg (stmt, 0));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+  struct expand_operand ops[2];
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], cond, mode);
+  gcc_assert (targetm.have_omp_simt_last_lane ());
+  expand_insn (targetm.code_for_omp_simt_last_lane, 2, ops);
+}
+
+/* Non-transparent predicate used in SIMT lowering of OpenMP "ordered".  */
+
+static void
+expand_GOMP_SIMT_ORDERED_PRED (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx ctr = expand_normal (gimple_call_arg (stmt, 0));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+  struct expand_operand ops[2];
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], ctr, mode);
+  gcc_assert (targetm.have_omp_simt_ordered ());
+  expand_insn (targetm.code_for_omp_simt_ordered, 2, ops);
+}
+
+/* "Or" boolean reduction across SIMT lanes: return non-zero in all lanes if
+   any lane supplies a non-zero argument.  */
+
+static void
+expand_GOMP_SIMT_VOTE_ANY (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx cond = expand_normal (gimple_call_arg (stmt, 0));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+  struct expand_operand ops[2];
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], cond, mode);
+  gcc_assert (targetm.have_omp_simt_vote_any ());
+  expand_insn (targetm.code_for_omp_simt_vote_any, 2, ops);
+}
+
+/* Exchange between SIMT lanes with a "butterfly" pattern: source lane index
+   is destination lane index XOR given offset.  */
+
+static void
+expand_GOMP_SIMT_XCHG_BFLY (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx src = expand_normal (gimple_call_arg (stmt, 0));
+  rtx idx = expand_normal (gimple_call_arg (stmt, 1));
+  machine_mode mode = TYPE_MODE (TREE_TYPE 
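
The butterfly exchange and the "or" vote described in the comments above can be simulated on the host.  This is a sketch under assumptions, not the lowering itself: lanes are modelled as array elements, TOY_LANES is a hypothetical lane count that must be a power of two, and using the butterfly exchange for a sum reduction is one plausible application rather than a description of what omp-low.c emits.

```c
#define TOY_LANES 8	/* Hypothetical lane count; must be a power of two.  */

/* Sum reduction across lanes using butterfly exchanges: in each round
   every lane reads the value held by lane (own index XOR offset) and
   adds it to its own.  After log2 (TOY_LANES) rounds every lane holds
   the total.  */
static void
butterfly_sum (int v[TOY_LANES])
{
  for (unsigned offset = TOY_LANES / 2; offset > 0; offset /= 2)
    {
      int src[TOY_LANES];
      /* Exchange step: all lanes read before any lane writes.  */
      for (unsigned lane = 0; lane < TOY_LANES; lane++)
	src[lane] = v[lane ^ offset];
      for (unsigned lane = 0; lane < TOY_LANES; lane++)
	v[lane] += src[lane];
    }
}

/* "Or" vote: the value every lane would see is non-zero if any lane
   supplied a non-zero argument.  */
static int
vote_any (const int pred[TOY_LANES])
{
  for (unsigned lane = 0; lane < TOY_LANES; lane++)
    if (pred[lane])
      return 1;
  return 0;
}
```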

Re: [RFC] [nvptx] Try to cope with cuLaunchKernel returning CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES

2016-01-20 Thread Alexander Monakov
On Tue, 19 Jan 2016, Alexander Monakov wrote:
> > You mean you already have implemented something along the lines I
> > proposed?
> 
> Yes, I was implementing OpenMP teams, and it made sense to add warps per block
> limiting at the same time (i.e. query CU_FUNC_ATTRIBUTE_... and limit if
> default or requested number of threads per team is too high).  I intend to
> post that patch as part of a larger series shortly (but the patch itself is
> simple enough, although a small tweak will be needed to make it apply to
> OpenACC too).

Here's the patch I was talking about:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=04e68c22081c36caf5da9d9f4ca5e895e1088c78;hp=735c8a7d88a7e14cb707f22286678982174175a6

Alexander


[gomp-nvptx 08/13] libgomp: add nvptx lock.c

2016-01-20 Thread Alexander Monakov
This patch implements lock.c on NVPTX by moving a bunch of generic
implementations (in terms of gomp_mutex_t) from config/linux/lock.c to lock.c
and reusing them on NVPTX.

* config/linux/lock.c (gomp_init_lock_30): Move to generic lock.c.
(gomp_destroy_lock_30): Ditto.
(gomp_set_lock_30): Ditto.
(gomp_unset_lock_30): Ditto.
(gomp_test_lock_30): Ditto.
(gomp_init_nest_lock_30): Ditto.
(gomp_destroy_nest_lock_30): Ditto.
(gomp_set_nest_lock_30): Ditto.
(gomp_unset_nest_lock_30): Ditto.
(gomp_test_nest_lock_30): Ditto.
* lock.c: New.
* config/nvptx/lock.c: New.
---
 libgomp/ChangeLog.gomp-nvptx |  15 ++
 libgomp/config/linux/lock.c  |  94 +
 libgomp/config/nvptx/lock.c  |  41 +++
 libgomp/lock.c   | 123 +++
 4 files changed, 181 insertions(+), 92 deletions(-)
 create mode 100644 libgomp/lock.c

diff --git a/libgomp/config/linux/lock.c b/libgomp/config/linux/lock.c
index 32cd21d..a80d7c5 100644
--- a/libgomp/config/linux/lock.c
+++ b/libgomp/config/linux/lock.c
@@ -32,98 +32,8 @@
 #include 
 #include "wait.h"
 
-
-/* The internal gomp_mutex_t and the external non-recursive omp_lock_t
-   have the same form.  Re-use it.  */
-
-void
-gomp_init_lock_30 (omp_lock_t *lock)
-{
-  gomp_mutex_init (lock);
-}
-
-void
-gomp_destroy_lock_30 (omp_lock_t *lock)
-{
-  gomp_mutex_destroy (lock);
-}
-
-void
-gomp_set_lock_30 (omp_lock_t *lock)
-{
-  gomp_mutex_lock (lock);
-}
-
-void
-gomp_unset_lock_30 (omp_lock_t *lock)
-{
-  gomp_mutex_unlock (lock);
-}
-
-int
-gomp_test_lock_30 (omp_lock_t *lock)
-{
-  int oldval = 0;
-
-  return __atomic_compare_exchange_n (lock, &oldval, 1, false,
- MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
-}
-
-void
-gomp_init_nest_lock_30 (omp_nest_lock_t *lock)
-{
-  memset (lock, '\0', sizeof (*lock));
-}
-
-void
-gomp_destroy_nest_lock_30 (omp_nest_lock_t *lock)
-{
-}
-
-void
-gomp_set_nest_lock_30 (omp_nest_lock_t *lock)
-{
-  void *me = gomp_icv (true);
-
-  if (lock->owner != me)
-{
-  gomp_mutex_lock (&lock->lock);
-  lock->owner = me;
-}
-
-  lock->count++;
-}
-
-void
-gomp_unset_nest_lock_30 (omp_nest_lock_t *lock)
-{
-  if (--lock->count == 0)
-{
-  lock->owner = NULL;
-  gomp_mutex_unlock (&lock->lock);
-}
-}
-
-int
-gomp_test_nest_lock_30 (omp_nest_lock_t *lock)
-{
-  void *me = gomp_icv (true);
-  int oldval;
-
-  if (lock->owner == me)
-return ++lock->count;
-
-  oldval = 0;
-  if (__atomic_compare_exchange_n (&lock->lock, &oldval, 1, false,
-  MEMMODEL_ACQUIRE, MEMMODEL_RELAXED))
-{
-  lock->owner = me;
-  lock->count = 1;
-  return 1;
-}
-
-  return 0;
-}
+/* Reuse the generic implementation in terms of gomp_mutex_t.  */
+#include "../../lock.c"
 
 #ifdef LIBGOMP_GNU_SYMBOL_VERSIONING
 /* gomp_mutex_* can be safely locked in one thread and
diff --git a/libgomp/config/nvptx/lock.c b/libgomp/config/nvptx/lock.c
index e69de29..7731704 100644
--- a/libgomp/config/nvptx/lock.c
+++ b/libgomp/config/nvptx/lock.c
@@ -0,0 +1,41 @@
+/* Copyright (C) 2016 Free Software Foundation, Inc.
+   Contributed by Alexander Monakov .
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+/* This is a NVPTX specific implementation of the public OpenMP locking
+   primitives.  */
+
+/* Reuse the generic implementation in terms of gomp_mutex_t.  */
+#include "../../lock.c"
+
+ialias (omp_init_lock)
+ialias (omp_init_nest_lock)
+ialias (omp_destroy_lock)
+ialias (omp_destroy_nest_lock)
+ialias (omp_set_lock)
+ialias (omp_set_nest_lock)
+ialias (omp_unset_lock)
+ialias (omp_unset_nest_lock)
+ialias (omp_test_lock)
+ialias (omp_test_nest_lock)
diff --git a/libgomp/lock.c b/libgomp/lock.c
new file mode 100644
index 000..783bd77
--- /dev/null
+++ b/libgomp/lock.c
@@ -0,0 +1,123 @@
+/* Copyright (C) 2005-2016 Free Software Foundation, Inc.
+   
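
The owner/count scheme used by the nest lock functions moved in this patch can be sketched with C11 atomics.  This is a reduced illustration, not the libgomp code: toy_nest_lock and its functions are hypothetical, the caller passes its identity explicitly as a non-NULL pointer (libgomp obtains it itself), and only the non-blocking test and the unset operations are shown.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical nestable lock: a simple lock word plus owner and depth.  */
struct toy_nest_lock
{
  atomic_int lock;
  void *owner;
  int count;
};

/* Try to take the lock on behalf of ME without blocking.  Returns the
   new nesting depth on success and 0 if another owner holds it.  */
static int
toy_test_nest_lock (struct toy_nest_lock *l, void *me)
{
  /* Re-entry by the current owner only bumps the depth.  */
  if (l->owner == me)
    return ++l->count;

  int oldval = 0;
  if (atomic_compare_exchange_strong_explicit (&l->lock, &oldval, 1,
					       memory_order_acquire,
					       memory_order_relaxed))
    {
      l->owner = me;
      l->count = 1;
      return 1;
    }
  return 0;
}

/* Undo one level of nesting; the lock word is released only when the
   depth drops to zero.  */
static void
toy_unset_nest_lock (struct toy_nest_lock *l)
{
  if (--l->count == 0)
    {
      l->owner = NULL;
      atomic_store_explicit (&l->lock, 0, memory_order_release);
    }
}
```

A second caller keeps failing the test until the first owner has unset the lock as many times as it acquired it.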

Re: [wwwdocs] gcc-6/changes.html: diagnostics, Levenshtein, -Wmisleading-indentation, jit (v2)

2016-01-20 Thread Manuel López-Ibáñez
On 20 January 2016 at 17:38, Gerald Pfeifer  wrote:
> On Wed, 20 Jan 2016, Jakub Jelinek wrote:
>>>   Content-Security-Policy: default-src 'self' http: https:
>>>
>>> So either we get the configuration of the web server changed, or
>>> indeed we need to touch all those existing pages.
>> At least the warning/error/note styles are something that multiple pages
>> are using and going to use in the future, so if that could be defined in
>> the main gcc.css, it would be enough.
>
> Done thusly.  With this change, at least gcc-6/changes.html should
> be fine again.
>
> And I can commit working my way backwards through all the other
> changes.html pages over the coming couple of days.

wwwdocs/htdocs$ find . -name '*.html' | xargs grep --color -e " style *="

shows a bit more inline CSS than changes.html, unfortunately.


Re: [PATCH] c++/59759 - ICE in unify, using std::enable_if on classes

2016-01-20 Thread Martin Sebor

On 01/20/2016 06:39 AM, Jason Merrill wrote:

On 01/19/2016 01:09 PM, Martin Sebor wrote:

Attached is the patch to avoid the ICE that Kai posted below
with the test case Marek asked for in his response.  I didn't
see any further followup on the list.


Thanks, but the code is actually well-formed; I've now fixed the bug
properly.


You're right.  I had reduced the test case some more (comment #15)
and inadvertently made it ill-formed in the process by replacing
'typename' with 'class' in the definition of A, while still
triggering the ICE.   Your patch does the right thing with both.

Martin


[gomp-nvptx 00/13] SIMD, teams, Fortran

2016-01-20 Thread Alexander Monakov
Hello,

I'm pushing this patch series to the gomp-nvptx branch.  It adds the
following:

  - backend and omp-low.c changes for SIMT-style SIMD region handling
  - libgomp changes for running the fortran testsuite
  - libgomp changes for spawning multiple OpenMP teams

I'll perform a trunk merge and copyright years update on the branch shortly.
There are 4 tests that still fail in libgomp testsuite with NVPTX offloading:

  - 2 due to missing 'usleep'
  - 2 due to unimplemented 'target nowait'/GOMP_OFFLOAD_run_async.

The most interesting part of the series is omp-low.c additions for lowering of
SIMD regions for SIMT execution.  I've taken care to insert new code only when
the region could be offloaded to NVPTX, and make sure that added code can be
easily cleaned up on the host compiler side.

However, there's one infrastructure piece that I didn't manage to nail down
yet.  We are running in non-default mode outside of SIMD regions, with
per-warp soft-stacks and atomics instrumented to have an effect once per warp.
We need to transition to the opposite on SIMD region boundaries. While
switching atomics is easy, I see no way to model stack switching in GCC IL,
except for doing it at function boundaries (which is then also easy from the
backend point of view).  As a result, we need to outline SIMD regions for
NVPTX into separate functions, if they are not already outlined by virtue of
being combined into an 'omp parallel' or 'omp task'.

To achieve that, I think there are two general possibilities:

1) post lto-streamin, in omp_device_lower, in accel compiler only.  I'm not
sure how hard it would be, it's not something that GCC does normally, although
tree-parloops performs that.  I think this isn't preferable.

2) Up front during omp-lowering, properly outline it together with parallel
and task regions, and tweak inlining so inlining back happens on host side
only.  It looks like I'd need to invent a new ephemeral GIMPLE statement, say
OMP__SIMTREG_ that is handled like other 'taskreg' kinds (OMP_PARALLEL and
OMP_TASK) and artificially inject it in IL.  Or maybe to avoid excessive
surgery, it may be better to reuse existing taskreg kind (OMP_PARALLEL) and
attach an artificial clause instead that signals that this "parallel" is for
outlining a SIMD region.

Thoughts, comments?

Thanks.
Alexander


[gomp-nvptx 10/13] libgomp testsuite: add -foffload=-lgfortran

2016-01-20 Thread Alexander Monakov
Link libgfortran for offloaded code as well.

* testsuite/libgomp.fortran/fortran.exp (lang_link_flags): Pass
-foffload=-lgfortran in addition to -lgfortran.
* testsuite/libgomp.oacc-fortran/fortran.exp (lang_link_flags): Ditto.
---
 libgomp/ChangeLog.gomp-nvptx   | 6 ++
 libgomp/testsuite/libgomp.fortran/fortran.exp  | 2 +-
 libgomp/testsuite/libgomp.oacc-fortran/fortran.exp | 2 +-
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.fortran/fortran.exp 
b/libgomp/testsuite/libgomp.fortran/fortran.exp
index 9e6b643..d848ed4 100644
--- a/libgomp/testsuite/libgomp.fortran/fortran.exp
+++ b/libgomp/testsuite/libgomp.fortran/fortran.exp
@@ -7,7 +7,7 @@ global ALWAYS_CFLAGS
 
 set shlib_ext [get_shlib_extension]
 set lang_library_path  "../libgfortran/.libs"
-set lang_link_flags"-lgfortran"
+set lang_link_flags"-lgfortran -foffload=-lgfortran"
 if [info exists lang_include_flags] then {
 unset lang_include_flags
 }
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp 
b/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
index 2d6b647..663c932 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
+++ b/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
@@ -9,7 +9,7 @@ global ALWAYS_CFLAGS
 
 set shlib_ext [get_shlib_extension]
 set lang_library_path  "../libgfortran/.libs"
-set lang_link_flags"-lgfortran"
+set lang_link_flags"-lgfortran -foffload=-lgfortran"
 if [info exists lang_include_flags] then {
 unset lang_include_flags
 }


[gomp-nvptx 07/13] libgomp plugin: set __nvptx_clocktick

2016-01-20 Thread Alexander Monakov
This is the libgomp plugin side of omp_get_wtime support on NVPTX.  Query
GPU frequency and copy the value into the device image.

At the moment CUDA driver sets GPU to a fixed frequency when a CUDA context is
created (the default is to use the highest non-boost frequency, but it can be
altered with the nvidia-smi utility), so as long as dynamic boost is not
implemented, and thermal throttling does not happen, what was queried should
correspond to the actual frequency of %clock64 updates.  However, on GTX Titan
we observed that the driver returns GPU frequency that is midway between
actual frequency and boost frequency -- we consider that a driver bug.  Thus,
the implementation comes with a caveat that device-side measurements are less
reliable (than host-side).
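
As a rough illustration of the arithmetic only (a minimal host-side sketch, not
part of the patch; the helper names clocktick_from_khz and cycles_to_seconds
are made up for this example): a clock rate reported in kHz gives a tick of
1e-3 / clock_khz seconds, and a cycle-counter reading is scaled by that tick.

```c
#include <stdint.h>

/* Seconds per cycle for a clock running at CLOCK_KHZ kilohertz.
   Hypothetical helper; mirrors the 1e-3 / clock_khz expression in the patch.  */
static double
clocktick_from_khz (int clock_khz)
{
  return 1e-3 / clock_khz;
}

/* Convert a raw cycle count into seconds using a precomputed tick.  */
static double
cycles_to_seconds (uint64_t cycles, double clocktick)
{
  return cycles * clocktick;
}
```

For an assumed 1 GHz device (clock_khz = 1000000) the tick comes out at about
one nanosecond, so a reading of 2000000000 cycles corresponds to roughly two
seconds.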

* plugin/plugin-nvptx.c (struct ptx_device): New field (clock_khz).
(nvptx_open_device): Set it.
(nvptx_set_clocktick): New.  Use it...
(GOMP_OFFLOAD_load_image): ...here.
---
 libgomp/ChangeLog.gomp-nvptx  |  7 +++
 libgomp/plugin/plugin-nvptx.c | 28 +++-
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index e687586..87e0494 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -287,8 +287,9 @@ struct ptx_device
   bool overlap;
   bool map;
   bool concur;
-  int  mode;
   bool mkern;
+  int  mode;
+  int clock_khz;
 
   struct ptx_image_data *images;  /* Images loaded on device.  */
   pthread_mutex_t image_lock; /* Lock for above list.  */
@@ -641,6 +642,12 @@ nvptx_open_device (int n)
 
   ptx_dev->mkern = pi;
 
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
+  if (r != CUDA_SUCCESS)
+GOMP_PLUGIN_fatal ("cuDeviceGetAttribute error: %s", cuda_error (r));
+
+  ptx_dev->clock_khz = pi;
+
   r = cuDeviceGetAttribute (&async_engines,
CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT, dev);
   if (r != CUDA_SUCCESS)
@@ -1505,6 +1512,23 @@ GOMP_OFFLOAD_version (void)
   return GOMP_VERSION;
 }
 
+/* Initialize __nvptx_clocktick, if present in MODULE.  */
+
+static void
+nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
+{
+  CUdeviceptr dptr;
+  CUresult r = cuModuleGetGlobal (&dptr, NULL, module, "__nvptx_clocktick");
+  if (r == CUDA_ERROR_NOT_FOUND)
+return;
+  if (r != CUDA_SUCCESS)
+GOMP_PLUGIN_fatal ("cuModuleGetGlobal error: %s", cuda_error (r));
+  double __nvptx_clocktick = 1e-3 / dev->clock_khz;
+  r = cuMemcpyHtoD (dptr, &__nvptx_clocktick, sizeof (__nvptx_clocktick));
+  if (r != CUDA_SUCCESS)
+GOMP_PLUGIN_fatal ("cuMemcpyHtoD error: %s", cuda_error (r));
+}
+
 /* Load the (partial) program described by TARGET_DATA to device
number ORD.  Allocate and return TARGET_TABLE.  */
 
@@ -1590,6 +1614,8 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const 
void *target_data,
   targ_tbl->end = targ_tbl->start + bytes;
 }
 
+  nvptx_set_clocktick (module, dev);
+
   return fn_entries + var_entries;
 }
 


[gomp-nvptx 06/13] libgomp: add nvptx time.c

2016-01-20 Thread Alexander Monakov
This patch implements time.c on NVPTX with the %clock64 register.  The PTX
documentation describes %globaltimer as explicitly off-limits for us.

* config/nvptx/time.c: New.
---
 libgomp/ChangeLog.gomp-nvptx |  4 
 libgomp/config/nvptx/time.c  | 49 
 2 files changed, 53 insertions(+)

diff --git a/libgomp/config/nvptx/time.c b/libgomp/config/nvptx/time.c
index e69de29..08feafe 100644
--- a/libgomp/config/nvptx/time.c
+++ b/libgomp/config/nvptx/time.c
@@ -0,0 +1,49 @@
+/* Copyright (C) 2015 Free Software Foundation, Inc.
+   Contributed by Dmitry Melnik 
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file implements timer routines for NVPTX.  It uses the %clock64 cycle
+   counter.  */
+
+#include "libgomp.h"
+
+/* This is set from host in plugin-nvptx.c.  */
+double __nvptx_clocktick = 0;
+
+double
+omp_get_wtime (void)
+{
+  uint64_t clock;
+  asm ("mov.u64 %0, %%clock64;" : "=r" (clock));
+  return clock * __nvptx_clocktick;
+}
+
+double
+omp_get_wtick (void)
+{
+  return __nvptx_clocktick;
+}
+
+ialias (omp_get_wtime)
+ialias (omp_get_wtick)


[PATCH][cilkplus] fix c++ implicit conversions with cilk_spawn (PR/69024, PR/68997)

2016-01-20 Thread Ryan Burn
This patch follows on from
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02142.html

As discussed, it creates a separate function
cilk_cp_detect_spawn_and_unwrap in gcc/cp to handle processing
cilk_spawn expressions for c++ and adds support for implicit
constructor and type conversions.

Bootstrapped and regression tested on x86_64-linux.

gcc/c-family/ChangeLog:
2015-01-20  Ryan Burn  

  PR c++/69024
  PR c++/68997
  * cilk.c (cilk_ignorable_spawn_rhs_op): Change to have external linkage.
  * cilk.c (recognize_spawn): Rename to cilk_recognize_spawn. Change to have
  external linkage.
  * cilk.c (cilk_detect_and_unwrap): Rename recognize_spawn to
  cilk_recognize_spawn.
  * cilk.c (extract_free_variables): Don't extract free variables from
  AGGR_INIT_EXPR slot.

gcc/cp/ChangeLog
2015-01-20  Ryan Burn  

  PR c++/69024
  PR c++/68997
  * cp-gimplify.c (cp_gimplify_expr): Call cilk_cp_detect_spawn_and_unwrap
  instead of cilk_detect_spawn_and_unwrap.
  * cp-cilkplus.c (is_conversion_operator_function_decl_p): New.
  * cp-cilkplus.c (find_spawn): New.
  * cp-cilkplus.c (cilk_cp_detect_spawn_and_unwrap): New.

gcc/testsuite/ChangeLog
2015-01-20  Ryan Burn  

  PR c++/69024
  PR c++/68997
  * g++.dg/cilk-plus/CK/pr68001.cc: Fix to not depend on broken diagnostic.
  * g++.dg/cilk-plus/CK/pr69024.cc: New test.
  * g++.dg/cilk-plus/CK/pr68997.cc: New test.
Index: gcc/c-family/cilk.c
===
--- gcc/c-family/cilk.c (revision 232444)
+++ gcc/c-family/cilk.c (working copy)
@@ -185,7 +185,7 @@
A comparison to constant is simple enough to allow, and
is used to convert to bool.  */
 
-static bool
+bool
 cilk_ignorable_spawn_rhs_op (tree exp)
 {
   enum tree_code code = TREE_CODE (exp);
@@ -223,8 +223,8 @@
 /* Returns true when EXP is a CALL_EXPR with _Cilk_spawn in front.  Unwraps
CILK_SPAWN_STMT wrapper from the CALL_EXPR in *EXP0 statement.  */
 
-static bool
-recognize_spawn (tree exp, tree *exp0)
+bool
+cilk_recognize_spawn (tree exp, tree *exp0)
 {
   bool spawn_found = false;
   if (TREE_CODE (exp) == CILK_SPAWN_STMT)
@@ -292,7 +292,7 @@
   
   /* Now we should have a CALL_EXPR with a CILK_SPAWN_STMT wrapper around 
  it, or return false.  */
-  if (recognize_spawn (exp, exp0))
+  if (cilk_recognize_spawn (exp, exp0))
 return true;
   return false;
 }
@@ -1251,6 +1251,21 @@
   return;
 
 case AGGR_INIT_EXPR:
+  {
+   int len = 0;
+   int ii = 0;
+   extract_free_variables (TREE_OPERAND (t, 1), wd, ADD_READ);
+   if (TREE_CODE (TREE_OPERAND (t, 0)) == INTEGER_CST)
+ {
+   len = TREE_INT_CST_LOW (TREE_OPERAND (t, 0));
+
+   for (ii = 3; ii < len; ii++)
+ extract_free_variables (TREE_OPERAND (t, ii), wd, ADD_READ);
+   extract_free_variables (TREE_TYPE (t), wd, ADD_READ);
+ }
+   break;
+  }
+
 case CALL_EXPR:
   {
int len = 0;
Index: gcc/cp/cp-gimplify.c
===
--- gcc/cp/cp-gimplify.c(revision 232444)
+++ gcc/cp/cp-gimplify.c(working copy)
@@ -39,6 +39,7 @@
 static tree cp_fold_r (tree *, int *, void *);
 static void cp_genericize_tree (tree*);
 static tree cp_fold (tree);
+bool cilk_cp_detect_spawn_and_unwrap (tree *);
 
 /* Local declarations.  */
 
@@ -619,7 +620,7 @@
 case INIT_EXPR:
   if (fn_contains_cilk_spawn_p (cfun))
{
- if (cilk_detect_spawn_and_unwrap (expr_p))
+ if (cilk_cp_detect_spawn_and_unwrap (expr_p))
{
  cilk_cp_gimplify_call_params_in_spawned_fn (expr_p,
  pre_p, post_p);
@@ -637,7 +638,7 @@
 modify_expr_case:
   {
if (fn_contains_cilk_spawn_p (cfun)
-   && cilk_detect_spawn_and_unwrap (expr_p)
+   && cilk_cp_detect_spawn_and_unwrap (expr_p)
&& !seen_error ())
  {
cilk_cp_gimplify_call_params_in_spawned_fn (expr_p, pre_p, post_p);
@@ -738,7 +739,7 @@
 
 case CILK_SPAWN_STMT:
   gcc_assert(fn_contains_cilk_spawn_p (cfun)
-&& cilk_detect_spawn_and_unwrap (expr_p));
+&& cilk_cp_detect_spawn_and_unwrap (expr_p));
 
   if (!seen_error ())
{
@@ -749,7 +750,7 @@
 
 case CALL_EXPR:
   if (fn_contains_cilk_spawn_p (cfun)
- && cilk_detect_spawn_and_unwrap (expr_p)
+ && cilk_cp_detect_spawn_and_unwrap (expr_p)
  && !seen_error ())
{
  cilk_cp_gimplify_call_params_in_spawned_fn (expr_p, pre_p, post_p);
Index: gcc/cp/cp-cilkplus.c
===
--- gcc/cp/cp-cilkplus.c(revision 232444)
+++ gcc/cp/cp-cilkplus.c(working copy)
@@ -27,6 +27,108 @@
 #include "tree-iterator.h"
 #include "cilk.h"
 
+bool cilk_ignorable_spawn_rhs_op (tree);
+bool 

[wwwdocs] Make colors work again on gcc-5/changes.html

2016-01-20 Thread Gerald Pfeifer
Per our discussion around gcc-6/changes.html, this uses the new
global styles (and adds another one).

Committed.

Gerald

Introduce a new CSS class boldlime.  Use this, and similar ones
throughout gcc-5/changes.html.

Index: gcc.css
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc.css,v
retrieving revision 1.26
diff -u -r1.26 gcc.css
--- gcc.css 20 Jan 2016 17:29:35 -  1.26
+++ gcc.css 20 Jan 2016 17:41:30 -
@@ -51,6 +51,7 @@
 }
 
 .boldcyan{ font-weight:bold; color:cyan; }
+.boldlime{ font-weight:bold; color:lime; }
 .boldmagenta { font-weight:bold; color:magenta; }
 .boldred { font-weight:bold; color:red; }
 
Index: gcc-5/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v
retrieving revision 1.136
diff -u -r1.136 changes.html
--- gcc-5/changes.html  18 Dec 2015 17:59:09 -  1.136
+++ gcc-5/changes.html  20 Jan 2016 17:41:30 -
@@ -540,13 +540,13 @@
   test.f90:6:1:
 
0 continue
-   1
-  Error: Zero is not a valid statement label at 
(1)
+   1
+  Error: Zero is not a valid statement label 
at (1)
   test.f90:9:6:
 
  USE foo
-1
-  Warning: USE statement at (1) has no ONLY 
qualifier [-Wuse-without-only]
+1
+  Warning: USE statement at (1) has no 
ONLY qualifier [-Wuse-without-only]
 
 
 The -Wuse-without-only option has been added to warn when 
a


[gomp-nvptx 01/13] nvptx backend: new patterns for OpenMP SIMD-via-SIMT

2016-01-20 Thread Alexander Monakov
This patch adds a few insn patterns used for OpenMP SIMD
reduction/lastprivate/ordered lowering for SIMT execution.  OpenMP lowering
produces GOMP_SIMT_... internal functions when lowering SIMD constructs that
can be offloaded to a SIMT device.  After lto stream-in, those internal
functions are trivially folded when compiling for non-SIMT execution;
otherwise they are kept, and expanded into these insns.
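
To make the reduction use case a bit more concrete, here is a host-side
simulation in plain C (a sketch only, not code from the patch).  It assumes
that the butterfly exchange behaves like PTX shfl.bfly, i.e. each lane reads
the value held by the lane whose index differs by a XOR mask; the function
name simulate_bfly_reduce is made up for this example.

```c
#define WARP_SIZE 32

/* Simulate a butterfly all-reduce (sum) over the 32 lanes of one warp.
   In each step, lane I adds the value held by lane I ^ MASK.  After
   log2 (WARP_SIZE) steps every lane holds the sum of all inputs.  */
static void
simulate_bfly_reduce (int lanes[WARP_SIZE])
{
  for (int mask = WARP_SIZE / 2; mask > 0; mask /= 2)
    {
      int next[WARP_SIZE];
      /* All lanes exchange simultaneously, so read from the old values.  */
      for (int i = 0; i < WARP_SIZE; i++)
        next[i] = lanes[i] + lanes[i ^ mask];
      for (int i = 0; i < WARP_SIZE; i++)
        lanes[i] = next[i];
    }
}
```

With lanes initialised to 0..31, every lane ends up holding 496, which is why
no separate broadcast step is needed after a butterfly reduction.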

* config/nvptx/nvptx-protos.h (nvptx_shuffle_kind): Move enum
declaration from nvptx.c.
(nvptx_gen_shuffle): Declare.
* config/nvptx/nvptx.c (nvptx_shuffle_kind): Moved to nvptx-protos.h.
(nvptx_gen_shuffle): No longer static.
* config/nvptx/nvptx.md (UNSPEC_VOTE_BALLOT): New unspec.
(UNSPEC_LANEID): Ditto.
(UNSPECV_NOUNROLL): Ditto.
(nvptx_vote_ballot): New pattern.
(omp_simt_lane): Ditto.
(nvptx_nounroll): Ditto.
(omp_simt_last_lane): Ditto.
(omp_simt_ordered): Ditto.
(omp_simt_vote_any): Ditto.
(omp_simt_xchg_bfly): Ditto.
(omp_simt_xchg_idx): Ditto.
* target-insns.def (omp_simt_lane): New.
(omp_simt_last_lane): New.
(omp_simt_ordered): New.
(omp_simt_vote_any): New.
(omp_simt_xchg_bfly): New.
(omp_simt_xchg_idx): New.
---
 gcc/ChangeLog.gomp-nvptx| 25 +
 gcc/config/nvptx/nvptx-protos.h | 11 ++
 gcc/config/nvptx/nvptx.c| 12 +-
 gcc/config/nvptx/nvptx.md   | 81 +
 gcc/target-insns.def|  6 +++
 5 files changed, 124 insertions(+), 11 deletions(-)

diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h
index 7e0c296..e38c6ad 100644
--- a/gcc/config/nvptx/nvptx-protos.h
+++ b/gcc/config/nvptx/nvptx-protos.h
@@ -21,6 +21,16 @@
 #ifndef GCC_NVPTX_PROTOS_H
 #define GCC_NVPTX_PROTOS_H
 
+/* The kind of shuffe instruction.  */
+enum nvptx_shuffle_kind
+{
+  SHUFFLE_UP,
+  SHUFFLE_DOWN,
+  SHUFFLE_BFLY,
+  SHUFFLE_IDX,
+  SHUFFLE_MAX
+};
+
 extern void nvptx_declare_function_name (FILE *, const char *, const_tree 
decl);
 extern void nvptx_declare_object_name (FILE *file, const char *name,
   const_tree decl);
@@ -36,6 +46,7 @@ extern void nvptx_register_pragmas (void);
 extern void nvptx_expand_oacc_fork (unsigned);
 extern void nvptx_expand_oacc_join (unsigned);
 extern void nvptx_expand_call (rtx, rtx);
+extern rtx nvptx_gen_shuffle (rtx, rtx, rtx, nvptx_shuffle_kind);
 extern rtx nvptx_expand_compare (rtx);
 extern const char *nvptx_ptx_type_from_mode (machine_mode, bool);
 extern const char *nvptx_output_mov_insn (rtx, rtx);
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index d557646..45aebdd 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -70,16 +70,6 @@
 /* This file should be included last.  */
 #include "target-def.h"
 
-/* The kind of shuffe instruction.  */
-enum nvptx_shuffle_kind
-{
-  SHUFFLE_UP,
-  SHUFFLE_DOWN,
-  SHUFFLE_BFLY,
-  SHUFFLE_IDX,
-  SHUFFLE_MAX
-};
-
 /* The various PTX memory areas an object might reside in.  */
 enum nvptx_data_area
 {
@@ -1400,7 +1390,7 @@ nvptx_gen_pack (rtx dst, rtx src0, rtx src1)
 /* Generate an instruction or sequence to broadcast register REG
across the vectors of a single warp.  */
 
-static rtx
+rtx
 nvptx_gen_shuffle (rtx dst, rtx src, rtx idx, nvptx_shuffle_kind kind)
 {
   rtx res;
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 130c809..1522aa3 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -43,6 +43,10 @@ (define_c_enum "unspec" [
 
UNSPEC_BIT_CONV
 
+   UNSPEC_VOTE_BALLOT
+
+   UNSPEC_LANEID
+
UNSPEC_SHUFFLE
UNSPEC_BR_UNIFIED
 ])
@@ -58,6 +62,8 @@ (define_c_enum "unspecv" [
UNSPECV_FORKED
UNSPECV_JOINING
UNSPECV_JOIN
+
+   UNSPECV_NOUNROLL
 ])
 
 (define_attr "subregs_ok" "false,true"
@@ -1239,6 +1245,81 @@ (define_insn "nvptx_shuffle"
   ""
   "%.\\tshfl%S3.b32\\t%0, %1, %2, 31;")
 
+(define_insn "nvptx_vote_ballot"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+   (unspec:SI [(match_operand:BI 1 "nvptx_register_operand" "R")]
+  UNSPEC_VOTE_BALLOT))]
+  ""
+  "%.\\tvote.ballot.b32\\t%0, %1;")
+
+(define_insn "omp_simt_lane"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "")
+   (unspec:SI [(const_int 0)] UNSPEC_LANEID))]
+  ""
+  "%.\\tmov.u32\\t%0, %%laneid;")
+
+(define_insn "nvptx_nounroll"
+  [(unspec_volatile [(const_int 0)] UNSPECV_NOUNROLL)]
+  ""
+  "\\t.pragma \\\"nounroll\\\";"
+  [(set_attr "predicable" "false")])
+
+(define_expand "omp_simt_last_lane"
+  [(match_operand:SI 0 "nvptx_register_operand" "=R")
+   (match_operand:SI 1 "nvptx_register_operand" "R")]
+  ""
+{
+  rtx pred = gen_reg_rtx (BImode);
+  rtx tmp = gen_reg_rtx (SImode);
+  emit_move_insn (pred, gen_rtx_NE (BImode, operands[1], const0_rtx));
+  emit_insn (gen_nvptx_vote_ballot (tmp, 

[gomp-nvptx 05/13] libgomp: remove sections.c, splay-tree.c

2016-01-20 Thread Alexander Monakov
This patch removes two zero-size stubs; there's no need for these overrides.

* config/nvptx/sections.c: Delete.
* config/nvptx/splay-tree.c: Delete.
---
 libgomp/ChangeLog.gomp-nvptx  | 5 +
 libgomp/config/nvptx/sections.c   | 0
 libgomp/config/nvptx/splay-tree.c | 0
 3 files changed, 5 insertions(+)
 delete mode 100644 libgomp/config/nvptx/sections.c
 delete mode 100644 libgomp/config/nvptx/splay-tree.c

diff --git a/libgomp/config/nvptx/sections.c b/libgomp/config/nvptx/sections.c
deleted file mode 100644
index e69de29..000
diff --git a/libgomp/config/nvptx/splay-tree.c 
b/libgomp/config/nvptx/splay-tree.c
deleted file mode 100644
index e69de29..000


[gomp-nvptx 13/13] libgomp plugin: handle multiple teams

2016-01-20 Thread Alexander Monakov
This complements multiple teams support on the libgomp plugin side.
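
For readers who want to play with the launch-bound arithmetic in isolation,
the following standalone sketch follows the same steps as the
nvptx_adjust_launch_bounds helper in the diff below.  The struct
launch_limits and the function name adjust_launch_bounds are invented for
this example, and the sketch assumes regs_per_thread is nonzero.

```c
/* Hypothetical bundle of the per-function and per-device limits used below.  */
struct launch_limits
{
  int regs_per_thread;        /* registers used by one thread of the kernel */
  int max_threads_per_block;  /* per-function thread limit */
  int regs_per_sm;            /* registers available on one multiprocessor */
  int num_sms;                /* number of multiprocessors on the device */
};

/* Clamp the number of warps per block, then estimate how many blocks
   (teams) the device can host at once from register usage.  */
static void
adjust_launch_bounds (const struct launch_limits *lim,
                      long *teams_p, long *warps_p)
{
  int max_warps_block = lim->max_threads_per_block / 32;
  if (max_warps_block > 32)
    max_warps_block = 32;
  if (*warps_p <= 0)
    *warps_p = 8;               /* default when the caller has no preference */
  if (*warps_p > max_warps_block)
    *warps_p = max_warps_block;

  int regs_per_block = lim->regs_per_thread * 32 * (int) *warps_p;
  int max_blocks = lim->regs_per_sm / regs_per_block * lim->num_sms;
  if (*teams_p <= 0 || *teams_p > max_blocks)
    *teams_p = max_blocks;
}
```

As a worked example with assumed numbers: 32 registers per thread and the
default of 8 warps give 32 * 32 * 8 = 8192 registers per block; with 65536
registers per SM and 15 SMs that allows 8 * 15 = 120 blocks.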

* plugin/plugin-nvptx.c (struct targ_fn_descriptor): Add new fields.
(struct ptx_device): Ditto.  Set them...
(nvptx_open_device): ...here.
(GOMP_OFFLOAD_load_image): Set new targ_fn_descriptor fields.
(nvptx_adjust_launch_bounds): New.  Use it...
(GOMP_OFFLOAD_run): ...here.
---
 libgomp/ChangeLog.gomp-nvptx  |   9 
 libgomp/plugin/plugin-nvptx.c | 106 +++---
 2 files changed, 109 insertions(+), 6 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 87e0494..b7bf59b 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -254,6 +254,8 @@ struct targ_fn_descriptor
 {
   CUfunction fn;
   const struct targ_fn_launch *launch;
+  int regs_per_thread;
+  int max_threads_per_block;
 };
 
 /* A loaded PTX image.  */
@@ -290,6 +292,9 @@ struct ptx_device
   bool mkern;
   int  mode;
   int clock_khz;
+  int num_sms;
+  int regs_per_block;
+  int regs_per_sm;
 
   struct ptx_image_data *images;  /* Images loaded on device.  */
   pthread_mutex_t image_lock; /* Lock for above list.  */
@@ -648,6 +653,36 @@ nvptx_open_device (int n)
 
   ptx_dev->clock_khz = pi;
 
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT, 
dev);
+  if (r != CUDA_SUCCESS)
+GOMP_PLUGIN_fatal ("cuDeviceGetAttribute error: %s", cuda_error (r));
+
+  ptx_dev->num_sms = pi;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_BLOCK,
+   dev);
+  if (r != CUDA_SUCCESS)
+GOMP_PLUGIN_fatal ("cuDeviceGetAttribute error: %s", cuda_error (r));
+
+  ptx_dev->regs_per_block = pi;
+
+  /* CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82 is defined only
+ in CUDA 6.0 and newer.  */
+  r = cuDeviceGetAttribute (&pi, 82, dev);
+  /* Fallback: use limit of registers per block, which is usually equal.  */
+  if (r == CUDA_ERROR_INVALID_VALUE)
+pi = ptx_dev->regs_per_block;
+  else if (r != CUDA_SUCCESS)
+GOMP_PLUGIN_fatal ("cuDeviceGetAttribute error: %s", cuda_error (r));
+
+  ptx_dev->regs_per_sm = pi;
+
+  r = cuDeviceGetAttribute (&pi, CU_DEVICE_ATTRIBUTE_WARP_SIZE, dev);
+  if (r != CUDA_SUCCESS)
+GOMP_PLUGIN_fatal ("cuDeviceGetAttribute error: %s", cuda_error (r));
+  if (pi != 32)
+GOMP_PLUGIN_fatal ("Only warp size 32 is supported");
+
   r = cuDeviceGetAttribute (&async_engines,
CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT, dev);
   if (r != CUDA_SUCCESS)
@@ -1589,13 +1624,23 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, 
const void *target_data,
   for (i = 0; i < fn_entries; i++, targ_fns++, targ_tbl++)
 {
   CUfunction function;
+  int nregs, mthrs;
 
   r = cuModuleGetFunction (&function, module, fn_descs[i].fn);
   if (r != CUDA_SUCCESS)
GOMP_PLUGIN_fatal ("cuModuleGetFunction error: %s", cuda_error (r));
+  r = cuFuncGetAttribute (&nregs, CU_FUNC_ATTRIBUTE_NUM_REGS, function);
+  if (r != CUDA_SUCCESS)
+   GOMP_PLUGIN_fatal ("cuFuncGetAttribute error: %s", cuda_error (r));
+  r = cuFuncGetAttribute (&mthrs, CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK,
+ function);
+  if (r != CUDA_SUCCESS)
+   GOMP_PLUGIN_fatal ("cuFuncGetAttribute error: %s", cuda_error (r));
 
   targ_fns->fn = function;
   targ_fns->launch = &fn_descs[i];
+  targ_fns->regs_per_thread = nregs;
+  targ_fns->max_threads_per_block = mthrs;
 
   targ_tbl->start = (uintptr_t) targ_fns;
   targ_tbl->end = targ_tbl->start + 1;
@@ -1822,19 +1867,67 @@ GOMP_OFFLOAD_openacc_set_cuda_stream (int async, void 
*stream)
   return nvptx_set_cuda_stream (async, stream);
 }
 
+/* Adjust launch dimensions: pick good values for number of blocks and warps
+   and ensure that number of warps does not exceed CUDA limits as well as GCC's
+   own limits.  */
+
+static void
+nvptx_adjust_launch_bounds (struct targ_fn_descriptor *fn,
+   struct ptx_device *ptx_dev,
+   long *teams_p, long *threads_p)
+{
+  int max_warps_block = fn->max_threads_per_block / 32;
+  /* Maximum 32 warps per block is an implementation limit in NVPTX backend
+ and libgcc, which matches documented limit of all GPUs as of 2015.  */
+  if (max_warps_block > 32)
+max_warps_block = 32;
+  if (*threads_p <= 0)
+*threads_p = 8;
+  if (*threads_p > max_warps_block)
+*threads_p = max_warps_block;
+
+  int regs_per_block = fn->regs_per_thread * 32 * *threads_p;
+  /* This is an estimate of how many blocks the device can host simultaneously.
+ Actual limit, which may be lower, can be queried with "occupancy control"
+ driver interface (since CUDA 6.0).  */
+  int max_blocks = ptx_dev->regs_per_sm / regs_per_block * ptx_dev->num_sms;
+  if (*teams_p <= 0 || *teams_p > max_blocks)
+*teams_p = max_blocks;
+}
+
 void
-GOMP_OFFLOAD_run (int ord, 

Re: [hsa merge 07/10] IPA-HSA pass

2016-01-20 Thread Ilya Verbin
On Fri, Jan 15, 2016 at 21:05:47 +0300, Ilya Verbin wrote:
> On Fri, Jan 15, 2016 at 17:45:22 +0100, Jakub Jelinek wrote:
> > On Fri, Jan 15, 2016 at 07:38:14PM +0300, Ilya Verbin wrote:
> > > On Fri, Jan 15, 2016 at 17:09:54 +0100, Jakub Jelinek wrote:
> > > > On Fri, Jan 15, 2016 at 05:02:34PM +0100, Martin Jambor wrote:
> > > > > How do other accelerators cope with the situation when half of the
> > > > > application is compiled with the accelerator disabled?  (Would some of
> > > > > their calls to GOMP_target_ext lead to abort?)
> > > > 
> > > > GOMP_target_ext should never abort (unless internal error), worst
> > > > case it just falls back into the host fallback.
> > > 
> > > Wouldn't that lead to hard-to-find problems in case of nonshared memory?
> > > I mean when someone expects that all target regions are executed on the
> > > device, but in fact some of them are silently executed on the host with
> > > different data environment.
> > 
> > E.g. for HSA it really shouldn't matter, as it is shared memory accelerator.
> > For XeonPhi we hopefully can offload anything.
> 
> As you said, if compilation of target image fails with ICE or somehow, host
> fallback and offloading to other targets should still work:
> https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00951.html
> That patch was not applied, but it can be simulated by -foffload=disable,

I agree that OpenMP doesn't guarantee that all target regions must be executed
on the device, but in this case a user can't be sure that some library function
always will offload (because the library might be replaced by fallback version),
and he/she will have to write something like:

{
  map_data_to_target ();
  some_library1_fn_with_offload ();
  get_data_from_target ();   /* ! */
  send_data_to_target ();/* ! */
  some_library2_fn_with_offload ();
  get_data_from_target ();   /* ! */
  send_data_to_target ();/* ! */
  some_library3_fn_with_offload ();
  unmap_data_from_target ();
}

If you're OK with this, I'll install this patch:


libgomp/
* target.c (gomp_get_target_fn_addr): Allow host fallback if target
function wasn't mapped to the device with non-shared memory.

diff --git a/libgomp/target.c b/libgomp/target.c
index f1f5849..96fe3d5 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1436,12 +1436,7 @@ gomp_get_target_fn_addr (struct gomp_device_descr 
*devicep,
   splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
   gomp_mutex_unlock (&devicep->lock);
   if (tgt_fn == NULL)
-   {
- if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
-   return NULL;
- else
-   gomp_fatal ("Target function wasn't mapped");
-   }
+   return NULL;
 
   return (void *) tgt_fn->tgt_offset;
 }
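
The effect on callers can be sketched outside of libgomp as follows.  This is
a simplified model, not the real libgomp code: the types fn_map_entry and
target_fn, and the functions lookup_target_fn and run_target, are
hypothetical stand-ins for the splay-tree lookup and its caller.  The point
is only that a NULL lookup result selects the host version instead of
aborting.

```c
#include <stddef.h>

typedef void (*target_fn) (void *);

/* Hypothetical stand-in for one host-to-device function mapping.  */
struct fn_map_entry
{
  target_fn host_fn;
  target_fn dev_fn;
};

/* Return the device function mapped to HOST_FN, or NULL if there is none.  */
static target_fn
lookup_target_fn (const struct fn_map_entry *map, size_t n, target_fn host_fn)
{
  for (size_t i = 0; i < n; i++)
    if (map[i].host_fn == host_fn)
      return map[i].dev_fn;
  return NULL;
}

/* Run the region; return 1 if it ran "on the device", 0 on host fallback.  */
static int
run_target (const struct fn_map_entry *map, size_t n,
            target_fn host_fn, void *data)
{
  target_fn fn = lookup_target_fn (map, n, host_fn);
  if (fn == NULL)
    {
      host_fn (data);           /* not mapped: silently fall back to host */
      return 0;
    }
  fn (data);
  return 1;
}

/* Two toy region bodies, distinguishable by their effect on the data.  */
static void host_add_one (void *p) { *(int *) p += 1; }
static void device_add_hundred (void *p) { *(int *) p += 100; }
```

With a mapping present the "device" body runs; with an empty map the same
call quietly runs the host body, which is exactly the situation the data
environment concern above is about.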

  -- Ilya


[Patch AArch64] GCC 6 regression in vector performance. - Fix vector initialization to happen with lane load instructions.

2016-01-20 Thread James Greenhalgh

Hi,

In a number of cases where we try to create vectors we end up spilling to the
stack and then filling. This is one example distilled from a couple of
micro-benchmarks where the issue shows up. The reason for the extra cost
in this case is the unnecessary use of the stack. The patch attempts to
finesse this by using lane loads or vector inserts to produce the right
results.
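
For illustration, source of roughly the following shape builds vectors from
non-constant elements.  This is a sketch using GCC's generic vector
extensions, not the testcase from the patch; the type name v4si and both
function names are chosen for this example, and whether a given compiler
version sends these exact functions through aarch64_expand_vector_init is an
assumption.

```c
/* Four 32-bit lanes, using GCC's generic vector extension.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Every lane comes from a variable.  */
v4si
make_all_variable (int a, int b, int c, int d)
{
  return (v4si) { a, b, c, d };
}

/* Lanes 1 and 3 are constant, lanes 0 and 2 are variable.  */
v4si
make_part_variable (int a, int b)
{
  return (v4si) { a, 1, b, 2 };
}
```

Compiling such functions with -O2 for AArch64 and checking the assembly for
stack stores and reloads is one way to see whether the vector is built in
registers.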

This patch is mostly Ramana's work, I've just cleaned it up a little.

This has been in a number of our trees lately, and we haven't seen any
regressions. I've also bootstrapped and tested it, and run a set of
benchmarks to show no regressions on Cortex-A57 or Cortex-A53.

The patch fixes some regressions caused by the more aggressive vectorization
in GCC6, so I'd like to propose it to go in even though we are in Stage 4.

OK?

Thanks,
James

---
gcc/

2016-01-20  James Greenhalgh  
Ramana Radhakrishnan  

* config/aarch64/aarch64.c (aarch64_expand_vector_init): Refactor,
always use lane loads to construct non-constant vectors.

gcc/testsuite/

2016-01-20  James Greenhalgh  
Ramana Radhakrishnan  

* gcc.target/aarch64/vector_initialization_nostack.c: New.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 03bc1b9..3787b38 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10985,28 +10985,37 @@ aarch64_simd_make_constant (rtx vals)
 return NULL_RTX;
 }
 
+/* Expand a vector initialisation sequence, such that TARGET is
+   initialised to contain VALS.  */
+
 void
 aarch64_expand_vector_init (rtx target, rtx vals)
 {
   machine_mode mode = GET_MODE (target);
   machine_mode inner_mode = GET_MODE_INNER (mode);
+  /* The number of vector elements.  */
   int n_elts = GET_MODE_NUNITS (mode);
+  /* The number of vector elements which are not constant.  */
   int n_var = 0;
   rtx any_const = NULL_RTX;
+  /* The first element of vals.  */
+  rtx v0 = XVECEXP (vals, 0, 0);
   bool all_same = true;
 
+  /* Count the number of variable elements to initialise.  */
   for (int i = 0; i < n_elts; ++i)
 {
   rtx x = XVECEXP (vals, 0, i);
-  if (!CONST_INT_P (x) && !CONST_DOUBLE_P (x))
+  if (!(CONST_INT_P (x) || CONST_DOUBLE_P (x)))
 	++n_var;
   else
 	any_const = x;
 
-  if (i > 0 && !rtx_equal_p (x, XVECEXP (vals, 0, 0)))
-	all_same = false;
+  all_same &= rtx_equal_p (x, v0);
 }
 
+  /* No variable elements, hand off to aarch64_simd_make_constant which knows
+ how best to handle this.  */
   if (n_var == 0)
 {
   rtx constant = aarch64_simd_make_constant (vals);
@@ -11020,14 +11029,15 @@ aarch64_expand_vector_init (rtx target, rtx vals)
   /* Splat a single non-constant element if we can.  */
   if (all_same)
 {
-  rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, 0));
+  rtx x = copy_to_mode_reg (inner_mode, v0);
   aarch64_emit_move (target, gen_rtx_VEC_DUPLICATE (mode, x));
   return;
 }
 
-  /* Half the fields (or less) are non-constant.  Load constant then overwrite
- varying fields.  Hope that this is more efficient than using the stack.  */
-  if (n_var <= n_elts/2)
+  /* Initialise a vector which is part-variable.  We want to first try
+ to build those lanes which are constant in the most efficient way we
+ can.  */
+  if (n_var != n_elts)
 {
   rtx copy = copy_rtx (vals);
 
@@ -11054,31 +11064,21 @@ aarch64_expand_vector_init (rtx target, rtx vals)
 	  XVECEXP (copy, 0, i) = subst;
 	}
   aarch64_expand_vector_init (target, copy);
+}
 
-  /* Insert variables.  */
-  enum insn_code icode = optab_handler (vec_set_optab, mode);
-  gcc_assert (icode != CODE_FOR_nothing);
+  /* Insert the variable lanes directly.  */
 
-  for (int i = 0; i < n_elts; i++)
-	{
-	  rtx x = XVECEXP (vals, 0, i);
-	  if (CONST_INT_P (x) || CONST_DOUBLE_P (x))
-	continue;
-	  x = copy_to_mode_reg (inner_mode, x);
-	  emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
-	}
-  return;
-}
+  enum insn_code icode = optab_handler (vec_set_optab, mode);
+  gcc_assert (icode != CODE_FOR_nothing);
 
-  /* Construct the vector in memory one field at a time
- and load the whole vector.  */
-  rtx mem = assign_stack_temp (mode, GET_MODE_SIZE (mode));
   for (int i = 0; i < n_elts; i++)
-emit_move_insn (adjust_address_nv (mem, inner_mode,
-i * GET_MODE_SIZE (inner_mode)),
-		XVECEXP (vals, 0, i));
-  emit_move_insn (target, mem);
-
+{
+  rtx x = XVECEXP (vals, 0, i);
+  if (CONST_INT_P (x) || CONST_DOUBLE_P (x))
+	continue;
+  x = copy_to_mode_reg (inner_mode, x);
+  emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
+}
 }
 
 static unsigned HOST_WIDE_INT
diff --git a/gcc/testsuite/gcc.target/aarch64/vector_initialization_nostack.c 

Re: reject decl with incomplete struct/union type in check_global_declaration()

2016-01-20 Thread Marek Polacek
On Wed, Jan 20, 2016 at 06:52:48PM +0530, Prathamesh Kulkarni wrote:
> Thanks for the review, I have done the suggested changes in this
> version of the patch.
> Ok for trunk ?

Given that Joseph already approved the substance of the patch, this is ok
(but you might want to correct a typo below), thanks.

> +/* We need to walk over decls with incomplete struct/union/enum types
> +   after parsing the whole translation unit.
> +   In finish_decl(), if the decl is static, has incomplete
> +   struct/union/enum type, it is appened to incomplete_record_decls.

"appended"

> +   In c_parser_translation_unit(), we iterate over incomplete_record_decls
> +   and report error if any of the decls are still incomplete.  */ 

Marek


Re: [PATCH] Fix ICE with asm "m" (stmt-expr) operand (PR middle-end/67653)

2016-01-20 Thread Andreas Schwab
Richard Biener  writes:

> On Wed, 20 Jan 2016, Andreas Schwab wrote:
>
>> Jakub Jelinek  writes:
>> 
>> > +"memory input %d is not directly addressable",
>> 
>> What does that mean?
>
> Is "input %d is not addressable" better?

If that's the same as what the standard calls "addressable storage", then
it's ok.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH] [ARC] Add basic support for double load and store instructions

2016-01-20 Thread Joern Wolfgang Rennecke


On 19/01/16 17:46, Claudiu Zissulescu wrote:
> Hi,
>
> I've prepared a new patch based on the received review (attached). I
> also added a mod on invoke.texi regarding mll64 documentation. This mod
> was missing in the first patch.

>
> I have tested it with dg.exp for arc700, archs and archs+ll64.
>
> Please let me know if everything is alright.

Oops, I missed this the first time round:
> @@ -7009,14 +7038,23 @@ arc_expand_movmem (rtx *operands)
>size = INTVAL (operands[2]);
>/* move_by_pieces_ninsns is static, so we can't use it. */
>if (align >= 4)
> -n_pieces = (size + 2) / 4U + (size & 1);
> +{
> +  if (TARGET_LL64)
> + n_pieces = (size + 2) / 8U + (size & 1);
> +  else
>
You probably mean something like:

n_pieces = (size + 4) / 8U + ((size >> 1) & 1) + (size & 1);

> -  if (piece > 4)
> +  /* Force 32 bit aligned and larger datum to use 64 bit transfers, if
> + possible.  */
> +  if (TARGET_LL64 && (piece >= 4))
> +piece = 8;
This needs another condition size >= 8 .

While looking at this code, I also noticed we have a pre-existing problem
(read: inefficiency) with the number of pieces we actually make.

if (piece > size)
  piece = size & -size

will pick the smallest power of two in the decomposition of size, and
that'll be the transfer size for the rest of the loop.
Better would be:

while (piece > size)
  piece >>= 1;
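
The difference between the two loops, and the closed-form count suggested earlier in this review, can be checked with a small standalone sketch. It assumes 8-byte transfers are available and that every piece is a power of two; the function names are hypothetical and none of this is the actual arc_expand_movmem code.

```c
#include <assert.h>

/* Closed-form piece count from the review, for 8-byte transfers.  */
unsigned n_pieces_formula(unsigned size)
{
    return (size + 4) / 8U + ((size >> 1) & 1) + (size & 1);
}

/* Greedy loop that halves the piece only as far as needed.  */
unsigned n_pieces_halving(unsigned size)
{
    unsigned piece = 8, n = 0;
    while (size > 0) {
        while (piece > size)
            piece >>= 1;
        assert(piece >= 1);
        size -= piece;
        n++;
    }
    return n;
}

/* Loop that drops straight to the lowest set bit of the remaining
   size and then keeps that transfer size until the end.  */
unsigned n_pieces_lowbit(unsigned size)
{
    unsigned piece = 8, n = 0;
    while (size > 0) {
        if (piece > size)
            piece = size & -size;
        size -= piece;
        n++;
    }
    return n;
}
```

For size 13 the halving loop emits pieces of 8, 4 and 1 bytes, whereas the lowest-set-bit loop emits one 8-byte piece followed by five 1-byte pieces.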


What you do with arc_split_move looks like it'll work for movdi, and the
problem with movdf_insn+1 is pre-existing (I just noticed that), but
it's rather odd to have the split pattern solely for allocating a larger
operands array.

I think it would make more sense to remove the assignment to
operands 2..5 in arc_split_move, and instead use xop[0+swap] / xop[1+swap] /
xop2[swap] / xop[3-swap] directly to emit the insns.


[gomp-nvptx 04/13] nvptx backend: add support for placing variables in shared memory

2016-01-20 Thread Alexander Monakov
This patch allows using __attribute__((shared)) to place non-automatic
variables in shared memory.

* config/nvptx/nvptx.c (nvptx_encode_section_info): Handle "shared"
attribute.
(nvptx_handle_shared_attribute): New.  Use it...
(nvptx_attribute_table): ... here (new entry).
---
 gcc/ChangeLog.gomp-nvptx |  7 +++
 gcc/config/nvptx/nvptx.c | 33 ++---
 2 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index f63f840..5c8c28b 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -228,9 +228,12 @@ nvptx_encode_section_info (tree decl, rtx rtl, int first)
   if (TREE_CONSTANT (decl))
area = DATA_AREA_CONST;
   else if (TREE_CODE (decl) == VAR_DECL)
-   /* TODO: This would be a good place to check for a .shared or
-  other section name.  */
-   area = TREE_READONLY (decl) ? DATA_AREA_CONST : DATA_AREA_GLOBAL;
+   {
+ if (lookup_attribute ("shared", DECL_ATTRIBUTES (decl)))
+   area = DATA_AREA_SHARED;
+ else
+   area = TREE_READONLY (decl) ? DATA_AREA_CONST : DATA_AREA_GLOBAL;
+   }
 
   SET_SYMBOL_DATA_AREA (XEXP (rtl, 0), area);
 }
@@ -4047,12 +4050,36 @@ nvptx_handle_kernel_attribute (tree *node, tree name, 
tree ARG_UNUSED (args),
   return NULL_TREE;
 }
 
+/* Handle a "shared" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+nvptx_handle_shared_attribute (tree *node, tree name, tree ARG_UNUSED (args),
+  int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  tree decl = *node;
+
+  if (TREE_CODE (decl) != VAR_DECL)
+{
+  error ("%qE attribute only applies to variables", name);
+  *no_add_attrs = true;
+}
+  else if (current_function_decl && !TREE_STATIC (decl))
+{
+  error ("%qE attribute only applies to non-stack variables", name);
+  *no_add_attrs = true;
+}
+
+  return NULL_TREE;
+}
+
 /* Table of valid machine attributes.  */
 static const struct attribute_spec nvptx_attribute_table[] =
 {
   /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
affects_type_identity } */
   { "kernel", 0, 0, true, false,  false, nvptx_handle_kernel_attribute, false 
},
+  { "shared", 0, 0, true, false,  false, nvptx_handle_shared_attribute, false 
},
   { NULL, 0, 0, false, false, false, NULL, false }
 };
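
The acceptance rule implemented by nvptx_handle_shared_attribute can be restated as a small decision function. This is a toy model for illustration only: the enum names and the three inputs are hypothetical stand-ins for TREE_CODE, current_function_decl and TREE_STATIC, not GCC interfaces.

```c
#include <assert.h>
#include <stdbool.h>

enum decl_kind { DECL_VARIABLE, DECL_FUNCTION };

enum shared_check {
    SHARED_OK,
    SHARED_ERR_NOT_VARIABLE,    /* "only applies to variables" */
    SHARED_ERR_STACK_VARIABLE   /* "only applies to non-stack variables" */
};

/* Mirrors the order of the checks in the handler: first reject
   non-variables, then reject automatic variables inside a function.  */
enum shared_check check_shared_attribute(enum decl_kind kind,
                                         bool inside_function,
                                         bool is_static)
{
    if (kind != DECL_VARIABLE)
        return SHARED_ERR_NOT_VARIABLE;
    if (inside_function && !is_static)
        return SHARED_ERR_STACK_VARIABLE;
    return SHARED_OK;
}
```

Under this model a file-scope variable and a function-local static are both accepted, while an automatic local is rejected.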
 


[gomp-nvptx 03/13] nvptx backend: silence warning

2016-01-20 Thread Alexander Monakov
* config/nvptx/nvptx.c (nvptx_declare_function_name): Fix warning.
---
 gcc/ChangeLog.gomp-nvptx | 4 
 gcc/config/nvptx/nvptx.c | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 45aebdd..f63f840 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -890,7 +890,7 @@ nvptx_declare_function_name (FILE *file, const char *name, 
const_tree decl)
   if (sz == 0 && cfun->machine->has_call_with_sc)
 sz = 1;
   bool need_sp = cfun->calls_alloca || cfun->machine->has_call_with_varargs;
-  if (sz > 0 || TARGET_SOFT_STACK && need_sp)
+  if (sz > 0 || (TARGET_SOFT_STACK && need_sp))
 {
   int alignment = crtl->stack_alignment_needed / BITS_PER_UNIT;
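
Because && binds more tightly than || in C, the added parentheses do not change what the condition computes; they only make the grouping explicit, which is what the warning asks for. A minimal sketch with hypothetical function names, comparing the patched grouping against the other grouping a reader might have assumed:

```c
#include <assert.h>
#include <stdbool.h>

/* Grouping used by the patch; this is also how C parses the
   condition when the inner parentheses are left out.  */
bool needs_frame(int sz, bool soft_stack, bool need_sp)
{
    return sz > 0 || (soft_stack && need_sp);
}

/* The other possible grouping, which is NOT what the code means.  */
bool needs_frame_wrong_grouping(int sz, bool soft_stack, bool need_sp)
{
    return (sz > 0 || soft_stack) && need_sp;
}
```

The two groupings disagree whenever sz is positive and need_sp is false, which is why spelling the grouping out is worthwhile.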
 


[wwwdocs] Colorize gcc-4.9/changes.html again

2016-01-20 Thread Gerald Pfeifer
Use global CSS classes instead of local styles (which browsers now
often block based on the server settings of gcc.gnu.org).

Committed.

Gerald

Index: gcc-4.9/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v
retrieving revision 1.87
diff -u -r1.87 changes.html
--- gcc-4.9/changes.html28 Jun 2015 15:19:47 -  1.87
+++ gcc-4.9/changes.html20 Jan 2016 18:50:58 -
@@ -163,17 +163,17 @@
 
 $ g++ -fdiagnostics-color=always -S -Wall test.C
 test.C: In function int foo():
-test.C:1:14: warning: no return 
statement in function returning non-void [-Wreturn-type]
+test.C:1:14: warning: no return 
statement in function returning non-void [-Wreturn-type]
  int foo () { }
-  ^
-test.C:2:46: error: template instantiation 
depth exceeds maximum of 900 (use -ftemplate-depth= to increase the maximum) 
instantiating struct X100
+  ^
+test.C:2:46: error: template 
instantiation depth exceeds maximum of 900 (use -ftemplate-depth= to increase 
the maximum) instantiating struct X100
  template int N struct X { static const int value = 
XN-1::value; }; template struct X1000;
-  ^
+  ^
 test.C:2:46:   recursively required from const int 
X999::value
 test.C:2:46:   required from const int 
X1000::value
 test.C:2:88:   required from here
 
-test.C:2:46: error: incomplete type 
X100 used in nested name specifier
+test.C:2:46: error: incomplete type 
X100 used in nested name specifier
 
 
 With the new 

[PATCH] PR target/68609 vector swsqrt

2016-01-20 Thread David Edelsohn
This patch finishes PR target/68609 to use reciprocal estimate for vector sqrt.

PR target/68609
* config/rs6000/rs6000.c (rs6000_emit_swsqrt): Add vector domain check.
* config/rs6000/vector.md (sqrt2): Call rs6000_emit_swsqrt for V4SFmode.

Thanks, David

Index: rs6000.c
===
--- rs6000.c (revision 232439)
+++ rs6000.c (working copy)
@@ -32904,10 +32904,19 @@
   if (!recip)
 {
   rtx zero = force_reg (mode, CONST0_RTX (mode));
-  rtx target = emit_conditional_move (e, GT, src, zero, mode,
-  e, zero, mode, 0);
-  if (target != e)
- emit_move_insn (e, target);
+
+  if (mode == SFmode)
+ {
+  rtx target = emit_conditional_move (e, GT, src, zero, mode,
+  e, zero, mode, 0);
+  if (target != e)
+emit_move_insn (e, target);
+ }
+  else
+ {
+  rtx cond = gen_rtx_GT (VOIDmode, e, zero);
+  rs6000_emit_vector_cond_expr (e, e, zero, cond, src, zero);
+ }
 }

   /* g = sqrt estimate.  */
Index: vector.md
===
--- vector.md (revision 232438)
+++ vector.md (working copy)
@@ -270,7 +270,16 @@
   [(set (match_operand:VEC_F 0 "vfloat_operand" "")
  (sqrt:VEC_F (match_operand:VEC_F 1 "vfloat_operand" "")))]
   "VECTOR_UNIT_VSX_P (mode)"
-  "")
+{
+  if (mode == V4SFmode
+  && !optimize_function_for_size_p (cfun)
+  && flag_finite_math_only && !flag_trapping_math
+  && flag_unsafe_math_optimizations)
+{
+  rs6000_emit_swsqrt (operands[0], operands[1], 0);
+  DONE;
+}
+})

 (define_expand "rsqrte2"
   [(set (match_operand:VEC_F 0 "vfloat_operand" "")
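
The reason for the domain check is that a reciprocal square root estimate of zero is infinite, so computing sqrt(x) as x * rsqrt(x) would give 0 * inf = NaN unless the estimate is replaced by zero wherever the input is not positive. The scalar sketch below illustrates that idea only. It is not the rs6000 implementation: the bit-level first guess is a stand-in for the hardware estimate instruction, and the function names and the iteration count are assumptions.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware reciprocal square root
   estimate.  Returns +inf for zero, as 1/sqrt(0) would.  */
float rsqrt_estimate(float x)
{
    assert(sizeof(float) == sizeof(uint32_t));
    if (x == 0.0f)
        return INFINITY;
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);   /* crude first guess */
    float y;
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* Square root from a reciprocal estimate plus refinement steps.  */
float sw_sqrt(float x)
{
    float e = rsqrt_estimate(x);

    /* Domain check: keep the estimate only where x > 0.  */
    if (!(x > 0.0f))
        e = 0.0f;

    float g = x * e;      /* g approximates sqrt(x) */
    float h = 0.5f * e;   /* h approximates 1 / (2 * sqrt(x)) */
    for (int i = 0; i < 3; i++) {
        float r = 0.5f - g * h;
        g += g * r;
        h += h * r;
    }
    return g;
}
```

With the check in place, sw_sqrt(0.0f) yields zero because both g and h start at zero; without it, the first product would already be NaN.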


[trans-mem] PR 68964 fallout -- bootstrap 69343+69339

2016-01-20 Thread Richard Henderson
These two bootstrap PRs, caused by old binutils on s390x and embedded powerpc
that doesn't support altivec, could have been worked around with yet more
configure checks.

But after discussion on IRC, we decided to revert the arm+ppc+s390 changes
entirely and keep the libitm ABI the same.  Instead of calling new functions
with vector parameters (which internally wound up calling memcpy), we will
instead let the compiler call memcpy directly.  This solves all of the
platform-specific configury wrt when vector instructions are available and
avoids a bit of indirection in the end.

The portion of the patch for 68964 that *isn't* reverted by this does solve the
original ICE by deferring otherwise unhandled types to memcpy.

(Re-)tested on x86_64, s390x, and aarch64, and committed.


r~
PR bootstrap/69343
PR bootstrap/69339
PR tree-opt/68964

Revert:
gcc/
* tree.c (tm_define_builtin): New.
(find_tm_vector_type): New.
(build_tm_vector_builtins): New.
(build_common_builtin_nodes): Call it.
libitm/
* Makefile.am (libitm_la_SOURCES) [ARCH_AARCH64]: Add vect128.cc
(libitm_la_SOURCES) [ARCH_ARM]: Add neon.cc
(libitm_la_SOURCES) [ARCH_PPC]: Add vect128.cc
(libitm_la_SOURCES) [ARCH_S390]: Add vect128.cc
* configure.ac (ARCH_AARCH64): New conditional.
(ARCH_PPC, ARCH_S390): Likewise.
* Makefile.in, configure: Rebuild.
* libitm.h (_ITM_TYPE_M128): Always define.
* vect64.cc: Split ...
* vect128.cc: ... out of...
* config/x86/x86_sse.cc: ... here.
* config/arm/neon.cc: New file.

 
diff --git a/gcc/tree.c b/gcc/tree.c
index 4e54a7e..8fef0d1 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -10332,143 +10332,6 @@ local_define_builtin (const char *name, tree type, 
enum built_in_function code,
   set_builtin_decl (code, decl, true);
 }
 
-/* A subroutine of build_tm_vector_builtins.  Define a builtin with
-   all of the appropriate attributes.  */
-static void
-tm_define_builtin (const char *name, tree type, built_in_function code,
-  tree decl_attrs, tree type_attrs)
-{
-  tree decl = add_builtin_function (name, type, code, BUILT_IN_NORMAL,
-   name + strlen ("__builtin_"), decl_attrs);
-  decl_attributes (&TREE_TYPE (decl), type_attrs, ATTR_FLAG_BUILT_IN);
-  set_builtin_decl (code, decl, true);
-}
-
-/* A subroutine of build_tm_vector_builtins.  Find a supported vector
-   type VECTOR_BITS wide with inner mode ELEM_MODE.  */
-static tree
-find_tm_vector_type (unsigned vector_bits, machine_mode elem_mode)
-{
-  unsigned elem_bits = GET_MODE_BITSIZE (elem_mode);
-  unsigned nunits = vector_bits / elem_bits;
-
-  gcc_assert (elem_bits * nunits == vector_bits);
-
-  machine_mode vector_mode = mode_for_vector (elem_mode, nunits);
-  if (!VECTOR_MODE_P (vector_mode)
-  || !targetm.vector_mode_supported_p (vector_mode))
-return NULL_TREE;
-
-  tree innertype = lang_hooks.types.type_for_mode (elem_mode, 0);
-  return build_vector_type_for_mode (innertype, vector_mode);
-}
-
-/* A subroutine of build_common_builtin_nodes.  Define TM builtins for
-   vector types.  This is done after the target hook, so that the target
-   has a chance to override these.  */
-static void
-build_tm_vector_builtins (void)
-{
-  tree vtype, pvtype, ftype, decl;
-  tree attrs_load, attrs_type_load;
-  tree attrs_store, attrs_type_store;
-  tree attrs_log, attrs_type_log;
-
-  /* Do nothing if TM is turned off, either with switch or
- not enabled in the language.  */
-  if (!flag_tm || !builtin_decl_explicit_p (BUILT_IN_TM_LOAD_1))
-return;
-
-  /* Use whatever attributes a normal TM load has.  */
-  decl = builtin_decl_explicit (BUILT_IN_TM_LOAD_1);
-  attrs_load = DECL_ATTRIBUTES (decl);
-  attrs_type_load = TYPE_ATTRIBUTES (TREE_TYPE (decl));
-  /* Use whatever attributes a normal TM store has.  */
-  decl = builtin_decl_explicit (BUILT_IN_TM_STORE_1);
-  attrs_store = DECL_ATTRIBUTES (decl);
-  attrs_type_store = TYPE_ATTRIBUTES (TREE_TYPE (decl));
-  /* Use whatever attributes a normal TM log has.  */
-  decl = builtin_decl_explicit (BUILT_IN_TM_LOG);
-  attrs_log = DECL_ATTRIBUTES (decl);
-  attrs_type_log = TYPE_ATTRIBUTES (TREE_TYPE (decl));
-
-  /* By default, 64 bit vectors go through the long long helpers.  */
-
-  /* If a 128-bit vector is supported, declare those builtins.  */
-  if (!builtin_decl_explicit_p (BUILT_IN_TM_STORE_M128)
-  && ((vtype = find_tm_vector_type (128, SImode))
- || (vtype = find_tm_vector_type (128, SFmode))))
-{
-  pvtype = build_pointer_type (vtype);
-
-  ftype = build_function_type_list (void_type_node, pvtype, vtype, NULL);
-  tm_define_builtin ("__builtin__ITM_WM128", ftype,
-BUILT_IN_TM_STORE_M128,
-attrs_store, attrs_type_store);
-  tm_define_builtin ("__builtin__ITM_WaRM128", ftype,
-

Re: RFA: MIPS: Fix race condition causing PR 69129

2016-01-20 Thread Matthias Klose

On 19.01.2016 14:52, Nick Clifton wrote:

Hi Catherine, Hi Eric, Hi Matthew,

   GCC PR 69129 reports a problem with the MIPS backend:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69129

   I traced the problem down to a race condition in
   mips_compute_frame_info.  This calls mips_global_pointer, which
   through a tortuous chain of inferior calls can end up with
   mips_get_cprestore_base_and_offset trying to use the information in
   the frame structure which has yet to be computed...

   The attached patch fixes the problem by moving the initialisation of
   the global_pointer field in the frame structure to after the args_size
   and hard_frame_pointer_offset fields have been initialised.

   Tested with no regressions on a mipsisa32-elf toolchain.  (I know that
   there are lots of different possible mips configurations.  I was not
   sure which one(s) I should test, so I chose one at random).


this fixes the bootstrap errors for me, seen in both libgnat and libgfortran.



Re: [PATCH, rs6000] Add support for __builtin_cpu_is() and __builtin_cpu_supports()

2016-01-20 Thread David Edelsohn
On Thu, Jan 14, 2016 at 10:50 PM, Peter Bergner  wrote:
> This patch adds support for __builtin_cpu_init(), __builtin_cpu_is() and
> __builtin_cpu_supports() builtins for PowerPC.  We use the same API as the
> x86* builtins of the same name.  These builtins use the new GLIBC 2.23
> feature where we store the AT_PLATFORM, AT_HWCAP and AT_HWCAP2 values in the
> Thread Control Block (TCB) which offers very fast access to these values.
>
> As part of the agreement with the GLIBC community, we always emit a reference
> to a special symbol exported by LIBCs that support the AT_PLATFORM/AT_HWCAP*
> values in the TCB, whenever we expand one of the CPU builtins.  We do this
> so that we will never attempt to access the TCB on old LIBCs.  Joseph also
> asked that we conditionalize the enabling of this code with a configure time
> check for GLIBC's version and that is included here.
>
> I'll note that since GLIBC initializes the TCB before the application gets
> control, we don't actually need __builtin_cpu_init(), but we have implemented
> it anyway, to keep the same API as x86.  It's just that our init expands to
> nothing.
>
> This passes bootstrap and regtesting with no errors.  Ok for mainline?
>
> Peter
>
>
> gcc/
> * config/rs6000/ppc-auxv.h: New file.
> * config/rs6000/rs6000-builtin.def (cpu_init): Add new builtin.
> (cpu_is): Likewise.
> (cpu_supports): Likewise.
> * config/rs6000/rs6000.c: include "ppc-auxv.h".
> (cpu_is_info): New variable.
> (cpu_supports_info): Likewise.
> (tcb_verification_symbol): Likewise.
> (cpu_builtin_p): Likewise.
> (cpu_expand_builtin): New function.
> (rs6000_expand_ternop_builtin): Add support for CPU builtin functions.
> (rs6000_init_builtins): Likewise.
> (rs6000_elf_file_end): Emit HWCAP in TCB verification symbol.
> * config/rs6000/rs6000.h (TLS_REGNUM): New define.
> * configure.ac (gcc_cv_libc_provides_hwcap_in_tcb): New test.
> * configure: Regenerate.
> * config.in: Likewise.
>
> gcc/testsuite/
> * gcc.target/powerpc/cpu-builtin-1.c: New test.

>* doc/extend.texi (PowerPC Built-in Functions): Document
 >   __builtin_cpu_init, __builtin_cpu_is and __builtin_cpu_supports.

This is okay.

Thanks, David
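
A short usage sketch of the builtins discussed above, assuming a GCC-compatible compiler. The feature strings "vsx" (PowerPC) and "sse2" (x86) are taken from the GCC documentation for the respective targets; the wrapper name is hypothetical, and on other targets the function simply reports 0.

```c
#include <assert.h>

/* Returns 1 if the running CPU reports the queried vector
   feature, else 0.  */
int has_vector_unit(void)
{
    int found = 0;
#if defined(__GNUC__) && (defined(__powerpc__) || defined(__powerpc64__))
    __builtin_cpu_init();   /* expands to nothing on PowerPC */
    found = __builtin_cpu_supports("vsx") != 0;
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    found = __builtin_cpu_supports("sse2") != 0;
#endif
    assert(found == 0 || found == 1);
    return found;
}
```

Calling __builtin_cpu_init first keeps the code identical in shape on both targets, which is the API compatibility the patch description mentions.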


Re: [PATCH] ARM PR68620 (ICE with FP16 on armeb)

2016-01-20 Thread Christophe Lyon
On 19 January 2016 at 15:51, Alan Lawrence  wrote:
> On 19/01/16 11:15, Christophe Lyon wrote:
>
>>>> For neon_vdupn, I chose to implement neon_vdup_nv4hf and
>>>> neon_vdup_nv8hf instead of updating the VX iterator because I thought
>>>> it was not desirable to impact neon_vrev32.
>>>
>>>
>>> Well, the same instruction will suffice for vrev32'ing vectors of HF just as
>>> well as vectors of HI, so I think I'd argue that's harmless enough. To gain the
>>> benefit, we'd need to update arm_evpc_neon_vrev with a few new cases, though.
>>>
>> Since this is more intrusive, I'd rather leave that part for later. OK?
>
>
> Sure.
>
>>>> +#ifdef __ARM_BIG_ENDIAN
>>>> +  /* Here, 3 is (4-1) where 4 is the number of lanes. This is also the
>>>> + right value for vectors with 8 lanes.  */
>>>> +#define __arm_lane(__vec, __idx) (__idx ^ 3)
>>>> +#else
>>>> +#define __arm_lane(__vec, __idx) __idx
>>>> +#endif
>>>> +
>>>
>>>
>>> Looks right, but sounds... my concern here is that I'm hoping at some point we
>>> will move the *other* vget/set_lane intrinsics to use GCC vector extensions
>>> too. At which time (unlike __aarch64_lane which can be used everywhere) this
>>> will be the wrong formula. Can we name (and/or comment) it to avoid misleading
>>> anyone? The key characteristic seems to be that it is for vectors of 16-bit
>>> elements only.
>>>
>> I'm not sure I follow here. Looking at the patterns for
>> neon_vget_lane_*internal in neon.md,
>> I can see 2 flavours: one for VD, one for VQ2. The latter uses "halfelts".
>>
>> Do you prefer that I create 2 macros (say __arm_lane and __arm_laneq),
>> that would be similar to the aarch64 ones (by computing the number of
>> lanes of the input vector), but the "q" one would use half the total
>> number of lanes instead?
>
>
> That works for me! Something like:
>
> #define __arm_lane(__vec, __idx) NUM_LANES(__vec) - __idx
> #define __arm_laneq(__vec, __idx) (__idx & (NUM_LANES(__vec)/2)) +
> (NUM_LANES(__vec)/2 - __idx)
> //or similarly
> #define __arm_laneq(__vec, __idx) (__idx ^ (NUM_LANES(__vec)/2 - 1))
>
> Alternatively I'd been thinking
>
> #define __arm_lane_32xN(__idx) __idx ^ 1
> #define __arm_lane_16xN(__idx) __idx ^ 3
> #define __arm_lane_8xN(__idx) __idx ^ 7
>
> Bear in mind PR64893 that we had on AArch64 :-(
>
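
The per-element-size XOR constants quoted above (1, 3 and 7 for 32-, 16- and 8-bit elements) all follow from one rule: on big-endian, indices are reversed within each 64-bit chunk, so the index is XORed with the number of lanes per 64 bits minus one. A sketch of that rule as a function rather than a macro, which also avoids the operator-precedence hazards of an unparenthesised macro body; the function name is hypothetical.

```c
#include <assert.h>

/* Map an architectural lane index to the vector index used on
   big-endian.  elem_bits is the element width: 8, 16 or 32.  */
unsigned be_lane(unsigned elem_bits, unsigned idx)
{
    assert(elem_bits == 8 || elem_bits == 16 || elem_bits == 32);
    unsigned lanes_per_64 = 64 / elem_bits;
    return idx ^ (lanes_per_64 - 1);
}
```

For 16-bit elements the mask is 3 for both the 4-lane and the 8-lane vectors, because the reversal never crosses a 64-bit boundary.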

Here is a new version, based on the comments above.
I've also removed the addition of arm_fp_ok effective target since I
added that in my other testsuite patch.

OK now?

Thanks,

Christophe

> Cheers, Alan
gcc/ChangeLog:

2016-01-20  Christophe Lyon  

PR target/68620
* config/arm/arm.c (neon_valid_immediate): Handle FP16 vectors.
* config/arm/arm_neon.h (__ARM_NUM_LANES, __arm_lane, __arm_laneq):
New helper macros.
(vget_lane_f16): Handle big-endian.
(vgetq_lane_f16): Likewise.
(vset_lane_f16): Likewise.
(vsetq_lane_f16): Likewise.
* config/arm/iterators.md (VQXMOV): Add V8HF.
(VDQ): Add V4HF and V8HF.
(V_reg): Handle V4HF and V8HF.
(Is_float_mode): Likewise.
* config/arm/neon.md (movv4hf, movv8hf, neon_vdup_nv4hf,
neon_vdup_nv8hf): New patterns.
(vec_set_internal, vec_extract, neon_vld1_dup):
Use VD_LANE iterator.
(neon_vld1_dup): Use VQ2 iterator.

gcc/testsuite/ChangeLog:

2016-01-20  Christophe Lyon  

PR target/68620
* gcc.target/arm/pr68620.c: New test.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3588b83..b1f408c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12370,6 +12370,10 @@ neon_valid_immediate (rtx op, machine_mode mode, int 
inverse,
   if (!vfp3_const_double_rtx (el0) && el0 != CONST0_RTX (GET_MODE (el0)))
 return -1;
 
+  /* FP16 vectors cannot be represented.  */
+  if (innersize == 2)
+   return -1;
+
   r0 = CONST_DOUBLE_REAL_VALUE (el0);
 
   for (i = 1; i < n_elts; i++)
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 0a33d21..69b28c8 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -5252,14 +5252,26 @@ vget_lane_s32 (int32x2_t __a, const int __b)
were marked always-inline so there were no call sites, the declaration
would nonetheless raise an error.  Hence, we must use a macro instead.  */
 
-#define vget_lane_f16(__v, __idx)  \
-  __extension__\
-({ \
-  float16x4_t __vec = (__v);   \
-  __builtin_arm_lane_check (4, __idx); \
-  float16_t __res = __vec[__idx];  \
-  __res;   \
-})
+  /* For big-endian, GCC's vector indices are reversed within each 64
+ bits compared to the architectural lane indices used by Neon
+ intrinsics.  */
+#ifdef __ARM_BIG_ENDIAN
+#define 

[PATCH, rs6000] Add Power9 asm entries

2016-01-20 Thread Pat Haugen
The following adds a couple missed Power9 assembler option entries. 
Bootstrapped on ppc64. Ok for trunk?


-Pat

2016-01-20  Pat Haugen  

* config/rs6000/aix71.h (ASM_CPU_SPEC): Add entry for Power9.
* config/rs6000/driver-rs6000.c (struct asm_names): Likewise.

Index: config/rs6000/aix71.h
===
--- config/rs6000/aix71.h   (revision 232629)
+++ config/rs6000/aix71.h   (working copy)
@@ -80,6 +80,7 @@ do {  
\
 %{mcpu=power6x: -mpwr6} \
 %{mcpu=power7: -mpwr7} \
 %{mcpu=power8: -mpwr8} \
+%{mcpu=power9: -mpwr9} \
 %{mcpu=powerpc: -mppc} \
 %{mcpu=rs64a: -mppc} \
 %{mcpu=603: -m603} \
Index: config/rs6000/driver-rs6000.c
===
--- config/rs6000/driver-rs6000.c   (revision 232629)
+++ config/rs6000/driver-rs6000.c   (working copy)
@@ -361,6 +361,7 @@ static const struct asm_name asm_names[]
   { "power6x",   "-mpwr6" },
   { "power7","-mpwr7" },
   { "power8","-mpwr8" },
+  { "power9","-mpwr9" },
   { "powerpc",   "-mppc" },
   { "rs64a", "-mppc" },
   { "603",   "-m603" },
@@ -387,6 +388,7 @@ static const struct asm_name asm_names[]
   { "power6x",   "%(asm_cpu_power6) -maltivec" },
   { "power7","%(asm_cpu_power7)" },
   { "power8","%(asm_cpu_power8)" },
+  { "power9","%(asm_cpu_power9)" },
   { "powerpc",   "-mppc" },
   { "rs64a", "-mppc64" },
   { "401",   "-mppc" },



Re: [wwwdocs] gcc-6/changes.html: diagnostics, Levenshtein, -Wmisleading-indentation, jit (v2)

2016-01-20 Thread Gerald Pfeifer
On Wed, 20 Jan 2016, Manuel López-Ibáñez wrote:
>> And I can commit working my way backwards through all the other
>> changes.html pages over the coming couple of days.
> wwwdocs/htdocs$ find . -name '*.html' | xargs grep --color -e " style *="
> 
> shows a bit more inline CSS than changes.html, unfortunately.

Yes, I know.  I'll also take care of the others (may just take
some days for the release-specific ones, possibly a tad longer
for the few others).

Gerald

Re: [PATCH, AArch64] Fix for PR67896 (C++ FE cannot distinguish __Poly{8,16,64,128}_t types)

2016-01-20 Thread Roger Ferrer Ibáñez
Hi James,

> This patch looks technically correct to me, though there is a small
> style issue to correct (in-line below), and your ChangeLogs don't fit
> our usual style.

thank you very much for the useful comments. I'm attaching a new
version of the patch with the style issues (hopefully) ironed out.

Kind regards,

gcc/ChangeLog:

2016-01-19  Roger Ferrer Ibáñez  

PR target/67896
* config/aarch64/aarch64-builtins.c
(aarch64_init_simd_builtin_types): Do not set structural
equality to __Poly{8,16,64,128}_t types.

gcc/testsuite/ChangeLog:

2016-01-19  Roger Ferrer Ibáñez  

PR target/67896
* gcc.target/aarch64/simd/pr67896.C: New.

-- 
Roger Ferrer Ibáñez
From 72c065f6a3f9d168baf357de1b567faa6042c03b Mon Sep 17 00:00:00 2001
From: Roger Ferrer Ibanez 
Date: Wed, 20 Jan 2016 21:11:42 +0100
Subject: [PATCH] Do not set structural equality on polynomial types

---
 gcc/config/aarch64/aarch64-builtins.c   | 10 ++
 gcc/testsuite/gcc.target/aarch64/simd/pr67896.C |  7 +++
 2 files changed, 13 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/pr67896.C

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index bd7a8dd..40272ed 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -610,14 +610,16 @@ aarch64_init_simd_builtin_types (void)
   enum machine_mode mode = aarch64_simd_types[i].mode;
 
   if (aarch64_simd_types[i].itype == NULL)
-	aarch64_simd_types[i].itype =
-	  build_distinct_type_copy
-	(build_vector_type (eltype, GET_MODE_NUNITS (mode)));
+	{
+	  aarch64_simd_types[i].itype
+	= build_distinct_type_copy
+	  (build_vector_type (eltype, GET_MODE_NUNITS (mode)));
+	  SET_TYPE_STRUCTURAL_EQUALITY (aarch64_simd_types[i].itype);
+	}
 
   tdecl = add_builtin_type (aarch64_simd_types[i].name,
 aarch64_simd_types[i].itype);
   TYPE_NAME (aarch64_simd_types[i].itype) = tdecl;
-  SET_TYPE_STRUCTURAL_EQUALITY (aarch64_simd_types[i].itype);
 }
 
 #define AARCH64_BUILD_SIGNED_TYPE(mode)  \
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/pr67896.C b/gcc/testsuite/gcc.target/aarch64/simd/pr67896.C
new file mode 100644
index 000..1f916e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/pr67896.C
@@ -0,0 +1,7 @@
+typedef __Poly8_t A;
+typedef __Poly16_t A; /* { dg-error "conflicting declaration" } */
+typedef __Poly64_t A; /* { dg-error "conflicting declaration" } */
+typedef __Poly128_t A; /* { dg-error "conflicting declaration" } */
+
+typedef __Poly8x8_t B;
+typedef __Poly16x8_t B; /* { dg-error "conflicting declaration" } */ 
-- 
2.1.4



Re: [PATCH, rs6000] Add support for __builtin_cpu_is() and __builtin_cpu_supports()

2016-01-20 Thread Peter Bergner
On Wed, 2016-01-20 at 15:15 -0500, David Edelsohn wrote:
> On Thu, Jan 14, 2016 at 10:50 PM, Peter Bergner  wrote:
> > gcc/
> > * config/rs6000/ppc-auxv.h: New file.
> > * config/rs6000/rs6000-builtin.def (cpu_init): Add new builtin.
> > (cpu_is): Likewise.
> > (cpu_supports): Likewise.
> > * config/rs6000/rs6000.c: include "ppc-auxv.h".
> > (cpu_is_info): New variable.
> > (cpu_supports_info): Likewise.
> > (tcb_verification_symbol): Likewise.
> > (cpu_builtin_p): Likewise.
> > (cpu_expand_builtin): New function.
> > (rs6000_expand_ternop_builtin): Add support for CPU builtin 
> > functions.
> > (rs6000_init_builtins): Likewise.
> > (rs6000_elf_file_end): Emit HWCAP in TCB verification symbol.
> > * config/rs6000/rs6000.h (TLS_REGNUM): New define.
> > * configure.ac (gcc_cv_libc_provides_hwcap_in_tcb): New test.
> > * configure: Regenerate.
> > * config.in: Likewise.
> > 
> > gcc/testsuite/
> > * gcc.target/powerpc/cpu-builtin-1.c: New test.
> 
> >* doc/extend.texi (PowerPC Built-in Functions): Document
> >   __builtin_cpu_init, __builtin_cpu_is and __builtin_cpu_supports.
> 
> This is okay.
> 

Thanks, committed as revision 232634.

Peter



Re: [SMS] Schedule normalization after scheduling branch

2016-01-20 Thread Martin Sebor

On 01/18/2016 02:38 PM, Roman Zhuykov wrote:

Hello,

4 years ago, when I created some SMS patches, nobody cared about SMS
failures on ia64.  Now ia64 is even more dead, but at least one of the bugs
appears on powerpc - PR69252.
Proposed patch is here:
https://gcc.gnu.org/ml/gcc-patches/2011-12/txt00266.txt and it even
suits current trunk without modification.

The powerpc PR69252 situation seems to be the same as the one I described
earlier on ia64; I can discuss it once again.

There was the following doloop:

   set_before reg
Loop:
   use1 reg
   set reg
   use2 reg
   insn
   cloop

After SMS it looked like this (I add the scheduling stage and cycle before
each loop instruction):

   set_before reg
SMS-prologue:
   set reg
   use1 reg_copy #uninitialized
   use2 reg
   reg_copy <- reg
Loop:
0  0 set reg
0  0 use1 reg_copy
0  4 use2 reg
0 -4 reg_copy <- reg
0  8 insn
0 -1 cloop

All instructions were wrongly assigned to stage zero.  When they are copied
to the prologue, the regmove is still placed after use1, and as a
result the register reg_copy is used uninitialized in the prologue.
This leads to miscompilation.

I have found that the issue can be fixed by an additional schedule
normalization after scheduling the branch instruction in the optimize_sc
function.  The situation here is the same as in the patch by Richard
Sandiford
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00748.html which enables
scheduling regmoves.  "Moves that handle incoming values might have
been added to a new first stage.  Bump the stage count if so."  The
same bumping should be done after scheduling the branch, and my patch
simply implements this.
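
What normalization does to the stage assignment can be sketched with the cycles from the example above. This is a simplified model, not the modulo-sched.c code: the function name is hypothetical, and the initiation interval of 4 used in the note below is an assumed value, since the message does not state it.

```c
#include <assert.h>
#include <stddef.h>

/* Shift all cycles so that the earliest one is 0, then derive each
   instruction's stage as cycle / ii.  Returns the stage count.  */
int normalize_schedule(int *cycle, int *stage, size_t n, int ii)
{
    assert(n > 0 && ii > 0);

    int min_cycle = cycle[0];
    for (size_t i = 1; i < n; i++)
        if (cycle[i] < min_cycle)
            min_cycle = cycle[i];

    int max_stage = 0;
    for (size_t i = 0; i < n; i++) {
        cycle[i] -= min_cycle;
        stage[i] = cycle[i] / ii;
        if (stage[i] > max_stage)
            max_stage = stage[i];
    }
    return max_stage + 1;
}
```

With cycles 0, 0, 4, -4, 8, -1 and an assumed interval of 4, the regmove at cycle -4 lands in stage 0 and "set reg" moves to stage 1, so the instructions no longer all share stage zero and the stage count is bumped accordingly.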

As Martin Sebor has already done a bootstrap and regtest in PR69252, I
want to ask him to attach the PR69252 example as a new regression test to
this patch.  I don't know if it should be an execution test or whether
there are some special scan-assembler techniques.

Ok for trunk with regtest?


A patch with the regression test is attached.  I verified that
the test fails without your patch and passes with it applied.

Martin



--
Roman

Here is the whole old letter about the patch:
2011-12-29 17:04 GMT+03:00 Roman Zhuykov :

This week I investigated modulo scheduler on IA64.  Enabling SMS by default
(-fmodulo-sched -fmodulo-sched-allow-regmoves) leads to bootstrap failure
on IA64: gcc/build/genautomata.o differs while comparing stages 2 and 3.

I haven't studied this issue in detail, because the combination of these
my patches fixes this problem:

[Additional edges to instructions with clobber]
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00505.html
[Correctly delete anti-dep edges for renaming]
http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00506.html

Then I have regtested two compilers - first is clean trunk, and the second
is trunk with SMS enabled by default and two patches mentioned above.
Comparing the results shows several new failures.

FAIL: gcc.dg/pr45259.c (internal compiler error)
FAIL: gcc.dg/pr45259.c (test for excess errors)
FAIL: gcc.dg/pr47881.c (test for excess errors)
FAIL: tr1/5_numerical_facilities/special_functions/08_cyl_bessel_i/check_value.cc execution test
FAIL: tr1/5_numerical_facilities/special_functions/09_cyl_bessel_j/check_value.cc execution test
FAIL: tr1/5_numerical_facilities/special_functions/11_cyl_neumann/check_value.cc execution test
FAIL: tr1/5_numerical_facilities/special_functions/21_sph_bessel/check_value.cc execution test
FAIL: tr1/5_numerical_facilities/special_functions/23_sph_neumann/check_value.cc execution test

Problem with gcc.dg/pr45259.c is an ICE, which I earlier fixed by this patch:
[Correct extracting loop exit condition]
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg02049.html

In gcc.dg/pr47881.c a -fcompare-debug failure happens.  The only difference
between the -fcompare-debug dumps is that some NOTE_INSN_DELETED entries are
placed differently.  I haven't studied this problem.

And the last 5 new failures have dissappered after fixing the following
described issue.

Imagine the following doloop (each use and set is an fmad in the real example):

use1 reg
set reg
use2 reg
insn
cloop

After SMS it looks like this; I write the scheduling stage and cycle before
each instruction.

0  0 set reg
0  0 use1 reg_copy
0  4 use2 regR
0 -4 reg_copy = reg
0  8 insn
0 -1 cloop

So all instructions were wrongly assigned to stage zero.  When they are
copied to the prologue, the regmove stays after use1, and as a result the
register reg_copy is used uninitialized in the prologue.  This leads to
miscompilation.

I have found that the issue can be fixed by an additional schedule
normalization after scheduling the branch instruction in the optimize_sc
function.  The situation
here is the same as in patch by Richard Sandiford
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00748.html
which enables scheduling regmoves.
"Moves that handle incoming values might have been added
to a new first stage.  Bump the stage count if so."
The same bumping should be done after scheduling branch.
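
To make the normalization step concrete, here is a minimal C++ sketch of a toy model.  It is not the code in modulo-sched.c: ToySchedule, normalize, stage_of and stage_count are hypothetical names, and the initiation interval of 4 used in the usage note is an assumption, since the letter does not state the II of the example loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy model only: a schedule is a list of issue cycles plus an
// initiation interval (ii).  None of these names exist in GCC.
struct ToySchedule {
  std::vector<int> cycles;
  int ii;
};

// Shift every cycle so that the earliest instruction lands on cycle 0.
inline void normalize(ToySchedule &s) {
  if (s.cycles.empty())
    return;
  const int min_cycle = *std::min_element(s.cycles.begin(), s.cycles.end());
  for (int &c : s.cycles)
    c -= min_cycle;
}

// Stage of instruction i.  Only meaningful after normalize(), because
// negative cycles would otherwise be folded into the wrong stage.
inline int stage_of(const ToySchedule &s, std::size_t i) {
  return s.cycles[i] / s.ii;
}

// Number of stages spanned by the normalized schedule.
inline int stage_count(const ToySchedule &s) {
  const int max_cycle = *std::max_element(s.cycles.begin(), s.cycles.end());
  return max_cycle / s.ii + 1;
}
```

With the cycles from the example above (0, 0, 4, -4, 8, -1) and the assumed ii of 4, normalizing moves the regmove at cycle -4 to cycle 0 and stage 0, while "set reg" moves to cycle 4 and stage 1, so the instructions no longer all collapse into stage zero.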

In my model example 

Re: [PATCH] c++/58109 - alignas() fails to compile with constant expression

2016-01-20 Thread Martin Sebor

Right.  The problem is this code in is_late_template_attribute:


  /* If the first attribute argument is an identifier, only consider
 second and following arguments.  Attributes like mode, format,
 cleanup and several target specific attributes aren't late
 just because they have an IDENTIFIER_NODE as first argument.  */
  if (arg == args && identifier_p (t))
continue;


It shouldn't skip an initial identifier if !attribute_takes_identifier_p.


That seems backwards. I expected attribute_takes_identifier_p()
to return true for attribute aligned since the attribute does
take one.

In any case, I changed the patch as you suggest and retested it
on x86_64.  I saw the email about stage 3 having ended but I'm
not sure it applies to changes that are still in progress.
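
For readers without the PR at hand, the kind of code affected can be sketched roughly as below.  This is a reduced illustration modeled on the attached alignas5.C, not a copy of it: the template parameter lists are assumed to be a single type parameter, and alignment_of_derived is a hypothetical helper added only so the result can be checked.

```cpp
#include <cstddef>

template <typename T>
struct Base {
  static const int Align = sizeof (T);
};

template <typename T>
struct Derived : Base<T> {
  // Align names a member of a dependent base, so its value is not
  // known until Derived is instantiated.
  using Base<T>::Align;

  // The alignment argument is a plain identifier, which is the case
  // that must still be treated as a late (instantiation-time) attribute.
  alignas (Align) char a[1];
};

// Hypothetical helper: report the alignment the instantiation received.
template <typename T>
std::size_t alignment_of_derived () {
  return alignof (Derived<T>);
}
```

On a compiler with the fix, Derived<int> should end up aligned to sizeof (int) and Derived<double> to sizeof (double).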

Martin
gcc/testsuite/ChangeLog:
2016-01-20  Martin Sebor  

	PR c++/58109
	PR c++/69022
	* g++.dg/cpp0x/alignas5.C: New test.
	* g++.dg/ext/vector29.C: Same.

gcc/cp/ChangeLog:
2016-01-20  Martin Sebor  

	PR c++/58109
	PR c++/69022
	* decl2.c (is_late_template_attribute): Handle dependent argument
	to attribute align and attribute vector_size.

diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index a7212ca0..7d68961 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -1193,7 +1193,8 @@ is_late_template_attribute (tree attr, tree decl)
 	 second and following arguments.  Attributes like mode, format,
 	 cleanup and several target specific attributes aren't late
 	 just because they have an IDENTIFIER_NODE as first argument.  */
-  if (arg == args && identifier_p (t))
+  if (arg == args && attribute_takes_identifier_p (name)
+	  && identifier_p (t))
 	continue;
 
   if (value_dependent_expression_p (t)
diff --git a/gcc/testsuite/g++.dg/cpp0x/alignas5.C b/gcc/testsuite/g++.dg/cpp0x/alignas5.C
new file mode 100644
index 000..2dcc41f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/alignas5.C
@@ -0,0 +1,45 @@
+// PR c++/58109  - alignas() fails to compile with constant expression
+// { dg-do compile }
+
+template 
+struct Base {
+  static const int Align = sizeof (T);
+};
+
+// Never instantiated.
+template 
+struct Derived: Base
+{
+#if __cplusplus >= 201102L
+  // This is the meat of the (simplified) regression test for c++/58109.
+  using B = Base;
+  using B::Align;
+
+  alignas (Align) char a [1];
+  alignas (Align) T b [1];
+#else
+  // Fake the test for C++ 98.
+#  define Align Base::Align
+#endif
+
+  char __attribute__ ((aligned (Align))) c [1];
+  T __attribute__ ((aligned (Align))) d [1];
+};
+
+// Instantiated to verify that the code is accepted even when instantiated.
+template 
+struct InstDerived: Base
+{
+#if __cplusplus >= 201102L
+  using B = Base;
+  using B::Align;
+
+  alignas (Align) char a [1];
+  alignas (Align) T b [1];
+#endif
+
+  char __attribute__ ((aligned (Align))) c [1];
+  T __attribute__ ((aligned (Align))) d [1];
+};
+
+InstDerived dx;
diff --git a/gcc/testsuite/g++.dg/ext/vector29.C b/gcc/testsuite/g++.dg/ext/vector29.C
new file mode 100644
index 000..4a13009
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/vector29.C
@@ -0,0 +1,53 @@
+// PR c++/69022 - attribute vector_size ignored with dependent bytes
+// { dg-do compile }
+
+template 
+struct A { static const int X = N; };
+
+#if __cplusplus >= 201202L
+#  define ASSERT(e) static_assert (e, #e)
+#else
+#  define ASSERT(e)   \
+  do { struct S { bool: !!(e); } asrt; (void) } while (0)
+#endif
+
+template 
+struct B: A
+{
+#if __cplusplus >= 201202L
+  using A::X;
+#  define VecSize X
+#else
+#  define VecSize A::X
+#endif
+
+static void foo ()
+{
+char a __attribute__ ((vector_size (N)));
+ASSERT (sizeof a == N);
+
+T b __attribute__ ((vector_size (N)));
+ASSERT (sizeof b == N);
+}
+
+static void bar ()
+{
+char c1 __attribute__ ((vector_size (VecSize)));
+ASSERT (sizeof c1 == VecSize);
+
+char c2 __attribute__ ((vector_size (A::X)));
+ASSERT (sizeof c2 == A::X);
+
+T d1 __attribute__ ((vector_size (VecSize)));
+ASSERT (sizeof d1 == VecSize);
+
+T d2 __attribute__ ((vector_size (A::X)));
+ASSERT (sizeof d2 == A::X);
+}
+};
+
+void bar ()
+{
+B::foo ();
+B::bar ();
+}


Re: [PATCH 2/4 v2][AArch64] Add support for FCCMP

2016-01-20 Thread Evandro Menezes

On 01/06/16 14:44, Evandro Menezes wrote:

Hi, Wilco.

On 01/06/2016 06:04 AM, Wilco Dijkstra wrote:
Here's what I had in mind when I inquired about distinguishing FCMP from
FCCMP.  As you can see in the patch, Exynos is the only target that
cares about it, but I wonder if ThunderX or Xgene would too.

What do you think?

The new attributes look fine (I've got a similar outstanding change),
however please don't add them to non-AArch64 cores. We only need it for
thunderx.md, cortex-a53.md, cortex-a57.md, xgene1.md and exynos-m1.md.


Add support for the FCCMP insn types

2016-01-04  Evandro Menezes  

gcc/
* config/aarch64/aarch64.md (fccmp): Change insn type.
(fccmpe): Likewise.
* config/aarch64/thunderx.md (thunderx_fcmp): Add
   "fccmp{s,d}" types.
* config/arm/cortex-a53.md (cortex_a53_fpalu): Likewise.
* config/arm/cortex-a57.md (cortex_a57_fp_cmp): Likewise.
* config/arm/xgene1.md (xgene1_fcmp): Likewise.
* config/arm/exynos-m1.md (exynos_m1_fp_ccmp): New insn
   reservation.
* config/arm/types.md (fccmps): Add new insn type.
(fccmpd): Likewise.

Got it.  Here's an updated patch.  Again, assuming that your original 
patch is in place.  Perhaps you can build on it.


Thank you,



Ping.

--
Evandro Menezes



Re: [SMS] Schedule normalization after scheduling branch

2016-01-20 Thread Jeff Law

On 01/20/2016 02:30 PM, Martin Sebor wrote:


2016-01-20  Martin Sebor

PR target/69252
* gcc.target/powerpc/pr69252.c: New test.
This is OK once the fix for the scheduler goes in.  It could be improved 
by moving it into gcc.dg since I don't see anything in it that is PPC 
specific.


jeff



Re: [patch] libstdc++/14608 Add C++-conforming wrappers for stdlib.h and math.h

2016-01-20 Thread Dominik Vogt
On Tue, Jan 19, 2016 at 09:43:59PM +, Jonathan Wakely wrote:
> On 08/01/16 19:18 +, Jonathan Wakely wrote:
> >This resolves the longstanding issue that #include <math.h> uses the C
> >library header, which on most targets doesn't declare the additional
> >overloads required by C++11 26.8 [c.math], and similarly for
> ><stdlib.h>.
> >
> >With this patch libstdc++ provides its own <math.h> and <stdlib.h>
> >wrappers, which are equivalent to <cmath> or <cstdlib> followed by
> >using-directives for all standard names. This means there are no more
> >inconsistencies in the contents of the  and  headers.
> 
> Tested x86_64-linux, powerpc64le-linux, powerpc-aix,
> x86_64-freebsd10.2, x86_64-dragonfly4.2
> 
> Committed to trunk.

I think this fix is incomplete.  There are still some test errors
because of missing signatures in the Plumhall testsuite.  Please
check the information I've added to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60401

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



[PATCH v2] PR48344: Fix unrecognizable insn error when gcc

2016-01-20 Thread Kelvin Nilsen


This patch has been bootstrapped and tested on powerpc64le-unknown-linux-gnu 
with no regressions.  Is this ok for the trunk?


See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48344 for the original 
problem report.  The error resulted because gcc's processing of 
command-line options within gcc initialization code originally preceded 
the processing of target-specific configuration hooks.


In the unpatched gcc implementation, the Pmode (pointer mode) variable 
has not been initialized at the time the -fstack-limit-register 
command-line option is processed.  As a consequence, the stack-limiting 
register is not assigned a proper mode.  Thus, rtl instructions that 
make use of this stack-limiting register have an unspecified mode, and 
are therefore not matched by any known instructions.


The fix represented in this patch is to move the invocation of 
process_options () from within the implementation of do_compile () to 
immediately preceding the invocation of handle_common_deferred_options 
() (inside toplev::main ()).
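
The ordering problem described above can be pictured with a small toy model.  This is only a sketch under stated assumptions: ToyCompiler, ToyMode and the two *_order functions are hypothetical names, and the model captures nothing beyond "the deferred option handler reads a mode that process_options () sets".

```cpp
// Toy stand-ins for machine modes; not GCC's machine_mode enum.
enum ToyMode { TOY_VOIDmode, TOY_DImode };

struct ToyCompiler {
  ToyMode pmode = TOY_VOIDmode;            // stands in for Pmode
  ToyMode stack_limit_mode = TOY_VOIDmode; // mode given to the limit register

  // Stands in for process_options (): the pointer mode becomes valid.
  void process_options () { pmode = TOY_DImode; }

  // Stands in for handling -fstack-limit-register: the register is
  // created with whatever the pointer mode is at this moment.
  void handle_deferred_options () { stack_limit_mode = pmode; }
};

// Order before the patch: the option is handled first.
inline ToyMode old_order () {
  ToyCompiler c;
  c.handle_deferred_options ();
  c.process_options ();
  return c.stack_limit_mode;
}

// Order after the patch: process_options () runs first.
inline ToyMode new_order () {
  ToyCompiler c;
  c.process_options ();
  c.handle_deferred_options ();
  return c.stack_limit_mode;
}
```

In this model old_order () leaves the limit register with the void mode, which corresponds to the unrecognizable insn, while new_order () gives it the pointer mode.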


gcc/ChangeLog:

2016-01-14  Kelvin Nilsen 

* toplev.c (do_compile): remove invocation of process_options ()
from within do_compile ()
(toplev::main): insert invocation of process_options () before
invocation of handle_common_deferred_options ().

gcc/testsuite/ChangeLog:

2016-01-14  Kelvin Nilsen 

* gcc.target/powerpc/pr48344-1.c: New test.

Index: gcc/testsuite/gcc.target/powerpc/pr48344-1.c
===
--- gcc/testsuite/gcc.target/powerpc/pr48344-1.c(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr48344-1.c(revision 232633)
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-fstack-limit-register=r2" } */
+void foo ()
+{
+  int N = 2;
+  int slots[N];
+
+}
+
Index: gcc/toplev.c
===
--- gcc/toplev.c(revision 232135)
+++ gcc/toplev.c(working copy)
@@ -1938,8 +1938,6 @@ standard_type_bitsize (int bitsize)
 static void
 do_compile ()
 {
-  process_options ();
-
   /* Don't do any more if an error has already occurred.  */
   if (!seen_error ())
 {
@@ -2072,6 +2070,11 @@ toplev::main (int argc, char **argv)
  save_decoded_options, save_decoded_options_count,
  UNKNOWN_LOCATION, global_dc);
 
+  /* process_options() must execute before handle_common_deferred_options()
+ because handle_common_deferred_options() makes use of variables
+ initialized by process_options() (e.g. Pmode) */
+  process_options ();
+
   handle_common_deferred_options ();
 
   init_local_tick ();


[gomp-nvptx 09/13] libgomp: use generic fortran.c on nvptx

2016-01-20 Thread Alexander Monakov
This patch removes the nvptx fortran.c stub that provides only
_gfortran_abort.  It is possible to link libgfortran on NVPTX with
-foffload=-lgfortran.

* config/nvptx/fortran.c: Delete.
---
 libgomp/ChangeLog.gomp-nvptx   |  4 
 libgomp/config/nvptx/fortran.c | 40 
 2 files changed, 4 insertions(+), 40 deletions(-)
 delete mode 100644 libgomp/config/nvptx/fortran.c

diff --git a/libgomp/config/nvptx/fortran.c b/libgomp/config/nvptx/fortran.c
deleted file mode 100644
index 58ca790..000
--- a/libgomp/config/nvptx/fortran.c
+++ /dev/null
@@ -1,40 +0,0 @@
-/* OpenACC Runtime Fortran wrapper routines
-
-   Copyright (C) 2014-2015 Free Software Foundation, Inc.
-
-   Contributed by Mentor Embedded.
-
-   This file is part of the GNU Offloading and Multi Processing Library
-   (libgomp).
-
-   Libgomp is free software; you can redistribute it and/or modify it
-   under the terms of the GNU General Public License as published by
-   the Free Software Foundation; either version 3, or (at your option)
-   any later version.
-
-   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
-   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
-   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
-   more details.
-
-   Under Section 7 of GPL version 3, you are granted additional
-   permissions described in the GCC Runtime Library Exception, version
-   3.1, as published by the Free Software Foundation.
-
-   You should have received a copy of the GNU General Public License and
-   a copy of the GCC Runtime Library Exception along with this program;
-   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-   .  */
-
-/* Temporary hack; this will be provided by libgfortran.  */
-
-extern void _gfortran_abort (void);
-
-__asm__ ("// BEGIN GLOBAL FUNCTION DECL: _gfortran_abort\n"
-".visible .func _gfortran_abort;\n"
-"// BEGIN GLOBAL FUNCTION DEF: _gfortran_abort\n"
-".visible .func _gfortran_abort\n"
-"{\n"
-"trap;\n"
-"ret;\n"
-"}\n");


Re: [wwwdocs] gcc-6/changes.html: diagnostics, Levenshtein, -Wmisleading-indentation, jit (v2)

2016-01-20 Thread Gerald Pfeifer
On Wed, 20 Jan 2016, Jakub Jelinek wrote:
>>   Content-Security-Policy: default-src 'self' http: https:
>> 
>> So either we get the configuration of the web server changed, or
>> indeed we need to touch all those existing pages.
> At least the warning/error/note styles are something that multiple pages 
> are using and going to use in the future, so if that could be defined in 
> the main gcc.css, it would be enough.

Done thusly.  With this change, at least gcc-6/changes.html should
be fine again.

And I can commit to working my way backwards through all the other
changes.html pages over the coming couple of days.

Gerald


Move the boldred class from gcc-6/changes.html to gcc.css.
Introduce boldcyan and boldmagenta classes and use them in
gcc-6/changes.html.

Index: gcc.css
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc.css,v
retrieving revision 1.25
diff -u -r1.25 gcc.css
--- gcc.css 12 Sep 2011 09:42:59 -  1.25
+++ gcc.css 20 Jan 2016 17:26:10 -
@@ -49,7 +49,11 @@
   border-width: thin;
   padding: 4px;
 }
-  
+
+.boldcyan{ font-weight:bold; color:cyan; }
+.boldmagenta { font-weight:bold; color:magenta; }
+.boldred { font-weight:bold; color:red; }
+
 /* Classpath versus libgcj merge status page. */
 
 .classpath-only { background-color: #AA; }
Index: gcc-6/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.49
diff -u -r1.49 changes.html
--- gcc-6/changes.html  19 Jan 2016 22:42:16 -  1.49
+++ gcc-6/changes.html  20 Jan 2016 17:26:10 -
@@ -3,11 +3,6 @@
 
 GCC 6 Release Series  Changes, New Features, and Fixes
 
-
-  .boldred { font-weight:bold; color:red; }
-
-
 
@@ -80,9 +75,9 @@
 In addition, there is now initial support for precise diagnostic locations
 within strings:
 
-format-strings.c:3:14: warning: field 
width specifier '*' expects a matching 'int' argument [-Wformat=]
+format-strings.c:3:14: warning: field 
width specifier '*' expects a matching 'int' argument [-Wformat=]
printf("%*d");
-^
+^
 
 Diagnostics can now contain "fix-it hints", which are displayed
   in context underneath the relevant source code.  For example:
@@ -126,12 +121,12 @@
   https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-1266;>CVE-2014-1266:
 
 sslKeyExchange.c: In function 'SSLVerifySignedServerKeyExchange':
-sslKeyExchange.c:631:8: warning: statement 
is indented as if it were guarded by... [-Wmisleading-indentation]
-goto fail;
-^~~~
-sslKeyExchange.c:629:4: note: ...this 'if' 
clause, but it is not
-if ((err = SSLHashSHA1.update(hashCtx, 
signedParams)) != 0)
-^~
+sslKeyExchange.c:631:8: warning: 
statement is indented as if it were guarded by... [-Wmisleading-indentation]
+goto fail;
+^~~~
+sslKeyExchange.c:629:4: note: ...this 
'if' clause, but it is not
+if ((err = SSLHashSHA1.update(hashCtx, 
signedParams)) != 0)
+^~
 
   This warning is enabled by -Wall.
   


[gomp-nvptx 11/13] pick GOMP_target_ext changes from the hsa branch

2016-01-20 Thread Alexander Monakov
This adds necessary plumbing to spawn multiple teams.

To be reverted on this branch prior to merge.
---
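
Note for readers of the omp-low.c hunk below: the new helpers build each target argument identifier by OR-ing a device number, an optional "subsequent parameter" flag and a value id, then convert the result to a pointer so it fits in the argument array.  The following C++ sketch mirrors that packing.  The bit positions and the kToy* constants are assumptions for illustration only; the real values are defined in include/gomp-constants.h, which is not shown in this excerpt.

```cpp
#include <cstdint>

// Assumed layout, for illustration only.
constexpr int kToyDeviceMask = (1 << 7) - 1;  // low bits: device
constexpr int kToySubsequentParam = 1 << 7;   // flag: value follows in next slot
constexpr int kToyIdNumTeams = 1 << 8;        // an example value id

// Mirrors get_target_argument_identifier_1: plain bitwise OR of the parts.
constexpr int pack_identifier (int device, bool subsequent_param, int id) {
  return device | (subsequent_param ? kToySubsequentParam : 0) | id;
}

constexpr int device_of (int ident) { return ident & kToyDeviceMask; }

constexpr bool has_subsequent_param (int ident) {
  return (ident & kToySubsequentParam) != 0;
}

// Mirrors the fold_convert to ptr_type_node: store the integer in a
// pointer-sized slot of the argument array.
inline void *as_array_element (int ident) {
  return reinterpret_cast<void *> (static_cast<std::intptr_t> (ident));
}
```

A consumer of the array would cast the slot back to an integer, test the flag, and read the following slot as the value when the flag is set.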
 gcc/builtin-types.def|   7 +-
 gcc/fortran/types.def|   5 +-
 gcc/omp-builtins.def |   2 +-
 gcc/omp-low.c| 149 ---
 include/gomp-constants.h |  21 +++
 libgomp/libgomp.h|  12 +-
 libgomp/libgomp_g.h  |   3 +-
 libgomp/oacc-host.c  |   3 +-
 libgomp/target.c | 179 +--
 libgomp/task.c   |   3 +-
 liboffloadmic/plugin/libgomp-plugin-intelmic.cpp |   4 +-
 11 files changed, 299 insertions(+), 89 deletions(-)

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index c68fb19..33bee1d 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -555,10 +555,9 @@ DEF_FUNCTION_TYPE_9 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
 BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
 BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
 BT_BOOL, BT_UINT, BT_PTR, BT_INT)
-
-DEF_FUNCTION_TYPE_10 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT,
- BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
- BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_INT, BT_INT)
+DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
+BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
+BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)
 
 DEF_FUNCTION_TYPE_11 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_UINT_LONG_INT_LONG_LONG_LONG,
  BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
diff --git a/gcc/fortran/types.def b/gcc/fortran/types.def
index a37e856..5838f04 100644
--- a/gcc/fortran/types.def
+++ b/gcc/fortran/types.def
@@ -220,10 +220,9 @@ DEF_FUNCTION_TYPE_9 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT,
 BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
 BT_PTR_FN_VOID_PTR_PTR, BT_LONG, BT_LONG,
 BT_BOOL, BT_UINT, BT_PTR, BT_INT)
-
-DEF_FUNCTION_TYPE_10 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT,
+DEF_FUNCTION_TYPE_9 (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
  BT_VOID, BT_INT, BT_PTR_FN_VOID_PTR, BT_SIZE, BT_PTR,
- BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_INT, BT_INT)
+ BT_PTR, BT_PTR, BT_UINT, BT_PTR, BT_PTR)
 
 DEF_FUNCTION_TYPE_11 (BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_UINT_LONG_INT_LONG_LONG_LONG,
  BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR,
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index 35f5014..35c2724 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -341,7 +341,7 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SINGLE_COPY_START, "GOMP_single_copy_start",
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SINGLE_COPY_END, "GOMP_single_copy_end",
  BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET, "GOMP_target_ext",
- BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT,
+ BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR,
  ATTR_NOTHROW_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_TARGET_DATA, "GOMP_target_data_ext",
  BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 8996b8d..2e02c6f 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -12731,6 +12731,130 @@ mark_loops_in_oacc_kernels_region (basic_block region_entry,
 loop->in_oacc_kernels_region = true;
 }
 
+/* Build target argument identifier from the DEVICE identifier, value
+   identifier ID and whether the element also has a SUBSEQUENT_PARAM.  */
+
+static tree
+get_target_argument_identifier_1 (int device, bool subseqent_param, int id)
+{
+  tree t = build_int_cst (integer_type_node, device);
+  if (subseqent_param)
+t = fold_build2 (BIT_IOR_EXPR, integer_type_node, t,
+build_int_cst (integer_type_node,
+   GOMP_TARGET_ARG_SUBSEQUENT_PARAM));
+  t = fold_build2 (BIT_IOR_EXPR, integer_type_node, t,
+  build_int_cst (integer_type_node, id));
+  return t;
+}
+
+/* Like above but return it in type that can be directly stored as an element
+   of the argument array.  */
+
+static tree
+get_target_argument_identifier (int device, bool subseqent_param, int id)
+{
+  tree t = get_target_argument_identifier_1 (device, subseqent_param, id);
+  return fold_convert (ptr_type_node, t);
+}
+
+/* Return a target argument consisiting of DEVICE identifier, value identifier
+   ID, and the actual VALUE.  */
+
+static tree
+get_target_argument_value (gimple_stmt_iterator *gsi, int device, int id,
+ 

Re: [patch] libstdc++/14608 Add C++-conforming wrappers for stdlib.h and math.h

2016-01-20 Thread Jonathan Wakely

On 20/01/16 17:17 +0100, Dominik Vogt wrote:

On Tue, Jan 19, 2016 at 09:43:59PM +, Jonathan Wakely wrote:

On 08/01/16 19:18 +, Jonathan Wakely wrote:
>This resolves the longstanding issue that #include <math.h> uses the C
>library header, which on most targets doesn't declare the additional
>overloads required by C++11 26.8 [c.math], and similarly for
><stdlib.h>.
>
>With this patch libstdc++ provides its own <math.h> and <stdlib.h>
>wrappers, which are equivalent to <cmath> or <cstdlib> followed by
>using-directives for all standard names. This means there are no more
>inconsistencies in the contents of the  and  headers.

Tested x86_64-linux, powerpc64le-linux, powerpc-aix,
x86_64-freebsd10.2, x86_64-dragonfly4.2

Committed to trunk.


I think this fix is incomplete.  There are still some test errors
because of missing signatures in the Plumhall testsuite.  Please
check the information I've added to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60401


This should fix it, tested powerpc64le-linux and committed to trunk.
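
As a quick way to see the effect from user code, a check along the following lines can be used.  It is only a sketch: global_acosh_has_float_overload is a hypothetical helper name, and the result depends on the C++ library in use (it is expected to be true with a libstdc++ that contains this change and defines _GLIBCXX_USE_C99_MATH_TR1).

```cpp
#include <math.h>
#include <type_traits>

// True when ::acosh(float) resolves to an overload returning float.
// If only the C library's double acosh(double) is visible in the
// global namespace, the call converts its argument and yields double,
// so this returns false.
inline bool global_acosh_has_float_overload () {
  return std::is_same<decltype (::acosh (1.0f)), float>::value;
}
```

The new test in the patch does the equivalent at compile time by initializing function pointers of the expected types from the global-namespace names.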


commit a2d2a215d6ce58e9a2d6d9b5c3cded3c06b7cf8e
Author: Jonathan Wakely 
Date:   Wed Jan 20 16:47:57 2016 +

Add C++11  overloads to the global namespace

	PR libstdc++/60401
	* include/c_compatibility/math.h (acosh, asinh, atanh, cbrt,
	copysign, erf, erfc, exp2, expm1, fdim, fma, fmax, fmin, hypot, ilogb,
	lgamma, llrint, llround, log1p, log2, logb, lrint, lround, nearbyint,
	nextafter, nexttoward, remainder, remquo, rint, round, scalbln, scalbn,
	tgamma, trunc) [__cplusplus >= 201103L && _GLIBCXX_USE_C99_MATH_TR1]:
	Add using declarations.
	* testsuite/26_numerics/headers/cmath/60401.cc: New.

diff --git a/libstdc++-v3/include/c_compatibility/math.h b/libstdc++-v3/include/c_compatibility/math.h
index 67f5ef1..d1fe75d 100644
--- a/libstdc++-v3/include/c_compatibility/math.h
+++ b/libstdc++-v3/include/c_compatibility/math.h
@@ -74,5 +74,42 @@ using std::islessgreater;
 using std::isunordered;
 #endif
 
+#if __cplusplus >= 201103L && defined(_GLIBCXX_USE_C99_MATH_TR1)
+using std::acosh;
+using std::asinh;
+using std::atanh;
+using std::cbrt;
+using std::copysign;
+using std::erf;
+using std::erfc;
+using std::exp2;
+using std::expm1;
+using std::fdim;
+using std::fma;
+using std::fmax;
+using std::fmin;
+using std::hypot;
+using std::ilogb;
+using std::lgamma;
+using std::llrint;
+using std::llround;
+using std::log1p;
+using std::log2;
+using std::logb;
+using std::lrint;
+using std::lround;
+using std::nearbyint;
+using std::nextafter;
+using std::nexttoward;
+using std::remainder;
+using std::remquo;
+using std::rint;
+using std::round;
+using std::scalbln;
+using std::scalbn;
+using std::tgamma;
+using std::trunc;
+#endif // C++11 && _GLIBCXX_USE_C99_MATH_TR1
+
 #endif
 #endif
diff --git a/libstdc++-v3/testsuite/26_numerics/headers/cmath/60401.cc b/libstdc++-v3/testsuite/26_numerics/headers/cmath/60401.cc
new file mode 100644
index 000..a6be94a
--- /dev/null
+++ b/libstdc++-v3/testsuite/26_numerics/headers/cmath/60401.cc
@@ -0,0 +1,68 @@
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++11" }
+// { dg-do compile }
+
+// PR libstdc++/60401
+
+#include 
+
+namespace test
+{
+  template
+using F = T*;
+
+  Fabs = ::abs;
+
+#ifdef _GLIBCXX_USE_C99_MATH_TR1
+  F		acosh		= ::acosh;
+  F		asinh		= ::asinh;
+  F		atanh		= ::atanh;
+  F		cbrt		= ::cbrt;
+  F	copysign	= ::copysign;
+  F		erf		= ::erf;
+  F		erfc		= ::erfc;
+  F		exp2		= ::exp2;
+  F		expm1		= ::expm1;
+  F	fdim		= ::fdim;
+  F	fma		= ::fma;
+  F	fmax		= ::fmax;
+  F	fmin		= ::fmin;
+  F	hypot		= ::hypot;
+  F			ilogb		= ::ilogb;
+  F		lgamma		= ::lgamma;
+  F		llrint		= ::llrint;
+  F		llround		= ::llround;
+  F		log1p		= ::log1p;
+  F		log2		= ::log2;
+  F		logb		= ::logb;
+  F		lrint		= ::lrint;
+  F		lround		= ::lround;
+  F		nearbyint	= ::nearbyint;
+  F	nextafter	= 

[gomp-nvptx 12/13] libgomp: handle multiple teams on NVPTX

2016-01-20 Thread Alexander Monakov
* config/nvptx/icv-device.c (omp_get_num_teams): Update.
(omp_get_team_num): Ditto.
* config/nvptx/target.c (GOMP_teams): Update.
* config/nvptx/team.c (nvptx_thrs): Place in shared memory.
* icv.c (gomp_num_teams_var): Define.
* libgomp.h (gomp_num_teams_var): Declare.
(nvptx_thrs): Place in shared memory.
---
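
Note on the GOMP_teams change below: each CUDA block decides for itself whether it runs a team.  A request of zero teams, or of at least as many teams as there are blocks, is clamped to the number of blocks; otherwise blocks whose id is not below the requested count exit.  The host-side C++ sketch below restates that decision so it can be read without the PTX asm.  TeamDecision and decide_team are hypothetical names, and the sketch leaves out the thread cleanup done before exiting.

```cpp
// Result of the per-block decision; not a libgomp type.
struct TeamDecision {
  unsigned num_teams;  // number of teams that actually run
  bool block_exits;    // true if this block has no team to run
};

// requested:  the num_teams argument passed to GOMP_teams
// num_blocks: value of %nctaid.x
// block_id:   value of %ctaid.x
inline TeamDecision decide_team (unsigned requested, unsigned num_blocks,
                                 unsigned block_id) {
  if (requested == 0 || requested >= num_blocks)
    return {num_blocks, false};   // every block runs a team
  return {requested, block_id >= requested};
}
```

The patch then stores num_teams - 1 in gomp_num_teams_var, and omp_get_num_teams () adds the 1 back, so a zero-initialized variable still reports one team.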
 libgomp/ChangeLog.gomp-nvptx  | 10 ++
 libgomp/config/nvptx/icv-device.c |  8 
 libgomp/config/nvptx/target.c | 13 -
 libgomp/config/nvptx/team.c   |  2 +-
 libgomp/icv.c |  1 +
 libgomp/libgomp.h |  3 ++-
 6 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/libgomp/config/nvptx/icv-device.c b/libgomp/config/nvptx/icv-device.c
index 0e5fef0..bd11002 100644
--- a/libgomp/config/nvptx/icv-device.c
+++ b/libgomp/config/nvptx/icv-device.c
@@ -47,15 +47,15 @@ omp_get_num_devices (void)
 int
 omp_get_num_teams (void)
 {
-  /* FORNOW.  */
-  return 1;
+  return gomp_num_teams_var + 1;
 }
 
 int
 omp_get_team_num (void)
 {
-  /* FORNOW.  */
-  return 0;
+  int ctaid;
+  asm ("mov.u32 %0, %%ctaid.x;" : "=r" (ctaid));
+  return ctaid;
 }
 
 int
diff --git a/libgomp/config/nvptx/target.c b/libgomp/config/nvptx/target.c
index ad36013..9f34ae8 100644
--- a/libgomp/config/nvptx/target.c
+++ b/libgomp/config/nvptx/target.c
@@ -35,5 +35,16 @@ GOMP_teams (unsigned int num_teams, unsigned int thread_limit)
   icv->thread_limit_var
= thread_limit > INT_MAX ? UINT_MAX : thread_limit;
 }
-  (void) num_teams;
+  unsigned int num_blocks, block_id;
+  asm ("mov.u32 %0, %%nctaid.x;" : "=r" (num_blocks));
+  asm ("mov.u32 %0, %%ctaid.x;" : "=r" (block_id));
+  if (!num_teams || num_teams >= num_blocks)
+num_teams = num_blocks;
+  else if (block_id >= num_teams)
+{
+  gomp_free_thread (nvptx_thrs);
+  free (nvptx_thrs);
+  asm ("exit;");
+}
+  gomp_num_teams_var = num_teams - 1;
 }
diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c
index c18517a..909f296 100644
--- a/libgomp/config/nvptx/team.c
+++ b/libgomp/config/nvptx/team.c
@@ -29,7 +29,7 @@
 #include "libgomp.h"
 #include 
 
-struct gomp_thread *nvptx_thrs;
+struct gomp_thread *nvptx_thrs __attribute__((shared));
 
 static void gomp_thread_start (struct gomp_thread_pool *);
 
diff --git a/libgomp/icv.c b/libgomp/icv.c
index aa79423..18e35e5 100644
--- a/libgomp/icv.c
+++ b/libgomp/icv.c
@@ -56,6 +56,7 @@ unsigned long gomp_bind_var_list_len;
 void **gomp_places_list;
 unsigned long gomp_places_list_len;
 int gomp_debug_var;
+unsigned int gomp_num_teams_var;
 char *goacc_device_type;
 int goacc_device_num;
 
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 1d137f1..0ef2a05 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -363,6 +363,7 @@ extern char *gomp_bind_var_list;
 extern unsigned long gomp_bind_var_list_len;
 extern void **gomp_places_list;
 extern unsigned long gomp_places_list_len;
+extern unsigned int gomp_num_teams_var;
 extern int gomp_debug_var;
 extern int goacc_device_num;
 extern char *goacc_device_type;
@@ -648,7 +649,7 @@ enum gomp_cancel_kind
 /* ... and here is that TLS data.  */
 
 #if defined __nvptx__
-extern struct gomp_thread *nvptx_thrs;
+extern struct gomp_thread *nvptx_thrs __attribute__((shared));
 static inline struct gomp_thread *gomp_thread (void)
 {
   int tid;


[PATCH] Revert an __int20ish get_ref_base_and_extent change (PR c++/69355)

2016-01-20 Thread Jakub Jelinek
Hi!

Among the __int20 changes was one that made get_ref_base_and_extent behave
differently from similar get_inner_reference, e.g. for XFmode long double
or structures containing just a single long double member,
get_ref_base_and_extent now returns on x86 80 instead of 96 or 128 which
the type spans in memory.  For a function that returns the extent, IMHO
the bitsize is right.  Returning the precision also seems to let SRA
scalarize such accesses (which it doesn't do otherwise), and on the testcase
below it also triggers some SRA bug that even causes wrong-code.  Martin said
he will have a look at SRA, but this patch will make that bug just a latent
issue.
DJ said his msp430 testing didn't reveal anything that would need the
precision there in this case (at least that is how I understood it).
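
The gap between the two sizes can be seen from user code as well.  The C++ sketch below is only an analogy: sizeof reports the storage size, which corresponds to the 96 or 128 bits mentioned above, while the 80 value bits of the x87 format are inferred from numeric_limits rather than read from GET_MODE_PRECISION.  Both helper names are hypothetical.

```cpp
#include <climits>
#include <limits>

// Bits of memory occupied by a long double object.
constexpr int storage_bits_of_long_double () {
  return static_cast<int> (sizeof (long double) * CHAR_BIT);
}

// True when long double looks like the x87 extended format
// (64-bit significand, 15-bit exponent), i.e. 80 value bits.
constexpr bool long_double_is_x87_extended () {
  return std::numeric_limits<long double>::digits == 64
         && std::numeric_limits<long double>::max_exponent == 16384;
}
```

On x86 targets where the second function returns true, the first returns 96 or 128, so an extent of 80 bits would not cover the padding that is part of the object in memory.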

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-01-21  Jakub Jelinek  

PR c++/69355
* tree-dfa.c (get_ref_base_and_extent): Use GET_MODE_BITSIZE (mode)
for bitsize instead of GET_MODE_PRECISION (mode).

* g++.dg/torture/pr69355.C: New test.

--- gcc/tree-dfa.c.jj   2016-01-04 14:55:50.0 +0100
+++ gcc/tree-dfa.c  2016-01-20 11:06:15.226682927 +0100
@@ -395,7 +395,7 @@ get_ref_base_and_extent (tree exp, HOST_
   if (mode == BLKmode)
size_tree = TYPE_SIZE (TREE_TYPE (exp));
   else
-   bitsize = int (GET_MODE_PRECISION (mode));
+   bitsize = int (GET_MODE_BITSIZE (mode));
 }
   if (size_tree != NULL_TREE
   && TREE_CODE (size_tree) == INTEGER_CST)
--- gcc/testsuite/g++.dg/torture/pr69355.C.jj   2016-01-20 14:34:43.584332483 +0100
+++ gcc/testsuite/g++.dg/torture/pr69355.C  2016-01-20 14:34:18.0 +0100
@@ -0,0 +1,150 @@
+// PR c++/69355
+// { dg-do run }
+
+template  struct A;
+template <> struct A<1> {};
+template  struct B
+{
+  template  struct C
+  {
+typedef T *iterator;
+C (iterator p1) : m_iter (p1) {}
+void operator, (T p1) { *m_iter = p1; }
+iterator m_iter;
+  };
+  typedef double *iterator;
+  B (Obj , double) : m_object (p1) {}
+  C operator, (double);
+  Obj _object;
+};
+template 
+typename B::template C
+B::operator, (double p1)
+{
+  iterator a = m_object.data (), b = a + 1;
+  *a = 1;
+  *b = p1;
+  return C(b + 1);
+}
+class D {};
+inline double operator+(const double , D) { return p1; }
+template  class U;
+template  struct F
+{
+  enum { doIt = K < Sz - 1 ? 1 : 0 };
+  template 
+  static void assign (Dest , Src , Assign )
+  {
+p3.apply_on (p1 (K), p2 (K));
+F::assign (p1, p2, p3);
+  }
+  template  static double dot (Dest , Src )
+  {
+return p1 (K) * p2 (K) + F::dot (p1, p2);
+  }
+};
+template <> struct F<0>
+{
+  template 
+  static void assign (Dest &, Src &, Assign &) {}
+  template  static D dot (Dest &, Src &) { return D (); 
}
+};
+template  struct G
+{
+  enum { ops_assign, use_meta };
+  G (const E ) : m_expr (p1) {}
+  double operator()(int p1) const { return m_expr (p1); }
+  template 
+  static void do_assign (A<1>, Dest , Src , Assign )
+  {
+F::assign (p2, p3, p4);
+  }
+  template 
+  void assign_to (Dest , const Assign ) const
+  {
+do_assign (A<1>(), p1, *this, p2);
+  }
+  E m_expr;
+};
+struct H
+{
+  static double apply_on (double p1, long double p2) { return p1 / p2; }
+  static void apply_on (double , double p2) { p1 = p2; }
+};
+template  struct I
+{
+  I (const E1 , const E2 ) : m_lhs (p1), m_rhs (p2) {}
+  double operator()(int p1) const
+  {
+double c = m_lhs (p1);
+return H::apply_on (c, m_rhs (0));
+  }
+  E1 m_lhs;
+  const E2 m_rhs;
+};
+struct J
+{
+  J (double p1) : m_data (p1) {}
+  long double operator()(int) const { return m_data; }
+  long double m_data;
+};
+template  struct K
+{
+  K (const U ) : m_data (p1.data ()) {}
+  double operator()(int p1) const { return m_data[p1]; }
+  const double *m_data;
+};
+template  struct U
+{
+  U () {}
+  U (const U )
+  {
+*this = G(p1.const_ref ());
+  }
+  B operator=(double) { return B(*this, 0); }
+  double *data () { return m_data; }
+  const double *data () const { return m_data; }
+  double ()(int p1) { return m_data[p1]; }
+  double operator()(int p1) const { return m_data[p1]; }
+  typedef K ConstReference;
+  ConstReference const_ref () const { return *this; }
+  template  void operator=(const G )
+  {
+p1.assign_to (*this, H ());
+  }
+  double m_data[Sz];
+};
+template 
+G, Sz> div (U , double p2)
+{
+  typedef I expr_type;
+  return G(expr_type (p1.const_ref (), p2));
+}
+template  double norm2 (U )
+{
+  return __builtin_sqrt (F::dot (p1, p1));
+}
+template 
+G, Sz> operator/(U , double p2)
+{
+  return div (p1, p2);
+}
+typedef U<3> V;
+V foo (V p1)
+{
+  double e = norm2 (p1);
+  V r;
+  r = p1 / e;
+  return r;
+}
+int
+main ()
+{
+  V f;
+  f = 1, 2, 3;
+  V r = foo (f);
+  if (__builtin_fabs (r (0) - 0.267261) > 0.01
+  || __builtin_fabs (r 

Re: [PATCH v2] PR48344: Fix unrecognizable insn error when gcc

2016-01-20 Thread Bernd Schmidt



On 01/20/2016 10:49 PM, Kelvin Nilsen wrote:


 * toplev.c (do_compile): remove invocation of process_options ()
 from within do_compile ()
 (toplev::main): insert invocation of process_options () before
 invocation of handle_common_deferred_options ().


The ChangeLog seems badly formatted, but it could have been eaten by 
your mailer. You might want to include it as part of the attachment to 
avoid whitespace damage.


As for the patch itself, it makes me a little nervous - it's hard to 
judge whether this could have unintended consequences where something 
relies on the existing ordering. I'd much rather postpone the generation 
of stack_limit_rtx until rtl initialization time. Maybe this needs to be 
per-function anyway: can Pmode change with attribute target?



Bernd


[committed] Fix a warning in omp-low.c

2016-01-20 Thread Jakub Jelinek
Hi!

richi reported a warning in omp-low.c with some configure options (I forgot
which); this patch ensures we don't warn.
Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.
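
For readers skimming the diff below, here is a minimal stand-alone sketch of
the control flow after the change.  It is not the omp-low.c code: the
function pack_async, its parameters and the OP_MAX constant are hypothetical
stand-ins (OP_MAX plays the role of GOMP_LAUNCH_OP_MAX, whose real value is
not shown in this message).  The point is only that the variable is
initialized at its declaration, so the compiler can see a definite value on
every path.

```cpp
#include <cassert>

// Hypothetical stand-in for GOMP_LAUNCH_OP_MAX; the real value is not
// shown in this thread.
static const unsigned long OP_MAX = 0xffff;

// Models the patched control flow: `packed` is initialized at its
// declaration, so every path to the return statement sees a definite value.
// Sets *needs_separate_arg when the value cannot be packed into the tag.
static unsigned long
pack_async (bool is_constant, unsigned long value, bool *needs_separate_arg)
{
  unsigned long packed = OP_MAX;
  *needs_separate_arg = true;
  if (is_constant)
    {
      packed = value;
      if (packed < OP_MAX)
        *needs_separate_arg = false;
      else
        packed = OP_MAX;
    }
  return packed;
}
```

Before the change the fallback assignment sat in a separate "if" after the
constant check, which is presumably what the uninitialized-use analysis could
not follow.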

2016-01-21  Jakub Jelinek  

* omp-low.c (expand_omp_target): Avoid -Wmaybe-uninitialized
warning.  Fix up formatting.

--- gcc/omp-low.c.jj   2016-01-19 09:20:27.0 +0100
+++ gcc/omp-low.c   2016-01-19 11:37:05.053243550 +0100
@@ -13033,21 +13033,20 @@ expand_omp_target (struct omp_region *re
 GOMP_ASYNC_SYNC));
if (tagging && t_async)
  {
-   unsigned HOST_WIDE_INT i_async;
+   unsigned HOST_WIDE_INT i_async = GOMP_LAUNCH_OP_MAX;
 
if (TREE_CODE (t_async) == INTEGER_CST)
  {
/* See if we can pack the async arg in to the tag's
   operand.  */
i_async = TREE_INT_CST_LOW (t_async);
-
if (i_async < GOMP_LAUNCH_OP_MAX)
  t_async = NULL_TREE;
+   else
+ i_async = GOMP_LAUNCH_OP_MAX;
  }
-   if (t_async)
- i_async = GOMP_LAUNCH_OP_MAX;
-   args.safe_push (oacc_launch_pack
-   (GOMP_LAUNCH_ASYNC, NULL_TREE, i_async));
+   args.safe_push (oacc_launch_pack (GOMP_LAUNCH_ASYNC, NULL_TREE,
+ i_async));
  }
if (t_async)
  args.safe_push (t_async);

Jakub


[PATCH] #52291 - clarify sync_fetch_and_OP for pointers

2016-01-20 Thread Martin Sebor

The bug points out that while the __sync_fetch_and_OP intrinsics are
documented to have semantics equivalent to the "x OP= y" compound
assignment expressions, when used with pointer operands they actually
behave as if they operated on integers.  I.e., they are not scaled by
the size of the pointed-to type.

The attached patch brings the documentation of both the __sync_ and
the __atomic_ intrinsics into alignment with their actual effects.

Martin

PS See also c/64843 for some additional background.
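
A minimal sketch of the behavior described above, assuming GCC or a
compatible compiler on a target that provides these built-ins for
pointer-sized objects.  The helper names atomic_step_bytes and
fetch_and_advance are made up for illustration; only __atomic_fetch_add is a
real built-in here, and the __sync_ forms are documented by the patch to
behave the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reports by how many bytes an `int *` moves when the
// built-in adds 1 to it.  Per the documentation change the answer is 1,
// not sizeof (int), because the operand is treated like a uintptr_t.
static std::ptrdiff_t
atomic_step_bytes ()
{
  static int buf[4];
  int *p = buf;
  int *old = __atomic_fetch_add (&p, (std::ptrdiff_t) 1, __ATOMIC_SEQ_CST);
  assert (old == buf);  // the previous value is returned
  // p is only compared, never dereferenced: it is now misaligned.
  return (std::ptrdiff_t) ((std::uintptr_t) p - (std::uintptr_t) buf);
}

// Hypothetical helper: to get "p += n" semantics the caller has to scale
// the increment by the element size explicitly.
static int *
fetch_and_advance (int **pp, std::ptrdiff_t n)
{
  return __atomic_fetch_add (pp, n * (std::ptrdiff_t) sizeof (int),
                             __ATOMIC_SEQ_CST);
}
```

With the explicit scaling in fetch_and_advance, advancing by 2 moves the
pointer by two elements, which is what "x += y" would do on a plain pointer.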
2016-01-20  Martin Sebor  

	PR c/52291
	* extend.texi (__sync Builtins): Clarify the semantics of
	__sync_fetch_and_OP built-ins on pointers.
	(__atomic Builtins): Same.
Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi	(revision 232636)
+++ gcc/doc/extend.texi	(working copy)
@@ -9262,8 +9262,11 @@ work on multiple types.
 
 The definition given in the Intel documentation allows only for the use of
 the types @code{int}, @code{long}, @code{long long} or their unsigned
-counterparts.  GCC allows any integral scalar or pointer type that is
-1, 2, 4 or 8 bytes in length.
+counterparts.  GCC allows any scalar type that is 1, 2, 4 or 8 bytes in
+size other than the C type @code{_Bool} or the C++ type @code{bool}.
+Operations on pointer operands are performed as if the operands were
+of the @code{uintptr_t} type.  That is, they are not scaled by the size
+of the type to which the pointer points.
 
 These functions are implemented in terms of the @samp{__atomic}
 builtins (@pxref{__atomic Builtins}).  They should not be used for new
@@ -9309,7 +9312,11 @@ accessible variables should be protected
 @findex __sync_fetch_and_xor
 @findex __sync_fetch_and_nand
 These built-in functions perform the operation suggested by the name, and
-returns the value that had previously been in memory.  That is,
+returns the value that had previously been in memory.  That is, operations
+on integer operands have the following semantics.  Operations on pointer
+operands are performed as if the operands were of the @code{uintptr_t}
+type.  That is, they are not scaled by the size of the type to which
+the pointer points.
 
 @smallexample
 @{ tmp = *ptr; *ptr @var{op}= value; return tmp; @}
@@ -9335,7 +9342,9 @@ as @code{*ptr = ~(tmp & value)} instead
 @findex __sync_xor_and_fetch
 @findex __sync_nand_and_fetch
 These built-in functions perform the operation suggested by the name, and
-return the new value.  That is,
+return the new value.  That is, operations on integer operands have
+the following semantics.  Operations on pointer operands are performed as
+if the operands were of the @code{uintptr_t} type.  That is, they are not
+scaled by the size of the type to which the pointer points.
 
 @smallexample
 @{ *ptr @var{op}= value; return *ptr; @}
@@ -9592,7 +9601,9 @@ pointer.
 @deftypefnx {Built-in Function} @var{type} __atomic_or_fetch (@var{type} *ptr, @var{type} val, int memorder)
 @deftypefnx {Built-in Function} @var{type} __atomic_nand_fetch (@var{type} *ptr, @var{type} val, int memorder)
 These built-in functions perform the operation suggested by the name, and
-return the result of the operation.  That is,
+return the result of the operation.  Operations on pointer operands are
+performed as if the operands were of the @code{uintptr_t} type.  That is,
+they are not scaled by the size of the type to which the pointer points.
 
 @smallexample
 @{ *ptr @var{op}= val; return *ptr; @}
@@ -9610,7 +9621,10 @@ type.  It must not be a Boolean type.  A
 @deftypefnx {Built-in Function} @var{type} __atomic_fetch_or (@var{type} *ptr, @var{type} val, int memorder)
 @deftypefnx {Built-in Function} @var{type} __atomic_fetch_nand (@var{type} *ptr, @var{type} val, int memorder)
 These built-in functions perform the operation suggested by the name, and
-return the value that had previously been in @code{*@var{ptr}}.  That is,
+return the value that had previously been in @code{*@var{ptr}}.  Operations
+on pointer operands are performed as if the operands were of
+the @code{uintptr_t} type.  That is, they are not scaled by the size of
+the type to which the pointer points.
 
 @smallexample
 @{ tmp = *ptr; *ptr @var{op}= val; return tmp; @}


GCC 6 Status Report (2016-01-20), Stage 3 ended

2016-01-20 Thread Richard Biener

Status
======

Stage 3 has now officially ended and trunk is in regression and
documentation fixes stage now.  This means any new features or
fixes for bugs that are not regressions have to wait for GCC 7 now.

Please help analyze unconfirmed bugs in the list of serious regressions
and work towards eliminating the remaining P1 regressions.  As usual
if we reach zero P1 regressions we are ready to branch and release GCC 6.

The quality data below reflects categorizing bugs from the "no priority" P3
to others as well as our recent progress in eliminating regressions.
Bugs with priority P1 and P2 affect primary and secondary targets while
P4 and P5 might only affect the rest of targets and non-C/C++ frontends.


Quality Data
============

Priority          #   Change from last report
--------        ---   -----------------------
P1               33   +  30
P2              103   +  19
P3               43   - 103
P4              110   +  27
P5               30   -   2
--------        ---   -----------------------
Total P1-P3     179   -  54
Total           319   -  29


Previous Report
===============

https://gcc.gnu.org/ml/gcc/2015-11/msg00075.html


[Ada] Fix 'char' compatibility with C

2016-01-20 Thread Eric Botcazou
As promised earlier, this fixes the signedness compatibility issue between 
Character/Interfaces.C.char in Ada and 'char' in the C family of languages.
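
As background for the C side of this, a small sketch of why signedness
matters: whether plain 'char' is signed is implementation-defined (and can be
changed with -fsigned-char / -funsigned-char), so the same byte widens to
different integer values.  The helper names below are illustrative only and
are not part of the patch.

```cpp
#include <cassert>
#include <climits>

// True when plain `char` is a signed type for this compilation.
static bool plain_char_is_signed () { return CHAR_MIN < 0; }

// Widens a character the way C does by default (sign depends on plain char).
static int widen_plain (char c) { return c; }

// Widens a character as an unsigned 8-bit code, independent of the ABI.
static int widen_unsigned (char c) { return static_cast<unsigned char> (c); }
```

On a target where plain 'char' is signed, widen_plain yields a negative value
for codes above 127 while widen_unsigned keeps them in 128 .. 255.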

Tested on x86_64-suse-linux, applied on the mainline.


2016-01-20  Eric Botcazou  

* exp_ch2.adb (Expand_Current_Value): Make an appropriate character
literal if the entity is of a character type.
* gcc-interface/lang.opt (fsigned-char): New option.
* gcc-interface/misc.c (gnat_handle_option): Accept it.
(gnat_init): Adjust comment.
* gcc-interface/gigi.h (finish_character_type): New prototype.
(maybe_character_type): New inline function.
(maybe_character_value): Likewise.
* gcc-interface/decl.c (gnat_to_gnu_entity) : For
a character of CHAR_TYPE_SIZE, make a signed type if flag_signed_char.
Set TYPE_ARTIFICIAL early and call finish_character_type on the type.
: For a subtype of character with RM_Size and
Esize equal to CHAR_TYPE_SIZE, make a signed type if flag_signed_char.
Copy TYPE_STRING_FLAG from type to subtype.
: Deal with character index types.
: Likewise.
* gcc-interface/trans.c (gigi): Replace unsigned_char_type_node with
char_type_node throughout.
(build_raise_check): Likewise.
(get_type_length): Deal with character types.
(Attribute_to_gnu) : Likewise.  Remove obsolete range check
code.  Minor tweak.
: Likewise.
(Loop_Statement_to_gnu): Likewise.
(Raise_Error_to_gnu): Likewise.
: Deal with character index types.  Remove
obsolete code.
: Likewise.
: Deal with character types.  Minor tweak.
: Likewise.
: Likewise.
: Likewise.
(emit_index_check): Delete.
* gcc-interface/utils.c (finish_character_type): New function.
(gnat_signed_or_unsigned_type_for): Deal with built-in character types.
* gcc-interface/utils2.c (expand_sloc): Replace unsigned_char_type_node
with char_type_node.
(build_call_raise): Likewise.
(build_call_raise_column): Likewise.
(build_call_raise_range): Likewise.

-- 
Eric Botcazou
Index: exp_ch2.adb
===
--- exp_ch2.adb	(revision 232465)
+++ exp_ch2.adb	(working copy)
@@ -6,7 +6,7 @@
 --  --
 -- B o d y  --
 --  --
---  Copyright (C) 1992-2015, Free Software Foundation, Inc. --
+--  Copyright (C) 1992-2016, Free Software Foundation, Inc. --
 --  --
 -- GNAT is free software;  you can  redistribute it  and/or modify it under --
 -- terms of the  GNU General Public License as published  by the Free Soft- --
@@ -193,7 +193,16 @@ package body Exp_Ch2 is
   Unchecked_Convert_To (T,
 New_Occurrence_Of (Entity (Val), Loc)));
 
- --  If constant is of an integer type, just make an appropriately
+ --  If constant is of a character type, just make an appropriate
+ --  character literal, which will get the proper type.
+
+ elsif Is_Character_Type (T) then
+Rewrite (N,
+  Make_Character_Literal (Loc,
+Chars => Chars (Val),
+Char_Literal_Value => Expr_Rep_Value (Val)));
+
+ --  If constant is of an integer type, just make an appropriate
  --  integer literal, which will get the proper type.
 
  elsif Is_Integer_Type (T) then
Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 232503)
+++ gcc-interface/decl.c	(working copy)
@@ -1560,16 +1560,24 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 case E_Enumeration_Type:
   /* A special case: for the types Character and Wide_Character in
 	 Standard, we do not list all the literals.  So if the literals
-	 are not specified, make this an unsigned integer type.  */
+	 are not specified, make this an integer type.  */
   if (No (First_Literal (gnat_entity)))
 	{
-	  gnu_type = make_unsigned_type (esize);
+	  if (esize == CHAR_TYPE_SIZE && flag_signed_char)
+	gnu_type = make_signed_type (CHAR_TYPE_SIZE);
+	  else
+	gnu_type = make_unsigned_type (esize);
 	  TYPE_NAME (gnu_type) = gnu_entity_name;
 
 	  /* Set TYPE_STRING_FLAG for Character and Wide_Character types.
 	 This is needed by the DWARF-2 back-end to distinguish between
 	 unsigned integer types and character types.  */
 	  TYPE_STRING_FLAG (gnu_type) = 1;
+
+	  /* This flag is needed by the call just below.  */
+	  TYPE_ARTIFICIAL (gnu_type) = artificial_p;
+
+	  finish_character_type (gnu_type);
 	}

[PATCH] Fix PR69345

2016-01-20 Thread Richard Biener

The following patch corrects a mistake with the SSA info updating logic
in eliminate () after my correctness patch for PR69117.  It also handles
one more case optimistically which together fixes the observed regression
in 459.GemsFDTD.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-01-20  Richard Biener  

PR tree-optimization/69345
* tree-ssa-sccvn.h (VN_INFO_RANGE_INFO): New inline function.
(VN_INFO_PTR_INFO): Likewise.
* tree-ssa-sccvn.c (set_ssa_val_to): Avoid clearing points-to
info when it is equal between non-dominating SSA names.
* tree-ssa-pre.c (eliminate_dom_walker::before_dom_children):
Make sure to look at original SSA infos.

Index: gcc/tree-ssa-sccvn.h
===
*** gcc/tree-ssa-sccvn.h(revision 232519)
--- gcc/tree-ssa-sccvn.h(working copy)
*** vn_valueize (tree name)
*** 243,246 
--- 243,266 
return name;
  }
  
+ /* Get at the original range info for NAME.  */
+ 
+ inline range_info_def *
+ VN_INFO_RANGE_INFO (tree name)
+ {
+   return (VN_INFO (name)->info.range_info
+ ? VN_INFO (name)->info.range_info
+ : SSA_NAME_RANGE_INFO (name));
+ }
+ 
+ /* Get at the original pointer info for NAME.  */
+ 
+ inline ptr_info_def *
+ VN_INFO_PTR_INFO (tree name)
+ {
+   return (VN_INFO (name)->info.ptr_info
+ ? VN_INFO (name)->info.ptr_info
+ : SSA_NAME_PTR_INFO (name));
+ }
+ 
  #endif /* TREE_SSA_SCCVN_H  */
Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c(revision 232519)
--- gcc/tree-ssa-sccvn.c(working copy)
*** set_ssa_val_to (tree from, tree to)
*** 3092,3098 
  /* Use that from the dominator.  */
  SSA_NAME_PTR_INFO (to) = SSA_NAME_PTR_INFO (from);
}
! else
{
  /* Save old info.  */
  if (! VN_INFO (to)->info.ptr_info)
--- 3092,3102 
  /* Use that from the dominator.  */
  SSA_NAME_PTR_INFO (to) = SSA_NAME_PTR_INFO (from);
}
! else if (! SSA_NAME_PTR_INFO (from)
!  /* Handle the case of trivially equivalent info.  */
!  || memcmp (SSA_NAME_PTR_INFO (to),
! SSA_NAME_PTR_INFO (from),
! sizeof (ptr_info_def)) != 0)
{
  /* Save old info.  */
  if (! VN_INFO (to)->info.ptr_info)
Index: gcc/tree-ssa-pre.c
===
*** gcc/tree-ssa-pre.c  (revision 232519)
--- gcc/tree-ssa-pre.c  (working copy)
*** eliminate_dom_walker::before_dom_childre
*** 4033,4054 
{
  basic_block sprime_b = gimple_bb (SSA_NAME_DEF_STMT (sprime));
  if (POINTER_TYPE_P (TREE_TYPE (lhs))
! && SSA_NAME_PTR_INFO (lhs)
! && !SSA_NAME_PTR_INFO (sprime))
{
  duplicate_ssa_name_ptr_info (sprime,
!  SSA_NAME_PTR_INFO (lhs));
  if (b != sprime_b)
mark_ptr_info_alignment_unknown
(SSA_NAME_PTR_INFO (sprime));
}
! else if (!POINTER_TYPE_P (TREE_TYPE (lhs))
!  && SSA_NAME_RANGE_INFO (lhs)
!  && !SSA_NAME_RANGE_INFO (sprime)
   && b == sprime_b)
duplicate_ssa_name_range_info (sprime,
   SSA_NAME_RANGE_TYPE (lhs),
!  SSA_NAME_RANGE_INFO (lhs));
}
  
  /* Inhibit the use of an inserted PHI on a loop header when
--- 4033,4054 
{
  basic_block sprime_b = gimple_bb (SSA_NAME_DEF_STMT (sprime));
  if (POINTER_TYPE_P (TREE_TYPE (lhs))
! && VN_INFO_PTR_INFO (lhs)
! && ! VN_INFO_PTR_INFO (sprime))
{
  duplicate_ssa_name_ptr_info (sprime,
!  VN_INFO_PTR_INFO (lhs));
  if (b != sprime_b)
mark_ptr_info_alignment_unknown
(SSA_NAME_PTR_INFO (sprime));
}
! else if (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
!  && VN_INFO_RANGE_INFO (lhs)
!  && ! VN_INFO_RANGE_INFO (sprime)
   && b == sprime_b)
duplicate_ssa_name_range_info (sprime,
   SSA_NAME_RANGE_TYPE (lhs),
!  VN_INFO_RANGE_INFO (lhs));
}
  
  /* Inhibit the use 

Re: [PATCH] Fix ICE with asm "m" (stmt-expr) operand (PR middle-end/67653)

2016-01-20 Thread Richard Biener
On Tue, 19 Jan 2016, Jakub Jelinek wrote:

> On Tue, Jan 19, 2016 at 10:00:00AM +0100, Richard Biener wrote:
> > On Tue, 19 Jan 2016, Jakub Jelinek wrote:
> > > Here is an attempt to fix ICE on statement expression in "m" asm input
> > > operand.  The problem is that gimplify_asm_expr attempts to mark it
> > > addressable, but that can be just too late, a temporary the stmt-expression
> > > gimplifies to might not be addressable and may be used already in the
> > > gimplified code.  Normally the C/C++ FEs attempt to mark the operand
> > > addressable already, but in case of statement expression the temporaries
> > > might not exist yet.
> > > The patch turns also the PR29119 testcase into invalid test, but you've
> > > already said in that PR it should be invalid and I agree with that.
> > 
> > Hmm, but can't we detect this in the FE?
> 
> We could diagnose a statement expression in "m", but not sure if that is all
> that can get wrong, or if all statement expressions are problematic.

I thought about either requiring an lvalue here or at least diagnosing
that a non-lvalue might end up using a memory temporary.

> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > What happens if we just do _not_ mark the memory input addressable?
> > Shouldn't IRA/LRA in the end satisfy the constraint by spilling
> > a non-memory input and using the spill slot?
> 
> Well, if you want to make broken testcases work, it is always possible
> to call say prepare_gimple_addressable, but I'd think it is preferable
> to tell people that what they do is really going to do something different
> from what they expect (that the operand, while being a memory input, will
> be some temporary containing a copy of the value rather than the
> variable itself).

Sure, I'm just thinking that diagnosing sth at gimplification time
feels wrong ... after all we can make it unexpected but valid GIMPLE.

Erroring on a non-lvalue in the FE will likely break too much legacy
code but I guess that might be a better choice than using a
memory temporary (just in case we are faced with some fancy lock stuff).

Richard.


Re: [PATCH] Fix ICE with asm "m" (stmt-expr) operand (PR middle-end/67653)

2016-01-20 Thread Jakub Jelinek
On Wed, Jan 20, 2016 at 10:24:40AM +0100, Richard Biener wrote:
> > We could diagnose a statement expression in "m", but not sure if that is all
> > that can get wrong, or if all statement expressions are problematic.
> 
> I thought about either requiring an lvalue here or at least diagnosing
> that a non-lvalue might end up using a memory temporary.

Requiring an lvalue sounds wrong, "m" input can be validly e.g. const object that
can't be assigned to.  Furthermore, I'm really afraid changing that would break
too much stuff.
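
To make the const-object case concrete, here is a minimal sketch, assuming a
compiler that accepts GNU extended asm.  The function name is made up, and
the asm template is deliberately empty so the snippet only exercises operand
handling and does not depend on any particular target.

```cpp
#include <cassert>

// A const object is a valid "m" input even though it cannot be assigned to.
// The asm template is empty, so this only exercises operand handling.
static int
use_const_memory_input ()
{
  static const int limit = 42;
  __asm__ volatile ("" : : "m" (limit));
  return limit;
}
```

The problematic case in the PR is different: there the operand is a statement
expression, so there is no named object whose memory the asm could refer to.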
We had until GCC 4.7 a warning:
  warning (0, "use of memory input without lvalue in "
   "asm operand %d is deprecated", i + noutputs);
but 1) it wasn't anywhere near frontend, it was during expansion
2) it wasn't really checking lvalue, but whether the operand is a MEM or not
> 
> > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > > 
> > > What happens if we just do _not_ mark the memory input addressable?
> > > Shouldn't IRA/LRA in the end satisfy the constraint by spilling
> > > a non-memory input and using the spill slot?
> > 
> > Well, if you want to make broken testcases work, it is always possible
> > to call say prepare_gimple_addressable, but I'd think it is preferable
> > to tell people that what they do is really going to do something different
> > from what they expect (that the operand, while being a memory input, will
> > be some temporary containing a copy of the value rather than the
> > variable itself).
> 
> Sure, I'm just thinking that diagnosing sth at gimplification time
> feels wrong ... after all we can make it unexpected but valid GIMPLE.

We already do diagnose tons of other cases for inline asm at
gimplification time.  I can replace the error with a warning followed by
copying it to addressable memory, that seems to be what the older gccs were
doing during expansion after issuing above mentioned warning.

Jakub


[patch] Restore cross-language inlining into Ada

2016-01-20 Thread Eric Botcazou
Hi,

this patch from Jan:
  https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01388.html
totally disabled cross-language inlining into Ada without notice, by adding a 
check that always fails when the language of the callee is not Ada...
The attached patch simply deletes this new check to restore the initial state.
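
A small model of the one clause touched by the patch may help to see the
effect.  This is a sketch under stated assumptions, not the ipa-inline.c
code: FnInfo and the function names are hypothetical, check_match (flag) is
assumed to be true when caller and callee disagree on the flag, and the Ada
caller is assumed to have flag_non_call_exceptions set while a C callee does
not.  The real condition is one operand of a longer "||" chain.

```cpp
#include <cassert>

// Hypothetical model of the two per-function facts the clause looks at.
struct FnInfo
{
  bool non_call_exceptions;  // stands in for flag_non_call_exceptions
  bool has_personality;      // stands in for DECL_FUNCTION_PERSONALITY != NULL
};

// Assumption: check_match (flag) is true when caller and callee disagree.
static bool
flags_mismatch (FnInfo caller, FnInfo callee)
{
  return caller.non_call_exceptions != callee.non_call_exceptions;
}

// Clause as it stood before the patch; true means "do not inline".
static bool
old_clause_blocks (FnInfo caller, FnInfo callee)
{
  return flags_mismatch (caller, callee)
         && (!callee.non_call_exceptions || callee.has_personality);
}

// Clause after the patch.
static bool
new_clause_blocks (FnInfo caller, FnInfo callee)
{
  return flags_mismatch (caller, callee) && callee.has_personality;
}
```

Under these assumptions an Ada caller and a C callee without a personality
routine are rejected by the old clause and accepted by the new one.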

Tested on x86_64-suse-linux, OK for the mainline?


2016-01-20  Eric Botcazou  

* ipa-inline.c (can_inline_edge_p): Back out overzealous check on
flag_non_call_exceptions compatibility.


-- 
Eric Botcazou
Index: ipa-inline.c
===
--- ipa-inline.c	(revision 232465)
+++ ipa-inline.c	(working copy)
@@ -432,11 +432,7 @@ can_inline_edge_p (struct cgraph_edge *e
 		 does not throw.
 		 This is tracked by DECL_FUNCTION_PERSONALITY.  */
 	  || (check_match (flag_non_call_exceptions)
-		  /* TODO: We also may allow bringing !flag_non_call_exceptions
-		 to flag_non_call_exceptions function, but that may need
-		 extra work in tree-inline to add the extra EH edges.  */
-		  && (!opt_for_fn (callee->decl, flag_non_call_exceptions)
-		  || DECL_FUNCTION_PERSONALITY (callee->decl)))
+		  && DECL_FUNCTION_PERSONALITY (callee->decl))
 	  || (check_maybe_up (flag_exceptions)
 		  && DECL_FUNCTION_PERSONALITY (callee->decl))
 	  /* Strictly speaking only when the callee contains function


Re: [committed] Add oacc_kernels_p argument to pass_parallelize_loops

2016-01-20 Thread Thomas Schwinge
Hi!

On Mon, 18 Jan 2016 14:07:11 +0100, Tom de Vries  wrote:
> Add oacc_kernels_p argument to pass_parallelize_loops

> --- a/gcc/tree-parloops.c
> +++ b/gcc/tree-parloops.c

> @@ -2315,6 +2367,9 @@ gen_parallel_loop (struct loop *loop,

|   /* Ensure that the exit condition is the first statement in the loop.
|  The common case is that latch of the loop is empty (apart from the
|  increment) and immediately follows the loop exit test.  Attempt to move the
|  entry of the loop directly before the exit check and increase the number of
|  iterations of the loop by one.  */
|   if (try_transform_to_exit_first_loop_alt (loop, reduction_list, nit))
| {
|   if (dump_file
| && (dump_flags & TDF_DETAILS))
|   fprintf (dump_file,
|"alternative exit-first loop transform succeeded"
|" for loop %d\n", loop->num);
| }
|   else
| {
> +  if (oacc_kernels_p)
> + n_threads = 1;
> +
|   /* Fall back on the method that handles more cases, but duplicates the
|loop body: move the exit condition of LOOP to the beginning of its
|header, and duplicate the part of the last iteration that gets disabled
|to the exit of the loop.  */
|   transform_to_exit_first_loop (loop, reduction_list, nit);
| }

Just for my own education: this pessimization "n_threads = 1" for OpenACC
kernels is because the duplicated loop bodies generated by
transform_to_exit_first_loop are not appropriate for parallel OpenACC
offloading execution?  (Might add a source code comment here?)  Testing
on gomp-4_0-branch, there are no changes in the testsuite if I remove
this hunk.


Grüße
 Thomas




Re: [PATCH] Fix ICE with asm "m" (stmt-expr) operand (PR middle-end/67653)

2016-01-20 Thread Richard Biener
On Wed, 20 Jan 2016, Jakub Jelinek wrote:

> On Wed, Jan 20, 2016 at 10:24:40AM +0100, Richard Biener wrote:
> > > We could diagnose a statement expression in "m", but not sure if that is all
> > > that can get wrong, or if all statement expressions are problematic.
> > 
> > I thought about either requiring an lvalue here or at least diagnosing
> > that a non-lvalue might end up using a memory temporary.
> 
> Requiring an lvalue sounds wrong, "m" input can be validly e.g. const object that
> can't be assigned to.

Ok, so maybe the term "lvalue" is wrong but certainly an arbitrary
rvalue is not "valid".

>  Furthermore, I'm really afraid changing that would break
> too much stuff.
> We had until GCC 4.7 a warning:
>   warning (0, "use of memory input without lvalue in "
>"asm operand %d is deprecated", i + noutputs);
> but 1) it wasn't anywhere near frontend, it was during expansion
> 2) it wasn't really checking lvalue, but whether the operand is a MEM or not
> > 
> > > > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > > > 
> > > > What happens if we just do _not_ mark the memory input addressable?
> > > > Shouldn't IRA/LRA in the end satisfy the constraint by spilling
> > > > a non-memory input and using the spill slot?
> > > 
> > > Well, if you want to make broken testcases work, it is always possible
> > > to call say prepare_gimple_addressable, but I'd think it is preferable
> > > to tell people that what they do is really going to do something different
> > > from what they expect (that the operand, while being a memory input, will
> > > be some temporary containing a copy of the value rather than the
> > > variable itself).
> > 
> > Sure, I'm just thinking that diagnosing sth at gimplification time
> > feels wrong ... after all we can make it unexpected but valid GIMPLE.
> 
> We already do diagnose tons of other cases for inline asm at
> gimplification time.  I can replace the error with a warning followed by
> copying it to addressable memory, that seems to be what the older gccs were
> doing during expansion after issuing above mentioned warning.

That sounds better then though it would be nice to warn from the FE
somehow.

Richard.


Re: [patch] Restore cross-language inlining into Ada

2016-01-20 Thread Richard Biener
On Wed, Jan 20, 2016 at 9:32 AM, Eric Botcazou  wrote:
> Hi,
>
> this patch from Jan:
>   https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01388.html
> totally disabled cross-language inlining into Ada without notice, by adding a
> check that always fails when the language of the callee is not Ada...
> The attached patch simply deletes this new check to restore the initial state.
>
> Tested on x86_64-suse-linux, OK for the mainline?

I think the intent was to allow inlining a non-throwing -fnon-call-exceptions
function into a not -fnon-call-exceptions function but _not_ a
non-throwing not -fnon-call-exceptions function (that "not-throwing" is
basically a non-sensible test) into a -fnon-call-exceptions function
because that may now miss EH edges.

So the test looks conservatively correct to me - we can't reliably
check whether the callee throws if the IL now were -fnon-call-exceptions
(which we know the caller is after !opt_for_fn (callee->decl,
flag_non_call_exceptions)).

So - this doesn't look correct to me.

OTOH

static inline int foo (int a, int *b)
{
  return a / *b;
}

int __attribute__((optimize("non-call-exceptions")))
bar (int *p, int *b)
{
  try
{
  return foo (*p, b);
}
  catch (...)
{
  return 0;
}
}

happily inlines foo with your patch but doesn't ICE during stmt verification.

So maybe we're not verifying that "correctness" part - ah, yeah, I think
we changed it to only verify EH tree vs. stmt consistency but not the
other way around.

Not sure if we already have a C++ testcase like the above, if not can
you add this one to the torture?

Given this I wonder if we can also change check_match to check_maybe_up,
basically handle -fnon-call-exceptions the same as -fexceptions.

Thanks,
Richard.

>
> 2016-01-20  Eric Botcazou  
>
> * ipa-inline.c (can_inline_edge_p): Back out overzealous check on
> flag_non_call_exceptions compatibility.
>
>
> --
> Eric Botcazou


Re: [wwwdocs] Update changes.html for LTO and IPA

2016-01-20 Thread Jan Hubicka
> >+  is not performed. GCC 7 will support incremental IL linking.
> 
> "IL" again  what does this mean to users?

Thanks for the corrections, I will apply them and post an updated patch.  Here
I wanted to explain that gcc -r should now produce correct code (while with
earlier GCC releases it would produce code that often works but sometimes
doesn't).

Doing an incremental link, however, prevents whole-program optimization from
happening, because it produces the final assembly at gcc -r time.  To get this
working as expected you want gcc -r to incrementally link into the LTO IL
again; that is implemented by the patch
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg0.html
but it seems it will only get in during the next stage 1.

Honza


Re: -z bndplt documentation in GCC manual

2016-01-20 Thread Ilya Enkovich
2016-01-20 3:42 GMT+03:00 Sandra Loosemore :
> On 01/19/2016 03:24 AM, Ilya Enkovich wrote:
>>
>> 2016-01-19 5:25 GMT+03:00 Sandra Loosemore :
>>>
>>> I think the documentation relating to '-z bndplt' in the GCC manual
>>> description of -fcheck-pointer-bounds is incorrect.  It looks like, as of
>>> r225862, the GCC driver is supposed to emit an error message if GCC was
>>> configured with a linker that doesn't support this option and you pass
>>> -mmpx
>>> without -static.  Is that right?  I'll fix the documentation once I'm
>>> clear
>>> on what the actual behavior is.
>>
>>
>> Compiler just emits a note where user is warned that GCC configuration may
>> lead to decreased instrumentation coverage.
>
>
> OK.  Is the attached patch accurate?  The existing text has several
> markup/grammatical/spelling errors and I'd like to simplify it to make it
> less repetitive and more direct and user-friendly.

I think your text accurately describes the situation. Thanks!

Ilya

>
> (BTW, part of the problem I had parsing the code is that the manual doesn't
> document the %n spec file syntax, or several other % escapes.  I opened
> PR69367 for that since I have too many other things in my pile to get to it
> any time soon.)
>
> -Sandra


Re: [PATCH] Fix c/68513 for GCC5 (match.pd and SAVE_EXPRs)

2016-01-20 Thread Richard Biener
On Tue, Jan 19, 2016 at 8:48 PM, Marek Polacek  wrote:
> Recently on IRC we've concluded that for GCC 5 the simplest solution
> will be to just disable the problematic pattern on GENERIC.  So done
> in the following.  (The problem was that the match.pd pattern created
> SAVE_EXPRs which then leaked into gimplification.)
>
> Bootstrapped/regtested on x86_64-linux, ok for 5?

Please instead wrap the pattern in

#if GIMPLE
...
#endif

and add a comment referring to the PR.

Ok with that change.

Thanks,
Richard.

> 2016-01-19  Marek Polacek  
>
> PR c/68513
> * match.pd ((x & ~m) | (y & m)): Only perform on GIMPLE.
>
> * gcc.dg/pr68513.c: New test.
>
> diff --git gcc/match.pd gcc/match.pd
> index e40720e..0b557e6 100644
> --- gcc/match.pd
> +++ gcc/match.pd
> @@ -385,8 +385,9 @@ along with GCC; see the file COPYING3.  If not see
>  /* (x & ~m) | (y & m) -> ((x ^ y) & m) ^ x */
>  (simplify
>(bit_ior:c (bit_and:c@3 @0 (bit_not @2)) (bit_and:c@4 @1 @2))
> -  (if ((TREE_CODE (@3) != SSA_NAME || has_single_use (@3))
> -   && (TREE_CODE (@4) != SSA_NAME || has_single_use (@4)))
> +  (if (GIMPLE
> +   && (TREE_CODE (@3) != SSA_NAME || has_single_use (@3))
> +   && (TREE_CODE (@4) != SSA_NAME || has_single_use (@4)))
> (bit_xor (bit_and (bit_xor @0 @1) @2) @0)))
>
>
> diff --git gcc/testsuite/gcc.dg/pr68513.c gcc/testsuite/gcc.dg/pr68513.c
> index e69de29..86f878d 100644
> --- gcc/testsuite/gcc.dg/pr68513.c
> +++ gcc/testsuite/gcc.dg/pr68513.c
> @@ -0,0 +1,125 @@
> +/* PR c/68513 */
> +/* { dg-do compile } */
> +/* { dg-options "-funsafe-math-optimizations -fno-math-errno -O -Wno-div-by-zero" } */
> +
> +int i;
> +unsigned u;
> +volatile int *e;
> +
> +#define E (i ? *e : 0)
> +
> +/* Can't trigger some of them because operand_equal_p will return false
> +   for side-effects.  */
> +
> +/* (x & ~m) | (y & m) -> ((x ^ y) & m) ^ x */
> +int
> +fn1 (void)
> +{
> +  int r = 0;
> +  r += (short) (E & ~u | i & u);
> +  r += -(short) (E & ~u | i & u);
> +  r += (short) -(E & ~u | i & u);
> +  return r;
> +}
> +
> +/* sqrt(x) < y is x >= 0 && x != +Inf, when y is large.  */
> +double
> +fn2 (void)
> +{
> +  double r;
> +  r = __builtin_sqrt (E) < __builtin_inf ();
> +  return r;
> +}
> +
> +/* sqrt(x) < c is the same as x >= 0 && x < c*c.  */
> +double
> +fn3 (void)
> +{
> +  double r;
> +  r = __builtin_sqrt (E) < 1.3;
> +  return r;
> +}
> +
> +/* copysign(x,y)*copysign(x,y) -> x*x.  */
> +double
> +fn4 (double y, double x)
> +{
> +  return __builtin_copysign (E, y) * __builtin_copysign (E, y);
> +}
> +
> +/* x <= +Inf is the same as x == x, i.e. !isnan(x).  */
> +int
> +fn5 (void)
> +{
> +  return E <= __builtin_inf ();
> +}
> +
> +/* Fold (A & ~B) - (A & B) into (A ^ B) - B.  */
> +int
> +fn6 (void)
> +{
> +  return (i & ~E) - (i & E);
> +}
> +
> +/* Fold (A & B) - (A & ~B) into B - (A ^ B).  */
> +int
> +fn7 (void)
> +{
> +  return (i & E) - (i & ~E);
> +}
> +
> +/* x + (x & 1) -> (x + 1) & ~1 */
> +int
> +fn8 (void)
> +{
> +  return E + (E & 1);
> +}
> +
> +/* Simplify comparison of something with itself.  */
> +int
> +fn9 (void)
> +{
> +  return E <= E | E >= E;
> +}
> +
> +/* Fold (A & ~B) - (A & B) into (A ^ B) - B.  */
> +int
> +fn10 (void)
> +{
> +  return (i & ~E) - (i & E);
> +}
> +
> +/* abs(x)*abs(x) -> x*x.  Should be valid for all types.  */
> +int
> +fn11 (void)
> +{
> +  return __builtin_abs (E) * __builtin_abs (E);
> +}
> +
> +/* (x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2) */
> +int
> +fn12 (void)
> +{
> +  return (E | 11) & 12;
> +}
> +
> +/* fold_range_test */
> +int
> +fn13 (const char *s)
> +{
> +  return s[E] != '\0' && s[E] != '/';
> +}
> +
> +/* fold_comparison */
> +int
> +fn14 (void)
> +{
> +  return (!!i ? : (u *= E / 0)) >= (u = E);
> +}
> +
> +/* fold_mult_zconjz */
> +_Complex int
> +fn15 (_Complex volatile int *z)
> +{
> +  return *z * ~*z;
> +}
>
> Marek


Re: [PATCH] Fix c/68513 for GCC5 (match.pd and SAVE_EXPRs)

2016-01-20 Thread Marek Polacek
On Wed, Jan 20, 2016 at 12:01:50PM +0100, Richard Biener wrote:
> On Tue, Jan 19, 2016 at 8:48 PM, Marek Polacek  wrote:
> > Recently on IRC we've concluded that for GCC 5 the simplest solution
> > will be to just disable the problematic pattern on GENERIC.  So done
> > in the following.  (The problem was that the match.pd pattern created
> > SAVE_EXPRs which then leaked into gimplification.)
> >
> > Bootstrapped/regtested on x86_64-linux, ok for 5?
> 
> Please instead wrap the pattern in
> 
> #if GIMPLE
> ...
> #endif
> 
> > and add a comment referring to the PR.
> 
> Ok with that change.

Thanks, done:

2016-01-20  Marek Polacek  

PR c/68513
* match.pd ((x & ~m) | (y & m)): Only perform on GIMPLE.

* gcc.dg/pr68513.c: New test.

diff --git gcc/match.pd gcc/match.pd
index e40720e..405fec6 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -382,12 +382,15 @@ along with GCC; see the file COPYING3.  If not see
   (bit_not (bit_not @0))
   @0)
 
+/* Disable on GENERIC because of PR68513.  */
+#if GIMPLE
 /* (x & ~m) | (y & m) -> ((x ^ y) & m) ^ x */
 (simplify
   (bit_ior:c (bit_and:c@3 @0 (bit_not @2)) (bit_and:c@4 @1 @2))
   (if ((TREE_CODE (@3) != SSA_NAME || has_single_use (@3))
&& (TREE_CODE (@4) != SSA_NAME || has_single_use (@4)))
(bit_xor (bit_and (bit_xor @0 @1) @2) @0)))
+#endif
 
 
 /* Associate (p +p off1) +p off2 as (p +p (off1 + off2)).  */
diff --git gcc/testsuite/gcc.dg/pr68513.c gcc/testsuite/gcc.dg/pr68513.c
index e69de29..86f878d 100644
--- gcc/testsuite/gcc.dg/pr68513.c
+++ gcc/testsuite/gcc.dg/pr68513.c
@@ -0,0 +1,125 @@
+/* PR c/68513 */
+/* { dg-do compile } */
+/* { dg-options "-funsafe-math-optimizations -fno-math-errno -O -Wno-div-by-zero" } */
+
+int i;
+unsigned u;
+volatile int *e;
+
+#define E (i ? *e : 0)
+
+/* Can't trigger some of them because operand_equal_p will return false
+   for side-effects.  */
+
+/* (x & ~m) | (y & m) -> ((x ^ y) & m) ^ x */
+int
+fn1 (void)
+{
+  int r = 0;
+  r += (short) (E & ~u | i & u);
+  r += -(short) (E & ~u | i & u);
+  r += (short) -(E & ~u | i & u);
+  return r;
+}
+
+/* sqrt(x) < y is x >= 0 && x != +Inf, when y is large.  */
+double
+fn2 (void)
+{
+  double r;
+  r = __builtin_sqrt (E) < __builtin_inf ();
+  return r;
+}
+
+/* sqrt(x) < c is the same as x >= 0 && x < c*c.  */
+double
+fn3 (void)
+{
+  double r;
+  r = __builtin_sqrt (E) < 1.3;
+  return r;
+}
+
+/* copysign(x,y)*copysign(x,y) -> x*x.  */
+double
+fn4 (double y, double x)
+{
+  return __builtin_copysign (E, y) * __builtin_copysign (E, y);
+}
+
+/* x <= +Inf is the same as x == x, i.e. !isnan(x).  */
+int
+fn5 (void)
+{
+  return E <= __builtin_inf ();
+}
+
+/* Fold (A & ~B) - (A & B) into (A ^ B) - B.  */
+int
+fn6 (void)
+{
+  return (i & ~E) - (i & E);
+}
+
+/* Fold (A & B) - (A & ~B) into B - (A ^ B).  */
+int
+fn7 (void)
+{
+  return (i & E) - (i & ~E);
+}
+
+/* x + (x & 1) -> (x + 1) & ~1 */
+int
+fn8 (void)
+{
+  return E + (E & 1);
+}
+
+/* Simplify comparison of something with itself.  */
+int
+fn9 (void)
+{
+  return E <= E | E >= E;
+}
+
+/* Fold (A & ~B) - (A & B) into (A ^ B) - B.  */
+int
+fn10 (void)
+{
+  return (i & ~E) - (i & E);
+}
+
+/* abs(x)*abs(x) -> x*x.  Should be valid for all types.  */
+int
+fn11 (void)
+{
+  return __builtin_abs (E) * __builtin_abs (E);
+}
+
+/* (x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2) */
+int
+fn12 (void)
+{
+  return (E | 11) & 12;
+}
+
+/* fold_range_test */
+int
+fn13 (const char *s)
+{
+  return s[E] != '\0' && s[E] != '/';
+}
+
+/* fold_comparison */
+int
+fn14 (void)
+{
+  return (!!i ? : (u *= E / 0)) >= (u = E);
+}
+
+/* fold_mult_zconjz */
+_Complex int
+fn15 (_Complex volatile int *z)
+{
+  return *z * ~*z;
+}

Marek


[gomp4, committed] Remove reduction clauses in kernels region earlier

2016-01-20 Thread Tom de Vries

Hi,

I've committed this patch to gomp-4_0-branch, moving the removal of the 
reduction clauses in the kernels region earlier, before localize_reductions.


Thanks,
- Tom
Remove reduction clauses in kernels region earlier

2016-01-20  Tom de Vries  

	* gimplify.c (gimplify_omp_for): Remove reduction clauses in kernels
	region.

	* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90: New test.
	* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction.f90: New test.

	* gfortran.dg/goacc/reduction-2.f95: Remove scans related to reductions
	in kernels region.

---
 gcc/gimplify.c | 20 +
 gcc/testsuite/gfortran.dg/goacc/reduction-2.f95|  2 --
 .../kernels-acc-loop-reduction-2.f90   | 26 ++
 .../kernels-acc-loop-reduction.f90 | 21 +
 4 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index eda2e9c..cdb5b96 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8505,6 +8505,26 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
   gcc_unreachable ();
 }
 
+  /* Strip out reductions, as they are not handled yet.  */
+  if (gimplify_omp_ctxp != NULL
+  && (gimplify_omp_ctxp->region_type == ORT_ACC_KERNELS
+	  || (gimplify_omp_ctxp->outer_context != NULL
+	  && (gimplify_omp_ctxp->outer_context->region_type
+		  == ORT_ACC_KERNELS))))
+{
+  tree *prev_ptr = &OMP_FOR_CLAUSES (for_stmt);
+
+  while (tree probe = *prev_ptr)
+	{
+	  tree *next_ptr = &OMP_CLAUSE_CHAIN (probe);
+
+	  if (OMP_CLAUSE_CODE (probe) == OMP_CLAUSE_REDUCTION)
+	*prev_ptr = *next_ptr;
+	  else
+	prev_ptr = next_ptr;
+	}
+}
+
   if (ort == ORT_ACC)
 localize_reductions (expr_p, false);
 
diff --git a/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95 b/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95
index fe07c78..d121110 100644
--- a/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95
@@ -17,6 +17,4 @@ end subroutine
 
 ! { dg-final { scan-tree-dump-times "target oacc_parallel firstprivate.a." 1 "gimple" } }
 ! { dg-final { scan-tree-dump-times "acc loop reduction..:a. private.p." 1 "gimple" } }
-! { dg-final { scan-tree-dump-times "target oacc_kernels map.force_tofrom:a .len: 4.." 1 "gimple" } }
-! { dg-final { scan-tree-dump-times "acc loop reduction..:a. private.k." 1 "gimple" } }
 
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90
new file mode 100644
index 000..fdf9409
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90
@@ -0,0 +1,26 @@
+program foo
+
+  IMPLICIT NONE
+  INTEGER :: vol = 0
+
+  call bar (vol)
+
+  if (vol .ne. 4) call abort
+end program foo
+
+subroutine bar(vol)
+  IMPLICIT NONE
+
+  INTEGER :: vol
+  INTEGER :: j,k
+
+  !$ACC KERNELS
+  !$ACC LOOP REDUCTION(+:vol)
+  DO k=1,2
+ !$ACC LOOP REDUCTION(+:vol)
+ DO j=1,2
+	vol = vol + 1
+ ENDDO
+  ENDDO
+  !$ACC END KERNELS
+end subroutine bar
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction.f90 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction.f90
new file mode 100644
index 000..912a22b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction.f90
@@ -0,0 +1,21 @@
+program foo
+  IMPLICIT NONE
+  INTEGER :: vol = 0
+
+  call bar (vol)
+
+  if (vol .ne. 2) call abort
+end program foo
+
+subroutine bar(vol)
+  IMPLICIT NONE
+  INTEGER :: vol
+  INTEGER :: j
+
+  !$ACC KERNELS
+  !$ACC LOOP REDUCTION(+:vol)
+  DO j=1,2
+ vol = vol + 1
+  ENDDO
+  !$ACC END KERNELS
+end subroutine bar


[wwwdocs] Add GCC 6 porting_to template

2016-01-20 Thread Marek Polacek
I suspect we'll be adding some real material there soon.

Applied.

Index: porting_to.html
===
RCS file: porting_to.html
diff -N porting_to.html
--- /dev/null   1 Jan 1970 00:00:00 -
+++ porting_to.html 20 Jan 2016 11:53:29 -
@@ -0,0 +1,41 @@
+
+
+
+Porting to GCC 6
+
+
+
+Porting to GCC 6
+
+
+The GCC 6 release series differs from previous GCC releases in
+a number of ways. Some of
+these are a result of bug fixing, and some old behaviors have been
+intentionally changed in order to support new standards, or relaxed
+in standards-conforming ways to facilitate compilation or run-time
+performance.  Some of these changes are not visible to the naked eye
+and will not cause problems when updating from older versions.
+
+
+
+However, some of these changes are visible, and can cause grief to
+users porting to GCC 6. This document is an effort to identify major
+issues and provide clear solutions in a quick and easily searched
+manner. Additions and suggestions for improvement are welcome.
+
+
+
+Preprocessor issues
+
+
+C language issues
+
+
+C++ language issues
+
+
+Links
+
+
+
+

Marek


Re: [PATCH] DWARF: add abstract origin links on lexical blocks DIEs

2016-01-20 Thread Pierre-Marie de Rodat

On 01/18/2016 10:47 AM, Pierre-Marie de Rodat wrote:

Thank you for your inputs! I’m going to try that, then. I hope this test
will not be too fragile…


Here it is! Re-bootstrapped and regtested successfully on x86_64-linux.
I’ve checked that the testcase fails on the mainline.


--
Pierre-Marie de Rodat
From 451d62ff871734727b0f0f570f89b6cfbed922f2 Mon Sep 17 00:00:00 2001
From: Pierre-Marie de Rodat 
Date: Tue, 12 Jan 2016 14:50:33 +0100
Subject: [PATCH] DWARF: add abstract origin links on lexical blocks DIEs

Track which abstract lexical block each concrete one comes from in DWARF
so that debuggers can make concrete blocks inherit from their abstract
counterparts. This enables debuggers to properly handle the following case:

  * function Child2 is nested in a lexical block, itself nested in
function Child1;
  * function Child1 is inlined into some call site;
  * function Child2 is never inlined.

Here, Child2 is described in DWARF only in the abstract instance of
Child1. So when debuggers decode Child1's concrete instances, they need
to fetch the definition for Child2 in the corresponding abstract
instance: the DW_AT_abstract_origin link on the lexical block that
embeds Child2 enables them to do that.

Bootstrapped and regtested on x86_64-linux.

gcc/ChangeLog:

	* dwarf2out.c (add_abstract_origin_attribute): Adjust
	documentation comment.  For BLOCK nodes, add a
	DW_AT_abstract_origin attribute that points to the DIE generated
	for the origin BLOCK.
	(gen_lexical_block_die): Call add_abstract_origin_attribute for
	blocks from inlined functions.

gcc/testsuite/ChangeLog:

	* gcc.dg/debug/dwarf2/nested_fun.c: New testcase.
---
 gcc/dwarf2out.c| 13 --
 gcc/testsuite/gcc.dg/debug/dwarf2/nested_fun.c | 65 ++
 2 files changed, 75 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/nested_fun.c

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index f742900..d1503ec 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -18470,15 +18470,16 @@ add_prototyped_attribute (dw_die_ref die, tree func_type)
 }
 
 /* Add an 'abstract_origin' attribute below a given DIE.  The DIE is found
-   by looking in either the type declaration or object declaration
-   equate table.  */
+   by looking in the type declaration, the object declaration equate table or
+   the block mapping.  */
 
 static inline dw_die_ref
 add_abstract_origin_attribute (dw_die_ref die, tree origin)
 {
   dw_die_ref origin_die = NULL;
 
-  if (TREE_CODE (origin) != FUNCTION_DECL)
+  if (TREE_CODE (origin) != FUNCTION_DECL
+  && TREE_CODE (origin) != BLOCK)
 {
   /* We may have gotten separated from the block for the inlined
 	 function, if we're in an exception handler or some such; make
@@ -18500,6 +18501,8 @@ add_abstract_origin_attribute (dw_die_ref die, tree origin)
 origin_die = lookup_decl_die (origin);
   else if (TYPE_P (origin))
 origin_die = lookup_type_die (origin);
+  else if (TREE_CODE (origin) == BLOCK)
+origin_die = BLOCK_DIE (origin);
 
   /* XXX: Functions that are never lowered don't always have correct block
  trees (in the case of java, they simply have no block tree, in some other
@@ -21307,6 +21310,10 @@ gen_lexical_block_die (tree stmt, dw_die_ref context_die)
 	  BLOCK_DIE (stmt) = stmt_die;
 	  old_die = NULL;
 	}
+
+  tree origin = block_ultimate_origin (stmt);
+  if (origin != NULL_TREE && origin != stmt)
+	add_abstract_origin_attribute (stmt_die, origin);
 }
 
   if (old_die)
diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/nested_fun.c b/gcc/testsuite/gcc.dg/debug/dwarf2/nested_fun.c
new file mode 100644
index 000..c783ac0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/dwarf2/nested_fun.c
@@ -0,0 +1,65 @@
+/* As part of inlining, a BLOCK (described as DW_TAG_lexical_block DIE's) may
+   be present both as an abstract instance and a concrete one in the DWARF
+   output.  This testcase attempts to make sure that the concrete ones refer to
+   the abstract ones thanks to the DW_AT_abstract_origin attribute.
+
+   Such a back-link enables debuggers to make entities present in the abstract
+   instance only available in concrete ones.  */
+
+/* { dg-options "-O2 -g -std=gnu99 -gdwarf -dA" } */
+/* { dg-final { scan-assembler-times "\\(DIE \\(0x.*\\) DW_TAG_lexical_block\\)\[^)\]*DW_AT_abstract_origin" 1 } } */
+
+extern void *create (const char *);
+extern void destroy (void *);
+extern void do_nothing (char);
+
+struct string
+{
+  const char *data;
+  int lb;
+  int ub;
+};
+
+int
+main (void)
+{
+  void *o1 = create ("foo");
+
+  void
+  parent (void)
+  {
+{
+  void *o2 = create ("bar");
+
+  int
+  child (struct string s)
+  {
+	int i = s.lb;
+
+	if (s.lb <= s.ub)
+	  while (1)
+	{
+	  char c = s.data[i - s.lb];
+	  do_nothing (c);
+	  if (c == 'o')
+		return 1;
+	  if (i == s.ub)
+		break;
+	  ++i;
+	}
+	return 0;
+  }
+
+  int r;
+
+ 

Re: [committed] Add oacc_kernels_p argument to pass_parallelize_loops

2016-01-20 Thread Tom de Vries

On 20/01/16 09:54, Thomas Schwinge wrote:

Hi!

On Mon, 18 Jan 2016 14:07:11 +0100, Tom de Vries  wrote:

Add oacc_kernels_p argument to pass_parallelize_loops



--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c



@@ -2315,6 +2367,9 @@ gen_parallel_loop (struct loop *loop,


|   /* Ensure that the exit condition is the first statement in the loop.
|  The common case is that latch of the loop is empty (apart from the
|  increment) and immediately follows the loop exit test.  Attempt to move the
|  entry of the loop directly before the exit check and increase the number of
|  iterations of the loop by one.  */
|   if (try_transform_to_exit_first_loop_alt (loop, reduction_list, nit))
| {
|   if (dump_file
| && (dump_flags & TDF_DETAILS))
|   fprintf (dump_file,
|"alternative exit-first loop transform succeeded"
|" for loop %d\n", loop->num);
| }
|   else
| {

+  if (oacc_kernels_p)
+   n_threads = 1;
+

|   /* Fall back on the method that handles more cases, but duplicates the
|loop body: move the exit condition of LOOP to the beginning of its
|header, and duplicate the part of the last iteration that gets disabled
|to the exit of the loop.  */
|   transform_to_exit_first_loop (loop, reduction_list, nit);
| }

Just for my own education: this pessimization "n_threads = 1" for OpenACC
kernels is because the duplicated loop bodies generated by
transform_to_exit_first_loop are not appropriate for parallel OpenACC
offloading execution?


In the case of standard parloops, only the loop is executed in parallel, 
so the duplicated loop body is outside the parallel region.


In the case of oacc parloops, the duplicated body is included in the 
kernels region, and executed in parallel.


The duplicated body for the last iteration can be executed in parallel 
with the loop body in the loop for all the other iterations. We've done 
the dependency analysis for that.


But the duplicated loop body for the last iteration is now executed in 
parallel with itself as well. We've got code that deals with that by 
guarding the side-effects such that they're only executed for a single 
gang. But that code is atm only effective in oacc_entry_exit_ok, before 
transform_to_exit_first_loop introduces the duplicated loop body.



(Might add a source code comment here?)  Testing
on gomp-4_0-branch, there are no changes in the testsuite if I remove
this hunk.


If you want to see the effect of removing the 'n_threads = 1' hunk, make 
try_transform_to_exit_first_loop_alt always return false.


I expect a loop
  for (i = 0; i < N; ++i)
    a[i] = a[i] + 1;
would give incorrect results in a[N - 1].

Thanks,
- Tom


Re: [C/C++ PATCH] Don't emit invalid VEC_COND_EXPR for vector comparisons (PR c/68062)

2016-01-20 Thread Marek Polacek
On Wed, Jan 13, 2016 at 11:11:52PM +, Joseph Myers wrote:
> The C front-end changes are OK.

Jason, is the C++ part of this patch here

(which is identical to the change in the C FE) ok?

Also, not sure about backporting this, maybe just to 5?

Marek


Re: [PATCH] Require non-x32 target for compile-time MPX tests

2016-01-20 Thread Ilya Enkovich
2016-01-20 8:29 GMT+03:00 H.J. Lu :
> Compile-time MPX tests don't need the MPX run-time library.  They
> should pass for non-x32 target.
>
> OK for trunk and backport to GCC 5 branch?

This patch is OK.

Thanks,
Ilya

>
> H.J.
> ---
> Compile-time MPX tests don't need the MPX run-time library.  They
> should pass for non-x32 target.
>
> PR testsuite/69369
> * g++.dg/pr63995-1.C: Require non-x32 target, instead of,
> the MPX run-time library, for compile-time MPX test.
> * gcc.target/i386/chkp-always_inline.c: Likewise.
> * gcc.target/i386/chkp-bndret.c: Likewise.
> * gcc.target/i386/chkp-builtins-1.c: Likewise.
> * gcc.target/i386/chkp-builtins-2.c: Likewise.
> * gcc.target/i386/chkp-builtins-3.c: Likewise.
> * gcc.target/i386/chkp-builtins-4.c: Likewise.
> * gcc.target/i386/chkp-const-check-1.c: Likewise.
> * gcc.target/i386/chkp-const-check-2.c: Likewise.
> * gcc.target/i386/chkp-hidden-def.c: Likewise.
> * gcc.target/i386/chkp-label-address.c: Likewise.
> * gcc.target/i386/chkp-lifetime-1.c: Likewise.
> * gcc.target/i386/chkp-narrow-bounds.c: Likewise.
> * gcc.target/i386/chkp-pr69044.c: Likewise.
> * gcc.target/i386/chkp-remove-bndint-1.c: Likewise.
> * gcc.target/i386/chkp-remove-bndint-2.c: Likewise.
> * gcc.target/i386/chkp-strchr.c: Likewise.
> * gcc.target/i386/chkp-strlen-1.c: Likewise.
> * gcc.target/i386/chkp-strlen-2.c: Likewise.
> * gcc.target/i386/chkp-strlen-3.c: Likewise.
> * gcc.target/i386/chkp-strlen-4.c: Likewise.
> * gcc.target/i386/chkp-strlen-5.c: Likewise.
> * gcc.target/i386/chkp-stropt-1.c: Likewise.
> * gcc.target/i386/chkp-stropt-10.c: Likewise.
> * gcc.target/i386/chkp-stropt-11.c: Likewise.
> * gcc.target/i386/chkp-stropt-12.c: Likewise.
> * gcc.target/i386/chkp-stropt-13.c: Likewise.
> * gcc.target/i386/chkp-stropt-14.c: Likewise.
> * gcc.target/i386/chkp-stropt-15.c: Likewise.
> * gcc.target/i386/chkp-stropt-16.c: Likewise.
> * gcc.target/i386/chkp-stropt-2.c: Likewise.
> * gcc.target/i386/chkp-stropt-3.c: Likewise.
> * gcc.target/i386/chkp-stropt-4.c: Likewise.
> * gcc.target/i386/chkp-stropt-5.c: Likewise.
> * gcc.target/i386/chkp-stropt-6.c: Likewise.
> * gcc.target/i386/chkp-stropt-7.c: Likewise.
> * gcc.target/i386/chkp-stropt-8.c: Likewise.
> * gcc.target/i386/chkp-stropt-9.c: Likewise.
> * gcc.target/i386/pr63995-2.c: Likewise.
> * gcc.target/i386/pr64805.c: Likewise.
> * gcc.target/i386/pr65044.c: Likewise.
> * gcc.target/i386/pr65167.c: Likewise.
> * gcc.target/i386/pr65183.c: Likewise.
> * gcc.target/i386/pr65184.c: Likewise.
> * gcc.target/i386/thunk-retbnd.c: Likewise.
> ---
>  gcc/testsuite/g++.dg/pr63995-1.C | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-always_inline.c   | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-bndret.c  | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-builtins-1.c  | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-builtins-2.c  | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-builtins-3.c  | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-builtins-4.c  | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-const-check-1.c   | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-const-check-2.c   | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-hidden-def.c  | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-label-address.c   | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-lifetime-1.c  | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-narrow-bounds.c   | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-pr69044.c | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-remove-bndint-1.c | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-remove-bndint-2.c | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-strchr.c  | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-strlen-1.c| 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-strlen-2.c| 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-strlen-3.c| 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-strlen-4.c| 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-strlen-5.c| 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-stropt-1.c| 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-stropt-10.c   | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-stropt-11.c   | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-stropt-12.c   | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-stropt-13.c   | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-stropt-14.c   | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-stropt-15.c   | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-stropt-16.c   | 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-stropt-2.c| 3 +--
>  gcc/testsuite/gcc.target/i386/chkp-stropt-3.c   

Re: Patch RFA: Add option -fcollectible-pointers, use it in ivopts

2016-01-20 Thread Richard Biener
On Wed, Jan 20, 2016 at 6:48 AM, Ian Lance Taylor  wrote:
> As discussed at https://gcc.gnu.org/ml/gcc/2016-01/msg00023.html , the
> Go frontend needs some way to prevent ivopts from temporarily removing
> all pointers into a memory block.  This patch adds a new option
> -fcollectible-pointers which makes that happen.  This is not the best
> way to solve the problem, but it seems safe for GCC 6.
>
> I made the option -fcollectible-pointers intending that any similar
> optimizations (or, rather, non-optimizations) can be captured in the
> same option.  And if we develop a better approach for ivopts, it can
> still be covered under this option name.
>
> Bootstrapped and tested on x86_64-pc-linux-gnu.  OK for mainline?

+  // -fcollectible-pointers means that we have to keep a real pointer
+  // live, but the ivopts code may replace a real pointer with one
+  // pointing before or after the memory block that is then adjusted
+  // into the memory block during the loop.
+  // FIXME: It would likely be better to actually force the pointer
+  // live and still use ivopts; for example, it would be enough to
+  // write the pointer into memory and keep it there until after the
+  // loop.
+  if (flag_collectible_pointers && POINTER_TYPE_P (TREE_TYPE (base)))
+return;

Please use C-style comments.  The above is added to add_autoinc_candidates,
which I find weird - we certainly produce out-of-bounds pointers even on
x86, which isn't an auto-inc target.  So this can't be a complete fix.  I guess
you are correct in disabling some offsetted address IV candidates, but
I think there are other issues as well, maybe with exit test replacement
(replacing ptr <= ptr2 with ptr != ptr2 or so).

While the docs of the option look fine I find

+fcollectible-pointers
+Common Report Var(flag_collectible_pointers) Optimization
+Ensure that pointers are always collectible by a garbage collector

somewhat confusing (we don't collect pointers but pointed-to memory).
Maybe "Ensure that derived pointers always point to the original object"?
In that light -fcollectible-pointers is a bad option name as well.  Maybe
-fall-pointers-are-gc-roots or something like that.

Richard.

> Ian
>
> gcc/ChangeLog:
>
> 2016-01-19  Ian Lance Taylor  
>
> * common.opt (fcollectible-pointers): New option.
> * tree-ssa-loop-ivopts.c (add_autoinc_candidates): If
> -fcollectible-pointers, skip pointers.
> * doc/invoke.texi (Optimize Options): Document
> -fcollectible-pointers
>
> gcc/testsuite/ChangeLog:
>
> 2016-01-19  Ian Lance Taylor  
>
> * gcc.dg/tree-ssa/ivopt_5.c: New test.


Re: [PATCH][GCC][ARM] testcase memset-inline-10.c uses -mfloat-abi=hard but does not check whether target supports it

2016-01-20 Thread Christophe Lyon
On 19 January 2016 at 12:51, Ramana Radhakrishnan
 wrote:
> On Thu, Nov 12, 2015 at 3:16 PM, Andre Vieira
>  wrote:
>> On 12/11/15 15:08, Andre Vieira wrote:
>>>
>>> Hi,
>>>
>>>This patch changes the memset-inline-10.c testcase to make sure that
>>> it is only compiled for ARM targets that support -mfloat-abi=hard using
>>> the fact that all non-thumb1 targets do.
>>>
>>>This is correct because all targets for which -mthumb causes the
>>> compiler to use thumb2 will support the generation of FP instructions.
>>>
>>>Tested by running regressions for this testcase for various ARM
>>> targets.
>>>
>>>Is this OK to commit?
>
> This is OK - Sorry about the delay in reviewing this.
>
> I'd like to restructure gcc.target/arm if I could at some point to be
> more resilient to multilib testing and prevent such long lists of
> directives in tests.
>
Indeed, as this patch makes this test now unsupported if one forces
runtestflags to include -marm or -march=armv5t.
thumb2_ok returns false in these cases, but the test used to pass.

Or maybe it's simpler to agree on not using this kind of flags during
validation, and rely on gcc configure flags to set them instead.


> regards
> Ramana
>
>>>
>>>Thanks,
>>>Andre Vieira
>>>
>>> gcc/testsuite/ChangeLog:
>>> 2015-11-06  Andre Vieira  
>>>
>>>  * gcc.target/arm/memset-inline-10.c: Added
>>>  dg-require-effective-target arm_thumb2_ok.
>>>
>> Now with attachment, sorry about that.
>>
>> Cheers,
>> Andre


Re: [PATCH PR68542]

2016-01-20 Thread Richard Biener
On Mon, Jan 18, 2016 at 3:50 PM, Yuri Rumyantsev  wrote:
> Richard,
>
> Here is the second part of the patch, which actually moves mask stores and
> all statements related to them to a new basic block guarded by a test on
> zero mask. A new test is also added.
>
> Is it OK for trunk?

+  /* Pick up all masked stores in loop if any.  */
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next ())
+{
+  stmt = gsi_stmt (gsi);

you fail to iterate over all BBs of the loop here.  Please follow
other uses in the
vectorizer.

+  while (!worklist.is_empty ())
+{
+  gimple *last, *last_store;
+  edge e, efalse;
+  tree mask;
+  basic_block store_bb, join_bb;
+  gimple_stmt_iterator gsi_to;
+  /* tree arg3; */

remove

+  tree vdef, new_vdef;
+  gphi *phi;
+  bool first_dump;
+  tree vectype;
+  tree zero;
+
+  last = worklist.pop ();
+  mask = gimple_call_arg (last, 2);
+  /* Create new bb.  */

bb should be initialized from gimple_bb (last), not loop->header

+  e = split_block (bb, last);

+   gsi_from = gsi_for_stmt (stmt1);
+   gsi_to = gsi_start_bb (store_bb);
+   gsi_move_before (&gsi_from, &gsi_to);
+   update_stmt (stmt1);

I think the update_stmt is redundant and you should be able to
keep two gsis throughout the loop, from and to, no?

+   /* Put other masked stores with the same mask to STORE_BB.  */
+   if (worklist.is_empty ()
+   || gimple_call_arg (worklist.last (), 2) != mask
+   || !is_valid_sink (worklist.last (), last_store))

As I understand the code, the last check is redundant with the invariant
you track if you verify that the stmt on which you broke out of the inner
loop is actually equal to worklist.last () and you add a flag to track whether
you visited a load (vuse) in the sinking loop that you didn't end up sinking.

+ /* Issue different messages depending on FIRST_DUMP.  */
+ if (first_dump)
+   {
+ dump_printf_loc (MSG_NOTE, vect_location,
+  "Move MASK_STORE to new bb#%d\n",
+  store_bb->index);
+ first_dump = false;
+   }
+ else
+   dump_printf_loc (MSG_NOTE, vect_location,
+"Move MASK_STORE to created bb\n");

just add a separate dump when you create the BB, "Created new bb#%d for ..."
to avoid this.

Note that I can't comment on the x86 backend part so that will need to
be reviewed by somebody
else.

Thanks,
Richard.

> Thanks.
> Yuri.
>
> 2016-01-18  Yuri Rumyantsev  
>
> PR middle-end/68542
> * config/i386/i386.c (ix86_expand_branch): Implement integral vector
> comparison with boolean result.
> * config/i386/sse.md (define_expand "cbranch4): Add define-expand
> for vector comparison with eq/ne only.
> * tree-vect-loop.c (is_valid_sink): New function.
> (optimize_mask_stores): Likewise.
> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
> has_mask_store field of vect_info.
> * tree-vectorizer.c (vectorize_loops): Invoke optimize_mask_stores for
> vectorized loops having masked stores.
> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
> corresponding macros.
> (optimize_mask_stores): Add prototype.
>
> gcc/testsuite/ChangeLog:
> * gcc.dg/vect/vect-mask-store-move-1.c: New test.
>
> 2016-01-18 17:07 GMT+03:00 Richard Biener :
>> On Mon, Jan 18, 2016 at 3:02 PM, Yuri Rumyantsev  wrote:
>>> Thanks Richard.
>>>
>>> I changed the check on type as you proposed.
>>>
>>> What about the second back-end part of patch (it has been sent 08.12.15).
>>
>> Can't see it in my inbox - can you reply to the mail with a ping?
>>
>> Thanks,
>> Richard.
>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2016-01-18 15:44 GMT+03:00 Richard Biener :
 On Mon, Jan 11, 2016 at 11:06 AM, Yuri Rumyantsev  
 wrote:
> Hi Richard,
>
> Did you have any chance to look at the updated patch?

 diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
 index acbb70b..208a752 100644
 --- a/gcc/tree-vrp.c
 +++ b/gcc/tree-vrp.c
 @@ -5771,6 +5771,10 @@ register_edge_assert_for (tree name, edge e,
 gimple_stmt_iterator si,
 &comp_code, &val))
  return;

 +  /* VRP doesn't track ranges for vector types.  */
 +  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
 +return;
 +

 please instead fix extract_code_and_val_from_cond_with_ops with

 Index: gcc/tree-vrp.c
 ===
 --- gcc/tree-vrp.c  (revision 232506)
 +++ gcc/tree-vrp.c  (working copy)
 @@ -5067,8 +5067,9 @@ extract_code_and_val_from_cond_with_ops
if (invert)
  comp_code = 

Re: [PATCH] OpenACC use_device clause ICE fix

2016-01-20 Thread Bernd Schmidt

On 01/05/2016 02:15 PM, Chung-Lin Tang wrote:

* omp-low.c (scan_sharing_clauses): Call add_local_decl() for
use_device/use_device_ptr variables.


It looks vaguely plausible, but if everything is part of the host 
function, why make a copy of the decl at all? I.e. what happens if you 
just remove the install_var_local call?



Bernd



Re: [PATCH] New version of libmpx with new memmove wrapper

2016-01-20 Thread Matthias Klose

On 11.12.2015 15:34, Ilya Enkovich wrote:

I fixed it, bootstrapped, regtested and applied to trunk.  Here is committed 
version.


this left libmpx/libtool-version, which now is unused and outdated. Ok to 
remove?

Matthias



Re: reject decl with incomplete struct/union type in check_global_declaration()

2016-01-20 Thread Prathamesh Kulkarni
On 19 January 2016 at 16:49, Marek Polacek  wrote:
> Sorry for speaking up late, but I think we could do better with formatting
> in this patch:
>
> On Sat, Jan 16, 2016 at 03:45:22PM +0530, Prathamesh Kulkarni wrote:
>> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
>> index 915376d..d36fc67 100644
>> --- a/gcc/c/c-decl.c
>> +++ b/gcc/c/c-decl.c
>> @@ -4791,6 +4791,13 @@ finish_decl (tree decl, location_t init_loc, tree 
>> init,
>>  TREE_TYPE (decl) = error_mark_node;
>>}
>>
>> +  if ((RECORD_OR_UNION_TYPE_P (TREE_TYPE (decl))
>> +   || TREE_CODE (TREE_TYPE (decl)) == ENUMERAL_TYPE)
>> +   && DECL_SIZE (decl) == 0 && TREE_STATIC (decl))
>
> DECL_SIZE yields a tree, so I'd rather see NULL_TREE instead of 0 here (yeah,
> the enclosing code uses 0s :().  The "&& TREE_STATIC..." should be on its own
> line.
>
>> + {
>> +   incomplete_record_decls.safe_push (decl);
>> + }
>> +
>
> Redundant braces.
>
>> diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
>> index a0e0052..3c8a496 100644
>> --- a/gcc/c/c-parser.c
>> +++ b/gcc/c/c-parser.c
>> @@ -59,6 +59,8 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "gimple-expr.h"
>>  #include "context.h"
>>
>> +vec<tree> incomplete_record_decls = vNULL;
>
> This could use a comment.
>
>> +
>> +  for (unsigned i = 0; i < incomplete_record_decls.length (); ++i)
>> +{
>> +  tree decl = incomplete_record_decls[i];
>> +  if (DECL_SIZE (decl) == 0 && TREE_TYPE (decl) != error_mark_node)
>
> I'd s/0/NULL_TREE/.
Thanks for the review; I have made the suggested changes in this
version of the patch.
OK for trunk?

Thanks,
Prathamesh
>
> Marek
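
For readers skimming the diff below, the deferred check can be sketched
outside of GCC roughly as follows.  This is only an illustration under
stated assumptions: decl_sketch, pending_sketch, finish_decl_sketch and
end_of_unit_sketch are hypothetical stand-ins, not GCC's tree API.  The
idea is the same as in the patch: remember decls whose type is still
incomplete when they are finished, and only complain once the whole
translation unit has been seen.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a declaration; in GCC this would be a tree.
struct decl_sketch
{
  std::string name;
  bool type_complete;
};

// Worklist playing the role of incomplete_record_decls.
static std::vector<decl_sketch *> pending_sketch;

// Called when a declaration is finished: do not diagnose yet, because
// the type may still be completed later in the translation unit.
void
finish_decl_sketch (decl_sketch *d)
{
  assert (d != nullptr);
  if (!d->type_complete)
    pending_sketch.push_back (d);
}

// Called once at the end of the translation unit: anything that is
// still incomplete is now reported.
std::vector<std::string>
end_of_unit_sketch ()
{
  std::vector<std::string> errors;
  for (decl_sketch *d : pending_sketch)
    if (!d->type_complete)
      errors.push_back ("storage size of '" + d->name + "' isn't known");
  pending_sketch.clear ();
  return errors;
}
```

A decl whose type is completed between finish_decl_sketch and
end_of_unit_sketch produces no diagnostic, which is the case the
original immediate check could not handle.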
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 5830e22..1ec6042 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -4791,6 +4791,12 @@ finish_decl (tree decl, location_t init_loc, tree init,
   TREE_TYPE (decl) = error_mark_node;
 }
 
+  if ((RECORD_OR_UNION_TYPE_P (TREE_TYPE (decl))
+ || TREE_CODE (TREE_TYPE (decl)) == ENUMERAL_TYPE)
+ && DECL_SIZE (decl) == NULL_TREE
+ && TREE_STATIC (decl))
+   incomplete_record_decls.safe_push (decl);
+
   if (is_global_var (decl) && DECL_SIZE (decl) != 0)
{
  if (TREE_CODE (DECL_SIZE (decl)) == INTEGER_CST)
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 919680a..1d3b9e1 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -59,6 +59,15 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-expr.h"
 #include "context.h"
 
+/* We need to walk over decls with incomplete struct/union/enum types
+   after parsing the whole translation unit.
+   In finish_decl(), if the decl is static, has incomplete
+   struct/union/enum type, it is appended to incomplete_record_decls.
+   In c_parser_translation_unit(), we iterate over incomplete_record_decls
+   and report error if any of the decls are still incomplete.  */ 
+
+vec<tree> incomplete_record_decls = vNULL;
+
 void
 set_c_expr_source_range (c_expr *expr,
 location_t start, location_t finish)
@@ -1421,6 +1430,16 @@ c_parser_translation_unit (c_parser *parser)
}
   while (c_parser_next_token_is_not (parser, CPP_EOF));
 }
+
+  for (unsigned i = 0; i < incomplete_record_decls.length (); ++i)
+{
+  tree decl = incomplete_record_decls[i];
+  if (DECL_SIZE (decl) == NULL_TREE && TREE_TYPE (decl) != error_mark_node)
+   {
+ error ("storage size of %q+D isn%'t known", decl);
+ TREE_TYPE (decl) = error_mark_node;
+   }
+}
 }
 
 /* Parse an external declaration (C90 6.7, C99 6.9).
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index 81a3d58..cf79ba7 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -731,4 +731,6 @@ set_c_expr_source_range (c_expr *expr,
 /* In c-fold.c */
 extern tree decl_constant_value_for_optimization (tree);
 
+extern vec<tree> incomplete_record_decls;
+
 #endif /* ! GCC_C_TREE_H */
diff --git a/gcc/testsuite/gcc.dg/Wcxx-compat-8.c 
b/gcc/testsuite/gcc.dg/Wcxx-compat-8.c
index f7e8c55..4e9ddc1 100644
--- a/gcc/testsuite/gcc.dg/Wcxx-compat-8.c
+++ b/gcc/testsuite/gcc.dg/Wcxx-compat-8.c
@@ -33,6 +33,7 @@ enum e3
 
 __typeof__ (struct s5 { int i; }) v5; /* { dg-warning "invalid in C\[+\]\[+\]" 
} */
 __typeof__ (struct t5) w5; /* { dg-bogus "invalid in C\[+\]\[+\]" } */
+  /* { dg-error "storage size of 'w5' isn't known" "" { target *-*-* } 35 } */
 
 int
 f1 (struct s1 *p)
@@ -64,4 +65,4 @@ f5 ()
   return &((struct t8) { });  /* { dg-warning "invalid in C\[+\]\[+\]" } */
 }
 
-/* { dg-error "invalid use of undefined type" "" { target *-*-* } 64 } */
+/* { dg-error "invalid use of undefined type" "" { target *-*-* } 65 } */
diff --git a/gcc/testsuite/gcc.dg/declspec-1.c 
b/gcc/testsuite/gcc.dg/declspec-1.c
index c19f107..b024601 100644
--- a/gcc/testsuite/gcc.dg/declspec-1.c
+++ b/gcc/testsuite/gcc.dg/declspec-1.c
@@ -9,13 +9,15 @@ typedef int t;
 /* These should all be diagnosed, but only once, not 

Re: [PATCH, PR tree-optimization/69328] Fix vectorization of boolean vector comparision in COND_EXPR

2016-01-20 Thread Richard Biener
On Tue, Jan 19, 2016 at 11:46 AM, Ilya Enkovich  wrote:
> Hi,
>
> Currently the vectorizer incorrectly handles the case where a
> COND_EXPR has a boolean vector comparison.  First, a masked COND_EXPR
> is detected incorrectly.  Second, we don't check that the vector types
> of the compared values are compatible.  This patch fixes both problems.
> Bootstrapped and regtested for x86_64-pc-linux-gnu.  OK for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2016-01-19  Ilya Enkovich  
> Richard Biener  
>
> PR tree-optimization/69328
> * tree-vect-stmts.c (vect_is_simple_cond): Check compared
> vectors have same number of elements.
> (vectorizable_condition): Fix masked version recognition.
>
>
> gcc/testsuite/
>
> 2016-01-19  Ilya Enkovich  
>
> PR tree-optimization/69328
> * gcc.dg/pr69328.c: New test.
>
>
> diff --git a/gcc/testsuite/gcc.dg/pr69328.c b/gcc/testsuite/gcc.dg/pr69328.c
> new file mode 100644
> index 000..a495596
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr69328.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +int a, b;
> +void fn1() {
> +  int c;
> +  char *d;
> +  for (; a; ++a) {
> +int e, f;
> +e = d[a];
> +if (!e && f || !f && e)
> +  ++c;
> +  }
> +  if (c)
> +b = .499;
> +}
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 635c797..9d4d286 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -7441,6 +7441,10 @@ vect_is_simple_cond (tree cond, vec_info *vinfo, tree 
> *comp_vectype)
>&& TREE_CODE (rhs) != FIXED_CST)
>  return false;
>
> +  if (vectype1 && vectype2
> +  && TYPE_VECTOR_SUBPARTS (vectype1) != TYPE_VECTOR_SUBPARTS (vectype2))
> +return false;
> +
>*comp_vectype = vectype1 ? vectype1 : vectype2;
>return true;
>  }
> @@ -7544,13 +7548,9 @@ vectorizable_condition (gimple *stmt, 
> gimple_stmt_iterator *gsi,
>if (!vect_is_simple_use (else_clause, stmt_info->vinfo, _stmt, ))
>  return false;
>
> -  if (VECTOR_BOOLEAN_TYPE_P (comp_vectype))
> -{
> -  vec_cmp_type = comp_vectype;
> -  masked = true;
> -}
> -  else
> -vec_cmp_type = build_same_sized_truth_vector_type (comp_vectype);
> +  masked = !COMPARISON_CLASS_P (cond_expr);
> +  vec_cmp_type = build_same_sized_truth_vector_type (comp_vectype);
> +
>if (vec_cmp_type == NULL_TREE)
>  return false;
>


[patch] libstdc++/69386 Ensure C++ language linkage in cmath and cstdlib

2016-01-20 Thread Jonathan Wakely

On 19/01/16 21:43 +, Jonathan Wakely wrote:

On 08/01/16 19:18 +, Jonathan Wakely wrote:

This resolves the longstanding issue that #include <math.h> uses the C
library header, which on most targets doesn't declare the additional
overloads required by C++11 26.8 [c.math], and similarly for
<stdlib.h>.

With this patch libstdc++ provides its own <math.h> and <stdlib.h>
wrappers, which are equivalent to <cmath> or <cstdlib> followed by
using-directives for all standard names. This means there are no more
inconsistencies in the contents of the <cmath> and <math.h> headers.


The new wrappers might get included as:

extern "C" {
#include <math.h>
}

which then includes <cmath> inside the extern "C" block, so we need
to ensure that the definitions in <cmath> get the right language
linkage.

Tested powerpc64le-linux, committed to trunk.
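
As a rough illustration of why the nested linkage specification
matters (a standalone sketch, not code from the patch; twice_sketch and
use_overloads_sketch are made-up names): overloaded functions cannot
share a name under C language linkage, so declarations that end up
inside an extern "C" block have to opt back into C++ linkage
explicitly.

```cpp
#include <cassert>

extern "C" {
// Two functions with the same name cannot both have C linkage, so the
// overloads below would be rejected here without the inner block.  The
// innermost linkage specification is the one that applies.
extern "C++" {
inline int twice_sketch (int x) { return 2 * x; }
inline double twice_sketch (double x) { return 2.0 * x; }
} // extern "C++"
} // extern "C"

// Overload resolution works as usual for the C++-linkage functions.
int
use_overloads_sketch ()
{
  return twice_sketch (3) + static_cast<int> (twice_sketch (1.5));
}
```

Removing the inner extern "C++" block should make the second
twice_sketch declaration an error, which is the kind of failure the
patch guards against when a user wraps the include in extern "C".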


commit d5175d037f6d1a5951d3a023dca71bed16ec0434
Author: Jonathan Wakely 
Date:   Wed Jan 20 11:41:47 2016 +

Ensure C++ language linkage in cmath and cstdlib

	PR libstdc++/69386
	* include/c_global/ccomplex: Ensure C++ language linkage.
	* include/c_global/cmath: Likewise.
	* include/c_global/cstdlib: Likewise.
	* include/c_global/ctgmath: Likewise.
	* testsuite/17_intro/headers/c++2011/linkage.cc: New.

diff --git a/libstdc++-v3/include/c_global/ccomplex b/libstdc++-v3/include/c_global/ccomplex
index 8879e20..df2e413 100644
--- a/libstdc++-v3/include/c_global/ccomplex
+++ b/libstdc++-v3/include/c_global/ccomplex
@@ -35,6 +35,8 @@
 # include 
 #endif
 
+extern "C++" {
 #include 
+}
 
 #endif
diff --git a/libstdc++-v3/include/c_global/cmath b/libstdc++-v3/include/c_global/cmath
index 45e40ab3..c4ee3f5 100644
--- a/libstdc++-v3/include/c_global/cmath
+++ b/libstdc++-v3/include/c_global/cmath
@@ -74,6 +74,8 @@
 #undef tan
 #undef tanh
 
+extern "C++"
+{
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -1790,4 +1792,6 @@ _GLIBCXX_END_NAMESPACE_VERSION
 #  include 
 #endif
 
+} // extern "C++"
+
 #endif
diff --git a/libstdc++-v3/include/c_global/cstdlib b/libstdc++-v3/include/c_global/cstdlib
index 44b6e5c..1ba5fb7 100644
--- a/libstdc++-v3/include/c_global/cstdlib
+++ b/libstdc++-v3/include/c_global/cstdlib
@@ -115,6 +115,8 @@ namespace std
 #undef wcstombs
 #undef wctomb
 
+extern "C++"
+{
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -272,6 +274,8 @@ namespace std
 
 #endif // _GLIBCXX_USE_C99_STDLIB
 
+} // extern "C++"
+
 #endif // !_GLIBCXX_HOSTED
 
 #endif
diff --git a/libstdc++-v3/include/c_global/ctgmath b/libstdc++-v3/include/c_global/ctgmath
index 2fee958..4314516 100644
--- a/libstdc++-v3/include/c_global/ctgmath
+++ b/libstdc++-v3/include/c_global/ctgmath
@@ -35,7 +35,9 @@
 #  include 
 #else
 #  include 
+extern "C++" {
 #  include 
+}
 #endif
 
 #endif 
diff --git a/libstdc++-v3/testsuite/17_intro/headers/c++2011/linkage.cc b/libstdc++-v3/testsuite/17_intro/headers/c++2011/linkage.cc
new file mode 100644
index 000..33e7053
--- /dev/null
+++ b/libstdc++-v3/testsuite/17_intro/headers/c++2011/linkage.cc
@@ -0,0 +1,50 @@
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++11" }
+// { dg-do compile }
+
+// libstdc++/69386
+
+extern "C"
+{
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+}


Re: [PATCH 1/5] s390: Use proper read-only data section for literals.

2016-01-20 Thread Andreas Krebbel
On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
> Previously, .rodata was hardcoded.  For C++ vague linkage functions,
> this resulted in needlessly duplicated literals.  With the new split
> stack support, this resulted in link errors, due to .rodata containing
> relocations to the discarded text sections.
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390.md (pool_section_start): Use switch_to_section
>   to select proper read-only data section instead of hardcoding .rodata.
>   (pool_section_end): Use switch_to_section to match the above.
> ---
>  gcc/ChangeLog   |  6 ++
>  gcc/config/s390/s390.md | 11 +--
>  2 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 23ce209..2c572a7 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,9 @@
> +2016-01-02  Marcin Kościelnicki  
> +
> + * config/s390/s390.md (pool_section_start): Use switch_to_section
> + to select proper read-only data section instead of hardcoding .rodata.
> + (pool_section_end): Use switch_to_section to match the above.
> +

This is ok if bootstrap and regression tests are clean. Thanks!

-Andreas-




Re: [PATCH 2/5] s390: Fix missing .size directives.

2016-01-20 Thread Andreas Krebbel
On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
> It seems at some point the .size hook was hijacked to emit some
> machine-specific directives, and the actual .size directive was
> forgotten.  This caused problems for split-stack support, since the
> linker couldn't scan the function body for non-split-stack calls.
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390.c (s390_asm_declare_function_size): Add code
>   to actually emit the .size directive.

...

>  s390_asm_declare_function_size (FILE *asm_out_file,
> - const char *fnname ATTRIBUTE_UNUSED, tree decl)
> + const char *fnname, tree decl)
>  {
> +  if (!flag_inhibit_size_directive)
> +ASM_OUTPUT_MEASURED_SIZE (asm_out_file, fnname);
>if (DECL_FUNCTION_SPECIFIC_TARGET (decl) == NULL)
>  return;
>fprintf (asm_out_file, "\t.machine pop\n");

It would be good to use the original ASM_DECLARE_FUNCTION_SIZE macro
from config/elfos.h here.  This would probably require renaming it in
s390.h first and then using it from s390_asm_declare_function_size.
Not really beautiful, but at least changes to the original macro would
not require adjusting our backend.

-Andreas-



[PATCH] xfail one ppc testcase (PR tree-optimization/66612)

2016-01-20 Thread Jakub Jelinek
Hi!

As per discussion in the PR, I'd like to xfail this test for GCC6 and
change it to 7.0 milestone, because it is too late/too risky to change
this for gcc 6 now.

Bootstrapped/regtested on powerpc64{,le}-linux, ok for trunk?

2016-01-20  Jakub Jelinek  

PR tree-optimization/66612
* gcc.target/powerpc/20050830-1.c: Xfail the scan-assembler test
for bdn instruction.

--- gcc/testsuite/gcc.target/powerpc/20050830-1.c.jj2008-09-05 
12:54:15.0 +0200
+++ gcc/testsuite/gcc.target/powerpc/20050830-1.c   2016-01-20 
09:16:12.885474312 +0100
@@ -1,7 +1,8 @@
 /* Make sure the doloop optimization is done for this loop. */
 /* { dg-do compile { target powerpc*-*-* } } */
 /* { dg-options "-O2" } */
-/* { dg-final { scan-assembler "bdn" } } */
+/* XFAIL for now, see PR66612.  */
+/* { dg-final { scan-assembler "bdn" { xfail *-*-* } } } */
 extern int a[];
 int foo(int w) {
   int n = w;

Jakub


Re: [PATCH] New version of libmpx with new memmove wrapper

2016-01-20 Thread Ilya Enkovich
2016-01-20 16:20 GMT+03:00 Matthias Klose :
> On 11.12.2015 15:34, Ilya Enkovich wrote:
>>
>> I fixed it, bootstrapped, regtested and applied to trunk.  Here is
>> committed version.
>
>
> this left libmpx/libtool-version, which now is unused and outdated. Ok to
> remove?

OK if bootstrap passes.

Thanks,
Ilya

>
> Matthias
>


Re: [PATCH] xfail one ppc testcase (PR tree-optimization/66612)

2016-01-20 Thread David Edelsohn
On Wed, Jan 20, 2016 at 9:28 AM, Jakub Jelinek  wrote:
> Hi!
>
> As per discussion in the PR, I'd like to xfail this test for GCC6 and
> change it to 7.0 milestone, because it is too late/too risky to change
> this for gcc 6 now.
>
> Bootstrapped/regtested on powerpc64{,le}-linux, ok for trunk?
>
> 2016-01-20  Jakub Jelinek  
>
> PR tree-optimization/66612
> * gcc.target/powerpc/20050830-1.c: Xfail the scan-assembler test
> for bdn instruction.

Okay with me.

Thanks, David


Re: [hsa merge 02/10] Modifications to libgomp proper

2016-01-20 Thread Ilya Verbin
On Wed, Jan 13, 2016 at 18:39:27 +0100, Martin Jambor wrote:
>   * task.c (GOMP_PLUGIN_target_task_completion): Free
>   firstprivate_copies.

This change also caused three failures on intelmicemul:

FAIL: libgomp.c/target-32.c execution test
FAIL: libgomp.c/target-33.c execution test
FAIL: libgomp.c/target-34.c execution test

This happens because ttask->firstprivate_copies is uninitialized in the
!GOMP_OFFLOAD_CAP_SHARED_MEM case:

(gdb) p ttask->firstprivate_copies
$1 = (void *) 0x1
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
0x003b076800dc in free () from /lib64/libc.so.6
(gdb) bt
#0  0x003b076800dc in free () from /lib64/libc.so.6
#1  0x77dda871 in GOMP_PLUGIN_target_task_completion (data=0x624ac0) at 
gcc/libgomp/task.c:585
[...]


OK for trunk?

libgomp/
* task.c (gomp_create_target_task): Set firstprivate_copies to NULL.

diff --git a/libgomp/task.c b/libgomp/task.c
index 0f45c44..38d4e9b 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -683,6 +683,7 @@ gomp_create_target_task (struct gomp_device_descr *devicep,
   ttask->state = state;
   ttask->task = task;
   ttask->team = team;
+  ttask->firstprivate_copies = NULL;
   task->fn = NULL;
   task->fn_data = ttask;
   task->final_task = 0;

  -- Ilya
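
The failure mode can be reproduced in miniature outside of libgomp.
The following is only a sketch under assumptions: target_task_sketch
and the two helper functions are hypothetical, and it assumes the
structure comes from a malloc-style allocator that does not zero its
memory.  It shows why setting the pointer to NULL at creation time
makes the later unconditional free safe.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for the target-task structure; only the field
// discussed above is modelled.
struct target_task_sketch
{
  int state;
  void *firstprivate_copies;
};

// Allocate with malloc, which leaves the fields indeterminate, then set
// the pointer explicitly, mirroring the one-line fix.
target_task_sketch *
create_task_sketch (int state)
{
  target_task_sketch *t
    = static_cast<target_task_sketch *> (std::malloc (sizeof *t));
  if (t == nullptr)
    return nullptr;
  t->state = state;
  t->firstprivate_copies = nullptr; // without this, free () sees garbage
  return t;
}

// Completion path: free (NULL) is a no-op, so this is safe whether or
// not any copies were ever allocated.
void
complete_task_sketch (target_task_sketch *t)
{
  assert (t != nullptr);
  std::free (t->firstprivate_copies);
  t->firstprivate_copies = nullptr;
}
```

Without the assignment in create_task_sketch, the completion path would
pass an indeterminate value to free, which matches the 0x1 pointer seen
in the gdb session above.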


RE: [PATCH] MIPS: Prevent the p5600-bonding.c test from being run for the n32 and 64 ABIs

2016-01-20 Thread Andrew Bennett
Ping.

Andrew

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On
> Behalf Of Andrew Bennett
> Sent: 02 September 2015 14:55
> To: Matthew Fortune; gcc-patches@gcc.gnu.org
> Cc: Moore, Catherine (catherine_mo...@mentor.com)
> Subject: RE: [PATCH] MIPS: Prevent the p5600-bonding.c test from being run for
> the n32 and 64 ABIs
> 
> > > diff --git a/gcc/testsuite/gcc.target/mips/p5600-bonding.c
> > > b/gcc/testsuite/gcc.target/mips/p5600-bonding.c
> > > index 0890ffa..20c26ca 100644
> > > --- a/gcc/testsuite/gcc.target/mips/p5600-bonding.c
> > > +++ b/gcc/testsuite/gcc.target/mips/p5600-bonding.c
> > > @@ -1,6 +1,7 @@
> > >  /* { dg-do compile } */
> > >  /* { dg-options "-dp -mtune=p5600  -mno-micromips -mno-mips16" } */
> > >  /* { dg-skip-if "Bonding needs peephole optimization." { *-*-* } { "-O0" "-O1" } { "" } } */
> > > +/* { dg-skip-if "There is no DI mode support for load/store bonding" { *-*-* } { "-mabi=n32" "-mabi=64" } { "" } } */
> > >  typedef int VINT32 __attribute__ ((vector_size((16))));
> >
> > If the best fix we can do for this test is to limit what it tests then we
> > should still not just skip it. There is some precedent for tests that
> > require a specific arch with the isa=loongson special case. I'd rather
> > just lock the test down to p5600 as per the filename.
> 
> I have changed the testcase's dg-options so that it is only built for p5600.
> The updated patch and ChangeLog are below.
> 
> Ok to commit?
> 
> Many thanks,
> 
> 
> 
> Andrew
> 
> 
> testsuite/
>   * gcc.target/mips/p5600-bonding.c (dg-options): Force the test to be
>   always built for p5600.
>   * gcc.target/mips/mips.exp (mips-dg-options): Add support for the
>   isa=p5600 dg-option.
> 
> 
> diff --git a/gcc/testsuite/gcc.target/mips/mips.exp
> b/gcc/testsuite/gcc.target/mips/mips.exp
> index 42e7fff..e8d1895 100644
> --- a/gcc/testsuite/gcc.target/mips/mips.exp
> +++ b/gcc/testsuite/gcc.target/mips/mips.exp
> @@ -142,6 +142,9 @@
>  #   isa=loongson
>  #  select a Loongson processor
>  #
> +#   isa=p5600
> +#  select a P5600 processor
> +#
>  #   addressing=absolute
>  #  force absolute addresses to be used
>  #
> @@ -1009,6 +1012,10 @@ proc mips-dg-options { args } {
> if { ![regexp {^-march=loongson} $arch] } {
> set arch "-march=loongson2f"
> }
> +   } elseif { [string equal $spec "isa=p5600"] } {
> +   if { ![regexp {^-march=p5600} $arch] } {
> +   set arch "-march=p5600"
> +   }
> } else {
> if { ![regexp {^(isa(?:|_rev))(=|<=|>=)([0-9]*)$} \
>$spec dummy prop relation value nocpus] } {
> diff --git a/gcc/testsuite/gcc.target/mips/p5600-bonding.c
> b/gcc/testsuite/gcc.target/mips/p5600-bonding.c
> index 0890ffa..0bc6d91 100644
> --- a/gcc/testsuite/gcc.target/mips/p5600-bonding.c
> +++ b/gcc/testsuite/gcc.target/mips/p5600-bonding.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-dp -mtune=p5600  -mno-micromips -mno-mips16" } */
> +/* { dg-options "-dp isa=p5600 -mtune=p5600 -mno-micromips -mno-mips16" } */
>  /* { dg-skip-if "Bonding needs peephole optimization." { *-*-* } { "-O0" "-O1" } { "" } } */
>  typedef int VINT32 __attribute__ ((vector_size((16))));



Re: [doc, dwarf] Update bit-rotten DWARF option documentation

2016-01-20 Thread Jason Merrill

On 01/19/2016 11:31 PM, Sandra Loosemore wrote:

-@option{-gdwarf-2} does not accept a concatenated debug level, because
-GCC used to support an option @option{-gdwarf} that meant to generate
-debug information in version 1 of the DWARF format (which is very
-different from version 2), and it would have been too confusing.  That
-debug format is long obsolete, but the option cannot be changed now.


I think we should retain some mention of DWARF 1 here, perhaps as an 
explanation of why {version} starts at 2.



 Ron Guilmette implemented the @command{protoize} and @command{unprotoize}
-tools, the support for Dwarf symbolic debugging information, and much of
+tools, the support for DWARF symbolic debugging information, and much of
 the support for System V Release 4.  He has also worked heavily on the


Let's clarify that rfg's implementation was of DWARF version 1.

Jason



Re: [PATCH 2/5] s390: Fix missing .size directives.

2016-01-20 Thread Dominik Vogt
On Wed, Jan 20, 2016 at 02:16:23PM +0100, Andreas Krebbel wrote:
> On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
> >  s390_asm_declare_function_size (FILE *asm_out_file,
> > -   const char *fnname ATTRIBUTE_UNUSED, tree decl)
> > +   const char *fnname, tree decl)
> >  {
> > +  if (!flag_inhibit_size_directive)
> > +ASM_OUTPUT_MEASURED_SIZE (asm_out_file, fnname);
> >if (DECL_FUNCTION_SPECIFIC_TARGET (decl) == NULL)
> >  return;
> >fprintf (asm_out_file, "\t.machine pop\n");
> 
> It would be good to use the original ASM_DECLARE_FUNCTION_SIZE macro
> from config/elfos.h here.  This would probably require renaming it in
> s390.h first and then using it from s390_asm_declare_function_size.
> Not really beautiful, but at least changes to the original macro would
> not require adjusting our backend.

Maybe it's better not to invent yet another solution to deal with
this and just do it like proposed in the patch.  So if the default
implementation is ever changed, the same search pattern will find
all identical copies of the code.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: Patch RFA: Add option -fcollectible-pointers, use it in ivopts

2016-01-20 Thread Richard Biener
On Wed, Jan 20, 2016 at 3:02 PM, Ian Lance Taylor  wrote:
> On Wed, Jan 20, 2016 at 3:13 AM, Richard Biener
>  wrote:
>> On Wed, Jan 20, 2016 at 6:48 AM, Ian Lance Taylor  wrote:
>>> As discussed at https://gcc.gnu.org/ml/gcc/2016-01/msg00023.html , the
>>> Go frontend needs some way to prevent ivopts from temporarily removing
>>> all pointers into a memory block.  This patch adds a new option
>>> -fcollectible-pointers which makes that happen.  This is not the best
>>> way to solve the problem, but it seems safe for GCC 6.
>>>
>>> I made the option -fcollectible-pointers intending that any similar
>>> optimizations (or, rather, non-optimizations) can be captured in the
>>> same option.  And if we develop a better approach for ivopts, it can
>>> still be covered under this option name.
>>>
>>> Bootstrapped and tested on x86_64-pc-linux-gnu.  OK for mainline?
>>
>> +  // -fcollectible-pointers means that we have to keep a real pointer
>> +  // live, but the ivopts code may replace a real pointer with one
>> +  // pointing before or after the memory block that is then adjusted
>> +  // into the memory block during the loop.
>> +  // FIXME: It would likely be better to actually force the pointer
>> +  // live and still use ivopts; for example, it would be enough to
>> +  // write the pointer into memory and keep it there until after the
>> +  // loop.
>> +  if (flag_collectible_pointers && POINTER_TYPE_P (TREE_TYPE (base)))
>> +return;
>>
>> please use C-style comments.
>
> Whoops, sorry, too much Go coding.
>
>> The above is to add_autoinc_candidates
>> which I find weird - we certainly produce out-of-bound pointers even on
>> x86 which isn't auto-inc.
>
> Despite the name, this is used on all systems.  That is the function
> where we consider using BASE + STEP * i in  a loop.
>
>> So this can't be a complete fix.  I guess you
>> are correct in disabling some offsetted address IV candidates, but
>> I think there are some other issues regarding exit test replacement,
>> maybe (replace ptr <= ptr2 with ptr != ptr2 or so).
>
> I'll look into that.
>
>> While the docs of the option look fine I find
>>
>> +fcollectible-pointers
>> +Common Report Var(flag_collectible_pointers) Optimization
>> +Ensure that pointers are always collectible by a garbage collector
>>
>> somewhat confusing (we don't collect pointers but pointed-to memory).
>> Maybe "Ensure that derived pointers always point to the original object"?
>> In that light -fcollectible-pointers is a bad option name as well.  Maybe
>> -fall-pointers-are-gc-roots or something like that.
>
> I'm OK with that, or how about -fkeep-gc-roots-live?

Sounds good.

Richard.

> Ian
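
To make the concern in this thread a bit more concrete, here is a small
source-level sketch of the two loop shapes involved.  The function
names are made up for illustration, and this is not what ivopts emits.
The first loop only ever holds the base pointer; the second steps a
derived pointer, which at source level stays within the block (or one
past its end).  The issue discussed above is that the optimizer may
internally pick a candidate based before or after the block and adjust
it inside the loop, something that cannot be written in portable
source, so a collector scanning for pointers into the block might
temporarily find none.

```cpp
#include <cassert>
#include <cstddef>

// Index form: the only pointer kept across iterations is `base`, so a
// pointer to the start of the block is always live.
long
sum_indexed_sketch (const int *base, std::size_t n)
{
  assert (base != nullptr || n == 0);
  long total = 0;
  for (std::size_t i = 0; i < n; ++i)
    total += base[i];
  return total;
}

// Pointer-stepping form: `p` is a derived pointer.  As written it never
// leaves [base, base + n]; an optimizer is free to rewrite the
// induction variable differently.
long
sum_stepped_sketch (const int *base, std::size_t n)
{
  assert (base != nullptr || n == 0);
  long total = 0;
  for (const int *p = base, *end = base + n; p != end; ++p)
    total += *p;
  return total;
}
```

Both functions compute the same sum; the difference that matters for a
garbage collector is which pointer values exist while the loop runs.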

