date:20150930

Re: [PATCH] x86 interrupt attribute

2015-09-30 Thread Yulia Koval

Tests for Linux/i686 passed successfully.

Julia

On Wed, Sep 30, 2015 at 3:48 PM, H.J. Lu wrote:
> On Wed, Sep 30, 2015 at 5:36 AM, Yulia Koval wrote:
>> Hi,
>>
>> Thanks. I added all fixes to the patch, bootstrapped/regtested it on
>> Linux/x86_64. Linux/i686 in progress. Ok for trunk if testing passes
>> successfully?
>>
>
> We will work on the "no_caller_saved_registers" attribute, or
> something like that, later as an optimization.
>
> Thanks.
>
> --
> H.J.

Re: New OpenACC pass and Target Hook

2015-09-30 Thread Bernd Schmidt


For avoidance of doubt, is this approval, or 'LGTM, but needs Jakub's
approval'?


Go ahead and commit.


Bernd

Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-09-30 Thread Marc Glisse


On Fri, 18 Sep 2015, Marc Glisse wrote:

+(bit_and:c (op @0 @1) (op @0 @2))


:c seems useless here. On the other hand, it might make sense to use op:s
since this is mostly useful if it removes the 2 original comparisons.


As I was saying, :c is useless.
(x:c y z)
is replaced by two copies of the transformation, one with
(x y z)
and the other with
(x z y)
In your transformation, both versions would be equivalent, so the second
one is redundant.

Also, if you have:
a=x

Re: [Patch, testsuite] Skip addr_equal-1 if target keeps null pointer checks

2015-09-30 Thread Jeff Law


On 09/29/2015 12:41 AM, Senthil Kumar Selvaraj wrote:

On Mon, Sep 28, 2015 at 01:38:18PM -0600, Jeff Law wrote:

On 09/28/2015 02:15 AM, Senthil Kumar Selvaraj wrote:

Hi,

   The below patch skips gcc.dg/addr_equal-1.c if the target keeps null
   pointer checks.

   The test fails for such targets (avr, in my case) because the address
   comparison in the below code does not resolve to a constant, causing
   builtin_constant_p to return false and fail the test.

   /* Variables and functions do not share same memory locations otherwise.  */
   if (!__builtin_constant_p ((void *)undef_fn0 == (void *)_var0))
     abort ();

   For targets that delete null pointer checks, the equality comparison
   expression is optimized away to 0, as the code in match.pd knows they
   can only be equal if they are both NULL, which cannot be true since
   flag-delete-null-pointer-checks is on.

   For targets that keep null pointer checks, 0 is a valid address and the
   comparison expression is left as is, and that causes a later pass to
   fold the builtin_constant_p to a false value, resulting in the test
   failure.

This sounds like a failing in the compiler itself, not a testsuite issue.

Even on a target where objects can be at address 0, you can't have a
variable and a function at the same address.


Hmm, symtab_node::equal_address_to, which is where the address equality
check happens, has a comment that contradicts
your statement, and the function variable overlap check is done after the
NULL possibility check. The current code looks like this

  /* If both symbols may resolve to NULL, we can not really prove them
     different.  */
  if (!nonzero_address () && !s2->nonzero_address ())
    return 2;

  /* Except for NULL, functions and variables never overlap.  */
  if (TREE_CODE (decl) != TREE_CODE (s2->decl))
    return 0;

Does anyone know why?

The only case I could think of would be weak symbols.

jeff

Re: [PATCH] Update SSA_NAME manager to use two lists

2015-09-30 Thread Jeff Law


On 09/30/2015 12:46 PM, Jakub Jelinek wrote:

On Wed, Sep 30, 2015 at 11:44:38AM -0600, Jeff Law wrote:

+/* Move all SSA_NAMEs from FREE_SSA_NAMES_QUEUE to FREE_SSA_NAMES.
+
+   We do not, but should have a mode to verify the state of the SSA_NAMEs
+   lists.  In particular at this point every name must be in the IL,
+   on the free list or in the queue.  Anything else is an error.  */
+
+void
+flush_ssaname_freelist (void)
+{
+  while (!vec_safe_is_empty (FREE_SSANAMES_QUEUE (cfun)))
+    {
+      tree t = FREE_SSANAMES_QUEUE (cfun)->pop ();
+      vec_safe_push (FREE_SSANAMES (cfun), t);
+    }


Isn't it faster to just do:
   vec_safe_splice (FREE_SSANAMES (cfun), FREE_SSANAMES_QUEUE (cfun));
   vec_safe_truncate (FREE_SSANAMES_QUEUE (cfun), 0);
or so?  I mean, rather than reallocating the vector perhaps many times,
grow it just once to the exact size and memcpy the data there.

Probably.

I'm pondering a bit of refactoring in there, I'll look at fixing this 
while I'm in there.


jeff

Re: [Patch, testsuite] Skip addr_equal-1 if target keeps null pointer checks

2015-09-30 Thread Jan Hubicka

> On 09/29/2015 12:41 AM, Senthil Kumar Selvaraj wrote:
> >On Mon, Sep 28, 2015 at 01:38:18PM -0600, Jeff Law wrote:
> >>On 09/28/2015 02:15 AM, Senthil Kumar Selvaraj wrote:
> >>>Hi,
> >>>
> >>>   The below patch skips gcc.dg/addr_equal-1.c if the target keeps null
> >>>   pointer checks.
> >>>
> >>>   The test fails for such targets (avr, in my case) because the address
> >>>   comparison in the below code does not resolve to a constant, causing
> >>>   builtin_constant_p to return false and fail the test.
> >>>
> >>>   /* Variables and functions do not share same memory locations
> >>>      otherwise.  */
> >>>   if (!__builtin_constant_p ((void *)undef_fn0 == (void *)_var0))
> >>>     abort ();
> >>>
> >>>   For targets that delete null pointer checks, the equality comparison
> >>>   expression is optimized away to 0, as the code in match.pd knows they
> >>>   can only be equal if they are both NULL, which cannot be true since
> >>>   flag-delete-null-pointer-checks is on.
> >>>
> >>>   For targets that keep null pointer checks, 0 is a valid address and the
> >>>   comparison expression is left as is, and that causes a later pass to
> >>>   fold the builtin_constant_p to a false value, resulting in the test
> >>>   failure.
> >>This sounds like a failing in the compiler itself, not a testsuite issue.
> >>
> >>Even on a target where objects can be at address 0, you can't have a
> >>variable and a function at the same address.
> >
> >Hmm, symtab_node::equal_address_to, which is where the address equality
> >check happens, has a comment that contradicts
> >your statement, and the function variable overlap check is done after the
> >NULL possibility check. The current code looks like this
> >
> >/* If both symbols may resolve to NULL, we can not really prove them
> >   different.  */
> > if (!nonzero_address () && !s2->nonzero_address ())
> >   return 2;
> >
> > /* Except for NULL, functions and variables never overlap.  */
> > if (TREE_CODE (decl) != TREE_CODE (s2->decl))
> >   return 0;
> >
> >Does anyone know why?
> The only case I could think of would be weak symbols.

Yep, the check is there for weak symbols.  nonzero_address returns true for
most common symbols.
I tried to be simply conservative here about correctness, but I assume we
would have a non-transitive equivalence, because something like this would
trigger an abort:

if (fn == NULL && var == NULL)
  assert (fn == var);

I assume one can check something like

 if (TREE_CODE (decl) != TREE_CODE (s2->decl)
     && ((analyzed && DECL_EXTERNAL (decl)) || !DECL_WEAK (decl))
     && ((s2->analyzed && DECL_EXTERNAL (s2->decl)) || !DECL_WEAK (s2->decl)))
   return 0;

before the nonzero_address check, since as far as I can see, if both fn and
var are defined they can't bind to the same address.  (Basically the second
part of the conditional copies nonzero_address with
flag_delete_null_pointer_checks assumed to be true; an extra parameter to
nonzero_address may do.)

Honza
> 
> jeff
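
To make the ordering question concrete, here is a minimal sketch in plain C
of a three-valued address comparison with the function/variable check moved
ahead of the NULL check.  It is not GCC code: toy_symbol, its fields and
toy_equal_address are hypothetical names, and the return values only mirror
the 0 ("different") and 2 ("unknown") convention visible in the quoted
snippet, with 1 assumed to mean "equal".

```c
#include <stdbool.h>

/* Hypothetical result codes, modelled on the quoted snippet.  */
enum toy_addr_cmp { TOY_ADDR_DIFFERENT = 0, TOY_ADDR_EQUAL = 1, TOY_ADDR_UNKNOWN = 2 };

/* Hypothetical, heavily simplified stand-in for a symbol table node.  */
struct toy_symbol
{
  bool is_function;      /* function (true) or variable (false) */
  bool weak_undefined;   /* weak with no definition: may resolve to NULL */
  bool nonzero_address;  /* what a nonzero_address-like query reports */
};

int
toy_equal_address (const struct toy_symbol *a, const struct toy_symbol *b)
{
  if (a == b)
    return TOY_ADDR_EQUAL;

  /* Early exit: a function and a variable that cannot both collapse to
     NULL never share an address, whatever the target thinks of address 0.  */
  if (a->is_function != b->is_function
      && !a->weak_undefined && !b->weak_undefined)
    return TOY_ADDR_DIFFERENT;

  /* If both symbols may resolve to NULL we cannot prove them different.  */
  if (!a->nonzero_address && !b->nonzero_address)
    return TOY_ADDR_UNKNOWN;

  /* Except for NULL, functions and variables never overlap.  */
  if (a->is_function != b->is_function)
    return TOY_ADDR_DIFFERENT;

  return TOY_ADDR_UNKNOWN;
}
```

With this ordering, a defined function and a defined variable compare as
different even when nonzero_address is false for both (the avr-like case),
while two undefined weak symbols still yield "unknown".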

Re: Fold acc_on_device

2015-09-30 Thread Jakub Jelinek

On Wed, Sep 30, 2015 at 03:01:22PM -0400, Nathan Sidwell wrote:
> On 09/30/15 08:46, Richard Biener wrote:
> 
> >>>Please don't add any new GENERIC based builtin folders.  Instead add to
> >>>gimple-fold.c:gimple_fold_builtin
> 
> Is this patch ok?
> 
> nathan

> 2015-09-30  Nathan Sidwell  
> 
>   * builtins.c: Don't include gomp-constants.h.
>   (fold_builtin_1): Don't fold acc_on_device here.
>   * gimple-fold.c: Include gomp-constants.h.
>   (gimple_fold_builtin_acc_on_device): New.
>   (gimple_fold_builtin): Call it.
> 
> Index: gimple-fold.c
> ===
> --- gimple-fold.c (revision 228288)
> +++ gimple-fold.c (working copy)
> @@ -62,6 +62,7 @@ along with GCC; see the file COPYING3.
>  #include "output.h"
>  #include "tree-eh.h"
>  #include "gimple-match.h"
> +#include "gomp-constants.h"
>  
>  /* Return true when DECL can be referenced from current unit.
> FROM_DECL (if non-null) specify constructor of variable DECL was taken 
> from.
> @@ -2708,6 +2709,34 @@ gimple_fold_builtin_strlen (gimple_stmt_
>return true;
>  }
>  
> +/* Fold a call to __builtin_acc_on_device.  */
> +
> +static bool
> +gimple_fold_builtin_acc_on_device (gimple_stmt_iterator *gsi, tree arg0)
> +{
> +  /* Defer folding until we know which compiler we're in.  */
> +  if (symtab->state != EXPANSION)
> +    return false;
> +
> +  unsigned val_host = GOMP_DEVICE_HOST;
> +  unsigned val_dev = GOMP_DEVICE_NONE;
> +
> +#ifdef ACCEL_COMPILER
> +  val_host = GOMP_DEVICE_NOT_HOST;
> +  val_dev = ACCEL_COMPILER_acc_device;
> +#endif
> +
> +  tree host = build2 (EQ_EXPR, boolean_type_node, arg0,
> +   build_int_cst (integer_type_node, val_host));
> +  tree dev = build2 (EQ_EXPR, boolean_type_node, arg0,
> +  build_int_cst (integer_type_node, val_dev));
> +
> +  tree result = build2 (TRUTH_OR_EXPR, boolean_type_node, host, dev);
> +
> +  result = fold_convert (integer_type_node, result);
> +  gimplify_and_update_call_from_tree (gsi, result);
> +  return true;
> +}

Wouldn't it be better to just emit GIMPLE here instead?
So
  tree res = make_ssa_name (boolean_type_node);
  gimple g = gimple_build_assign (res, EQ_EXPR, arg0,
                                  build_int_cst (integer_type_node, val_host));
  gsi_insert_before (gsi, g, GSI_SAME_STMT);
...
?

Jakub
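
For readers who want the semantics without the tree-building calls, the
expression the quoted patch constructs can be sketched in plain C.  This is
an illustration only: the TOY_DEVICE_* values and toy_acc_on_device are
hypothetical stand-ins, not the real GOMP_DEVICE_* constants, and the
is_accel_compiler flag stands in for the ACCEL_COMPILER preprocessor check.

```c
/* Hypothetical device codes; the real values live in gomp-constants.h.  */
enum toy_device
{
  TOY_DEVICE_NONE,
  TOY_DEVICE_HOST,
  TOY_DEVICE_NOT_HOST,
  TOY_DEVICE_ACCEL   /* stands in for ACCEL_COMPILER_acc_device */
};

/* Model of the folded result: ARG matches one of the two device codes
   that are "true" for the compiler currently running.  */
int
toy_acc_on_device (int arg, int is_accel_compiler)
{
  int val_host = TOY_DEVICE_HOST;
  int val_dev = TOY_DEVICE_NONE;

  if (is_accel_compiler)
    {
      val_host = TOY_DEVICE_NOT_HOST;
      val_dev = TOY_DEVICE_ACCEL;
    }

  /* Same shape as the EQ_EXPR / TRUTH_OR_EXPR tree in the patch.  */
  return arg == val_host || arg == val_dev;
}
```

Because the two comparison values depend on which compiler is running, the
answer is only known late, which is why the patch defers folding.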

Re: [PATCH] Clear flow-sensitive info in phiopt (PR tree-optimization/67769)

2015-09-30 Thread Jeff Law


On 09/30/2015 08:17 AM, Marek Polacek wrote:

Another instance of out of date SSA range info.  Before phiopt1 we had

   <bb 2>:
   if (N_2(D) >= 0)
     goto <bb 3>;
   else
     goto <bb 4>;

   <bb 3>:
   iftmp.0_3 = MIN_EXPR ;

   <bb 4>:
   # iftmp.0_5 = PHI <0(2), iftmp.0_3(3)>
   value_4 = (short int) iftmp.0_5;
   return value_4;

and after phiopt1:

   <bb 2>:
   iftmp.0_3 = MIN_EXPR ;
   iftmp.0_6 = MAX_EXPR ;
   value_4 = (short int) iftmp.0_6;
   return value_4;

But the flow-sensitive info in this BB hasn't been cleared up.

This problem doesn't show up in GCC5 but might be latent there.

Bootstrapped/regtested on x86_64-linux, ok for trunk and 5 as well?

2015-09-30  Marek Polacek  

PR tree-optimization/67769
* tree-ssa-phiopt.c (tree_ssa_phiopt_worker): Call
reset_flow_sensitive_info_in_bb when changing the CFG.

* gcc.dg/torture/pr67769.c: New test.

OK.
jeff
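
A small C model may help show why the range info goes stale.  This is a
sketch under assumptions: the bound TOY_LIMIT is hypothetical (the MIN_EXPR
operands are not visible in the dump above), and the function names are
made up for illustration.

```c
#include <stddef.h>

#define TOY_LIMIT 16   /* hypothetical bound, not taken from the PR */

int toy_min (int a, int b) { return a < b ? a : b; }
int toy_max (int a, int b) { return a > b ? a : b; }

/* Shape before phiopt1: the MIN is only evaluated when n >= 0, so a
   flow-sensitive range of [0, TOY_LIMIT] for its result is correct.  */
int
clamp_branchy (int n)
{
  int result = 0;
  if (n >= 0)
    result = toy_min (n, TOY_LIMIT);
  return result;
}

/* Shape after phiopt1: the MIN runs unconditionally, so its result can be
   negative.  MIN_OUT, if non-null, receives that intermediate value.  */
int
clamp_straight (int n, int *min_out)
{
  int t = toy_min (n, TOY_LIMIT);
  if (min_out != NULL)
    *min_out = t;
  return toy_max (t, 0);
}
```

Both functions return the same value for every input, but the intermediate
MIN result in the straight-line form is no longer confined to the
non-negative range recorded while it sat behind the n >= 0 test.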

Re: Fold acc_on_device

2015-09-30 Thread Nathan Sidwell


On 09/30/15 08:46, Richard Biener wrote:


Please don't add any new GENERIC based builtin folders.  Instead add to
gimple-fold.c:gimple_fold_builtin


Is this patch ok?

nathan
2015-09-30  Nathan Sidwell  

	* builtins.c: Don't include gomp-constants.h.
	(fold_builtin_1): Don't fold acc_on_device here.
	* gimple-fold.c: Include gomp-constants.h.
	(gimple_fold_builtin_acc_on_device): New.
	(gimple_fold_builtin): Call it.

Index: gimple-fold.c
===
--- gimple-fold.c	(revision 228288)
+++ gimple-fold.c	(working copy)
@@ -62,6 +62,7 @@ along with GCC; see the file COPYING3.
 #include "output.h"
 #include "tree-eh.h"
 #include "gimple-match.h"
+#include "gomp-constants.h"
 
 /* Return true when DECL can be referenced from current unit.
FROM_DECL (if non-null) specify constructor of variable DECL was taken from.
@@ -2708,6 +2709,34 @@ gimple_fold_builtin_strlen (gimple_stmt_
   return true;
 }
 
+/* Fold a call to __builtin_acc_on_device.  */
+
+static bool
+gimple_fold_builtin_acc_on_device (gimple_stmt_iterator *gsi, tree arg0)
+{
+  /* Defer folding until we know which compiler we're in.  */
+  if (symtab->state != EXPANSION)
+    return false;
+
+  unsigned val_host = GOMP_DEVICE_HOST;
+  unsigned val_dev = GOMP_DEVICE_NONE;
+
+#ifdef ACCEL_COMPILER
+  val_host = GOMP_DEVICE_NOT_HOST;
+  val_dev = ACCEL_COMPILER_acc_device;
+#endif
+
+  tree host = build2 (EQ_EXPR, boolean_type_node, arg0,
+		  build_int_cst (integer_type_node, val_host));
+  tree dev = build2 (EQ_EXPR, boolean_type_node, arg0,
+		 build_int_cst (integer_type_node, val_dev));
+
+  tree result = build2 (TRUTH_OR_EXPR, boolean_type_node, host, dev);
+
+  result = fold_convert (integer_type_node, result);
+  gimplify_and_update_call_from_tree (gsi, result);
+  return true;
+}
 
 /* Fold the non-target builtin at *GSI and return whether any simplification
was made.  */
@@ -2848,6 +2877,9 @@ gimple_fold_builtin (gimple_stmt_iterato
 	   n == 3
 	   ? gimple_call_arg (stmt, 2)
 	   : NULL_TREE, fcode);
+case BUILT_IN_ACC_ON_DEVICE:
+  return gimple_fold_builtin_acc_on_device (gsi,
+		gimple_call_arg (stmt, 0));
 default:;
 }
 
Index: builtins.c
===
--- builtins.c	(revision 228288)
+++ builtins.c	(working copy)
@@ -64,7 +64,6 @@ along with GCC; see the file COPYING3.
 #include "cgraph.h"
 #include "tree-chkp.h"
 #include "rtl-chkp.h"
-#include "gomp-constants.h"
 
 
 static tree do_mpc_arg1 (tree, tree, int (*)(mpc_ptr, mpc_srcptr, mpc_rnd_t));
@@ -10230,27 +10229,6 @@ fold_builtin_1 (location_t loc, tree fnd
 	return build_empty_stmt (loc);
   break;
 
-case BUILT_IN_ACC_ON_DEVICE:
-  /* Don't fold on_device until we know which compiler is active.  */
-  if (symtab->state == EXPANSION)
-	{
-	  unsigned val_host = GOMP_DEVICE_HOST;
-	  unsigned val_dev = GOMP_DEVICE_NONE;
-
-#ifdef ACCEL_COMPILER
-	  val_host = GOMP_DEVICE_NOT_HOST;
-	  val_dev = ACCEL_COMPILER_acc_device;
-#endif
-	  tree host = build2 (EQ_EXPR, boolean_type_node, arg0,
-			  build_int_cst (integer_type_node, val_host));
-	  tree dev = build2 (EQ_EXPR, boolean_type_node, arg0,
-			 build_int_cst (integer_type_node, val_dev));
-
-	  tree result = build2 (TRUTH_OR_EXPR, boolean_type_node, host, dev);
-	  return fold_convert (integer_type_node, result);
-	}
-  break;
-
 default:
   break;
 }

[PATCH] Update SSA_NAME manager to use two lists

2015-09-30 Thread Jeff Law



The SSA_NAME manager currently has a single free list.  As names are 
released, they're put on the free list and recycled immediately.


This has led to several problems through the years -- in particular 
removal of an edge may result in removal of a PHI when the target of the 
edge is unreachable.  This can result in released names being left in 
the IL until *all* unreachable code is eliminated.  Long term we'd like 
to discover all the unreachable code exposed by a deleted edge earlier, 
but that's further out.


Richi originally suggested using a two list implementation to avoid this 
class of problems.  Essentially released names are queued until it's 
safe to start recycling them.  I agreed, but didn't get around to doing 
any of the implementation work.


Bernd recently took care of the implementation side.  This patch is 
mostly his.  The only change of significance I made is the placement of 
the call to flush the pending list.  Bernd had it in the ssa updater, I 
put it after cfg cleanups.  The former does recycle better, but there's 
nothing that inherently ensures there aren't unreachables in the CFG 
during update_ssa (in practice it's not a problem because we typically 
update dominators first, which requires a cleaned cfg).


I've got a follow-up which exploits this improved safety in DOM to 
optimize things better in DOM rather than waiting for jump threading to 
clean things up.


No additional tests in this patch as the only failure seen when I 
twiddled things a little was already covered by existing tests.


Bootstrapped and regression tested on x86-linux-gnu, with and without 
the follow-up patch to exploit the capability in DOM.


Installed on the trunk.

jeff
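
The two-list idea described above can be sketched outside GCC roughly as
follows.  This is a toy model, not the patch: toy_name_pool and its
functions are hypothetical names, names are plain integers, and the fixed
capacity is an arbitrary simplification.

```c
#include <stddef.h>

#define TOY_MAX_NAMES 64   /* arbitrary capacity for the sketch */

struct toy_name_pool
{
  int free_list[TOY_MAX_NAMES];  /* names that are safe to recycle */
  size_t n_free;
  int queue[TOY_MAX_NAMES];      /* released, but not yet safe to recycle */
  size_t n_queued;
  int next_fresh;                /* next never-used name */
};

void
toy_pool_init (struct toy_name_pool *p)
{
  p->n_free = 0;
  p->n_queued = 0;
  p->next_fresh = 1;
}

/* Releasing a name only queues it (silently dropped if the queue is full).  */
void
toy_release (struct toy_name_pool *p, int name)
{
  if (p->n_queued < TOY_MAX_NAMES)
    p->queue[p->n_queued++] = name;
}

/* Called at a point where no stale references can remain.  */
void
toy_flush (struct toy_name_pool *p)
{
  while (p->n_queued > 0 && p->n_free < TOY_MAX_NAMES)
    p->free_list[p->n_free++] = p->queue[--p->n_queued];
}

/* Allocation recycles only from the free list, never from the queue.  */
int
toy_allocate (struct toy_name_pool *p)
{
  if (p->n_free > 0)
    return p->free_list[--p->n_free];
  return p->next_fresh++;
}
```

A released name is therefore never handed out again until a flush has
happened, which corresponds to the point after CFG cleanup in the
description above.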



* gimple-ssa.h (gimple_df): Add free_ssanames_queue field.
* passes.c: Include tree-ssanames.h.
(execute_function_todo): Flush the pending free SSA_NAMEs after
eliminating unreachable basic blocks.
* tree-ssanames.c (FREE_SSANAMES_QUEUE): New.
(init_ssanames): Initialize FREE_SSANAMES_QUEUE.
(fini_ssanames): Finalize FREE_SSANAMES_QUEUE.
(flush_ssaname_freelist): New function.
(release_ssaname_fn): Put released names on the queue.
(pass_release_ssa_names::execute): Call flush_ssaname_freelist.
* tree-ssanames.h (flush_ssaname_freelist): Declare.



diff --git a/gcc/gimple-ssa.h b/gcc/gimple-ssa.h
index c89071e..39551da 100644
--- a/gcc/gimple-ssa.h
+++ b/gcc/gimple-ssa.h
@@ -90,6 +90,9 @@ struct GTY(()) gimple_df {
   /* Free list of SSA_NAMEs.  */
   vec<tree, va_gc> *free_ssanames;
 
+  /* Queue of SSA_NAMEs to be freed at the next opportunity.  */
+  vec<tree, va_gc> *free_ssanames_queue;
+
   /* Hashtable holding definition for symbol.  If this field is not NULL, it
  means that the first reference to this variable in the function is a
  USE or a VUSE.  In those cases, the SSA renamer creates an SSA name
diff --git a/gcc/passes.c b/gcc/passes.c
index d06a293..5b41102 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -84,6 +84,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgrtl.h"
 #include "tree-ssa-live.h"  /* For remove_unused_locals.  */
 #include "tree-cfgcleanup.h"
+#include "tree-ssanames.h"
 
 using namespace gcc;
 
@@ -1913,6 +1914,14 @@ execute_function_todo (function *fn, void *data)
 {
   cleanup_tree_cfg ();
 
+  /* Once unreachable nodes have been removed from the CFG,
+there can't be any lingering references to released
+SSA_NAMES (because there is no more unreachable code).
+
+Thus, now is the time to flush the SSA_NAMEs freelist.  */
+  if (fn->gimple_df)
+   flush_ssaname_freelist ();
+
   /* When cleanup_tree_cfg merges consecutive blocks, it may
 perform some simplistic propagation when removing single
 valued PHI nodes.  This propagation may, in turn, cause the
diff --git a/gcc/tree-ssanames.c b/gcc/tree-ssanames.c
index 4199290..e029062 100644
--- a/gcc/tree-ssanames.c
+++ b/gcc/tree-ssanames.c
@@ -69,6 +69,7 @@ unsigned int ssa_name_nodes_reused;
 unsigned int ssa_name_nodes_created;
 
 #define FREE_SSANAMES(fun) (fun)->gimple_df->free_ssanames
+#define FREE_SSANAMES_QUEUE(fun) (fun)->gimple_df->free_ssanames_queue
 
 
 /* Initialize management of SSA_NAMEs to default SIZE.  If SIZE is
@@ -91,6 +92,7 @@ init_ssanames (struct function *fn, int size)
  least 50 elements reserved in it.  */
   SSANAMES (fn)->quick_push (NULL_TREE);
   FREE_SSANAMES (fn) = NULL;
+  FREE_SSANAMES_QUEUE (fn) = NULL;
 
   fn->gimple_df->ssa_renaming_needed = 0;
   fn->gimple_df->rename_vops = 0;
@@ -103,6 +105,7 @@ fini_ssanames (void)
 {
   vec_free (SSANAMES (cfun));
   vec_free (FREE_SSANAMES (cfun));
+  vec_free (FREE_SSANAMES_QUEUE (cfun));
 }
 
 /* Dump some simple statistics regarding the re-use of SSA_NAME nodes.  */
@@ -114,6 +117,22 @@ ssanames_print_statistics (void)
   fprintf (stderr, "SSA_NAME nodes reused: %u\n",

Re: [PATCH ARM]: PR67745: Fix function alignment after attribute 2/2

2015-09-30 Thread Jeff Law


On 09/29/2015 07:24 AM, Christian Bruel wrote:

This patch uses FUNCTION_BOUNDARY instead of DECL_ALIGN to check the max
align when optimizing for size in assemble_start_function.
This is necessary for ARM that can switch the max code alignment
directives between modes.

No regressions for ARM
Testing on-going for x86

Christian


align2.patch


2015-09-29  Christian Bruel

PR target/67745
* gcc/varasm.c (assemble_start_function): Use current's function align.

Does this override alignment information that might be attached to the
DECL?  Does that, in effect, override any alignment information that
the developer may have put on the decl?  If so, then it seems like a bad
idea, even with -Os.


Am I missing something here?

jeff

Re: [PATCH] Update SSA_NAME manager to use two lists

2015-09-30 Thread Jakub Jelinek

On Wed, Sep 30, 2015 at 11:44:38AM -0600, Jeff Law wrote:
> +/* Move all SSA_NAMEs from FREE_SSA_NAMES_QUEUE to FREE_SSA_NAMES.
> +
> +   We do not, but should have a mode to verify the state of the SSA_NAMEs
> +   lists.  In particular at this point every name must be in the IL,
> +   on the free list or in the queue.  Anything else is an error.  */
> +
> +void
> +flush_ssaname_freelist (void)
> +{
> +  while (!vec_safe_is_empty (FREE_SSANAMES_QUEUE (cfun)))
> +    {
> +      tree t = FREE_SSANAMES_QUEUE (cfun)->pop ();
> +      vec_safe_push (FREE_SSANAMES (cfun), t);
> +    }

Isn't it faster to just do:
  vec_safe_splice (FREE_SSANAMES (cfun), FREE_SSANAMES_QUEUE (cfun));
  vec_safe_truncate (FREE_SSANAMES_QUEUE (cfun), 0);
or so?  I mean, rather than reallocating the vector perhaps many times,
grow it just once to the exact size and memcpy the data there.

Jakub
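
The difference between the two approaches can be illustrated with a toy
growable array in plain C.  This is only a sketch: toy_vec and its
functions are hypothetical and are not GCC's vec API, and the toy vector
grows by exactly the requested amount, which exaggerates the cost of
element-wise pushes compared with a real growth policy.

```c
#include <stdlib.h>
#include <string.h>

struct toy_vec
{
  int *data;
  size_t len;
  size_t cap;
  size_t n_reallocs;   /* bookkeeping so the two strategies can be compared */
};

/* Make room for EXTRA more elements; returns 0 on allocation failure.  */
int
toy_vec_reserve (struct toy_vec *v, size_t extra)
{
  if (v->len + extra <= v->cap)
    return 1;
  size_t new_cap = v->len + extra;
  int *p = realloc (v->data, new_cap * sizeof (int));
  if (p == NULL)
    return 0;
  v->data = p;
  v->cap = new_cap;
  v->n_reallocs++;
  return 1;
}

/* Element-wise append, as in the pop/push loop.  */
int
toy_vec_push (struct toy_vec *v, int x)
{
  if (!toy_vec_reserve (v, 1))
    return 0;
  v->data[v->len++] = x;
  return 1;
}

/* Bulk append: grow once to the exact size, then copy in one go.  */
int
toy_vec_splice (struct toy_vec *dst, const struct toy_vec *src)
{
  if (src->len == 0)
    return 1;
  if (!toy_vec_reserve (dst, src->len))
    return 0;
  memcpy (dst->data + dst->len, src->data, src->len * sizeof (int));
  dst->len += src->len;
  return 1;
}

/* Shrink the vector to at most N elements; storage is kept.  */
void
toy_vec_truncate (struct toy_vec *v, size_t n)
{
  if (n < v->len)
    v->len = n;
}
```

Note that the splice keeps the source order while a pop/push loop reverses
it; for a free list that ordering should not matter.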

Re: [C/C++ PATCH] RFC: Implement -Wduplicated-cond (PR c/64249) (version 2)

2015-09-30 Thread Jeff Law


On 09/30/2015 09:47 AM, Joseph Myers wrote:

The C front-end changes are OK.

The rest are OK as well.

jeff

Re: [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs.

2015-09-30 Thread Mike Stump

On Sep 30, 2015, at 1:04 AM, Richard Biener  wrote:
> So what about a branch_cost hook that takes taken/not-taken probabilities as 
> argument?

So, for my port, I need to know the prediction percentage as well to
calculate the cost.  I know, kinda sucks.  Or put another way, I want to
describe four costs: taken and predicted, not-taken and predicted, taken and
mis-predicted, and not-taken and mis-predicted, and let the caller sort out
whether the branch will be predicted or mis-predicted, as it can do the math
itself and that math is target independent.
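
The target-independent math mentioned above could look roughly like the
following sketch.  Everything here is hypothetical: toy_branch_costs and
toy_expected_branch_cost are made-up names, not an existing or proposed GCC
hook, and the sketch assumes the chance of a correct prediction is
independent of the branch direction.

```c
/* The four per-target costs a port would describe (hypothetical).  */
struct toy_branch_costs
{
  double taken_predicted;
  double not_taken_predicted;
  double taken_mispredicted;
  double not_taken_mispredicted;
};

/* Expected cost of a branch given the probability that it is taken and
   the probability that the predictor gets it right.  */
double
toy_expected_branch_cost (const struct toy_branch_costs *c,
                          double p_taken, double p_predicted)
{
  double taken = p_predicted * c->taken_predicted
                 + (1.0 - p_predicted) * c->taken_mispredicted;
  double not_taken = p_predicted * c->not_taken_predicted
                     + (1.0 - p_predicted) * c->not_taken_mispredicted;
  return p_taken * taken + (1.0 - p_taken) * not_taken;
}
```

The target only supplies the four numbers; the weighting by probabilities
is the same for every port.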

Re: [PATCH] x86 interrupt attribute

2015-09-30 Thread H.J. Lu

On Wed, Sep 30, 2015 at 5:36 AM, Yulia Koval wrote:
> Hi,
>
> Thanks. I added all fixes to the patch, bootstrapped/regtested it on
> Linux/x86_64. Linux/i686 in progress. Ok for trunk if testing passes
> successfully?

+  /* If true, the current function is an interrupt function as
+ specified by the "interrupt" or "exception" attribute.  */
+  BOOL_BITFIELD is_interrupt : 1;
+
+  /* If true, the current function is an interrupt function as
+ specified by the "exception" attribute.  */
+  BOOL_BITFIELD is_exception : 1;

Please update comments.  There is no "exception" attribute.

-- 
H.J.

Re: [PATCH] Fix default_binds_local_p_2 for extern protected data

2015-09-30 Thread Andreas Krebbel

On 09/30/2015 06:21 PM, Szabolcs Nagy wrote:
> On 30/09/15 14:47, Bernd Schmidt wrote:
>> On 09/17/2015 11:15 AM, Szabolcs Nagy wrote:
>>> ping 2.
>>>
>>> this patch is needed for working visibility ("protected")
>>> attribute for extern data on targets using default_binds_local_p_2.
>>> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01871.html
>>
>> I hesitate to review this one since I don't think I understand the
>> issues on the various affected arches well enough. It looks like Jakub
>> had some input on the earlier changes, maybe he could take a look? Or
>> maybe rth knows best. Adding Ccs.
>>
>> It would help to have examples of code generation demonstrating the
>> problem and how you would solve it. Input from the s390 maintainers
>> whether this is correct for their port would also be appreciated.

We are having the same problem on S/390. I think the GCC change is correct for 
S/390 as well.

-Andreas-

>>
> 
> consider the TU
> 
>__attribute__((visibility("protected"))) int n;
> 
>int f () { return n; }
> 
> if n "binds_local" then gcc -O -fpic -S is like
> 
>  .text
>  .align  2
>  .global f
>  .arch armv8-a+fp+simd
>  .type   f, %function
> f:
>  adrp    x0, n
>  ldr w0, [x0, #:lo12:n]
>  ret
>  .size   f, .-f
>  .protected  n
>  .comm   n,4,4
> 
> so 'n' is a direct reference, not accessed through
> the GOT ('n' will be in the .bss of the dso).
> this is the current behavior.
> 
> if i remove the protected visibility attribute
> then the access goes through GOT:
> 
>  .text
>  .align  2
>  .global f
>  .arch armv8-a+fp+simd
>  .type   f, %function
> f:
>  adrp    x0, _GLOBAL_OFFSET_TABLE_
>  ldr x0, [x0, #:gotpage_lo15:n]
>  ldr w0, [x0]
>  ret
>  .size   f, .-f
>  .comm   n,4,4
> 
> protected visibility means the definition cannot
> be overridden by another module, but it should
> still allow extern references.
> 
> if the main module references such an object then
> (as an implementation detail) it may use copy
> relocation against it, which places 'n' in the
> main module and the dynamic linker should make
> sure that references to 'n' point there.
> 
> this is only possible if references to 'n' go
> through the GOT (i.e. it should not be "binds_local").

Re: [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs.

2015-09-30 Thread Richard Biener

On Tue, Sep 29, 2015 at 9:23 PM, Mike Stump  wrote:
> On Sep 29, 2015, at 7:31 AM, James Greenhalgh  
> wrote:
>> On Tue, Sep 29, 2015 at 11:16:37AM +0100, Richard Biener wrote:
>>> On Fri, Sep 25, 2015 at 5:04 PM, James Greenhalgh
>>>  wrote:

>>>> In relation to the patch I put up for review a few weeks ago to teach
>>>> RTL if-convert to handle multiple sets in a basic block [1], I was
>>>> asking about a sensible cost model to use. There was some consensus at
>>>> Cauldron that what should be done in this situation is to introduce a
>>>> target hook that delegates answering the question to the target.
>>>
>>> Err - the consensus was to _not_ add gazillion of special target hooks
>>> but instead enhance what we have with rtx_cost so that passes can
>>> rely on comparing before and after costs of a sequence of insns.
>>
>> Ah, I was not able to attend Cauldron this year, so I was trying to pick out
>> "consensus" from the video. Rewatching it now, I see a better phrase would
>> be "suggestion with some support".
>
> I’m not a big fan of rtx_cost.  To me it feels more like a crude
> sledgehammer.  Now, that is the gcc way, we have a ton of these things, but
> it would be nice to refine the tools so that the big escape hatch isn’t used
> as often and we have finer-grained ways of doing things.  rtx_cost should be
> what a code-generator generates with most new ports when they use the nice
> api to do a port.  The old sledgehammer-wielding ports may well always define
> rtx_cost themselves, but we should shoot for something better.
>
> As a concrete example, I now have a code-generator for enum reg_class,
> N_REG_CLASSES, REG_CLASS_NAMES, REG_CLASS_CONTENTS, REGISTER_NAMES,
> FIXED_REGISTERS, CALL_USED_REGISTERS, ADDITIONAL_REGISTER_NAMES,
> REG_ALLOC_ORDER and more (some binutils code-gen to do with registers), and
> oh my, it is so much nicer to use than the original api.  If you only ever
> have to write these things once, fine, but if you develop and prototype
> CPUs, the existing interface is, well, less than ideal.  I can do things like:
>
> gccrclass
>   rc_gprs = "GENERAL";
>
> r gpr[] = { rc_gprs, Fixed, Used,
> "$zero", "$sp", "$fp", "$lr" };
> r gpr_sav[] = { Notfixed, Notused, alias ("$save_first"),
> "$sav1",   "$sav2",   "$sav3",   "$sav4",
>
> and get all the other goop I need for free.  I’d encourage people to find a 
> way to do up an rtx_cost generator.  If you're a port maintainer, and want to 
> redo your port to use a nicer api to do the registers, let me know.  I’d love 
> to see progress made to rid gcc of the old crappy apis.

I agree that rtx_cost isn't the nicest thing either.  But adding hooks
like my_transform_profitable_p (X *) with 'X' being pass private
data structure isn't very great design either.  In fact it's worse IMHO.

Richard.
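
As a rough illustration of the "compare before and after costs of a
sequence of insns" approach, here is a sketch in plain C.  It is not GCC
code: the per-insn costs are plain integers supplied by the caller, the
function names are hypothetical, and requiring a strictly lower cost is an
assumption made for the example.

```c
#include <stddef.h>

/* Sum hypothetical per-insn costs over a sequence.  */
int
toy_sequence_cost (const int *insn_costs, size_t n)
{
  int total = 0;
  for (size_t i = 0; i < n; i++)
    total += insn_costs[i];
  return total;
}

/* A pass-independent profitability check: accept the transformation only
   if the replacement sequence is strictly cheaper than the original.  */
int
toy_transform_profitable_p (const int *before, size_t n_before,
                            const int *after, size_t n_after)
{
  return toy_sequence_cost (after, n_after)
         < toy_sequence_cost (before, n_before);
}
```

The point of this shape is that the pass needs no private data structure
in the interface; it only hands over two insn sequences to be costed.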

Re: Add checkpoint to libgomp dg-shouldfail tests

2015-09-30 Thread Thomas Schwinge

Hi!

On Sun, 27 Sep 2015 18:55:53 +0200, Jakub Jelinek  wrote:
> > > OK for trunk?
> 
> Ok.

Thanks for the review.  Committed in r228282:

commit e344dc2e94177f522fe8d62fe95ec2d1687e017a
Author: tschwinge 
Date:   Wed Sep 30 08:44:49 2015 +

Add checkpoint to libgomp dg-shouldfail tests

That is, verify that we're actually reaching the expected checkpoint before
terminating.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/abort-1.c: Add checkpoint.
* testsuite/libgomp.oacc-c-c++-common/abort-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/clauses-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/data-already-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/data-already-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/data-already-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/data-already-4.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/data-already-5.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/data-already-6.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/data-already-7.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/data-already-8.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-11.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-16.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-17.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-18.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-20.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-21.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-22.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-23.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-25.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-26.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-27.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-28.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-29.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-30.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-34.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-35.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-36.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-39.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-4.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-40.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-42.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-43.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-44.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-47.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-48.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-52.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-53.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-54.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-57.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-58.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-62.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-63.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-64.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-65.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-67.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-68.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-71.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-77.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-80.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/present-1.c: Likewise.
* testsuite/libgomp.oacc-fortran/abort-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-1.f: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-3.f: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-4.f: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-5.f: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-6.f: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-7.f: Likewise.
* testsuite/libgomp.oacc-fortran/data-already-8.f: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@228282 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog  | 66 ++
 .../testsuite/libgomp.oacc-c-c++-common/abort-1.c  |  5 +-

Re: [PATCH, PR target/67761] Fix i686-- bootstrap comparison failure

2015-09-30 Thread Uros Bizjak

Hello!

> My recently introduced STV pass doesn't skip debug instructions, which
> makes the transformation (mostly the cost computation) depend on debug
> info.  That causes a bootstrap comparison failure.  This patch fixes it.
> Bootstrapped for i686-linux.  Testing for x86_64-unknown-linux-gnu{,m32}
> is in progress.  OK for trunk if it passes?

IMO, it would be also beneficial to bootstrap with slm default
architecture, so new code paths get some coverage via bootstrap.

> gcc/
>
> 2015-09-29  Ilya Enkovich  
>
> * config/i386/i386.c (scalar_chain::analyze_register_chain): Ignore
> debug insns.
> (scalar_chain::convert_reg): Likewise.
>
> gcc/testsuite/
>
> 2015-09-29  Ilya Enkovich  
>
> * gcc.target/i386/pr67761.c: New test.

OK.

Thanks,
Uros.

[gomp4,committed] Remove release_dangling_ssa_names

2015-09-30 Thread Tom de Vries


[ was: Re: [PATCH] Don't create superfluous parm in expand_omp_taskreg ]
On 24/09/15 11:02, Thomas Schwinge wrote:

Hi Tom!

On Thu, 24 Sep 2015 08:36:27 +0200, Tom de Vries  wrote:

>On 24/09/15 08:23, Thomas Schwinge wrote:

> >On Tue, 11 Aug 2015 20:53:39 +0200, Tom de Vries  wrote:

> >>Don't create superfluous parm in expand_omp_taskreg
> >>
> >>2015-08-11  Tom de Vries
> >>
> >>   * omp-low.c (expand_omp_taskreg): If in ssa, set rhs of parcopy stmt to
> >>   parm_decl, rather than generating a dummy default def in cfun.
> >>   * tree-cfg.c (replace_ssa_name): Assume no default defs.  Make sure
> >>   ssa_name from cfun and child_fn do not share a stmt as def stmt.
> >>   (move_stmt_op): Handle PARM_DECL.
> >>   (gather_ssa_name_hash_map_from): New function.
> >>   (move_sese_region_to_fn): Add default defs for function params, and add
> >>   them to vars_map.  Release copied ssa names.
> >>   * tree-cfg.h (gather_ssa_name_hash_map_from): Declare.

> >
> >Do I understand correctly that with this change present on trunk (which I'm
> >currently merging into gomp-4_0-branch), the changes you've earlier done
> >on gomp-4_0-branch to gcc/omp-low.c:release_dangling_ssa_names,
> >gcc/tree-cfg.c:replace_ssa_name, should now be reverted?  That is, how
> >much of the following patches can be reverted now (listed backwards in
> >time)?

>
>indeed, in the above commit we release the dangling ssa names in
>move_sese_region_to_fn. So after committing this patch to the
>gomp-4_0-branch, the call to release_dangling_ssa_names is no longer
>necessary, and the function release_dangling_ssa_names can be removed.





  Well, I'm asking because in my merge tree, I'm running
   into an assertion that you added there -- not sure yet whether I've
   done something wrong, though.


The source of the problem was in expand_omp_target, which needed similar 
changes as expand_omp_taskreg got in the "Don't create superfluous parm 
in expand_omp_taskreg" patch.


Now that the merge ( 
https://gcc.gnu.org/viewcvs/gcc/branches/gomp-4_0-branch/gcc/omp-low.c?limit_changes=0=228091=228090=228091 
) contains that change, I've committed these two patches to gomp-4_0-branch:

- Revert "Fix release_dangling_ssa_names"
  (Reverting an earlier attempt to handle the
  release_dangling_ssa_names TODO, which was committed to the
  gomp-4_0-branch)
- Remove release_dangling_ssa_names

Thanks,
- Tom

Revert "Fix release_dangling_ssa_names"

2015-09-24  Tom de Vries  

	Revert:
	2015-08-05  Tom de Vries  

	* omp-low.c (release_dangling_ssa_names): Release SSA_NAMEs with NULL
	def stmt.
	* tree-cfg.c (replace_ssa_name): Don't move default def nops.  Set def
	stmt of unused SSA_NAME to NULL.
---
 gcc/omp-low.c  | 33 -
 gcc/tree-cfg.c | 12 
 2 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index a72db53..04a60ab 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -10541,10 +10541,11 @@ make_pass_expand_omp (gcc::context *ctxt)
   return new pass_expand_omp (ctxt);
 }
 
-/* After running pass_expand_omp_ssa to expand the oacc kernels directive, we
-   are left in the original function with anonymous SSA_NAMEs, with a NULL
-   defining statement.  This function finds those SSA_NAMEs and releases
-   them.  */
+/* After running pass_expand_omp_ssa to expand the oacc kernels
+   directive, we are left in the original function with anonymous
+   SSA_NAMEs, with a defining statement that has been deleted.  This
+   pass finds those SSA_NAMEs and releases them.
+   TODO: Either fix this elsewhere, or make the fix unnecessary.  */
 
 static void
 release_dangling_ssa_names (void)
@@ -10559,12 +10560,26 @@ release_dangling_ssa_names (void)
   gimple *stmt = SSA_NAME_DEF_STMT (name);
   if (stmt != NULL)
 	continue;
+  bool found = false;
 
-  release_ssa_name (name);
-  gcc_assert (SSA_NAME_IN_FREE_LIST (name));
-  if (dump_file
-	  && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "Released dangling ssa name %u\n", i);
+  ssa_op_iter op_iter;
+  def_operand_p def_p;
+  FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
+	{
+	  tree def = DEF_FROM_PTR (def_p);
+	  if (def == name)
+	{
+	  found = true;
+	  break;
+	}
+	}
+
+  if (!found)
+	{
+	  if (dump_file)
+	fprintf (dump_file, "Released dangling ssa name %u\n", i);
+	  release_ssa_name (name);
+	}
 }
 }
 
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index cd7a4b4..a3c3b20 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -6422,12 +6422,6 @@ replace_ssa_name (tree name, hash_map *vars_map,
 	{
 	  gcc_assert (!SSA_NAME_IS_DEFAULT_DEF (name));
 	  replace_by_duplicate_decl (, vars_map, to_context);
-	  /* If name is a default def, then we don't move the defining stmt
-	 (which is a nop).

Re: Fold acc_on_device

2015-09-30 Thread Richard Biener

On Tue, Sep 29, 2015 at 8:21 PM, Nathan Sidwell  wrote:
> This patch folds acc_on_device as a regular builtin, but postponed until we
> know which compiler we're in.  As suggested by Bernd, we use the existing
> builtin folding machinery.
>
> Trunk is still using  the older PTX runtime scheme (Thomas is working on
> that), so the only change there is in the  host-side libgomp piece.
>
> Ok for trunk?

Please don't add any new GENERIC based builtin folders.  Instead add to
gimple-fold.c:gimple_fold_builtin

Otherwise you're just generating more work for us who move foldings from
builtins.c to gimple-fold.c.

Thanks,
Richard.

> nathan

[RS6000] Make -msingle-pic-base remove the ELFv2 global entry code

2015-09-30 Thread Alan Modra

For other ABIs, -msingle-pic-base makes gcc omit loading of the PIC
register in function prologues.  This patch makes the option affect
ELFv2 too.

I wrote a patch like this during the initial ELFv2 effort, but there
were many more important patches to push and this one somehow got
dropped.  Dusted off and retested at the request of powerpc64 kernel
people who'd like an option to disable ELFv2 global entry code for
the kernel.  OK mainline?

* config/rs6000/rs6000.c (rs6000_emit_prologue): Don't set
r2_setup_needed when TARGET_SINGLE_PIC_BASE.
(rs6000_output_mi_thunk): Likewise.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ae456ff..023f622 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -24118,13 +24118,13 @@ rs6000_emit_prologue (void)
 #define NOT_INUSE(R) do {} while (0)
 #endif
 
-  if (DEFAULT_ABI == ABI_ELFv2)
+  if (DEFAULT_ABI == ABI_ELFv2
+  && !TARGET_SINGLE_PIC_BASE)
 {
   cfun->machine->r2_setup_needed = df_regs_ever_live_p (TOC_REGNUM);
 
   /* With -mminimal-toc we may generate an extra use of r2 below.  */
-  if (!TARGET_SINGLE_PIC_BASE
- && TARGET_TOC && TARGET_MINIMAL_TOC && get_pool_size () != 0)
+  if (TARGET_TOC && TARGET_MINIMAL_TOC && get_pool_size () != 0)
cfun->machine->r2_setup_needed = true;
 }
 
@@ -26800,7 +26800,8 @@ rs6000_output_mi_thunk (FILE *file, tree thunk_fndecl ATTRIBUTE_UNUSED,
   /* Ensure we have a global entry point for the thunk.   ??? We could
  avoid that if the target routine doesn't need a global entry point,
  but we do not know whether this is the case at this point.  */
-  if (DEFAULT_ABI == ABI_ELFv2)
+  if (DEFAULT_ABI == ABI_ELFv2
+  && !TARGET_SINGLE_PIC_BASE)
 cfun->machine->r2_setup_needed = true;
 
   /* Run just enough of rest_of_compilation to get the insns emitted.

-- 
Alan Modra
Australia Development Lab, IBM

Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-09-30 Thread Michael Collison


Richard and Marc,

What is ':s'? I don't see any documentation for it. So you would like me 
to remove :c and add :s?



On 09/18/2015 02:23 AM, Richard Biener wrote:

On Fri, Sep 18, 2015 at 9:38 AM, Marc Glisse  wrote:

Just a couple extra points. We can end up with a mix of < and >, which might
prevent from matching:

   _3 = b_1(D) > a_2(D);
   _5 = a_2(D) < c_4(D);
   _8 = _3 & _5;

Just like with &, we could also transform:
x < y | x < z  --->  x < max(y, z)

(but maybe wait to make sure reviewers are ok with the first transformation
before generalizing)
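
The '&' case discussed in this thread rests on a simple identity for integer operands: x < y & x < z holds exactly when x < min (y, z), and x > y & x > z holds exactly when x > max (y, z). Below is a minimal C sketch that checks this by brute force on small unsigned values. The function names are made up for illustration and are not part of the patch or of match.pd.

```c
#include <assert.h>
#include <stdbool.h>

/* Original form: two comparisons combined with a bitwise AND.  */
bool lt_both (unsigned x, unsigned y, unsigned z)
{
  return (x < y) & (x < z);
}

/* Transformed form: a single comparison against the minimum.  */
bool lt_min (unsigned x, unsigned y, unsigned z)
{
  unsigned m = y < z ? y : z;
  return x < m;
}

/* Same pair for the greater-than case, using the maximum.  */
bool gt_both (unsigned x, unsigned y, unsigned z)
{
  return (x > y) & (x > z);
}

bool gt_max (unsigned x, unsigned y, unsigned z)
{
  unsigned m = y > z ? y : z;
  return x > m;
}

/* Count the inputs in [0, limit) on which the two forms disagree.  */
int count_mismatches (unsigned limit)
{
  int bad = 0;
  for (unsigned x = 0; x < limit; x++)
    for (unsigned y = 0; y < limit; y++)
      for (unsigned z = 0; z < limit; z++)
        {
          if (lt_both (x, y, z) != lt_min (x, y, z))
            bad++;
          if (gt_both (x, y, z) != gt_max (x, y, z))
            bad++;
        }
  return bad;
}

/* Hypothetical self-check: aborts if the identity ever fails.  */
void check_identity (void)
{
  assert (count_mismatches (16) == 0);
}
```

Calling check_identity () from a test driver should return without tripping the assertion. The sketch only covers unsigned integers, which matches the INTEGRAL_TYPE_P restriction in the posted pattern; it says nothing about floating-point operands.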

Please merge the patterns as suggested and do the :c/:s changes as well.

The issue with getting mixed < and > is indeed there - I've wanted to
extend :c to handle tcc_comparison in some way at some point but
didn't get to how best to implement that yet...

So to fix that currently you have to replicate the merged pattern
with swapped comparison operands.

Otherwise I'm fine with the general approach.

Richard.


On Fri, 18 Sep 2015, Marc Glisse wrote:


On Thu, 17 Sep 2015, Michael Collison wrote:


Here is the patch modified with test cases for MIN_EXPR and MAX_EXPR
expressions. I need some assistance; this test case will fail on targets
that don't have support for MIN/MAX such as 68k. Is there any way to remedy
this short of enumerating whether a target supports MIN/MAX in
testsuite/lib/target_support?

2015-07-24  Michael Collison 
Andrew Pinski 

* match.pd ((x < y) && (x < z) -> x < min (y,z),
(x > y) and (x > z) -> x > max (y,z))
* testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.

diff --git a/gcc/match.pd b/gcc/match.pd
index 5e8fd32..8691710 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1793,3 +1793,17 @@ along with GCC; see the file COPYING3.  If not see
 (convert (bit_and (op (convert:utype @0) (convert:utype @1))
   (convert:utype @4)))

+
+/* Transform (@0 < @1 and @0 < @2) to use min */
+(for op (lt le)
+(simplify


You seem to be missing all indentation.


+(bit_and:c (op @0 @1) (op @0 @2))


:c seems useless here. On the other hand, it might make sense to use op:s
since this is mostly useful if it removes the 2 original comparisons.


+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))


How did you choose this restriction? It seems safe enough, but the
transformation could make sense in other cases as well. It can always be
generalized later though.


+(op @0 (min @1 @2)
+
+/* Transform (@0 > @1 and @0 > @2) to use max */
+(for op (gt ge)


Note that you could unify the patterns with something like:
(for op (lt le gt ge)
 ext (min min max max)
(simplify ...


+(simplify
+(bit_and:c (op @0 @1) (op @0 @2))
+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+(op @0 (max @1 @2)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
new file mode 100644
index 000..cc0189a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#define N 1024
+
+int a[N], b[N], c[N];
+
+void add (unsigned int m, unsigned int n)
+{
+  unsigned int i;
+  for (i = 0; i < m && i < n; ++i)


Maybe writing '&' instead of '&&' would make it depend less on the target.
Also, both tests seem to be for GENERIC (i.e. I expect that you are already
seeing the optimized version with -fdump-tree-original or
-fdump-tree-gimple). Maybe something as simple as:
int f(long a, long b, long c) {
  int cmp1 = a < b;
  int cmp2 = a < c;
  return cmp1 & cmp2;
}


+a[i] = b[i] + c[i];
+}
+
+void add2 (unsigned int m, unsigned int n)
+{
+  unsigned int i;
+  for (i = N-1; i > m && i > n; --i)
+a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-tree-dump "MIN_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump "MAX_EXPR" 1 "optimized" } } */




--
Marc Glisse


--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org

Re: [PATCH] DWARF support for AIX v5

2015-09-30 Thread Matthias Klose


On 25.09.2015 20:59, David Edelsohn wrote:

This version adds extra tests for HAVE_XCOFF_DWARF_EXTRAS.  I placed
the default in dwarf2out.c instead of defaults.h.

Because eh_frame is internal to GCC with its own section, I emit the
length, but inhibit the length for AIX debug_frame section.

This allows DWARF debugging to work on older AIX 7.1 systems within
the limitations of the available sections.  It also allows it to build
and test on a wider variety of AIX 7.1 systems.

I also changed the TLS decorations to use a switch statement, as suggested.

Thanks, David

 * dwarf2out.c (XCOFF_DEBUGGING_INFO): Default 0 definition.
 (HAVE_XCOFF_DWARF_EXTRAS): Default to 0 definition.
 (output_fde): Don't output length for debug_frame on AIX.
 (output_call_frame_info): Don't output length for debug_frame on AIX.
 (have_macinfo): Force to False for XCOFF_DEBUGGING_INFO and not
 HAVE_XCOFF_DWARF_EXTRAS.
 (add_AT_loc_list): Return early if XCOFF_DEBUGGING_INFO and not
 HAVE_XCOFF_DWARF_EXTRAS.
 (output_compilation_unit_header): Don't output length on AIX.
 (output_pubnames): Don't output length on AIX.
 (output_aranges): Delete argument. Compute length locally. Don't
 output length on AIX.
 (output_line_info): Don't output length on AIX.
 (dwarf2out_finish): Don't compute aranges_length.
 * dwarf2asm.c (XCOFF_DEBUGGING_INFO): Default 0 definition.
 (dw2_asm_output_nstring): Emit .byte not .ascii on AIX.
 * config/rs6000/rs6000.c (rs6000_output_dwrf_dtprel): Emit correct
 symbol decoration for AIX.
 (rs6000_xcoff_debug_unwind_info): New.
 (rs6000_xcoff_asm_named_section): Emit .dwsect pseudo-op
 for SECTION_DEBUG.
 (rs6000_xcoff_declare_function_name): Emit different
 .function pseudo-op when DWARF2_DEBUG. Don't call
 xcoffout_declare_function for DWARF2_DEBUG.
 * config/rs6000/xcoff.h (TARGET_DEBUG_UNWIND_INFO):
 Redefine.
 * config/rs6000/aix71.h (DWARF2_DEBUGGING_INFO): Define.
 (PREFERRED_DEBUGGING_TYPE): Define.
 (DEBUG_INFO_SECTION): Define.
 (DEBUG_ABBREV_SECTION): Define.
 (DEBUG_ARANGES_SECTION): Define.
 (DEBUG_LINE_SECTION): Define.
 (DEBUG_PUBNAMES_SECTION): Define.
 (DEBUG_PUBTYPES_SECTION): Define.
 (DEBUG_STR_SECTION): Define.
 (DEBUG_RANGES_SECTION): Define.


I see a build failure on powerpc64le-linux-gnu:

/home/doko/gcc/gcc-snapshot-20150929/src/gcc/configure: line 26465: syntax error near unexpected token `$target'
/home/doko/gcc/gcc-snapshot-20150929/src/gcc/configure: line 26465: `case $target in'

Makefile:4165: recipe for target 'configure-stage1-gcc' failed
make[4]: *** [configure-stage1-gcc] Error 2
make[4]: Leaving directory '/home/doko/gcc/gcc-snapshot-20150929/build'

$ bash -n src/gcc/configure
src/gcc/configure: line 26465: syntax error near unexpected token `$target'
src/gcc/configure: line 26465: `case $target in'

The above ChangeLog entry doesn't mention the patch to configure.ac and the
regeneration of configure, but it is included in the commit message.  It looks
as if gcc/configure was manually fixed without fixing gcc/configure.ac.
Committing as obvious (although I see some whitespace changes with an unmodified
autoconf2.64 downloaded from ftp.gnu.org).


Matthias

gcc/

2015-09-30  Matthias Klose  

* configure.ac: Remove extraneous ;;.
* configure: Regenerate.

Index: configure.ac
===
--- configure.ac(revision 228280)
+++ configure.ac(working copy)
@@ -4326,7 +4326,6 @@
[Define if your assembler supports .ref])])
;;
 esac
-;;

 case $target in
   *-*-aix*)

Re: [PATCH] Fix warnings building pdp11 port

2015-09-30 Thread Richard Biener

On Tue, Sep 29, 2015 at 6:55 PM, Jeff Law  wrote:
> The pdp11 port fails to build with the trunk because of a warning.
> Essentially VRP determines that the result of using BRANCH_COST is a
> constant with the range [0..1].  That's always less than 4, 3 and the
> various other magic constants used with BRANCH_COST and VRP issues a warning
> about that comparison.

It does?  Huh.  Is it about undefined overflow which is the only thing
VRP should end up
warning about?  If so I wonder how that happens, at least I can't
reproduce it for
--target=pdp11 --enable-werror build of cc1.

> I expect we're going to be overhauling BRANCH_COST shortly.  In the mean
> time, this just revectors BRANCH_COST for the pdp11 into a function to
> prevent VRP from collapsing the test and issuing the warning.
>
> Yes, this means more code in the pdp11 cross compiler.  I'm not terribly
> concerned about that and I couldn't stand the idea of scattering diagnostic
> push/pop stuff all over the place to make just the pdp11 port happy.
>
>
> Tested by building the pdp11 targets from config-all.mk.
>
> Installed on the trunk.
>
> Jeff
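
For readers following along, here is a rough C sketch of the difference between the two forms of BRANCH_COST that Jeff describes. The TOY_ and toy_ names are hypothetical stand-ins, not the pdp11 definitions, and the sketch does not try to reproduce the warning itself, which depends on the compiler and its options.

```c
#include <assert.h>

/* Hypothetical stand-in for a port whose BRANCH_COST is a small constant.  */
#define TOY_BRANCH_COST_MACRO 1

/* With the macro, this comparison has a compile-time constant result,
   which is the kind of test a value-range analysis can fold away.  */
int cheap_branches_macro (void)
{
  return TOY_BRANCH_COST_MACRO < 4;
}

/* Hypothetical function form.  In a real port this would be defined in a
   separate source file, so callers would not see the constant.  */
int toy_branch_cost (void)
{
  return 1;
}

/* Same test, but the operand now comes from a call.  */
int cheap_branches_call (void)
{
  return toy_branch_cost () < 4;
}

/* Both forms must give the same answer; only what the compiler can see
   at the comparison differs.  */
void check_same_answer (void)
{
  assert (cheap_branches_macro () == cheap_branches_call ());
}
```

The trade-off is the one mentioned in the quoted mail: the call costs a little code in the cross compiler, in exchange for not needing diagnostic push/pop pragmas around every use.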

Re: [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs.

2015-09-30 Thread Richard Biener

On Tue, Sep 29, 2015 at 4:31 PM, James Greenhalgh
 wrote:
> On Tue, Sep 29, 2015 at 11:16:37AM +0100, Richard Biener wrote:
>> On Fri, Sep 25, 2015 at 5:04 PM, James Greenhalgh
>>  wrote:
>> > Hi,
>> >
>> > In relation to the patch I put up for review a few weeks ago to teach
>> > RTL if-convert to handle multiple sets in a basic block [1], I was
>> > asking about a sensible cost model to use. There was some consensus at
>> > Cauldron that what should be done in this situation is to introduce a
>> > target hook that delegates answering the question to the target.
>>
>> Err - the consensus was to _not_ add gazillion of special target hooks
>> but instead enhance what we have with rtx_cost so that passes can
>> rely on comparing before and after costs of a sequence of insns.
>
> Ah, I was not able to attend Cauldron this year, so I was trying to pick out
> "consensus" from the video. Rewatching it now, I see a better phrase would
> be "suggestion with some support".
>
> Watching the video a second time, it seems your proposal is that we improve
> the RTX costs infrastructure to handle sequences of Gimple/RTX. That would
> get us some way to making a smart decision in if-convert, but I'm not
> convinced it allows us to answer the question we are interested in.
>
> We have the rtx for before and after, and we can generate costs for these
> sequences. This allows us to calculate some weighted cost of the
> instructions based on the calculated probabilities that each block is
> executed. However, we are missing information on how expensive the branch
> is, and we have no way to get that through an RTX-costs infrastructure.

Yeah, though during the meeting at the Cauldron I was asking whether we
maybe want a replacement_cost hook that can assume the to-be-replaced
sequence is in the IL thus the hook can inspect insn_bb and thus get at
branch costs ...

Surely the proposed rtx_cost infrastructure enhancements will not cover
all cases, so the thing I wanted to throw in was that there was _not_
consensus that cost should be computed by pass-specific target hooks that
allow the target to inspect pass-private data.  Because that's a
maintenance nightmare if you change a pass and have to second-guess 50
targets' cost hook implementations.

> We could add a hook to give a cost in COSTS_N_INSNS units to a branch based
> on its predictability. This is difficult as COSTS_N_INSNS units can differ
> depending on whether you are talking about floating-point or integer code.

Yes, which is why I suggested a replacement cost ...  Of course the question is
what you feed that hook as in principle it would be nice to avoid building the
replacement RTXen until we know doing that will be necessary (the replacement
is profitable).

Maybe as soon as CFG changes are involved we need to think about adding
a BB frequency to the hook though factoring in that can be done in the passes.
What is the interesting part for the target is probably the cost of the
branch itself as that can vary depending on predictability (which is usually
very hard to assess at compile-time anyway).  So what about a branch_cost
hook that takes taken/not-taken probabilities as argument?

Note that the agreement during the discussion was that all costs need to be
comparable.
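
To make the idea of a probability-aware branch cost concrete, here is a small C sketch. Everything in it is an assumption for illustration: the toy_ functions, the formula and the single abstract cost unit are invented here and do not correspond to any existing or proposed GCC hook.

```c
#include <assert.h>

/* Hypothetical branch-cost model: cheapest when the branch is fully
   predictable (probability 0 or 1), most expensive at 0.5.  */
double toy_branch_cost (double taken_prob, double mispredict_penalty)
{
  double p = taken_prob < 0.5 ? taken_prob : 1.0 - taken_prob;
  return 1.0 + 2.0 * p * mispredict_penalty;
}

/* Expected cost of keeping the branch: the branch itself plus each arm
   weighted by how often it executes.  */
double toy_branchy_cost (double taken_prob, double then_cost,
                         double else_cost, double mispredict_penalty)
{
  return toy_branch_cost (taken_prob, mispredict_penalty)
         + taken_prob * then_cost
         + (1.0 - taken_prob) * else_cost;
}

/* Nonzero if the straight-line replacement is cheaper than the branch.  */
int toy_if_convert_profitable (double converted_cost, double taken_prob,
                               double then_cost, double else_cost,
                               double mispredict_penalty)
{
  return converted_cost < toy_branchy_cost (taken_prob, then_cost,
                                            else_cost, mispredict_penalty);
}

/* Hypothetical self-check with hand-computed numbers.  */
void check_toy_model (void)
{
  /* Unpredictable branch: 11 + 1 + 2 = 14, so a cost-6 sequence wins.  */
  assert (toy_if_convert_profitable (6.0, 0.5, 2.0, 4.0, 10.0));
  /* Always-taken branch: 1 + 2 + 0 = 3, so the branch stays.  */
  assert (!toy_if_convert_profitable (6.0, 1.0, 2.0, 4.0, 10.0));
}
```

The point of the sketch is only that the comparison is meaningful when the branch cost and the instruction costs are expressed in the same unit, which is the "comparable" requirement mentioned above.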

> By this I mean, the compiler considers a SET which costs more than
> COSTS_N_INSNS (1) to be "expensive". Consequently, some targets set the cost
> of both an integer SET and a floating-point SET to both be COSTS_N_INSNS (1).
> In reality, these instructions may have different latency performance
> characteristics. What real world quantity are we trying to invoke when we
> say a branch costs the same as 3 SET instructions of any type? It certainly
> isn't mispredict penalty (likely measured in cycles, not relative to the cost
> of a SET instruction, which may well be completely free on modern x86
> processors), nor is it the cost of executing the branch instruction which
> is often constant to resolve regardless of predicted/mispredicted status.
>
> On the other side of the equation, we want a cost for the converted
> sequence. We can build a cost of the generated rtl sequence, but for
> targets like AArch64 this is going to be wildly off. AArch64 will expand
> (a > b) ? x : y; as a set to the CC register, followed by a conditional
> move based on the CC register. Consequently, where we have multiple sets
> back to back we end up with:
>
>   set CC (a > b)
>   set x1 (CC ? x : y)
>   set CC (a > b)
>   set x2 (CC ? x : z)
>   set CC (a > b)
>   set x3 (CC ? x : k)
>
> Which we know will be simplified later to:
>
>   set CC (a > b)
>   set x1 (CC ? x : y)
>   set x2 (CC ? x : z)
>   set x3 (CC ? x : k)
>
> I imagine other targets have something similar in their expansion of
> movcc (though I haven't looked).
>
> Our comparison for if-conversion then must be:
>
>   weighted_old_cost = (taken_probability * (then_bb_cost)
>

Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-09-30 Thread Richard Biener

On Wed, Sep 30, 2015 at 9:29 AM, Michael Collison
 wrote:
> Richard and Marc,
>
> What is ':s'? I don't see any documentation for it. So you would like me to
> remove :c and add :s?

There is documentation for both in the internals manual.

I don't have enough context to say whether you should remove "them" or
not.  What's the current patch?  If you made the suggested changes you
should be left with only required :s and :c.

Richard.

>
>
> On 09/18/2015 02:23 AM, Richard Biener wrote:
>>
>> On Fri, Sep 18, 2015 at 9:38 AM, Marc Glisse  wrote:
>>>
>>> Just a couple extra points. We can end up with a mix of < and >, which
>>> might
>>> prevent from matching:
>>>
>>>_3 = b_1(D) > a_2(D);
>>>_5 = a_2(D) < c_4(D);
>>>_8 = _3 & _5;
>>>
>>> Just like with &, we could also transform:
>>> x < y | x < z  --->  x < max(y, z)
>>>
>>> (but maybe wait to make sure reviewers are ok with the first
>>> transformation
>>> before generalizing)
>>
>> Please merge the patterns as suggested and do the :c/:s changes as well.
>>
>> The issue with getting mixed < and > is indeed there - I've wanted to
>> extend :c to handle tcc_comparison in some way at some point but
>> didn't get to how best to implement that yet...
>>
>> So to fix that currently you have to replicate the merged pattern
>> with swapped comparison operands.
>>
>> Otherwise I'm fine with the general approach.
>>
>> Richard.
>>
>>> On Fri, 18 Sep 2015, Marc Glisse wrote:
>>>
 On Thu, 17 Sep 2015, Michael Collison wrote:

> Here is the the patch modified with test cases for MIN_EXPR and
> MAX_EXPR
> expressions. I need some assistance; this test case will fail on
> targets
> that don't have support for MIN/MAX such as 68k. Is there any way to
> remedy
> this short of enumerating whether a target support MIN/MAX in
> testsuite/lib/target_support?
>
> 2015-07-24  Michael Collison 
> Andrew Pinski 
>
> * match.pd ((x < y) && (x < z) -> x < min (y,z),
> (x > y) and (x > z) -> x > max (y,z))
> * testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5e8fd32..8691710 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1793,3 +1793,17 @@ along with GCC; see the file COPYING3.  If not
> see
>  (convert (bit_and (op (convert:utype @0) (convert:utype @1))
>(convert:utype @4)))
>
> +
> +/* Transform (@0 < @1 and @0 < @2) to use min */
> +(for op (lt le)
> +(simplify


 You seem to be missing all indentation.

> +(bit_and:c (op @0 @1) (op @0 @2))


 :c seems useless here. On the other hand, it might make sense to use
 op:s
 since this is mostly useful if it removes the 2 original comparisons.

> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))


 How did you chose this restriction? It seems safe enough, but the
 transformation could make sense in other cases as well. It can always be
 generalized later though.

> +(op @0 (min @1 @2)
> +
> +/* Transform (@0 > @1 and @0 > @2) to use max */
> +(for op (gt ge)


 Note that you could unify the patterns with something like:
 (for op (lt le gt ge)
  ext (min min max max)
 (simplify ...

> +(simplify
> +(bit_and:c (op @0 @1) (op @0 @2))
> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
> +(op @0 (max @1 @2)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
> b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
> new file mode 100644
> index 000..cc0189a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +#define N 1024
> +
> +int a[N], b[N], c[N];
> +
> +void add (unsigned int m, unsigned int n)
> +{
> +  unsigned int i;
> +  for (i = 0; i < m && i < n; ++i)


 Maybe writing '&' instead of '&&' would make it depend less on the
 target.
 Also, both tests seem to be for GENERIC (i.e. I expect that you are
 already
 seeing the optimized version with -fdump-tree-original or
 -fdump-tree-gimple). Maybe something as simple as:
 int f(long a, long b, long c) {
   int cmp1 = a < b;
   int cmp2 = a < c;
   return cmp1 & cmp2;
 }

> +a[i] = b[i] + c[i];
> +}
> +
> +void add2 (unsigned int m, unsigned int n)
> +{
> +  unsigned int i;
> +  for (i = N-1; i > m && i > n; --i)
> +a[i] = b[i] + c[i];
> +}
> +
> +/* { dg-final { scan-tree-dump "MIN_EXPR" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump "MAX_EXPR" 1

Re: [libffi] Correct powerpc sysv stack argument accounting (#194)

2015-09-30 Thread Alan Modra

On Thu, Sep 03, 2015 at 09:33:45PM -0400, Anthony Green wrote:
> Please go ahead. I've been on vacation for a while. Returning next week... 

Committed revision 228307.

>  Original message 
> From: Alan Modra  
> Date: 09-03-2015  7:40 PM  (GMT-05:00) 
> To: Richard Henderson , gcc-patches@gcc.gnu.org 
> Cc: Anthony Green , David Edelsohn  
> Subject: Re: [libffi] Correct powerpc sysv stack argument accounting (#194) 
> 
> On Tue, Aug 04, 2015 at 08:23:46AM -0700, Richard Henderson wrote:
> > Looks good, Alan.  Thanks.  After this gets merged, I guess it's
> > worth merging back to gcc.
> 
> It's been a month since I created the pull request and posted
> https://sourceware.org/ml/libffi-discuss/2015/msg00079.html
> 
> Given that things are going rather slow in libffi land at the moment,
> perhaps I should merge this to gcc now?

-- 
Alan Modra
Australia Development Lab, IBM

Re: [PATCH] x86 interrupt attribute

2015-09-30 Thread Yulia Koval

Done.

Julia

On Wed, Sep 30, 2015 at 9:59 PM, H.J. Lu  wrote:
> On Wed, Sep 30, 2015 at 5:36 AM, Yulia Koval  wrote:
>> Hi,
>>
>> Thanks. I added all fixes to the patch, bootstrapped/regtested it on
>> Linux/x86_64. Linux/i686 in progress. Ok for trunk if testing passes
>> successfully?
>
> +  /* If true, the current function is an interrupt function as
> + specified by the "interrupt" or "exception" attribute.  */
> +  BOOL_BITFIELD is_interrupt : 1;
> +
> +  /* If true, the current function is an interrupt function as
> + specified by the "exception" attribute.  */
> +  BOOL_BITFIELD is_exception : 1;
>
> Please update comments.  There is no "exception" attribute.
>
> --
> H.J.


patch_interrupt
Description: Binary data

Do not use TYPE_CANONICAL in useless_type_conversion

2015-09-30 Thread Jan Hubicka

Hi,
this implements the idea we discussed at Cauldron to not use TYPE_CANONICAL for
useless_type_conversion_p.  The basic idea is that TYPE_CANONICAL is language
specific and should not be part of the definition of the Gimple type system,
which should be quite agnostic of language.

useless_type_conversion_p clearly is about operations on the type and those do
not depend on TYPE_CANONICAL or alias info. For LTO and C/Fortran
interoperability rules we are forced to make TYPE_CANONICAL more coarse than it
needs to be, which results in trouble with useless_type_conversion_p use.

After dropping the check I needed to solve two issues. First is that we need a
definition of useless conversions for aggregates. As discussed earlier I made
it depend only on size. The basic idea is that the only operations you can do
on gimple with those are moves and field accesses. Field accesses have
corresponding type info in COMPONENT_REF or MEM_REF, so we do not care about
conversions of those.  This caused three Ada failures on PPC64, because we
cannot move between structures of same size but different mode.
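
As a reading aid, the aggregate rule that the patch below ends up with (same size, or no size on the outer type, plus the same mode) can be modelled on a toy type record. This is only a sketch: struct toy_type and the function name are invented here, and the real code works on GCC trees via TYPE_SIZE, operand_equal_p and TYPE_MODE.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the few type properties the rule looks at.  */
struct toy_type
{
  bool has_size;       /* models TYPE_SIZE being non-null */
  unsigned long size;  /* models the value of TYPE_SIZE */
  int mode;            /* models TYPE_MODE */
};

/* Conversion is treated as useless when the outer type has no size or
   both sizes match, and in addition the modes are identical.  */
bool toy_useless_aggregate_conversion (const struct toy_type *outer,
                                       const struct toy_type *inner)
{
  bool size_ok = !outer->has_size
                 || (inner->has_size && inner->size == outer->size);
  return size_ok && outer->mode == inner->mode;
}

/* Hypothetical self-check covering the three interesting cases.  */
void check_toy_rule (void)
{
  struct toy_type a = { true, 64, 1 };
  struct toy_type same = { true, 64, 1 };
  struct toy_type other_mode = { true, 64, 2 };
  struct toy_type other_size = { true, 128, 1 };

  assert (toy_useless_aggregate_conversion (&a, &same));
  assert (!toy_useless_aggregate_conversion (&a, &other_mode));
  assert (!toy_useless_aggregate_conversion (&a, &other_size));
}
```

The second case is the one behind the Ada failures mentioned above: equal size alone is not enough when the modes differ.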

The other failure introduced was 2 cases in the C++ testsuite because we
currently do not handle OFFSET_TYPE at all.  I added the obvious check for
TREE_TYPE and BASE_TYPE to be compatible.
I think we can allow more conversions between OFFSET_TYPEs and integer
types, but I would like to leave this for incremental changes. (It is probably
not too important anyway).

Bootstrapped/regtested x86_64-linux except Ada and ppc64-linux for all languages
including Ada. OK?

I have reviewed the uses of useless_type_conversion_p on non-register types
and there are some oddities, will send separate patches for those.

Honza

* gimple-expr.c (useless_type_conversion_p): Do not use TYPE_CANONICAL
for defining useless conversions; make structure compatible if size
and mode are.
Index: gimple-expr.c
===
--- gimple-expr.c   (revision 228267)
+++ gimple-expr.c   (working copy)
@@ -87,11 +87,6 @@ useless_type_conversion_p (tree outer_ty
   if (inner_type == outer_type)
 return true;
 
-  /* If we know the canonical types, compare them.  */
-  if (TYPE_CANONICAL (inner_type)
-  && TYPE_CANONICAL (inner_type) == TYPE_CANONICAL (outer_type))
-return true;
-
   /* Changes in machine mode are never useless conversions unless we
  deal with aggregate types in which case we defer to later checks.  */
   if (TYPE_MODE (inner_type) != TYPE_MODE (outer_type)
@@ -270,12 +265,23 @@ useless_type_conversion_p (tree outer_ty
   return true;
 }
 
-  /* For aggregates we rely on TYPE_CANONICAL exclusively and require
- explicit conversions for types involving to be structurally
- compared types.  */
+  /* For aggregates compare only the size and mode.  Accesses to fields do have
+ a type information by themselves and thus we only care if we can i.e.
+ use the types in move operations.  */
   else if (AGGREGATE_TYPE_P (inner_type)
   && TREE_CODE (inner_type) == TREE_CODE (outer_type))
-return false;
+return (!TYPE_SIZE (outer_type)
+   || (TYPE_SIZE (inner_type)
+   && operand_equal_p (TYPE_SIZE (inner_type),
+   TYPE_SIZE (outer_type), 0)))
+  && TYPE_MODE (outer_type) == TYPE_MODE (inner_type);
+  else if (TREE_CODE (inner_type) == OFFSET_TYPE
+  && TREE_CODE (inner_type) == TREE_CODE (outer_type))
+return useless_type_conversion_p (TREE_TYPE (outer_type),
+ TREE_TYPE (inner_type))
+  && useless_type_conversion_p
+   (TYPE_OFFSET_BASETYPE (outer_type),
+TYPE_OFFSET_BASETYPE (inner_type));
 
   return false;
 }

Re: [PATCH] x86 interrupt attribute

2015-09-30 Thread H.J. Lu

On Wed, Sep 30, 2015 at 12:53 PM, Yulia Koval  wrote:
> Done.
>

Please provide new ChangeLog entries since we removed and
added codes.


-- 
H.J.

Re: [patch, committed] Dump function attributes

2015-09-30 Thread Tom de Vries


On 29/09/15 13:29, Richard Biener wrote:

On Tue, Sep 29, 2015 at 1:23 PM, Tom de Vries  wrote:

On 29/09/15 12:36, Richard Biener wrote:


On Tue, Sep 29, 2015 at 7:43 AM, Tom de Vries 
wrote:


[ was: Re: [RFC] Dump function attributes ]

On 28/09/15 17:17, Bernd Schmidt wrote:



On 09/28/2015 04:32 PM, Tom de Vries wrote:



patch below prints the function attributes in the dump file.





foo ()
[ noclone , noinline ]
{
...

Good idea?

If so, do we want one attribute per line?




Only for really long ones I'd think. Patch is ok for now.




Reposting patch with ChangeLog entry added.

Bootstrapped and reg-tested on x86_64.

Committed to trunk.



Hmpf.  I always like to make the dump files as easy to copy into
testcases as possible.



Hmm, interesting. Not something I use, but I can imagine it's useful.


So why did you invent a new syntax for attributes instead of using
the existing __attribute__(("noclone", "noinline")) (in this case)?



My main concerns were:
- being able to see in dump files what the actual attributes of a
   function are (rather than having to figure it out in a debug session).
- being able to write testcases that can test for the presence of those
   attributes in dump files


Did you verify
how attributes with arguments get printed?



For instance, an oacc offload function compiled by the host compiler is
annotated as follows:

before pass_oacc_transform (in the gomp-4_0-branch):
...
[ oacc function 32, , , omp target entrypoint ]
...

after pass_oacc_transform:
...
[ oacc function 1, 1, 1, omp target entrypoint ]
...


Hmm, ok.  So without some extra dump_attribute_list wrapping
__attribute__(( ... )) around the above doesn't make it more amenable
for cut



With attached untested follow-up patch, for test-case:
...
void __attribute__((noinline)) __attribute__((alias ("bar"), noclone))
foo (void)
{

}

void __attribute__ ((__target__ ("arch=core2", "sse3")))
foo2 (void)
{

}

void __attribute__ ((optimize ((1))))
foo3 (void)
{

}

void __attribute__ ((optimize (("1"))))
foo4 (void)
{

}
...

I get at gimple dump:
...
__attribute__((noclone, alias ("bar"), noinline))
foo ()
{

}


__attribute__((__target__ ("arch=core2", "sse3")))
foo2 ()
{

}


__attribute__((optimize (1)))
foo3 ()
{

}


__attribute__((optimize ("1")))
foo4 ()
{

}
...

OK if bootstrap/regtest succeeds?

Thanks,
- Tom

Make dumping of function attributes resemble source syntax

---
 gcc/tree-cfg.c | 32 +---
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index e06ee28..735ac46 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -72,6 +72,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-cfgcleanup.h"
 #include "wide-int-print.h"
 #include "gimplify.h"
+#include "attribs.h"
 
 /* This file contains functions for building the Control Flow Graph (CFG)
for a function tree.  */
@@ -7352,6 +7353,30 @@ dump_function_to_file (tree fndecl, FILE *file, int flags)
 		  && decl_is_tm_clone (fndecl));
   struct function *fun = DECL_STRUCT_FUNCTION (fndecl);
 
+  if (DECL_ATTRIBUTES (fndecl) != NULL_TREE)
+{
+  fprintf (file, "__attribute__((");
+
+  bool first = true;
+  tree chain;
+  for (chain = DECL_ATTRIBUTES (fndecl); chain;
+	   first = false, chain = TREE_CHAIN (chain))
+	{
+	  if (!first)
+	fprintf (file, ", ");
+
+	  print_generic_expr (file, get_attribute_name (chain), dump_flags);
+	  if (TREE_VALUE (chain) != NULL_TREE)
+	{
+	  fprintf (file, " (");
+	  print_generic_expr (file, TREE_VALUE (chain), dump_flags);
+	  fprintf (file, ")");
+	}
+	}
+
+  fprintf (file, "))\n");
+}
+
   current_function_decl = fndecl;
   fprintf (file, "%s %s(", function_name (fun), tmclone ? "[tm-clone] " : "");
 
@@ -7369,13 +7394,6 @@ dump_function_to_file (tree fndecl, FILE *file, int flags)
 }
   fprintf (file, ")\n");
 
-  if (DECL_ATTRIBUTES (fndecl) != NULL_TREE)
-{
-  fprintf (file, "[ ");
-  print_generic_expr (file, DECL_ATTRIBUTES (fndecl), dump_flags);
-  fprintf (file, "]\n");
-}
-
   if (flags & TDF_VERBOSE)
 print_node (file, "", fndecl, 2);
 
-- 
1.9.1
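
The formatting loop in the patch above can be tried outside GCC. The following is only a sketch under stated assumptions: `struct attr` and `format_attrs` are hypothetical stand-ins for the `DECL_ATTRIBUTES` chain and for the `fprintf`/`print_generic_expr` calls, and argument lists are assumed to be pre-formatted strings. It is meant to show the separator handling and the optional argument parentheses, not GCC's actual tree printing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one entry of DECL_ATTRIBUTES: a name plus an
   optional, already formatted argument list (NULL when there is none).  */
struct attr
{
  const char *name;
  const char *args;
  const struct attr *next;
};

/* Write ATTRS to BUF (of size LEN) as "__attribute__((a, b (args)))\n".
   An empty list produces an empty string.  Returns the number of
   characters written, or -1 if BUF is too small.  */
static int
format_attrs (char *buf, size_t len, const struct attr *attrs)
{
  size_t used = 0;
  int n;

  assert (buf != NULL);
  if (len == 0)
    return -1;
  buf[0] = '\0';
  if (attrs == NULL)
    return 0;

/* Append formatted text, bailing out of format_attrs on truncation.  */
#define APPEND(...)                                        \
  do {                                                     \
    n = snprintf (buf + used, len - used, __VA_ARGS__);    \
    if (n < 0 || (size_t) n >= len - used)                 \
      return -1;                                           \
    used += (size_t) n;                                    \
  } while (0)

  APPEND ("%s", "__attribute__((");
  for (const struct attr *a = attrs; a != NULL; a = a->next)
    {
      /* Separator goes before every entry except the first.  */
      if (a != attrs)
        APPEND ("%s", ", ");
      APPEND ("%s", a->name);
      if (a->args != NULL)
        APPEND (" (%s)", a->args);
    }
  APPEND ("%s", "))\n");
#undef APPEND

  return (int) used;
}
```

Printing the prefix only when the list is non-empty mirrors the `DECL_ATTRIBUTES (fndecl) != NULL_TREE` check in the patch, so functions without attributes produce no extra line in the dump.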

Re: Eliminate duplication of get_catalogs in different abi

2015-09-30 Thread Jonathan Wakely


On 29/09/15 21:49 +0200, François Dumont wrote:

Indeed, I just rerun all tests with success. I am re-attaching the patch.

2015-09-30  François Dumont  
   Jonathan Wakely  

   * config/locale/gnu/messages_members.cc (Catalog_info, Catalogs):
   Move...
   * config/locale/gnu/c++locale_internal.h: ...here in std namespace.
   * config/locale/gnu/c_locale.cc: Move implementation of latter here.
   * config/abi/pre/gnu.ver: Adjust.

Ok to commit ?


OK, thanks.

Re: [HSA] introduce hsa_num_threads

2015-09-30 Thread Martin Liška

On 09/25/2015 04:22 PM, Martin Liška wrote:
> Hello.
> 
> In the following patch HSA is capable of handling various OMP builtins
> that are utilized to set or get the number of threads.
> 
> Martin
> 

Hello.

This patch is a small follow-up which preserves hsa_num_threads among
kernel dispatches.

Martin
From 2897bc5c5485430f1102688a437785fdf2a80add Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 25 Sep 2015 17:01:00 +0200
Subject: [PATCH] HSA: distribute hsa_num_threads among kernel dispatches.

libgomp/ChangeLog:

2015-09-25  Martin Liska  

* hsa-traits.h: Add omp_num_threads to hsa_kernel_dispatch
structure.
* plugin/plugin-hsa.c (print_kernel_dispatch): Print the
struct field.
(create_kernel_dispatch_recursive): Set default value
to omp_num_threads
(GOMP_OFFLOAD_run): Add shadow_reg to all kernel dispatches.

gcc/ChangeLog:

2015-09-25  Martin Liska  

	* hsa-gen.c (struct hsa_kernel_dispatch): New field.
	(gen_hsa_insns_for_kernel_call): Distribute hsa_num_threads
	for a kernel dispatch.
	(init_omp_in_prologue): Emit loading of shadow argument.
	(gen_body_from_gimple): Remove usage of init_omp_in_prologue.
	(generate_hsa): Move it to this function.
---
 gcc/hsa-gen.c   | 42 +++---
 libgomp/hsa-traits.h|  2 ++
 libgomp/plugin/plugin-hsa.c | 16 
 3 files changed, 45 insertions(+), 15 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 6f45bfe..185b9cc 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -101,6 +101,8 @@ struct hsa_kernel_dispatch
   uint32_t group_segment_size;
   /* Number of children kernel dispatches.  */
   uint64_t kernel_dispatch_count;
+  /* Number of threads.  */
+  uint32_t omp_num_threads;
   /* Debug purpose argument.  */
   uint64_t debug;
   /* Kernel dispatch structures created for children kernel dispatches.  */
@@ -3523,6 +3525,16 @@ gen_hsa_insns_for_kernel_call (hsa_bb *hbb, gcall *call)
 			  addr);
   hbb->append_insn (mem);
 
+  /* Write to shadow_reg->omp_num_threads = hsa_num_threads.  */
+  hbb->append_insn (new hsa_insn_comment
+		("set shadow_reg->omp_num_threads = hsa_num_threads"));
+
+  addr = new hsa_op_address (shadow_reg, offsetof (hsa_kernel_dispatch,
+		   omp_num_threads));
+  hbb->append_insn
+(new hsa_insn_mem (BRIG_OPCODE_ST, hsa_num_threads_reg->type,
+		   hsa_num_threads_reg, addr));
+
   /* Write to packet->workgroup_size_x.  */
   hbb->append_insn (new hsa_insn_comment
 		("set packet->workgroup_size_x = hsa_num_threads"));
@@ -4507,12 +4519,27 @@ hsa_init_new_bb (basic_block bb)
 /* Initialize OMP in an HSA basic block PROLOGUE.  */
 
 static void
-init_omp_in_prologue (hsa_bb *prologue)
+init_omp_in_prologue (void)
 {
-  BrigType16_t t = hsa_num_threads->type;
-  prologue->append_insn
-(new hsa_insn_mem (BRIG_OPCODE_ST, t, new hsa_op_immed (64, t),
-		   new hsa_op_address (hsa_num_threads)));
+  if (!hsa_cfun->kern_p)
+return;
+
+  hsa_bb *prologue = hsa_bb_for_bb (ENTRY_BLOCK_PTR_FOR_FN (cfun));
+
+  /* Load a default value from shadow argument.  */
+  hsa_op_reg *shadow_reg_ptr = hsa_cfun->get_shadow_reg ();
+  hsa_op_address *addr = new hsa_op_address
+(shadow_reg_ptr, offsetof (hsa_kernel_dispatch, omp_num_threads));
+
+  hsa_op_reg *threads = new hsa_op_reg (BRIG_TYPE_U32);
+  hsa_insn_basic *basic = new hsa_insn_mem
+(BRIG_OPCODE_LD, threads->type, threads, addr);
+  prologue->append_insn (basic);
+
+  /* Save it to private variable hsa_num_threads.  */
+  basic = new hsa_insn_mem (BRIG_OPCODE_ST, hsa_num_threads->type, threads,
+			new hsa_op_address (hsa_num_threads));
+  prologue->append_insn (basic);
 }
 
 /* Go over gimple representation and generate our internal HSA one.  SSA_MAP
@@ -4554,8 +4581,6 @@ gen_body_from_gimple (vec  *ssa_map)
 	}
 }
 
-  init_omp_in_prologue (hsa_bb_for_bb (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
-
   FOR_EACH_BB_FN (bb, cfun)
 {
   gimple_stmt_iterator gsi;
@@ -5012,6 +5037,9 @@ generate_hsa (bool kernel)
   gen_function_def_parameters (hsa_cfun, &ssa_map);
   if (seen_error ())
 goto fail;
+
+  init_omp_in_prologue ();
+
   gen_body_from_gimple (&ssa_map);
   if (seen_error ())
 goto fail;
diff --git a/libgomp/hsa-traits.h b/libgomp/hsa-traits.h
index 3b20008..6fb7e48 100644
--- a/libgomp/hsa-traits.h
+++ b/libgomp/hsa-traits.h
@@ -43,6 +43,8 @@ struct hsa_kernel_dispatch
   uint32_t group_segment_size;
   /* Number of children kernel dispatches.  */
   uint64_t kernel_dispatch_count;
+  /* Number of threads.  */
+  uint32_t omp_num_threads;
   /* Debug purpose argument.  */
   uint64_t debug;
   /* Kernel dispatch structures created for children kernel dispatches.  */
diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c
index f9be015..76a3b45 100644
--- a/libgomp/plugin/plugin-hsa.c
+++ b/libgomp/plugin/plugin-hsa.c
@@ -743,6 +743,9 @@

[PATCH 1/2][ARC] Add support for ARCv2 CPUs

2015-09-30 Thread Claudiu Zissulescu

This patch adds basic support for Synopsys' ARCv2 CPUs. 

Can this be committed?

Thanks,
Claudiu

ChangeLog:
2015-08-27  Claudiu Zissulescu  

* common/config/arc/arc-common.c (arc_handle_option): Handle ARCv2
options.
* config/arc/arc-opts.h: Add ARCv2 CPUs.
* config/arc/arc-protos.h (arc_secondary_reload_conv): Prototype.
* config/arc/arc.c (arc_secondary_reload): Handle subreg (reg)
situation, and store instructions with large offsets.
(arc_secondary_reload_conv): New function.
(arc_init): Add ARCv2 options.
(arc_conditional_register_usage): Select the proper register usage
for ARCv2 processors.
(arc_handle_interrupt_attribute): ILINK2 is only valid for ARCv1
architecture.
(arc_compute_function_type): Likewise.
(arc_print_operand): Handle new ARCv2 punctuation characters.
(arc_return_in_memory): ARCv2 ABI returns in registers up to 16
bytes.
(workaround_arc_anomaly, arc_asm_insn_p, arc_loop_hazard): New
function.
(arc_reorg, arc_hazard): Use it.
* config/arc/arc.h (TARGET_CPU_CPP_BUILTINS): Define __HS__ and
__EM__.
(ASM_SPEC): Add ARCv2 options.
(TARGET_NORM): ARC HS has norm instructions by default.
(TARGET_OPTFPE): Use optimized floating point emulation for ARC
HS.
(TARGET_AT_DBR_CONDEXEC): Only for ARC600 family.
(TARGET_EM, TARGET_HS, TARGET_V2, TARGET_MPYW, TARGET_MULTI):
Define.
* config/arc/arc.md
(commutative_binary_mult_comparison_result_used, movsicc_insn)
(mulsi3, mulsi3_600_lib, mulsidi3, mulsidi3_700, mulsi3_highpart)
(umulsi3_highpart_i, umulsi3_highpart_int, umulsi3_highpart)
(umulsidi3, umulsidi3_700, cstoresi4, simple_return): Use it for
ARCv2.
(mulhisi3, mulhisi3_imm, mulhisi3_reg, umulhisi3, umulhisi3_imm)
(umulhisi3_reg, umulhisi3_reg, mulsi3_v2, nopv, bswapsi2)
(prefetch, divsi3, udivsi3, modsi3, umodsi3, arcset, arcsetltu)
(arcsetgeu, arcsethi, arcsetls, reload_*_load, reload_*_store)
(extzvsi): New pattern.
* config/arc/arc.opt: New ARCv2 options.
* config/arc/arcEM.md: New file.
* config/arc/arcHS.md: Likewise.
* config/arc/constraints.md (C3p): New constraint, accepts 1 and 2
values.
(Cm2): A signed 9-bit integer constant constraint.
(C62): An unsigned 6-bit integer constant constraint.
* config/arc/predicates.md (proper_comparison_operator): Select
the mode from a valid compare operand.
* config/arc/t-arc-newlib: Add ARCv2 multilib options.
* doc/invoke.texi: Add documentation for -mcpu=
-mcode-density and -mdiv-rem.


01-arcv2.patch
Description: 01-arcv2.patch

Re: [gomp4,committed] Remove release_dangling_ssa_names

2015-09-30 Thread Thomas Schwinge

Hi Tom!

On Wed, 30 Sep 2015 11:46:56 +0200, Tom de Vries  wrote:
> On 30/09/15 11:25, Thomas Schwinge wrote:
> > Don't we also want to commit the following change, which was part of your
> > trunk r227103 (883f001d2c3672e0674bec71f36a2052734a72cf) commit (and now
> > shows up as a delta between trunk and gomp-4_0-branch)?
> >
> > --- gcc/tree-cfg.c
> > +++ gcc/tree-cfg.c
> > @@ -6424,9 +6424,6 @@ replace_ssa_name (tree name, hash_map 
> > *vars_map,
> >   replace_by_duplicate_decl (&decl, vars_map, to_context);
> >   new_name = make_ssa_name_fn (DECL_STRUCT_FUNCTION (to_context),
> >decl, SSA_NAME_DEF_STMT (name));
> > - if (SSA_NAME_IS_DEFAULT_DEF (name))
> > -   set_ssa_default_def (DECL_STRUCT_FUNCTION (to_context),
> > -decl, new_name);
> > }
> > else
> > new_name = copy_ssa_name_fn (DECL_STRUCT_FUNCTION (to_context),
> 
> Indeed, that bit is part of the patch "Don't create superfluous parm in 
> expand_omp_taskreg", but was dropped in the merge (probably because it 
> conflicted with the "Fix release_dangling_ssa_names" patch that I just 
> reverted).

Aha, so my fault after all.  ;-)

> So we need to apply it.

Committed to gomp-4_0-branch in r228285:

commit 4d7d168dc9aab3fa1ebf55bb3cb94d7b0477d639
Author: tschwinge 
Date:   Wed Sep 30 10:03:37 2015 +

More gcc/tree-cfg.c:replace_ssa_name cleanup

gcc/
* tree-cfg.c (replace_ssa_name): Revert obsolete changes.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@228285 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp | 4 
 gcc/tree-cfg.c | 3 ---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 06440bf..c4033e0 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,7 @@
+2015-09-30  Thomas Schwinge  
+
+   * tree-cfg.c (replace_ssa_name): Revert obsolete changes.
+
 2015-09-30  Tom de Vries  
 
* omp-low.c (release_dangling_ssa_names): Remove.
diff --git gcc/tree-cfg.c gcc/tree-cfg.c
index a3c3b20..500fc43 100644
--- gcc/tree-cfg.c
+++ gcc/tree-cfg.c
@@ -6424,9 +6424,6 @@ replace_ssa_name (tree name, hash_map 
*vars_map,
  replace_by_duplicate_decl (&decl, vars_map, to_context);
  new_name = make_ssa_name_fn (DECL_STRUCT_FUNCTION (to_context),
   decl, SSA_NAME_DEF_STMT (name));
- if (SSA_NAME_IS_DEFAULT_DEF (name))
-   set_ssa_default_def (DECL_STRUCT_FUNCTION (to_context),
-decl, new_name);
}
   else
new_name = copy_ssa_name_fn (DECL_STRUCT_FUNCTION (to_context),


Grüße,
 Thomas



[Patch] [x86_64] znver1 enablement

2015-09-30 Thread Kumar, Venkataramanan

Hi Maintainers,

The attached patch enables -march=znver1 (AMD family 17h Zen processor).

Costs and tunings are copied from bdver4, but we will adjust them for
znver1 later.  A basic scheduler description for znver1 is also added; we
will update it as we get more information.

Testing:
GCC bootstrap and GCC regression testing pass on x86_64-pc-linux-gnu.
GCC bootstrap passed with "make BOOT_CFLAGS= -O2 -g -march=znver1 -mno-adx
-mno-mwaitx -mno-clzero -mno-sha -mno-clflushopt -mno-rdseed" on
x86_64-pc-linux-gnu.

Built the SPEC2006 benchmarks with -march=znver1 and ran them on a bdver4
machine.

Wrf and Calculix failed to compile, but this looks like a general register
allocation issue that is not restricted to -march=znver1:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67717

ChangeLog:
* config.gcc (i[34567]86-*-linux* | ...): Add znver1.
(case ${target}): Add znver1.
* config/i386/cpuid.h (bit_CLZERO): Define.
* config/i386/driver-i386.c (host_detect_local_cpu): Let
-march=native recognize znver1 processors.
* config/i386/i386-c.c (ix86_target_macros_internal): Add
znver1, clzero def_and_undef. 
* config/i386/i386.c (struct processor_costs znver1_cost): New.
(m_znver1): New definition.
(m_AMD_MULTIPLE): Includes m_znver1.
(processor_target_table): Add znver1 entry.
(ix86_target_string) : Add clzero entry.
(static const char *const cpu_names): Add znver1 entry.
(ix86_option_override_internal): Add znver1 instruction sets.
(PTA_CLZERO) :  New definition.
(ix86_option_override_internal): Handle new clzero option.
(ix86_issue_rate): Add znver1.
(ix86_adjust_cost): Add znver1.
(get_builtin_code_for_version): Set priority for 
PROCESSOR_ZNVER1.  
(ia32_multipass_dfa_lookahead): Add znver1.
(enum processor_model): Add M_AMDFAM17H_znver1.
(struct _arch_names_table): Add M_AMDFAM17H_znver1.
(has_dispatch): Add znver1.   
* config/i386/i386.h (TARGET_znver1): New definition. 
(TARGET_CLZERO): Define.
(TARGET_CLZERO_P): Define.
(struct ix86_size_cost): Add TARGET_ZNVER1.
(enum processor_type): Add PROCESSOR_znver1.
* config/i386/i386.md (define_attr "cpu"): Add znver1.
(set_attr znver1_decode): New definitions for znver1.
* config/i386/i386.opt (flag_dispatch_scheduler): Add znver1.
(mclzero): New.
* config/i386/mmx.md (set_attr znver1_decode): New definitions
for znver1.
* config/i386/sse.md (set_attr znver1_decode): Likewise.
* config/i386/x86-tune.def:  Add znver1 tunings.
* config/i386/znver1.md: Introduce znver1 cpu and include new 
md file.
* gcc/doc/extend.texi: Add details about znver1.
* gcc/doc/invoke.texi: Add details about znver1.

Ok for trunk?

Regards,
Venkat.


diff --git a/gcc/config.gcc b/gcc/config.gcc
index 75807f5..a785a35 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -592,7 +592,7 @@ pentium4 pentium4m pentiumpro prescott iamcu"
 # 64-bit x86 processors supported by --with-arch=.  Each processor
 # MUST be separated by exactly one space.
 x86_64_archs="amdfam10 athlon64 athlon64-sse3 barcelona bdver1 bdver2 \
-bdver3 bdver4 btver1 btver2 k8 k8-sse3 opteron opteron-sse3 nocona \
+bdver3 bdver4 btver1 btver2 k8 k8-sse3 opteron opteron-sse3 nocona znver1 \
 core2 corei7 corei7-avx core-avx-i core-avx2 atom slm nehalem westmere \
 sandybridge ivybridge haswell broadwell bonnell silvermont knl x86-64 \
 native"
@@ -3105,6 +3105,10 @@ case ${target} in
 ;;
   i686-*-* | i786-*-*)
 case ${target_noncanonical} in
+  znver1-*)
+   arch=znver1
+   cpu=znver1
+   ;;
   bdver4-*)
 arch=bdver4
 cpu=bdver4
@@ -3218,6 +3222,10 @@ case ${target} in
 ;;
   x86_64-*-*)
 case ${target_noncanonical} in
+  znver1-*)
+   arch=znver1
+   cpu=znver1
+   ;;
   bdver4-*)
 arch=bdver4
 cpu=bdver4
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index f3ad4db..fccdf1f 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -65,6 +65,9 @@
 #define bit_3DNOWP (1 << 30)
 #define bit_3DNOW  (1 << 31)
 
+/* %ebx.  */
+#define bit_CLZERO (1 << 0)
+
 /* Extended Features (%eax == 7) */
 /* %ebx */
 #define bit_FSGSBASE   (1 << 0)
diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index ec4cbec..8123793 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -414,6 +414,7 @@ const char
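
As a rough illustration of how a `bit_CLZERO`-style mask would be consumed, here is a sketch that queries CPUID through GCC's `<cpuid.h>`. Two things are assumptions and not visible in the truncated diff above: that the bit is reported in `%ebx` of extended leaf `0x80000008`, and the `SKETCH_*` names and both helper functions, which are hypothetical. On non-x86 hosts the query simply reports no support.

```c
#include <assert.h>
#include <stdio.h>

#if defined (__i386__) || defined (__x86_64__)
#include <cpuid.h>
#endif

/* Same value as the bit_CLZERO macro added by the patch; renamed here so it
   cannot clash with a <cpuid.h> that already defines it.  */
#define SKETCH_BIT_CLZERO (1u << 0)

/* Assumption: extended leaf whose %ebx carries the CLZERO bit.  */
#define SKETCH_LEAF_CLZERO 0x80000008u

/* Pure helper: test the CLZERO bit in an %ebx value.  */
static int
ebx_has_clzero (unsigned int ebx)
{
  return (ebx & SKETCH_BIT_CLZERO) != 0;
}

/* Return 1 if the running CPU reports CLZERO, 0 otherwise (including on
   non-x86 hosts and when the leaf is not available).  */
static int
cpu_has_clzero (void)
{
#if defined (__i386__) || defined (__x86_64__)
  unsigned int eax, ebx, ecx, edx;

  /* __get_cpuid returns 0 when the requested leaf is unsupported.  */
  if (!__get_cpuid (SKETCH_LEAF_CLZERO, &eax, &ebx, &ecx, &edx))
    return 0;
  return ebx_has_clzero (ebx);
#else
  return 0;
#endif
}
```

Keeping the bit test in a separate pure function makes it checkable on any host, while the CPUID query itself stays behind the architecture guard.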

Re: [PATCH] Unswitching outer loops.

2015-09-30 Thread Yuri Rumyantsev

Hi Richard,

I re-designed outer loop unswitching using the basic idea of the PR 23855
patch: hoist an invariant guard if the loop is empty without the guard. Note
that this was added to the loop unswitching pass with simple modifications,
such as using another loop iterator.

Bootstrap and regression testing did not show any new failures.
What is your opinion?

Thanks.

ChangeLog:
2015-09-30  Yuri Rumyantsev  

* tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and
"cfghooks.h", add prototypes for introduced new functions.
(tree_ssa_unswitch_loops): Use from innermost loop iterator, move all
checks on ability of loop unswitching to tree_unswitch_single_loop;
invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending
on innermost loop check.
(tree_unswitch_single_loop): Add all required checks on ability of
loop unswitching under zero recursive level guard.
(tree_unswitch_outer_loop): New function.
(find_loop_guard): Likewise.
(empty_bb_without_guard_p): Likewise.
(used_outside_loop_p): Likewise.
(hoist_guard): Likewise.
(check_exit_phi): Likewise.

   gcc/testsuite/ChangeLog:
* gcc.dg/loop-unswitch-2.c: New test.
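
The effect of hoisting an invariant guard can be sketched by hand on the PR 23855 test case quoted further down. This is not the output of the patch, only an illustration: `foo_hoisted` and `same_result` are hypothetical names, and the transformation is assumed to be valid because stores through `double *x` cannot modify the `int` objects behind `ie` and `je` under C's type-based aliasing rules.

```c
#include <assert.h>
#include <string.h>

/* The loop nest from the quoted test case.  */
static void
foo (int *ie, int *je, double *x)
{
  int i, j;
  for (j = 0; j < *je; ++j)
    for (i = 0; i < *ie; ++i)
      x[i + j] = 0.0;
}

/* Hand-hoisted variant: the inner loop's entry guard (*ie > 0) is
   loop-invariant, and without it the outer loop body is empty, so the
   guard can be tested once in front of the whole nest.  */
static void
foo_hoisted (int *ie, int *je, double *x)
{
  int i, j;
  if (*ie > 0)
    for (j = 0; j < *je; ++j)
      for (i = 0; i < *ie; ++i)
        x[i + j] = 0.0;
}

/* Return 1 if both variants leave identical array contents for the given
   bounds.  The arrays hold 16 elements, so IE + JE must not exceed 16.  */
static int
same_result (int ie, int je)
{
  double a[16], b[16];

  assert (ie + je <= 16);
  for (int k = 0; k < 16; ++k)
    a[k] = b[k] = 1.0;
  foo (&ie, &je, a);
  foo_hoisted (&ie, &je, b);
  return memcmp (a, b, sizeof a) == 0;
}
```

With the guard outside, the outer loop either disappears entirely or contains only the inner loop, which is the shape outer-loop vectorization needs.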

2015-09-16 11:26 GMT+03:00 Richard Biener :
> Yeah, as said, the patch wasn't fully ready and it also felt odd to do
> this hoisting in loop header copying.  Integrating it
> with LIM would be a better fit eventually.
>
> Note that we did agree to go forward with your original patch just
> making it more "generically" perform outer loop
> unswitching.  Did you explore that idea further?
>
>
>
> On Tue, Sep 15, 2015 at 6:00 PM, Yuri Rumyantsev  wrote:
>> Thanks Richard.
>>
>> I found one more issue that could not be fixed simply. In 23855 you
>> consider the following test-case:
>> void foo(int *ie, int *je, double *x)
>> {
>>   int i, j;
>>   for (j=0; j<*je; ++j)
>> for (i=0; i<*ie; ++i)
>>   x[i+j] = 0.0;
>> }
>> and proposed to hoist a check on *ie out of the loop. It requires
>> memref alias analysis since in general x and ie can alias (if their
>> types are compatible - int *ie & int * x). Such analysis is performed
>> by the pre or lim passes. Without such analysis we cannot hoist a test
>> for non-zero *ie out of the loop using the 23855 patch.
>>  The second concern is that the proposed copy header algorithm changes
>> the loop structure significantly and it is not accepted by the vectorizer
>> since the latch is not empty (such a transformation assumes loop peeling
>> for one iteration). So I can propose to implement simple guard hoisting
>> without copying header and tail blocks (if it is possible).
>>
>> I would appreciate any advice or help since without such
>> hoisting we are not able to perform outer loop vectorization for
>> an important benchmark.
>> and
>>
>> 2015-09-15 14:22 GMT+03:00 Richard Biener :
>>> On Thu, Sep 3, 2015 at 6:32 PM, Yuri Rumyantsev  wrote:
>>>> Hi Richard,
>>>>
>>>> I started learning, tuning and debugging the patch proposed in 23855 and
>>>> discovered that it does not work properly.
>>>> So I wonder whether the patch was tested and should work?
>>>
>>> I don't remember, but as it wasn't committed it certainly wasn't ready.
>>>
 Should it accept for hoisting the following loop nest
   for (i=0; i>>
>>> if m is equal to 0 then we still have the c[i] = s store, no?  Of course
>>> we could unswitch the outer loop on m == 0 but simple hoisting wouldn't 
>>> work.
>>>
>>> Richard.
>>>
 2015-08-03 10:27 GMT+03:00 Richard Biener :
> On Fri, Jul 31, 2015 at 1:17 PM, Yuri Rumyantsev  
> wrote:
>> Hi Richard,
>>
>> I studied your updated patch for 23825 and it is more general than
>> mine.
>> I'd like to propose a compromise: let's consider my patch only
>> for force-vectorized outer loops, to allow outer-loop
>> vectorization.
>
> I don't see why we should special-case that if the approach in 23825
> is sensible.
>
>> Note that your approach will not hoist invariant
>> guards if loops contains something else except for inner-loop, i.e. it
>> won't be empty for taken branch.
>
> Yes, it does not perform unswitching but guard hoisting.  Note that this
> is originally Zdenek Dvoraks patch.
>
>> I also would like to answer on your last question - CFG cleanup is
>> invoked to perform deletion of single-argument phi nodes from tail
>> block through substitution - such phi's prevent outer-loop
>> vectorization. But it is clear that such transformation can be done
>> other pass.
>
> Hmm, I wonder why the copy_prop pass after unswitching does not
> get rid of them?
>
>> What is your opinion?
>
> My opinion is that if we

Re: [patch] libstdc++/67747 Allocate space for dirent::d_name

2015-09-30 Thread Jonathan Wakely


On 29/09/15 12:54 -0600, Martin Sebor wrote:

On 09/29/2015 05:37 AM, Jonathan Wakely wrote:

POSIX says that dirent::d_name has an unspecified length, so calls to
readdir_r must pass a buffer with enough trailing space for
{NAME_MAX}+1 characters. I wasn't doing that, which works OK on
GNU/Linux and BSD where d_name is a large array, but fails on Solaris
32-bit.

This uses pathconf to get NAME_MAX and allocates a buffer.

Tested powerpc64le-linux and x86_64-dragonfly4.1, I'm going to commit
this to trunk today (and backport all the filesystem fixes to
gcc-5-branch).


Calling pathconf is only necessary when _POSIX_NO_TRUNC is zero
which I think exists mainly for legacy file systems. Otherwise,
it's safe to use NAME_MAX instead. Avoiding the call to pathconf


Oh, nice. I was using NAME_MAX originally but the glibc readdir_r(3)
man-page has an example using pathconf and says that should be used,
so I went down that route.


also avoids the TOCTOU between it and the call to opendir, and


Also nice.


hardcoding the value makes it possible to avoid dynamically
allocating the dirent buffer.


Can we be sure NAME_MAX will never be too big for the stack?

As currently written _Dir::advance() can call itself recursively to
skip the . and .. directories, so if we put the dirent buffer on the
stack then maybe we should re-use the same one not create three large
frames.


I didn't remember the MAX_PATH value on Windows anymore but from
what I've just read online it sounds like it's defined to 260.


Yes, I found that value, but I think I found something saying the
individual components were limited to 255. I'll make it 260 anyway. I
think that might relate to UTF-16 characters, so we'd need a larger
buffer for narrow characters, but I'm not sure what mingw supports
here anyway. The Windows implementations are just stubs where someone
who cares can add working code.


Defaulting to 255 on POSIX is appropriate. On XSI systems, the
minimum required value is _XOPEN_NAME_MAX which is 255 (I would
suggest using the macro instead when it's defined). Otherwise,
the strictly conforming minimum value would be 14 -- the value
of _POSIX_NAME_MAX, but since 255 is greater it's fine.

Other than that, I tend to be leery of using plain char arrays
as buffers for objects of bigger types. I don't know to what
extent this is a problem for libstdc++ anymore as more and more
hardware is tolerant of misaligned accesses and as the default
new expression typically returns memory suitably aligned for
the largest fundamental type. But since there is no requirement
in the language that it do so and I would tend to err on the
side of caution and use operator new (as opposed to
new char[len]).


For some reason I thought new char[len] would be suitably aligned for
any type with sizeof <= len, but that's only true for operator new.
(I should check I haven't made the same assumption elsewhere in the
library!)

std::aligned_union::type
will give us what we need.



Martin

PS I'm interpreting _POSIX_NO_TRUNC being zero as more
restrictive than if it was non-zero and so calling pathconf(p,
_PC_NO_TRUNC) should be required to also return non-zero for
such an implementation, regardless of p. But let me check that
I'm reading it right.
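
The buffer sizing discussed in this thread can be sketched in plain C. This is an illustration under assumptions, not the libstdc++ change: `dirent_buf_size`, `alloc_dirent` and `SKETCH_NAME_MAX_FALLBACK` are hypothetical names, the fallback of 255 is the default mentioned above, and `malloc` is used because it returns storage suitably aligned for `struct dirent`, which sidesteps the alignment concern raised about plain char arrays.

```c
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Used only when neither pathconf nor NAME_MAX gives a usable limit.  */
#define SKETCH_NAME_MAX_FALLBACK 255

/* Return a buffer size large enough for a struct dirent whose d_name can
   hold the longest file name allowed in directory PATH, plus the
   terminating NUL.  */
static size_t
dirent_buf_size (const char *path)
{
  /* pathconf returns -1 on error or when there is no limit.  */
  long name_max = pathconf (path, _PC_NAME_MAX);

  if (name_max <= 0)
    {
#ifdef NAME_MAX
      name_max = NAME_MAX;
#else
      name_max = SKETCH_NAME_MAX_FALLBACK;
#endif
    }

  size_t len = offsetof (struct dirent, d_name) + (size_t) name_max + 1;

  /* Never hand out less than the declared structure size.  */
  return len < sizeof (struct dirent) ? sizeof (struct dirent) : len;
}

/* Allocate a suitably sized and aligned dirent buffer for PATH.  The
   caller releases it with free.  Returns NULL on allocation failure.  */
static struct dirent *
alloc_dirent (const char *path)
{
  return malloc (dirent_buf_size (path));
}
```

Note that querying by path and opening the directory afterwards leaves the window between the two calls that Martin points out; using the compile-time `NAME_MAX` alone avoids both that window and the dynamic allocation.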

Re: [gomp4,committed] Remove release_dangling_ssa_names

2015-09-30 Thread Tom de Vries


On 30/09/15 11:25, Thomas Schwinge wrote:

Don't we also want to commit the following change, which was part of your
trunk r227103 (883f001d2c3672e0674bec71f36a2052734a72cf) commit (and now
shows up as a delta between trunk and gomp-4_0-branch)?

--- gcc/tree-cfg.c
+++ gcc/tree-cfg.c
@@ -6424,9 +6424,6 @@ replace_ssa_name (tree name, hash_map 
*vars_map,
  replace_by_duplicate_decl (&decl, vars_map, to_context);
  new_name = make_ssa_name_fn (DECL_STRUCT_FUNCTION (to_context),
   decl, SSA_NAME_DEF_STMT (name));
- if (SSA_NAME_IS_DEFAULT_DEF (name))
-   set_ssa_default_def (DECL_STRUCT_FUNCTION (to_context),
-decl, new_name);
}
else
new_name = copy_ssa_name_fn (DECL_STRUCT_FUNCTION (to_context),


Indeed, that bit is part of the patch "Don't create superfluous parm in 
expand_omp_taskreg", but was dropped in the merge (probably because it 
conflicted with the "Fix release_dangling_ssa_names" patch that I just 
reverted).


So we need to apply it.

Thanks,
- Tom

[PATCH] Fix PR67037: Wrong code at -O1 and above on ARM

2015-09-30 Thread Bernd Edlinger

Hi,


I noticed that this wrong-code bug is caused by two insns, created by the
function process_addr_reg in lra-constraints.c, which share a common
subexpression.
The fix is rather simple: use copy_rtx when the memory reference is used twice.

The patch was boot-strapped and regression-tested
on x86_64-pc-linux-gnu and arm-linux-gnueabihf.

OK for trunk?


Thanks
Bernd.
  gcc:
2015-09-30  Bernd Edlinger  

PR rtl-optimization/67037
* lra-constraints.c (process_addr_reg): Use copy_rtx when necessary.

testsuite:
2015-09-30  Bernd Edlinger  

PR rtl-optimization/67037
* gcc.c-torture/execute/pr67037.c: New test.


patch-pr67037.diff
Description: Binary data
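
Why a shared subexpression leads to wrong code, and why a deep copy avoids it, can be shown with a toy expression tree. Everything here is hypothetical and much simpler than GCC's RTL: `struct expr`, `copy_expr` and `second_user_survives` are made-up names, with `copy_expr` merely playing the role that copy_rtx plays in the fix.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature expression node; not GCC's rtx.  */
struct expr
{
  int value;             /* payload, e.g. a register number */
  struct expr *operand;  /* single child, or NULL for a leaf */
};

static struct expr *
make_expr (int value, struct expr *operand)
{
  struct expr *e = malloc (sizeof *e);
  assert (e != NULL);
  e->value = value;
  e->operand = operand;
  return e;
}

/* Deep copy of an expression chain.  */
static struct expr *
copy_expr (const struct expr *e)
{
  if (e == NULL)
    return NULL;
  return make_expr (e->value, copy_expr (e->operand));
}

/* Free an expression and everything reachable through operand.  */
static void
free_expr (struct expr *e)
{
  while (e != NULL)
    {
      struct expr *next = e->operand;
      free (e);
      e = next;
    }
}

/* Build two "insns" that use the same address expression, then rewrite the
   address inside the first one in place.  Returns 1 if the second insn
   still sees the original address, 0 if it was silently changed too.  */
static int
second_user_survives (int use_copy)
{
  struct expr *addr = make_expr (1, NULL);
  struct expr *insn_a = make_expr (100, addr);
  struct expr *insn_b = make_expr (200, use_copy ? copy_expr (addr) : addr);

  insn_a->operand->value = 7;   /* later pass edits insn_a's address */
  int ok = insn_b->operand->value == 1;

  /* Release the second operand only if it is a distinct copy.  */
  if (insn_b->operand != insn_a->operand)
    free_expr (insn_b->operand);
  free (insn_b);
  free_expr (insn_a);
  return ok;
}
```

Without the copy, the in-place edit of the first user leaks into the second one; with the copy, each user owns its operand and can be rewritten independently.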

Re: [gomp4.1] Doacross tweaks

2015-09-30 Thread Ilya Verbin

Hi!

On Fri, Sep 25, 2015 at 18:54:47 +0200, Jakub Jelinek wrote:
> --- gcc/tree-pretty-print.c.jj2015-09-03 16:35:58.0 +0200
> +++ gcc/tree-pretty-print.c   2015-09-25 15:04:46.911844111 +0200
> @@ -569,7 +569,9 @@ dump_omp_clause (pretty_printer *pp, tre
>   if (TREE_PURPOSE (t) != integer_zero_node)
> {
>   tree p = TREE_PURPOSE (t);
> - if (!wi::neg_p (p, TYPE_SIGN (TREE_TYPE (p))))
> + if (OMP_CLAUSE_DEPEND_SINK_NEGATIVE (t))
> +   pp_minus (pp);
> + else
> pp_plus (pp);
>   dump_generic_node (pp, TREE_PURPOSE (t), spc, flags,
>  false);

This caused an unused-variable warning, which -Werror turns into an error:

gcc/tree-pretty-print.c: In function ‘void dump_omp_clause(pretty_printer*, tree, int, int)’:
gcc/tree-pretty-print.c:571:12: error: unused variable ‘p’ [-Werror=unused-variable]
   tree p = TREE_PURPOSE (t);
^

  -- Ilya

Re: [PATCH] Fix undefined behaviour in SH port

2015-09-30 Thread Oleg Endo

On Tue, 2015-09-29 at 10:41 -0600, Jeff Law wrote:
> More left shifts of negative signed values to fix in the SH port.  I'm 
> not sure how these were missed last week or if they were introduced 
> between the point when I tested last week and yesterday.  Regardless, 
> they're fixed in the obvious way.
> 
> Tested by building all the sh targets from config-all.mk.
> 
> Installed on the trunk.

Thanks!

Cheers,
Oleg
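
For readers unfamiliar with the issue: in C, left-shifting a negative signed value is undefined behaviour. The committed change is not shown in this message, so the following is only a sketch of the usual rewrites, with hypothetical helper names; it should not be read as the code that went into the SH port.

```c
#include <assert.h>

/* Return -(2^N) without shifting a negative value: shift the positive
   constant first, then negate.  Replaces the pattern (-1 << N).  */
static int
neg_pow2 (unsigned int n)
{
  assert (n < 31);
  return -(1 << n);
}

/* Scale a possibly negative V by 2^N using a multiplication, which is
   well defined as long as the mathematical result fits in int (the
   caller has to guarantee that).  Replaces the pattern (V << N).  */
static int
scale_by_pow2 (int v, unsigned int n)
{
  assert (n < 31);
  return v * (1 << n);
}
```

Both forms typically compile to the same shift instruction, so the rewrite costs nothing at run time while keeping the source free of undefined behaviour.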

Re: Use gcc/coretypes.h:enum offload_abi in mkoffloads

2015-09-30 Thread Thomas Schwinge

Hi!

On Mon, 28 Sep 2015 12:19:41 +0200, Bernd Schmidt  wrote:
> >  Use gcc/coretypes.h:enum offload_abi in mkoffloads

> > +  abort ();
> 
> Can we have gcc_unreachable() in these tools?

Good suggestion, thanks!

> Other than that, it looks ok but it also doesn't seem to do anything. 

Thanks for the review.  This refactoring patch happened to come into
existence when I worked on the other mkoffload patches that I recently
posted, and as I considered it an improvement in its own right, I posted
it.

> Are you intending to add more ABIs?

As I had quoted in my previous email, H.J. Lu expressed an interest in
supporting the x32 ABI, which is now easier to do than before.

With the abort calls replaced with gcc_unreachable, committed in r228283:

commit dc0452858cb0d7f56fada1bb2f795f92cd551795
Author: tschwinge 
Date:   Wed Sep 30 08:58:04 2015 +

Use gcc/coretypes.h:enum offload_abi in mkoffloads

gcc/
* config/i386/intelmic-mkoffload.c (target_ilp32): Remove
variable, replacing it with...
(offload_abi): ... this new variable.  Adjust all users.
* config/nvptx/mkoffload.c (target_ilp32, offload_abi): Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@228283 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog|  7 +++
 gcc/config/i386/intelmic-mkoffload.c | 90 
 gcc/config/nvptx/mkoffload.c | 56 +++---
 3 files changed, 108 insertions(+), 45 deletions(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index e24c7bc..d29e5d9 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,10 @@
+2015-09-30  Thomas Schwinge  
+
+   * config/i386/intelmic-mkoffload.c (target_ilp32): Remove
+   variable, replacing it with...
+   (offload_abi): ... this new variable.  Adjust all users.
+   * config/nvptx/mkoffload.c (target_ilp32, offload_abi): Likewise.
+
 2015-09-30  Matthias Klose  
 
* configure.ac: Remove extraneous ;;.
diff --git gcc/config/i386/intelmic-mkoffload.c 
gcc/config/i386/intelmic-mkoffload.c
index 4a7812c..065d408 100644
--- gcc/config/i386/intelmic-mkoffload.c
+++ gcc/config/i386/intelmic-mkoffload.c
@@ -42,8 +42,7 @@ int num_temps = 0;
 const int MAX_NUM_TEMPS = 10;
 const char *temp_files[MAX_NUM_TEMPS];
 
-/* Shows if we should compile binaries for i386 instead of x86-64.  */
-bool target_ilp32 = false;
+enum offload_abi offload_abi = OFFLOAD_ABI_UNSET;
 
 /* Delete tempfiles and exit function.  */
 void
@@ -200,10 +199,17 @@ out:
 static void
 compile_for_target (struct obstack *argv_obstack)
 {
-  if (target_ilp32)
-obstack_ptr_grow (argv_obstack, "-m32");
-  else
-obstack_ptr_grow (argv_obstack, "-m64");
+  switch (offload_abi)
+{
+case OFFLOAD_ABI_LP64:
+  obstack_ptr_grow (argv_obstack, "-m64");
+  break;
+case OFFLOAD_ABI_ILP32:
+  obstack_ptr_grow (argv_obstack, "-m32");
+  break;
+default:
+  gcc_unreachable ();
+}
   obstack_ptr_grow (argv_obstack, NULL);
   char **argv = XOBFINISH (argv_obstack, char **);
 
@@ -379,10 +385,17 @@ generate_host_descr_file (const char *host_compiler)
   new_argv[new_argc++] = "-c";
   new_argv[new_argc++] = "-fPIC";
   new_argv[new_argc++] = "-shared";
-  if (target_ilp32)
-new_argv[new_argc++] = "-m32";
-  else
-new_argv[new_argc++] = "-m64";
+  switch (offload_abi)
+{
+case OFFLOAD_ABI_LP64:
+  new_argv[new_argc++] = "-m64";
+  break;
+case OFFLOAD_ABI_ILP32:
+  new_argv[new_argc++] = "-m32";
+  break;
+default:
+  gcc_unreachable ();
+}
   new_argv[new_argc++] = src_filename;
   new_argv[new_argc++] = "-o";
   new_argv[new_argc++] = obj_filename;
@@ -442,10 +455,17 @@ prepare_target_image (const char *target_compiler, int 
argc, char **argv)
   objcopy_argv[3] = "-I";
   objcopy_argv[4] = "binary";
   objcopy_argv[5] = "-O";
-  if (target_ilp32)
-objcopy_argv[6] = "elf32-i386";
-  else
-objcopy_argv[6] = "elf64-x86-64";
+  switch (offload_abi)
+{
+case OFFLOAD_ABI_LP64:
+  objcopy_argv[6] = "elf64-x86-64";
+  break;
+case OFFLOAD_ABI_ILP32:
+  objcopy_argv[6] = "elf32-i386";
+  break;
+default:
+  gcc_unreachable ();
+}
   objcopy_argv[7] = target_so_filename;
   objcopy_argv[8] = "--rename-section";
   objcopy_argv[9] = rename_section_opt;
@@ -518,17 +538,22 @@ main (int argc, char **argv)
  passed with @file.  Expand them into argv before processing.  */
   expandargv (&argc, &argv);
 
-  /* Find out whether we should compile binaries for i386 or x86-64.  */
-  for (int i = argc - 1; i > 0; i--)
-if (strncmp (argv[i], "-foffload-abi=", sizeof ("-foffload-abi=") - 1) == 
0)
-  {
-   if (strstr

Re: [gomp4,committed] Remove release_dangling_ssa_names

2015-09-30 Thread Thomas Schwinge

Hi Tom!

On Wed, 30 Sep 2015 08:17:04 +0200, Tom de Vries  wrote:
> [ was: Re: [PATCH] Don't create superfluous parm in expand_omp_taskreg ]
> On 24/09/15 11:02, Thomas Schwinge wrote:
> > On Thu, 24 Sep 2015 08:36:27 +0200, Tom de Vries  
> > wrote:
> >> >On 24/09/15 08:23, Thomas Schwinge wrote:
> >>> > >On Tue, 11 Aug 2015 20:53:39 +0200, Tom de 
> >>> > >Vries  wrote:
>  > >>Don't create superfluous parm in expand_omp_taskreg
>  > >>
>  > >>2015-08-11  Tom de Vries
>  > >>
>  > >> * omp-low.c (expand_omp_taskreg): If in ssa, set rhs of parcopy 
>  > >> stmt to
>  > >> parm_decl, rather than generating a dummy default def in cfun.
>  > >> * tree-cfg.c (replace_ssa_name): Assume no default defs.  Make 
>  > >> sure
>  > >> ssa_name from cfun and child_fn do not share a stmt as def stmt.
>  > >> (move_stmt_op): Handle PARM_DECl.
>  > >> (gather_ssa_name_hash_map_from): New function.
>  > >> (move_sese_region_to_fn): Add default defs for function params, 
>  > >> and add
>  > >> them to vars_map.  Release copied ssa names.
>  > >> * tree-cfg.h (gather_ssa_name_hash_map_from): Declare.
> >>> > >
> >>> > >Do I understand correct that with this change present on trunk (which 
> >>> > >I'm
> >>> > >currently merging into gomp-4_0-branch), the changes you've earlier 
> >>> > >done
> >>> > >on gomp-4_0-branch to gcc/omp-low.c:release_dangling_ssa_names,
> >>> > >gcc/tree-cfg.c:replace_ssa_name, should now be reverted?  That is, how
> >>> > >much of the following patches can be reverted now (listed backwards in
> >>> > >time)?
> >> >
> >> >indeed, in the above commit we release the dangling ssa names in
> >> >move_sese_region_to_fn. So after committing this patch to the
> >> >gomp-4_0-branch, the call to release_dangling_ssa_names is no longer
> >> >necessary, and the function release_dangling_ssa_names can be removed.
> 
> 
> 
> >   Well, I'm asking because in my merge tree, I'm running
> >into an assertion that you added there -- not sure yet whether I've
> >done something wrong, though.
> 
> The source of the problem was

Thanks for quickly having provided me with a patch!

> in expand_omp_target, which needed similar 
> changes as expand_omp_taskreg got in the "Don't create superfluous parm 
> in expand_omp_taskreg" patch.

(For the curious, such a patch is not yet needed on trunk, where
expand_omp_target does not yet need to support the "gimple_in_ssa_p"
case.)

> Now that the merge ( 
> https://gcc.gnu.org/viewcvs/gcc/branches/gomp-4_0-branch/gcc/omp-low.c?limit_changes=0=228091=228090=228091
>  
> ) contains that change, I've committed these two patches to gomp-4_0-branch:
> - Revert "Fix release_dangling_ssa_names"
>(Reverting an earlier attempt to handle the
>release_dangling_ssa_names TODO, which was committed to the
>gomp-4_0-branch)
> - Remove release_dangling_ssa_names

Don't we also want to commit the following change, which was part of your
trunk r227103 (883f001d2c3672e0674bec71f36a2052734a72cf) commit (and now
shows up as a delta between trunk and gomp-4_0-branch)?

--- gcc/tree-cfg.c
+++ gcc/tree-cfg.c
@@ -6424,9 +6424,6 @@ replace_ssa_name (tree name, hash_map 
*vars_map,
  replace_by_duplicate_decl (&decl, vars_map, to_context);
  new_name = make_ssa_name_fn (DECL_STRUCT_FUNCTION (to_context),
   decl, SSA_NAME_DEF_STMT (name));
- if (SSA_NAME_IS_DEFAULT_DEF (name))
-   set_ssa_default_def (DECL_STRUCT_FUNCTION (to_context),
-decl, new_name);
}
   else
new_name = copy_ssa_name_fn (DECL_STRUCT_FUNCTION (to_context),


Grüße,
 Thomas



Re: [RFA][PATCH] Fix building cr16-elf with trunk compiler

2015-09-30 Thread Bernd Schmidt


On 09/29/2015 11:49 PM, Jeff Law wrote:


This code from builtins.c:

   /* If we don't need too much alignment, we'll have been guaranteed
  proper alignment by get_trampoline_type.  */
   if (TRAMPOLINE_ALIGNMENT <= STACK_BOUNDARY)
 return tramp;


It's entirely conceivable that TRAMPOLINE_ALIGNMENT will be the same as
STACK_BOUNDARY.  And if they are, then -Wtautological-compare will
complain bitterly.


Eww. Can we fix the warning not to complain when the comparison involves 
macros?



Bernd

Re: [RFA][PATCH] Fix building cr16-elf with trunk compiler

2015-09-30 Thread Marek Polacek

On Wed, Sep 30, 2015 at 02:41:36PM +0200, Bernd Schmidt wrote:
> On 09/29/2015 11:49 PM, Jeff Law wrote:
> >
> >This code from builtins.c:
> >
> >   /* If we don't need too much alignment, we'll have been guaranteed
> >  proper alignment by get_trampoline_type.  */
> >   if (TRAMPOLINE_ALIGNMENT <= STACK_BOUNDARY)
> > return tramp;
> >
> >
> >It's entirely conceivable that TRAMPOLINE_ALIGNMENT will be the same as
> >STACK_BOUNDARY.  And if they are, then -Wtautological-compare will
> >complain bitterly.
> 
> Eww. Can we fix the warning not to complain when the comparison involves
> macros?

It already has

  /* Don't warn for various macro expansions.  */
  if (from_macro_expansion_at (loc) 
  || from_macro_expansion_at (EXPR_LOCATION (lhs))
  || from_macro_expansion_at (EXPR_LOCATION (rhs)))
return;

and also

  /* We do not warn for constants because they are typical of macro
 expansions that test for features, sizeof, and similar.  */
  if (CONSTANT_CLASS_P (lhs) || CONSTANT_CLASS_P (rhs))
return;

so why does it warn? :(

Marek

Re: Fold acc_on_device

2015-09-30 Thread Richard Biener

On Wed, Sep 30, 2015 at 2:18 PM, Nathan Sidwell  wrote:
> On 09/30/15 04:07, Richard Biener wrote:
>>
>> On Tue, Sep 29, 2015 at 8:21 PM, Nathan Sidwell  wrote:
>>>
>>> This patch folds acc_on_device as a regular builtin, but postponed until
>>> we
>>> know which compiler we're in.  As suggested by Bernd, we use the existing
>>> builtin folding machinery.
>>>
>>> Trunk is still using  the older PTX runtime scheme (Thomas is working on
>>> that), so the only change there is in the  host-side libgomp piece.
>>>
>>> Ok for trunk?
>>
>>
>> Please don't add any new GENERIC based builtin folders.  Instead add to
>> gimple-fold.c:gimple_fold_builtin
>>
>> Otherwise you're just generating more work for us who move foldings from
>> builtins.c to gimple-fold.c.
>
>
> Oh, sorry, I didn't know about that.  Will fix.
>
> Should I use the same
>  if (symtab->state == EXPANSION)
> test to make sure we're after LTO read back (i.e. know which compiler we're
> in), or is there another way?

I don't know of a better way, no.  I'll add a comment to builtins.c
(not that I expect anyone sees it ;))

Richard.

>
> nathan
>

Re: [patch] Leave errno unchanged by successful std::stoi etc

2015-09-30 Thread Jonathan Wakely


On 29/09/15 10:50 -0600, Martin Sebor wrote:

On 09/29/2015 10:15 AM, Jakub Jelinek wrote:

On Tue, Sep 29, 2015 at 05:10:20PM +0100, Jonathan Wakely wrote:

That looks wrong to me, you only restore errno if you don't throw :(.
If you throw, then errno might remain 0, which is IMHO undesirable.


My thinking was that a failed conversion that throws an exception
should be allowed to modify errno, and that the second case sets it to
ERANGE sometimes anyway.


Well, you can modify errno, you just shouldn't change it from non-zero to
zero as far as the user is concerned.

http://pubs.opengroup.org/onlinepubs/009695399/functions/errno.html
"No function in this volume of IEEE Std 1003.1-2001 shall set errno to 0."
Of course, this part of STL is not POSIX, still, as you said, it would be
nice to guarantee the same.


FWIW, I agree. It's a helpful property. If libstdc++ provides
the POSIX/C guarantee it would be nice to document it in the
manual.

That said, this part of the C++ spec (stoi and related) is specified
to such a level of detail that one might argue that the functions
aren't allowed to reset errno in an observable way.


Yes, looking at the spec again it's not as simple as my original patch
or as simple as always restoring the previous value.

We are required to call one of the strtoX functions from the C
library, and if that sets errno C++ says nothing about resetting it,
see http://eel.is/c++draft/string.conversions

So we need to zero it to reliably detect ERANGE being set, but we
should undo that if it doesn't get set (i.e. is still zero).  But if
the strtoX function sets errno to anything non-zero then it looks as
though that should be left unchanged.

So the scope-guard type should do:

   ~_Restore_errno() { if (errno == 0) errno = _M_errno; }


I noticed that we're also zeroing errno in the generic c_locale.cc
file so we need something similar there.
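
A standalone sketch of that behaviour, outside the library and with
hypothetical names (RestoreErrno and parse_long are not libstdc++
identifiers), might look like this.  It assumes the rule described above:
zero errno before the strtoX call, and on exit put the old value back only
if errno is still zero:

```cpp
#include <cerrno>
#include <cstdlib>
#include <stdexcept>

// Scope guard: remember errno, clear it so ERANGE can be detected
// reliably, and restore the old value only if nothing set errno meanwhile.
struct RestoreErrno
{
  RestoreErrno () : saved (errno) { errno = 0; }
  ~RestoreErrno () { if (errno == 0) errno = saved; }
  int saved;
};

// Hypothetical stoi-like wrapper around strtol.
long parse_long (const char *s)
{
  const RestoreErrno guard;
  char *end = nullptr;
  const long value = std::strtol (s, &end, 10);
  if (end == s)
    throw std::invalid_argument ("parse_long");
  if (errno == ERANGE)
    throw std::out_of_range ("parse_long");
  return value;
}
```

A successful call leaves a previously non-zero errno untouched, while an
out-of-range input still leaves ERANGE visible to the caller after the
exception has propagated.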


As an aside, I objected to this specification when it was first
proposed, not because of the errno guarantee, but because the
functions were meant to be light-weight, efficient, and certainly
thread-safe means of converting strings to numbers. Specifying
their effects as opposed to their postconditions means that can't
be implemented independent of strtol and the C locale, which makes
them anything but light-weight, and prone to data races in
programs that call setlocale.


Agreed, but that's not a cause I want to fight for :-)

commit 502928c8061343e82e982e06299c11d465f64b6c
Author: Jonathan Wakely 
Date:   Wed Sep 30 14:10:58 2015 +0100

Save-and-restore errno more carefully in libstdc++

	* doc/xml/manual/diagnostics.xml: Document use of errno.
	* config/locale/generic/c_locale.cc (_Save_errno): New helper.
	(__convert_to_v): Use _Save_errno.
	* include/ext/string_conversions.h (__stoa): Only restore errno when
	it isn't set to non-zero.

diff --git a/libstdc++-v3/config/locale/generic/c_locale.cc b/libstdc++-v3/config/locale/generic/c_locale.cc
index 6da5f22..8dfea6b 100644
--- a/libstdc++-v3/config/locale/generic/c_locale.cc
+++ b/libstdc++-v3/config/locale/generic/c_locale.cc
@@ -44,6 +44,16 @@ namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
+  namespace
+  {
+struct _Save_errno
+{
+  _Save_errno() : _M_errno(errno) { errno = 0; }
+  ~_Save_errno() { if (errno == 0) errno = _M_errno; }
+  int _M_errno;
+};
+  }
+
   template<>
 void
 __convert_to_v(const char* __s, float& __v, ios_base::iostate& __err,
@@ -59,7 +69,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   bool __overflow = false;
 
 #if !__FLT_HAS_INFINITY__
-  errno = 0;
+  const _Save_errno __save_errno;
 #endif
 
 #ifdef _GLIBCXX_HAVE_STRTOF
@@ -123,7 +133,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   char* __sanity;
 
 #if !__DBL_HAS_INFINITY__
-  errno = 0;
+  const _Save_errno __save_errno;
 #endif
 
   __v = strtod(__s, &__sanity);
@@ -167,7 +177,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   setlocale(LC_ALL, "C");
 
 #if !__LDBL_HAS_INFINITY__
-  errno = 0;
+  const _Save_errno __save_errno;
 #endif
 
 #if defined(_GLIBCXX_HAVE_STRTOLD) && !defined(_GLIBCXX_HAVE_BROKEN_STRTOLD)
diff --git a/libstdc++-v3/doc/xml/manual/diagnostics.xml b/libstdc++-v3/doc/xml/manual/diagnostics.xml
index 88ed2e2..3ceb5b3 100644
--- a/libstdc++-v3/doc/xml/manual/diagnostics.xml
+++ b/libstdc++-v3/doc/xml/manual/diagnostics.xml
@@ -71,6 +71,38 @@
   
 
 
+Use of errno by the library
+
+  
+The C and POSIX standards guarantee that errno
+is never set to zero by any library function.
+The C++ standard has less to say about when errno
+is or isn't set, but libstdc++ follows the same rule and never sets
+it to zero.
+  
+
+  
+On the other hand, there are few guarantees about when the C++ library
+sets errno on error, beyond what is specified for
+functions that come from the C library.
+For example, when std::stoi throws an

Re: [RS6000] Make -msingle-pic-base remove the ELFv2 global entry code

2015-09-30 Thread David Edelsohn

On Wed, Sep 30, 2015 at 2:11 AM, Alan Modra  wrote:
> For other ABIs, -msingle-pic-base makes gcc omit loading of the PIC
> register in function prologues.  This patch makes the option affect
> ELFv2 too.
>
> I wrote a patch like this during the initial ELFv2 effort, but there
> were many more important patches to push and this one somehow got
> dropped.  Dusted off and retested at the request of powerpc64 kernel
> people who'd like an option to disable ELFv2 global entry code for
> the kernel.  OK mainline?
>
> * config/rs6000/rs6000.c (rs6000_emit_prologue): Don't set
> r2_setup_needed when TARGET_SINGLE_PIC_BASE.
> (rs6000_output_mi_thunk): Likewise.

Okay.

Thanks, David

Re: JIT breakage after last builtin-types change

2015-09-30 Thread Ulrich Drepper

Except that this is missing:

diff --git a/gcc/jit/jit-builtins.h b/gcc/jit/jit-builtins.h
index 0b6f974..3d76247 100644
--- a/gcc/jit/jit-builtins.h
+++ b/gcc/jit/jit-builtins.h
@@ -72,6 +72,7 @@ enum jit_builtin_type
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
 #undef DEF_POINTER_TYPE
   BT_LAST


On Wed, Sep 30, 2015 at 9:09 AM, Jakub Jelinek  wrote:
> On Wed, Sep 30, 2015 at 09:05:45AM -0400, Ulrich Drepper wrote:
>> After some recent additions to builtin-types.def the jit user of the
>> definitions hasn't been updated.  OK to apply?
>>
>>
>> 2015-09-30  Ulrich Drepper  
>>
>>   * jit-builtins.c: Provide definition of DEF_FUNCTION_TYPE_VAR_6.
>>   * jit-builtins.h: Likewise.
>
> https://gcc.gnu.org/viewcvs?rev=228289=gcc=rev should fix this
> already.
>
> Jakub

Re: Openacc launch API

2015-09-30 Thread Matthias Klose


On 30.09.2015 14:40, Bernd Schmidt wrote:

On 09/30/2015 02:37 PM, Matthias Klose wrote:


this broke the jit build.

The following patch fixes the build for me. Ok to commit?

   Matthias

2015-09-30  Matthias Klose  

 * jit-builtins.h Define DEF_FUNCTION_TYPE_VAR_6,
 remove DEF_FUNCTION_TYPE_VAR_11.
 * jit-builtins.c (builtins_manager::make_type): Define and handle
 DEF_FUNCTION_TYPE_VAR_6, remove DEF_FUNCTION_TYPE_VAR_11.


Yeah, I think that qualifies as obvious.


Ok, committed.

Re: [PATCH] PR66870 PowerPC64 Enable gold linker with split stack

2015-09-30 Thread Lynn A. Boger


Any update on this patch?

On 09/18/2015 07:48 AM, David Edelsohn wrote:

On Thu, Sep 17, 2015 at 3:13 PM, Lynn A. Boger
 wrote:

Here is my updated patch, with the changes suggested by
Ian for gcc/gospec.c and David for gcc/configure.ac.

Bootstrap built and tested on ppc64le, ppc64 multilib.

2015-09-17  Lynn Boger 
gcc/
 PR target/66870
 config/rs6000/sysv4.h:  Define TARGET_CAN_SPLIT_STACK_64BIT
 config.in:  Set up HAVE_GOLD_ALTERNATE_SPLIT_STACK
 configure.ac:  Define HAVE_GOLD_ALTERNATE_SPLIT_STACK
 on Power based on gold linker version
 configure:  Regenerate
 gcc.c:  Add -fuse-ld=gold to STACK_SPLIT_SPEC if
 HAVE_GOLD_ALTERNATE_SPLIT_STACK defined
 go/gospec.c:  (lang_specific_driver):  Set appropriate split
stack
 options for 64 bit compiles based on
TARGET_CAN_SPLIT_STACK_64BIT

The rs6000 bits are okay with me.

Ian needs to approve the go bits.  And Ian or a configure maintainer
needs to approve the other bits.

Thanks, David

Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

2015-09-30 Thread Richard Biener

On Wed, Sep 23, 2015 at 5:51 PM, Alan Hayward  wrote:
>
>
> On 18/09/2015 14:53, "Alan Hayward"  wrote:
>
>>
>>
>>On 18/09/2015 14:26, "Alan Lawrence"  wrote:
>>
>>>On 18/09/15 13:17, Richard Biener wrote:

 Ok, I see.

 That this case is already vectorized is because it implements MAX_EXPR,
 modifying it slightly to

 int foo (int *a)
 {
int val = 0;
for (int i = 0; i < 1024; ++i)
  if (a[i] > val)
val = a[i] + 1;
return val;
 }

 makes it no longer handled by current code.

>>>
>>>Yes. I believe the idea for the patch is to handle arbitrary expressions
>>>like
>>>
>>>int foo (int *a)
>>>{
>>>int val = 0;
>>>for (int i = 0; i < 1024; ++i)
>>>  if (some_expression (i))
>>>val = another_expression (i);
>>>return val;
>>>}
>>
>>Yes, that’s correct. Hopefully my new test cases should cover everything.
>>
>
> Attached is a new version of the patch containing all the changes
> requested by Richard.

+  /* Compare the max index vector to the vector of found indexes to find
+the postion of the max value.  This will result in either a single
+match or all of the values.  */
+  tree vec_compare = make_ssa_name (index_vec_type_signed);
+  gimple vec_compare_stmt = gimple_build_assign (vec_compare, EQ_EXPR,
+induction_index,
+max_index_vec);

I'm not sure all targets can handle this.  If I decipher the code
correctly then we do

  mask = induction_index == max_index_vec;
  vec_and = mask & vec_data;

plus some casts.  So this is basically

  vec_and = induction_index == max_index_vec ? vec_data : {0, 0, ... };

without the need to relate the induction index vector type to the data
vector type.
I believe this is also the form all targets support.
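
A per-lane scalar sketch of that equivalence, purely for illustration: the
function names are hypothetical, a fixed four-lane "vector" is modelled with
std::array, and a true comparison is assumed to yield an all-ones mask as
vector compares usually do.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t N = 4;
using Vec = std::array<std::int32_t, N>;

// Form 1: build a mask from the comparison, then AND it with the data.
Vec select_by_mask (const Vec &idx, const Vec &max_idx, const Vec &data)
{
  Vec out{};
  for (std::size_t i = 0; i < N; ++i)
    {
      const std::int32_t mask = (idx[i] == max_idx[i]) ? -1 : 0;
      out[i] = mask & data[i];
    }
  return out;
}

// Form 2: a per-lane conditional select with a zero vector as the
// "else" operand, i.e. the VEC_COND_EXPR-like form.
Vec select_by_cond (const Vec &idx, const Vec &max_idx, const Vec &data)
{
  Vec out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = (idx[i] == max_idx[i]) ? data[i] : 0;
  return out;
}
```

Both forms keep the data only in the lanes whose index matches the maximum
index and produce zero elsewhere; the second one does not need the index and
data vectors to have related types.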

I am missing a comment before all this code-generation that shows the
transform result with the variable names used in the code-gen.  I have a
hard time connecting things here.

+  tree matched_data_reduc_cast = build1 (VIEW_CONVERT_EXPR, scalar_type,
+matched_data_reduc);
+  epilog_stmt = gimple_build_assign (new_scalar_dest,
+matched_data_reduc_cast);
+  new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
+  gimple_assign_set_lhs (epilog_stmt, new_temp);

this will leave the stmt unsimplified.  scalar sign-changes should use NOP_EXPR,
not VIEW_CONVERT_EXPR.  The easiest fix is to use fold_convert instead.
Also just do like before - first make_ssa_name and then directly use it in the
gimple_build_assign.

The patch is somewhat hard to parse with all the indentation changes.  A context
diff would be much easier to read in those contexts.

+  if (v_reduc_type == COND_REDUCTION)
+{
+  widest_int ni;
+
+  if (! max_loop_iterations (loop, &ni))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"loop count not known, cannot create cond "
+"reduction.\n");

ugh.  That's bad.

+  /* The additional index will be the same type as the condition.  Check
+that the loop can fit into this less one (because we'll use up the
+zero slot for when there are no matches).  */
+  tree max_index = TYPE_MAX_VALUE (cr_index_scalar_type);
+  if (wi::geu_p (ni, wi::to_widest (max_index)))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"loop size is greater than data size.\n");
+ return false;

Likewise.

@@ -5327,6 +5540,8 @@ vectorizable_reduction (gimple stmt,
gimple_stmt_iterator *gsi,
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n");

+  STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
+
   /* FORNOW: Multiple types are not supported for condition.  */
   if (code == COND_EXPR)

this change looks odd (or wrong).  The type should be _only_ set/changed during
analysis.

+
+  /* For cond reductions we need to add an additional conditional based on
+the loop index.  */
+  if (v_reduc_type == COND_REDUCTION)
+   {
+ int nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
+ int k;
...
+ STMT_VINFO_VECTYPE (index_vec_info) = cr_index_vector_type;
+ set_vinfo_for_stmt (index_condition, index_vec_info);
+
+ /* Update the phi with the vec cond.  */
+ add_phi_arg (new_phi, cond_name, loop_latch_edge (loop),
+  UNKNOWN_LOCATION);

same as before - I am missing a comment that shows the generated code
and connects the local vars used.


+ tree ccompare_name = make_ssa_name (TREE_TYPE (ccompare));
+ gimple ccompare_stmt =

PING^2: [gcc-5-branch][PATCH] PR rtl-optimization/67029: gcc-5.2.0 unable to find a register to spill with O3 fsched-pressure fschedule-insns

2015-09-30 Thread H.J. Lu

OK to back port it to GCC 5?

On Thu, Sep 10, 2015 at 9:27 AM, H.J. Lu  wrote:
> On Fri, Aug 7, 2015 at 12:38 PM, H.J. Lu  wrote:
>> On Thu, Aug 6, 2015 at 11:19 AM, Richard Sandiford
>>  wrote:
>>> "H.J. Lu"  writes:
 Since ira_implicitly_set_insn_hard_regs may be called outside of
 ira-lives.c, it can't use the local variable, preferred_alternatives.
 This patch adds an alternative_mask argument to
 ira_implicitly_set_insn_hard_regs.

 OK for master and 5 branch if there are no regressions on Linux/x86-64?
>>>
>>> Thanks for working on this.  The patch looks good to me FWIW.
>>>
>>> I think this version is safer than the second one you posted.  With that
>>> version we could end up with the same sort of bug, e.g. because a function
>>> passes false to extract_insn without realising that one of the functions
>>> that it calls later needs the preferred alternatives to be set.
>>> Or maybe (as with my patch) a function starts to use preferred_alternatives
>>> and one of its callers gets missed.
>>>
>>> Sorry for the breakage.
>>>
>>> Thanks,
>>> Richard
>>>
 H.J.
 ---
 gcc/

   PR rtl-optimization/67029
   * ira-color.c: Include "recog.h" before including "ira-int.h".
   * target-globals.c: Likewise.
   * ira-lives.c (ira_implicitly_set_insn_hard_regs): Add an
   alternative_mask argument and use it instead of
   preferred_alternatives.
   * ira.h (ira_implicitly_set_insn_hard_regs): Moved to ...
   * ira-int.h (ira_implicitly_set_insn_hard_regs): Here.
   * sched-deps.c: Include "ira-int.h" after including "ira.h".
   (sched_analyze_insn): Update call to
   ira_implicitly_set_insn_hard_regs.
   * sel-sched.c: Include "ira-int.h" after including "ira.h".
   (implicit_clobber_conflict_p): Update call to
   ira_implicitly_set_insn_hard_regs.

 gcc/testsuite/

   PR rtl-optimization/67029
   * gcc.dg/pr67029.c: New test.
>>
>> Here is the backport for gcc-5-branch.  OK for gcc-5-branch?
>>
>>
>> H.J.
>
> PING.
>
> --
> H.J.

-- 
H.J.

JIT breakage after last builtin-types change

2015-09-30 Thread Ulrich Drepper

After some recent additions to builtin-types.def the jit user of the
definitions hasn't been updated.  OK to apply?


2015-09-30  Ulrich Drepper  

* jit-builtins.c: Provide definition of DEF_FUNCTION_TYPE_VAR_6.
* jit-builtins.h: Likewise.
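
For readers unfamiliar with the pattern, here is a reduced sketch of why a
new DEF_FUNCTION_TYPE_VAR_6 entry breaks users that don't define the macro.
It is an assumption-laden model, not the jit code: SKETCH_BUILTIN_TYPES
stands in for the contents of builtin-types.def, the BT_SKETCH_* names are
made up, and the macros take only a name.

```cpp
// Stand-in for the .def file: a list of entries, each wrapped in the
// macro for its kind.
#define SKETCH_BUILTIN_TYPES \
  DEF_FUNCTION_TYPE_VAR_5 (BT_SKETCH_FIVE) \
  DEF_FUNCTION_TYPE_VAR_6 (BT_SKETCH_SIX)

// Every macro named in the list has to be defined by each user before
// the list is expanded.  If DEF_FUNCTION_TYPE_VAR_6 were missing here,
// its entry would be left unexpanded inside the enum and fail to compile.
#define DEF_FUNCTION_TYPE_VAR_5(NAME) NAME,
#define DEF_FUNCTION_TYPE_VAR_6(NAME) NAME,

enum sketch_builtin_type
{
  SKETCH_BUILTIN_TYPES
  BT_SKETCH_LAST
};

// Clean up so that the next user can supply its own definitions.
#undef DEF_FUNCTION_TYPE_VAR_5
#undef DEF_FUNCTION_TYPE_VAR_6
```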


 jit-builtins.c |5 +
 jit-builtins.h |3 +++
 2 files changed, 8 insertions(+)

diff --git a/gcc/jit/jit-builtins.c b/gcc/jit/jit-builtins.c
index a29f446..8a89915 100644
--- a/gcc/jit/jit-builtins.c
+++ b/gcc/jit/jit-builtins.c
@@ -320,6 +320,10 @@ builtins_manager::make_type (enum jit_builtin_type type_id)
 #define DEF_FUNCTION_TYPE_VAR_5(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
   case ENUM: return make_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, \
  ARG4, ARG5);
+#define DEF_FUNCTION_TYPE_VAR_6(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+   ARG6) \
+  case ENUM: return make_fn_type (ENUM, RETURN, 1, 6, ARG1, ARG2, ARG3, \
+ ARG4, ARG5, ARG6);
 #define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
ARG6, ARG7) \
   case ENUM: return make_fn_type (ENUM, RETURN, 1, 7, ARG1, ARG2, ARG3, \
@@ -350,6 +354,7 @@ builtins_manager::make_type (enum jit_builtin_type type_id)
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
 #undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
diff --git a/gcc/jit/jit-builtins.h b/gcc/jit/jit-builtins.h
index fdf1323..8854326 100644
--- a/gcc/jit/jit-builtins.h
+++ b/gcc/jit/jit-builtins.h
@@ -50,6 +50,8 @@ enum jit_builtin_type
 #define DEF_FUNCTION_TYPE_VAR_4(NAME, RETURN, ARG1, ARG2, ARG3, ARG4) NAME,
 #define DEF_FUNCTION_TYPE_VAR_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
NAME,
+#define DEF_FUNCTION_TYPE_VAR_6(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+   ARG6) NAME,
 #define DEF_FUNCTION_TYPE_VAR_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
ARG6, ARG7) NAME,
 #define DEF_FUNCTION_TYPE_VAR_11(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
@@ -72,6 +74,7 @@ enum jit_builtin_type
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
 #undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE

Re: Fold acc_on_device

2015-09-30 Thread Nathan Sidwell


On 09/30/15 04:07, Richard Biener wrote:

On Tue, Sep 29, 2015 at 8:21 PM, Nathan Sidwell  wrote:

This patch folds acc_on_device as a regular builtin, but postponed until we
know which compiler we're in.  As suggested by Bernd, we use the existing
builtin folding machinery.

Trunk is still using  the older PTX runtime scheme (Thomas is working on
that), so the only change there is in the  host-side libgomp piece.

Ok for trunk?


Please don't add any new GENERIC based builtin folders.  Instead add to
gimple-fold.c:gimple_fold_builtin

Otherwise you're just generating more work for us who move foldings from
builtins.c to gimple-fold.c.


Oh, sorry, I didn't know about that.  Will fix.

Should I use the same
 if (symtab->state == EXPANSION)
test to make sure we're after LTO read back (i.e. know which compiler we're in), 
or is there another way?


nathan

Re: Openacc launch API

2015-09-30 Thread Bernd Schmidt


On 09/30/2015 02:37 PM, Matthias Klose wrote:


this broke the jit build.

The following patch fixes the build for me. Ok to commit?

   Matthias

2015-09-30  Matthias Klose  

 * jit-builtins.h Define DEF_FUNCTION_TYPE_VAR_6,
 remove DEF_FUNCTION_TYPE_VAR_11.
 * jit-builtins.c (builtins_manager::make_type): Define and handle
 DEF_FUNCTION_TYPE_VAR_6, remove DEF_FUNCTION_TYPE_VAR_11.


Yeah, I think that qualifies as obvious.


Bernd

Re: [Patch] [x86_64] znver1 enablement

2015-09-30 Thread Uros Bizjak

On Wed, Sep 30, 2015 at 12:05 PM, Kumar, Venkataramanan
 wrote:
> Hi Maintainers,
>
> The attached patch enables -march=znver1 (AMD family 17h Zen processor).
>
> Costs and tunings are copied from bdver4,  but we will be adjusting them 
> later for znver1.
> Also a basic scheduler description for znver1 is added and we will update 
> this as we get more information.
>
> Testing :
> GCC bootstrap and gcc regression passes on x86_64-pc-linux-gnu.
> GCC bootstrap passed with  "make BOOT_CFLAGS= -O2 -g -march=znver1 -mno-adx 
> -mno-mwaitx -mno-clzero -mno-sha -mno-clflushopt -mno-rdseed" on 
> x86_64-pc-linux-gnu .
>
> Built SPEC2006 benchmarks with -march=znver1 and ran it on bdver4 machine.
>
> Wrf and Calculix failed to compile but looks like a general register 
> allocation issue not restricted to -march=znver1.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67717
>
> ChangeLog:
> * config.gcc (i[34567]86-*-linux* | ...): Add znver1.
> (case ${target}): Add znver1.
> * config/i386/cpuid.h(bit_CLZERO):  Define.
> * config/i386/driver-i386.c: (host_detect_local_cpu): Let
> -march=native recognize znver1 processors.
> * config/i386/i386-c.c (ix86_target_macros_internal): Add
> znver1, clzero def_and_undef.
> * config/i386/i386.c (struct processor_costs znver1_cost): 
> New.
> (m_znver1): New definition.
> (m_AMD_MULTIPLE): Includes m_znver1.
> (processor_target_table): Add znver1 entry.
> (ix86_target_string) : Add clzero entry.
> (static const char *const cpu_names): Add znver1 entry.
> (ix86_option_override_internal): Add znver1 instruction sets.
> (PTA_CLZERO) :  New definition.
> (ix86_option_override_internal): Handle new clzerooption.
> (ix86_issue_rate): Add znver1.
> (ix86_adjust_cost): Add znver1.
> (get_builtin_code_for_version): Set priority for 
> PROCESSOR_ZNVER1.
> (ia32_multipass_dfa_lookahead): Add znver1.
> (enum processor_model): Add M_AMDFAM17H_znver1.
> (struct _arch_names_table): Add M_AMDFAM17H_znver1.
> (has_dispatch): Add znver1.
> * config/i386/i386.h (TARGET_znver1): New definition.
> (TARGET_CLZERO): Define.
> (TARGET_CLZERO_P): Define.
> (struct ix86_size_cost): Add TARGET_ZNVER1.
> (enum processor_type): Add PROCESSOR_znver1.
> * config/i386/i386.md (define_attr "cpu"): Add znver1.
> (set_attr znver1_decode): New definitions for znver1.
> * config/i386/i386.opt (flag_dispatch_scheduler): Add znver1.
> (mclzero): New.
> * config/i386/mmx.md (set_attr znver1_decode): New definitions
> for znver1.
> * config/i386/sse.md (set_attr znver1_decode): Likewise.
> * config/i386/x86-tune.def:  Add znver1 tunings.
> * config/i386/znver1.md: Introduce znver1 cpu and include new 
> md file.
> * gcc/doc/extend.texi: Add details about znver1.
> * gcc/doc/invoke.texi: Add details about znver1.
>
> Ok for trunk?

Please remove changes to get_builtin_code_for_version and
fold_builtin_cpu. They should be committed in a future patch, together
with relevant libgcc changes to libgcc/config/i386/cpuinfo.c.

+++ b/gcc/config.gcc
@@ -592,7 +592,7 @@ pentium4 pentium4m pentiumpro prescott iamcu"
 # 64-bit x86 processors supported by --with-arch=.  Each processor
 # MUST be separated by exactly one space.
 x86_64_archs="amdfam10 athlon64 athlon64-sse3 barcelona bdver1 bdver2 \
-bdver3 bdver4 btver1 btver2 k8 k8-sse3 opteron opteron-sse3 nocona \
+bdver3 bdver4 btver1 btver2 k8 k8-sse3 opteron opteron-sse3 nocona znver1 \
 core2 corei7 corei7-avx core-avx-i core-avx2 atom slm nehalem westmere \

Please group the new processor with other AMD processors.

+++ b/gcc/config/i386/i386-c.c

 | PTA_MOVBE | PTA_MWAITX},
-  {"btver1", PROCESSOR_BTVER1, CPU_GENERIC,
+ {"znver1", PROCESSOR_ZNVER1, CPU_ZNVER1,

Wrong indentation.

+++ b/gcc/config/i386/driver-i386.c
@@ -414,6 +414,7 @@ const char *host_detect_local_cpu (int argc, const
char **argv)
   unsigned int has_avx512dq = 0, has_avx512bw = 0, has_avx512vl = 0;
   unsigned int has_avx512vbmi = 0, has_avx512ifma = 0, has_clwb = 0;
   unsigned int has_pcommit = 0, has_mwaitx = 0;
+  unsigned int has_clzero = 0;

This option should be added to the options string. Please see the end
of driver-i386.c for how other options are processed.

+++ b/gcc/config/i386/i386.opt
@@ -569,8 +569,8 @@ the function.

 mdispatch-scheduler
 Target RejectNegative Var(flag_dispatch_scheduler)
-Do dispatch scheduling if processor is

Re: Openacc launch API

2015-09-30 Thread Nathan Sidwell


On 09/30/15 08:37, Matthias Klose wrote:

On 25.08.2015 15:29, Nathan Sidwell wrote:

Jakub,

This patch changes the launch API for openacc parallels.


this broke the jit build.

The following patch fixes the build for me. Ok to commit?

   Matthias

2015-09-30  Matthias Klose  

 * jit-builtins.h Define DEF_FUNCTION_TYPE_VAR_6,
 remove DEF_FUNCTION_TYPE_VAR_11.
 * jit-builtins.c (builtins_manager::make_type): Define and handle
 DEF_FUNCTION_TYPE_VAR_6, remove DEF_FUNCTION_TYPE_VAR_11.


Looks obvious to me.  Sorry for the breakage.
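For readers unfamiliar with why a missing macro breaks the jit build, here is a minimal sketch of the pattern involved. The jit code expands one shared list of entries twice, once into an enum and once into switch cases, so every `DEF_FUNCTION_TYPE_*` macro used by the list has to be defined at both expansion sites. The list `SKETCH_TYPES` and the `BT_SKETCH_*` names are invented for this sketch and stand in for builtin-types.def; the real macros take more arguments.

```c
/* Stand-in for builtin-types.def: a list that invokes one macro per
   entry.  All names here are made up for this sketch.  */
#define SKETCH_TYPES \
  DEF_FUNCTION_TYPE_VAR_5 (BT_SKETCH_VAR5) \
  DEF_FUNCTION_TYPE_VAR_6 (BT_SKETCH_VAR6) \
  DEF_FUNCTION_TYPE_VAR_7 (BT_SKETCH_VAR7)

/* First expansion: build the enum, in the manner of jit-builtins.h.  */
#define DEF_FUNCTION_TYPE_VAR_5(NAME) NAME,
#define DEF_FUNCTION_TYPE_VAR_6(NAME) NAME,
#define DEF_FUNCTION_TYPE_VAR_7(NAME) NAME,
enum sketch_type { SKETCH_TYPES BT_SKETCH_LAST };
#undef DEF_FUNCTION_TYPE_VAR_5
#undef DEF_FUNCTION_TYPE_VAR_6
#undef DEF_FUNCTION_TYPE_VAR_7

/* Second expansion: build switch cases, in the manner of
   jit-builtins.c.  Returns the number of fixed arguments.  */
static int
sketch_num_fixed_args (enum sketch_type t)
{
  switch (t)
    {
#define DEF_FUNCTION_TYPE_VAR_5(NAME) case NAME: return 5;
#define DEF_FUNCTION_TYPE_VAR_6(NAME) case NAME: return 6;
#define DEF_FUNCTION_TYPE_VAR_7(NAME) case NAME: return 7;
      SKETCH_TYPES
#undef DEF_FUNCTION_TYPE_VAR_5
#undef DEF_FUNCTION_TYPE_VAR_6
#undef DEF_FUNCTION_TYPE_VAR_7
    default:
      return -1;
    }
}
```

If the list gains a `DEF_FUNCTION_TYPE_VAR_6` entry and either expansion site lacks the matching `#define`, the entry is left as an undeclared call and the file no longer compiles, which is the kind of breakage reported here.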


nathan

Re: [PATCH] x86 interrupt attribute

2015-09-30 Thread H.J. Lu

On Wed, Sep 30, 2015 at 5:36 AM, Yulia Koval  wrote:
> Hi,
>
> Thanks. I added all fixes to the patch, bootstrapped/regtested it on
> Linux/x86_64. Linux/i686 in progress. Ok for trunk if testing passes
> successfully?
>

We will work on the "no_caller_saved_registers" attribute, or
something like that, later as an optimization.

Thanks.

-- 
H.J.

[PATCH] Add toplevel comment to builtins.c - don't add simplifications here

2015-09-30 Thread Richard Biener


Committed.

2015-09-30  Richard Biener  

* builtins.c: Add comment that no new simplifications should
be added here.

Index: gcc/builtins.c
===
--- gcc/builtins.c  (revision 228279)
+++ gcc/builtins.c  (working copy)
@@ -17,6 +17,10 @@ You should have received a copy of the G
 along with GCC; see the file COPYING3.  If not see
 .  */
 
+/* Legacy warning!  Please add no further builtin simplifications here
+   (apart from pure constant folding) - builtin simplifications should go
+   to match.pd or gimple-fold.c instead.  */
+
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"

[patch] Remove empty libstdc++ directories

2015-09-30 Thread Matthias Klose


Removing two empty directories, approved by Jonathan on IRC.

  Matthias

2015-09-30  Matthias Klose  

* config/cpu/alpha, config/cpu/ia64: Remove empty directories.

Re: JIT breakage after last builtin-types change

2015-09-30 Thread Jakub Jelinek

On Wed, Sep 30, 2015 at 09:05:45AM -0400, Ulrich Drepper wrote:
> After some recent additions to builtin-types.def the jit user of the
> definitions hasn't been updated.  OK to apply?
> 
> 
> 2015-09-30  Ulrich Drepper  
> 
>   * jit-builtins.c: Provide definition of DEF_FUNCTION_TYPE_VAR_6.
>   * jit-builtins.h: Likewise.

https://gcc.gnu.org/viewcvs?rev=228289&root=gcc&view=rev should fix this
already.

Jakub

[PATCH][25/25] Remove GENERIC stmt combining from SCCVN

2015-09-30 Thread Richard Biener


This is the last patch in the series and it finally ditches the
stmt combining code from SCCVN which uses GENERIC.  I've been sitting
on this for a while because of the "bad" interface that new mprts_hook
is but I couldn't think of a better way than completely refactoring
stmt folding into more C++ (and I'm not even sure what that end result
would look like).  So rather than pondering on this forever, the following
patch goes forward.

The net result is that there will hopefully be no regressions (I know
about a few corner cases I found by plastering the code with asserts,
but I do not consider them important) but progress both with regard
to compile-time / memory-use and optimization (because the new code
is strictly more powerful, not relying on the has_constants heuristic).

This is also the last major piece that was sitting on the
match-and-simplify branch.

Bootstrapped on x86_64-unknown-linux-gnu, re-testing in progress.

Richard.

2015-09-30  Richard Biener  

* gimple-match.h (mprts_hook): Declare.
* gimple-match.head.c (mprts_hook): Define.
(maybe_push_res_to_seq): Use new hook.
* gimple-fold.c (gimple_fold_stmt_to_constant_1): Likewise.
* tree-ssa-sccvn.h (vn_ssa_aux::expr): Change to a gimple_seq.
(vn_ssa_aux::has_constants): Remove.
* tree-ssa-sccvn.c: Include gimple-match.h.
(VN_INFO_GET): Assert we don't re-use SSA names.
(vn_get_expr_for): Remove.
(expr_has_constants): Likewise.
(stmt_has_constants): Likewise.
(simplify_binary_expression): Likewise.
(simplify_unary_expression): Likewise.
(vn_lookup_simplify_result): New hook.
(visit_copy): Adjust.
(visit_reference_op_call): Likewise.
(visit_phi): Likewise.
(visit_use): Likewise.
(process_scc): Likewise.
(init_scc_vn): Likewise.
(visit_reference_op_load): Likewise.  Use match-and-simplify and
a gimple seq for inserted expressions.
(try_to_simplify): Remove GENERIC stmt combining code.
(sccvn_dom_walker::before_dom_children): Use match-and-simplify.
* tree-ssa-pre.c (eliminate_insert): Adjust.
(eliminate_dom_walker::before_dom_children): Likewise.

Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c.orig   2015-09-30 14:49:24.613211956 +0200
--- gcc/tree-ssa-sccvn.c2015-09-30 14:58:48.018555783 +0200
*** along with GCC; see the file COPYING3.
*** 58,63 
--- 58,64 
  #include "domwalk.h"
  #include "cgraph.h"
  #include "gimple-iterator.h"
+ #include "gimple-match.h"
  
  /* This algorithm is based on the SCC algorithm presented by Keith
 Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
*** VN_INFO_GET (tree name)
*** 401,406 
--- 402,409 
  {
vn_ssa_aux_t newinfo;
  
+   gcc_assert (SSA_NAME_VERSION (name) >= vn_ssa_aux_table.length ()
+ || vn_ssa_aux_table[SSA_NAME_VERSION (name)] == NULL);
newinfo = XOBNEW (&vn_ssa_aux_obstack, struct vn_ssa_aux);
memset (newinfo, 0, sizeof (struct vn_ssa_aux));
if (SSA_NAME_VERSION (name) >= vn_ssa_aux_table.length ())
*** VN_INFO_GET (tree name)
*** 410,501 
  }
  
  
- /* Get the representative expression for the SSA_NAME NAME.  Returns
-the representative SSA_NAME if there is no expression associated with it.  */
- 
- tree
- vn_get_expr_for (tree name)
- {
-   vn_ssa_aux_t vn = VN_INFO (name);
-   gimple *def_stmt;
-   tree expr = NULL_TREE;
-   enum tree_code code;
- 
-   if (vn->valnum == VN_TOP)
- return name;
- 
-   /* If the value-number is a constant it is the representative
-  expression.  */
-   if (TREE_CODE (vn->valnum) != SSA_NAME)
- return vn->valnum;
- 
-   /* Get to the information of the value of this SSA_NAME.  */
-   vn = VN_INFO (vn->valnum);
- 
-   /* If the value-number is a constant it is the representative
-  expression.  */
-   if (TREE_CODE (vn->valnum) != SSA_NAME)
- return vn->valnum;
- 
-   /* Else if we have an expression, return it.  */
-   if (vn->expr != NULL_TREE)
- return vn->expr;
- 
-   /* Otherwise use the defining statement to build the expression.  */
-   def_stmt = SSA_NAME_DEF_STMT (vn->valnum);
- 
-   /* If the value number is not an assignment use it directly.  */
-   if (!is_gimple_assign (def_stmt))
- return vn->valnum;
- 
-   /* Note that we can valueize here because we clear the cached
-  simplified expressions after each optimistic iteration.  */
-   code = gimple_assign_rhs_code (def_stmt);
-   switch (TREE_CODE_CLASS (code))
- {
- case tcc_reference:
-   if ((code == REALPART_EXPR
-  || code == IMAGPART_EXPR
-  || code == VIEW_CONVERT_EXPR)
- && TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (def_stmt),
- 0)) == SSA_NAME)
-   expr = fold_build1 (code,
-

Re: New OpenACC pass and Target Hook

2015-09-30 Thread Bernd Schmidt


On 09/29/2015 08:36 PM, Nathan Sidwell wrote:

This patch implements an openacc device-specific lowering pass, and an
openacc target hook for validating compute dimensions.

The pass 'oaccdevlow' is inserted early after LTO readback.  It is
active for offloaded openacc functions, and openacc routines.  Currently
its only action is to validate the compute dimensions specified for an
offloaded function.
+/* Validate compute dimensions, fill in non-unity defaults.  FN_LEVEL
+   indicates the level at which a routine might spawn a loop.  It is
+   negative for non-routines.  */
+
+static bool
+nvptx_validate_dims (tree ARG_UNUSED (decl), int *ARG_UNUSED (dims),
+int ARG_UNUSED (fn_level))


The function name and/or comment should mention OpenACC, it could be 
confusing otherwise.


I don't know what style to use for unused args now that we have C++. I'm 
fine with this, and it presumably will be changed anyway.



+DEFHOOK
+(validate_dims,
+"This hook should check the launch dimensions provided.  It should fill\n\
+in anything that needs to default to non-unity and verify non-defaults.\n\
+Defaults are represented as -1.  Diagnostics should be issued as\n\
+appropriate.  Return true if changes have been made.  You must override\n\
+this hook to provide dimensions larger than 1.",
+bool, (tree decl, int dims[], int fn_level),
+default_goacc_validate_dims)


I feel the documentation should be expanded to say what FN_LEVEL does. I 
guess it's one of gang/worker/vector. Without real code to look at yet 
I'm left wondering how the hook would use it.


It sounds like this is for both omp-low created child functions, and 
also for acc routine functions. That should be documented here.
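To make the documented contract concrete, here is a minimal sketch of what a hook with this interface might do for a target that only supports unity dimensions. It is an assumption-laden illustration, not the patch's implementation: `sketch_validate_dims` and `SKETCH_DIM_MAX` are hypothetical names, the `tree decl` argument is omitted so the example stands alone, and the three-axis count (gang, worker, vector) is assumed.

```c
#include <stdbool.h>

#define SKETCH_DIM_MAX 3   /* gang, worker, vector: assumed for this sketch */

/* Hypothetical stand-in for a validate_dims hook.  DIMS uses -1 for
   "not specified".  This version supports only unity dimensions, so
   every entry that is not already 1 is forced to 1.  Returns true if
   any entry was changed.  FN_LEVEL is ignored here.  */
static bool
sketch_validate_dims (int dims[], int fn_level)
{
  bool changed = false;

  (void) fn_level;
  for (int i = 0; i < SKETCH_DIM_MAX; i++)
    if (dims[i] != 1)
      {
        dims[i] = 1;
        changed = true;
      }
  return changed;
}
```

A hook written this way both reads and writes `dims`, which is why marking that parameter as unused would be misleading.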



+bool
+default_goacc_validate_dims (tree ARG_UNUSED (decl), int *ARG_UNUSED (dims),
+int ARG_UNUSED (fn_level))


dims is not unused in this function.

Otherwise it looks ok to me.


Bernd

Re: [PATCH] Fix default_binds_local_p_2 for extern protected data

2015-09-30 Thread Bernd Schmidt


On 09/17/2015 11:15 AM, Szabolcs Nagy wrote:

ping 2.

this patch is needed for working visibility ("protected")
attribute for extern data on targets using default_binds_local_p_2.
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01871.html


I hesitate to review this one since I don't think I understand the 
issues on the various affected arches well enough. It looks like Jakub 
had some input on the earlier changes, maybe he could take a look? Or 
maybe rth knows best. Adding Ccs.


It would help to have examples of code generation demonstrating the 
problem and how you would solve it. Input from the s390 maintainers 
whether this is correct for their port would also be appreciated.



Needs a further binutils patch too to emit R_*_GLOB_DAT
instead of R_*_RELATIVE relocs for protected data.
The glibc elf/tst-protected1a and elf/tst-protected1b
tests depend on this.


What is the consequence of not having this binutils patch? Is the gcc 
patch an improvement, a no-op, or are there situations where it causes 
regressions without the binutils patch?



Tested ARM and AArch64 targets.


Tested how, with or without this binutils patch?


Bernd

Re: [patch, committed] Dump function attributes

2015-09-30 Thread Richard Biener

On Wed, Sep 30, 2015 at 12:03 PM, Tom de Vries  wrote:
> On 29/09/15 13:29, Richard Biener wrote:
>>
>> On Tue, Sep 29, 2015 at 1:23 PM, Tom de Vries 
>> wrote:
>>>
>>> On 29/09/15 12:36, Richard Biener wrote:


 On Tue, Sep 29, 2015 at 7:43 AM, Tom de Vries 
 wrote:
>
>
> [ was: Re: [RFC] Dump function attributes ]
>
> On 28/09/15 17:17, Bernd Schmidt wrote:
>>
>>
>>
>> On 09/28/2015 04:32 PM, Tom de Vries wrote:
>>>
>>>
>>>
>>> patch below prints the function attributes in the dump file.
>>
>>
>>
>>
>>> foo ()
>>> [ noclone , noinline ]
>>> {
>>> ...
>>>
>>> Good idea?
>>>
>>> If so, do we want one attribute per line?
>>
>>
>>
>>
>> Only for really long ones I'd think. Patch is ok for now.
>>
>>
>
> Reposting patch with ChangeLog entry added.
>
> Bootstrapped and reg-tested on x86_64.
>
> Committed to trunk.



 Hmpf.  I always like to make the dump-files as much copy to
 testcases
 as possible.
>>>
>>>
>>>
>>> Hmm, interesting. Not something I use, but I can imagine it's useful.
>>>
 So why did you invent a new syntax for attributes instead of using
 the existing __attribute__(("noclone", "noinline")) (in this case)?
>>>
>>>
>>>
>>> My main concerns were:
>>> - being able to see in dump files what the actual attributes of a
>>>function are (rather than having to figure it out in a debug session).
>>> - being able to write testcases that can test for the presence of those
>>>attributes in dump files
>>>
 Did you verify
 how attributes with arguments get printed?
>>>
>>>
>>>
>>> F.i. an oacc offload function compiled by the host compiler is annotated
>>> as
>>> follows:
>>>
>>> before pass_oacc_transform (in the gomp-4_0-branch):
>>> ...
>>> [ oacc function 32, , , omp target entrypoint ]
>>> ...
>>>
>>> after pass_oacc_transform:
>>> 
>>> [ oacc function 1, 1, 1, omp target entrypoint ]
>>> .
>>
>>
>> Hmm, ok.  So without some extra dump_attribute_list wrapping
>> __attribute_(( ... )) around the above doesn't make it more amenable
>> for cut
>>
>
> With attached untested follow-up patch, for test-case:
> ...
> void __attribute__((noinline)) __attribute__((alias ("bar"), noclone))
> foo (void)
> {
>
> }
>
> void __attribute__ ((__target__ ("arch=core2", "sse3")))
> foo2 (void)
> {
>
> }
>
> void __attribute__ ((optimize ((1
> foo3 (void)
> {
>
> }
>
> void __attribute__ ((optimize (("1"
> foo4 (void)
> {
>
> }
> ...
>
> I get at gimple dump:
> ...
> __attribute__((noclone, alias ("bar"), noinline))
> foo ()
> {
>
> }
>
>
> __attribute__((__target__ ("arch=core2", "sse3")))
> foo2 ()
> {
>
> }
>
>
> __attribute__((optimize (1)))
> foo3 ()
> {
>
> }
>
>
> __attribute__((optimize ("1")))
> foo4 ()
> {
>
> }
> ...
>
> OK if bootstrap/regtest succeeds?

Ok.

Thanks,
Richard.

> Thanks,
> - Tom
>

Re: [PATCH] x86 interrupt attribute

2015-09-30 Thread Yulia Koval

Hi,

Thanks. I added all fixes to the patch, bootstrapped/regtested it on
Linux/x86_64. Linux/i686 in progress. Ok for trunk if testing passes
successfully?

Julia

On Wed, Sep 30, 2015 at 5:50 AM, H.J. Lu  wrote:
> On Tue, Sep 29, 2015 at 5:02 PM, H.J. Lu  wrote:
>> On Tue, Sep 29, 2015 at 4:53 PM, Mike Stump  wrote:
>>> On Sep 29, 2015, at 3:10 PM, H.J. Lu  wrote:
 On Tue, Sep 29, 2015 at 2:23 PM, Mike Stump  wrote:
> On Sep 29, 2015, at 1:59 PM, H.J. Lu  wrote:
>> commit f3a6675a8d69d810d2cad0c090a762094a0a8622
>> Author: H.J. Lu 
>> Date:   Tue Sep 29 13:47:18 2015 -0700
>>
>>   Define EPILOGUE_USES in i386 so that all preserved registers are used
>>   by the epilogue of interrupt handler.  Don't explicitly mark BP and SP
>>   registers as used since they are always used in epilogue.
>>
>> Please take a look.
>
> Oh, too bad you didn’t copy it here.  The easiest thing to blow is the 
> addition of reload_completed && on the condition
>>>
>>>
 static bool
 ix86_save_reg (unsigned int regno, bool maybe_eh_return)
 {
  /* In interrupt handler, we don't preserve MMX and x87 registers
 which aren't supported when saving and restoring registers.  No
 need to preserve callee-saved registers unless they are modified.
 We also preserve all caller-saved registers if a function call
 is made in interrupt handler since the called function may change
 them.  Don't explicitly save BP and SP registers since they are
 always preserved.  */
  if (cfun->machine->is_interrupt)
return ((df_regs_ever_live_p (regno)
 || (call_used_regs[regno] && cfun->machine->make_calls))
&& !fixed_regs[regno]
&& !STACK_REGNO_P (regno)
&& !MMX_REGNO_P (regno)
&& regno != BP_REG
&& regno != SP_REG
&& (regno <= ST7_REG || regno >= XMM0_REG));

 Is this sufficient?
>>>
>>> I see no string "reload_completed &&”.  Either, you need it here, or, you 
>>> need it in the caller.
>>
>> Do you have a testcase to show its impact?
>
> I checked this patch into hjl/interrupt/master branch.
>
> Thanks.
>
>
> --
> H.J.
> --
> commit 8c75718a0f590cb02e7cb88e36fe12e90db62bc1
> Author: H.J. Lu 
> Date:   Tue Sep 29 19:42:16 2015 -0700
>
> Preserve registers in interrupt handler after reload
>
> * config/i386/i386.c (ix86_save_reg): Preserve callee-saved and
> caller-saved registers in interrupt handler only after reload.
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 6b14471..d5c7e07 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -9,7 +9,7 @@ ix86_save_reg (unsigned int regno, bool 
> maybe_eh_return)
>   is made in interrupt handler since the called function may change
>   them.  Don't explicitly save BP and SP registers since they are
>   always preserved.  */
> -  if (cfun->machine->is_interrupt)
> +  if (cfun->machine->is_interrupt && reload_completed)
>  return ((df_regs_ever_live_p (regno)
>   || (call_used_regs[regno]
>   && cfun->machine->call_with_caller_saved_registers))



Re: Openacc launch API

2015-09-30 Thread Matthias Klose


On 25.08.2015 15:29, Nathan Sidwell wrote:

Jakub,

This patch changes the launch API for openacc parallels.


this broke the jit build.

The following patch fixes the build for me. Ok to commit?

  Matthias

2015-09-30  Matthias Klose  

* jit-builtins.h Define DEF_FUNCTION_TYPE_VAR_6,
remove DEF_FUNCTION_TYPE_VAR_11.
* jit-builtins.c (builtins_manager::make_type): Define and handle
DEF_FUNCTION_TYPE_VAR_6, remove DEF_FUNCTION_TYPE_VAR_11.


Index: gcc/jit/jit-builtins.c
===
--- gcc/jit/jit-builtins.c  (revision 228287)
+++ gcc/jit/jit-builtins.c  (working copy)
@@ -320,15 +320,14 @@
 #define DEF_FUNCTION_TYPE_VAR_5(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
   case ENUM: return make_fn_type (ENUM, RETURN, 1, 5, ARG1, ARG2, ARG3, \
  ARG4, ARG5);
+#define DEF_FUNCTION_TYPE_VAR_6(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+   ARG6)   \
+  case ENUM: return make_fn_type (ENUM, RETURN, 1, 6, ARG1, ARG2, ARG3, \
+ ARG4, ARG5, ARG6);
 #define DEF_FUNCTION_TYPE_VAR_7(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
ARG6, ARG7) \
   case ENUM: return make_fn_type (ENUM, RETURN, 1, 7, ARG1, ARG2, ARG3, \
  ARG4, ARG5, ARG6, ARG7);
-#define DEF_FUNCTION_TYPE_VAR_11(ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) \
-  case ENUM: return make_fn_type (ENUM, RETURN, 1, 11, ARG1, ARG2, ARG3, \
- ARG4, ARG5, ARG6, ARG7, ARG8, ARG9, \
- ARG10, ARG11);
 #define DEF_POINTER_TYPE(ENUM, TYPE) \
   case ENUM: return make_ptr_type (ENUM, TYPE);
 
@@ -350,8 +349,8 @@
 #undef DEF_FUNCTION_TYPE_VAR_3
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
+#undef DEF_FUNCTION_TYPE_VAR_6
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
 
 default:
Index: gcc/jit/jit-builtins.h
===
--- gcc/jit/jit-builtins.h  (revision 228287)
+++ gcc/jit/jit-builtins.h  (working copy)
@@ -50,10 +50,10 @@
 #define DEF_FUNCTION_TYPE_VAR_4(NAME, RETURN, ARG1, ARG2, ARG3, ARG4) NAME,
 #define DEF_FUNCTION_TYPE_VAR_5(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5) \
NAME,
+#define DEF_FUNCTION_TYPE_VAR_6(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
+   ARG6) NAME,
 #define DEF_FUNCTION_TYPE_VAR_7(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
ARG6, ARG7) NAME,
-#define DEF_FUNCTION_TYPE_VAR_11(NAME, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, \
-ARG6, ARG7, ARG8, ARG9, ARG10, ARG11) NAME,
 #define DEF_POINTER_TYPE(NAME, TYPE) NAME,
 #include "builtin-types.def"
 #undef DEF_PRIMITIVE_TYPE
@@ -73,7 +73,6 @@
 #undef DEF_FUNCTION_TYPE_VAR_4
 #undef DEF_FUNCTION_TYPE_VAR_5
 #undef DEF_FUNCTION_TYPE_VAR_7
-#undef DEF_FUNCTION_TYPE_VAR_11
 #undef DEF_POINTER_TYPE
   BT_LAST
 }; /* enum jit_builtin_type */

Re: Fold acc_on_device

2015-09-30 Thread Bernd Schmidt


On 09/30/2015 02:18 PM, Nathan Sidwell wrote:

On 09/30/15 04:07, Richard Biener wrote:

Please don't add any new GENERIC based builtin folders.  Instead add to
gimple-fold.c:gimple_fold_builtin

Otherwise you're just generating more work for us who move foldings from
builtins.c to gimple-fold.c.


Oh, sorry, I didn't know about that.  Will fix.


Yeah. me neither - sorry about that. TBH if we're not supposed to add 
folders to builtins.c that could use a comment near the top of that file.



Bernd

Re: [PATCH, PR target/67761] Fix i686-- bootstrap comparison failure

2015-09-30 Thread Ilya Enkovich

2015-09-30 9:06 GMT+03:00 Uros Bizjak :
> Hello!
>
>> My recently introduced STV pass doesn't skip debug instructions and it 
>> causes transformation
>> (mostly cost computation) depending on debug info.  It causes bootstrap 
>> comparison failure.  This
>> patch fixes it.  Bootstrapped for i686-linux.  Testing for 
>> x86_64-unknown-linux-gnu{,m32} is in
>> progress.  OK for trunk if it passes?
>
> IMO, it would be also beneficial to bootstrap with slm default
> architecture, so new code paths get some coverage via bootstrap.

I bootstrapped with --with-cpu=slm also.

>
>> gcc/
>>
>> 2015-09-29  Ilya Enkovich  
>>
>> * config/i386/i386.c (scalar_chain::analyze_register_chain): Ignore
>> debug insns.
>> (scalar_chain::convert_reg): Likewise.
>>
>> gcc/testsuite/
>>
>> 2015-09-29  Ilya Enkovich  
>>
>> * gcc.target/i386/pr67761.c: New test.
>
> OK.

Thanks!

Ilya

>
> Thanks,
> Uros.

Re: Fold acc_on_device

2015-09-30 Thread Nathan Sidwell


On 09/30/15 08:46, Richard Biener wrote:


 I'll add a comment to builtins.c
(not that I expect anyone sees it ;))


Put one instance at the default: label in  expand_builtin?

nathan

ptx offload data format

2015-09-30 Thread Nathan Sidwell

I've merged this patch to trunk.  It changes the PTX offload data format to be 
an array of pointers to strings, preparing the way for the static linking patch 
that Thomas is working on.
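As a sketch of the new layout, the fragment below shows roughly what the generated data could look like once more than one object is present: each PTX string gets its own array, and a table of pointer/size pairs refers to them. The struct mirrors the `ptx_obj` layout emitted by the patch; the string contents are placeholders rather than real PTX, and `total_ptx_size` is a hypothetical helper added only to show a consumer walking the table.

```c
#include <stddef.h>

/* Placeholder contents, not real PTX.  */
static const char ptx_code_0[] = "// placeholder object 0";
static const char ptx_code_1[] = "// placeholder object 1";

/* One entry per PTX object: pointer to the text and its size,
   including the terminating NUL.  */
static const struct ptx_obj {
  const char *code;
  size_t size;
} ptx_objs[] = {
  { ptx_code_0, sizeof (ptx_code_0) },
  { ptx_code_1, sizeof (ptx_code_1) }
};

static const unsigned ptx_num = sizeof (ptx_objs) / sizeof (ptx_objs[0]);

/* Hypothetical consumer: sum the sizes of all objects in the table.  */
static size_t
total_ptx_size (const struct ptx_obj *objs, unsigned num)
{
  size_t total = 0;

  for (unsigned ix = 0; ix != num; ix++)
    total += objs[ix].size;
  return total;
}
```

Storing the size next to each pointer means a consumer does not need to call strlen on every object before handing it to the linker.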


For the moment, we retain the automatic linking in of the support functions 
during PTX JITing.  Some of the changes to link_ptx were done by Bernd a while back.


No change to the PTX ABI version number, as that just got incremented last week 
with the launch API change -- it's in a state of flux right now.


nathan
2015-09-30  Nathan Sidwell  

	gcc/
	* config/nvptx/mkoffload.c (process): Change offload data format.

2015-09-30  Nathan Sidwell  
	Bernd Schmidt 

	libgomp/
	* plugin/plugin-nvptx.c (targ_fn_launch): Use GOMP_DIM_MAX.
	(struct targ_ptx_obj): New.
	(nvptx_tdata): Move earlier, change data format.
	(link_ptx): Take targ_ptx_obj ptr and count.  Allow multiple
	objects.
	(GOMP_OFFLOAD_load_image): Adjust.

Index: gcc/config/nvptx/mkoffload.c
===
--- gcc/config/nvptx/mkoffload.c	(revision 228242)
+++ gcc/config/nvptx/mkoffload.c	(working copy)
@@ -844,39 +844,53 @@ process (FILE *in, FILE *out)
   Token *tok = tokenize (input);
   const char *comma;
   id_map const *id;
+  unsigned obj_count = 0;
+  unsigned ix;
 
   do
 tok = parse_file (tok);
   while (tok->kind);
 
-  fprintf (out, "static const char ptx_code[] = \n");
+  fprintf (out, "static const char ptx_code_%u[] = \n", obj_count++);
   write_stmts (out, rev_stmts (decls));
   write_stmts (out, rev_stmts (vars));
   write_stmts (out, rev_stmts (fns));
   fprintf (out, ";\n\n");
 
+  /* Dump out array of pointers to ptx object strings.  */
+  fprintf (out, "static const struct ptx_obj {\n"
+	   "  const char *code;\n"
+	   "  __SIZE_TYPE__ size;\n"
+	   "} ptx_objs[] = {");
+  for (comma = "", ix = 0; ix != obj_count; comma = ",", ix++)
+fprintf (out, "%s\n\t{ptx_code_%u, sizeof (ptx_code_%u)}", comma, ix, ix);
+  fprintf (out, "\n};\n\n");
+
+  /* Dump out variable idents.  */
   fprintf (out, "static const char *const var_mappings[] = {");
   for (comma = "", id = var_ids; id; comma = ",", id = id->next)
 fprintf (out, "%s\n\t%s", comma, id->ptx_name);
   fprintf (out, "\n};\n\n");
 
+  /* Dump out function idents.  */
   fprintf (out, "static const struct nvptx_fn {\n"
 	   "  const char *name;\n"
-	   "  unsigned short dim[3];\n"
-	   "} func_mappings[] = {\n");
+	   "  unsigned short dim[%d];\n"
+	   "} func_mappings[] = {\n", GOMP_DIM_MAX);
   for (comma = "", id = func_ids; id; comma = ",", id = id->next)
 fprintf (out, "%s\n\t{%s}", comma, id->ptx_name);
   fprintf (out, "\n};\n\n");
 
   fprintf (out,
 	   "static const struct nvptx_tdata {\n"
-	   "  const char *ptx_src;\n"
+	   "  const struct ptx_obj *ptx_objs;\n"
+	   "  unsigned ptx_num;\n"
 	   "  const char *const *var_names;\n"
-	   "  __SIZE_TYPE__ var_num;\n"
+	   "  unsigned var_num;\n"
 	   "  const struct nvptx_fn *fn_names;\n"
-	   "  __SIZE_TYPE__ fn_num;\n"
+	   "  unsigned fn_num;\n"
 	   "} target_data = {\n"
-	   "  ptx_code,\n"
+	   "  ptx_objs, sizeof (ptx_objs) / sizeof (ptx_objs[0]),\n"
 	   "  var_mappings,"
 	   "  sizeof (var_mappings) / sizeof (var_mappings[0]),\n"
 	   "  func_mappings,"
Index: libgomp/plugin/plugin-nvptx.c
===
--- libgomp/plugin/plugin-nvptx.c	(revision 228265)
+++ libgomp/plugin/plugin-nvptx.c	(working copy)
@@ -224,9 +224,31 @@ map_push (struct ptx_stream *s, int asyn
 struct targ_fn_launch
 {
   const char *fn;
-  unsigned short dim[3];
+  unsigned short dim[GOMP_DIM_MAX];
 };
 
+/* Target PTX object information.  */
+
+struct targ_ptx_obj
+{
+  const char *code;
+  size_t size;
+};
+
+/* Target data image information.  */
+
+typedef struct nvptx_tdata
+{
+  const struct targ_ptx_obj *ptx_objs;
+  unsigned ptx_num;
+
+  const char *const *var_names;
+  unsigned var_num;
+
+  const struct targ_fn_launch *fn_descs;
+  unsigned fn_num;
+} nvptx_tdata_t;
+
 /* Descriptor of a loaded function.  */
 
 struct targ_fn_descriptor
@@ -688,7 +710,8 @@ nvptx_get_num_devices (void)
 
 
 static void
-link_ptx (CUmodule *module, const char *ptx_code)
+link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,
+	  unsigned num_objs)
 {
   CUjit_option opts[7];
   void *optvals[7];
@@ -702,8 +725,6 @@ link_ptx (CUmodule *module, const char *
   void *linkout;
   size_t linkoutsize __attribute__ ((unused));
 
-  GOMP_PLUGIN_debug (0, "attempting to load:\n---\n%s\n---\n", ptx_code);
-
   opts[0] = CU_JIT_WALL_TIME;
   optvals[0] = 
 
@@ -758,25 +779,37 @@ link_ptx (CUmodule *module, const char *
 			 cuda_error (r));
 }
 
-  /* cuLinkAddData's 'data' argument erroneously omits the const qualifier.  */
-  r = cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, (char *)ptx_code,
-  strlen (ptx_code) + 1, 0, 0, 0, 0);
-  if (r !=

[PATCH 2/2] call scev analysis in scop-detection as in sese-to-poly

2015-09-30 Thread Sebastian Pop

Before our rewrite of the scop detection, we did not have a valid SESE
region at hand, and so we did more ad-hoc analysis of data references
by trying to prove that at all levels of a loop nest the data references would
still be valid.

Now that we have a valid SESE region, we can call the scev analysis in the same
way on the same computed loop nest in the scop-detection as in the sese-to-poly.

Next step will be to cache the data references analyzed in the scop detection
and not compute the same info in sese-to-poly.

The patch fixes block-1.f90 that used to ICE on x86_64-linux when compiled with
-m32.  Patch passed bootstrap with BOOT_CFLAGS="-g -O2 -fgraphite-identity
-floop-nest-optimize" and check on x86_64-linux using ISL-0.15.

2015-09-28  Sebastian Pop  
Aditya Kumar  

PR tree-optimization/67754
* graphite-scop-detection.c (stmt_has_simple_data_refs_p): Call
scev analysis on the same loop nest as analyze_drs_in_stmts.
* graphite-sese-to-poly.c (outermost_loop_in_sese_1): Moved and 
renamed...
(try_generate_gimple_bb): Call outermost_loop_in_sese.
(analyze_drs_in_stmts): Same.
* sese.c (outermost_loop_in_sese): ...here.
---
 gcc/graphite-scop-detection.c | 49 ++-
 gcc/graphite-sese-to-poly.c   | 30 ++
 gcc/sese.c| 28 -
 3 files changed, 49 insertions(+), 58 deletions(-)

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index d95f527..c45df55 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -262,46 +262,37 @@ graphite_can_represent_expr (sese_l scop, loop_p loop, 
tree expr)
 static bool
 stmt_has_simple_data_refs_p (sese_l scop, gimple *stmt)
 {
-  data_reference_p dr;
-  int j;
-  bool res = true;
+  sese region = new_sese (scop.entry, scop.exit);
+  loop_p nest = outermost_loop_in_sese (region, gimple_bb (stmt));
+  loop_p loop = loop_containing_stmt (stmt);
   vec<data_reference_p> drs = vNULL;
-  loop_p outer;
-  loop_p loop_around_scop = get_entry_bb (scop.entry)->loop_father;
 
-  for (outer = loop_containing_stmt (stmt); outer && outer != loop_around_scop;
-   outer = loop_outer (outer))
+  graphite_find_data_references_in_stmt (nest, loop, stmt, &drs);
+
+  int j;
+  data_reference_p dr;
+  FOR_EACH_VEC_ELT (drs, j, dr)
 {
-  graphite_find_data_references_in_stmt (outer,
-loop_containing_stmt (stmt),
-stmt, &drs);
+  int nb_subscripts = DR_NUM_DIMENSIONS (dr);
+  tree ref = DR_REF (dr);
 
-  FOR_EACH_VEC_ELT (drs, j, dr)
+  for (int i = nb_subscripts - 1; i >= 0; i--)
{
- int nb_subscripts = DR_NUM_DIMENSIONS (dr);
- tree ref = DR_REF (dr);
-
- for (int i = nb_subscripts - 1; i >= 0; i--)
+ if (!graphite_can_represent_scev (DR_ACCESS_FN (dr, i))
+ || (TREE_CODE (ref) != ARRAY_REF
+ && TREE_CODE (ref) != MEM_REF
+ && TREE_CODE (ref) != COMPONENT_REF))
{
- if (!graphite_can_represent_scev (DR_ACCESS_FN (dr, i))
- || (TREE_CODE (ref) != ARRAY_REF
- && TREE_CODE (ref) != MEM_REF
- && TREE_CODE (ref) != COMPONENT_REF))
-   {
- free_data_refs (drs);
- return false;
-   }
-
- ref = TREE_OPERAND (ref, 0);
+ free_data_refs (drs);
+ return false;
}
-   }
 
-  free_data_refs (drs);
-  drs.create (0);
+ ref = TREE_OPERAND (ref, 0);
+   }
 }
 
   free_data_refs (drs);
-  return res;
+  return true;
 }
 
 /* Return true only when STMT is simple enough for being handled by Graphite.
diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 26f75e9..40b7d31 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -274,32 +274,6 @@ free_scops (vec scops)
   scops.release ();
 }
 
-/* Same as outermost_loop_in_sese, returns the outermost loop
-   containing BB in REGION, but makes sure that the returned loop
-   belongs to the REGION, and so this returns the first loop in the
-   REGION when the loop containing BB does not belong to REGION.  */
-
-static loop_p
-outermost_loop_in_sese_1 (sese region, basic_block bb)
-{
-  loop_p nest = outermost_loop_in_sese (region, bb);
-
-  if (loop_in_sese_p (nest, region))
-return nest;
-
-  /* When the basic block BB does not belong to a loop in the region,
- return the first loop in the region.  */
-  nest = nest->inner;
-  while (nest)
-if (loop_in_sese_p (nest, region))
-  break;
-else
-  nest = nest->next;
-
-  gcc_assert (nest);
-  return nest;
-}
-
 /* Generates a polyhedral black box only if the bb contains interesting
information.

[gomp4] tile clause asterisk argument

2015-09-30 Thread Cesar Philippidis

This patch fixes a Fortran ICE when a tile clause contains an asterisk.
The problem was that the asterisk argument is represented by a NULL
expression. That caused problems when the code is translated
into gimple. The fix is to convert those NULL expressions into -1
expressions late, since that is what the C and C++ front ends do.

It looks like there is a lot of existing test coverage for the tile
clause. However, this ICE isn't triggered if there are parser errors.
The new test does contain some deliberate errors, but I included them to
test for invalid nesting which gets triggered in omplow.

I've applied this patch to gomp-4_0-branch.

Cesar
2015-09-30  Cesar Philippidis  

	gcc/fortran/
	* openmp.c (resolve_oacc_loop_blocks): Represent asterisk tile
	arguments as -1.

	gcc/testsuite/
	* gfortran.dg/goacc/loop-5.f95: New test.

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 0bdbb73..c42a2c2 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -4891,10 +4891,21 @@ resolve_oacc_loop_blocks (gfc_code *code)
 	{
 	  num++;
 	  if (el->expr == NULL)
-	continue;
-	  resolve_oacc_positive_int_expr (el->expr, "TILE");
-	  if (el->expr->expr_type != EXPR_CONSTANT)
-	gfc_error ("TILE requires constant expression at %L", >loc);
+	{
+	  /* NULL expressions are used to represent '*' arguments.
+		 Convert those to a -1 expressions.  */
+	  el->expr = gfc_get_constant_expr (BT_INTEGER,
+		gfc_default_integer_kind,
+		>loc);
+	  mpz_set_si (el->expr->value.integer, -1);
+	}
+	  else
+	{
+	  resolve_oacc_positive_int_expr (el->expr, "TILE");
+	  if (el->expr->expr_type != EXPR_CONSTANT)
+		gfc_error ("TILE requires constant expression at %L",
+			   >loc);
+	}
 	}
   resolve_oacc_nested_loops (code, code->block->next, num, "tiled");
 }
diff --git a/gcc/testsuite/gfortran.dg/goacc/loop-5.f95 b/gcc/testsuite/gfortran.dg/goacc/loop-5.f95
new file mode 100644
index 000..c2db090
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/loop-5.f95
@@ -0,0 +1,429 @@
+! { dg-do compile }
+! { dg-additional-options "-fmax-errors=100" }
+
+! TODO: nested kernels are allowed in 2.0
+
+program test
+  implicit none
+  integer :: i, j
+
+  !$acc kernels
+!$acc loop auto
+DO i = 1,10
+ENDDO
+!$acc loop gang
+DO i = 1,10
+ENDDO
+!$acc loop gang(5)
+DO i = 1,10
+ENDDO
+!$acc loop gang(num:5)
+DO i = 1,10
+ENDDO
+!$acc loop gang(static:5)
+DO i = 1,10
+ENDDO
+!$acc loop gang(static:*)
+DO i = 1,10
+ENDDO
+!$acc loop gang
+DO i = 1,10
+  !$acc loop vector
+  DO j = 1,10
+  ENDDO
+  !$acc loop worker
+  DO j = 1,10
+  ENDDO
+ENDDO
+
+!$acc loop worker
+DO i = 1,10
+ENDDO
+!$acc loop worker(5)
+DO i = 1,10
+ENDDO
+!$acc loop worker(num:5)
+DO i = 1,10
+ENDDO
+!$acc loop worker
+DO i = 1,10
+  !$acc loop vector
+  DO j = 1,10
+  ENDDO
+ENDDO
+!$acc loop gang worker
+DO i = 1,10
+ENDDO
+
+!$acc loop vector
+DO i = 1,10
+ENDDO
+!$acc loop vector(5)
+DO i = 1,10
+ENDDO
+!$acc loop vector(length:5)
+DO i = 1,10
+ENDDO
+!$acc loop vector
+DO i = 1,10
+ENDDO
+!$acc loop gang vector
+DO i = 1,10
+ENDDO
+!$acc loop worker vector
+DO i = 1,10
+ENDDO
+
+!$acc loop auto
+DO i = 1,10
+ENDDO
+
+!$acc loop tile(1)
+DO i = 1,10
+ENDDO
+!$acc loop tile(2)
+DO i = 1,10
+ENDDO
+!$acc loop tile(6-2)
+DO i = 1,10
+ENDDO
+!$acc loop tile(6+2)
+DO i = 1,10
+ENDDO
+!$acc loop tile(*)
+DO i = 1,10
+ENDDO
+!$acc loop tile(*, 1)
+DO i = 1,10
+  DO j = 1,10
+  ENDDO
+ENDDO
+!$acc loop tile(-1) ! { dg-warning "must be positive" }
+do i = 1,10
+enddo
+!$acc loop vector tile(*)
+DO i = 1,10
+ENDDO
+!$acc loop worker tile(*)
+DO i = 1,10
+ENDDO
+!$acc loop gang tile(*)
+DO i = 1,10
+ENDDO
+!$acc loop vector gang tile(*)
+DO i = 1,10
+ENDDO
+!$acc loop vector worker tile(*)
+DO i = 1,10
+ENDDO
+!$acc loop gang worker tile(*)
+DO i = 1,10
+ENDDO
+  !$acc end kernels
+
+
+  !$acc parallel
+!$acc loop auto
+DO i = 1,10
+ENDDO
+!$acc loop gang
+DO i = 1,10
+ENDDO
+!$acc loop gang(static:5)
+DO i = 1,10
+ENDDO
+!$acc loop gang(static:*)
+DO i = 1,10
+ENDDO
+!$acc loop gang
+DO i = 1,10
+  !$acc loop vector
+  DO j = 1,10
+  ENDDO
+  !$acc loop worker
+  DO j = 1,10
+  ENDDO
+ENDDO
+
+!$acc loop worker
+DO i = 1,10
+ENDDO
+!$acc loop worker
+DO i = 1,10
+  !$acc loop vector
+  DO j = 1,10
+  ENDDO
+ENDDO
+!$acc loop gang worker
+DO i = 1,10
+ENDDO
+
+!$acc loop vector
+DO i = 1,10
+ENDDO
+!$acc loop

[PATCH 1/2] add recursion on the inner loops

2015-09-30 Thread Sebastian Pop

We now check that all data references in the current loop and inner loops
contained within loop are valid in an outer region before declaring that the
outer loop is a valid scop.
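
The shape of the recursion can be sketched on a toy loop tree.  This is
only a model under stated assumptions: toy_loop, body_ok and
toy_loop_is_valid are hypothetical names, and body_ok stands in for the
real per-body data reference checks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified loop-tree node.  The first nested loop hangs
   off 'inner' and its siblings are chained through 'next'.  */
struct toy_loop
{
  bool body_ok;              /* Result of checking this loop's own body.  */
  struct toy_loop *inner;    /* First loop nested inside this one.  */
  struct toy_loop *next;     /* Next sibling at the same nesting depth.  */
};

/* Return true only if LOOP and every loop nested inside it pass the
   per-body check.  */
bool
toy_loop_is_valid (const struct toy_loop *loop)
{
  if (!loop->body_ok)
    return false;

  for (const struct toy_loop *l = loop->inner; l != NULL; l = l->next)
    if (!toy_loop_is_valid (l))
      return false;

  return true;
}
```

Without the walk over inner and next, an invalid body two levels down
would go unnoticed and the outer loop would be accepted.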

2015-09-30  Sebastian Pop  
Aditya Kumar  

PR tree-optimization/67754
* graphite-scop-detection.c (loop_body_is_valid_scop): Add missing
recursion on the inner loops.
---
 gcc/graphite-scop-detection.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index a498ddc..d95f527 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -805,6 +805,18 @@ loop_body_is_valid_scop (loop_p loop, sese_l scop)
return false;
 }
   free (bbs);
+
+  if (loop->inner)
+{
+  loop = loop->inner;
+  while (loop)
+   {
+ if (!loop_body_is_valid_scop (loop, scop))
+   return false;
+ loop = loop->next;
+   }
+}
+
   return true;
 }
 
-- 
2.1.0.243.g30d45f7

Re: [PATCH] PR66870 PowerPC64 Enable gold linker with split stack

2015-09-30 Thread Ian Lance Taylor

On Thu, Sep 17, 2015 at 12:13 PM, Lynn A. Boger
 wrote:
> Here is my updated patch, with the changes suggested by
> Ian for gcc/gospec.c and David for gcc/configure.ac.
>
> Bootstrap built and tested on ppc64le, ppc64 multilib.
>
> 2015-09-17Lynn Boger 
> gcc/
> PR target/66870
> config/rs6000/sysv4.h:  Define TARGET_CAN_SPLIT_STACK_64BIT
> config.in:  Set up HAVE_GOLD_ALTERNATE_SPLIT_STACK
> configure.ac:  Define HAVE_GOLD_ALTERNATE_SPLIT_STACK
> on Power based on gold linker version
> configure:  Regenerate
> gcc.c:  Add -fuse-ld=gold to STACK_SPLIT_SPEC if
> HAVE_GOLD_ALTERNATE_SPLIT_STACK defined
> go/gospec.c:  (lang_specific_driver):  Set appropriate split
> stack
> options for 64 bit compiles based on
> TARGET_CAN_SPLIT_STACK_64BIT

Thanks.  I had to add ATTRIBUTE_UNUSED to the new variable in
go/gospec.c.  Committed with these ChangeLog entries:

2015-10-01  Lynn Boger  

PR target/66870
* config/rs6000/sysv4.h (TARGET_CAN_SPLIT_STACK_64BIT): Define.
* configure.ac: Define HAVE_GOLD_ALTERNATE_SPLIT_STACK on Power
based on gold linker version.
* gcc.c: Add -fuse-ld=gold to STACK_SPLIT_SPEC if
HAVE_GOLD_ALTERNATE_SPLIT_STACK defined.
* configure, config.in: Regenerate.

2015-10-01  Lynn Boger  

PR target/66870
* gospec.c (lang_specific_driver): Set appropriate split stack
options for 64 bit compiles based on TARGET_CAN_SPLIT_STACK_64BIT.

Ian

Re: [PATCH] fortran/67758 -- Prevent ICE caused by misplaced COMMON

2015-09-30 Thread Steve Kargl

On Wed, Sep 30, 2015 at 05:06:30PM -0700, Steve Kargl wrote:
> Patch built and regression tested on x86_64-*-freebsd.
> OK to commit?
> 
> The patch prevents the dereferencing of a NULL pointer
> by jumping out of the cleanup of a list of COMMON blocks.
> 
> 2015-09-30  Steven G. Kargl  
> 
>   * symbol.c (gfc_restore_last_undo_checkpoint): Prevent ICE during
>   cleanup of a misplaced COMMON.
> 
> 2015-09-30  Steven G. Kargl  
> 
>   * gfortran.dg/pr67758.f: New test.
> 

Now with the patch attached!

-- 
Steve
Index: fortran/symbol.c
===
--- fortran/symbol.c	(revision 228306)
+++ fortran/symbol.c	(working copy)
@@ -3211,6 +3211,11 @@ gfc_restore_last_undo_checkpoint (void)
 
 	  while (csym != p)
 		{
+		  if (!csym)
+		{
+		  gfc_error ("Unexpected COMMON at %C");
+		  goto error;
+		}
 		  cparent = csym;
 		  csym = csym->common_next;
 		}
@@ -3237,6 +3242,8 @@ gfc_restore_last_undo_checkpoint (void)
 	restore_old_symbol (p);
 }
 
+error:
+
   latest_undo_chgset->syms.truncate (0);
   latest_undo_chgset->tbps.truncate (0);
 
Index: testsuite/gfortran.dg/pr67758.f
===
--- testsuite/gfortran.dg/pr67758.f	(revision 0)
+++ testsuite/gfortran.dg/pr67758.f	(working copy)
@@ -0,0 +1,6 @@
+c { dg-do compile }
+c PR fortran/67758
+  COMMON /FMCOM / X(80 000 000)
+  CALL T(XX(A))
+  COMMON /FMCOM / XX(80 000 000) ! { dg-error "Unexpected COMMON" }
+  END

[PATCH] fortran/67758 -- Prevent ICE caused by misplaced COMMON

2015-09-30 Thread Steve Kargl

Patch built and regression tested on x86_64-*-freebsd.
OK to commit?

The patch prevents the dereferencing of a NULL pointer
by jumping out of the cleanup of a list of COMMON blocks.
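
For readers unfamiliar with the failure mode, here is a minimal sketch
of the guard in plain C.  The types and the find_parent helper are
hypothetical and only loosely modelled on the common_next chain; the
point is to stop and report an error when the walk reaches the end of
the list without finding the target.

```c
#include <stddef.h>

/* Hypothetical node in a singly linked chain.  */
struct toy_sym
{
  struct toy_sym *common_next;
};

/* Walk the chain starting at HEAD looking for TARGET.  On success store
   TARGET's predecessor (NULL if TARGET is the head) in *PARENT and
   return 0.  Return -1 instead of dereferencing NULL when TARGET is not
   on the chain.  */
int
find_parent (struct toy_sym *head, struct toy_sym *target,
             struct toy_sym **parent)
{
  struct toy_sym *prev = NULL;
  struct toy_sym *cur = head;

  while (cur != target)
    {
      if (cur == NULL)
        return -1;   /* Ran off the end: report an error, do not crash.  */
      prev = cur;
      cur = cur->common_next;
    }

  *parent = prev;
  return 0;
}
```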

2015-09-30  Steven G. Kargl  

* symbol.c (gfc_restore_last_undo_checkpoint): Prevent ICE during
cleanup of a misplaced COMMON.

2015-09-30  Steven G. Kargl  

* gfortran.dg/pr67758.f: New test.

-- 
Steve

[PATCH] fortran/66979 -- FLUSH requires a UNIT number in the spec-list

2015-09-30 Thread Steve Kargl

When FLUSH is used with a flush-spec-list, a unit is required.
Thus, a statement like 'flush(iostat=i)' would lead to an ICE
because gfortran was dereferencing a pointer to a nonexistent
unit number.  The attached patch was built and tested on
x86-*-freebsd.  OK to commit?
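
The ordering of the checks can be sketched as follows.  This is a
simplified model, not the io.c code: toy_filepos and toy_resolve_filepos
are hypothetical, and the sketch rejects any missing unit, whereas the
patch only tests for it when IOSTAT or IOMSG is present.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a parsed FLUSH spec list.  Each pointer is NULL
   when the corresponding specifier was not written.  */
struct toy_filepos
{
  const int *unit;
  const int *iostat;
  const char *iomsg;
};

/* Return true if the spec list is acceptable.  The missing-unit check
   comes first so that the later test never dereferences NULL, and each
   failed check returns false rather than falling through.  */
bool
toy_resolve_filepos (const struct toy_filepos *fp)
{
  if (fp->unit == NULL)
    return false;            /* UNIT number missing.  */

  if (*fp->unit < 0)
    return false;            /* UNIT number must be non-negative.  */

  return true;
}
```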

2015-09-30  Steven G. Kargl  

* io.c (gfc_resolve_filepos): Check for a UNIT number.  Add a nearby
missing 'return false'.

2015-09-30  Steven G. Kargl  

* gfortran.dg/pr66979.f90: New test.

-- 
Steve
Index: fortran/io.c
===
--- fortran/io.c	(revision 228306)
+++ fortran/io.c	(working copy)
@@ -2515,12 +2515,21 @@ gfc_resolve_filepos (gfc_filepos *fp)
   if (!gfc_reference_st_label (fp->err, ST_LABEL_TARGET))
 return false;
 
+  if (!fp->unit && (fp->iostat || fp->iomsg))
+{
+  locus where;
+  where = fp->iostat ? fp->iostat->where : fp->iomsg->where;
+  gfc_error ("UNIT number missing in statement at %L", );
+  return false;
+}
+
   if (fp->unit->expr_type == EXPR_CONSTANT
   && fp->unit->ts.type == BT_INTEGER
   && mpz_sgn (fp->unit->value.integer) < 0)
 {
   gfc_error ("UNIT number in statement at %L must be non-negative",
 		 >unit->where);
+  return false;
 }
 
   return true;
Index: testsuite/gfortran.dg/pr66979.f90
===
--- testsuite/gfortran.dg/pr66979.f90	(revision 0)
+++ testsuite/gfortran.dg/pr66979.f90	(working copy)
@@ -0,0 +1,7 @@
+! { dg-do compile }
+! PR fortran/66979
+program p
+  implicit none
+  integer::i
+  flush (iostat=i) ! { dg-error "UNIT number missing" }
+end program p

[PATCH] fortran/67616 -- Fix ICE in BLOCK with a DATA statement

2015-09-30 Thread Steve Kargl

The attached patch was built and tested on x86_64-*-freebsd.
OK to commit?

The patch prevents an ICE in a BLOCK construct that uses
a DATA statement and default initialization.  The problem
was that the derived type was declared in the host and
was not in the BLOCK's symtree.  The fix looks for the
derived type through host association.
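
The difference between a local lookup and a host-associated lookup can
be sketched with a toy scope chain.  All names here (toy_scope,
lookup_local, lookup_host_assoc) are hypothetical and do not correspond
to gfortran's symtree API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical scope with a tiny fixed-size name table and a link to the
   enclosing (host) scope.  */
struct toy_scope
{
  const char *names[4];
  size_t count;
  const struct toy_scope *host;
};

/* Look NAME up in SCOPE only.  */
const struct toy_scope *
lookup_local (const struct toy_scope *scope, const char *name)
{
  for (size_t i = 0; i < scope->count; i++)
    if (strcmp (scope->names[i], name) == 0)
      return scope;
  return NULL;
}

/* Look NAME up in SCOPE and then in each enclosing scope in turn,
   returning the scope that declares it or NULL.  */
const struct toy_scope *
lookup_host_assoc (const struct toy_scope *scope, const char *name)
{
  for (; scope != NULL; scope = scope->host)
    if (lookup_local (scope, name) != NULL)
      return scope;
  return NULL;
}
```

With a BLOCK scope whose host declares type t, the local lookup for "t"
fails while the host-associated lookup finds it in the program scope,
which mirrors the situation in the test case.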

Just remembered that Mikael pre-approved the patch.

2015-09-30  Steven G. Kargl  

PR fortran/67616
* primary.c (gfc_match_structure_constructor): Use a possibly
host-associated symtree to prevent ICE.

2015-09-30  Steven G. Kargl  

PR fortran/67616
* gfortran.dg/pr67616.f90: New test.

-- 
Steve
Index: fortran/primary.c
===
--- fortran/primary.c	(revision 228306)
+++ fortran/primary.c	(working copy)
@@ -2697,7 +2697,7 @@ gfc_match_structure_constructor (gfc_sym
   gfc_expr *e;
   gfc_symtree *symtree;
 
-  gfc_get_sym_tree (sym->name, NULL, , false);   /* Can't fail */
+  gfc_get_ha_sym_tree (sym->name, );
 
   e = gfc_get_expr ();
   e->symtree = symtree;
Index: testsuite/gfortran.dg/pr67616.f90
===
--- testsuite/gfortran.dg/pr67616.f90	(revision 0)
+++ testsuite/gfortran.dg/pr67616.f90	(working copy)
@@ -0,0 +1,13 @@
+! { dg-do compile }
+! PR fortran/67616
+! Original code contributed by Gerhard Steinmetz 
+program p
+   type t
+   end type
+   type(t) :: y
+   data y /t()/
+   block
+  type(t) :: x
+  data x /t()/  ! Prior to patch, this would ICE.
+   end block
+end

Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-09-30 Thread Michael Collison


Richard and Marc,

Latest patch attached which incorporates all comments.

2015-09-30  Michael Collison 
Andrew Pinski 

* match.pd ((x < y) && (x < z) -> x < min (y,z),
(x > y) and (x > z) -> x > max (y,z))
* testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.
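
The identity behind the two patterns can be checked in plain C.  The
helper names below are illustrative only; the sketch shows the source
forms and the rewritten forms side by side so they can be compared on
sample inputs.

```c
#include <stdbool.h>

/* Helper functions; the names are illustrative only.  */
long min_long (long a, long b) { return a < b ? a : b; }
long max_long (long a, long b) { return a > b ? a : b; }

/* Original forms: two comparisons combined with a bitwise AND.  */
bool lt_both (long x, long y, long z) { return (x < y) & (x < z); }
bool gt_both (long x, long y, long z) { return (x > y) & (x > z); }

/* Rewritten forms: one min/max followed by a single comparison.  */
bool lt_min (long x, long y, long z) { return x < min_long (y, z); }
bool gt_max (long x, long y, long z) { return x > max_long (y, z); }
```

The rewritten form trades two comparisons for one min or max plus one
comparison, which is why it pays off mainly when both original
comparisons go away.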

On 09/30/2015 12:30 PM, Marc Glisse wrote:

On Fri, 18 Sep 2015, Marc Glisse wrote:

+(bit_and:c (op @0 @1) (op @0 @2))


:c seems useless here. On the other hand, it might make sense to 
use op:s
since this is mostly useful if it removes the 2 original 
comparisons.


As I was saying, :c is useless.
(x:c y z)
is replaced by two copies of the transformation, one with
(x y z)
and the other with
(x z y)
In your transformation, both versions would be equivalent, so the second
one is redundant.

Also, if you have:
a=x @1 and @0 > @2) to use max */
+(for op (lt le gt ge)
+ ext (min min max max)
+(simplify
+(bit_and (op:s @0 @1) (op:s @0 @2))
+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+(op @0 (ext @1 @2)
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
new file mode 100644
index 000..dfe6120
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int min_test(long a, long b, long c) {
+  int cmp1 = a < b;
+  int cmp2 = a < c;
+  return cmp1 & cmp2;
+}
+
+int max_test (long a, long b, long c) {
+  int cmp1 = a > b;
+  int cmp2 = a > c;
+  return cmp1 & cmp2;
+}
+
+/* { dg-final { scan-tree-dump "MIN_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump "MAX_EXPR" 1 "optimized" } } */
-- 
1.9.1

[commit][spu] Support atomic builtins

2015-09-30 Thread Ulrich Weigand

Hello,

Fortran tests have been failing on SPU since libgfortran now assumes the
atomic builtins are always available.  On the SPU, execution is always
single-threaded, so we have not provided atomic builtins so far.  As
suggested by Ian here:
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01818.html
this patch adds a trivial implementation of the atomic builtins on SPU.
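
To show what "trivial" means when nothing else can run concurrently,
here is a sketch in plain C.  It is not the SPU expander code (that
lives in the machine description); toy_fetch_add and
toy_compare_and_swap are hypothetical names, and the code is only
correct under the single-threaded assumption stated above.

```c
#include <stdbool.h>

/* Fetch-and-add with no synchronization: read the old value, store the
   sum, return the old value.  Only valid when nothing else can touch
   *MEM concurrently.  */
int
toy_fetch_add (int *mem, int val)
{
  int before = *mem;
  *mem = before + val;
  return before;
}

/* Compare-and-swap under the same assumption.  On failure the value
   actually found is written back to *EXPECTED.  */
bool
toy_compare_and_swap (int *mem, int *expected, int desired)
{
  if (*mem == *expected)
    {
      *mem = desired;
      return true;
    }
  *expected = *mem;
  return false;
}
```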

Tested on spu-elf, committed to mainline.

Bye,
Ulrich


ChangeLog:

* config/spu/spu-protos.h (spu_expand_atomic_op): Add prototype.
* config/spu/spu.c (spu_expand_atomic_op): New function.
* config/spu/spu.md (AINT): New mode iterator.
(ATOMIC): New code iterator.
(atomic_name, atomic_pred): New code predicates.
("atomic_load", "atomic_store"): New expanders.
("atomic_compare_and_swap", "atomic_exchange"): Likewise.
(""atomic_", "atomic_fetch_",
"atomic__fetch"): Likewise.

testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_sync_int_128):
Return 1 on spu-*-* targets.
(check_effective_target_sync_int_128_runtime): Likewise.
(check_effective_target_sync_long_long): Likewise.
(check_effective_target_sync_long_long_runtime): Likewise.
(check_effective_target_sync_int_long): Likewise.
(check_effective_target_sync_char_short): Likewise.


Index: gcc/testsuite/lib/target-supports.exp
===
*** gcc/testsuite/lib/target-supports.exp   (revision 228013)
--- gcc/testsuite/lib/target-supports.exp   (working copy)
*** proc check_effective_target_sync_int_128
*** 5085,5090 
--- 5085,5092 
  if { ([istarget x86_64-*-*] || [istarget i?86-*-*])
 && ![is-effective-target ia32] } {
return 1
+ } elseif { [istarget spu-*-*] } {
+   return 1
  } else {
return 0
  }
*** proc check_effective_target_sync_int_128
*** 5108,5113 
--- 5110,5117 
}
} ""
}]
+ } elseif { [istarget spu-*-*] } {
+   return 1
  } else {
return 0
  }
*** proc check_effective_target_sync_long_lo
*** 5122,5128 
 || [istarget aarch64*-*-*]
 || [istarget arm*-*-*]
 || [istarget alpha*-*-*]
!|| ([istarget sparc*-*-*] && [check_effective_target_lp64]) } {
return 1
  } else {
return 0
--- 5126,5133 
 || [istarget aarch64*-*-*]
 || [istarget arm*-*-*]
 || [istarget alpha*-*-*]
!|| ([istarget sparc*-*-*] && [check_effective_target_lp64])
!|| [istarget spu-*-*] } {
return 1
  } else {
return 0
*** proc check_effective_target_sync_long_lo
*** 5172,5177 
--- 5177,5184 
 && [check_effective_target_lp64]
 && [check_effective_target_ultrasparc_hw]) } {
return 1
+ } elseif { [istarget spu-*-*] } {
+   return 1
  } elseif { [istarget powerpc*-*-*] && [check_effective_target_lp64] } {
return 1
  } else {
*** proc check_effective_target_sync_int_lon
*** 5285,5290 
--- 5292,5298 
 || [istarget powerpc*-*-*]
 || [istarget crisv32-*-*] || [istarget cris-*-*]
 || ([istarget sparc*-*-*] && [check_effective_target_sparc_v9])
+|| [istarget spu-*-*]
 || [check_effective_target_mips_llsc] } {
 set et_sync_int_long_saved 1
  }
*** proc check_effective_target_sync_char_sh
*** 5315,5320 
--- 5323,5329 
 || [istarget powerpc*-*-*]
 || [istarget crisv32-*-*] || [istarget cris-*-*]
 || ([istarget sparc*-*-*] && [check_effective_target_sparc_v9])
+|| [istarget spu-*-*]
 || [check_effective_target_mips_llsc] } {
 set et_sync_char_short_saved 1
  }
Index: gcc/config/spu/spu-protos.h
===
*** gcc/config/spu/spu-protos.h (revision 228013)
--- gcc/config/spu/spu-protos.h (working copy)
*** extern void spu_builtin_promote (rtx ops
*** 76,81 
--- 76,83 
  extern void spu_expand_sign_extend (rtx ops[]);
  extern void spu_expand_vector_init (rtx target, rtx vals);
  extern rtx spu_legitimize_reload_address (rtx, machine_mode, int, int);
+ extern void spu_expand_atomic_op (enum rtx_code code, rtx mem, rtx val,
+ rtx orig_before, rtx orig_after);
  #endif /* RTX_CODE  */
  
  extern void spu_init_expanders (void);
Index: gcc/config/spu/spu.c
===
*** gcc/config/spu/spu.c(revision 228013)
--- gcc/config/spu/spu.c(working copy)
*** spu_canonicalize_comparison (int *code, 
*** 7121,7126 
--- 7121,7161 
*code = (int)swap_condition ((enum rtx_code)*code);

RFA: TM PATCH to volatile checking

2015-09-30 Thread Jason Merrill

A testcase in the TM TS pointed out a couple of holes in our volatile 
checking: we need to check for volatile accesses to general lvalues, not 
just variables, and we need to check in transaction_safe functions as 
well as transactions.
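
The distinction between a volatile variable and a volatile access
through a general lvalue can be seen in ordinary C, without -fgnu-tm.
The names in this sketch are made up for illustration.

```c
/* A volatile variable: an access to 'flag' names a volatile object
   directly.  */
volatile int flag;

/* A non-volatile pointer to volatile data.  Reading or writing 'p'
   itself is an ordinary access; reading '*p' is a volatile access
   through a general lvalue that is not a variable.  */
volatile int *p = &flag;

int
read_through_pointer (void)
{
  return *p;   /* volatile access, although 'p' is not volatile */
}
```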


Tested x86_64-pc-linux-gnu.  OK for trunk?

commit e946db577784ed4670944dd91e81666af16793b3
Author: Jason Merrill 
Date:   Tue Sep 29 14:43:08 2015 -0400

	Diagnose volatile accesses in transaction_safe function.

	* trans-mem.c (volatile_lvalue_p): Rename from volatile_var_p.
	(diagnose_tm_1_op): Also diagnose volatile accesses in
	transaction_safe function.

diff --git a/gcc/testsuite/c-c++-common/tm/volatile-1.c b/gcc/testsuite/c-c++-common/tm/volatile-1.c
new file mode 100644
index 000..eb3799d
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/tm/volatile-1.c
@@ -0,0 +1,10 @@
+// Transaction-unsafe testcase from TM TS.
+// { dg-options -fgnu-tm }
+
+volatile int * p = 0;
+__attribute ((transaction_safe))
+int f() {
+  int x = 0;	 // ok: not volatile
+  p = 	 // ok: the pointer is not volatile
+  int i = *p;	 // { dg-error "volatile" "read through volatile glvalue" }
+}
diff --git a/gcc/testsuite/gcc.dg/tm/pr46654.c b/gcc/testsuite/gcc.dg/tm/pr46654.c
index bb63b68..563474e 100644
--- a/gcc/testsuite/gcc.dg/tm/pr46654.c
+++ b/gcc/testsuite/gcc.dg/tm/pr46654.c
@@ -7,7 +7,7 @@ int y;
 void foo(volatile int x)
 {
   __transaction_atomic {
-x = 5; /* { dg-error "invalid volatile use of 'x' inside transaction" } */
+x = 5; /* { dg-error "invalid use of volatile lvalue inside transaction" } */
 x += y;
 y++;
   }
@@ -20,7 +20,7 @@ volatile int i = 0;
 void george()
 {
   __transaction_atomic {
-   if (i == 2) /* { dg-error "invalid volatile use of 'i' inside transaction" } */
+   if (i == 2) /* { dg-error "invalid use of volatile lvalue inside transaction" } */
  i = 1;
   }
 }
diff --git a/gcc/trans-mem.c b/gcc/trans-mem.c
index d9a681f..a73d4bc 100644
--- a/gcc/trans-mem.c
+++ b/gcc/trans-mem.c
@@ -594,12 +594,12 @@ struct diagnose_tm
   gimple *stmt;
 };
 
-/* Return true if T is a volatile variable of some kind.  */
+/* Return true if T is a volatile lvalue of some kind.  */
 
 static bool
-volatile_var_p (tree t)
+volatile_lvalue_p (tree t)
 {
-  return (SSA_VAR_P (t)
+  return ((SSA_VAR_P (t) || REFERENCE_CLASS_P (t))
 	  && TREE_THIS_VOLATILE (TREE_TYPE (t)));
 }
 
@@ -612,14 +612,19 @@ diagnose_tm_1_op (tree *tp, int *walk_subtrees ATTRIBUTE_UNUSED,
   struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
   struct diagnose_tm *d = (struct diagnose_tm *) wi->info;
 
-  if (volatile_var_p (*tp)
-  && d->block_flags & DIAG_TM_SAFE
-  && !d->saw_volatile)
+  if (TYPE_P (*tp))
+*walk_subtrees = false;
+  else if (volatile_lvalue_p (*tp)
+	   && !d->saw_volatile)
 {
   d->saw_volatile = 1;
-  error_at (gimple_location (d->stmt),
-		"invalid volatile use of %qD inside transaction",
-		*tp);
+  if (d->block_flags & DIAG_TM_SAFE)
+	error_at (gimple_location (d->stmt),
+		  "invalid use of volatile lvalue inside transaction");
+  else if (d->func_flags & DIAG_TM_SAFE)
+	error_at (gimple_location (d->stmt),
+		  "invalid use of volatile lvalue inside %"
+		  "function");
 }
 
   return NULL_TREE;
@@ -4300,7 +4305,7 @@ ipa_tm_scan_irr_block (basic_block bb)
 	{
 	  tree lhs = gimple_assign_lhs (stmt);
 	  tree rhs = gimple_assign_rhs1 (stmt);
-	  if (volatile_var_p (lhs) || volatile_var_p (rhs))
+	  if (volatile_lvalue_p (lhs) || volatile_lvalue_p (rhs))
 		return true;
 	}
 	  break;
@@ -4308,7 +4313,7 @@ ipa_tm_scan_irr_block (basic_block bb)
 	case GIMPLE_CALL:
 	  {
 	tree lhs = gimple_call_lhs (stmt);
-	if (lhs && volatile_var_p (lhs))
+	if (lhs && volatile_lvalue_p (lhs))
 	  return true;
 
 	if (is_tm_pure_call (stmt))

Re: [C/C++ PATCH] RFC: Implement -Wduplicated-cond (PR c/64249) (version 2)

2015-09-30 Thread Marek Polacek

Ping.

On Fri, Sep 18, 2015 at 03:44:34PM +0200, Marek Polacek wrote:
> On Fri, Sep 18, 2015 at 12:06:06PM +0200, Marek Polacek wrote:
> > > Since we don't know bar's side-effects we must assume they change
> > > the value of a and so we must avoid diagnosing the third if.
> > 
> > Ok, I'm convinced now.  We have something similar in the codebase:
> > libsupc++/eh_catch.cc has
> > 
> >   int count = header->handlerCount;
> >   if (count < 0)
> > {   
> >   // This exception was rethrown.  Decrement the (inverted) catch
> >   // count and remove it from the chain when it reaches zero.
> >   if (++count == 0)
> > globals->caughtExceptions = header->nextException;
> > }   
> >   else if (--count == 0)
> > {   
> >   // Handling for this exception is complete.  Destroy the object.
> >   globals->caughtExceptions = header->nextException;
> >   _Unwind_DeleteException (>unwindHeader);
> >   return;
> > }   
> >   else if (count < 0)
> > // A bug in the exception handling library or compiler.
> > std::terminate ();
> > 
> > Here all arms are reachable.  I guess I need to kill the chain of conditions
> > when we find something with side-effects, exactly as you suggested.
> 
> Done in the below.  This version actually bootstraps, because I've added
> -Wno-duplicated-cond for insn-dfatab.o and insn-latencytab.o (don't know
> how to fix these) + I've tweaked a condition in genemit.c.  The problem
> here is that we have
> 
>   if (INTVAL (x) == 0)
> printf ("const0_rtx");
>   else if (INTVAL (x) == 1)
> printf ("const1_rtx");
>   else if (INTVAL (x) == -1) 
> printf ("constm1_rtx");
>   // ...
>   else if (INTVAL (x) == STORE_FLAG_VALUE)
> printf ("const_true_rtx");
> 
> and STORE_FLAG_VALUE happens to be 1, so we have two same conditions.
> STORE_FLAG_VALUE is 1 or -1, but according to the documentation it can
> also be some other number so we should keep this if statement.  I've
> avoided the warning by adding STORE_FLAG_VALUE > 1 check.
> 
> How does this look like now?
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2015-09-18  Marek Polacek  
> 
>   PR c/64249
>   * c-common.c (warn_duplicated_cond_add_or_warn): New function.
>   * c-common.h (warn_duplicated_cond_add_or_warn): Declare.
>   * c.opt (Wduplicated-cond): New option.
> 
>   * c-parser.c (c_parser_statement_after_labels): Add CHAIN parameter
>   and pass it down to c_parser_if_statement.
>   (c_parser_else_body): Add CHAIN parameter and pass it down to
>   c_parser_statement_after_labels.
>   (c_parser_if_statement): Add CHAIN parameter.  Add code to warn about
>   duplicated if-else-if conditions.
> 
>   * parser.c (cp_parser_statement): Add CHAIN parameter and pass it
>   down to cp_parser_selection_statement.
>   (cp_parser_selection_statement): Add CHAIN parameter.  Add code to
>   warn about duplicated if-else-if conditions.
>   (cp_parser_implicitly_scoped_statement): Add CHAIN parameter and pass
>   it down to cp_parser_statement.
> 
>   * doc/invoke.texi: Document -Wduplicated-cond.
>   * Makefile.in (insn-latencytab.o): Use -Wno-duplicated-cond.
>   (insn-dfatab.o): Likewise.
>   * genemit.c (gen_exp): Rewrite condition to avoid -Wduplicated-cond
>   warning.
> 
>   * c-c++-common/Wduplicated-cond-1.c: New test.
>   * c-c++-common/Wduplicated-cond-2.c: New test.
>   * c-c++-common/Wduplicated-cond-3.c: New test.
>   * c-c++-common/Wduplicated-cond-4.c: New test.
>   * c-c++-common/Wmisleading-indentation.c (fn_37): Avoid
>   -Wduplicated-cond warning.
> 
> diff --git gcc/Makefile.in gcc/Makefile.in
> index c2df21d..d7caa76 100644
> --- gcc/Makefile.in
> +++ gcc/Makefile.in
> @@ -217,6 +217,8 @@ libgcov-merge-tool.o-warn = -Wno-error
>  gimple-match.o-warn = -Wno-unused
>  generic-match.o-warn = -Wno-unused
>  dfp.o-warn = -Wno-strict-aliasing
> +insn-latencytab.o-warn = -Wno-duplicated-cond
> +insn-dfatab.o-warn = -Wno-duplicated-cond
>  
>  # All warnings have to be shut off in stage1 if the compiler used then
>  # isn't gcc; configure determines that.  WARN_CFLAGS will be either
> diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
> index 4b922bf..8991215 100644
> --- gcc/c-family/c-common.c
> +++ gcc/c-family/c-common.c
> @@ -12919,4 +12919,45 @@ reject_gcc_builtin (const_tree expr, location_t loc 
> /* = UNKNOWN_LOCATION */)
>return false;
>  }
>  
> +/* If we're creating an if-else-if condition chain, first see if we
> +   already have this COND in the CHAIN.  If so, warn and don't add COND
> +   into the vector, otherwise add the COND there.  LOC is the location
> +   of COND.  */
> +
> +void
> +warn_duplicated_cond_add_or_warn (location_t loc, tree cond, vec 
> **chain)
> +{
> +  /* No chain has been created yet.  Do nothing.  */
> +  if (*chain == NULL)
> +return;
>

[PATCH] Clear flow-sensitive info in phiopt (PR tree-optimization/67769)

2015-09-30 Thread Marek Polacek

Another instance of out-of-date SSA range info.  Before phiopt1 we had

  :
  if (N_2(D) >= 0)
goto ;
  else
goto ;

  :
  iftmp.0_3 = MIN_EXPR ;

  :
  # iftmp.0_5 = PHI <0(2), iftmp.0_3(3)>
  value_4 = (short int) iftmp.0_5;
  return value_4;

and after phiopt1:

  :
  iftmp.0_3 = MIN_EXPR ;
  iftmp.0_6 = MAX_EXPR ;
  value_4 = (short int) iftmp.0_6;
  return value_4;

But the flow-sensitive info in this BB hasn't been cleared up.
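
For reference, the source-level equivalence behind the MIN/MAX form can
be sketched as below.  The function names are hypothetical, and the two
forms agree only under the assumption lo <= hi, which holds for the
clamp (N, 0, 16) call in the test.

```c
/* Nested-conditional form, as in the test case.  */
int
clamp_cond (int x, int lo, int hi)
{
  return (x < lo) ? lo : ((x > hi) ? hi : x);
}

/* MIN followed by MAX, the shape visible in the second dump.
   Equivalent to clamp_cond only when lo <= hi.  */
int
clamp_minmax (int x, int lo, int hi)
{
  int m = x < hi ? x : hi;      /* MIN (x, hi) */
  return m > lo ? m : lo;       /* MAX (m, lo) */
}
```

In the conditional form the MIN result is only computed on the path
where the first test already failed, so a range recorded for it there
need not hold once the computation runs unconditionally.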

This problem doesn't show up in GCC5 but might be latent there.

Bootstrapped/regtested on x86_64-linux, ok for trunk and 5 as well?

2015-09-30  Marek Polacek  

PR tree-optimization/67769
* tree-ssa-phiopt.c (tree_ssa_phiopt_worker): Call
reset_flow_sensitive_info_in_bb when changing the CFG.

* gcc.dg/torture/pr67769.c: New test.

diff --git gcc/testsuite/gcc.dg/torture/pr67769.c 
gcc/testsuite/gcc.dg/torture/pr67769.c
index e69de29..c1d17c3 100644
--- gcc/testsuite/gcc.dg/torture/pr67769.c
+++ gcc/testsuite/gcc.dg/torture/pr67769.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+
+static int
+clamp (int x, int lo, int hi)
+{
+  return (x < lo) ? lo : ((x > hi) ? hi : x);
+}
+
+__attribute__ ((noinline))
+short
+foo (int N)
+{
+  short value = clamp (N, 0, 16);
+  return value;
+}
+
+int
+main ()
+{
+  if (foo (-5) != 0)
+__builtin_abort ();
+  return 0;
+}
diff --git gcc/tree-ssa-phiopt.c gcc/tree-ssa-phiopt.c
index 37fdf28..101988a 100644
--- gcc/tree-ssa-phiopt.c
+++ gcc/tree-ssa-phiopt.c
@@ -338,6 +338,8 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
do_hoist_loads)
  else if (minmax_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
cfgchanged = true;
}
+  if (cfgchanged)
+   reset_flow_sensitive_info_in_bb (bb);
 }
 
   free (bb_order);

Marek

Re: [PATCH] Fix default_binds_local_p_2 for extern protected data

2015-09-30 Thread Szabolcs Nagy


On 30/09/15 14:47, Bernd Schmidt wrote:

On 09/17/2015 11:15 AM, Szabolcs Nagy wrote:

ping 2.

this patch is needed for working visibility ("protected")
attribute for extern data on targets using default_binds_local_p_2.
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01871.html


I hesitate to review this one since I don't think I understand the
issues on the various affected arches well enough. It looks like Jakub
had some input on the earlier changes, maybe he could take a look? Or
maybe rth knows best. Adding Ccs.

It would help to have examples of code generation demonstrating the
problem and how you would solve it. Input from the s390 maintainers
whether this is correct for their port would also be appreciated.



consider the TU

  __attribute__((visibility("protected"))) int n;

  int f () { return n; }

if n "binds_local" then gcc -O -fpic -S is like

.text
.align  2
.global f
.arch armv8-a+fp+simd
.type   f, %function
f:
adrp x0, n
ldr w0, [x0, #:lo12:n]
ret
.size   f, .-f
.protected  n
.comm   n,4,4

so 'n' is a direct reference, not accessed through
the GOT ('n' will be in the .bss of the dso).
this is the current behavior.

if i remove the protected visibility attribute
then the access goes through GOT:

.text
.align  2
.global f
.arch armv8-a+fp+simd
.type   f, %function
f:
adrp x0, _GLOBAL_OFFSET_TABLE_
ldr x0, [x0, #:gotpage_lo15:n]
ldr w0, [x0]
ret
.size   f, .-f
.comm   n,4,4

protected visibility means the definition cannot
be overridden by another module, but it should
still allow extern references.

if the main module references such an object then
(as an implementation detail) it may use copy
relocation against it, which places 'n' in the
main module and the dynamic linker should make
sure that references to 'n' point there.

this is only possible if references to 'n' go
through the GOT (i.e. it should not be "binds_local").

this got fixed on x86 (there are explanations in
pr65248, pr55012), but the default_binds_local_p
logic is not fixed.


Needs a further binutils patch too to emit R_*_GLOB_DAT
instead of R_*_RELATIVE relocs for protected data.
The glibc elf/tst-protected1a and elf/tst-protected1b
tests depend on this.


What is the consequence of not having this binutils patch? Is the gcc
patch an improvement, a null, or are there situations where it causes
regressions without the binutils patch?



does not cause regressions.

binutils also assumed that extern protected data is
local so it did not emit the R_*_GLOB_DAT relocations
for it which means there is no symbolic GOT entry that
the dynamic-linker can update, so the behavior is the
same as before.

glibc dynamic linker also failed to handle this case
with similar effect.

so for correct behavior new gcc, new binutils and new
glibc are needed; if any one of them is old, the result
is the old behavior.

binutils trunk and glibc-2.22 have the fix for aarch64
and arm (and x86 and various other arches).


Tested ARM and AArch64 targets.


Tested how, with or without this binutils patch?



with old and new binutils as well.

PING: [PATCH] Limit alignment on error_mark_node variable

2015-09-30 Thread H.J. Lu

PING

On Fri, Jul 10, 2015 at 5:19 AM, H.J. Lu  wrote:
> On Thu, Jul 09, 2015 at 03:57:31PM +0200, Richard Biener wrote:
>> On Thu, Jul 9, 2015 at 1:08 PM, H.J. Lu  wrote:
>> > On Thu, Jul 9, 2015 at 2:54 AM, Richard Biener
>> >  wrote:
>> >> On Thu, Jul 9, 2015 at 11:52 AM, H.J. Lu  wrote:
>> >>> On Thu, Jul 09, 2015 at 10:16:38AM +0200, Richard Biener wrote:
>>  On Wed, Jul 8, 2015 at 5:32 PM, H.J. Lu  wrote:
>>  > There is no need to try different alignment on variable of
>>  > error_mark_node.
>>  >
>>  > OK for trunk if there is no regression?
>> 
>>  Can't we avoid calling align_variable on error_mark_node type decls
>>  completely?  That is, punt earlier when we try to emit it.
>> 
>> >>>
>> >>> How about this?  OK for trunk?
>> >>
>> >> Heh, you now get the obvious question why we can't simply avoid
>> >> adding the varpool node in the first place ;)
>> >>
>> >
>> > When it was first added to varpool, its type was OK:
>> >
>> > (gdb) bt
>> > #0  varpool_node::get_create (decl=)
>> > at /export/gnu/import/git/sources/gcc/gcc/varpool.c:150
>> > #1  0x00e1c3e8 in rest_of_decl_compilation (
>> > decl=, top_level=1, at_end=0)
>> > at /export/gnu/import/git/sources/gcc/gcc/passes.c:271
>> > #2  0x00731d39 in finish_decl (decl=,
>> > init_loc=0, init=, origtype=, asmspec_tree=> > 0x0>)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-decl.c:4863
>> > #3  0x0078d1ed in c_parser_declaration_or_fndef (
>> > parser=0x715050a8, fndef_ok=false, static_assert_ok=true,
>> > empty_ok=true, nested=false, start_attr_ok=true,
>> > objc_foreach_object_declaration=0x0, omp_declare_simd_clauses=...)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1855
>> > #4  0x0078c234 in c_parser_external_declaration 
>> > (parser=0x715050a8)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1435
>> > #5  0x0078be45 in c_parser_translation_unit (parser=0x715050a8)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1322
>> > #6  0x007b3271 in c_parse_file ()
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:15440
>> > #7  0x0081cb97 in c_common_parse_file ()
>> > at /export/gnu/import/git/sources/gcc/gcc/c-family/c-opts.c:1059
>> > #8  0x00f27662 in compile_file ()
>> > at /export/gnu/import/git/sources/gcc/gcc/toplev.c:543
>> > ---Type  to continue, or q  to quit---
>> > #9  0x00f29baa in do_compile ()
>> > at /export/gnu/import/git/sources/gcc/gcc/toplev.c:2041
>> > #10 0x00f29df9 in toplev::main (this=0x7fffdc90, argc=17,
>> > argv=0x7fffdd98)
>> > at /export/gnu/import/git/sources/gcc/gcc/toplev.c:2142
>> > #11 0x017d8228 in main (argc=17, argv=0x7fffdd98)
>> > at /export/gnu/import/git/sources/gcc/gcc/main.c:39
>> >
>> > Later, it was turned into error_mark_node:
>> >
>> > Old value = 
>> > New value = 
>> > finish_decl (decl=, init_loc=0, init=,
>> > origtype=, asmspec_tree=)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-decl.c:4802
>> > 4802  if (TREE_USED (type))
>> > (gdb) bt
>> > #0  finish_decl (decl=, init_loc=0,
>> > init=, origtype=, asmspec_tree=)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-decl.c:4802
>> > #1  0x0078d1ed in c_parser_declaration_or_fndef (
>> > parser=0x715050a8, fndef_ok=false, static_assert_ok=true,
>> > empty_ok=true, nested=true, start_attr_ok=true,
>> > objc_foreach_object_declaration=0x0, omp_declare_simd_clauses=...)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1855
>> > #2  0x00792a23 in c_parser_compound_statement_nostart (
>> > parser=0x715050a8)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:4621
>> > #3  0x00792688 in c_parser_compound_statement 
>> > (parser=0x715050a8)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:4532
>> > #4  0x0078d5a3 in c_parser_declaration_or_fndef (
>> > parser=0x715050a8, fndef_ok=true, static_assert_ok=true,
>> > empty_ok=true, nested=false, start_attr_ok=true,
>> > objc_foreach_object_declaration=0x0, omp_declare_simd_clauses=...)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1965
>> > #5  0x0078c234 in c_parser_external_declaration 
>> > (parser=0x715050a8)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1435
>> > #6  0x0078be45 in c_parser_translation_unit (parser=0x715050a8)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1322
>> > #7  0x007b3271 in c_parse_file ()
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:15440
>> > #8  0x0081cb97 in c_common_parse_file ()
>> > at

Re: PING^2: [gcc-5-branch][PATCH] PR rtl-optimization/67029: gcc-5.2.0 unable to find a register to spill with O3 fsched-pressure fschedule-insns

2015-09-30 Thread Vladimir Makarov


On 09/30/2015 09:02 AM, H.J. Lu wrote:

OK to back port it to GCC 5?

OK.  Sorry, I've missed your request.

Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-09-30 Thread Michael Collison



The current patch is attached.

2015-09-30  Michael Collison  
Andrew Pinski 

* match.pd ((x < y) && (x < z) -> x < min (y,z),
(x > y) and (x > z) -> x > max (y,z))
* testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.


On 09/30/2015 01:14 AM, Richard Biener wrote:

On Wed, Sep 30, 2015 at 9:29 AM, Michael Collison
 wrote:

Richard and Marc,

What is ':s'? I don't see any documentation for it. So you would like me to
remove :c and add :s?

There is documentation for both in the internals manual.

I don't have enough context to say whether you should remove "them" or
not.  What's
the current patch?  If you made the suggested changes you should be left with
only required :s and :c.

Richard.



On 09/18/2015 02:23 AM, Richard Biener wrote:

On Fri, Sep 18, 2015 at 9:38 AM, Marc Glisse  wrote:

Just a couple extra points. We can end up with a mix of < and >, which
might
prevent from matching:

_3 = b_1(D) > a_2(D);
_5 = a_2(D) < c_4(D);
_8 = _3 & _5;

Just like with &, we could also transform:
x < y | x < z  --->  x < max(y, z)

(but maybe wait to make sure reviewers are ok with the first
transformation
before generalizing)

Please merge the patterns as suggested and do the :c/:s changes as well.

The issue with getting mixed < and > is indeed there - I've wanted to
extend :c to handle tcc_comparison in some way at some point but
didn't get to how best to implement that yet...

So to fix that currently you have to replicate the merged pattern
with swapped comparison operands.

Otherwise I'm fine with the general approach.

Richard.


On Fri, 18 Sep 2015, Marc Glisse wrote:


On Thu, 17 Sep 2015, Michael Collison wrote:


Here is the patch modified with test cases for MIN_EXPR and
MAX_EXPR
expressions. I need some assistance; this test case will fail on
targets
that don't have support for MIN/MAX such as 68k. Is there any way to
remedy
this short of enumerating whether a target supports MIN/MAX in
testsuite/lib/target_support?

2015-07-24  Michael Collison 
 Andrew Pinski 

 * match.pd ((x < y) && (x < z) -> x < min (y,z),
 (x > y) and (x > z) -> x > max (y,z))
 * testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.

diff --git a/gcc/match.pd b/gcc/match.pd
index 5e8fd32..8691710 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1793,3 +1793,17 @@ along with GCC; see the file COPYING3.  If not
see
  (convert (bit_and (op (convert:utype @0) (convert:utype @1))
(convert:utype @4)))

+
+/* Transform (@0 < @1 and @0 < @2) to use min */
+(for op (lt le)
+(simplify


You seem to be missing all indentation.


+(bit_and:c (op @0 @1) (op @0 @2))


:c seems useless here. On the other hand, it might make sense to use
op:s
since this is mostly useful if it removes the 2 original comparisons.


+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))


How did you choose this restriction? It seems safe enough, but the
transformation could make sense in other cases as well. It can always be
generalized later though.


+(op @0 (min @1 @2)
+
+/* Transform (@0 > @1 and @0 > @2) to use max */
+(for op (gt ge)


Note that you could unify the patterns with something like:
(for op (lt le gt ge)
  ext (min min max max)
(simplify ...


+(simplify
+(bit_and:c (op @0 @1) (op @0 @2))
+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+(op @0 (max @1 @2)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
new file mode 100644
index 000..cc0189a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#define N 1024
+
+int a[N], b[N], c[N];
+
+void add (unsigned int m, unsigned int n)
+{
+  unsigned int i;
+  for (i = 0; i < m && i < n; ++i)


Maybe writing '&' instead of '&&' would make it depend less on the
target.
Also, both tests seem to be for GENERIC (i.e. I expect that you are
already
seeing the optimized version with -fdump-tree-original or
-fdump-tree-gimple). Maybe something as simple as:
int f(long a, long b, long c) {
   int cmp1 = a < b;
   int cmp2 = a < c;
   return cmp1 & cmp2;
}


+a[i] = b[i] + c[i];
+}
+
+void add2 (unsigned int m, unsigned int n)
+{
+  unsigned int i;
+  for (i = N-1; i > m && i > n; --i)
+a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-tree-dump "MIN_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump "MAX_EXPR" 1 "optimized" } } */




--
Marc Glisse


--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org



--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org

diff --git a/gcc/match.pd b/gcc/match.pd
index 5e8fd32..77cb91d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1793,3

Re: [GCC, ARM] armv8 linux toolchain asan testcase fail due to stl missing conditional code

2015-09-30 Thread Kyrill Tkachov



On 09/06/15 09:17, Kyrill Tkachov wrote:

On 05/06/15 14:14, Kyrill Tkachov wrote:

On 05/06/15 14:11, Richard Earnshaw wrote:

On 05/06/15 14:08, Kyrill Tkachov wrote:

Hi Shiva,

On 05/06/15 10:42, Shiva Chen wrote:

Hi, Kyrill

I add the testcase as stl-cond.c.

Could you help to check the testcase ?

If it's OK, Could you help me to apply the patch ?


This looks ok to me.
One nit on the testcase:

diff --git a/gcc/testsuite/gcc.target/arm/stl-cond.c
b/gcc/testsuite/gcc.target/arm/stl-cond.c
new file mode 100755
index 000..44c6249
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/stl-cond.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v8a_ok } */
+/* { dg-options "-O2" } */

This should also have -marm as the problem exhibited itself in arm state.
I'll commit this patch with this change in 24 hours on your behalf if no
one
objects.


Explicit use of -marm will break multi-lib testing.  I've forgotten the
correct hook, but there's most-likely something that will give you the
right behaviour, even if it means that thumb-only multi-lib testing
skips this test.

So I think what we want is:

dg-require-effective-target arm_arm_ok

The comment in target-supports.exp is:
# Return 1 if this is an ARM target where -marm causes ARM to be
# used (not Thumb)


I've committed the attached patch to trunk on Shiva's behalf with r224269.
It gates the test on arm_arm_ok and adds -marm, like other similar tests.
The ChangeLog I used is below:


I'd like to backport this to GCC 5 and 4.9
The patch applies and tests cleanly on GCC 5.
On 4.9 it needs some minor changes, which I'm attaching here.
I've bootstrapped and tested this patch on 4.9 and Shiva's
original patch on GCC 5.

2015-09-30  Kyrylo Tkachov  

Backport from mainline
2015-06-09  Shiva Chen  

* sync.md (atomic_load): Add conditional code for lda/ldr
(atomic_store): Likewise.

2015-09-30  Kyrylo Tkachov  

Backport from mainline
2015-06-09  Shiva Chen  

* gcc.target/arm/stl-cond.c: New test.


I'll commit them tomorrow.
Thanks,
Kyrill





2015-06-09  Shiva Chen  

  * sync.md (atomic_load): Add conditional code for lda/ldr
  (atomic_store): Likewise.

2015-06-09  Shiva Chen  

  * gcc.target/arm/stl-cond.c: New test.


Thanks,
Kyrill


Kyrill



R.


Ramana, Richard, we need to backport it to GCC 5 as well, right?

Thanks,
Kyrill



Thanks,

Shiva

2015-06-05 16:34 GMT+08:00 Kyrill Tkachov :

Hi Shiva,

On 05/06/15 09:29, Shiva Chen wrote:

Hi, Kyrill

I update the patch as Richard's suggestion.

-  return \"str\t%1, %0\";
+  return \"str%(%)\t%1, %0\";
 else
-  return \"stl\t%1, %0\";
+  return \"stl%?\t%1, %0\";
   }
-)
+  [(set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")])
+  [(set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")])


Let me sum up.

We add the predicable attribute to allow GCC to do if-conversion in
ce1/ce2/ce3, not only in the final phase via the final_prescan_insn
finite state machine.

We set predicable_short_it to "no" to restrict conditional code
generation on armv8 with thumb mode.

However, we could use the flags -mno-restrict-it to force generating
conditional code on thumb mode.

Therefore, we have to consider the assembly output format for strb
with condition code on arm/thumb mode.

Because arm/thumb mode use different syntax for strb,
we output the assembly as str%(%)
which will put the condition code in the right place according to
TARGET_UNIFIED_ASM.

Is anything still missing?

That's all correct, and well summarised :)
The patch looks good to me, but please include the testcase
(test.c from earlier) appropriately marked up for the testsuite.
I think to the level of dg-assemble, just so we know everything is
wired up properly.

Thanks for dealing with this.
Kyrill



Thanks,

Shiva

2015-06-04 18:00 GMT+08:00 Kyrill Tkachov
:

Hi Shiva,

On 04/06/15 10:57, Shiva Chen wrote:

Hi, Kyrill

Thanks for the tips of syntax.

It seems that correct syntax for

ldrb with condition code is ldreqb

ldab with condition code is ldabeq


So I modified the pattern as follow

   {
 enum memmodel model = (enum memmodel) INTVAL (operands[2]);
 if (model == MEMMODEL_RELAXED
 || model == MEMMODEL_CONSUME
 || model == MEMMODEL_RELEASE)
   return \"ldr%?\\t%0, %1\";
 else
   return \"lda%?\\t%0, %1\";
   }
   [(set_attr "predicable" "yes")
(set_attr "predicable_short_it" "no")])

It seems we don't have to worry about thumb mode,

I suggest you use Richard's suggestion from:
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00384.html
to write this in a clean way.


Because we already set "predicable" "yes" and "predicable_short_it"
"no"
for

Re: [PATCH] Fix warnings building pdp11 port

2015-09-30 Thread Jeff Law


On 09/30/2015 01:48 AM, Richard Biener wrote:

On Tue, Sep 29, 2015 at 6:55 PM, Jeff Law  wrote:

The pdp11 port fails to build with the trunk because of a warning.
Essentially VRP determines that the result of using BRANCH_COST is a
constant with the range [0..1].  That's always less than 4, 3 and the
various other magic constants used with BRANCH_COST and VRP issues a warning
about that comparison.


It does?  Huh.  Is it about undefined overflow which is the only thing
VRP should end up
warning about?  If so I wonder how that happens, at least I can't
reproduce it for
--target=pdp11 --enable-werror build of cc1.
You have to use a trunk compiler to build the pdp11 cross.  You'll bump 
into this repeatedly:


  if (warn_type_limits
  && ret && only_ranges
  && TREE_CODE_CLASS (code) == tcc_comparison
  && TREE_CODE (op0) == SSA_NAME)
{
  /* If the comparison is being folded and the operand on the LHS
 is being compared against a constant value that is outside of
 the natural range of OP0's type, then the predicate will
 always fold regardless of the value of OP0.  If -Wtype-limits
 was specified, emit a warning.  */
  tree type = TREE_TYPE (op0);
  value_range_t *vr0 = get_value_range (op0);

  if (vr0->type == VR_RANGE
  && INTEGRAL_TYPE_P (type)
  && vrp_val_is_min (vr0->min)
  && vrp_val_is_max (vr0->max)
  && is_gimple_min_invariant (op1))
{
  location_t location;

  if (!gimple_has_location (stmt))
location = input_location;
  else
location = gimple_location (stmt);

  warning_at (location, OPT_Wtype_limits,
  integer_zerop (ret)
  ? G_("comparison always false "
   "due to limited range of data type")
  : G_("comparison always true "
   "due to limited range of data type"));
}
}

  return ret;
}


Jeff

Refactor intelmic-mkoffload.c argv building to use obstacks (was: [PATCH 1/4] Add mkoffload for Intel MIC)

2015-09-30 Thread Thomas Schwinge

Hi!

On Mon, 28 Sep 2015 13:27:32 +0200, Bernd Schmidt  wrote:
> On 09/28/2015 01:25 PM, Ilya Verbin wrote:
> > On Mon, Sep 28, 2015 at 12:09:19 +0200, Bernd Schmidt wrote:
> >> On 09/28/2015 12:03 PM, Bernd Schmidt wrote:
> >>> On 09/28/2015 10:26 AM, Thomas Schwinge wrote:
>  -  objcopy_argv[8] = NULL;
>  +  objcopy_argv[objcopy_argc++] = NULL;
>  +  gcc_checking_assert (objcopy_argc <= OBJCOPY_ARGC_MAX);
> >>>
> >>> On its own this is not an improvement - you're trading a compile time
> >>> error for a runtime error. So, what is the other change this is
> >>> preparing for?
> >>
> >> Ok, I now see the other patch. But I also see that other code in the same
> >> file and in the nvptx mkoffload is using the obstack_ptr_grow method to
> >> build argv arrays, I think that would be preferable to this.
> >
> > I've removed obstack_ptr_grow for arrays with known sizes after this review:
> > https://gcc.gnu.org/ml/gcc-patches/2014-10/msg02210.html
> 
> That's unfortunate, I think that made the code less future-proof. IMO we 
> should revert to the obstack method especially if Thomas -v patch goes in.

Given that the discussion has settled in favor of using obstacks, I have
committed the following in r228300:

commit 99043644772bdc4b76d44058014d664ce27867f7
Author: tschwinge 
Date:   Wed Sep 30 16:42:22 2015 +

Refactor intelmic-mkoffload.c argv building to use obstacks

That is, restore and adapt the code as originally proposed.

gcc/
* config/i386/intelmic-mkoffload.c (generate_host_descr_file)
(prepare_target_image, main): Refactor argv building to use
obstacks.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@228300 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog|   8 +++
 gcc/config/i386/intelmic-mkoffload.c | 132 +++
 2 files changed, 80 insertions(+), 60 deletions(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index 86f81d6..15f84ab 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,11 @@
+2015-09-30  Thomas Schwinge  
+   Ilya Verbin  
+   Andrey Turetskiy  
+
+   * config/i386/intelmic-mkoffload.c (generate_host_descr_file)
+   (prepare_target_image, main): Refactor argv building to use
+   obstacks.
+
 2015-09-30  Ulrich Weigand  
 
* config/spu/spu-protos.h (spu_expand_atomic_op): Add prototype.
diff --git gcc/config/i386/intelmic-mkoffload.c 
gcc/config/i386/intelmic-mkoffload.c
index 065d408..ae88ecd 100644
--- gcc/config/i386/intelmic-mkoffload.c
+++ gcc/config/i386/intelmic-mkoffload.c
@@ -379,29 +379,31 @@ generate_host_descr_file (const char *host_compiler)
 
   fclose (src_file);
 
-  unsigned new_argc = 0;
-  const char *new_argv[9];
-  new_argv[new_argc++] = host_compiler;
-  new_argv[new_argc++] = "-c";
-  new_argv[new_argc++] = "-fPIC";
-  new_argv[new_argc++] = "-shared";
+  struct obstack argv_obstack;
+  obstack_init (&argv_obstack);
+  obstack_ptr_grow (&argv_obstack, host_compiler);
+  obstack_ptr_grow (&argv_obstack, "-c");
+  obstack_ptr_grow (&argv_obstack, "-fPIC");
+  obstack_ptr_grow (&argv_obstack, "-shared");
   switch (offload_abi)
 {
 case OFFLOAD_ABI_LP64:
-  new_argv[new_argc++] = "-m64";
+  obstack_ptr_grow (&argv_obstack, "-m64");
   break;
 case OFFLOAD_ABI_ILP32:
-  new_argv[new_argc++] = "-m32";
+  obstack_ptr_grow (&argv_obstack, "-m32");
   break;
 default:
   gcc_unreachable ();
 }
-  new_argv[new_argc++] = src_filename;
-  new_argv[new_argc++] = "-o";
-  new_argv[new_argc++] = obj_filename;
-  new_argv[new_argc++] = NULL;
+  obstack_ptr_grow (&argv_obstack, src_filename);
+  obstack_ptr_grow (&argv_obstack, "-o");
+  obstack_ptr_grow (&argv_obstack, obj_filename);
+  obstack_ptr_grow (&argv_obstack, NULL);
 
-  fork_execute (new_argv[0], CONST_CAST (char **, new_argv), false);
+  char **argv = XOBFINISH (&argv_obstack, char **);
+  fork_execute (argv[0], argv, false);
+  obstack_free (&argv_obstack, NULL);
 
   return obj_filename;
 }
@@ -448,29 +450,31 @@ prepare_target_image (const char *target_compiler, int 
argc, char **argv)
   char *rename_section_opt
 = XALLOCAVEC (char, sizeof (".data=") + strlen (image_section_name));
   sprintf (rename_section_opt, ".data=%s", image_section_name);
-  const char *objcopy_argv[11];
-  objcopy_argv[0] = "objcopy";
-  objcopy_argv[1] = "-B";
-  objcopy_argv[2] = "i386";
-  objcopy_argv[3] = "-I";
-  objcopy_argv[4] = "binary";
-  objcopy_argv[5] = "-O";
+  obstack_init (&argv_obstack);
+  obstack_ptr_grow (&argv_obstack, "objcopy");
+  obstack_ptr_grow (&argv_obstack, "-B");
+  obstack_ptr_grow (&argv_obstack, "i386");
+  obstack_ptr_grow (&argv_obstack, "-I");
+  obstack_ptr_grow (&argv_obstack, "binary");
+  obstack_ptr_grow (&argv_obstack, "-O");
   switch (offload_abi)
 {
 case OFFLOAD_ABI_LP64:
-  objcopy_argv[6] =

Re: [PATCH] Convert SPARC to LRA

2015-09-30 Thread Segher Boessenkool

On Wed, Sep 30, 2015 at 09:15:17AM -0600, Jeff Law wrote:
> >I guess the support of cc0 can be implemented for reasonable amount of
> >time.  It is just a priority issue.  I still have a lot PRs for the
> >targets already using LRA.
> I wouldn't suggest making cc0 support a significant priority.   I'd be 
> more likely to push for deprecating cc0 targets first.

It looks like most cc0 targets would be pretty easy to convert, if anyone
can do testing anyway ;-)


Segher

Re: [C/C++ PATCH] RFC: Implement -Wduplicated-cond (PR c/64249) (version 2)

2015-09-30 Thread Joseph Myers

The C front-end changes are OK.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] Fix PR67037: Wrong code at -O1 and above on ARM

2015-09-30 Thread Vladimir Makarov


On 09/30/2015 07:40 AM, Bernd Edlinger wrote:

Hi,


I noticed that this wrong-code bug is caused by two insns which are created by
the function process_addr_reg in lra-constraints.c and which share a common
subexpression.
The fix is rather simple: use copy_rtx when the memory reference is used twice.

The patch was boot-strapped and regression-tested
on x86_64-pc-linux-gnu and arm-linux-gnueabihf.

OK for trunk?



Yes, thanks.

Re: New OpenACC pass and Target Hook

2015-09-30 Thread Nathan Sidwell


On 09/30/15 09:27, Bernd Schmidt wrote:


The function name and/or comment should mention OpenACC, it could be confusing
otherwise.


Done (both)



I don't know what style to use for unused args now that we have C++. I'm fine
with this, and it presumably will be changed anyway.


Correct.


I feel the documentation should be expanded to say what FN_LEVEL does. I guess
it's one of gang/worker/vector. Without real code to look at yet I'm left
wondering how the hook would use it.


Expanded documentation.



dims is not unused in this function.

Fixed.


Otherwise it looks ok to me.


For avoidance of doubt, is this approval, or 'LGTM, but needs Jakub's approval'?

nathan
2015-09-30  Nathan Sidwell  
	Cesar Philippidis  

	gcc/
	* config/nvptx/nvptx.c (nvptx_goacc_validate_dims): New.
	(TARGET_GOACC_VALIDATE_DIMS): Override.
	* target.def (TARGET_GOACC): New target hook prefix.
	(validate_dims): New hook.
	* targhooks.h (default_goacc_validate_dims): New.
	* omp-low.c (oacc_validate_dims): New.
	(execute_oacc_device_lower): New.
	(default_goacc_validate_dims): New.
	(pass_data_oacc_device_lower): New.
	(pass_oacc_device_lower): New pass.
	(make_pass_oacc_device_lower): New.
	* tree-pass.h (make_pass_oacc_device_lower): Declare.
	* passes.def (pass_oacc_device_lower): Add it.
	* doc/tm.texi: Rebuilt.
	* doc/tm.texi.in (TARGET_GOACC_VALIDATE_DIMS): Add hook.
	* doc/invoke.texi (oaccdevlow): Document tree dump flag.

Index: gcc/targhooks.h
===
--- gcc/targhooks.h	(revision 228245)
+++ gcc/targhooks.h	(working copy)
@@ -107,6 +107,9 @@ extern unsigned default_add_stmt_cost (v
 extern void default_finish_cost (void *, unsigned *, unsigned *, unsigned *);
 extern void default_destroy_cost_data (void *);
 
+/* OpenACC hooks.  */
+extern bool default_goacc_validate_dims (tree, int [], int);
+
 /* These are here, and not in hooks.[ch], because not all users of
hooks.h include tm.h, and thus we don't have CUMULATIVE_ARGS.  */
 
Index: gcc/target.def
===
--- gcc/target.def	(revision 228245)
+++ gcc/target.def	(working copy)
@@ -1639,6 +1639,27 @@ int, (struct cgraph_node *), NULL)
 
 HOOK_VECTOR_END (simd_clone)
 
+/* Functions relating to openacc.  */
+#undef HOOK_PREFIX
+#define HOOK_PREFIX "TARGET_GOACC_"
+HOOK_VECTOR (TARGET_GOACC, goacc)
+
+DEFHOOK
+(validate_dims,
+"This hook should check the launch dimensions provided for an OpenACC\n\
+compute region, or routine.  Defaulted values are represented as -1\n\
+and non-constant values as 0. The @var{fn_level} is negative for the\n\
+function corresponding to the compute region.  For a routine it is the\n\
+outermost level at which partitioned execution may be spawned.  It\n\
+should fill in anything that needs to default to non-unity and verify\n\
+non-defaults.  Diagnostics should be issued as appropriate.  Return\n\
+true, if changes have been made.  You must override this hook to\n\
+provide dimensions larger than 1.",
+bool, (tree decl, int dims[], int fn_level),
+default_goacc_validate_dims)
+
+HOOK_VECTOR_END (goacc)
+
 /* Functions relating to vectorization.  */
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_VECTORIZE_"
Index: gcc/tree-pass.h
===
--- gcc/tree-pass.h	(revision 228245)
+++ gcc/tree-pass.h	(working copy)
@@ -406,6 +406,7 @@ extern gimple_opt_pass *make_pass_lower_
 extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_oacc_device_lower (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_object_sizes (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_strlen (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_fold_builtins (gcc::context *ctxt);
Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 228245)
+++ gcc/omp-low.c	(working copy)
@@ -8867,7 +8867,6 @@ expand_omp_atomic (struct omp_region *re
   expand_omp_atomic_mutex (load_bb, store_bb, addr, loaded_val, stored_val);
 }
 
-
 /* Encode an oacc launc argument.  This matches the GOMP_LAUNCH_PACK
macro on gomp-constants.h.  We do not check for overflow.  */
 
@@ -14020,4 +14019,146 @@ omp_finish_file (void)
 }
 }
 
+/* Validate and update the dimensions for offloaded FN.  ATTRS is the
+   raw attribute.  DIMS is an array of dimensions, which is returned.
+   Returns the function level dimensionality --  the level at which an
+   offload routine wishes to partition a loop.  */
+
+static int
+oacc_validate_dims (tree fn, tree attrs, int *dims)
+{
+  tree purpose[GOMP_DIM_MAX];
+  unsigned ix;
+  tree pos = TREE_VALUE (attrs);
+  int fn_level = -1;
+
+  /* Make sure

Re: Ping^2 Re: Pass -foffload targets from driver to libgomp at link time

2015-09-30 Thread Thomas Schwinge

Hi!

On Tue, 29 Sep 2015 10:18:14 +0200, Jakub Jelinek  wrote:
> On Mon, Sep 28, 2015 at 11:39:10AM +0200, Thomas Schwinge wrote:
> > On Fri, 11 Sep 2015 17:43:49 +0200, Jakub Jelinek  wrote:
> > > So, do I understand well that you'll call GOMP_set_offload_targets from
> > > construct[ors] of all shared libraries (and the binary) that contain 
> > > offloaded
> > > code?  If yes, that is surely going to fail the assertions in there.
> > 
> > Indeed.  My original plan has been to generate/invoke this constructor
> > only for/from the final executable and not for any shared libraries, but
> > it seems I didn't implement this correctly.
> 
> How would you mean to implement it?

I have come to realize that we need to generate/invoke this constructor
from everything that links against libgomp (which is what I implemented),
that is, executables as well as shared libraries.

> -fopenmp or -fopenacc code with
> offloading bits might not be in the final executable at all, nor in shared
> libraries it is linked against; such libraries could be only dlopened,
> consider say python plugin.  And this is not just made up, perhaps not with
> offloading yet, but people regularly use OpenMP code in plugins and then we
> get complains that fork child of the main program is not allowed to do
> anything but async-signal-safe functions.

I'm not sure I'm completely understanding that paragraph?  Are you saying
that offloaded code can be in libraries that are not linked against
libgomp?  How would these register (GOMP_offload_register) their
offloaded code?  I think it's reasonable to expect that every shared
library that contains offloaded code must link against libgomp, which
will happen automatically given that it is built with -fopenmp/-fopenacc?

> > > You can dlopen such libraries etc.  What if you link one library with
> > > -fopenmp=nvptx-none and another one with 
> > > -fopenmp=x86_64-intelmicemul-linux?
> > 
> > So, the first question to answer is: what do we expect to happen in this
> > case, or similarly, if the executable and any shared libraries are
> > compiled with different/incompatible -foffload options?
> 
> As the device numbers are per-process, the only possibility I see is that
> all the physically available devices are always available, and just if you
> try to offload from some code to a device that doesn't support it, you get
> host fallback.  Because, one shared library could carefully use device(xyz)
> to offload to say XeonPhi it is compiled for and supports, and another
> library device(abc) to offload to PTX it is compiled for and supports.

OK, I think I get that, and it makes sense.  Even though, I don't know
how you'd do that today: as far as I can tell, there is no specification
covering the OpenMP 4 target device IDs, so I have no idea how a user
program/library could reliably use them in practice?  For example, in
the current GCC implementation, the OpenMP 4 target device IDs depend on
the number of individual devices available in the system, and the order in
which libgomp loads the plugins, which is defined (arbitrarily) by the
GCC configuration?

> > For this, I propose that the only mode of operation that we currently can
> > support is that all of the executable and any shared libraries agree on
> > the offload targets specified by -foffload, and I thus propose the
> > following patch on top of what Joseph has posted before (passes the
> > testsuite, but not yet tested otherwise):
> 
> See above, no.

OK.

How's the following (complete patch instead of incremental patch; the
driver changes are still the same as before)?  The changes are:

  * libgomp/target.c:gomp_target_init again loads all the plugins.
  * libgomp/target.c:resolve_device and
libgomp/oacc-init.c:resolve_device verify that a default device
(OpenMP device-var ICV, and acc_device_default, respectively) is
actually enabled, or resort to host fallback if not.
  * GOMP_set_offload_targets renamed to GOMP_enable_offload_targets; used
to enable devices specified by -foffload.  Can be called multiple
times (executable, any shared libraries); the set of enabled devices
is the union of all those ever requested.
  * GOMP_offload_register (but not the new GOMP_offload_register_ver)
changed to enable all devices.  This is to maintain compatibility
with old executables and shared libraries built without the -foffload
constructor support.
  * IntelMIC mkoffload changed to use GOMP_offload_register_ver instead
of GOMP_offload_register, and GOMP_offload_unregister_ver instead of
GOMP_offload_unregister.  To avoid enabling all devices
(GOMP_offload_register).
  * New test cases to verify this (-foffload=disable, host fallback).

Ilya, I'm aware of your work on additional changes (shared memory),
,
but I think my patch is

Re: [PATCH] remove many typedefs

2015-09-30 Thread Jeff Law


On 09/29/2015 05:54 AM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

Hi,

just more work getting rid of typedefs that are useless and sometimes hide
pointerness.  gcc/ChangeLog:

bootstrapped + regtested on x86_64-linux-gnu, I believe this is all preapproved
so I'll commit it in a day or so if nobody decides to bikeshed anything.

Trev


2015-09-29  Trevor Saunders  

* cfganal.c, compare-elim.c, coverage.c, cprop.c, df-scan.c,
function.c, read-rtl.c, statistics.c, trans-mem.c, tree-if-conv.c,
tree-into-ssa.c, tree-loop-distribution.c, tree-ssa-coalesce.c,
tree-ssa-loop-ivopts.c, tree-ssa-reassoc.c, tree-ssa-strlen.c,
tree-ssa-tail-merge.c, tree-vrp.c, var-tracking.c: Remove
unneeded typedefs.
OK.  I checked the first many files fairly closely, less so for the latter 
files given how mechanical this work is.


Jeff
