date:20151109

[PATCH] Fix PR68248

2015-11-09 Thread Richard Biener


Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-11-09  Richard Biener  

PR tree-optimization/68248
* tree-vect-generic.c (expand_vector_operations_1): Handle
scalar rhs2.

* gcc.dg/torture/pr68248.c: New testcase.

Index: gcc/tree-vect-generic.c
===
*** gcc/tree-vect-generic.c (revision 230003)
--- gcc/tree-vect-generic.c (working copy)
*** expand_vector_operations_1 (gimple_stmt_
*** 1527,1532 
--- 1528,1535 
tree srhs1, srhs2 = NULL_TREE;
if ((srhs1 = ssa_uniform_vector_p (rhs1)) != NULL_TREE
&& (rhs2 == NULL_TREE
+ || (! VECTOR_TYPE_P (TREE_TYPE (rhs2))
+ && (srhs2 = rhs2))
  || (srhs2 = ssa_uniform_vector_p (rhs2)) != NULL_TREE)
/* As we query direct optabs restrict to non-convert operations.  */
&& TYPE_MODE (TREE_TYPE (type)) == TYPE_MODE (TREE_TYPE (srhs1)))
Index: gcc/testsuite/gcc.dg/torture/pr68248.c
===
*** gcc/testsuite/gcc.dg/torture/pr68248.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr68248.c  (working copy)
***
*** 0 
--- 1,20 
+ /* { dg-do compile } */
+ 
+ int a, b, c, d;
+ 
+ int
+ fn1 (int p1)
+ {
+   return a > 0 ? p1 : p1 >> a;
+ }
+ 
+ void
+ fn2 ()
+ {
+   char e;
+   for (; c; c++)
+ {
+   e = fn1 (!d ^ 2);
+   b ^= e;
+ }
+ }
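
For illustration only (a sketch, not part of the patch): the ChangeLog entry reads as if the uniform-vector path now also accepts a non-vector rhs2, as in a vector shift by a scalar amount. The snippet below uses the GNU C vector extension rather than GCC internals; the names shift_splat and shift_scalar are made up for this example. It shows that shifting a uniform vector by a scalar gives the same value in every lane as the plain scalar shift.

```c
/* Sketch only: relies on the GNU C vector extension (GCC, Clang).  */

typedef int v4si __attribute__ ((vector_size (16)));

/* Build a uniform vector {x, x, x, x} and shift every lane by the
   scalar AMOUNT.  The shift has a vector first operand and a scalar
   second operand.  */
static v4si
shift_splat (int x, int amount)
{
  v4si v = { x, x, x, x };
  return v >> amount;
}

/* The scalar computation each lane of the result should match.  */
static int
shift_scalar (int x, int amount)
{
  return x >> amount;
}
```

Because every lane holds the same value, one scalar shift followed by a broadcast would be an equivalent way to compute the result.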

Re: [Patch] Change to argument promotion in fixed conversion library calls

2015-11-09 Thread Richard Biener

On Fri, Nov 6, 2015 at 8:14 PM, Bernd Schmidt wrote:
> On 11/06/2015 08:04 PM, Steve Ellcey wrote:
>>
>> When I made this change I had one regression in the GCC testsuite
>> (gcc.dg/fixed-point/convert-sat.c).  I tracked this down to the
>> fact that emit_library_call_value_1 does not do any argument promotion
>> because it does not have the original tree type information for library
>> calls.  It only knows about modes.  I can't change
>> emit_library_call_value_1
>> to do the promotion because it does not know whether to do a signed or
>> unsigned promotion, but expand_fixed_convert could do the conversion
>> before calling emit_library_call_value_1 and that is what this patch does.
>
>
> Hmm, difficult. I can see how there would be a problem, but considering how
> many calls to emit_library_call_* we have, I'm slightly worried whether this
> really is a good approach.
>
> On the other hand, there seems to be precedent in this file:
>
>   if (GET_MODE_PRECISION (GET_MODE (from)) < GET_MODE_PRECISION
> (SImode))
> from = convert_to_mode (SImode, from, unsignedp);
>
>> The 'real' long term fix for this problem is to have tree types for
>> builtin
>> functions so the proper promotions can always be done but that is a fairly
>> large change that I am not willing to tackle right now and it could
>> probably
>> not be done in time for GCC 6.0 anyway.
>
>
> Yeah, but I agree that this is the real fix. We should aim to get rid of the
> emit_library_call functions.

Indeed.  In the "great plan" of simplifying RTL expansion by moving stuff
up to the GIMPLE level this could be done in a lowering stage lowering
all operations that we need to do via libcalls to GIMPLE calls.  Now,
we'd either need proper function declarations for all libcalls of optabs
for this or have the optab internal function stuff from Richard also provide
the libcall fallback.

In the expansion code for the as-libcall path we can then simply use the
type of the incoming argument (as we could if emit_library_call_value_1
would in addition to the RTX operands also receive the original tree ones).

Richard.


>> +  if (SCALAR_INT_MODE_P (from_mode))
>> +{
>> +  /*  If we need to promote the integer function argument we need to
>> do
>
>
> Extra space at the start of the comment.
>
>> + it here instead of inside emit_library_call_value because here
>> we
>> + know if we should be doing a signed or unsigned promotion.  */
>> +
>> +  machine_mode arg_mode;
>> +  int unsigned_p = 0;
>> +
>> +  arg_mode = promote_function_mode (NULL_TREE, from_mode,
>> +   &unsigned_p, NULL_TREE, 0);
>> +  if (arg_mode != from_mode)
>> +   {
>> + from = convert_to_mode (arg_mode, from, uintp);
>> + from_mode = arg_mode;
>> +   }
>> +}
>
>
> Move this into a separate function (prepare_libcall_arg)? I'll think about
> it over the weekend and let others chime in if they want, but I think I'll
> probably end up approving it with that change.
>
>
> Bernd

[PATCH] Improve BB vectorization dependence analysis

2015-11-09 Thread Richard Biener


Currently BB vectorization computes all dependences inside a BB
region and fails all vectorization if it cannot handle some of them.

This is obviously not needed - BB vectorization can restrict the
dependence tests to those that are needed to apply the load/store
motion effectively performed by the vectorization (sinking all
participating loads/stores to the place of the last one).

Restructuring it that way also makes it easy to give up only on the
SLP instance we cannot vectorize instead of on the whole region (this
gives a slight bump in my SPEC CPU 2006 testing to 756 vectorized basic
block regions).

But first and foremost this patch is to reduce the dependence analysis
cost and somewhat mitigate the compile-time effects of the first patch.

For fixing PR56118 only a cost model issue remains.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-11-09  Richard Biener  

PR tree-optimization/56118
* tree-vectorizer.h (vect_find_last_scalar_stmt_in_slp): Declare.
* tree-vect-slp.c (vect_find_last_scalar_stmt_in_slp): Export.
* tree-vect-data-refs.c (vect_slp_analyze_node_dependences): New
function.
(vect_slp_analyze_data_ref_dependences): Instead of computing
all dependences of the region DRs just analyze the code motions
SLP vectorization will perform.  Remove SLP instances that
cannot have their store/load motions applied.
(vect_analyze_data_refs): Allow DRs without a vectype
in BB vectorization.

* gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c: Adjust.

Index: gcc/tree-vectorizer.h
===
*** gcc/tree-vectorizer.h.orig  2015-11-09 11:01:55.688175321 +0100
--- gcc/tree-vectorizer.h   2015-11-09 11:02:18.987432840 +0100
*** extern void vect_detect_hybrid_slp (loop
*** 1075,1080 
--- 1075,1081 
  extern void vect_get_slp_defs (vec<tree> , slp_tree,
   vec<vec<tree> > *, int);
  extern bool vect_slp_bb (basic_block);
+ extern gimple *vect_find_last_scalar_stmt_in_slp (slp_tree);
  
  /* In tree-vect-patterns.c.  */
  /* Pattern recognition functions.
Index: gcc/tree-vect-data-refs.c
===
*** gcc/tree-vect-data-refs.c.orig  2015-11-09 10:22:33.140125722 +0100
--- gcc/tree-vect-data-refs.c   2015-11-09 11:33:05.503874719 +0100
*** vect_slp_analyze_data_ref_dependence (st
*** 581,586 
--- 581,629 
  }
  
  
+ /* Analyze dependences involved in the transform of SLP NODE.  */
+ 
+ static bool
+ vect_slp_analyze_node_dependences (slp_instance instance, slp_tree node)
+ {
+   /* This walks over all stmts involved in the SLP load/store done
+  in NODE verifying we can sink them up to the last stmt in the
+  group.  */
+   gimple *last_access = vect_find_last_scalar_stmt_in_slp (node);
+   for (unsigned k = 0; k < SLP_INSTANCE_GROUP_SIZE (instance); ++k)
+ {
+   gimple *access = SLP_TREE_SCALAR_STMTS (node)[k];
+   if (access == last_access)
+   continue;
+   stmt_vec_info access_stmt_info = vinfo_for_stmt (access);
+   gimple_stmt_iterator gsi = gsi_for_stmt (access);
+   gsi_next (&gsi);
+   for (; gsi_stmt (gsi) != last_access; gsi_next (&gsi))
+   {
+ gimple *stmt = gsi_stmt (gsi);
+ stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+ if (!STMT_VINFO_DATA_REF (stmt_info)
+ || (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
+ && DR_IS_READ (STMT_VINFO_DATA_REF (access_stmt_info))))
+   continue;
+ 
+ ddr_p ddr = initialize_data_dependence_relation
+ (STMT_VINFO_DATA_REF (access_stmt_info),
+  STMT_VINFO_DATA_REF (stmt_info), vNULL);
+ if (vect_slp_analyze_data_ref_dependence (ddr))
+   {
+ /* ???  If the dependence analysis failed we can resort to the
+alias oracle which can handle more kinds of stmts.  */
+ free_dependence_relation (ddr);
+ return false;
+   }
+ free_dependence_relation (ddr);
+   }
+ }
+   return true;
+ }
+ 
+ 
  /* Function vect_analyze_data_ref_dependences.
  
 Examine all the data references in the basic-block, and make sure there
*** vect_slp_analyze_data_ref_dependence (st
*** 590,610 
  bool
  vect_slp_analyze_data_ref_dependences (bb_vec_info bb_vinfo)
  {
-   struct data_dependence_relation *ddr;
-   unsigned int i;
- 
if (dump_enabled_p ())
  dump_printf_loc (MSG_NOTE, vect_location,
   "=== vect_slp_analyze_data_ref_dependences ===\n");
  
!   if (!compute_all_dependences (BB_VINFO_DATAREFS (bb_vinfo),
!   &BB_VINFO_DDRS (bb_vinfo),
!   vNULL, true))
! return false;
  
!   FOR_EACH_VEC_ELT (BB_VINFO_DDRS (bb_vinfo), i, ddr)
! if
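
The restricted dependence test described in this message can be sketched in plain C as follows. This is a simplified model, not the GCC code: statements are reduced to a kind plus an integer "location" that stands in for a data reference, and all type and function names are hypothetical. For each group member, only the statements between it and the last group member are checked, and read/read pairs are skipped.

```c
#include <stdbool.h>
#include <stddef.h>

enum stmt_kind { STMT_OTHER, STMT_LOAD, STMT_STORE };

struct stmt
{
  enum stmt_kind kind;
  int location;     /* hypothetical stand-in for a data reference */
  bool in_group;    /* part of the SLP load/store group?  */
};

/* True if A and B are both memory accesses, at least one of them
   writes, and they touch the same location.  */
static bool
conflicts (const struct stmt *a, const struct stmt *b)
{
  if (a->kind == STMT_OTHER || b->kind == STMT_OTHER)
    return false;
  if (a->kind == STMT_LOAD && b->kind == STMT_LOAD)
    return false;
  return a->location == b->location;
}

/* True if every group member can be sunk to the position of the last
   group member without crossing a conflicting statement.  */
static bool
can_sink_group (const struct stmt *stmts, size_t n)
{
  size_t last = n;
  for (size_t i = 0; i < n; i++)
    if (stmts[i].in_group)
      last = i;
  if (last == n)
    return true;    /* no group members at all */

  for (size_t i = 0; i < last; i++)
    {
      if (!stmts[i].in_group)
        continue;
      /* Only statements strictly between the access and the last
         group member matter.  */
      for (size_t j = i + 1; j < last; j++)
        if (conflicts (&stmts[i], &stmts[j]))
          return false;
    }
  return true;
}
```

Statements after the last group member are never inspected, which is where the saving over computing all dependences of the region comes from in this model.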

[PATCH] 02/N Fix memory leaks in IPA

2015-11-09 Thread Martin Liška

Hi.

The following changes were discussed with Martin Jambor to properly release
memory in IPA. They fix leaks that showed up in tramp3d with -O2.

Bootstrap and regression tests have been running.

Ready after it finishes?
Thanks,
Martin
From 85b63f738030dd7a901c228ba76e24f820d31c5d Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 9 Nov 2015 12:38:27 +0100
Subject: [PATCH 2/2] Fix memory leaks in IPA.

gcc/ChangeLog:

2015-11-09  Martin Liska  

	* ipa-inline-analysis.c (estimate_function_body_sizes): Call
	body_info release function.
	* ipa-prop.c (ipa_release_body_info): New function.
	(ipa_analyze_node): Call the function.
	(ipa_node_params::~ipa_node_params): Release known_csts.
	* ipa-prop.h (ipa_release_body_info): Declare.
---
 gcc/ipa-inline-analysis.c |  2 +-
 gcc/ipa-prop.c| 20 +++-
 gcc/ipa-prop.h|  2 +-
 3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index c07b0da..8c8b8e3 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -2853,7 +2853,7 @@ estimate_function_body_sizes (struct cgraph_node *node, bool early)
   inline_summaries->get (node)->self_time = time;
   inline_summaries->get (node)->self_size = size;
   nonconstant_names.release ();
-  fbi.bb_infos.release ();
+  ipa_release_body_info (&fbi);
   if (opt_for_fn (node->decl, optimize))
 {
   if (!early)
diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index d15f0eb..f379ea7 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -2258,6 +2258,19 @@ analysis_dom_walker::before_dom_children (basic_block bb)
   ipa_compute_jump_functions_for_bb (m_fbi, bb);
 }
 
+/* Release body info FBI.  */
+
+void
+ipa_release_body_info (struct ipa_func_body_info *fbi)
+{
+  int i;
+  struct ipa_bb_info *bi;
+
+  FOR_EACH_VEC_ELT (fbi->bb_infos, i, bi)
+free_ipa_bb_info (bi);
+  fbi->bb_infos.release ();
+}
+
 /* Initialize the array describing properties of formal parameters
of NODE, analyze their uses and compute jump functions associated
with actual arguments of calls from within NODE.  */
@@ -2313,11 +2326,7 @@ ipa_analyze_node (struct cgraph_node *node)
 
   analysis_dom_walker (&fbi).walk (ENTRY_BLOCK_PTR_FOR_FN (cfun));
 
-  int i;
-  struct ipa_bb_info *bi;
-  FOR_EACH_VEC_ELT (fbi.bb_infos, i, bi)
-free_ipa_bb_info (bi);
-  fbi.bb_infos.release ();
+  ipa_release_body_info (&fbi);
   free_dominance_info (CDI_DOMINATORS);
   pop_cfun ();
 }
@@ -3306,6 +3315,7 @@ ipa_node_params::~ipa_node_params ()
   free (lattices);
   /* Lattice values and their sources are deallocated with their alocation
  pool.  */
+  known_csts.release ();
   known_contexts.release ();
 
   lattices = NULL;
diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index b69ee8a..2fe824d 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -775,7 +775,7 @@ bool ipa_modify_expr (tree *, bool, ipa_parm_adjustment_vec);
 ipa_parm_adjustment *ipa_get_adjustment_candidate (tree **, bool *,
 		   ipa_parm_adjustment_vec,
 		   bool);
-
+void ipa_release_body_info (struct ipa_func_body_info *);
 
 /* From tree-sra.c:  */
 tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree,
-- 
2.6.2
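
The leak pattern this patch appears to address can be shown in a few lines of plain C. This is a sketch under simplifying assumptions, not the IPA code: struct bb_info, struct body_info and both functions are hypothetical stand-ins. Releasing only the container leaves each element's own allocation behind, so the release helper frees the elements first.

```c
#include <stdlib.h>

/* Hypothetical per-block record that owns one allocation.  */
struct bb_info
{
  int *param_data;
};

/* Hypothetical container of per-block records.  */
struct body_info
{
  struct bb_info *bb_infos;
  size_t count;
};

/* Allocate COUNT records (COUNT > 0), each with its own buffer.
   Returns 0 on success and -1 if the container cannot be allocated.  */
static int
body_info_init (struct body_info *fbi, size_t count)
{
  fbi->bb_infos = calloc (count, sizeof *fbi->bb_infos);
  if (!fbi->bb_infos)
    {
      fbi->count = 0;
      return -1;
    }
  fbi->count = count;
  for (size_t i = 0; i < count; i++)
    fbi->bb_infos[i].param_data = malloc (4 * sizeof (int));
  return 0;
}

/* Free what each record owns first, then the container itself.
   Freeing only the container would leak every param_data buffer.  */
static void
body_info_release (struct body_info *fbi)
{
  for (size_t i = 0; i < fbi->count; i++)
    free (fbi->bb_infos[i].param_data);
  free (fbi->bb_infos);
  fbi->bb_infos = NULL;
  fbi->count = 0;
}
```

Putting the loop and the final release into one helper means every caller gets the same, complete cleanup.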

Re: [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh.

2015-11-09 Thread Matthew Wahab


On 09/11/15 13:31, Christophe Lyon wrote:
> On 30 October 2015 at 16:52, Matthew Wahab wrote:
>> On 30/10/15 12:51, Christophe Lyon wrote:
>>> On 23 October 2015 at 14:26, Matthew Wahab wrote:
>>>> The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
>>>> sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah and
>>>> vqrdmlsh for these instructions. The new intrinsics are of the form
>>>> vqrdml{as}h[q]_.
>>>>
>>>> Tested the series for aarch64-none-linux-gnu with native bootstrap and
>>>> make check on an ARMv8 architecture. Also tested aarch64-none-elf with
>>>> cross-compiled check-gcc on an ARMv8.1 emulator.
>>>
>>> Is there a publicly available simulator for v8.1? QEMU or Foundation
>>> Model?
>>
>> Sorry, I don't know.
>> Matthew
>
> So, what will happen to the testsuite once this is committed?
> Are we going to see FAILs when using QEMU?

No, the check at the top of the test files

+/* { dg-require-effective-target arm_v8_1a_neon_hw } */

should make this test UNSUPPORTED if the HW/simulator can't execute it. (Support
for this check is added in patch #5 in this series.) Note that the aarch64-none-linux
make check was run on ARMv8 HW which can't execute the test and correctly reported it
as unsupported.


Matthew

Re: [vec-cmp, patch 3/6] Vectorize comparison

2015-11-09 Thread Richard Biener

On Mon, Nov 9, 2015 at 1:07 PM, Ilya Enkovich  wrote:
> On 26 Oct 16:09, Richard Biener wrote:
>> On Wed, Oct 14, 2015 at 6:12 PM, Ilya Enkovich  
>> wrote:
>> > +
>> > + ops.release ();
>> > + vec_defs.release ();
>>
>> No need to release auto_vec<>s at the end of scope explicitly.
>
> Fixed
>
>>
>> > + vec_compare = build2 (code, mask_type, vec_rhs1, vec_rhs2);
>> > + new_stmt = gimple_build_assign (mask, vec_compare);
>> > + new_temp = make_ssa_name (mask, new_stmt);
>> > + gimple_assign_set_lhs (new_stmt, new_temp);
>>
>>  new_temp = make_ssa_name (mask);
>>  gimple_build_assign (new_temp, code, vec_rhs1, vec_rhs2);
>>
>> for the 4 stmts above.
>
> Fixed
>
>>
>> > +
>> > +  vec_oprnds0.release ();
>> > +  vec_oprnds1.release ();
>>
>> Please use auto_vec<>s.
>
> These are used to hold vecs returned by vect_get_slp_defs.  Thus can't 
> use auto_vec.

Ok.

Richard.

>>
>> Ok with those changes.
>>
>> Richard.
>>
>
>
> gcc/
>
> 2015-11-09  Ilya Enkovich  
>
> * tree-vect-data-refs.c (vect_get_new_vect_var): Support 
> vect_mask_var.
> (vect_create_destination_var): Likewise.
> * tree-vect-stmts.c (vectorizable_comparison): New.
> (vect_analyze_stmt): Add vectorizable_comparison.
> (vect_transform_stmt): Likewise.
> * tree-vectorizer.h (enum vect_var_kind): Add vect_mask_var.
> (enum stmt_vec_info_type): Add comparison_vec_info_type.
> (vectorizable_comparison): New.
>
>
> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> index 11bce79..926752b 100644
> --- a/gcc/tree-vect-data-refs.c
> +++ b/gcc/tree-vect-data-refs.c
> @@ -3790,6 +3790,9 @@ vect_get_new_vect_var (tree type, enum vect_var_kind 
> var_kind, const char *name)
>case vect_scalar_var:
>  prefix = "stmp";
>  break;
> +  case vect_mask_var:
> +prefix = "mask";
> +break;
>case vect_pointer_var:
>  prefix = "vectp";
>  break;
> @@ -4379,7 +4382,11 @@ vect_create_destination_var (tree scalar_dest, tree 
> vectype)
>tree type;
>enum vect_var_kind kind;
>
> -  kind = vectype ? vect_simple_var : vect_scalar_var;
> +  kind = vectype
> +? VECTOR_BOOLEAN_TYPE_P (vectype)
> +? vect_mask_var
> +: vect_simple_var
> +: vect_scalar_var;
>type = vectype ? vectype : TREE_TYPE (scalar_dest);
>
>gcc_assert (TREE_CODE (scalar_dest) == SSA_NAME);
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index f1216c8..ee549f4 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -7416,6 +7416,185 @@ vectorizable_condition (gimple *stmt, 
> gimple_stmt_iterator *gsi,
>return true;
>  }
>
> +/* vectorizable_comparison.
> +
> +   Check if STMT is comparison expression that can be vectorized.
> +   If VEC_STMT is also passed, vectorize the STMT: create a vectorized
> +   comparison, put it in VEC_STMT, and insert it at GSI.
> +
> +   Return FALSE if not a vectorizable STMT, TRUE otherwise.  */
> +
> +bool
> +vectorizable_comparison (gimple *stmt, gimple_stmt_iterator *gsi,
> +gimple **vec_stmt, tree reduc_def,
> +slp_tree slp_node)
> +{
> +  tree lhs, rhs1, rhs2;
> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> +  tree vectype1 = NULL_TREE, vectype2 = NULL_TREE;
> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE;
> +  tree new_temp;
> +  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> +  enum vect_def_type dts[2] = {vect_unknown_def_type, vect_unknown_def_type};
> +  unsigned nunits;
> +  int ncopies;
> +  enum tree_code code;
> +  stmt_vec_info prev_stmt_info = NULL;
> +  int i, j;
> +  bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
> +  vec<tree> vec_oprnds0 = vNULL;
> +  vec<tree> vec_oprnds1 = vNULL;
> +  gimple *def_stmt;
> +  tree mask_type;
> +  tree mask;
> +
> +  if (!VECTOR_BOOLEAN_TYPE_P (vectype))
> +return false;
> +
> +  mask_type = vectype;
> +  nunits = TYPE_VECTOR_SUBPARTS (vectype);
> +
> +  if (slp_node || PURE_SLP_STMT (stmt_info))
> +ncopies = 1;
> +  else
> +ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
> +
> +  gcc_assert (ncopies >= 1);
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> +return false;
> +
> +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
> +  && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
> +  && reduc_def))
> +return false;
> +
> +  if (STMT_VINFO_LIVE_P (stmt_info))
> +{
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"value used after loop.\n");
> +  return false;
> +}
> +
> +  if (!is_gimple_assign (stmt))
> +return false;
> +
> +  code = gimple_assign_rhs_code (stmt);
> +
> +  if (TREE_CODE_CLASS (code) != tcc_comparison)
> +return false;

Re: [vec-cmp, patch 2/6] Vectorization factor computation

2015-11-09 Thread Richard Biener

On Mon, Nov 9, 2015 at 2:54 PM, Ilya Enkovich  wrote:
> 2015-10-20 16:45 GMT+03:00 Richard Biener :
>> On Wed, Oct 14, 2015 at 1:21 PM, Ilya Enkovich  
>> wrote:
>>> 2015-10-13 16:37 GMT+03:00 Richard Biener :
>>>> On Thu, Oct 8, 2015 at 4:59 PM, Ilya Enkovich
>>>> wrote:
>>>>> Hi,
>>>>>
>>>>> This patch handles statements with boolean result in vectorization factor
>>>>> computation.  For comparison its operands type is used instead of result
>>>>> type to compute VF.  Other boolean statements are ignored for VF.
>>>>>
>>>>> Vectype for comparison is computed using type of compared values.
>>>>> Computed type is propagated into other boolean operations.
>>>>
>>>> This feels rather ad-hoc, mixing up the existing way of computing
>>>> vector type and VF.  I'd rather have turned the whole
>>>> vector type computation around to the scheme working on the operands
>>>> rather than on the lhs and then searching
>>>> for smaller/larger types on the rhs'.
>>>>
>>>> I know this is a tricky function (heh, but you make it even worse...).
>>>> And it needs a helper with knowledge about operations
>>>> so one can compute the result vector type for an operation on its
>>>> operands.  The seeds should be PHIs (handled like now)
>>>> and loads, and yes, externals need special handling.
>>>>
>>>> Ideally we'd do things in two stages, first compute vector types in a
>>>> less constrained manner (not forcing a single vector size)
>>>> and then in a 2nd run promote to a common size also computing the VF to do
>>>> that.
>>>
>>> This sounds like a refactoring, not a functional change, right? Also I
>>> don't see a reason to analyze DF to compute vectypes if we promote it
>>> to a single vector size anyway. For booleans we have to do it because
>>> boolean vectors of the same size may have different number of
>>> elements. What is the reason to do it for other types?
>>
>> For conversions and operators which support different sized operands
>
> That's what we handle in vector patterns and use some helper functions
> to determine vectypes there. Looks like this refactoring would affect
> patterns significantly. Probably compute vectypes before searching for
> patterns?
>
>>
>>> Shouldn't it be a patch independent from comparison vectorization series?
>>
>> As you like.
>
> I'd like to move on with vector comparison and consider VF computation
> refactoring when it's stabilized. This patch is the last one (except
> target ones) not approved in all vector comparison related series.
> Would it be OK to go on with it in a current shape?

Yes.

Thanks,
Richard.

> Thanks,
> Ilya
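
A worked example of the point about boolean vectors (a sketch in plain C with hypothetical helper names, not vectorizer code): with one fixed vector size, the number of lanes depends on the element size of the compared values, so a mask produced by comparing chars has more elements than one produced by comparing ints. The sketch assumes the vectorization factor is the largest lane count among the statements considered.

```c
#include <stddef.h>

/* Number of lanes in a vector of VECTOR_BYTES bytes whose elements are
   ELEM_BYTES bytes wide.  ELEM_BYTES must be nonzero.  */
static unsigned
lanes (unsigned vector_bytes, unsigned elem_bytes)
{
  return vector_bytes / elem_bytes;
}

/* Assumed rule for this sketch: the vectorization factor is the
   largest lane count over the element sizes seen.  For a comparison,
   the element size is that of the compared operands, not of the
   boolean result.  */
static unsigned
vectorization_factor (unsigned vector_bytes, const unsigned *elem_bytes,
                      size_t n)
{
  unsigned vf = 0;
  for (size_t i = 0; i < n; i++)
    {
      unsigned l = lanes (vector_bytes, elem_bytes[i]);
      if (l > vf)
        vf = l;
    }
  return vf;
}
```

With 16-byte vectors, a comparison of 4-byte ints yields a 4-element mask and a comparison of 1-byte chars a 16-element mask, so a loop containing both would get a factor of 16 under this rule.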

Re: [PATCH] PR/67682, break SLP groups up if only some elements match

2015-11-09 Thread Alan Lawrence

On 06/11/15 12:55, Richard Biener wrote:
>
>> +  /* GROUP_GAP of the first group now has to skip over the second group 
>> too.  */
>> +  GROUP_GAP (first_vinfo) += group2_size;
>
> Please add a MSG_NOTE debug printf stating that we split the group and
> at which element.

Done.

> I think you want to add && STMT_VINFO_GROUPED_ACCESS (vinfo_for_stmt (stmt))
> otherwise this could be SLP reductions where there is no way the split
> group would enable vectorization.

Ah, I had thought that the (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt))) check
sufficed for that, as per e.g. the section above
  /* Create a node (a root of the SLP tree) for the packed grouped stores. */
But done.

> Note that BB vectorization now also very aggressively falls back to 
> considering
> non-matches being "external".
>
> Not sure why that doesn't trigger for your testcases ;)

I tested against trunk r229944, on which all of my scan-tree-dump's were 
failing

> I'm comfortable with the i < group_size half of the patch.  For the other 
> piece
> I'd like to see more compile-time / success data from, say, building
> SPEC CPU 2006.

Well, as a couple of quick data points, a compile of SPEC2000 on
aarch64-none-linux-gnu (-Ofast -fomit-frame-pointer -mcpu=cortex-a57), I have:

3080 successes without patch;
+79 successes from the "i < vectorization_factor" part of the patch (on its own)
+90 successes from the (i>=vectorization_factor) && "i < group_size" part (on 
its own)
+(79 from first) +(90 from second) + (an additional 62) from both parts 
together.

And for SPEC2006, aarch64-linux-gnu (-O3 -fomit-frame-pointer -mcpu=cortex-a57):
11979 successes without patch;
+ 499 from the "i < vectorization_factor" part
+ 264 from the "(i >= vectorization_factor) && (i < group_size)" part
+ extra 336 if both parts combined.

I haven't done any significant measurements of compile-time yet.

(snipping this bit out-of-order)
> Hmm. This is of course pretty bad for compile-time for the non-matching
> case as that effectively will always split into N pieces if we feed it
> garbage (that is, without being sure that at least one piece _will_ vectorize).
>
> OTOH with the current way of computing "matches" we'd restrict ourselves
> to cases where the first stmt we look at (and match to) happens to be
> the operation that in the end will form a matching group.
...
> Eventually we'd want to improve the "matches" return
> to include the affected stmts (ok, that'll be not very easy) so we can do
> a cheap "if we split here will it fix that particular mismatch" check.

Yes, I think there are a bunch of things we can do here, that would be more
powerful than the simple approach I used here. The biggest limiting factor will
probably be (lack of) permutation, i.e. if we only SLP stores to consecutive
addresses.

> So, please split the patch and I suggest to leave the controversial part
> for next stage1 together with some improvement on the SLP build process
> itself?

Here's a reduced version with just the second case,
bootstrapped+check-gcc/g++ on x86_64.

gcc/ChangeLog:

* tree-vect-slp.c (vect_split_slp_store_group): New.
(vect_analyze_slp_instance): During basic block SLP, recurse on
subgroups if vect_build_slp_tree fails after 1st vector.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-7.c (main1): Make subgroups non-isomorphic.
* gcc.dg/vect/bb-slp-subgroups-1.c: New.
* gcc.dg/vect/bb-slp-subgroups-2.c: New.
* gcc.dg/vect/bb-slp-subgroups-3.c: New.
* gcc.dg/vect/bb-slp-subgroups-4.c: New.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-7.c   | 10 ++--
 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c | 44 +++
 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c | 42 +++
 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c | 41 ++
 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-4.c | 41 ++
 gcc/tree-vect-slp.c| 74 +-
 6 files changed, 246 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-4.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
index ab54a48..b8bef8c 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
@@ -16,12 +16,12 @@ main1 (unsigned int x, unsigned int y)
   unsigned int *pout = &out[0];
   unsigned int a0, a1, a2, a3;
 
-  /* Non isomorphic.  */
+  /* Non isomorphic, even 64-bit subgroups.  */
   a0 = *pin++ + 23;
-  a1 = *pin++ + 142;
+  a1 = *pin++ * 142;
   a2 = *pin++ + 2;
   a3 = *pin++ * 31;
-  
+
   *pout++ = a0 * x;
   *pout++ = a1 * y;
   *pout++ = a2 * x;
@@ -29,7 +29,7 @@ main1 (unsigned int x, unsigned int y)
 
   /* Check results.  */
   if (out[0] != (in[0] +
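
The "i >= vectorization_factor && i < group_size" case kept in this reduced patch can be sketched in plain C as below. This is a toy model, not the code in tree-vect-slp.c: operations are single characters, a mismatch is any element whose operation differs from the first one, and rounding the split point down to a vector boundary is an assumption of this sketch. All names are hypothetical.

```c
#include <stddef.h>

/* Index of the first element whose operation differs from element 0,
   or GROUP_SIZE if all elements match.  */
static size_t
first_mismatch (const char *ops, size_t group_size)
{
  for (size_t i = 1; i < group_size; i++)
    if (ops[i] != ops[0])
      return i;
  return group_size;
}

/* Return the index at which to split the group, or 0 for "do not
   split".  NUNITS is the number of lanes per vector and must be a
   power of two.  A split is only attempted when at least one full
   vector matched before the first mismatch.  */
static size_t
split_point (const char *ops, size_t group_size, size_t nunits)
{
  size_t i = first_mismatch (ops, group_size);
  if (i < nunits || i >= group_size)
    return 0;
  /* Assumption: split at the last vector boundary at or before I.  */
  return i & ~(nunits - 1);
}
```

For a group of eight stores whose operations are "++++**++" and four lanes per vector, the first mismatch is at index 4, so the group would be split into elements 0..3 and 4..7, and the first half can then be tried on its own.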

[patch] backport PIE support for FreeBSD to gcc-49

2015-11-09 Thread Andreas Tobler


Hi,

any objections to my applying this patch to gcc-4.9?

It is FreeBSD only.


TIA,
Andreas

2015-11-09  Andreas Tobler  

Backport from mainline
2015-05-18  Andreas Tobler  

* config/freebsd-spec.h (FBSD_STARTFILE_SPEC): Add the bits to build
pie executables.
(FBSD_ENDFILE_SPEC): Likewise.
* config/i386/freebsd.h (STARTFILE_SPEC): Remove and use the one from
config/freebsd-spec.h.
(ENDFILE_SPEC): Likewise.

2015-11-02  Andreas Tobler  

* config/rs6000/freebsd64.h (ASM_SPEC32): Adjust spec to handle
PIE executables.

Index: gcc/config/freebsd-spec.h
===
--- gcc/config/freebsd-spec.h   (revision 230016)
+++ gcc/config/freebsd-spec.h   (working copy)
@@ -66,8 +66,9 @@
   "%{!shared: \
  %{pg:gcrt1.o%s} %{!pg:%{p:gcrt1.o%s} \
   %{!p:%{profile:gcrt1.o%s} \
-%{!profile:crt1.o%s}}}} \
-   crti.o%s %{!shared:crtbegin.o%s} %{shared:crtbeginS.o%s}"
+%{!profile: \
+%{pie: Scrt1.o%s;:crt1.o%s}}}}} \
+   crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s}"
 
 /* Provide a ENDFILE_SPEC appropriate for FreeBSD.  Here we tack on
the magical crtend.o file (see crtstuff.c) which provides part of 
@@ -76,7 +77,7 @@
`crtn.o'.  */
 
 #define FBSD_ENDFILE_SPEC \
-  "%{!shared:crtend.o%s} %{shared:crtendS.o%s} crtn.o%s"
+  "%{shared|pie:crtendS.o%s;:crtend.o%s} crtn.o%s"
 
 /* Provide a LIB_SPEC appropriate for FreeBSD as configured and as
required by the user-land thread model.  Before __FreeBSD_version
Index: gcc/config/i386/freebsd.h
===
--- gcc/config/i386/freebsd.h   (revision 230016)
+++ gcc/config/i386/freebsd.h   (working copy)
@@ -59,29 +59,16 @@
 #define SUBTARGET_EXTRA_SPECS \
   { "fbsd_dynamic_linker", FBSD_DYNAMIC_LINKER }
 
-/* Provide a STARTFILE_SPEC appropriate for FreeBSD.  Here we add
-   the magical crtbegin.o file (see crtstuff.c) which provides part 
-   of the support for getting C++ file-scope static object constructed 
-   before entering `main'.  */
-   
-#undef STARTFILE_SPEC
-#define STARTFILE_SPEC \
-  "%{!shared: \
- %{pg:gcrt1.o%s} %{!pg:%{p:gcrt1.o%s} \
-  %{!p:%{profile:gcrt1.o%s} \
-%{!profile:crt1.o%s}}}} \
-   crti.o%s %{!shared:crtbegin.o%s} %{shared:crtbeginS.o%s}"
+/* Use the STARTFILE_SPEC from config/freebsd-spec.h.  */
 
-/* Provide a ENDFILE_SPEC appropriate for FreeBSD.  Here we tack on
-   the magical crtend.o file (see crtstuff.c) which provides part of 
-   the support for getting C++ file-scope static object constructed 
-   before entering `main', followed by a normal "finalizer" file, 
-   `crtn.o'.  */
+#undef  STARTFILE_SPEC
+#define STARTFILE_SPEC FBSD_STARTFILE_SPEC
 
-#undef ENDFILE_SPEC
-#define ENDFILE_SPEC \
-  "%{!shared:crtend.o%s} %{shared:crtendS.o%s} crtn.o%s"
+/* Use the ENDFILE_SPEC from config/freebsd-spec.h.  */
 
+#undef  ENDFILE_SPEC
+#define ENDFILE_SPEC FBSD_ENDFILE_SPEC
+
 /* Provide a LINK_SPEC appropriate for FreeBSD.  Here we provide support
for the special GCC options -static and -shared, which allow us to
link things in one of these three modes by applying the appropriate
Index: gcc/config/rs6000/freebsd64.h
===
--- gcc/config/rs6000/freebsd64.h   (revision 230016)
+++ gcc/config/rs6000/freebsd64.h   (working copy)
@@ -130,7 +130,7 @@
 #define LINK_OS_FREEBSD_SPEC "%{m32:%(link_os_freebsd_spec32)}%{!m32:%(link_os_freebsd_spec64)}"
 
 #define ASM_SPEC32 "-a32 \
-%{mrelocatable} %{mrelocatable-lib} %{fpic:-K PIC} %{fPIC:-K PIC} \
+%{mrelocatable} %{mrelocatable-lib} %{fpic|fpie|fPIC|fPIE:-K PIC} \
 %{memb} %{!memb: %{msdata=eabi: -memb}} \
 %{!mlittle: %{!mlittle-endian: %{!mbig: %{!mbig-endian: \
 %{mcall-freebsd: -mbig} \

Re: [PATCH 1/2] s/390: Implement "target" attribute.

2015-11-09 Thread Andreas Krebbel

On 11/02/2015 09:44 AM, Dominik Vogt wrote:
> (@Uli: I'd like to hear your opinion on this issue.
> Original message:
> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03403.html).
> 
> On Fri, Oct 30, 2015 at 03:09:39PM +0100, Andreas Krebbel wrote:
>> Why do we need x_s390_arch_specified and x_s390_tune_specified?  You
>> should be able to use opts_set->x_s390_arch and opts_set->x_s390_tune
>> instead? (patch attached, your tests keep working with that change).
> 
> The idea was that -mtune on the command line is *not* overridden
> by the "arch" target attribute.  This would allow to change the
> architecture for a specific function and keep the -mtune= option
> from the command line.  But as a matter of fact, the current patch
> doesn't do it either (bug?).

Your testcases even seem to check for this behavior so it looked intentional to 
me.  But I agree
that being able to keep the -mtune cmdline value for a function while only 
changing the used
instruction set would be good.

Could you please elaborate why implementing this requires the new flags?

-Andreas-

Re: [vec-cmp, patch 2/6] Vectorization factor computation

2015-11-09 Thread Ilya Enkovich

2015-10-20 16:45 GMT+03:00 Richard Biener :
> On Wed, Oct 14, 2015 at 1:21 PM, Ilya Enkovich  wrote:
>> 2015-10-13 16:37 GMT+03:00 Richard Biener :
>>> On Thu, Oct 8, 2015 at 4:59 PM, Ilya Enkovich  
>>> wrote:
>>>> Hi,
>>>>
>>>> This patch handles statements with boolean result in vectorization factor
>>>> computation.  For comparison its operands type is used instead of result
>>>> type to compute VF.  Other boolean statements are ignored for VF.
>>>>
>>>> Vectype for comparison is computed using type of compared values.
>>>> Computed type is propagated into other boolean operations.
>>>
>>> This feels rather ad-hoc, mixing up the existing way of computing
>>> vector type and VF.  I'd rather have turned the whole
>>> vector type computation around to the scheme working on the operands
>>> rather than on the lhs and then searching
>>> for smaller/larger types on the rhs'.
>>>
>>> I know this is a tricky function (heh, but you make it even worse...).
>>> And it needs a helper with knowledge about operations
>>> so one can compute the result vector type for an operation on its
>>> operands.  The seeds should be PHIs (handled like now)
>>> and loads, and yes, externals need special handling.
>>>
>>> Ideally we'd do things in two stages, first compute vector types in a
>>> less constrained manner (not forcing a single vector size)
>>> and then in a 2nd run promote to a common size also computing the VF to do 
>>> that.
>>
>> This sounds like a refactoring, not a functional change, right? Also I
>> don't see a reason to analyze DF to compute vectypes if we promote it
>> to a single vector size anyway. For booleans we have to do it because
>> boolean vectors of the same size may have different number of
>> elements. What is the reason to do it for other types?
>
> For conversions and operators which support different sized operands

That's what we handle in vector patterns and use some helper functions
to determine vectypes there. Looks like this refactoring would affect
patterns significantly. Probably compute vectypes before searching for
patterns?

>
>> Shouldn't it be a patch independent from comparison vectorization series?
>
> As you like.

I'd like to move on with vector comparison and consider VF computation
refactoring when it's stabilized. This patch is the last one (except
target ones) not approved in all vector comparison related series.
Would it be OK to go on with it in a current shape?

Thanks,
Ilya
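
A minimal sketch of the point about boolean vectors above, assuming a fixed vector size in bytes: two masks of the same byte size can have different element counts, so the count has to come from the compared operands.  The helper names are hypothetical and are not GCC functions.

```cpp
#include <cstddef>

// Number of lanes in a vector of `vector_bytes` bytes whose elements are
// `elem_bytes` bytes wide (hypothetical helper, not part of GCC).
constexpr std::size_t lanes(std::size_t vector_bytes, std::size_t elem_bytes)
{
  return vector_bytes / elem_bytes;
}

// A comparison yields one boolean per compared element, so the mask has as
// many lanes as the operand vector, regardless of the scalar result type.
constexpr std::size_t mask_lanes_for_compare(std::size_t vector_bytes,
                                             std::size_t operand_bytes)
{
  return lanes(vector_bytes, operand_bytes);
}
```

With a 16-byte vector, comparing 4-byte ints gives a 4-lane mask while comparing chars gives a 16-lane mask, which is why the result type alone cannot determine the VF here.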

Re: [PATCH] Use signed boolean type for boolean vectors

2015-11-09 Thread Ilya Enkovich

On 03 Nov 14:42, Richard Biener wrote:
> On Wed, Oct 28, 2015 at 4:30 PM, Ilya Enkovich  wrote:
> > 2015-10-28 18:21 GMT+03:00 Richard Biener :
> >> On Wed, Oct 28, 2015 at 2:13 PM, Ilya Enkovich  
> >> wrote:
> >>> Hi,
> >>>
> >>> Testing boolean vector conversions I found several runtime regressions
> >>> and investigation showed it's due to incorrect conversion caused by
> >>> unsigned boolean type.  When boolean vector is represented as an
> >>> integer vector on target it's a signed integer actually.  Unsigned
> >>> boolean type was chosen due to possible single bit values, but for
> >>> multiple bit values it causes wrong casting.  The easiest way to fix
> >>> it is to use signed boolean value.  The following patch does this and
> >>> fixes my problems with conversion.  Bootstrapped and tested on
> >>> x86_64-unknown-linux-gnu.  Is it OK?
> >>
> >> Hmm.  Actually formally the "boolean" vectors were always 0 or -1
> >> (all bits set).  That is also true for a signed boolean with precision 1
> >> but with higher precision what makes sure to sign-extend 'true'?
> >>
> >> So it's far from an obvious change, esp as you don't change the
> >> precision == 1 case.  [I still think we should have precision == 1
> >> for all boolean types]
> >>
> >> Richard.
> >>
> >
> > For 1 bit precision signed type value 1 is out of range, right? This might
> > break in many places due to 1 being used as the true value.
> 
> For vectors -1 is true.  Did you try whether it breaks many places?
> build_int_cst (type, 1) should still work fine.
> 
> Richard.
> 

I tried it and didn't find any new failures.  So it looks like I was wrong to
assume it would cause many failures.  Testing is not complete because many SPEC
benchmarks fail to compile at -O3 for AVX-512 on trunk.  But I think we may
proceed with the signed type and fix constant generation issues if any are
revealed.  This patch was bootstrapped and regtested on
x86_64-unknown-linux-gnu.  OK for trunk?

Thanks,
Ilya
--
gcc/

2015-11-09  Ilya Enkovich  

* optabs.c (expand_vec_cond_expr): Always get sign from type.
* tree.c (wide_int_to_tree): Support negative values for boolean.
(build_nonstandard_boolean_type): Use signed type for booleans.


diff --git a/gcc/optabs.c b/gcc/optabs.c
index fdcdc6a..44971ad 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -5365,7 +5365,6 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
   op0a = TREE_OPERAND (op0, 0);
   op0b = TREE_OPERAND (op0, 1);
   tcode = TREE_CODE (op0);
-  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
 }
   else
 {
@@ -5374,9 +5373,9 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
   op0a = op0;
   op0b = build_zero_cst (TREE_TYPE (op0));
   tcode = LT_EXPR;
-  unsignedp = false;
 }
   cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
+  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
 
 
   gcc_assert (GET_MODE_SIZE (mode) == GET_MODE_SIZE (cmp_op_mode)
diff --git a/gcc/tree.c b/gcc/tree.c
index 18d6544..6fb4c09 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -1437,7 +1437,7 @@ wide_int_to_tree (tree type, const wide_int_ref )
case BOOLEAN_TYPE:
  /* Cache false or true.  */
  limit = 2;
- if (hwi < 2)
+ if (IN_RANGE (hwi, 0, 1))
ix = hwi;
  break;
 
@@ -8069,7 +8069,7 @@ build_nonstandard_boolean_type (unsigned HOST_WIDE_INT precision)
 
   type = make_node (BOOLEAN_TYPE);
   TYPE_PRECISION (type) = precision;
-  fixup_unsigned_type (type);
+  fixup_signed_type (type);
 
   if (precision <= MAX_INT_CACHED_PREC)
 nonstandard_boolean_type_cache[precision] = type;
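
As an illustration of why signedness matters for multi-bit boolean elements, here is a small sketch using plain fixed-width integers in place of GCC trees.  It shows that an all-ones "true" survives widening only under sign extension, and why a cache index test has to reject -1 once booleans can be negative.  The function names are hypothetical, and returning -1 for "no cache slot" is a convention of this sketch.

```cpp
#include <cstdint>

// Widen an 8-bit lane as a signed value: -1 (all bits set) stays -1.
inline std::int32_t widen_signed(std::int8_t lane)
{
  return static_cast<std::int32_t>(lane);
}

// Widen the same bit pattern as unsigned: 0xff becomes 255, which is no
// longer an all-ones mask in the wider type.
inline std::int32_t widen_unsigned(std::uint8_t lane)
{
  return static_cast<std::int32_t>(lane);
}

// Cache slot for a boolean constant: only 0 and 1 have a slot.  A plain
// `value < 2` test would also accept -1 and use it as an index.
inline int bool_cache_index(long long value)
{
  return (value >= 0 && value <= 1) ? static_cast<int>(value) : -1;
}
```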

Re: OpenACC Firstprivate

2015-11-09 Thread Jakub Jelinek

On Mon, Nov 09, 2015 at 08:59:15AM -0500, Nathan Sidwell wrote:
> >This I'm afraid performs often two copies rather than just one (one to copy
> >the host value to the present_copyin mapped value, another one in the
> >region),
> 
> I don't think that can be avoided.  The host doesn't have control over when
> the CTAs (a gang) start -- they may even be serialized onto the same
> physical HW. So each gang has to initialize its own instance.  Or did you
> mean something else?

So, what is the scope of the private and firstprivate vars in OpenACC?
In OpenMP if a variable is private or firstprivate on the target construct,
unless further privatized in inner constructs it is really shared among all
the threads in all the teams (so one var for all CTAs/workers in PTX terms).
Is that the case for OpenACC too, or are the vars e.g. private to each CTA
already or to each thread in each CTA, something different?
If they are shared by all CTAs, then you should hopefully be able to use the
GOMP_MAP_FIRSTPRIVATE{,_INT}, if not, then I'd say you should at least use
those to provide you the initializer data to initialize your private vars
from as a cheaper alternative to mapping.

Jakub

Re: [PATCH] 02/N Fix memory leaks in IPA

2015-11-09 Thread Richard Biener

On Mon, Nov 9, 2015 at 2:29 PM, Martin Liška  wrote:
> Hi.
>
> Following changes were consulted with Martin Jambor to properly release
> memory in IPA. It fixes leaks which popped up in tramp3d with -O2.
>
> Bootstrap and regression tests have been running.
>
> Ready after it finishes?

Ok.

Richard.

> Thanks,
> Martin

Re: Extend tree-call-cdce to calls whose result is used

2015-11-09 Thread Michael Matz

Hi,

On Sat, 7 Nov 2015, Richard Sandiford wrote:

> For -fmath-errno, builtins.c currently expands calls to sqrt to:
> 
> y = sqrt_optab (x);
> if (y != y)
>   [ sqrt (x); or errno = EDOM; ]
> 
> - the call to sqrt is protected by the result of the optab rather
>   than the input.  It would be better to check !(x >= 0), like
>   tree-call-cdce.c does.

It depends.  With fast-math (and hence without NaNs) you can trivially 
optimize away a (y != y) test.  You can't do so with !(x>=0) at all.  

> - the branch isn't exposed at the gimple level and so gets little
>   high-level optimisation.
> 
> - we do this for log too, but for log a zero input produces
>   -inf rather than a NaN, and sets errno to ERANGE rather than EDOM.
> 
> This patch moves the code to tree-call-cdce.c instead,

This somehow feels wrong.  Dead-code elimination doesn't have anything to 
do with the transformation you want, it rather is rewriting all feasible 
calls into something else, like fold_builtins does.  Also cdce currently 
doesn't seem to do any checks on the fast-math flags, so I wonder if some 
of the conditions that you now also insert for calls whose results are 
used stay until final code.

> Previously the pass was only enabled by default at -O2 or above, but the 
> old builtins.c code was enabled at -O.  The patch therefore enables the 
> pass at -O as well.

The pass is somewhat expensive in that it removes dominator info and 
schedules a full ssa update.  The transformation is trivial enough that 
dominators and SSA form can be updated on the fly, I think without that 
it's not feasible for -O.

But as said I think this transformation should better be moved into 
builtin folding (or other call folding), at which point also the fast-math 
flags can be checked.  The infrastructure routines of tree-call-cdce can 
be used there of course.  If so moved the cdce pass would be subsumed by 
that btw. (because the dead call result will be trivially exposed), and 
that would be a good thing.

Ciao,
Michael.
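
The two guard conditions discussed above can be compared in isolation.  This is only a sketch with hypothetical helper names, assuming IEEE semantics and no -ffast-math; under fast-math a compiler may fold `y != y` to false, which is the point made above about that test being trivially removable.

```cpp
#include <cmath>

// Result-based test: NaN is the only value that compares unequal to itself.
inline bool failed_by_result(double y)
{
  return y != y;
}

// Input-based test for sqrt: true for negative inputs and for NaN inputs.
inline bool sqrt_may_fail_by_input(double x)
{
  return !(x >= 0);
}

// For log, a zero input yields -inf rather than NaN, so the result-based
// test does not fire even though errno handling would be needed.
inline bool log_flagged_by_result(double x)
{
  return failed_by_result(std::log(x));
}
```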

Re: [PATCH v2 11/13] Test case for conversion from __seg_tls:0

2015-11-09 Thread Richard Biener

On Tue, Oct 20, 2015 at 11:27 PM, Richard Henderson  wrote:
> ---
>  gcc/testsuite/gcc.target/i386/addr-space-3.c | 10 ++
>  1 file changed, 10 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/addr-space-3.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/addr-space-3.c b/gcc/testsuite/gcc.target/i386/addr-space-3.c
> new file mode 100644
> index 000..63f1f03
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/addr-space-3.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O" } */
> +/* { dg-final { scan-assembler "[fg]s:0" } } */

Causes

ERROR: (DejaGnu) proc "fg" does not exist.
The error code is NONE
The info on the error is:
close: spawn id exp6 not open
while executing
"close -i exp6"
invoked from within
"catch "close -i $spawn_id""



> +
> +void test(int *y)
> +{
> +  int *x = (int __seg_tls *)0;
> +  if (x == y)
> +asm("");
> +}
> --
> 2.4.3
>
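
For context on the error above: in Tcl, square brackets inside a double-quoted string trigger command substitution, so `"[fg]s:0"` makes DejaGnu try to run a proc named `fg`.  Escaping the brackets as `\[fg\]s:0` is the usual way to keep them as a regex character class; whether that is the fix eventually applied is an assumption, since the follow-up is not part of this excerpt.  The sketch below only demonstrates what the intended pattern matches, using std::regex rather than Tcl, with a hypothetical function name.

```cpp
#include <regex>
#include <string>

// True if the line contains "fs:0" or "gs:0", which is what the character
// class [fg] in the scan-assembler pattern is meant to express.
inline bool mentions_seg_zero(const std::string &asm_line)
{
  static const std::regex pattern("[fg]s:0");
  return std::regex_search(asm_line, pattern);
}
```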

Re: [PATCH 4b/4] [ARM] PR63870 Remove error for invalid lane numbers

2015-11-09 Thread Ramana Radhakrishnan



On 08/11/15 00:26, charles.bay...@linaro.org wrote:
> From: Charles Baylis 
> 
>   Charles Baylis  
> 
>   * config/arm/neon.md (neon_vld1_lane): Remove error for invalid
>   lane number.
>   (neon_vst1_lane): Likewise.
>   (neon_vld2_lane): Likewise.
>   (neon_vst2_lane): Likewise.
>   (neon_vld3_lane): Likewise.
>   (neon_vst3_lane): Likewise.
>   (neon_vld4_lane): Likewise.
>   (neon_vst4_lane): Likewise.
> 

The only way we can get here is through the intrinsics - we do a check for
lane numbers earlier.

If things go horribly wrong - the assembler will complain, so it's ok to elide
this internal_error here, thus OK.

regards
Ramana
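
A sketch of the lane adjustment that remains in the quad-register patterns once the range check is removed.  It assumes, as stated above, that the caller has already validated 0 <= lane < max; the struct and function names are hypothetical and only mirror the arithmetic in the quoted hunks.

```cpp
// Register number and lane index after mapping a lane of a quad register
// onto the double register that actually holds it.
struct LaneLocation
{
  int regno;
  long lane;
};

// Lanes in the upper half live in the register two numbers further on,
// so the lane index is rebased and the register number advanced.
// Precondition (not checked here): 0 <= lane < max.
inline LaneLocation locate_lane(int regno, long lane, long max)
{
  if (lane >= max / 2)
    {
      lane -= max / 2;
      regno += 2;
    }
  return {regno, lane};
}
```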

> Change-Id: Id7b4b6fa7320157e62e5bae574b4c4688d921774
> ---
>  gcc/config/arm/neon.md | 48 
>  1 file changed, 8 insertions(+), 40 deletions(-)
> 
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index e8db020..6574e6e 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -4264,8 +4264,6 @@ if (BYTES_BIG_ENDIAN)
>HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3]));
>HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>operands[3] = GEN_INT (lane);
> -  if (lane < 0 || lane >= max)
> -error ("lane out of range");
>if (max == 1)
>  return "vld1.\t%P0, %A1";
>else
> @@ -4286,9 +4284,7 @@ if (BYTES_BIG_ENDIAN)
>HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>operands[3] = GEN_INT (lane);
>int regno = REGNO (operands[0]);
> -  if (lane < 0 || lane >= max)
> -error ("lane out of range");
> -  else if (lane >= max / 2)
> +  if (lane >= max / 2)
>  {
>lane -= max / 2;
>regno += 2;
> @@ -4372,8 +4368,6 @@ if (BYTES_BIG_ENDIAN)
>HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2]));
>HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>operands[2] = GEN_INT (lane);
> -  if (lane < 0 || lane >= max)
> -error ("lane out of range");
>if (max == 1)
>  return "vst1.\t{%P1}, %A0";
>else
> @@ -4393,9 +4387,7 @@ if (BYTES_BIG_ENDIAN)
>HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2]));
>HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>int regno = REGNO (operands[1]);
> -  if (lane < 0 || lane >= max)
> -error ("lane out of range");
> -  else if (lane >= max / 2)
> +  if (lane >= max / 2)
>  {
>lane -= max / 2;
>regno += 2;
> @@ -4464,8 +4456,6 @@ if (BYTES_BIG_ENDIAN)
>HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>int regno = REGNO (operands[0]);
>rtx ops[4];
> -  if (lane < 0 || lane >= max)
> -error ("lane out of range");
>ops[0] = gen_rtx_REG (DImode, regno);
>ops[1] = gen_rtx_REG (DImode, regno + 2);
>ops[2] = operands[1];
> @@ -4489,9 +4479,7 @@ if (BYTES_BIG_ENDIAN)
>HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>int regno = REGNO (operands[0]);
>rtx ops[4];
> -  if (lane < 0 || lane >= max)
> -error ("lane out of range");
> -  else if (lane >= max / 2)
> +  if (lane >= max / 2)
>  {
>lane -= max / 2;
>regno += 2;
> @@ -4579,8 +4567,6 @@ if (BYTES_BIG_ENDIAN)
>HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>int regno = REGNO (operands[1]);
>rtx ops[4];
> -  if (lane < 0 || lane >= max)
> -error ("lane out of range");
>ops[0] = operands[0];
>ops[1] = gen_rtx_REG (DImode, regno);
>ops[2] = gen_rtx_REG (DImode, regno + 2);
> @@ -4604,9 +4590,7 @@ if (BYTES_BIG_ENDIAN)
>HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>int regno = REGNO (operands[1]);
>rtx ops[4];
> -  if (lane < 0 || lane >= max)
> -error ("lane out of range");
> -  else if (lane >= max / 2)
> +  if (lane >= max / 2)
>  {
>lane -= max / 2;
>regno += 2;
> @@ -4723,8 +4707,6 @@ if (BYTES_BIG_ENDIAN)
>HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>int regno = REGNO (operands[0]);
>rtx ops[5];
> -  if (lane < 0 || lane >= max)
> -error ("lane out of range");
>ops[0] = gen_rtx_REG (DImode, regno);
>ops[1] = gen_rtx_REG (DImode, regno + 2);
>ops[2] = gen_rtx_REG (DImode, regno + 4);
> @@ -4750,9 +4732,7 @@ if (BYTES_BIG_ENDIAN)
>HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>int regno = REGNO (operands[0]);
>rtx ops[5];
> -  if (lane < 0 || lane >= max)
> -error ("lane out of range");
> -  else if (lane >= max / 2)
> +  if (lane >= max / 2)
>  {
>lane -= max / 2;
>regno += 2;
> @@ -4895,8 +4875,6 @@ if (BYTES_BIG_ENDIAN)
>HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>int regno = REGNO (operands[1]);
>rtx ops[5];
> -  if (lane < 0 || lane >= max)
> -error ("lane out of range");
>ops[0] = operands[0];
>ops[1] = gen_rtx_REG (DImode, regno);
>ops[2] = gen_rtx_REG (DImode, regno + 2);
> @@ -4922,9 +4900,7 @@ if (BYTES_BIG_ENDIAN)
>HOST_WIDE_INT max = GET_MODE_NUNITS (mode);
>int regno = REGNO (operands[1]);
>

Re: [vec-cmp, patch 4/6] Support vector mask invariants

2015-11-09 Thread Richard Biener

On Mon, Nov 9, 2015 at 1:11 PM, Ilya Enkovich  wrote:
> On 26 Oct 16:21, Richard Biener wrote:
>> On Wed, Oct 14, 2015 at 6:13 PM, Ilya Enkovich  
>> wrote:
>> > -   val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val);
>> > +   {
>> > + /* Can't use VIEW_CONVERT_EXPR for booleans because
>> > +of possibly different sizes of scalar value and
>> > +vector element.  */
>> > + if (VECTOR_BOOLEAN_TYPE_P (type))
>> > +   {
>> > + if (integer_zerop (val))
>> > +   val = build_int_cst (TREE_TYPE (type), 0);
>> > + else if (integer_onep (val))
>> > +   val = build_int_cst (TREE_TYPE (type), 1);
>> > + else
>> > +   gcc_unreachable ();
>> > +   }
>> > + else
>> > + val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val);
>>
>> I think the existing code is fine with using fold_convert () here
>> which should also work
>> for the boolean types.  So does just
>>
>>   val = fold_convert (TREE_TYPE (type), val);
>>
>> work?
>
> It seems to work OK.
>
>>
>> > @@ -7428,13 +7459,13 @@ vectorizable_condition (gimple *stmt, gimple_stmt_iterator *gsi,
>> >   gimple *gtemp;
>> >   vec_cond_lhs =
>> >   vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 0),
>> > -   stmt, NULL);
>> > +   stmt, NULL, comp_vectype);
>> >   vect_is_simple_use (TREE_OPERAND (cond_expr, 0), stmt,
>> >   loop_vinfo, , , [0]);
>> >
>> >   vec_cond_rhs =
>> > vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 1),
>> > -   stmt, NULL);
>> > + stmt, NULL, comp_vectype);
>> >   vect_is_simple_use (TREE_OPERAND (cond_expr, 1), stmt,
>> >   loop_vinfo, , , [1]);
>>
>> I still don't like this very much but I guess without some major
>> refactoring of all
>> the functions there isn't a better way to do it for now.
>>
>> Thus, ok with trying the change suggested above.
>>
>> Thanks,
>> Richard.
>>
>
> Here is an updated version.

Ok.

Thanks,
Richard.
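
The quoted patch adds const_vector_mask_from_tree, which turns each boolean element into either zero or all-ones.  A rough stand-alone sketch of that mapping, using std::vector in place of GCC's tree and rtx types (so the names and the exception are assumptions of this sketch, not GCC code):

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Map boolean vector elements to mask lanes: 0 stays 0, while both 1 and -1
// are treated as "true" and become -1 (all bits set).  Anything else is
// rejected, mirroring the gcc_unreachable () in the quoted function.
inline std::vector<std::int32_t> mask_from_bools(const std::vector<int> &elts)
{
  std::vector<std::int32_t> mask;
  mask.reserve(elts.size());
  for (int e : elts)
    {
      if (e == 0)
        mask.push_back(0);
      else if (e == 1 || e == -1)
        mask.push_back(-1);
      else
        throw std::invalid_argument("not a boolean element");
    }
  return mask;
}
```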

> Thanks,
> Ilya
> --
> gcc/
>
> 2015-11-09  Ilya Enkovich  
>
> * expr.c (const_vector_mask_from_tree): New.
> (const_vector_from_tree): Use const_vector_mask_from_tree
> for boolean vectors.
> * tree-vect-stmts.c (vect_init_vector): Support boolean vector
> invariants.
> (vect_get_vec_def_for_operand): Add VECTYPE arg.
> (vectorizable_condition): Directly provide vectype for invariants
> used in comparison.
> * tree-vectorizer.h (vect_get_vec_def_for_operand): Add VECTYPE
> arg.
>
>
> diff --git a/gcc/expr.c b/gcc/expr.c
> index 2b2174f..03936ee 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -11423,6 +11423,40 @@ try_tablejump (tree index_type, tree index_expr, tree minval, tree range,
>return 1;
>  }
>
> +/* Return a CONST_VECTOR rtx representing vector mask for
> +   a VECTOR_CST of booleans.  */
> +static rtx
> +const_vector_mask_from_tree (tree exp)
> +{
> +  rtvec v;
> +  unsigned i;
> +  int units;
> +  tree elt;
> +  machine_mode inner, mode;
> +
> +  mode = TYPE_MODE (TREE_TYPE (exp));
> +  units = GET_MODE_NUNITS (mode);
> +  inner = GET_MODE_INNER (mode);
> +
> +  v = rtvec_alloc (units);
> +
> +  for (i = 0; i < VECTOR_CST_NELTS (exp); ++i)
> +{
> +  elt = VECTOR_CST_ELT (exp, i);
> +
> +  gcc_assert (TREE_CODE (elt) == INTEGER_CST);
> +  if (integer_zerop (elt))
> +   RTVEC_ELT (v, i) = CONST0_RTX (inner);
> +  else if (integer_onep (elt)
> +  || integer_minus_onep (elt))
> +   RTVEC_ELT (v, i) = CONSTM1_RTX (inner);
> +  else
> +   gcc_unreachable ();
> +}
> +
> +  return gen_rtx_CONST_VECTOR (mode, v);
> +}
> +
>  /* Return a CONST_VECTOR rtx for a VECTOR_CST tree.  */
>  static rtx
>  const_vector_from_tree (tree exp)
> @@ -11438,6 +11472,9 @@ const_vector_from_tree (tree exp)
>if (initializer_zerop (exp))
>  return CONST0_RTX (mode);
>
> +  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (exp)))
> +return const_vector_mask_from_tree (exp);
> +
>units = GET_MODE_NUNITS (mode);
>inner = GET_MODE_INNER (mode);
>
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index ee549f4..af203ab 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1300,7 +1300,7 @@ vect_init_vector (gimple *stmt, tree val, tree type, gimple_stmt_iterator *gsi)
>if (!types_compatible_p (TREE_TYPE (type), TREE_TYPE (val)))
> {
>   if (CONSTANT_CLASS_P (val))
> -   val

Re: Add a combined_fn enum

2015-11-09 Thread Richard Sandiford

Bernd Schmidt  writes:
> On 11/09/2015 11:24 AM, Richard Sandiford wrote:
>> Bernd Schmidt  writes:
>>> I see it's already acked, but have you considered just doing away with
>>> the builtin/internal function distinction?
>>
>> I think they're too different to be done away with entirely.  built-in
>> functions map directly to a specific C-level callable function and
>> must have an fndecl, whereas no internal function should have an fndecl.
>> Whether a built-in function is available depends on the selected
>> language and what declarations the front-end has seen, while whether
>> an internal function is available depends entirely on GCC internal
>> information.
>
> Yes... but aren't these things fixable relatively easily (compared with 
> what your patches are doing)?

I'm not sure what you mean by "fix" though.  I don't think we can change
any of the constraints above.

> I also have the problem that I can't quite see where your patch series 
> is going. Let's take "Add internal bitcount functions", it adds new 
> internal functions but no users AFAICS. What is the end goal here (there 
> doesn't seem to be a [0/N] description in my inbox)?

The main goal is to allow functions to be vectorised simply by defining
the associated optab.  At the moment you can get a scalar square root
instruction simply by defining something like sqrtdf2.  But if you want
to have vectorised sqrt, you need to have a target-specific C-level
built-in function for the vector form of sqrt, implement
TARGET_BUILTIN_VECTORIZED_FUNCTION, and expand the sqrt in the same
way that the target expands other directly-called built-in functions.
That seems unnecessarily indirect, especially when in practice those
target-specific functions tend to use patterns like sqrtv2df2 anyway.
It seems better to have GCC use the vector optabs directly.

This was prompted by the patch Dave Sherwood posted to support scalar
and vector fmin() and fmax() even with -fno-fast-math on aarch64.
As things stood we needed the same approach: use an optab for the scalar
version and TARGET_BUILTIN_VECTORIZED_FUNCTION for the vector version.
The problem is that at present there's no aarch64 built-in function that
does what we want, so we'd have to define a new one.  And that seems
silly when GCC really ought to be able to use the vector version
of the optab without any more help from the target.

I'm hoping to post those patches later today.

But the series has other side-benefits, like:

- allowing genmatch to fold calls to internal functions

- splitting the computational part of maths functions from the
  fallback errno handling at an earlier point, so that they
  get more optimisation

- clearly separating out the call folds that we're prepared to do
  on gimple, rather than calling into builtins.c.

Thanks,
Richard

Re: [PATCH] Fix memory leaks and use a pool_allocator

2015-11-09 Thread Trevor Saunders

On Mon, Nov 09, 2015 at 01:11:48PM +0100, Richard Biener wrote:
> On Mon, Nov 9, 2015 at 12:22 PM, Martin Liška  wrote:
> > Hi.
> >
> > This is follow-up of changes that Richi started on Friday.
> >
> > Patch can bootstrap on x86_64-linux-pc and regression tests are running.
> >
> > Ready for trunk?
> 
> * tree-ssa-dom.c (free_edge_info): Make the function extern.
> ...
> * tree-ssa.h (free_edge_info): Declare function extern.
> 
> declare this in tree-ssa-threadupdate.h instead and rename it to
> something less "public", like free_dom_edge_info.
> 
> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
> index fff62de..eb6b7df 100644
> --- a/gcc/ifcvt.c
> +++ b/gcc/ifcvt.c
> @@ -3161,6 +3161,8 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>set_used_flags (targets[i]);
>  }
> 
> +  temporaries.release ();
> +
>set_used_flags (cond);
>set_used_flags (x);
>set_used_flags (y);
> @@ -3194,6 +3196,7 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>  }
> 
>num_updated_if_blocks++;
> +  targets.release ();
>return TRUE;
> 
> suspiciously look like candidates for an auto_vec<> (didn't check).

I was about to say the same thing after a little checking (maybe the
region one in tree-ssa-threadupdate.c too, but I didn't check that).

> 
> @@ -1240,6 +1240,7 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p)
>dead_set = sparseset_alloc (max_regno);
>unused_set = sparseset_alloc (max_regno);
>curr_point = 0;
> +  point_freq_vec.release ();
>point_freq_vec.create (get_max_uid () * 2);
> 
> a truncate (0) instead of a release () should be cheaper, avoiding the
> re-allocation.

yeah, or even change it to just grow the array, afaict it doesn't expect
the array to be cleared?

> @@ -674,6 +674,10 @@ sra_deinitialize (void)
>assign_link_pool.release ();
>obstack_free (_obstack, NULL);
> 
> +  for (hash_map::iterator it =
> +   base_access_vec->begin (); it != base_access_vec->end (); ++it)
> +(*it).second.release ();
> +
>delete base_access_vec;
> 
> I wonder if the better fix is to provide a proper free method for the 
> hash_map?
> A hash_map with 'auto_vec' looks suspicious - eventually a proper release
> was intended here via default_hash_map_traits <>?

in fact I would expect that already works, but apparently not, so I'd
say that's the bug.

Trev
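
To illustrate the truncate (0) versus release () remark, here is a sketch using std::vector as a stand-in for GCC's vec; the two APIs differ, so this is an analogy rather than GCC code, and the helper names are hypothetical.  Clearing keeps the allocation for reuse, while swapping with an empty vector gives the storage back and forces a fresh allocation on the next use.

```cpp
#include <cstddef>
#include <vector>

// Analogue of truncate (0): drop the elements but keep the storage.
inline void reset_keep_storage(std::vector<int> &v)
{
  v.clear();
}

// Analogue of release (): drop the elements and return the storage.
inline void reset_free_storage(std::vector<int> &v)
{
  std::vector<int>().swap(v);
}
```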

> 
> Anyway, most of the things above can be improved as followup of course.
> 
> Thanks,
> Richard.
> 
> > Thanks,
> > Martin

Re: [PATCH] S/390: Fix warning in "*movstr" pattern.

2015-11-09 Thread Andreas Krebbel

On 11/04/2015 02:39 AM, Dominik Vogt wrote:
> On Tue, Nov 03, 2015 at 06:47:28PM +0100, Ulrich Weigand wrote:
>> Dominik Vogt wrote:
>>
>>> @@ -2936,7 +2936,7 @@
>>> (set (mem:BLK (match_operand:P 1 "register_operand" "0"))
>>> (mem:BLK (match_operand:P 3 "register_operand" "2")))
>>> (set (match_operand:P 0 "register_operand" "=d")
>>> -   (unspec [(mem:BLK (match_dup 1))
>>> +   (unspec:P [(mem:BLK (match_dup 1))
>>>  (mem:BLK (match_dup 3))
>>>  (reg:SI 0)] UNSPEC_MVST))
>>> (clobber (reg:CC CC_REGNUM))]
>>
>> Don't you have to change the expander too?  Otherwise the
>> pattern will no longer match ...
> 
> Yes, you're right.  This turned out to be a bit tricky to do
> because the "movstr" expander doesn't allow variants with
> different modes.  :-/
> 
> New patch attached, including a test case that works on 31-bit and
> 64-bit.

Could you please check that the generated code doesn't change with a larger
code base (e.g. speccpu)?  It should not affect it, but I really think we
omitted the mode here for a reason (although I don't remember why).

-Andreas-

Re: [PATCH] Fix memory leaks and use a pool_allocator

2015-11-09 Thread Martin Liška

On 11/09/2015 01:11 PM, Richard Biener wrote:
> On Mon, Nov 9, 2015 at 12:22 PM, Martin Liška  wrote:
>> Hi.
>>
>> This is follow-up of changes that Richi started on Friday.
>>
>> Patch can bootstrap on x86_64-linux-pc and regression tests are running.
>>
>> Ready for trunk?
> 
> * tree-ssa-dom.c (free_edge_info): Make the function extern.
> ...
> * tree-ssa.h (free_edge_info): Declare function extern.
> 
> declare this in tree-ssa-threadupdate.h instead and rename it to
> something less "public", like free_dom_edge_info.
> 
> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
> index fff62de..eb6b7df 100644
> --- a/gcc/ifcvt.c
> +++ b/gcc/ifcvt.c
> @@ -3161,6 +3161,8 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>set_used_flags (targets[i]);
>  }
> 
> +  temporaries.release ();
> +
>set_used_flags (cond);
>set_used_flags (x);
>set_used_flags (y);
> @@ -3194,6 +3196,7 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>  }
> 
>num_updated_if_blocks++;
> +  targets.release ();
>return TRUE;
> 
> suspiciously look like candidates for an auto_vec<> (didn't check).
> 
> @@ -1240,6 +1240,7 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p)
>dead_set = sparseset_alloc (max_regno);
>unused_set = sparseset_alloc (max_regno);
>curr_point = 0;
> +  point_freq_vec.release ();
>point_freq_vec.create (get_max_uid () * 2);
> 
> a truncate (0) instead of a release () should be cheaper, avoiding the
> re-allocation.
> 
> @@ -674,6 +674,10 @@ sra_deinitialize (void)
>assign_link_pool.release ();
>obstack_free (_obstack, NULL);
> 
> +  for (hash_map::iterator it =
> +   base_access_vec->begin (); it != base_access_vec->end (); ++it)
> +(*it).second.release ();
> +
>delete base_access_vec;
> 
> I wonder if the better fix is to provide a proper free method for the 
> hash_map?
> A hash_map with 'auto_vec' looks suspicious - eventually a proper release
> was intended here via default_hash_map_traits <>?
> 
> Anyway, most of the things above can be improved as followup of course.
> 
> Thanks,
> Richard.
> 
>> Thanks,
>> Martin

Hi.

All suggested changes were applied, sending v2 and waiting for bootstrap and
regression tests.

Thanks,
Martin
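
The switch from vec to auto_vec relies on the destructor releasing storage on every exit path.  The sketch below shows that idea with a hypothetical scoped_buffer class and a counter of live allocations; it is not GCC's auto_vec, only the same RAII pattern in miniature.

```cpp
#include <cstdlib>

static int live_buffers = 0;  // allocations that have not been freed yet

// Owns a heap buffer and frees it when the object goes out of scope.
class scoped_buffer
{
public:
  explicit scoped_buffer(std::size_t n)
    : data_(static_cast<int *>(std::malloc(n * sizeof(int))))
  {
    ++live_buffers;
  }
  ~scoped_buffer()
  {
    std::free(data_);
    --live_buffers;
  }
  scoped_buffer(const scoped_buffer &) = delete;
  scoped_buffer &operator=(const scoped_buffer &) = delete;

private:
  int *data_;
};

// Both the early return and the normal return free the buffers, without an
// explicit release call before each return statement.
inline bool convert(bool bail_out_early)
{
  scoped_buffer targets(8), temporaries(8);
  if (bail_out_early)
    return false;
  return true;
}
```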
From c97270f2daadcca1efe6201adf1eb0df469ca91e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 9 Nov 2015 10:49:14 +0100
Subject: [PATCH 1/2] Fix memory leaks and use a pool_allocator

gcc/ChangeLog:

2015-11-09  Martin Liska  

	* gcc.c (record_temp_file): Release name string.
	* ifcvt.c (noce_convert_multiple_sets): Use auto_vec instead
	of vec.
	* lra-lives.c (free_live_range_list): Utilize
	lra_live_range_pool for allocation and deallocation.
	(create_live_range): Likewise.
	(copy_live_range): Likewise.
	(lra_merge_live_ranges): Likewise.
	(remove_some_program_points_and_update_live_ranges): Likewise.
	(lra_create_live_ranges_1): Release point_freq_vec that can
	be not freed from previous iteration of the function.
	* tree-eh.c (lower_try_finally_switch): Use auto_vec instead of
	vec.
	* tree-sra.c (sra_deinitialize): Release all vectors in
	base_access_vec.
	* tree-ssa-dom.c (free_dom_edge_info): Make the function extern.
	* tree-ssa-threadupdate.c (remove_ctrl_stmt_and_useless_edges):
	Release edge_info for a removed edge.
	(thread_through_all_blocks): Free region vector.
	* tree-ssa.h (free_dom_edge_info): Declare function extern.
---
 gcc/gcc.c   |  5 -
 gcc/ifcvt.c |  8 +---
 gcc/lra-lives.c | 14 --
 gcc/tree-eh.c   |  2 +-
 gcc/tree-sra.c  |  6 ++
 gcc/tree-ssa-dom.c  |  8 
 gcc/tree-ssa-threadupdate.c |  6 +-
 gcc/tree-ssa-threadupdate.h |  1 +
 8 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/gcc/gcc.c b/gcc/gcc.c
index bbc9b23..8bbf5be 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -2345,7 +2345,10 @@ record_temp_file (const char *filename, int always_delete, int fail_delete)
   struct temp_file *temp;
   for (temp = always_delete_queue; temp; temp = temp->next)
 	if (! filename_cmp (name, temp->name))
-	  goto already1;
+	  {
+	free (name);
+	goto already1;
+	  }
 
   temp = XNEW (struct temp_file);
   temp->next = always_delete_queue;
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index fff62de..3401faa 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -3076,12 +3076,12 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   rtx_code cond_code = GET_CODE (cond);
 
   /* The true targets for a conditional move.  */
-  vec targets = vNULL;
+  auto_vec targets;
   /* The temporaries introduced to allow us to not consider register
  overlap.  */
-  vec temporaries = vNULL;
+  auto_vec temporaries;
   /* The insns we've emitted.  */
-  vec unmodified_insns = vNULL;
+  auto_vec unmodified_insns;
   int count = 0;
 
   FOR_BB_INSNS (then_bb, insn)
@@ -3161,6

Re: OpenACC Firstprivate

2015-11-09 Thread Jakub Jelinek

On Sat, Nov 07, 2015 at 08:50:28AM -0500, Nathan Sidwell wrote:
> Index: gcc/gimplify.c
> ===
> --- gcc/gimplify.c(revision 229892)
> +++ gcc/gimplify.c(working copy)
> @@ -108,9 +108,15 @@ enum omp_region_type
>/* Data region with offloading.  */
>ORT_TARGET = 32,
>ORT_COMBINED_TARGET = 33,
> +
> +  ORT_ACC = 0x40,  /* An OpenACC region.  */
> +  ORT_ACC_DATA = ORT_ACC | ORT_TARGET_DATA, /* Data construct.  */
> +  ORT_ACC_PARALLEL = ORT_ACC | ORT_TARGET,  /* Parallel construct */
> +  ORT_ACC_KERNELS  = ORT_ACC | ORT_TARGET | 0x80,  /* Kernels construct.  */
> +
>/* Dummy OpenMP region, used to disable expansion of
>   DECL_VALUE_EXPRs in taskloop pre body.  */
> -  ORT_NONE = 64
> +  ORT_NONE = 0x100
>  };

If you want to switch to hexadecimal, you should change all values
in the enum to hexadecimal for consistency.
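
For illustration, the flag layout with every visible value written in hexadecimal might look like the sketch below.  ORT_TARGET_DATA = 0x10 is an assumption, since its value is not shown in the quoted hunk, and the enum and helper names are hypothetical; the remaining values are taken from the hunk.

```cpp
// Sketch of the region-type flags from the quoted hunk, all in hexadecimal.
enum omp_region_type_sketch
{
  ORT_TARGET_DATA = 0x10,  // assumed value, not visible in the quoted hunk
  ORT_TARGET = 0x20,
  ORT_COMBINED_TARGET = 0x21,

  ORT_ACC = 0x40,                                 // an OpenACC region
  ORT_ACC_DATA = ORT_ACC | ORT_TARGET_DATA,       // data construct
  ORT_ACC_PARALLEL = ORT_ACC | ORT_TARGET,        // parallel construct
  ORT_ACC_KERNELS = ORT_ACC | ORT_TARGET | 0x80,  // kernels construct

  ORT_NONE = 0x100
};

// The OpenACC bit can then be tested independently of the construct kind.
inline bool is_openacc_region(int region_type)
{
  return (region_type & ORT_ACC) != 0;
}
```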
>  
>  /* Gimplify hashtable helper.  */
> @@ -377,6 +383,12 @@ new_omp_context (enum omp_region_type re
>else
>  c->default_kind = OMP_CLAUSE_DEFAULT_UNSPECIFIED;
>  
> +  c->combined_loop = false;
> +  c->distribute = false;
> +  c->target_map_scalars_firstprivate = false;
> +  c->target_map_pointers_as_0len_arrays = false;
> +  c->target_firstprivatize_array_bases = false;

Why this?  c is XCNEW allocated, so zero initialized.
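
The point about XCNEW can be shown with plain calloc, which XCNEW wraps via libiberty's xcalloc.  The struct below is a cut-down, hypothetical stand-in for gimplify_omp_ctx; the sketch assumes the usual representation where all-zero bytes read back as false and 0.

```cpp
#include <cstdlib>

// Cut-down stand-in for the context structure (hypothetical fields).
struct ctx_sketch
{
  bool combined_loop;
  bool distribute;
  int default_kind;
};

// Zero-initialised allocation: every flag already reads as false, so
// assigning false to each field afterwards is redundant.
inline ctx_sketch *new_ctx_sketch()
{
  return static_cast<ctx_sketch *>(std::calloc(1, sizeof(ctx_sketch)));
}
```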

> @@ -5667,11 +5682,13 @@ omp_add_variable (struct gimplify_omp_ct
>/* We shouldn't be re-adding the decl with the same data
>sharing class.  */
>gcc_assert ((n->value & GOVD_DATA_SHARE_CLASS & flags) == 0);
> -  /* The only combination of data sharing classes we should see is
> -  FIRSTPRIVATE and LASTPRIVATE.  */
>nflags = n->value | flags;
> -  gcc_assert ((nflags & GOVD_DATA_SHARE_CLASS)
> -   == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE)
> +  /* The only combination of data sharing classes we should see is
> +  FIRSTPRIVATE and LASTPRIVATE.  However, OpenACC permits
> +  reduction variables to be used in data sharing clauses.  */
> +  gcc_assert ((ctx->region_type & ORT_ACC) != 0
> +   || ((nflags & GOVD_DATA_SHARE_CLASS)
> +   == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE))
> || (flags & GOVD_DATA_SHARE_CLASS) == 0);

Are you sure you want to give up on any kind of consistency checks for
OpenACC?  If only reduction is special on OpenACC, perhaps you could tweak
the assert for that instead?  Something that can be done incrementally of
course.

> +
> +   /*  OpenMP doesn't look in outer contexts to find an
> +   enclosing data clause.  */

I'm puzzled by the comment.  OpenMP does look in outer context for clauses
that need that (pretty much all clauses but private), that is the do_outer:
recursion in omp_notice_variable.  Say for firstprivate in order to copy (or
copy construct) the private variable one needs the access to the outer
context's var etc.
So perhaps it would help to document what you are doing here for OpenACC and
why.

> +   struct gimplify_omp_ctx *octx = ctx->outer_context;
> +   if ((ctx->region_type & ORT_ACC) && octx)
> + {
> +   omp_notice_variable (octx, decl, in_code);
> +   
> +   for (; octx; octx = octx->outer_context)
> + {
> +   if (!(octx->region_type & (ORT_TARGET_DATA | ORT_TARGET)))
> + break;
> +   splay_tree_node n2
> + = splay_tree_lookup (octx->variables,
> +  (splay_tree_key) decl);
> +   if (n2)
> + {
> +   nflags |= GOVD_MAP;
> +   goto found_outer;
> + }
> + }
>   }
> -   else if (nflags == flags)
> - nflags |= GOVD_MAP;
> +

The main issue I have is with the omp-low.c changes.
I see:
"2.5.9
private clause
The private clause is allowed on the parallel construct; it declares that a
copy of each item on the list will be created for each parallel gang.

2.5.10
firstprivate clause
The firstprivate clause is allowed on the parallel construct; it declares that
a copy of each item on the list will be created for each parallel gang, and
that the copy will be initialized with the value of that item on the host when
the parallel construct is encountered."

but looking at what you actually emit looks like standard present_copyin
clause I think with a private variable defined in the region where the
value of the present_copyin mapped variable is assigned to the private one.
This I'm afraid performs often two copies rather than just one (one to copy
the host value to the present_copyin mapped value, another one in the
region), but more importantly, if the var is already mapped, you could
initialize the private var with old data.
Say
  int arr[64];
// initialize arr
#pragma acc data copyin (arr)
{
  // modify arr on the host
  # pragma acc parallel firstprivate (arr)

Re: RFC: C++ delayed folding merge

2015-11-09 Thread Jason Merrill


On 11/09/2015 04:24 AM, Eric Botcazou wrote:

One question: The branch changes 'convert' to not fold its result, and
it's not clear to me whether that's part of the expected behavior of a
front end 'convert' function or not.


I don't think that you should change the behavior for front-ends that have an
internal representation distinct from the GENERIC trees and thus do a global
translation to GENERIC at the end; e.g. in the Ada compiler we'd rather have
*more* folding than less during this translation.


Right, the change is just to the C++ front end 'convert'.

Jason

Re: OpenACC Firstprivate

2015-11-09 Thread Nathan Sidwell


On 11/09/15 08:46, Jakub Jelinek wrote:

On Sat, Nov 07, 2015 at 08:50:28AM -0500, Nathan Sidwell wrote:

Index: gcc/gimplify.c
===




If you want to switch to hexadecimal, you should change all values
in the enum to hexadecimal for consistency.


ok.



  /* Gimplify hashtable helper.  */
@@ -377,6 +383,12 @@ new_omp_context (enum omp_region_type re
else
  c->default_kind = OMP_CLAUSE_DEFAULT_UNSPECIFIED;

+  c->combined_loop = false;
+  c->distribute = false;
+  c->target_map_scalars_firstprivate = false;
+  c->target_map_pointers_as_0len_arrays = false;
+  c->target_firstprivatize_array_bases = false;


Why this?  c is XCNEW allocated, so zero initialized.


I presumed it necessary, as it was on the branch.  Will remove.




@@ -5667,11 +5682,13 @@ omp_add_variable (struct gimplify_omp_ct
/* We shouldn't be re-adding the decl with the same data
 sharing class.  */
gcc_assert ((n->value & GOVD_DATA_SHARE_CLASS & flags) == 0);
-  /* The only combination of data sharing classes we should see is
-FIRSTPRIVATE and LASTPRIVATE.  */
nflags = n->value | flags;
-  gcc_assert ((nflags & GOVD_DATA_SHARE_CLASS)
- == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE)
+  /* The only combination of data sharing classes we should see is
+FIRSTPRIVATE and LASTPRIVATE.  However, OpenACC permits
+reduction variables to be used in data sharing clauses.  */
+  gcc_assert ((ctx->region_type & ORT_ACC) != 0
+ || ((nflags & GOVD_DATA_SHARE_CLASS)
+ == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE))
  || (flags & GOVD_DATA_SHARE_CLASS) == 0);


Are you sure you want to give up on any kind of consistency checks for
OpenACC?  If only reduction is special on OpenACC, perhaps you could tweak
the assert for that instead?  Something that can be done incrementally of
course.


Will investigate (later)




+
+ /*  OpenMP doesn't look in outer contexts to find an
+ enclosing data clause.  */


I'm puzzled by the comment.  OpenMP does look in outer context for clauses
that need that (pretty much all clauses but private), that is the do_outer:
recursion in omp_notice_variable.  Say for firstprivate, in order to copy (or
copy construct) the private variable one needs access to the outer
context's var etc.
So perhaps it would help to document what you are doing here for OpenACC and
why.


Ok.  It seemed (and it may become clearer with default handling added) that
OpenACC and OpenMP scanned scopes in opposite orders.  I remember trying to
get the ACC code to scan in the same order, but came up blank.  Anyway, you're
right, it should say what OpenACC is trying to do.




The main issue I have is with the omp-low.c changes.
I see:
"2.5.9
private clause
The private clause is allowed on the parallel construct; it declares that a
copy of each item on the list will be created for each parallel gang.

2.5.10
firstprivate clause
The firstprivate clause is allowed on the parallel construct; it declares that
a copy of each item on the list will be created for each parallel gang, and
that the copy will be initialized with the value of that item on the host when
the parallel construct is encountered."

but looking at what you actually emit looks like standard present_copyin
clause I think with a private variable defined in the region where the
value of the present_copyin mapped variable is assigned to the private one.




This I'm afraid performs often two copies rather than just one (one to copy
the host value to the present_copyin mapped value, another one in the
region),


I don't think that can be avoided.  The host doesn't have control over when the 
CTAs (a gang) start -- they may even be serialized onto the same physical HW. 
So each gang has to initialize its own instance.  Or did you mean something else?



but more importantly, if the var is already mapped, you could
initialize the private var with old data.




Say
   int arr[64];
// initialize arr
#pragma acc data copyin (arr)
{
   // modify arr on the host
   # pragma acc parallel firstprivate (arr)
   {
 ...
   }
}


Hm, I suspect that is either ill formed or something the std does not contemplate.


Is that really what you want?  If not, any reason not to implement
GOMP_MAP_FIRSTPRIVATE and GOMP_MAP_FIRSTPRIVATE_INT on the libgomp oacc-*
side and just use the OpenMP firstprivate handling in omp-low.c?


I would have to investigate ...

nathan

Re: [PATCH] Use signed boolean type for boolean vectors

2015-11-09 Thread Richard Biener

On Mon, Nov 9, 2015 at 3:03 PM, Ilya Enkovich  wrote:
> On 03 Nov 14:42, Richard Biener wrote:
>> On Wed, Oct 28, 2015 at 4:30 PM, Ilya Enkovich wrote:
>> > 2015-10-28 18:21 GMT+03:00 Richard Biener :
>> >> On Wed, Oct 28, 2015 at 2:13 PM, Ilya Enkovich wrote:
>> >>> Hi,
>> >>>
>> >>> Testing boolean vector conversions I found several runtime regressions
>> >>> and investigation showed it's due to incorrect conversion caused by
>> >>> unsigned boolean type.  When boolean vector is represented as an
>> >>> integer vector on target it's a signed integer actually.  Unsigned
>> >>> boolean type was chosen due to possible single bit values, but for
>> >>> multiple bit values it causes wrong casting.  The easiest way to fix
>> >>> it is to use signed boolean value.  The following patch does this and
>> >>> fixes my problems with conversion.  Bootstrapped and tested on
>> >>> x86_64-unknown-linux-gnu.  Is it OK?
>> >>
>> >> Hmm.  Actually formally the "boolean" vectors were always 0 or -1
>> >> (all bits set).  That is also true for a signed boolean with precision 1
>> >> but with higher precision what makes sure to sign-extend 'true'?
>> >>
>> >> So it's far from an obvious change, esp as you don't change the
>> >> precision == 1 case.  [I still think we should have precision == 1
>> >> for all boolean types]
>> >>
>> >> Richard.
>> >>
>> >
>> > For 1 bit precision signed type value 1 is out of range, right? This might
>> > break in many places due to the use of 1 as the true value.
>>
>> For vectors -1 is true.  Did you try whether it breaks many places?
>> build_int_cst (type, 1) should still work fine.
>>
>> Richard.
>>
>
> I tried it and didn't find any new failures.  So it looks like I was wrong in
> assuming it should cause many failures.  Testing is not complete because many
> SPEC benchmarks are failing to compile at -O3 for AVX-512 on trunk.  But I
> think we may proceed with the signed type and fix constant generation issues
> if any are revealed.  This patch was bootstrapped and regtested on
> x86_64-unknown-linux-gnu.  OK for trunk?

Ok.

Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2015-11-09  Ilya Enkovich  
>
> * optabs.c (expand_vec_cond_expr): Always get sign from type.
> * tree.c (wide_int_to_tree): Support negative values for boolean.
> (build_nonstandard_boolean_type): Use signed type for booleans.
>
>
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index fdcdc6a..44971ad 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -5365,7 +5365,6 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
>op0a = TREE_OPERAND (op0, 0);
>op0b = TREE_OPERAND (op0, 1);
>tcode = TREE_CODE (op0);
> -  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
>  }
>else
>  {
> @@ -5374,9 +5373,9 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
>op0a = op0;
>op0b = build_zero_cst (TREE_TYPE (op0));
>tcode = LT_EXPR;
> -  unsignedp = false;
>  }
>cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
> +  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
>
>
>gcc_assert (GET_MODE_SIZE (mode) == GET_MODE_SIZE (cmp_op_mode)
> diff --git a/gcc/tree.c b/gcc/tree.c
> index 18d6544..6fb4c09 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -1437,7 +1437,7 @@ wide_int_to_tree (tree type, const wide_int_ref )
> case BOOLEAN_TYPE:
>   /* Cache false or true.  */
>   limit = 2;
> - if (hwi < 2)
> + if (IN_RANGE (hwi, 0, 1))
> ix = hwi;
>   break;
>
> @@ -8069,7 +8069,7 @@ build_nonstandard_boolean_type (unsigned HOST_WIDE_INT precision)
>
>type = make_node (BOOLEAN_TYPE);
>TYPE_PRECISION (type) = precision;
> -  fixup_unsigned_type (type);
> +  fixup_signed_type (type);
>
>if (precision <= MAX_INT_CACHED_PREC)
>  nonstandard_boolean_type_cache[precision] = type;

RE: [RFC][PATCH] Preferred rename register in regrename pass

2015-11-09 Thread Robert Suchanek

Hi Bernd,

Sorry for the late reply.

The updated patch was bootstrapped on x86_64-unknown-linux-gnu and cross tested
on mips-img-linux-gnu using r229786.

The results below were generated for the CSiBE benchmark.  The numbers in the
columns are bytes, in the format 'net (gain/loss)', and show the difference
with and without the patch when the -frename-registers switch is used.

I looked at the gains, especially for MIPS and 'teem', and it appears
that renaming registers affects the rtl_dce pass, i.e. makes it less effective.
However, in the average case the patch appears to reduce the code size
slightly, and moves are genuinely removed.

I haven't tested the performance extensively, but the SPEC benchmarks
showed almost the same results; any difference could be just noise.


               | MIPS n64 -Os    | MIPS o32 -Os    | x86_64 -Os        |
---------------+-----------------+-----------------+-------------------+
bzip2-1.0.2    | -32  (0/-32)    | -24  (0/-24)    | -34   (1/-35)     |
cg_compiler    | -172 (0/-172)   | -156 (0/-156)   | -46   (0/-46)     |
compiler       | -36  (0/-36)    | -24  (0/-24)    | -6    (0/-6)      |
flex-2.5.31    | -68  (0/-68)    | -80  (0/-80)    | -98   (7/-105)    |
jikespg-1.3    | -284 (0/-284)   | -204 (0/-204)   | -127  (9/-136)    |
jpeg-6b        | -52  (8/-60)    | -20  (0/-20)    | -80   (11/-91)    |
libmspack      | -136 (0/-136)   | -28  (0/-28)    | -33   (23/-56)    |
libpng-1.2.5   | -72  (0/-72)    | -64  (0/-64)    | -176  (14/-190)   |
linux-2.4.23   | -700 (20/-720)  | -384 (0/-384)   | -691  (44/-735)   |
lwip-0.5.3     | -4   (0/-4)     | -4   (0/-4)     | +4    (13/-9)     |
mpeg2dec-0.3.1 | -16  (0/-16)    |                 | -142  (6/-148)    |
mpgcut-1.1     | -24  (0/-24)    | -12  (4/-16)    | -2    (0/-2)      |
OpenTCP-1.0.4  | -28  (0/-28)    | -12  (0/-12)    | -1    (0/-1)      |
replaypc-0.4.0 | -32  (0/-32)    | -12  (0/-12)    | -4    (2/-6)      |
teem-1.6.0     | -88  (480/-568) | +108 (564/-456) | -1272 (117/-1389) |
ttt-0.10.1     | -24  (0/-24)    | -20  (0/-20)    | -16   (0/-16)     |
unrarlib-0.4.0 | -20  (0/-20)    | -8   (0/-8)     | -59   (9/-68)     |
zlib-1.1.4     | -12  (0/-12)    | -4   (0/-4)     | -23   (8/-31)     |


               | MIPS n64 -O2    | MIPS o32 -O2    | x86_64 -O2        |
---------------+-----------------+-----------------+-------------------+
bzip2-1.0.2    | -104 (0/-104)   | -48  (0/-48)    | -55   (0/-55)     |
cg_compiler    | -184 (4/-188)   | -232 (0/-232)   | -31   (5/-36)     |
compiler       | -32  (0/-32)    | -12  (0/-12)    | -4    (1/-5)      |
flex-2.5.31    | -96  (0/-96)    | -112 (0/-112)   | -12   (34/-46)    |
jikespg-1.3    | -540 (20/-560)  | -476 (4/-480)   | -154  (30/-184)   |
jpeg-6b        | -112 (16/-128)  | -60  (0/-60)    | -136  (84/-220)   |
libmspack      | -164 (0/-164)   | -40  (0/-40)    | -87   (32/-119)   |
libpng-1.2.5   | -120 (8/-128)   | -92  (4/-96)    | -140  (53/-193)   |
linux-2.4.23   | -596 (12/-608)  | -320 (8/-328)   | -794  (285/-1079) |
lwip-0.5.3     | -8   (0/-8)     | -8   (0/-8)     | +2    (4/-2)      |
mpeg2dec-0.3.1 | -44  (0/-44)    | -4   (0/-4)     | -122  (8/-130)    |
mpgcut-1.1     | -8   (0/-8)     | -8   (0/-8)     | +28   (32/-4)     |
OpenTCP-1.0.4  | -4   (0/-4)     | -4   (0/-4)     | -2    (0/-2)      |
replaypc-0.4.0 | -20  (0/-20)    | -24  (0/-24)    | -13   (0/-13)     |
teem-1.6.0     | +100 (740/-640) | +84  (736/-652) | -1998 (168/-2166) |
ttt-0.10.1     | -16  (0/-16)    |                 |                   |
unrarlib-0.4.0 | -16  (0/-16)    | -8   (0/-8)     | +19   (37/-18)    |
zlib-1.1.4     | -12  (0/-12)    | -4   (0/-4)     | -15   (1/-16)     |

Regards,
Robert

> Hi Robert,
> > gcc/
> > * regrename.c (create_new_chain): Initialize terminated_dead,
> > renamed and tied_chain.
> > (find_best_rename_reg): Pick and check register from the tied chain.
> > (regrename_do_replace): Mark head as renamed.
> > (scan_rtx_reg): Tie chains in move insns.  Set terminate_dead flag.
> > * regrename.h (struct du_head): Add tied_chain, renamed and
> > terminated_dead members.
> 
> Thanks - this looks a lot better already. You didn't say how it was
> bootstrapped and tested; please include this information for future
> submissions. For a patch like this, some data on the improvement you got
> would also be appreciated.
> 
> I'd still like to investigate the possibility of further simplification:
> 
> > +   {
> > + /* Find the input chain.  */
> > + for (i = c->id - 1; id_to_chain.iterate (i, &head); i--)
> > +   if (head->last && head->last->insn == insn
> > +   && head->terminated_dead)
> > + {
> > +   gcc_assert (head->regno == REGNO (recog_data.operand[1]));
> > +   c->tied_chain = head;
> > +   head->tied_chain = c;
> > +
> > +   if (dump_file)
> > + fprintf (dump_file, "Tying chain %s (%d) with %s (%d)\n",
> > +  reg_names[c->regno], c->id,
> > +  reg_names[head->regno], head->id);

Re: [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh.

2015-11-09 Thread Christophe Lyon

On 30 October 2015 at 16:52, Matthew Wahab  wrote:
> On 30/10/15 12:51, Christophe Lyon wrote:
>>
>> On 23 October 2015 at 14:26, Matthew Wahab 
>> wrote:
>>>
>>> The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
>>> sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah and
>>> vqrdmlsh for these instructions. The new intrinsics are of the form
>>> vqrdml{as}h[q]_.
>>>
>>> Tested the series for aarch64-none-linux-gnu with native bootstrap and
>>> make check on an ARMv8 architecture. Also tested aarch64-none-elf with
>>> cross-compiled check-gcc on an ARMv8.1 emulator.
>>>
>>
>> Is there a publicly available simulator for v8.1? QEMU or Foundation
>> Model?
>>
>
> Sorry, I don't know.
> Matthew
>

So, what will happen to the testsuite once this is committed?
Are we going to see FAILs when using QEMU?

Thanks,

Christophe.

Re: [PATCH] Minor refactoring in tree-ssanames.c & freelists verifier

2015-11-09 Thread Michael Matz

Hi,

On Mon, 9 Nov 2015, Jeff Law wrote:

+verify_ssaname_freelists (struct function *fun)
+{
+  /* Do nothing if we are in RTL format.  */
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+{
+  if (bb->flags & BB_RTL)
+   return;
+}

gimple_in_ssa_p (fun);

+  /* Then note the operands of each statement.  */
+  for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+  !gsi_end_p (gsi);
+  gsi_next (&gsi))
+   {
+ ssa_op_iter iter;
+ gimple *stmt = gsi_stmt (gsi);
+ FOR_EACH_SSA_TREE_OPERAND (t, stmt, iter, SSA_OP_ALL_OPERANDS)
+   if (TREE_CODE (t) == SSA_NAME)
+ bitmap_set_bit (names_in_il, SSA_NAME_VERSION (t));
+   }

t will always be an SSA_NAME here.


Ciao,
Michael.

GCC 6 Status Report (2015-11-09)

2015-11-09 Thread Richard Biener


Status
======

We've pushed back the switch to Stage 3 to the end of Saturday, Nov 14th.
This is to allow smooth draining of review queues.


Quality Data
============

Priority          #   Change from last report
--------        ---   -----------------------
P1                4   +   2
P2               84
P3              130   +  10
P4               83   -   5
P5               32
--------        ---   -----------------------
Total P1-P3     218   +  12
Total           333   +   7


Previous Report
===============

https://gcc.gnu.org/ml/gcc/2015-10/msg00113.html

[PATCH][optabs][ifcvt][1/3] Define negcc, notcc optabs

2015-11-09 Thread Kyrill Tkachov


Hi all,

This is a rebase of the patch I posted at:
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00154.html

The patch has been ok'd by Jeff but I wanted to hold off committing it until
the ifcvt regressions on sparc and x86_64 were fixed.

The rebase conflicts were due to Richard's optabs splitting patch.

I've also noticed that in my original patch I had a comparison of branch cost
with the magic number '2'. I removed it from this version as it's not really
meaningful.
The transformation this patch enables is, at the moment, only supported for arm
and aarch64 where it is always beneficial. If/when we have a proper ifcvt
costing model (perhaps for GCC 7?) we'll update this accordingly if needed.

Jeff, sorry for taking so long to commit this, I just wanted to fix the other
ifcvt fallout before proceeding with more new functionality.
I have also uncovered a bug in the arm implementation of these optabs
(patch 3/3 in the series), so I'll post an updated version of that patch as
well soon.

Ok to commit this updated version instead?

Bootstrapped and tested on arm, aarch64 and x86_64.
It has been sitting in my tree for a couple of months now with no issues.

Thanks,
Kyrill

2015-11-09  Kyrylo Tkachov  

* ifcvt.c (noce_try_inverse_constants): New function.
(noce_process_if_block): Call it.
* optabs.h (emit_conditional_neg_or_complement): Declare prototype.
* optabs.def (negcc_optab, notcc_optab): Declare.
* optabs.c (emit_conditional_neg_or_complement): New function.
* doc/md.texi (Standard Names): Document negcc, notcc names.

commit 93cd987e9ab02ac68b44b2470bb5c4c6345efeca
Author: Kyrylo Tkachov 
Date:   Thu Aug 13 18:14:52 2015 +0100

[optabs][ifcvt][1/3] Define negcc, notcc optabs

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 619259f..c4e43f3 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5791,6 +5791,21 @@ move operand 2 or (operands 2 + operand 3) into operand 0 according to the
 comparison in operand 1.  If the comparison is false, operand 2 is moved into
 operand 0, otherwise (operand 2 + operand 3) is moved.
 
+@cindex @code{neg@var{mode}cc} instruction pattern
+@item @samp{neg@var{mode}cc}
+Similar to @samp{mov@var{mode}cc} but for conditional negation.  Conditionally
+move the negation of operand 2 or the unchanged operand 3 into operand 0
+according to the comparison in operand 1.  If the comparison is true, the negation
+of operand 2 is moved into operand 0, otherwise operand 3 is moved.
+
+@cindex @code{not@var{mode}cc} instruction pattern
+@item @samp{not@var{mode}cc}
+Similar to @samp{neg@var{mode}cc} but for conditional complement.
+Conditionally move the bitwise complement of operand 2 or the unchanged
+operand 3 into operand 0 according to the comparison in operand 1.
+If the comparison is true, the complement of operand 2 is moved into
+operand 0, otherwise operand 3 is moved.
+
 @cindex @code{cstore@var{mode}4} instruction pattern
 @item @samp{cstore@var{mode}4}
 Store zero or nonzero in operand 0 according to whether a comparison
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 157a716..1e773d8 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1179,6 +1179,83 @@ noce_try_store_flag (struct noce_if_info *if_info)
 }
 }
 
+
+/* Convert "if (test) x = -A; else x = A" into
+   x = A; if (test) x = -x if the machine can do the
+   conditional negate form of this cheaply.
+   Try this before noce_try_cmove that will just load the
+   immediates into two registers and do a conditional select
+   between them.  If the target has a conditional negate or
+   conditional invert operation we can save a potentially
+   expensive constant synthesis.  */
+
+static bool
+noce_try_inverse_constants (struct noce_if_info *if_info)
+{
+  if (!noce_simple_bbs (if_info))
+return false;
+
+  if (!CONST_INT_P (if_info->a)
+  || !CONST_INT_P (if_info->b)
+  || !REG_P (if_info->x))
+return false;
+
+  machine_mode mode = GET_MODE (if_info->x);
+
+  HOST_WIDE_INT val_a = INTVAL (if_info->a);
+  HOST_WIDE_INT val_b = INTVAL (if_info->b);
+
+  rtx cond = if_info->cond;
+
+  rtx x = if_info->x;
+  rtx target;
+
+  start_sequence ();
+
+  rtx_code code;
+  if (val_b != HOST_WIDE_INT_MIN && val_a == -val_b)
+code = NEG;
+  else if (val_a == ~val_b)
+code = NOT;
+  else
+{
+  end_sequence ();
+  return false;
+}
+
+  rtx tmp = gen_reg_rtx (mode);
+  noce_emit_move_insn (tmp, if_info->a);
+
+  target = emit_conditional_neg_or_complement (x, code, mode, cond, tmp, tmp);
+
+  if (target)
+{
+  rtx_insn *seq = get_insns ();
+
+  if (!seq)
+	{
+	  end_sequence ();
+	  return false;
+	}
+
+  if (target != if_info->x)
+	noce_emit_move_insn (if_info->x, target);
+
+	seq = end_ifcvt_sequence (if_info);
+
+	if (!seq)
+	  return false;
+
+	emit_insn_before_setloc (seq, if_info->jump,
+ INSN_LOCATION (if_info->insn_a));
+	return true;
+}
+
+  end_sequence ();
+

[patch] backport remove of soft float for FreeBSD powerpc for gcc-4.9

2015-11-09 Thread Andreas Tobler


Hi all,

any objections if I apply the patch below to gcc-4.9?

TIA,

Andreas

2015-11-09  Andreas Tobler  

Backport from mainline
2015-03-04  Andreas Tobler  

* config/rs6000/t-freebsd64: Remove 32-bit soft-float multilibs.


Index: gcc/config/rs6000/t-freebsd64
===
--- gcc/config/rs6000/t-freebsd64   (revision 230016)
+++ gcc/config/rs6000/t-freebsd64   (working copy)
@@ -21,11 +21,9 @@
 # On FreeBSD the 32-bit libraries are found under /usr/lib32.
 # Set MULTILIB_OSDIRNAMES according to this.

-MULTILIB_OPTIONS= m32 msoft-float
-MULTILIB_DIRNAMES   = 32 nof
+MULTILIB_OPTIONS= m32
+MULTILIB_DIRNAMES   = 32
 MULTILIB_EXTRA_OPTS = fPIC mstrict-align
 MULTILIB_EXCEPTIONS =
-MULTILIB_EXCLUSIONS = !m32/msoft-float
 MULTILIB_OSDIRNAMES= ../lib32
-#MULTILIB_MATCHES= $(MULTILIB_MATCHES_FLOAT)

Re: OpenACC Firstprivate

2015-11-09 Thread Nathan Sidwell


On 11/09/15 08:59, Nathan Sidwell wrote:

On 11/09/15 08:46, Jakub Jelinek wrote:

On Sat, Nov 07, 2015 at 08:50:28AM -0500, Nathan Sidwell wrote:





Say
   int arr[64];
// initialize arr
#pragma acc data copyin (arr)
{
   // modify arr on the host
   # pragma acc parallel firstprivate (arr)
   {
 ...
   }
}


Hm, I suspect that is either ill formed or the std does not contemplate.


Just realized, there are two ways to consider the above.

1) It's ill formed.  Once you've transferred data to the device, modifying it
on the host is unspecified.  I'm having trouble finding words in the std that
actually say that though :(


2) On a system with shared physical global memory, the host modification would
be visible on the device (possibly at an arbitrary point due to the lack of a
synchronization primitive?)


I don't think this changes the 'why not use OpenMP's ...' question, because IIUC
you think that can be made to DTRT anyway?


nathan

Re: [Patch AArch64] Switch constant pools to separate rodata sections.

2015-11-09 Thread Ramana Radhakrishnan



On 08/11/15 11:42, Andreas Schwab wrote:
> This is causing a bootstrap comparison failure in gcc/go/gogo.o.

I'm looking into this - this is now PR68256.

regards
Ramana
> 
> Andreas.
>

Re: [PATCH] Fix memory leaks and use a pool_allocator

2015-11-09 Thread Richard Biener

On Mon, Nov 9, 2015 at 2:26 PM, Martin Liška  wrote:
> On 11/09/2015 01:11 PM, Richard Biener wrote:
>> On Mon, Nov 9, 2015 at 12:22 PM, Martin Liška  wrote:
>>> Hi.
>>>
>>> This is follow-up of changes that Richi started on Friday.
>>>
>>> Patch can bootstrap on x86_64-linux-pc and regression tests are running.
>>>
>>> Ready for trunk?
>>
>> * tree-ssa-dom.c (free_edge_info): Make the function extern.
>> ...
>> * tree-ssa.h (free_edge_info): Declare function extern.
>>
>> declare this in tree-ssa-threadupdate.h instead and rename it to
>> something less "public", like free_dom_edge_info.
>>
>> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
>> index fff62de..eb6b7df 100644
>> --- a/gcc/ifcvt.c
>> +++ b/gcc/ifcvt.c
>> @@ -3161,6 +3161,8 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>>set_used_flags (targets[i]);
>>  }
>>
>> +  temporaries.release ();
>> +
>>set_used_flags (cond);
>>set_used_flags (x);
>>set_used_flags (y);
>> @@ -3194,6 +3196,7 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
>>  }
>>
>>num_updated_if_blocks++;
>> +  targets.release ();
>>return TRUE;
>>
>> These look suspiciously like candidates for an auto_vec<> (didn't check).
>>
>> @@ -1240,6 +1240,7 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p)
>>dead_set = sparseset_alloc (max_regno);
>>unused_set = sparseset_alloc (max_regno);
>>curr_point = 0;
>> +  point_freq_vec.release ();
>>point_freq_vec.create (get_max_uid () * 2);
>>
>> a truncate (0) instead of a release () should be cheaper, avoiding the
>> re-allocation.
>>
>> @@ -674,6 +674,10 @@ sra_deinitialize (void)
>>assign_link_pool.release ();
>>obstack_free (_obstack, NULL);
>>
>> +  for (hash_map::iterator it =
>> +   base_access_vec->begin (); it != base_access_vec->end (); ++it)
>> +(*it).second.release ();
>> +
>>delete base_access_vec;
>>
>> I wonder if the better fix is to provide a proper free method for the
>> hash_map?
>> A hash_map with 'auto_vec' looks suspicious - perhaps a proper release
>> was intended here via default_hash_map_traits <>?
>>
>> Anyway, most of the things above can be improved as followup of course.
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Martin
>
> Hi.
>
> All suggested changes were applied, sending v2 and waiting for bootstrap and
> regression tests.

Ok.

Thanks,
Richard.

> Thanks,
> Martin

Re: OpenACC Firstprivate

2015-11-09 Thread Nathan Sidwell


On 11/09/15 09:10, Jakub Jelinek wrote:

On Mon, Nov 09, 2015 at 08:59:15AM -0500, Nathan Sidwell wrote:

This I'm afraid performs often two copies rather than just one (one to copy
the host value to the present_copyin mapped value, another one in the
region),


I don't think that can be avoided.  The host doesn't have control over when
the CTAs (a gang) start -- they may even be serialized onto the same
physical HW. So each gang has to initialize its own instance.  Or did you
mean something else?


So, what is the scope of the private and firstprivate vars in OpenACC?
In OpenMP if a variable is private or firstprivate on the target construct,
unless further privatized in inner constructs it is really shared among all
the threads in all the teams (ro one var per all CTAs/workers in PTX terms).
Is that the case for OpenACC too, or are the vars e.g. private to each CTA
already or to each thread in each CTA, something different?
If they are shared by all CTAs, then you should hopefully be able to use the
GOMP_MAP_FIRSTPRIVATE{,_INT}, if not, then I'd say you should at least use
those to provide you the initializer data to initialize your private vars
from as a cheaper alternative to mapping.


I'm going to try and get clarification, but I think the intent is to initialize 
with the value seen on the device.  Consider:



int foo = 0;
#pragma acc data copyin(foo)
{
  #pragma acc parallel present(foo)
  {
    foo = 2;
  }

  if (expr) {
    #pragma acc update host (foo)
  }

  #pragma acc parallel firstprivate (foo)
  {
    // which initialization value?
  }
}

Here we copy data to the device, then set it to a distinct value there.  We
conditionally update the host's instance from the device.


My thinking is that the intent of the firstprivate is to initialize with the
value known on the device (and behave as-if copyin, if it's not there).  Not the
value most recently seen on the host -- the update clause could change that, and
may well be being used as a debugging aid, so it seems bizarre that it can
change program semantics in such a way.

Re: Add null identifiers to genmatch

2015-11-09 Thread Richard Biener

On Mon, Nov 9, 2015 at 12:17 AM, Jeff Law  wrote:
> On 11/07/2015 07:31 AM, Pedro Alves wrote:
>>
>> Hi Richard,
>>
>> Passerby comment below.
>>
>> On 11/07/2015 01:21 PM, Richard Sandiford wrote:
>>>
>>> -/* Lookup the identifier ID.  */
>>> +/* Lookup the identifier ID.  Allow "null" if ALLOW_NULL.  */
>>>
>>>   id_base *
>>> -get_operator (const char *id)
>>> +get_operator (const char *id, bool allow_null = false)
>>>   {
>>> +  if (allow_null && strcmp (id, "null") == 0)
>>> +return null_id;
>>> +
>>> id_base tem (id_base::CODE, id);
>>
>>
>> Boolean params are best avoided if possible, IMO.  In this case,
>> it seems this could instead be a new wrapper function, like:
>
> This hasn't been something we've required for GCC.  I've come across this
> recommendation a few times over the last several months as I continue to
> look at refactoring and best practices for codebases such as GCC.
>
> By encoding the boolean in the function's signature, it (IMHO) does make the
> code a bit easier to read, primarily because you don't have to go look up the
> tense of the boolean.  The problem is when the boolean is telling us some
> property of an argument, but there's more than one argument and other similar
> situations.
>
> I wonder if the real benefit is in the refactoring necessary to do things in
> this way without a ton of code duplication.

I think the patch is ok as-is.

Thus ok.

Thanks,
Richard.

> Jeff
>
>

Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX

2015-11-09 Thread Mike Stump

On Nov 9, 2015, at 11:46 AM, Jeff Law  wrote:
On 11/09/2015 12:38 PM, Bernd Schmidt wrote:
>> We might want to think about making a policy decision to try waiving
>> some of the testing requirements for target macro -> hook conversions.
>> Maybe try only a "build to cc1" requirement and see whether that causes
>> too much breakage.
> A config-list.mk build is a build to cc1*, f951, gnat1, so we're not
> requiring deep tests on the affected targets.  Not sure how much we're
> getting by forcing a bootstrap & regression test of that kind of change.
>
> I'm certainly open to this kind of relaxed testing to help this stuff move
> forward and complete before we're all retired :-)

Testing is a cornerstone of gcc quality.  I like it.  It is useful.  That said,
I don’t think we should always be fanatical about it.  How and when we accept
less than a standard bootstrap and regression test run would, I’m sure, be a big
topic, but rather than make a ton of rules, I’d rather let a small handful of
reviewers decide when and how to accept less, and let them do what they want.
We can give them negative feedback if it impacts too many people, too often and
they can adjust.

The other way would be to have an integration branch that is tested and merged
post testing on a regular basis and let people contribute less than well tested
things on it, the idea being that it still won’t hit trunk until after a
bootstrap and test suite run, but that we can bundle 2-100 patches into one
test suite run.  This strikes me as more scalable, easier for developers and
removes the requirement of test suite + bootstrap before checkin while
retaining the useful quality that everything merged to trunk is tested.  Hardest
part about this would be ChangeLogs, merge resolution and svn blame.  git
handles this gracefully.  svn as I recall, a little less so.  [ quick check ]
Ah, seems svn blame -g TARGET can handle this gracefully (in theory).

Re: [Patch] Change to argument promotion in fixed conversion library calls

2015-11-09 Thread Steve Ellcey

On Mon, 2015-11-09 at 21:47 +0100, Bernd Schmidt wrote:
> On 11/09/2015 05:59 PM, Steve Ellcey wrote:
> > Here is a version with the code moved into a new function.  How does
> > this look?
> >
> > 2015-11-09  Steve Ellcey  
> >
> > * optabs.c (prepare_libcall_arg): New function.
> > (expand_fixed_convert): Add call to prepare_libcall_arg.
> 
> Hold on a moment - I see that emit_library_call_value_1 calls 
> promote_function_mode for arguments. Can you investigate why that 
> doesn't do what you need?
> 
> 
> Bernd

emit_library_call_value_1 has no way of knowing if the promotion should
be signed or unsigned because it has a mode (probably QImode or HImode)
that it knows may need to be promoted to SImode but it has no way to
know if that should be a signed or unsigned promotion because it has no
tree type information about the library call argument types.

Right now it guesses based on the return type but it may guess wrong
when converting an unsigned int to a signed fixed type or vice versa.

By doing the promotion in expand_fixed_convert GCC can use the uintp
argument to ensure that the signedness of the promotion is done
correctly.  We could pass that argument into emit_library_call_value_1
so it can do the correct promotion but that would require changing the
argument list for emit_library_call and emit_library_call_value_1 and
changing all the other call locations for those functions and that
seemed like overkill.
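
To make the signedness issue concrete, here is a minimal sketch in plain
C++ (not GCC internals).  The helper name promote_qi_to_si and its
unsignedp parameter are made up for illustration; unsignedp only plays
the role of the uintp argument mentioned above.  The same 8-bit pattern
widens to two different 32-bit values depending on the promotion chosen:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only; not a GCC function.
// Widens an 8-bit (QImode-sized) bit pattern to 32 bits (SImode-sized).
std::int32_t promote_qi_to_si (std::uint8_t bits, bool unsignedp)
{
  if (unsignedp)
    return static_cast<std::int32_t> (bits);   // zero-extend: 0xff -> 255
  // Sign-extend: reinterpret as signed 8-bit first, so 0xff -> -1.
  return static_cast<std::int32_t> (static_cast<std::int8_t> (bits));
}
```

With the bit pattern 0xff the unsigned promotion gives 255 and the signed
one gives -1, which is why a guess based on the return type alone can
hand the wrong value to the library routine.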

Steve Ellcey

Re: [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion)

2015-11-09 Thread David Edelsohn

On Mon, Nov 9, 2015 at 11:57 AM, Segher Boessenkool
 wrote:
> On Mon, Nov 09, 2015 at 12:34:20PM -0500, Michael Meissner wrote:
>> > > +(define_insn "*toc_fusionload_"
>> > > +  [(set (match_operand:QHSI 0 "int_reg_operand" "=,??r")
>> > > + (match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
>> > > +   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
>> > > +   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
>> > > +   (clobber (match_scratch:DI 3 "=X,"))]
>> > > +  "TARGET_TOC_FUSION_INT"
>> >
>> > Do you need that "??r" alternative?  Same for the next define_insn.
>>
>> Yes unfortunately.  The ??r catches the case where r0 is chosen.  R0 is not a
>> base register, and it can't be used for power8 gpr fusion (where you use the
>> value being loaded for the ADDIS instruction), but it can be used for power9
>> fusion (where the ADDIS must be adjacent, but it no longer has to be the
>> register being loaded).
>
> If you have only "b", r0 will not be chosen.  Does that help?  Or are
> you generating this pattern from somewhere else where you put in r0?

Mike,

What happens if you leave out the "r" alternative?  Does other code
explicitly generate that pattern with r0?

Thanks, David

Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX

2015-11-09 Thread Trevor Saunders

On Mon, Nov 09, 2015 at 12:46:33PM -0700, Jeff Law wrote:
> On 11/09/2015 12:38 PM, Bernd Schmidt wrote:
> >On 11/09/2015 07:52 PM, Trevor Saunders wrote:
> >
> >>yeah, that's more or less my thought, and this makes hookization easier
> >>since you can now mechanically add a hook for each thing in defaults.h
> >>that invokes the macro.  Then for each target you can go through and
> >>replace the macro with an override of the hooks.  That ends up with the
> >>macros replaced by hooks without writing a lot of patches that need to
> >>go through config-list.mk, and testing on multiple targets which imho is
> >>a giant pain, and rather slow.
> >
> >We might want to think about making a policy decision to try waiving
> >some of the testing requirements for target macro -> hook conversions.
> >Maybe try only a "build to cc1" requirement and see whether that causes
> >too much breakage.
> A config-list.mk build is a build to cc1*, f951, gnat1, so we're not
> requiring deep tests on the affected targets.  Not sure how much we're
> getting by forcing a bootstrap & regression test of that kind of change.

So in general when I've done cross target things I think I've found more
bugs with config-list.mk than with a regtest, but the regtest has found
some things I think.

However I actually don't mind bootstrapping and regtesting that much,
it's more or less a few hours for the control and then another few for
each patch.  On the other hand config-list.mk takes on the order of 12
hours, and setting up a cross for a quick test isn't really that quick.
Which means that if you have a patch touching a number of targets you
end up not checking it compiles at all until you run config-list.mk, and
then it's a heavyweight operation.

So at least for the way I work I'd really rather write series that I can
incrementally test on just one target and be reasonably confident they
won't break other targets.

The "add default macro definitions, wrap those with hooks, then target
by target replace the macro with hook overrides" approach seems to let
you test incrementally and find most of the issues, but the "change a
macro everywhere" approach doesn't really.
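
A rough sketch of those three steps, in standalone C++ rather than real
GCC code.  FROB_LIMIT, sketch_target_hooks and the two "ports" are all
invented names; the real defaults.h and target hook structures look
different.

```cpp
#include <cassert>

// Step 1 (defaults.h style): make sure the macro is always defined.
// FROB_LIMIT is a made-up target macro used only for this sketch.
#ifndef FROB_LIMIT
#define FROB_LIMIT 0
#endif

// Step 2: a hook whose default implementation just invokes the macro.
struct sketch_target_hooks
{
  int (*frob_limit) ();
};

static int default_frob_limit () { return FROB_LIMIT; }

// Step 3: a converted port drops the macro and overrides the hook.
static int converted_port_frob_limit () { return 16; }

// An unconverted port still gets its value via the macro-backed default.
static const sketch_target_hooks unconverted_port = { default_frob_limit };
static const sketch_target_hooks converted_port = { converted_port_frob_limit };

// Generic code only ever goes through the hook.
int frob_limit_for (const sketch_target_hooks &hooks)
{
  return hooks.frob_limit ();
}
```

Because generic code already calls the hook after step 2, each port can
be converted and tested on its own in step 3.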

Trev

The add default macros then use those in hooks, and finally add overides
> 
> I'm certainly open to this kind of relaxed testing to help this stuff move
> forward and complete before we're all retired :-)
> 
> Jeff
>

Re: [PATCH 05/12] always define VMS_DEBUGGING_INFO

2015-11-09 Thread Trevor Saunders

On Mon, Nov 09, 2015 at 08:37:19PM +0100, Bernd Schmidt wrote:
> On 11/09/2015 08:29 PM, Trevor Saunders wrote:
> >as I said in 0/12 this did go through config-list.mk, and checking again
> >this does build on alpha-dec-vms.
> 
> The question I have is - why does it build on any other target? It's the
> reference that's unconditional, not the definition. Do we have enough DCE at
> -O0 to eliminate the reference? It's still incorrect IMO (and should be
> fixed in the other patches as well).

DCE would be my guess.  I guess I could go back to #if-ing the bits that
reference it, and then incrementally remove the #ifs, starting with the
ones defining the functions used in the structs, but given you seem to
be against patches that only change #ifdef to #if you might not like that
:(

> >
> >I'd actually really rather review them, or really deal with them in any
> >way, the way they are.  Smaller simpler patches that only deal with one
> >thing are much better.  I think the most macros that appear on one line
> >are 2, so at most you could lower that to 1 change instead of 2, but who
> >really cares anyway?
> 
> Well, I do, because I get to see this stuff:
> 
> -#if 1 < (defined (DBX_DEBUGGING_INFO) + defined (SDB_DEBUGGING_INFO) \
> +#if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
>   + defined (DWARF2_DEBUGGING_INFO) + defined (XCOFF_DEBUGGING_INFO) \
>   + defined (VMS_DEBUGGING_INFO))
> 
>  #if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
> - + defined (DWARF2_DEBUGGING_INFO) + defined (XCOFF_DEBUGGING_INFO) \
> +  + defined (DWARF2_DEBUGGING_INFO) + (XCOFF_DEBUGGING_INFO) \
>   + defined (VMS_DEBUGGING_INFO))
> 
>  #if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
>+ defined (DWARF2_DEBUGGING_INFO) + (XCOFF_DEBUGGING_INFO) \
> - + defined (VMS_DEBUGGING_INFO))
> +  + (VMS_DEBUGGING_INFO))
> 
>  #if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
> -  + defined (DWARF2_DEBUGGING_INFO) + (XCOFF_DEBUGGING_INFO) \
> +  + (DWARF2_DEBUGGING_INFO) + (XCOFF_DEBUGGING_INFO) \
>+ (VMS_DEBUGGING_INFO))
> 
> -#if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
> +#if 1 < ((DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
>+ (DWARF2_DEBUGGING_INFO) + (XCOFF_DEBUGGING_INFO) \
>+ (VMS_DEBUGGING_INFO))
> 
> etc.

Other than reading this now I'm not sure what the context would be, but
either way personally I really don't mind reading that, and think it's
simpler to reason about the correctness of one thing at a time.
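
For reference, a small sketch of the end state of that sequence, where
every macro is always defined to 0 or 1.  The values below are purely
illustrative, and SKETCH_MULTIPLE_DEBUG_FORMATS and vms_debug_enabled
are made-up names:

```cpp
#include <cassert>

// Illustrative values only; a real port would set these in its headers.
// The point is that every macro is always defined, to 0 or 1.
#define DBX_DEBUGGING_INFO 1
#define SDB_DEBUGGING_INFO 0
#define DWARF2_DEBUGGING_INFO 1
#define XCOFF_DEBUGGING_INFO 0
#define VMS_DEBUGGING_INFO 0

// With always-defined macros the values are summed directly,
// where the old form summed defined (X) for each macro.
#if 1 < ((DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
         + (DWARF2_DEBUGGING_INFO) + (XCOFF_DEBUGGING_INFO) \
         + (VMS_DEBUGGING_INFO))
#define SKETCH_MULTIPLE_DEBUG_FORMATS 1
#else
#define SKETCH_MULTIPLE_DEBUG_FORMATS 0
#endif

// The same macros are now also usable in ordinary expressions.
inline bool vms_debug_enabled () { return VMS_DEBUGGING_INFO != 0; }
```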

Trev

> 
> 
> Bernd

Re: [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion)

2015-11-09 Thread Michael Meissner

On Mon, Nov 09, 2015 at 01:11:41PM -0800, David Edelsohn wrote:
> On Mon, Nov 9, 2015 at 11:57 AM, Segher Boessenkool
>  wrote:
> > On Mon, Nov 09, 2015 at 12:34:20PM -0500, Michael Meissner wrote:
> >> > > +(define_insn "*toc_fusionload_"
> >> > > +  [(set (match_operand:QHSI 0 "int_reg_operand" "=,??r")
> >> > > + (match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
> >> > > +   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
> >> > > +   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
> >> > > +   (clobber (match_scratch:DI 3 "=X,"))]
> >> > > +  "TARGET_TOC_FUSION_INT"
> >> >
> >> > Do you need that "??r" alternative?  Same for the next define_insn.
> >>
> >> Yes unfortunately.  The ??r catches the case where r0 is chosen.  R0 is not a
> >> base register, and it can't be used for power8 gpr fusion (where you use the
> >> value being loaded for the ADDIS instruction), but it can be used for power9
> >> fusion (where the ADDIS must be adjacent, but it no longer has to be the
> >> register being loaded).
> >
> > If you have only "b", r0 will not be chosen.  Does that help?  Or are
> > you generating this pattern from somewhere else where you put in r0?
> 
> Mike,
> 
> What happens if you leave out the "r" alternative?  Does other code
> explicitly generate that pattern with r0?

Sometimes, one of the passes after reload (usually -fgcse-after-reload) decides
to redo the register allocation, and I would see a failure in building things
like Spec 2006.  I have tried not putting the "r" in there, or using
base_reg_operand instead of gpc_reg_operand, but I still got failures.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Re: [PATCH], Add power9 support to GCC, patch #7 (direct move enhancements)

2015-11-09 Thread Michael Meissner

I evidently forgot to attach the patch.

[gcc]
2015-11-08  Michael Meissner  

* config/rs6000/constraints.md (we constraint): New constraint for
64-bit power9 vector support.
(wL constraint): New constraint for the element in a vector that
can be addressed by the MFVSRLD instruction.

* config/rs6000/rs6000.c (rs6000_debug_reg_global): Add ISA 3.0
debugging.
(rs6000_init_hard_regno_mode_ok): If ISA 3.0 and 64-bit, enable we
constraint.  Disable the VSX<->GPR direct move helpers if we have
the MFVSRLD and MTVSRDD instructions.
(rs6000_secondary_reload_simple_move): Add support for doing
vector direct moves directly without additional scratch registers
if we have ISA 3.0 instructions.
(rs6000_secondary_reload_direct_move): Update comments.
(rs6000_output_move_128bit): Add support for ISA 3.0 vector
instructions.

* config/rs6000/vsx.md (vsx_mov): Add support for ISA 3.0
direct move instructions.
(vsx_movti_64bit): Likewise.
(vsx_extract_): Likewise.

* config/rs6000/rs6000.h (VECTOR_ELEMENT_MFVSRLD_64BIT): New
macros for ISA 3.0 direct move instructions.
(TARGET_DIRECT_MOVE_128): Likewise.

* config/rs6000/rs6000.md (128-bit GPR splitters): Don't split a
128-bit move that is a direct move between GPR and vector
registers using ISA 3.0 direct move instructions.

* doc/md.texi (RS/6000 constraints): Document we, wF, wG, wL
constraints.  Update wa documentation to say not to use %x on
instructions that only take Altivec registers.

[gcc/testsuite]
2015-11-08  Michael Meissner  

* gcc.target/powerpc/direct-move-vector.c: New test for 128-bit
vector direct move instructions.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/constraints.md
===
--- gcc/config/rs6000/constraints.md(revision 229976)
+++ gcc/config/rs6000/constraints.md(working copy)
@@ -64,7 +64,8 @@ (define_register_constraint "wa" "rs6000
 (define_register_constraint "wd" "rs6000_constraints[RS6000_CONSTRAINT_wd]"
   "VSX vector register to hold vector double data or NO_REGS.")
 
-;; we is not currently used
+(define_register_constraint "we" "rs6000_constraints[RS6000_CONSTRAINT_we]"
+  "VSX register if the -mpower9-vector -m64 options were used or NO_REGS.")
 
 (define_register_constraint "wf" "rs6000_constraints[RS6000_CONSTRAINT_wf]"
   "VSX vector register to hold vector float data or NO_REGS.")
@@ -147,6 +148,12 @@ (define_memory_constraint "wG"
   "Memory operand suitable for TOC fusion memory references"
   (match_operand 0 "toc_fusion_mem_wrapped"))
 
+(define_constraint "wL"
+  "Int constant that is the element number mfvsrld accesses in a vector."
+  (and (match_code "const_int")
+   (and (match_test "TARGET_DIRECT_MOVE_128")
+   (match_test "(ival == VECTOR_ELEMENT_MFVSRLD_64BIT)"))))
+
 ;; Lq/stq validates the address for load/store quad
 (define_memory_constraint "wQ"
   "Memory operand suitable for the load/store quad instructions"
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 229977)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -2575,6 +2575,10 @@ rs6000_debug_reg_global (void)
   if (TARGET_VSX)
 fprintf (stderr, DEBUG_FMT_D, "VSX easy 64-bit scalar element",
 (int)VECTOR_ELEMENT_SCALAR_64BIT);
+
+  if (TARGET_DIRECT_MOVE_128)
+fprintf (stderr, DEBUG_FMT_D, "VSX easy 64-bit mfvsrld element",
+(int)VECTOR_ELEMENT_MFVSRLD_64BIT);
 }
 
 
@@ -2986,6 +2990,10 @@ rs6000_init_hard_regno_mode_ok (bool glo
rs6000_constraints[RS6000_CONSTRAINT_wp] = VSX_REGS;/* TFmode  */
 }
 
+  /* Support for new direct moves.  */
+  if (TARGET_DIRECT_MOVE_128)
+rs6000_constraints[RS6000_CONSTRAINT_we] = VSX_REGS;
+
   /* Set up the reload helper and direct move functions.  */
   if (TARGET_VSX || TARGET_ALTIVEC)
 {
@@ -3034,7 +3042,7 @@ rs6000_init_hard_regno_mode_ok (bool glo
  reg_addr[TImode].reload_load   = CODE_FOR_reload_ti_di_load;
}
 
- if (TARGET_DIRECT_MOVE)
+ if (TARGET_DIRECT_MOVE && !TARGET_DIRECT_MOVE_128)
{
   reg_addr[TImode].reload_gpr_vsx    = CODE_FOR_reload_gpr_from_vsxti;
   reg_addr[V1TImode].reload_gpr_vsx  = CODE_FOR_reload_gpr_from_vsxv1ti;
@@ -18081,6 +18089,11 @@ rs6000_secondary_reload_simple_move (enu
  || (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)))
 return true;
 
+  else if (TARGET_DIRECT_MOVE_128 && size == 16
+  && ((to_type == VSX_REG_TYPE && from_type ==

Re: Extend tree-call-cdce to calls whose result is used

2015-11-09 Thread Michael Matz

Hi,

On Mon, 9 Nov 2015, Richard Sandiford wrote:

> +static bool
> +can_use_internal_fn (gcall *call)
> +{
> +  /* Only replace calls that set errno.  */
> +  if (!gimple_vdef (call))
> +return false;

Oh, I managed to confuse this in my head while reading the patch.  So, 
hmm, you don't actually replace the builtin with an internal function 
(without the condition) under no-errno-math?  Does something else do that?  
Because otherwise that seems an unnecessary restriction?

> >> r229916 fixed that for the non-EH case.
> >
> > Ah, missed it.  Even the EH case shouldn't be difficult.  If the 
> > original dominator of the EH destination was the call block it moves, 
> > otherwise it remains unchanged.
> 
> The target of the edge is easy in itself, I agree, but that isn't
> necessarily the only affected block, if the EH handler doesn't
> exit or rethrow.

You're worried the non-EH and the EH regions merge again, right?  Like so:

before change:

BB1: throwing-call
 fallthru/   \EH
BB2   BBeh
 |   /\ (stuff in EH-region)
 | /some path out of EH region
 | /--/
BB3

Here, BB3 must at least be dominated by BB1 (the throwing block), or by 
something further up (when there are other side-entries to the path 
BB2->BB3 or into the EH region).  When further up, nothing changes, when 
it's BB1, then it's afterwards dominated by the BB containing the 
condition.  So everything with idom==BB1 gets idom=Bcond, except for BBeh, 
which gets idom=Bcall.  Depending on how you split BB1, either Bcond or 
Bcall might still be BB1, and that doesn't lead to changes in the dom tree.
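
The idom update described above can be sketched on a toy array model.
This is not GCC's dominance API; update_idoms_after_split and the
index-based block numbering are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model, not GCC's dominance API: blocks are indices into idom,
// idom[b] is the immediate dominator of block b, and -1 means "none".
// bb1 is the block that held the throwing call.  After the split, bcond
// holds the condition and bcall holds the call; one of the two may reuse
// bb1's index.  idom must already have slots for any new block.
void update_idoms_after_split (std::vector<int> &idom, int bb1,
                               int bcond, int bcall, int bbeh)
{
  const int above = idom[bb1];   // whatever dominated the original block
  for (std::size_t b = 0; b < idom.size (); ++b)
    if (idom[b] == bb1)
      idom[b] = (static_cast<int> (b) == bbeh) ? bcall : bcond;
  idom[bcond] = above;           // the condition now sits where BB1 was
  idom[bcall] = bcond;           // the call is only reached via the condition
}
```

Blocks whose idom was something further up than BB1 are left untouched,
matching the "nothing changes" case.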

> > Currently we have quite some of such passes (reassoc, forwprop, 
> > lower_vector_ssa, cse_reciprocals, cse_sincos (sigh!), optimize_bswap 
> > and others), but they are all handling only special situations in one 
> > way or the other.  pass_fold_builtins is another one, but it seems 
> > most related to what you want (replacing a call with something else), 
> > so I thought that'd be the natural choice.
> 
> Well, to be pedantic, it's not really replacing the call.  Except for
> the special case of targets that support direct assignments to errno,
> it keeps the original call but ensures that it isn't usually executed.
> From that point of view it doesn't really seem like a fold.
> 
> But I suppose that's just naming again :-).  And it's easily solved with
> s/fold/rewrite/.

Exactly, in my mind pass_fold_builtin (like many of the others I 
mentioned) doesn't do folding but rewriting :)

> > call_cdce is also such a pass, but I think it's simply not the 
> > appropriate one (only in so far as its source file contains the helper 
> > routines you need), and in addition I think it shouldn't exist at all 
> > (and wouldn't need to if it had been part of DCE from the start, or if 
> > you implemented the conditionalizing as part of another pass).  Hey, 
> > you could be one to remove a pass! ;-)
> 
> It still seems a bit artificial to me to say that the transformation 
> with a null lhs is "DCE enough" to go in the main DCE pass (even though 
> like I say it doesn't actually eliminate any code from the IR, it just 
> adds more code) and should be kept in a separate pass from the one that 
> does the transformation on a non-null lhs.

Oh, I agree, I might not have been clear: I'm not arguing that the normal 
DCE should now be changed to do the conditionalizing when it removes a 
call LHS; I was saying that it _would_ have been good instead of adding 
the call_cdce pass in the past, when it was for DCE purposes only.  But 
now your proposal is on the plate, namely doing the conditionalizing also 
with an LHS.  So that conditionalizing should take place in some rewriting 
pass (and ideally not call_cdce), no matter the LHS, and normal DCE not be 
changed (it will still remove LHSs of non-removable calls, just that those 
then are sometimes under a condition, when DCE runs after the rewriting).
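
As a source-level picture of conditionalizing a call whose LHS is used,
here is a rough sketch for sqrt.  It is only my reading of the idea, not
the pass itself: library_sqrt and internal_sqrt are stand-ins invented
for this example, and the x < 0 test is assumed to be the errno
condition for sqrt.

```cpp
#include <cassert>
#include <cmath>

// Counts how often the "real" library call is reached in this sketch.
static int library_calls = 0;

// Stand-in for the original errno-setting call.
static double library_sqrt (double x)
{
  ++library_calls;
  return std::sqrt (x);
}

// Stand-in for an errno-free internal function.  It is implemented with
// std::sqrt purely so the sketch compiles.
static double internal_sqrt (double x) { return std::sqrt (x); }

// "y = sqrt (x)" after conditionalizing: the result always comes from
// the internal function, and the original call survives only on the
// path where errno could be set.
double sqrt_conditionalized (double x)
{
  double y = internal_sqrt (x);
  if (x < 0.0)
    library_sqrt (x);       // kept for its side effect; result unused
  return y;
}
```

A later DCE pass could still drop the LHS of the remaining call, which
here is already unused.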

Ciao,
Michael.

Remove instantiations when no concept check

2015-11-09 Thread François Dumont

Hi

I just committed this trivial cleanup.

2015-11-09  François Dumont  

* include/bits/stl_algo.h
(partial_sort_copy): Instantiate std::iterator_traits only if concept
checks.
(lower_bound): Likewise.
(upper_bound): Likewise.
(equal_range): Likewise.
(binary_search): Likewise.
* include/bits/stl_heap.h (pop_heap): Likewise.
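
A minimal sketch of the idea, outside libstdc++.  SKETCH_CONCEPT_CHECKS,
SKETCH_REQUIRES and sorted_contains are made-up names, and static_assert
is only a convenient stand-in for the library's real concept-check
machinery:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <type_traits>
#include <vector>

// SKETCH_CONCEPT_CHECKS and SKETCH_REQUIRES are made-up names standing in
// for the library's concept-check switch and requirement macro.
#ifdef SKETCH_CONCEPT_CHECKS
# define SKETCH_REQUIRES(...) static_assert ((__VA_ARGS__), "requirement")
#else
# define SKETCH_REQUIRES(...)
#endif

template<typename ForwardIt, typename T>
bool sorted_contains (ForwardIt first, ForwardIt last, const T &val)
{
  // iterator_traits is named only inside the macro argument, so nothing
  // refers to it when the checks are compiled out.
  SKETCH_REQUIRES (std::is_convertible<
      typename std::iterator_traits<ForwardIt>::value_type, T>::value);

  ForwardIt it = std::lower_bound (first, last, val);
  return it != last && !(val < *it);
}
```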

François

diff --git libstdc++-v3/include/bits/stl_algo.h libstdc++-v3/include/bits/stl_algo.h
index c90f479..6037044 100644
--- libstdc++-v3/include/bits/stl_algo.h
+++ libstdc++-v3/include/bits/stl_algo.h
@@ -1735,12 +1735,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		  _RandomAccessIterator __result_first,
 		  _RandomAccessIterator __result_last)
 {
+#ifdef _GLIBCXX_CONCEPT_CHECKS
   typedef typename iterator_traits<_InputIterator>::value_type
 	_InputValueType;
   typedef typename iterator_traits<_RandomAccessIterator>::value_type
 	_OutputValueType;
-  typedef typename iterator_traits<_RandomAccessIterator>::difference_type
-	_DistanceType;
+#endif
 
   // concept requirements
   __glibcxx_function_requires(_InputIteratorConcept<_InputIterator>)
@@ -1786,12 +1786,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		  _RandomAccessIterator __result_last,
 		  _Compare __comp)
 {
+#ifdef _GLIBCXX_CONCEPT_CHECKS
   typedef typename iterator_traits<_InputIterator>::value_type
 	_InputValueType;
   typedef typename iterator_traits<_RandomAccessIterator>::value_type
 	_OutputValueType;
-  typedef typename iterator_traits<_RandomAccessIterator>::difference_type
-	_DistanceType;
+#endif
 
   // concept requirements
   __glibcxx_function_requires(_InputIteratorConcept<_InputIterator>)
@@ -2020,13 +2020,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 lower_bound(_ForwardIterator __first, _ForwardIterator __last,
 		const _Tp& __val, _Compare __comp)
 {
-  typedef typename iterator_traits<_ForwardIterator>::value_type
-	_ValueType;
-
   // concept requirements
   __glibcxx_function_requires(_ForwardIteratorConcept<_ForwardIterator>)
   __glibcxx_function_requires(_BinaryPredicateConcept<_Compare,
-  _ValueType, _Tp>)
+	typename iterator_traits<_ForwardIterator>::value_type, _Tp>)
   __glibcxx_requires_partitioned_lower_pred(__first, __last,
 		__val, __comp);
   __glibcxx_requires_irreflexive_pred2(__first, __last, __comp);
@@ -2078,12 +2075,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 upper_bound(_ForwardIterator __first, _ForwardIterator __last,
 		const _Tp& __val)
 {
-  typedef typename iterator_traits<_ForwardIterator>::value_type
-	_ValueType;
-
   // concept requirements
   __glibcxx_function_requires(_ForwardIteratorConcept<_ForwardIterator>)
-  __glibcxx_function_requires(_LessThanOpConcept<_Tp, _ValueType>)
+  __glibcxx_function_requires(_LessThanOpConcept<
+	_Tp, typename iterator_traits<_ForwardIterator>::value_type>)
   __glibcxx_requires_partitioned_upper(__first, __last, __val);
   __glibcxx_requires_irreflexive2(__first, __last);
 
@@ -2111,13 +2106,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 upper_bound(_ForwardIterator __first, _ForwardIterator __last,
 		const _Tp& __val, _Compare __comp)
 {
-  typedef typename iterator_traits<_ForwardIterator>::value_type
-	_ValueType;
-
   // concept requirements
   __glibcxx_function_requires(_ForwardIteratorConcept<_ForwardIterator>)
   __glibcxx_function_requires(_BinaryPredicateConcept<_Compare,
-  _Tp, _ValueType>)
+	_Tp, typename iterator_traits<_ForwardIterator>::value_type>)
   __glibcxx_requires_partitioned_upper_pred(__first, __last,
 		__val, __comp);
   __glibcxx_requires_irreflexive_pred2(__first, __last, __comp);
@@ -2186,13 +2178,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 equal_range(_ForwardIterator __first, _ForwardIterator __last,
 		const _Tp& __val)
 {
-  typedef typename iterator_traits<_ForwardIterator>::value_type
-	_ValueType;
-
   // concept requirements
   __glibcxx_function_requires(_ForwardIteratorConcept<_ForwardIterator>)
-  __glibcxx_function_requires(_LessThanOpConcept<_ValueType, _Tp>)
-  __glibcxx_function_requires(_LessThanOpConcept<_Tp, _ValueType>)
+  __glibcxx_function_requires(_LessThanOpConcept<
+	typename iterator_traits<_ForwardIterator>::value_type, _Tp>)
+  __glibcxx_function_requires(_LessThanOpConcept<
+	_Tp, typename iterator_traits<_ForwardIterator>::value_type>)
   __glibcxx_requires_partitioned_lower(__first, __last, __val);
   __glibcxx_requires_partitioned_upper(__first, __last, __val);
   __glibcxx_requires_irreflexive2(__first, __last);
@@ -2224,15 +2215,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 equal_range(_ForwardIterator __first, _ForwardIterator __last,
 		const _Tp& __val, _Compare __comp)
 {
-  typedef typename iterator_traits<_ForwardIterator>::value_type
-	_ValueType;
-
   // concept requirements

Remove unused openacc call

2015-11-09 Thread Nathan Sidwell

I've committed this to trunk.  It nukes the now unused GOACC_GET_NUM_THREADS and 
GOACC_GET_THREAD_NUM calls.  Also fixed up some comment typos I noticed.



nathan
2015-11-09  Nathan Sidwell  

	* omp-low.c: Fix some OpenACC comment typos.
	(lower_reduction_clauses): Remove BUILT_IN_GOACC_GET_THREAD_NUM call.
	* omp-builtins.def (BUILT_IN_GOACC_GET_THREAD_NUM,
	BUILT_IN_GOACC_GET_NUM_THREADS): Delete.

Index: omp-low.c
===
--- omp-low.c	(revision 230038)
+++ omp-low.c	(working copy)
@@ -5559,7 +5559,7 @@ lower_reduction_clauses (tree clauses, g
 {
   gimple_seq sub_seq = NULL;
   gimple *stmt;
-  tree x, c, tid = NULL_TREE;
+  tree x, c;
   int count = 0;
 
   /* OpenACC loop reductions are handled elsewhere.  */
@@ -5589,17 +5589,6 @@ lower_reduction_clauses (tree clauses, g
   if (count == 0)
 return;
 
-  /* Initialize thread info for OpenACC.  */
-  if (is_gimple_omp_oacc (ctx->stmt))
-{
-  /* Get the current thread id.  */
-  tree call = builtin_decl_explicit (BUILT_IN_GOACC_GET_THREAD_NUM);
-  tid = create_tmp_var (TREE_TYPE (TREE_TYPE (call)));
-  gimple *stmt = gimple_build_call (call, 0);
-  gimple_call_set_lhs (stmt, tid);
-  gimple_seq_add_stmt (stmt_seqp, stmt);
-}
-
   for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c))
 {
   tree var, ref, new_var, orig_var;
@@ -12266,7 +12255,7 @@ expand_omp_atomic (struct omp_region *re
 }
 
 
-/* Encode an oacc launc argument.  This matches the GOMP_LAUNCH_PACK
+/* Encode an oacc launch argument.  This matches the GOMP_LAUNCH_PACK
macro on gomp-constants.h.  We do not check for overflow.  */
 
 static tree
@@ -12292,7 +12281,7 @@ oacc_launch_pack (unsigned code, tree de
 
The attribute value is a TREE_LIST.  A set of dimensions is
represented as a list of INTEGER_CST.  Those that are runtime
-   expres are represented as an INTEGER_CST of zero.
+   exprs are represented as an INTEGER_CST of zero.
 
TOOO. Normally the attribute will just contain a single such list.  If
however it contains a list of lists, this will represent the use of
@@ -14311,7 +14300,7 @@ lower_omp_for (gimple_stmt_iterator *gsi
 			  gimple_omp_for_clauses (stmt),
 			  &oacc_head, &oacc_tail, ctx);
 
-  /* Add OpenACC partitioning markers just before the loop  */
+  /* Add OpenACC partitioning and reduction markers just before the loop  */
   if (oacc_head)
 gimple_seq_add_seq (&body, oacc_head);
   
@@ -19524,7 +19513,7 @@ public:
   return execute_oacc_device_lower ();
 }
 
-}; // class pass_oacc_transform
+}; // class pass_oacc_device_lower
 
 } // anon namespace
 
Index: omp-builtins.def
===
--- omp-builtins.def	(revision 230038)
+++ omp-builtins.def	(working copy)
@@ -47,10 +47,6 @@ DEF_GOACC_BUILTIN (BUILT_IN_GOACC_UPDATE
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_WAIT, "GOACC_wait",
 		   BT_FN_VOID_INT_INT_VAR,
 		   ATTR_NOTHROW_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_GET_THREAD_NUM, "GOACC_get_thread_num",
-		   BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_GET_NUM_THREADS, "GOACC_get_num_threads",
-		   BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
 
 DEF_GOACC_BUILTIN_COMPILER (BUILT_IN_ACC_ON_DEVICE, "acc_on_device",
 			BT_FN_INT_INT, ATTR_CONST_NOTHROW_LEAF_LIST)

Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX

2015-11-09 Thread Jeff Law


On 11/09/2015 02:30 PM, Trevor Saunders wrote:

> So in general when I've done cross target things I think I've found more
> bugs with config-list.mk than with a regtest, but the regtest has found
> some things I think.

I'm finding config-list.mk fairly reliable, with the notable exception 
of the avr-rtems issue and interix.  But that may simply be a function of 
running it regularly.




> However I actually don't mind bootstrapping and regtesting that much,
> it's more or less a few hours for the control and then another few for
> each patch.

I usually save my results and only go back for a control build if 
something goes wrong.  Of course I'm usually stepping forward at least 
once a day, so the number of new tests is usually manageable and allows 
me to compare the first run of the day with the last run of the prior day.




> On the other hand config-list.mk takes on the order of 12
> hours, and setting up a cross for a quick test isn't really that quick.
> Which means that if you have a patch touching a number of targets you
> end up not checking it compiles at all until you run config-list.mk, and
> then it's a heavyweight operation.

FWIW, if we know what ports a particular patch would hit, I'd fully 
support folks doing builds that didn't hit all of config-list.mk.


In case it's not obvious I do hope that we'll get to a point where the 
class of bugs like "X is unused on port PDQ because it defines/does not 
define FROBIT" just go away and we can get good first level coverage 
with a native and perhaps a very small number of crosses (instead of the 
200+ in config-list.mk now).


At some point I also want to see config-list.mk extended to do things 
like "build the crosses and run test tree-ssa/ssa-dom-thread-11.c on all 
of them".  I've got hacks to do that locally, but they're strictly 
hacks.  I think this selectively deeper testing will become more 
important as we put the first level coverage behind us.





> So at least for the way I work I'd really rather write series that I can
> incrementally test on just one target and be reasonably confident they
> won't break other targets.

That generally works for me.



> The "add default macro definitions, wrap those with hooks, then target
> by target replace the macro with hook overrides" approach seems to let
> you test incrementally and find most of the issues, but the "change a
> macro everywhere" approach doesn't really.

I think Bernd and I just have different approaches, preferences and 
priorities on some stuff which results in slightly different priorities 
or approaches to certain issues.


I've known Bernd a long time and will say he's very reasonable and his 
concerns/objections are well thought out and carry a ton of weight with me.


Jeff

Re: RFC: C++ delayed folding merge

2015-11-09 Thread Eric Botcazou

> Right, the change is just to the C++ front end 'convert'.

OK, thanks for the clarification.

-- 
Eric Botcazou

Re: [1/2] OpenACC routine support

2015-11-09 Thread Cesar Philippidis

On 11/09/2015 04:31 PM, Nathan Sidwell wrote:
> On 11/03/15 10:35, Jakub Jelinek wrote:
>> On Mon, Nov 02, 2015 at 02:21:43PM -0500, Nathan Sidwell wrote:
>>> --- gcc/c/c-parser.c(revision 229667)
>>> +++ gcc/c/c-parser.c(working copy)
>>> @@ -1160,7 +1160,8 @@ enum c_parser_prec {
>>>   static void c_parser_external_declaration (c_parser *);
>>>   static void c_parser_asm_definition (c_parser *);
>>>   static void c_parser_declaration_or_fndef (c_parser *, bool, bool,
>>> bool,
>>> -   bool, bool, tree *, vec);
>>> +   bool, bool, tree *, vec,
>>> +   tree);
>>
>> Wonder if this shouldn't be tree = NULL_TREE, then you'd avoid most of
>> the
>> c_parser_declaration_or_fndef caller changes.
>>
>> Otherwise, LGTM.
> 
> This is the patch I've just committed.  It includes c parser adjustments
> to detect the case of two function decls with a single type specifier. 
> Cesar will be applying a patch for the C++ parser for the same  case.

Here's the patch that Nathan was referring to. I ended up introducing a
boolean variable named first in the various functions which call
finalize_oacc_routines. The problem the original approach was having was
that the routine clauses is only applied to the first function
declarator in a declaration list. By using 'first', which is set to true
if the current declarator is the first in a sequence of declarators, I
was able to defer setting parser->oacc_routine to NULL.
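
The shape of that logic, as a toy sketch rather than the actual parser
code.  sketch_parser, finalize_routine and parse_declaration are invented
names, and I am assuming that declarators after the first are diagnosed
rather than silently ignored:

```cpp
#include <cassert>
#include <vector>

// Toy parser state; the field names are invented for this sketch.
struct sketch_parser
{
  bool routine_pending;   // a routine directive is waiting for a declaration
  int applied;            // declarators the directive was attached to
  int diagnosed;          // declarators that were rejected
};

// Called once per declarator.  Only the first function declarator may take
// the directive; any later declarator in the same list is diagnosed.
static void finalize_routine (sketch_parser &p, bool is_function, bool first)
{
  if (!p.routine_pending)
    return;
  if (is_function && first)
    ++p.applied;
  else
    ++p.diagnosed;
}

// Walks one declaration's declarator list, e.g. "int f (void), g (void);".
void parse_declaration (sketch_parser &p, const std::vector<bool> &is_function)
{
  bool first = true;
  for (bool fn : is_function)
    {
      finalize_routine (p, fn, first);
      first = false;
    }
  // The pending directive is cleared only after the whole list was seen.
  p.routine_pending = false;
}
```

Keeping the directive pending until the end of the list is what lets the
second declarator be noticed at all.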

Nathan already approved this patch, so I've applied it to trunk.

Cesar
2015-11-09  Cesar Philippidis  

	gcc/cp/
	* parser.c (cp_finalize_oacc_routine): New boolean first argument.
	(cp_ensure_no_oacc_routine): Update call to cp_finalize_oacc_routine.
	(cp_parser_simple_declaration): Maintain a boolean first to keep track
	of each new declarator.  Propagate it to cp_parser_init_declarator.
	(cp_parser_init_declarator): New boolean first argument.  Propagate it
	to cp_parser_save_member_function_body and cp_finalize_oacc_routine.
	(cp_parser_member_declaration): Likewise.
	(cp_parser_single_declaration): Update call to
	cp_parser_init_declarator.
	(cp_parser_save_member_function_body): New boolean first_decl argument.
	Propagate it to cp_finalize_oacc_routine.
	(cp_parser_finish_oacc_routine): New boolean first argument.  Use it to
	determine if multiple declarators follow a routine construct.
	(cp_parser_oacc_routine): Update call to cp_parser_finish_oacc_routine.

	gcc/testsuite/
	* c-c++-common/goacc/routine-5.c: Enable c++ tests.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 6fc2c6a..f3b4b46 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -246,7 +246,7 @@ static bool cp_parser_omp_declare_reduction_exprs
 static tree cp_parser_cilk_simd_vectorlength 
   (cp_parser *, tree, bool);
 static void cp_finalize_oacc_routine
-  (cp_parser *, tree, bool);
+  (cp_parser *, tree, bool, bool);
 
 /* Manifest constants.  */
 #define CP_LEXER_BUFFER_SIZE ((256 * 1024) / sizeof (cp_token))
@@ -1329,7 +1329,7 @@ cp_finalize_omp_declare_simd (cp_parser *parser, tree fndecl)
 static inline void
 cp_ensure_no_oacc_routine (cp_parser *parser)
 {
-  cp_finalize_oacc_routine (parser, NULL_TREE, false);
+  cp_finalize_oacc_routine (parser, NULL_TREE, false, true);
 }
 
 /* Decl-specifiers.  */
@@ -2135,7 +2135,7 @@ static tree cp_parser_decltype
 
 static tree cp_parser_init_declarator
   (cp_parser *, cp_decl_specifier_seq *, vec *,
-   bool, bool, int, bool *, tree *, location_t *);
+   bool, bool, int, bool *, tree *, bool, location_t *);
 static cp_declarator *cp_parser_declarator
   (cp_parser *, cp_parser_declarator_kind, int *, bool *, bool, bool);
 static cp_declarator *cp_parser_direct_declarator
@@ -2445,7 +2445,7 @@ static tree cp_parser_single_declaration
 static tree cp_parser_functional_cast
   (cp_parser *, tree);
 static tree cp_parser_save_member_function_body
-  (cp_parser *, cp_decl_specifier_seq *, cp_declarator *, tree);
+  (cp_parser *, cp_decl_specifier_seq *, cp_declarator *, tree, bool);
 static tree cp_parser_save_nsdmi
   (cp_parser *);
 static tree cp_parser_enclosed_template_argument_list
@@ -11909,6 +11909,7 @@ cp_parser_simple_declaration (cp_parser* parser,
   bool saw_declarator;
   location_t comma_loc = UNKNOWN_LOCATION;
   location_t init_loc = UNKNOWN_LOCATION;
+  bool first = true;
 
   if (maybe_range_for_decl)
 *maybe_range_for_decl = NULL_TREE;
@@ -12005,7 +12006,10 @@ cp_parser_simple_declaration (cp_parser* parser,
 	declares_class_or_enum,
 	_definition_p,
 	maybe_range_for_decl,
+	first,
 	_loc);
+  first = false;
+
   /* If an error occurred while parsing tentatively, exit quickly.
 	 (That usually happens when in the body of a function; each
 	 statement is treated as a declaration-statement until proven
@@ -12104,6 +12108,9 @@ cp_parser_simple_declaration (cp_parser* parser,
 
  done:
   pop_deferring_access_checks

Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX

2015-11-09 Thread Joseph Myers

On Mon, 9 Nov 2015, Trevor Saunders wrote:

> The add default macro definitions then wrap those with hooks, then
> target by target replace the macro by hook overrides approach seems to
> provide that you can incrementally test and find most of the issues,
> but the change a macro everywhere approach doesn't really.

I have this notion that once a target macro is "regular" enough - not used 
in code built for the target, not used in driver code, not used directly 
or indirectly in #if conditions except for the single default definition 
in defaults.h, target definitions only depend on the target architecture 
and not OS or other variations - it ought to be possible to do the 
conversion to a hook with some kind of automated refactoring tool 
(possibly with a little editing of its results).  And so this sort of 
regularizing of target macros is helpful because it increases the number 
of target macros that could be converted in an automated manner.
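
The incremental route Trevor describes (a default macro definition, then a hook wrapping it, then per-target overrides) might look roughly like the sketch below. All names are hypothetical placeholders: TOY_EH_RETURN_HANDLER, ToyTargetHooks and the integer return value are not GCC's target hook machinery.

```cpp
#include <cassert>

// Step 1: a defaults.h-style fallback, so the macro is always defined.
#ifndef TOY_EH_RETURN_HANDLER
#define TOY_EH_RETURN_HANDLER 0
#endif

// Step 2: a hook whose default implementation simply wraps the macro.
struct ToyTargetHooks {
  int (*eh_return_handler)();
};

static int default_eh_return_handler() { return TOY_EH_RETURN_HANDLER; }

// Step 3: a target that has been converted overrides the hook and no
// longer needs the macro at all.
static int converted_target_eh_return_handler() { return 42; }

static ToyTargetHooks toy_targetm = {default_eh_return_handler};

// Target-independent code only ever goes through the hook.
static int query_eh_return_handler() {
  assert(toy_targetm.eh_return_handler != nullptr);
  return toy_targetm.eh_return_handler();
}
```

Each step can be tested on its own, which is the property the incremental approach is after.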

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH], Add power9 support to GCC, patch #4

2015-11-09 Thread Michael Meissner

On Mon, Nov 09, 2015 at 10:29:10AM -0600, Segher Boessenkool wrote:
> On Sun, Nov 08, 2015 at 07:39:14PM -0500, Michael Meissner wrote:
> > +;; Pretend we have a memory form of extswsli until register allocation is done
> > +;; so that we use LWZ to load the value from memory, instead of LWA.
> 
> We generate sign_extend loads for many cases where zero_extend would be
> preferable.  We should deal with that generically, and then we can lose
> this hack.

Well it would be nice in theory.  But since we don't have that generic pass, I
need to use the combiner to generate the instruction.

> > +(define_insn_and_split "*ashdi3_extswsli_dot"
> 
> ...
> 
> > +  if (REGNO (cr) == CR0_REGNO)
> > +{
> > +  emit_insn (gen_ashdi3_extswsli_dot2 (dest, src2, shift, cr));
> > +  DONE;
> > +}
> 
> s/dot2/dot/

No, it will recurse endlessly until the stack overflows if you use dot
(since it will call itself, generating the same pattern over and over again).

> > +/* { dg-final { scan-assembler "extswsli\\. " } } */
> > +/* { dg-final { scan-assembler "lwz " } } */
> > +/* { dg-final { scan-assembler-not "lwa " } } */
> 
> "lwa" is a nasty string to search for ("always").  You can write this as
> {\mlwa\M} for more sanity.
> 
> > +/* { dg-final { scan-assembler-not "sldi "} } */
> > +/* { dg-final { scan-assembler-not "sldi\\. " } } */
> 
> Similarly {\msldi\M} catches both.

Thanks.
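
The difference between a substring match and the whole-word match Segher suggests can be shown outside DejaGnu. This sketch uses std::regex, where \b plays a role similar to Tcl's \m and \M; contains_substring and contains_word are hypothetical helpers, not testsuite functions.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Plain substring search: also matches inside longer words.
static bool contains_substring(const std::string &text, const std::string &needle) {
  return text.find(needle) != std::string::npos;
}

// Whole-word search.  'word' is assumed to contain only letters and
// digits, so it needs no regex escaping.
static bool contains_word(const std::string &text, const std::string &word) {
  assert(!word.empty());
  const std::regex re("\\b" + word + "\\b");
  return std::regex_search(text, re);
}
```

Here contains_substring("always", "lwa") is true while contains_word("always", "lwa") is false, and contains_word finds "sldi" in both "sldi 3,3,2" and "sldi. 3,3,2".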

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Re: [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion)

2015-11-09 Thread Michael Meissner

On Mon, Nov 09, 2015 at 11:16:27AM -0600, Segher Boessenkool wrote:
> On Sun, Nov 08, 2015 at 07:42:04PM -0500, Michael Meissner wrote:
> > -  /* Power8 currently will only do the fusion if the top 11 bits of the addis
> > - value are all 1's or 0's.  */
> >value = INTVAL (int_const);
> >if ((value & (HOST_WIDE_INT)0x) != 0)
> 
> Space after cast, like  (HOST_WIDE_INT) 0x  .

Thanks.

> > +  /* Power8 currently will only do the fusion if the top 11 bits of the addis
> > + value are all 1's or 0's.  Ignore this restriction if we are testing
> > + advanced fusion.  */
> > +  if (TARGET_P9_FUSION)
> > +return 1;
> 
> This comment seems out of date?

Yeah, I first coded it while the fusion semantics were still being nailed down.
I couldn't reference power9 in the branch, which was kept on the FSF servers, so
I just called it advanced fusion.  I evidently missed a few places when changing
the name during the merge.
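
For reference, the Power8 restriction mentioned in the quoted comment can be written as a small predicate. This is a sketch under an assumption: it treats "the addis value" as the signed 16-bit immediate field of addis. The helper name addis_imm_fusable_on_power8 is hypothetical and this is not the rs6000 code.

```cpp
#include <cassert>
#include <cstdint>

// True if the top 11 bits of a 16-bit immediate are all 0's or all 1's,
// which is the same as saying the value fits in 6 signed bits (-32..31).
static bool addis_imm_fusable_on_power8(std::int16_t imm) {
  const unsigned top11 = (static_cast<std::uint16_t>(imm) >> 5) & 0x7ffu;
  const bool uniform = (top11 == 0u || top11 == 0x7ffu);
  assert(uniform == (imm >= -32 && imm <= 31));  // cross-check both forms
  return uniform;
}
```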

> >  ;; Match a GPR load (lbz, lhz, lwz, ld) that uses a combined address in the
> >  ;; memory field with both the addis and the memory offset.  Sign extension
> >  ;; is not handled here, since lha and lwa are not fused.
> > -(define_predicate "fusion_gpr_mem_combo"
> > -  (match_code "mem,zero_extend")
> > +;; With extended fusion, also match a FPR load (lfd, lfs) and float_extend
> 
> And here?

Yes.

> > --- gcc/config/rs6000/rs6000.c  (revision 229975)
> > +++ gcc/config/rs6000/rs6000.c  (working copy)
> > @@ -376,8 +376,18 @@ struct rs6000_reg_addr {
> >enum insn_code reload_fpr_gpr;   /* INSN to move from FPR to GPR.  */
> >enum insn_code reload_gpr_vsx;   /* INSN to move from GPR to VSX.  */
> >enum insn_code reload_vsx_gpr;   /* INSN to move from VSX to GPR.  */
> > +  enum insn_code fusion_gpr_ld;/* INSN for fusing gpr 
> > ADDIS/loads.  */
> > +   /* INSNs for fusing addi with loads
> > +  or stores for each reg. class.  */   
> >
> > +  enum insn_code fusion_addi_ld[(int)N_RELOAD_REG];
> > +  enum insn_code fusion_addi_st[(int)N_RELOAD_REG];
> > +   /* INSNs for fusing addis with loads
> > +  or stores for each reg. class.  */   
> >
> 
> Trailing tabs.

Ok.

> > +/* Return true if the peephole2 can combine a load/store involving a
> > +   combination of an addis instruction and the memory operation.  This was
> > +   added to the ISA 3.0 (power9) hardware.  */
> > +
> > +bool
> > +fusion_p9_p (rtx addis_reg,/* register set via addis.  */
> > +rtx addis_value,   /* addis value.  */
> > +rtx dest,  /* destination (memory or register). */
> > +rtx src)   /* source (register or memory).  */
> 
> The function header comment should explain the params, after which you
> can use the normal style for the function declaration itself.

Ok.

> > +(define_insn "*toc_fusionload_"
> > +  [(set (match_operand:QHSI 0 "int_reg_operand" "=,??r")
> > +   (match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
> > +   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
> > +   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
> > +   (clobber (match_scratch:DI 3 "=X,"))]
> > +  "TARGET_TOC_FUSION_INT"
> 
> Do you need that "??r" alternative?  Same for the next define_insn.

Yes unfortunately.  The ??r catches the case where r0 is chosen.  R0 is not a
base register, and it can't be used for power8 gpr fusion (where you use the
value being loaded for the ADDIS instruction), but it can be used for power9
fusion (where the ADDIS must be adjacent, but it no longer has to be the
register being loaded).

> Big patch, most looks good :-)

Thanks.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[PATCH, 6/16] Add pass_oacc_kernels

2015-11-09 Thread Tom de Vries


On 09/11/15 16:35, Tom de Vries wrote:

Hi,

this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.

The patch series contains these patches:

  1Insert new exit block only when needed in
 transform_to_exit_first_loop_alt
  2Make create_parallel_loop return void
  3Ignore reduction clause on kernels directive
  4Implement -foffload-alias
  5Add in_oacc_kernels_region in struct loop
  6Add pass_oacc_kernels
  7Add pass_dominator_oacc_kernels
  8Add pass_ch_oacc_kernels
  9Add pass_parallelize_loops_oacc_kernels
 10Add pass_oacc_kernels pass group in passes.def
 11Update testcases after adding kernels pass group
 12Handle acc loop directive
 13Add c-c++-common/goacc/kernels-*.c
 14Add gfortran.dg/goacc/kernels-*.f95
 15Add libgomp.oacc-c-c++-common/kernels-*.c
 16Add libgomp.oacc-fortran/kernels-*.f95

The first 9 patches are more or less independent, but patches 10-16 are
intended to be committed at the same time.

Bootstrapped and reg-tested on x86_64.

Built and reg-tested with nvidia accelerator, in combination with a
patch that enables accelerator testing (which is submitted at
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).

I'll post the individual patches in reply to this message.


this patch adds a pass group pass_oacc_kernels (which will be added to 
the pass list as a whole in patch 10).


At the moment, the parallelization behaviour for the kernels region is 
controlled by flag_tree_parallelize_loops, which is also used to control 
generic auto-parallelization by autopar using omp. That is not ideal, and 
we may want a separate flag (or param) to control the behaviour for oacc 
kernels, for instance -foacc-kernels-gang-parallelize=. I'm open to 
suggestions.


The purpose of the pass group as a whole is to massage the offloaded 
function into a shape that parloops can deal with, and then run 
parloops on it.


Consider a testcase with a reduction, and a loop counter declared 
outside the offload region:

...
unsigned int a[n];

unsigned int
foo (void)
{
  int i;
  unsigned int sum = 1;

#pragma acc kernels copyin (a[0:n]) copy (sum)
  {
for (i = 0; i < n; ++i)
  sum += a[i];
  }

  return sum;
}
...

After ealias, the loop body looks like this:
...
  :
  _8 = *.omp_data_i_3(D).a;
  _9 = *.omp_data_i_3(D).i;
  _10 = *_9;
  _11 = *_8[_10];
  _12 = *.omp_data_i_3(D).sum;
  sum.0_13 = *_12;
  sum.1_14 = _11 + sum.0_13;
  _15 = *.omp_data_i_3(D).sum;
  *_15 = sum.1_14;
  _17 = *.omp_data_i_3(D).i;
  _18 = *_17;
  _19 = *.omp_data_i_3(D).i;
  _20 = _18 + 1;
  *_19 = _20;
  goto ;
...
In other words, the iteration variable is in memory, as is the reduction 
variable, and the body contains lots of loop invariant loads.


At the end of the pass group, just before parloops, the body has been 
rewritten to have a local iteration variable and a local reduction 
variable, and all the loop invariant loads have been moved out of the loop:

...
  :
  # _27 = PHI <0(2), _20(5)>
  # D__lsm.7_28 = PHI 
  _11 = *_8[_27];
  sum.1_14 = _11 + D__lsm.7_28;
  _20 = _27 + 1;
  if (_20 <= )
goto ;
  else
goto ;
...
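
At source level, the effect of the pass group on the example can be pictured as follows. This is an illustrative sketch, not compiler output: ToyOmpData is a hypothetical stand-in for the .omp_data_i record, and the rewrite in loop_after is only valid under the assumption that a, i and sum do not alias each other.

```cpp
#include <cassert>

// Hypothetical stand-in for the .omp_data_i record: everything the
// offloaded function needs is reached through pointers.
struct ToyOmpData {
  const unsigned *a;
  int *i;
  unsigned *sum;
};

// Shape after ealias: the counter and the reduction variable live in
// memory and every pointer is reloaded on each iteration.
static void loop_before(const ToyOmpData *d, int n) {
  for (*d->i = 0; *d->i < n; ++*d->i)
    *d->sum = d->a[*d->i] + *d->sum;
}

// Shape just before parloops: invariant loads hoisted, counter and
// reduction kept in locals, results stored back once after the loop.
static void loop_after(const ToyOmpData *d, int n) {
  assert(d != nullptr);
  const unsigned *a = d->a;
  int *ip = d->i;
  unsigned *sump = d->sum;
  unsigned sum = *sump;
  int i;
  for (i = 0; i < n; ++i)
    sum += a[i];
  *sump = sum;
  *ip = i;
}
```

Both functions leave the same values in memory; only the second has a loop body that an auto-parallelizer can readily analyze.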

Thanks,
- Tom

Add pass_oacc_kernels

2015-11-09  Tom de Vries  

	* tree-pass.h (make_pass_oacc_kernels): Declare.
	* tree-ssa-loop.c (gate_oacc_kernels): New static function.
	(pass_data_oacc_kernels): New pass_data.
	(class pass_oacc_kernels): New pass.
	(make_pass_oacc_kernels): New function.
---
 gcc/tree-pass.h |  1 +
 gcc/tree-ssa-loop.c | 65 +
 2 files changed, 66 insertions(+)

diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 49e22a9..4ed8da6 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -463,6 +463,7 @@ extern gimple_opt_pass *make_pass_strength_reduction (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_vtable_verify (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ubsan (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_sanopt (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_oacc_kernels (gcc::context *ctxt);
 
 /* IPA Passes */
 extern simple_ipa_opt_pass *make_pass_ipa_lower_emutls (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 8ecd140..b51cac2 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-inline.h"
 #include "tree-scalar-evolution.h"
 #include "tree-vectorizer.h"
+#include "omp-low.h"
 
 
 /* A pass making sure loops are fixed up.  */
@@ -141,6 +142,70 @@ make_pass_tree_loop (gcc::context *ctxt)
   return new pass_tree_loop (ctxt);
 }
 
+/* Gate for oacc kernels pass group.  */
+
+static bool
+gate_oacc_kernels (function *fn)
+{
+  if (flag_tree_parallelize_loops <= 1)
+

[gomp4] remove IFN_GOACC_DIM handling from device_lower

2015-11-09 Thread Nathan Sidwell


I've committed this to gomp4, the relevant handling is in gimple-fold now.

nathan
2015-11-09  Nathan Sidwell  

	* omp-low.c (oacc_xform_dim): Delete.
	(execute_oacc_device_lower): Remove IFN_GOACC_DIM_POS,
	IFN_GOACC_DIM_SIZE handling.

Index: omp-low.c
===
--- omp-low.c	(revision 230022)
+++ omp-low.c	(working copy)
@@ -18835,38 +18835,6 @@ omp_finish_file (void)
 }
 }
 
-/* Transform oacc_dim_size and oacc_dim_pos internal function calls to
-   constants, where possible.  */
-
-static bool
-oacc_xform_dim (gcall *call, const int dims[], bool is_pos)
-{
-  tree arg = gimple_call_arg (call, 0);
-  unsigned axis = (unsigned)TREE_INT_CST_LOW (arg);
-  int size = dims[axis];
-
-  if (!size)
-/* Dimension size is dynamic.  */
-return false;
-  
-  if (is_pos)
-{
-  if (size != 1)
-	/* Size is more than 1, so POS might be non-zero.  */
-	return false;
-  size = 0;
-}
-
-  /* Replace the internal call with a constant.  */
-  tree lhs = gimple_call_lhs (call);
-  gimple *g = gimple_build_assign
-(lhs, build_int_cst (integer_type_node, size));
-
-  gimple_stmt_iterator gsi = gsi_for_stmt (call);
-  gsi_replace (&gsi, g, false);
-  return true;
-}
-
 /* Find the number of threads (POS = false), or thread number (POS =
true) for an OpenACC region partitioned as MASK.  Setup code
required for the calculation is added to SEQ.  */
@@ -19877,15 +19845,6 @@ execute_oacc_device_lower ()
 	  {
 	  default: break;
 
-	  case IFN_GOACC_DIM_POS:
-	  case IFN_GOACC_DIM_SIZE:
-	if (gimple_call_lhs (call) == NULL_TREE)
-	  remove = true;
-	else if (oacc_xform_dim (call, dims,
- ifn_code == IFN_GOACC_DIM_POS))
-	  rescan = true;
-	break;
-
 	  case IFN_GOACC_LOOP:
 	oacc_xform_loop (call);
 	rescan = true;
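
The folding rule implemented by the removed oacc_xform_dim can be restated without any GIMPLE. This is a standalone sketch of the decision logic only; fold_dim_query is a hypothetical name, and the replacement code in gimple-fold is not shown here.

```cpp
#include <cassert>

// Returns true and stores the constant in *result when the query can be
// folded; returns false when it must stay a runtime call.
// dims[axis] == 0 means the size of that axis is only known at run time.
static bool fold_dim_query(const int dims[], unsigned axis, bool is_pos, int *result) {
  assert(result != nullptr);
  const int size = dims[axis];
  if (size == 0)
    return false;  // dynamic size
  if (is_pos) {
    if (size != 1)
      return false;  // more than one element: position may be non-zero
    *result = 0;     // single element: position is always 0
    return true;
  }
  *result = size;
  return true;
}
```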

Re: [PATCH v3 2/2] [PR debug/67192] Further fix C loops' back-jump location

2015-11-09 Thread Andreas Arnez

On Sat, Nov 07 2015, Jeff Law wrote:

> Also OK.  And please consider using those tests with the C++ compiler
> to see if it's suffering from the same problem.

Not really, but there's still an issue.

In the C front-end the back-jump's location of an unconditional loop was
sometimes set to the token after the loop, particularly after the
misleading-indent patch.  This does *not* apply to C++.

Before the misleading-indent patch the location was usually set to the
last line of the loop instead.  This may be slightly confusing when the
loop body consists of an if-else statement: Breaking on that line then
causes a breakpoint hit on every iteration even if the else-path is
never executed.  This issue does *not* apply to C++ either.

But the C++ front-end always sets the location to the "while" or "for"
token.  This can cause confusion when setting a breakpoint there: When
hitting it for the first time, one loop iteration will already have
executed.

For that issue I included an informal patch in my earlier post.  It
mimics the C patch and seems to fix the issue:

  https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00478.html

I'll go ahead and prepare a full patch (with test case, ChangeLog, etc.)
for this.

Re: [PATCH], Add power9 support to GCC, patch #3 (scalar count trailing zeros)

2015-11-09 Thread David Edelsohn

On Sun, Nov 8, 2015 at 4:37 PM, Michael Meissner
 wrote:
> This patch adds support for scalar count trailing zeros instruction that is
> being added to ISA 3.0 (power9).
>
> I have built this patch (along with patches #2 and #4) with a bootstrap build
> on a power8 little endian system.  There were no regressions in the test
> suite.  Is this patch ok to install in the trunk once patch #1 has been
> installed?
>
> [gcc]
> 2015-11-08  Michael Meissner  
>
> * config/rs6000/rs6000.c (rs6000_rtx_costs): Update costs for
> count trailing zero instruction if we have hardware support.
>
> * config/rs6000/rs6000.h (TARGET_CTZ): Add support for count
> trailing zero instruction in ISA 3.0.
> * config/rs6000/rs6000.c (ctz2): Likewise.
> (ctz2_h): Likewise.
>
> [gcc/testsuite]
> 2015-11-08  Michael Meissner  
>
> * gcc.target/powerpc/ctz-1.c: Add test for count trailing zero
> instruction support.
> * gcc.target/powerpc/ctz-2.c: Likewise.

This is okay.  We can address the attribute at a later time if necessary.

Please re-check CTZ_DEFINED_VALUE_AT_ZERO.

Thanks, David

Re: Extend tree-call-cdce to calls whose result is used

2015-11-09 Thread Richard Sandiford

Michael Matz  writes:
> On Mon, 9 Nov 2015, Richard Sandiford wrote:
>
>> -ffast-math would already cause us to treat the function as not setting 
>> errno, so the code wouldn't be used.
>
> What is "the code"?  I don't see any checking of the relevant flags in 
> tree-call-cdce.c, so I wonder what would prevent the addition of the 
> unnecessary checking.

-ffast-math implies -fno-math-errno, which in turn changes how the
function attributes are set.  E.g.:

#undef ATTR_MATHFN_FPROUNDING_ERRNO
#define ATTR_MATHFN_FPROUNDING_ERRNO (flag_errno_math ? \
ATTR_NOTHROW_LEAF_LIST : ATTR_MATHFN_FPROUNDING)

So with -ffast-math these functions don't set errno and don't have a vdef.
The patch checks for that here:

+/* Return true if built-in function call CALL could be implemented using
+   a combination of an internal function to compute the result and a
+   separate call to set errno.  */
+
+static bool
+can_use_internal_fn (gcall *call)
+{
+  /* Only replace calls that set errno.  */
+  if (!gimple_vdef (call))
+return false;

Checking for a vdef seemed reasonable.  If we treat these libm functions
as doing nothing other than producing a numerical result then we'll get
better optimisation across the board if we don't add unnecessary vops.
(Which we do, but see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68235 )

>> > The pass is somewhat expensive in that it removes dominator info and 
>> > schedules a full ssa update.  The transformation is trivial enough 
>> > that dominators and SSA form can be updated on the fly, I think 
>> > without that it's not feasible for -O.
>> 
>> r229916 fixed that for the non-EH case.
>
> Ah, missed it.  Even the EH case shouldn't be difficult.  If the original 
> dominator of the EH destination was the call block it moves, otherwise it 
> remains unchanged.

The target of the edge is easy in itself, I agree, but that isn't
necessarily the only affected block, if the EH handler doesn't
exit or rethrow.

>> I posted a patch to update the vops for the non-EH case as well:
>> 
>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03355.html
>
> I see, here the EH case is a bit more difficult as you need to 
> differentiate between VOP uses in the EH and the non-EH region, but not 
> insurmountable.

Well, I agree it's not insurmountable. :-)

>> But by "builtin folding", do you mean fold_builtin_n etc.?
>
> I had the pass_fold_builtins in mind.

OK.

> Currently we have quite some of such passes (reassoc, forwprop, 
> lower_vector_ssa, cse_reciprocals, cse_sincos (sigh!), optimize_bswap and 
> others), but they are all handling only special situations in one way or 
> the other.  pass_fold_builtins is another one, but it seems most related 
> to what you want (replacing a call with something else), so I thought 
> that'd be the natural choice.

Well, to be pedantic, it's not really replacing the call.  Except for
the special case of targets that support direct assignments to errno,
it keeps the original call but ensures that it isn't usually executed.
From that point of view it doesn't really seem like a fold.

But I suppose that's just naming again :-).  And it's easily solved with
s/fold/rewrite/.
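
As a rough picture of that shape (result from an operation that never touches errno, original call kept but rarely executed), consider the sketch below. The names toy_internal_sqrt, toy_errno_call and toy_sqrt are hypothetical; the errno store stands in for the retained library call, and none of this is the tree-call-cdce.c code.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>
#include <limits>

// Stand-in for the internal function: yields the numerical result and
// never touches errno.
static double toy_internal_sqrt(double x) {
  if (x < 0.0)
    return std::numeric_limits<double>::quiet_NaN();
  return std::sqrt(x);
}

// Stand-in for the retained library call, whose only remaining job is to
// set errno.
static void toy_errno_call() { errno = EDOM; }

static double toy_sqrt(double x) {
  const double result = toy_internal_sqrt(x);
  if (x < 0.0)  // rarely true, so the call below is rarely executed
    toy_errno_call();
  assert(!(x >= 0.0) || result >= 0.0);  // non-negative input, non-negative root
  return result;
}
```

The result is available whether or not the errno path runs, which is why the lhs of the call can be used freely.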

> call_cdce is also such a pass, but I think it's simply not the appropriate 
> one (only in so far as its source file contains the helper routines you 
> need), and in addition I think it shouldn't exist at all (and wouldn't 
> need to if it had been part of DCE from the start, or if you implemented 
> the conditionalizing as part of another pass).  Hey, you could be one to 
> remove a pass! ;-)

It still seems a bit artificial to me to say that the transformation
with a null lhs is "DCE enough" to go in the main DCE pass (even
though like I say it doesn't actually eliminate any code from the IR,
it just adds more code) and should be kept in a separate pass from
the one that does the transformation on a non-null lhs.

Thanks,
Richard

[PATCH, 7/16] Add pass_dominator_oacc_kernels

2015-11-09 Thread Tom de Vries


this patch adds pass_dominator_oacc_kernels (which we may as well call 
pass_dominator_no_peel_loop_headers, since it doesn't do anything 
oacc-kernels-specific), to be used in the kernels pass group.


The reason I'm adding a new pass instead of using pass_dominator is that 
pass_dominator uses first_pass_instance. So adding a pass_dominator 
instance A before a pass_dominator instance B has the unexpected 
consequence that it may change the behaviour of instance B. I've filed 
PR68247 - "Remove pass_first_instance" to note this issue.
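
The design can be pictured as a shared body that asks a virtual hook instead of reading first_pass_instance directly. The classes below are toy counterparts with hypothetical names; that the kernels variant never peels is an assumption drawn from the alternative name suggested above.

```cpp
#include <cassert>

// Toy counterpart of dominator_base: the shared body consults a virtual
// hook rather than a global.
class ToyDominatorBase {
public:
  virtual ~ToyDominatorBase() = default;
  // Returns whether this run was allowed to peel loop headers.
  bool execute() const { return may_peel_loop_headers_p(); }

protected:
  virtual bool may_peel_loop_headers_p() const { return true; }
};

// Toy counterpart of pass_dominator: behaviour depends on whether this is
// the first instance in the pass list.
class ToyPassDominator : public ToyDominatorBase {
public:
  explicit ToyPassDominator(bool first_instance) : first_instance_(first_instance) {}

protected:
  bool may_peel_loop_headers_p() const override { return first_instance_; }

private:
  bool first_instance_;
};

// Toy counterpart of the new pass: independent of pass ordering.
class ToyPassDominatorNoPeel : public ToyDominatorBase {
protected:
  bool may_peel_loop_headers_p() const override { return false; }
};

static bool run_pass(const ToyDominatorBase *pass) {
  assert(pass != nullptr);
  return pass->execute();
}
```

Adding a ToyPassDominatorNoPeel instance to a pipeline does not change what any ToyPassDominator instance does, which is the property wanted here.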


Thanks,
- Tom

Add pass_dominator_oacc_kernels

2015-11-09  Tom de Vries  

	* tree-pass.h (make_pass_dominator_oacc_kernels): Declare.
	* tree-ssa-dom.c (class dominator_base): New class.  Factor out of ...
	(class pass_dominator): ... here.
	(dominator_base::may_peel_loop_headers_p)
	(pass_dominator::may_peel_loop_headers_p): New function.
	(pass_dominator_oacc_kernels): New pass.
	(make_pass_dominator_oacc_kernels): New function.
	(dominator_base::execute): Use may_peel_loop_headers_p.
---
 gcc/tree-pass.h|  1 +
 gcc/tree-ssa-dom.c | 57 +-
 2 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 4ed8da6..2825aea 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -395,6 +395,7 @@ extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_build_alias (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_build_ealias (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_dominator (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_dominator_oacc_kernels (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_dce (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_cd_dce (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_call_cdce (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 3887bbe1..e4ff63a 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -519,6 +519,19 @@ private:
 
 namespace {
 
+class dominator_base : public gimple_opt_pass
+{
+ protected:
+  dominator_base (pass_data data, gcc::context *ctxt)
+: gimple_opt_pass (data, ctxt)
+  {}
+
+  unsigned int execute (function *);
+
+ protected:
+  virtual bool may_peel_loop_headers_p (void) { return true; }
+}; // class dominator_base
+
 const pass_data pass_data_dominator =
 {
   GIMPLE_PASS, /* type */
@@ -532,22 +545,23 @@ const pass_data pass_data_dominator =
   ( TODO_cleanup_cfg | TODO_update_ssa ), /* todo_flags_finish */
 };
 
-class pass_dominator : public gimple_opt_pass
+class pass_dominator : public dominator_base
 {
 public:
   pass_dominator (gcc::context *ctxt)
-: gimple_opt_pass (pass_data_dominator, ctxt)
+: dominator_base (pass_data_dominator, ctxt)
   {}
 
   /* opt_pass methods: */
   opt_pass * clone () { return new pass_dominator (m_ctxt); }
   virtual bool gate (function *) { return flag_tree_dom != 0; }
-  virtual unsigned int execute (function *);
 
+ protected:
+  virtual bool may_peel_loop_headers_p (void) { return first_pass_instance; }
 }; // class pass_dominator
 
 unsigned int
-pass_dominator::execute (function *fun)
+dominator_base::execute (function *fun)
 {
   memset (&opt_stats, 0, sizeof (opt_stats));
 
@@ -619,7 +633,7 @@ pass_dominator::execute (function *fun)
   free_all_edge_infos ();
 
   /* Thread jumps, creating duplicate blocks as needed.  */
-  cfg_altered |= thread_through_all_blocks (first_pass_instance);
+

Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX

2015-11-09 Thread Bernd Schmidt


On 11/09/2015 05:47 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2015-11-09  Trevor Saunders  

* defaults.h (EH_RETURN_HANDLER_RTX): New default definition.
* df-scan.c (df_get_exit_block_use_set): Adjust.
* except.c (expand_eh_return): Likewise.


As I said for a previous patch series, if we go to the trouble of fixing 
up stuff like this, we might as well do it properly and turn things like 
this into a target hook.



Bernd

Re: [PATCH 05/12] always define VMS_DEBUGGING_INFO

2015-11-09 Thread Bernd Schmidt

In general I think the _DEBUGGING_INFO patches are going to be OK, 
modulo Jeff's comment about stage 1. I think they shouldn't have been 
split - it causes numerous unnecessary extra changes, and the 
intermediate stages look very inconsistent.



-#ifdef VMS_DEBUGGING_INFO
-  else if (write_symbols == VMS_DEBUG || write_symbols == VMS_AND_DWARF2_DEBUG)
+  else if (VMS_DEBUGGING_INFO
+  && (write_symbols == VMS_DEBUG
+  || write_symbols == VMS_AND_DWARF2_DEBUG))
  debug_hooks = _debug_hooks;
-#endif
  #ifdef DWARF2_LINENO_DEBUGGING_INFO
else if (write_symbols == DWARF2_DEBUG)
  debug_hooks = _lineno_debug_hooks;
diff --git a/gcc/vmsdbgout.c b/gcc/vmsdbgout.c
index d41d4b2..6dd6878 100644
--- a/gcc/vmsdbgout.c
+++ b/gcc/vmsdbgout.c
@@ -24,7 +24,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "coretypes.h"
  #include "tm.h"

-#ifdef VMS_DEBUGGING_INFO
+#if VMS_DEBUGGING_INFO
  #include "alias.h"
  #include "tree.h"
  #include "varasm.h"


This seems to reference vmsdbg_debug_hooks unconditionally, but as far 
as I can tell the definition is still guarded by an #if? Does this compile?
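
The concern can be reproduced in miniature. Once the test moves from #ifdef into an ordinary if, the referenced symbol is compiled even when the macro is 0, so it must at least be declared unconditionally. The sketch below shows a form that does compile; TOY_VMS_DEBUGGING_INFO, ToyDebugHooks and select_hooks are hypothetical names.

```cpp
#include <cassert>
#include <cstring>

// defaults.h-style fallback: the macro is always defined, to 0 or 1.
#ifndef TOY_VMS_DEBUGGING_INFO
#define TOY_VMS_DEBUGGING_INFO 0
#endif

struct ToyDebugHooks {
  const char *name;
};

static const ToyDebugHooks toy_default_hooks = {"default"};

// Deliberately NOT wrapped in '#if TOY_VMS_DEBUGGING_INFO': the reference
// in select_hooks is compiled even when the macro is 0.
static const ToyDebugHooks toy_vms_hooks = {"vms"};

static const ToyDebugHooks *select_hooks(bool want_vms) {
  if (TOY_VMS_DEBUGGING_INFO && want_vms)
    return &toy_vms_hooks;
  return &toy_default_hooks;
}
```

With the default macro value, assert (std::strcmp (select_hooks (true)->name, "default") == 0) holds. Guarding the definition of toy_vms_hooks with #if while keeping the reference would fail to compile when the macro is 0.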



Bernd

[PATCH, 8/16] Add pass_ch_oacc_kernels

2015-11-09 Thread Tom de Vries


this patch adds a pass pass_ch_oacc_kernels, which is like pass_ch, but 
only runs for loops with oacc_kernels_region set.


[ But... thinking about it a bit more, I think that we could use a 
regular pass_ch instead. We only use the kernels pass group for a single 
loop nest in a kernels region, and we mark all the loops in the loop 
nest with oacc_kernels_region. So I think that the oacc_kernels_region 
test in pass_ch_oacc_kernels::process_loop_p evaluates to true. ]


So, I'll try to confirm with retesting that we can drop this patch.

Thanks,
- Tom

Add pass_ch_oacc_kernels

2015-11-09  Tom de Vries  

	* tree-pass.h (make_pass_ch_oacc_kernels): Declare.
	* tree-ssa-loop-ch.c (pass_ch::pass_ch (pass_data, gcc::context)): New
	constructor.
	(pass_data_ch_oacc_kernels): New pass_data.
	(class pass_ch_oacc_kernels): New pass.
	(pass_ch_oacc_kernels::process_loop_p): New function.
	(make_pass_ch_oacc_kernels): New function.
---
 gcc/tree-pass.h|  1 +
 gcc/tree-ssa-loop-ch.c | 54 +-
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 2825aea..f95a820 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -389,6 +389,7 @@ extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_ch_oacc_kernels (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_phi_only_cprop (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
index 7e618bf..8bf47fe 100644
--- a/gcc/tree-ssa-loop-ch.c
+++ b/gcc/tree-ssa-loop-ch.c
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-inline.h"
 #include "tree-ssa-scopedtables.h"
 #include "tree-ssa-threadedge.h"
+#include "omp-low.h"
 
 /* Duplicates headers of loops if they are small enough, so that the statements
in the loop body are always executed when the loop is entered.  This
@@ -124,7 +125,7 @@ do_while_loop_p (struct loop *loop)
 
 namespace {
 
-/* Common superclass for both header-copying phases.  */
+/* Common superclass for header-copying phases.  */
 class ch_base : public gimple_opt_pass
 {
   protected:
@@ -159,6 +160,10 @@ public:
 : ch_base (pass_data_ch, ctxt)
   {}
 
+  pass_ch (pass_data data, gcc::context *ctxt)
+: ch_base (data, ctxt)
+  {}
+
   /* opt_pass methods: */
   virtual bool gate (function *) { return flag_tree_ch != 0; }
   
@@ -414,3 +419,50 @@ make_pass_ch (gcc::context *ctxt)
 {
   return new pass_ch (ctxt);
 }
+
+namespace {
+
+const pass_data pass_data_ch_oacc_kernels =
+{
+  GIMPLE_PASS, /* type */
+  "ch_oacc_kernels", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_TREE_CH, /* tv_id */
+  ( PROP_cfg | PROP_ssa ), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_cleanup_cfg, /* todo_flags_finish */
+};
+
+class pass_ch_oacc_kernels : public pass_ch
+{
+public:
+  pass_ch_oacc_kernels (gcc::context *ctxt)
+: pass_ch (pass_data_ch_oacc_kernels, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return true; }
+
+protected:
+  /* ch_base

Re: [ping] Fix PR debug/66728

2015-11-09 Thread Mike Stump

On Nov 6, 2015, at 5:06 AM, Richard Biener  wrote:
>> If there are no substantial reasons to not check it in now, I’d like to 
>> proceed and get it checked in.  People can refine it further in tree if they 
>> want.  Any objections?
> 
> Ok with a changelog entry and bootstrap/regtest.

Also committed to the release branch after waiting a few days to ensure no 
issue on trunk after the normal regression test and bootstrap.

Re: [PATCH 01/12] reduce conditional compilation for HARD_FRAME_POINTER_IS_ARG_POINTER

2015-11-09 Thread Bernd Schmidt


On 11/09/2015 05:47 PM, tbsaunde+...@tbsaunde.org wrote:

+++ b/gcc/dbxout.c
@@ -3076,10 +3076,8 @@ dbxout_symbol_location (tree decl, tree type, const char *suffix, rtx home)
   || (REG_P (XEXP (home, 0))
   && REGNO (XEXP (home, 0)) != HARD_FRAME_POINTER_REGNUM
   && REGNO (XEXP (home, 0)) != STACK_POINTER_REGNUM
-#if !HARD_FRAME_POINTER_IS_ARG_POINTER
-  && REGNO (XEXP (home, 0)) != ARG_POINTER_REGNUM
-#endif
-  )))
+  && (HARD_FRAME_POINTER_IS_ARG_POINTER
+  || REGNO (XEXP (home, 0)) != ARG_POINTER_REGNUM


This used to be

#if ARG_POINTER_REGNUM != HARD_FRAME_POINTER_REGNUM

and the whole macro seems kind of pointless - why not just make the 
ARG_POINTER_REGNUM test unconditional? I think the conditional 
compilation was originally just a "performance optimization", avoiding 
unnecessary tests - which means the reason to have the tests goes away 
if we move away from the conditional compilation.
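
To make the three variants concrete, here is a minimal sketch. The TOY_* register numbers and function names are hypothetical and the condition is much simpler than the one in dbxout.c; it only shows that the preprocessor form, the form in the patch, and the unconditional test give the same answers.

```cpp
#include <cassert>

// Hypothetical register numbers for this sketch; real values come from the
// target's headers.
#define TOY_HARD_FRAME_POINTER_REGNUM 6
#define TOY_ARG_POINTER_REGNUM 16
#define TOY_HARD_FRAME_POINTER_IS_ARG_POINTER \
  (TOY_HARD_FRAME_POINTER_REGNUM == TOY_ARG_POINTER_REGNUM)

// Old style: the arg-pointer test only exists when the two registers differ.
bool toy_other_reg_p_preprocessor (int regno)
{
  return (regno != TOY_HARD_FRAME_POINTER_REGNUM
#if !TOY_HARD_FRAME_POINTER_IS_ARG_POINTER
          && regno != TOY_ARG_POINTER_REGNUM
#endif
          );
}

// Style used in the patch: the macro becomes an ordinary constant operand,
// so both arms are always parsed and type-checked.
bool toy_other_reg_p_macro (int regno)
{
  return (regno != TOY_HARD_FRAME_POINTER_REGNUM
          && (TOY_HARD_FRAME_POINTER_IS_ARG_POINTER
              || regno != TOY_ARG_POINTER_REGNUM));
}

// Unconditional test: when the registers coincide, the second comparison
// merely repeats the first, so the result does not change.
bool toy_other_reg_p_unconditional (int regno)
{
  return (regno != TOY_HARD_FRAME_POINTER_REGNUM
          && regno != TOY_ARG_POINTER_REGNUM);
}
```

The unconditional form costs at most one redundant comparison on targets where the two registers are the same.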



Bernd

Re: [PATCH][AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT

2015-11-09 Thread Marcus Shawcroft

On 9 November 2015 at 11:32, Kyrill Tkachov  wrote:

> 2015-11-09  Kyrylo Tkachov  
>
> PR target/68129
> * config/aarch64/aarch64.h (TARGET_SUPPORTS_WIDE_INT): Define to 1.
> * config/aarch64/aarch64.c (aarch64_print_operand, CONST_DOUBLE):
> Delete VOIDmode case.  Assert that mode is not VOIDmode.
> * config/aarch64/predicates.md (const0_operand): Remove const_double
> match.
>
> 2015-11-09  Kyrylo Tkachov  
>
> PR target/68129
> * gcc.target/aarch64/pr68129_1.c: New test.

Hi, this test isn't aarch64-specific; does it need to be in gcc.target/aarch64?

Cheers
/Marcus

Re: [PATCH][AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT

2015-11-09 Thread Marcus Shawcroft

On 9 November 2015 at 15:45, Kyrill Tkachov  wrote:
>
> On 09/11/15 15:34, Marcus Shawcroft wrote:
>>
>> On 9 November 2015 at 11:32, Kyrill Tkachov 
>> wrote:
>>
>>> 2015-11-09  Kyrylo Tkachov  
>>>
>>>  PR target/68129
>>>  * config/aarch64/aarch64.h (TARGET_SUPPORTS_WIDE_INT): Define to 1.
>>>  * config/aarch64/aarch64.c (aarch64_print_operand, CONST_DOUBLE):
>>>  Delete VOIDmode case.  Assert that mode is not VOIDmode.
>>>  * config/aarch64/predicates.md (const0_operand): Remove const_double
>>>  match.
>>>
>>> 2015-11-09  Kyrylo Tkachov  
>>>
>>>  PR target/68129
>>>  * gcc.target/aarch64/pr68129_1.c: New test.
>>
>> Hi, This test isn't aarch64 specific, does it need to be in
>> gcc.target/aarch64 ?
>
>
> Not really, here is the patch with the test in gcc.dg/ if that's preferred.


OK /Marcus

[PATCH 2/6] Make builtin_vectorized_function take a combined_fn

2015-11-09 Thread Richard Sandiford

This patch replaces the fndecl argument to builtin_vectorized_function
with a combined_fn and gets the vectoriser to call it for internal
functions too.  The patch also moves vectorisation of machine-specific
built-ins to a new hook, builtin_md_vectorized_function.
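
As a rough illustration of the idea, the sketch below uses a single enum to name both built-in and internal functions, with one macro per operation that expands to all the matching case labels. Every TOY_* name is hypothetical; this is not the definition of combined_fn or of the CASE_CFN_* macros.

```cpp
#include <cassert>

// Hypothetical miniature of a "combined" function code: one enum that can
// name ordinary built-in functions as well as internal functions.
enum toy_combined_fn
{
  TOY_CFN_BUILT_IN_FLOOR,   // double variant of the built-in
  TOY_CFN_BUILT_IN_FLOORF,  // float variant of the built-in
  TOY_CFN_FLOOR,            // internal-function form
  TOY_CFN_BUILT_IN_SQRT,
  TOY_CFN_BUILT_IN_SQRTF,
  TOY_CFN_SQRT,
  TOY_CFN_BUILT_IN_SIN
};

// Each macro expands to several case labels; the trailing colon is supplied
// at the point of use, so one line in a switch covers every variant.
#define TOY_CASE_CFN_FLOOR \
  case TOY_CFN_BUILT_IN_FLOOR: case TOY_CFN_BUILT_IN_FLOORF: case TOY_CFN_FLOOR
#define TOY_CASE_CFN_SQRT \
  case TOY_CFN_BUILT_IN_SQRT: case TOY_CFN_BUILT_IN_SQRTF: case TOY_CFN_SQRT

// Result codes standing in for the vector builtin decls a target would return.
enum toy_vector_form
{
  TOY_NO_VECTOR_FORM,
  TOY_VECTOR_FLOOR,
  TOY_VECTOR_SQRT
};

// The hook takes the code as an unsigned int, as described above, and no
// longer needs to inspect a function decl first.
toy_vector_form toy_builtin_vectorized_function (unsigned int fn)
{
  switch (fn)
    {
    TOY_CASE_CFN_FLOOR:
      return TOY_VECTOR_FLOOR;
    TOY_CASE_CFN_SQRT:
      return TOY_VECTOR_SQRT;
    default:
      return TOY_NO_VECTOR_FORM;
    }
}
```

With this shape, a call that reaches the vectoriser as an internal function takes the same path through the switch as the corresponding built-in.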

I've attached a -b version too since that's easier to read.


gcc/
* target.def (builtin_vectorized_function): Take a combined_fn (in
the form of an unsigned int) rather than a function decl.
(builtin_md_vectorized_function): New.
* targhooks.h (default_builtin_vectorized_function): Replace the
fndecl argument with an unsigned int.
(default_builtin_md_vectorized_function): Declare.
* targhooks.c (default_builtin_vectorized_function): Replace the
fndecl argument with an unsigned int.
(default_builtin_md_vectorized_function): New function.
* doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION):
New hook.
* doc/tm.texi: Regenerate.
* tree-vect-stmts.c (vectorizable_function): Update call to
builtin_vectorized_function, also passing internal functions.
Call builtin_md_vectorized_function for target-specific builtins.
* config/aarch64/aarch64-protos.h
(aarch64_builtin_vectorized_function): Replace fndecl argument
with an unsigned int.
* config/aarch64/aarch64-builtins.c: Include case-cfn-macros.h.
(aarch64_builtin_vectorized_function): Update after above changes.
Use CASE_CFN_*.
* config/arm/arm-protos.h (arm_builtin_vectorized_function): Replace
fndecl argument with an unsigned int.
* config/arm/arm-builtins.c: Include case-cfn-macros.h.
(arm_builtin_vectorized_function): Update after above changes.
Use CASE_CFN_*.
* config/i386/i386.c: Include case-cfn-macros.h.
(ix86_veclib_handler): Take a combined_fn rather than a
built_in_function.
(ix86_veclibabi_svml, ix86_veclibabi_acml): Likewise.  Use
mathfn_built_in rather than calling builtin_decl_implicit directly.
(ix86_builtin_vectorized_function): Update after above changes.
Use CASE_CFN_*.
* config/rs6000/rs6000.c: Include case-cfn-macros.h.
(rs6000_builtin_vectorized_libmass): Replace fndecl argument
with a combined_fn.  Use CASE_CFN_*.  Use mathfn_built_in rather
than calling builtin_decl_implicit directly.
(rs6000_builtin_vectorized_function): Update after above changes.
Use CASE_CFN_*.  Move BUILT_IN_MD to...
(rs6000_builtin_md_vectorized_function): ...this new function.
(TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION): Define.

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 6b4208f..c4cda4f 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -38,6 +38,7 @@
 #include "expr.h"
 #include "langhooks.h"
 #include "gimple-iterator.h"
+#include "case-cfn-macros.h"
 
 #define v8qi_UP  V8QImode
 #define v4hi_UP  V4HImode
@@ -1258,7 +1259,8 @@ aarch64_expand_builtin (tree exp,
 }
 
 tree
-aarch64_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
+aarch64_builtin_vectorized_function (unsigned int fn, tree type_out,
+ tree type_in)
 {
   machine_mode in_mode, out_mode;
   int in_n, out_n;
@@ -1282,130 +1284,119 @@ aarch64_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
 	: (AARCH64_CHECK_BUILTIN_MODE (2, S) \
 	   ? aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_##N##v2sf] \
 	   : NULL_TREE)))
-  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
+  switch (fn)
 {
-  enum built_in_function fn = DECL_FUNCTION_CODE (fndecl);
-  switch (fn)
-	{
 #undef AARCH64_CHECK_BUILTIN_MODE
 #define AARCH64_CHECK_BUILTIN_MODE(C, N) \
   (out_mode == N##Fmode && out_n == C \
&& in_mode == N##Fmode && in_n == C)
-	case BUILT_IN_FLOOR:
-	case BUILT_IN_FLOORF:
-	  return AARCH64_FIND_FRINT_VARIANT (floor);
-	case BUILT_IN_CEIL:
-	case BUILT_IN_CEILF:
-	  return AARCH64_FIND_FRINT_VARIANT (ceil);
-	case BUILT_IN_TRUNC:
-	case BUILT_IN_TRUNCF:
-	  return AARCH64_FIND_FRINT_VARIANT (btrunc);
-	case BUILT_IN_ROUND:
-	case BUILT_IN_ROUNDF:
-	  return AARCH64_FIND_FRINT_VARIANT (round);
-	case BUILT_IN_NEARBYINT:
-	case BUILT_IN_NEARBYINTF:
-	  return AARCH64_FIND_FRINT_VARIANT (nearbyint);
-	case BUILT_IN_SQRT:
-	case BUILT_IN_SQRTF:
-	  return AARCH64_FIND_FRINT_VARIANT (sqrt);
+CASE_CFN_FLOOR:
+  return AARCH64_FIND_FRINT_VARIANT (floor);
+CASE_CFN_CEIL:
+  return AARCH64_FIND_FRINT_VARIANT (ceil);
+CASE_CFN_TRUNC:
+  return AARCH64_FIND_FRINT_VARIANT (btrunc);
+CASE_CFN_ROUND:
+  return AARCH64_FIND_FRINT_VARIANT (round);
+CASE_CFN_NEARBYINT:
+  return AARCH64_FIND_FRINT_VARIANT (nearbyint);
+CASE_CFN_SQRT:
+  return AARCH64_FIND_FRINT_VARIANT (sqrt);
 #undef

[PATCH 03/12] remove conditional compilation of sdb debug info

2015-11-09 Thread tbsaunde+gcc

From: Trevor Saunders 

We need to include gsyms.h before tm.h because some targets (rl78 iirc) define
macros that conflict with identifiers in gsyms.h.  This means sdbout.c won't
produce correct output for those targets, but it previously couldn't either
because it wasn't compiled at all.
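
A small sketch of why every user of the macro has to change along with the new default: once a default value of 0 exists, a defined() test is always true, so only the value is meaningful. The TOY_* names and toy_rest_of_clean_state are hypothetical stand-ins, not GCC code.

```cpp
#include <cassert>

// As in the defaults.h hunk: give the macro the value 0 when the target has
// not defined it.
#ifndef TOY_SDB_DEBUGGING_INFO
#define TOY_SDB_DEBUGGING_INFO 0
#endif

// Once the default exists, defined () is true on every target ...
#if defined (TOY_SDB_DEBUGGING_INFO)
const bool toy_macro_is_defined = true;
#else
const bool toy_macro_is_defined = false;
#endif

// ... so tests have to look at the value instead.
#if TOY_SDB_DEBUGGING_INFO
const bool toy_macro_is_enabled = true;
#else
const bool toy_macro_is_enabled = false;
#endif

enum toy_debug_type { TOY_NO_DEBUG, TOY_SDB_DEBUG };

// Value-based test written as a plain if: the guarded statement is always
// compiled, and the constant condition lets the compiler drop it when the
// macro is 0.
int toy_rest_of_clean_state (toy_debug_type write_symbols)
{
  int flushed = 0;
  if (TOY_SDB_DEBUGGING_INFO && write_symbols == TOY_SDB_DEBUG)
    flushed = 1;  // stand-in for flushing queued sdb types
  return flushed;
}
```

This is also why the guarded code must now build on every target, which is what forces the include-order constraint described above.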

gcc/ChangeLog:

2015-11-09  Trevor Saunders  

* defaults.h: New definition of SDB_DEBUGGING_INFO.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Adjust.
* final.c (rest_of_clean_state): Remove check if
SDB_DEBUGGING_INFO is defined.
* function.c (number_blocks): Likewise.
* output.h: Likewise.
* sdbout.c: Likewise.
* toplev.c (process_options): Likewise.
---
 gcc/defaults.h | 8 ++--
 gcc/doc/tm.texi| 2 +-
 gcc/doc/tm.texi.in | 2 +-
 gcc/final.c| 6 +-
 gcc/function.c | 2 +-
 gcc/output.h   | 2 --
 gcc/sdbout.c   | 6 +-
 gcc/toplev.c   | 6 +-
 8 files changed, 12 insertions(+), 22 deletions(-)

diff --git a/gcc/defaults.h b/gcc/defaults.h
index cee799d..ddda89a 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -914,10 +914,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define DEFAULT_GDB_EXTENSIONS 1
 #endif
 
+#ifndef SDB_DEBUGGING_INFO
+#define SDB_DEBUGGING_INFO 0
+#endif
+
 /* If more than one debugging type is supported, you must define
PREFERRED_DEBUGGING_TYPE to choose the default.  */
 
-#if 1 < (defined (DBX_DEBUGGING_INFO) + defined (SDB_DEBUGGING_INFO) \
+#if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
  + defined (DWARF2_DEBUGGING_INFO) + defined (XCOFF_DEBUGGING_INFO) \
  + defined (VMS_DEBUGGING_INFO))
 #ifndef PREFERRED_DEBUGGING_TYPE
@@ -929,7 +933,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #elif defined DBX_DEBUGGING_INFO
 #define PREFERRED_DEBUGGING_TYPE DBX_DEBUG
 
-#elif defined SDB_DEBUGGING_INFO
+#elif SDB_DEBUGGING_INFO
 #define PREFERRED_DEBUGGING_TYPE SDB_DEBUG
 
 #elif defined DWARF2_DEBUGGING_INFO || defined DWARF2_LINENO_DEBUGGING_INFO
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 5609a98..a174e21 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -9567,7 +9567,7 @@ whose value is the highest absolute text address in the file.
 Here are macros for SDB and DWARF output.
 
 @defmac SDB_DEBUGGING_INFO
-Define this macro if GCC should produce COFF-style debugging output
+Define this macro to 1 if GCC should produce COFF-style debugging output
 for SDB in response to the @option{-g} option.
 @end defmac
 
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 96ca063a..9c13e9b 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -6992,7 +6992,7 @@ whose value is the highest absolute text address in the file.
 Here are macros for SDB and DWARF output.
 
 @defmac SDB_DEBUGGING_INFO
-Define this macro if GCC should produce COFF-style debugging output
+Define this macro to 1 if GCC should produce COFF-style debugging output
 for SDB in response to the @option{-g} option.
 @end defmac
 
diff --git a/gcc/final.c b/gcc/final.c
index 30b3826..2f57b1b 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -88,9 +88,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dbxout.h"
 #endif
 
-#ifdef SDB_DEBUGGING_INFO
 #include "sdbout.h"
-#endif
 
 /* Most ports that aren't using cc0 don't need to define CC_STATUS_INIT.
So define a null default for it to save conditionalization later.  */
@@ -4644,10 +4642,8 @@ rest_of_clean_state (void)
   /* In case the function was not output,
  don't leave any temporary anonymous types
  queued up for sdb output.  */
-#ifdef SDB_DEBUGGING_INFO
-  if (write_symbols == SDB_DEBUG)
+  if (SDB_DEBUGGING_INFO && write_symbols == SDB_DEBUG)
 sdbout_types (NULL_TREE);
-#endif
 
   flag_rerun_cse_after_global_opts = 0;
   reload_completed = 0;
diff --git a/gcc/function.c b/gcc/function.c
index a637cb3..afc2c87 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -4671,7 +4671,7 @@ number_blocks (tree fn)
   /* For SDB and XCOFF debugging output, we start numbering the blocks
  from 1 within each function, rather than keeping a running
  count.  */
-#if defined (SDB_DEBUGGING_INFO) || defined (XCOFF_DEBUGGING_INFO)
+#if SDB_DEBUGGING_INFO || defined (XCOFF_DEBUGGING_INFO)
   if (write_symbols == SDB_DEBUG || write_symbols == XCOFF_DEBUG)
 next_block_index = 1;
 #endif
diff --git a/gcc/output.h b/gcc/output.h
index f6a576c..d485cd6 100644
--- a/gcc/output.h
+++ b/gcc/output.h
@@ -309,9 +309,7 @@ extern rtx_sequence *final_sequence;
 /* The line number of the beginning of the current function.  Various
md code needs this so that it can output relative linenumbers.  */
 
-#ifdef SDB_DEBUGGING_INFO /* Avoid undef sym in certain broken linkers.  */
 extern int sdb_begin_function_line;
-#endif
 
 /* File in which assembler code is being written.  */
 
diff

[PATCH 08/12] always define DWARF2_LINENO_DEBUGGING_INFO

2015-11-09 Thread tbsaunde+gcc

From: Trevor Saunders 

gcc/ChangeLog:

2015-11-09  Trevor Saunders  

* defaults.h (DWARF2_LINENO_DEBUGGING_INFO): New default
definition.
* dwarf2out.c (dwarf2out_init): Adjust.
* opts.c (set_debug_level): Likewise.
* toplev.c (process_options): Likewise.
---
 gcc/defaults.h  | 6 +-
 gcc/dwarf2out.c | 4 ++--
 gcc/opts.c  | 9 -
 gcc/toplev.c| 4 +---
 4 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/gcc/defaults.h b/gcc/defaults.h
index d1728aa..65ffe59 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -926,6 +926,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define DWARF2_DEBUGGING_INFO 0
 #endif
 
+#ifndef DWARF2_LINENO_DEBUGGING_INFO
+#define DWARF2_LINENO_DEBUGGING_INFO 0
+#endif
+
 #ifndef XCOFF_DEBUGGING_INFO
 #define XCOFF_DEBUGGING_INFO 0
 #endif
@@ -952,7 +956,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #elif SDB_DEBUGGING_INFO
 #define PREFERRED_DEBUGGING_TYPE SDB_DEBUG
 
-#elif DWARF2_DEBUGGING_INFO || defined DWARF2_LINENO_DEBUGGING_INFO
+#elif DWARF2_DEBUGGING_INFO || DWARF2_LINENO_DEBUGGING_INFO
 #define PREFERRED_DEBUGGING_TYPE DWARF2_DEBUG
 
 #elif VMS_DEBUGGING_INFO
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index cb6acc6..2d94bc3 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -23257,7 +23257,7 @@ dwarf2out_init (const char *filename ATTRIBUTE_UNUSED)
   /* Allocate the file_table.  */
   file_table = hash_table::create_ggc (50);
 
-#ifndef DWARF2_LINENO_DEBUGGING_INFO
+#if !DWARF2_LINENO_DEBUGGING_INFO
   /* Allocate the decl_die_table.  */
   decl_die_table = hash_table::create_ggc (10);
 
@@ -23379,7 +23379,7 @@ dwarf2out_init (const char *filename ATTRIBUTE_UNUSED)
   text_section_line_info = new_line_info_table ();
   text_section_line_info->end_label = text_end_label;
 
-#ifdef DWARF2_LINENO_DEBUGGING_INFO
+#if DWARF2_LINENO_DEBUGGING_INFO
   cur_line_info_table = text_section_line_info;
 #endif
 
diff --git a/gcc/opts.c b/gcc/opts.c
index 0ed9ac6..1300a92 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -2287,11 +2287,10 @@ set_debug_level (enum debug_info_type type, int extended, const char *arg,
 
  if (extended == 2)
{
-#if DWARF2_DEBUGGING_INFO || defined DWARF2_LINENO_DEBUGGING_INFO
- opts->x_write_symbols = DWARF2_DEBUG;
-#elif DBX_DEBUGGING_INFO
- opts->x_write_symbols = DBX_DEBUG;
-#endif
+ if (DWARF2_DEBUGGING_INFO || DWARF2_LINENO_DEBUGGING_INFO)
+   opts->x_write_symbols = DWARF2_DEBUG;
+ else if (DBX_DEBUGGING_INFO)
+   opts->x_write_symbols = DBX_DEBUG;
}
 
  if (opts->x_write_symbols == NO_DEBUG)
diff --git a/gcc/toplev.c b/gcc/toplev.c
index d015f0f..f318a98 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1380,10 +1380,8 @@ process_options (void)
   && (write_symbols == VMS_DEBUG
   || write_symbols == VMS_AND_DWARF2_DEBUG))
 debug_hooks = &vmsdbg_debug_hooks;
-#ifdef DWARF2_LINENO_DEBUGGING_INFO
-  else if (write_symbols == DWARF2_DEBUG)
+  else if (DWARF2_LINENO_DEBUGGING_INFO && write_symbols == DWARF2_DEBUG)
 debug_hooks = &dwarf2_lineno_debug_hooks;
-#endif
   else
 error ("target system does not support the %qs debug format",
   debug_type_names[write_symbols]);
-- 
2.5.0.rc1.5.gc07173f

[PATCH 05/12] always define VMS_DEBUGGING_INFO

2015-11-09 Thread tbsaunde+gcc

From: Trevor Saunders 

gcc/ChangeLog:

2015-11-09  Trevor Saunders  

* defaults.h (VMS_DEBUGGING_INFO): New default definition.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Adjust.
* dwarf2out.c (output_file_names): Likewise.
(add_name_and_src_coords_attributes): Likewise.
* dwarf2out.h: Likewise.
* toplev.c (process_options): Likewise.
* vmsdbgout.c: Likewise.
---
 gcc/defaults.h | 8 ++--
 gcc/doc/tm.texi| 2 +-
 gcc/doc/tm.texi.in | 2 +-
 gcc/dwarf2out.c| 8 
 gcc/dwarf2out.h| 2 --
 gcc/toplev.c   | 6 +++---
 gcc/vmsdbgout.c| 2 +-
 7 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/gcc/defaults.h b/gcc/defaults.h
index b518863..0de7899 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -922,12 +922,16 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define XCOFF_DEBUGGING_INFO 0
 #endif
 
+#ifndef VMS_DEBUGGING_INFO
+#define VMS_DEBUGGING_INFO 0
+#endif
+
 /* If more than one debugging type is supported, you must define
PREFERRED_DEBUGGING_TYPE to choose the default.  */
 
 #if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
 + defined (DWARF2_DEBUGGING_INFO) + (XCOFF_DEBUGGING_INFO) \
- + defined (VMS_DEBUGGING_INFO))
++ (VMS_DEBUGGING_INFO))
 #ifndef PREFERRED_DEBUGGING_TYPE
 #error You must define PREFERRED_DEBUGGING_TYPE
 #endif /* no PREFERRED_DEBUGGING_TYPE */
@@ -943,7 +947,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #elif defined DWARF2_DEBUGGING_INFO || defined DWARF2_LINENO_DEBUGGING_INFO
 #define PREFERRED_DEBUGGING_TYPE DWARF2_DEBUG
 
-#elif defined VMS_DEBUGGING_INFO
+#elif VMS_DEBUGGING_INFO
 #define PREFERRED_DEBUGGING_TYPE VMS_AND_DWARF2_DEBUG
 
 #elif XCOFF_DEBUGGING_INFO
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 0399248..b3b684a 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -9720,7 +9720,7 @@ number @var{line} of the current source file to the stdio stream
 Here are macros for VMS debug format.
 
 @defmac VMS_DEBUGGING_INFO
-Define this macro if GCC should produce debugging output for VMS
+Define this macro to 1 if GCC should produce debugging output for VMS
 in response to the @option{-g} option.  The default behavior for VMS
 is to generate minimal debug info for a traceback in the absence of
 @option{-g} unless explicitly overridden with @option{-g0}.  This
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 84e8383..0f0a4f2 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7113,7 +7113,7 @@ number @var{line} of the current source file to the stdio stream
 Here are macros for VMS debug format.
 
 @defmac VMS_DEBUGGING_INFO
-Define this macro if GCC should produce debugging output for VMS
+Define this macro to 1 if GCC should produce debugging output for VMS
 in response to the @option{-g} option.  The default behavior for VMS
 is to generate minimal debug info for a traceback in the absence of
 @option{-g} unless explicitly overridden with @option{-g0}.  This
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 072e485..88c931c 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -101,7 +101,7 @@ static void dwarf2out_decl (tree);
 #define HAVE_XCOFF_DWARF_EXTRAS 0
 #endif
 
-#ifdef VMS_DEBUGGING_INFO
+#if VMS_DEBUGGING_INFO
 int vms_file_stats_name (const char *, long long *, long *, char *, int *);
 
 /* Define this macro to be a nonzero value if the directory specifications
@@ -10229,7 +10229,7 @@ output_file_names (void)
   int file_idx = backmap[i];
   int dir_idx = dirs[files[file_idx].dir_idx].dir_idx;
 
-#ifdef VMS_DEBUGGING_INFO
+#if VMS_DEBUGGING_INFO
 #define MAX_VMS_VERSION_LEN 6 /* ";32768" */
 
   /* Setting these fields can lead to debugger miscomparisons,
@@ -17319,7 +17319,7 @@ add_name_and_src_coords_attributes (dw_die_ref die, tree decl)
   add_linkage_name (die, decl);
 }
 
-#ifdef VMS_DEBUGGING_INFO
+#if VMS_DEBUGGING_INFO
   /* Get the function's name, as described by its RTL.  This may be different
  from the DECL_NAME name used in the source file.  */
   if (TREE_CODE (decl) == FUNCTION_DECL && TREE_ASM_WRITTEN (decl))
@@ -17331,7 +17331,7 @@ add_name_and_src_coords_attributes (dw_die_ref die, tree decl)
 #endif /* VMS_DEBUGGING_INFO */
 }
 
-#ifdef VMS_DEBUGGING_INFO
+#if VMS_DEBUGGING_INFO
 /* Output the debug main pointer die for VMS */
 
 void
diff --git a/gcc/dwarf2out.h b/gcc/dwarf2out.h
index 4fe3527..d344508 100644
--- a/gcc/dwarf2out.h
+++ b/gcc/dwarf2out.h
@@ -257,9 +257,7 @@ extern void debug_dwarf_loc_descr (dw_loc_descr_ref);
 extern void debug (die_struct );
 extern void debug (die_struct *ptr);
 extern void dwarf2out_set_demangle_name_func (const char *(*) (const char *));
-#ifdef VMS_DEBUGGING_INFO
 extern void dwarf2out_vms_debug_main_pointer (void);
-#endif
 
 enum array_descr_ordering
 {
diff --git a/gcc/toplev.c

Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX

2015-11-09 Thread David Malcolm

On Mon, 2015-11-09 at 11:47 -0500, tbsaunde+...@tbsaunde.org wrote:
> From: Trevor Saunders 
> 
> gcc/ChangeLog:
> 
> 2015-11-09  Trevor Saunders  
> 
>   * defaults.h (EH_RETURN_HANDLER_RTX): New default definition.
>   * df-scan.c (df_get_exit_block_use_set): Adjust.
>   * except.c (expand_eh_return): Likewise.
> ---
>  gcc/defaults.h | 4 
>  gcc/df-scan.c  | 2 --
>  gcc/except.c   | 9 -
>  3 files changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/defaults.h b/gcc/defaults.h
> index c20de44..047a0db 100644
> --- a/gcc/defaults.h
> +++ b/gcc/defaults.h
> @@ -1325,6 +1325,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>  #define TARGET_PECOFF 0
>  #endif
>  
> +#ifndef EH_RETURN_HANDLER_RTX
> +#define EH_RETURN_HANDLER_RTX NULL
> +#endif
> +
>  #ifdef GCC_INSN_FLAGS_H
>  /* Dependent default target macro definitions
>  
> diff --git a/gcc/df-scan.c b/gcc/df-scan.c
> index 2e5fe97..a735925 100644
> --- a/gcc/df-scan.c
> +++ b/gcc/df-scan.c
> @@ -3714,7 +3714,6 @@ df_get_exit_block_use_set (bitmap exit_block_uses)
>  }
>  #endif
>  
> -#ifdef EH_RETURN_HANDLER_RTX
>if ((!targetm.have_epilogue () || ! epilogue_completed)
>&& crtl->calls_eh_return)
>  {
> @@ -3722,7 +3721,6 @@ df_get_exit_block_use_set (bitmap exit_block_uses)
>if (tmp && REG_P (tmp))
>   df_mark_reg (tmp, exit_block_uses);
>  }
> -#endif
>  
>/* Mark function return value.  */
>diddle_return_value (df_mark_reg, (void*) exit_block_uses);
> diff --git a/gcc/except.c b/gcc/except.c
> index 1801fe7..1a41a34 100644
> --- a/gcc/except.c
> +++ b/gcc/except.c
> @@ -2255,11 +2255,10 @@ expand_eh_return (void)
>  emit_insn (targetm.gen_eh_return (crtl->eh.ehr_handler));
>else
>  {
> -#ifdef EH_RETURN_HANDLER_RTX
> -  emit_move_insn (EH_RETURN_HANDLER_RTX, crtl->eh.ehr_handler);
> -#else
> -  error ("__builtin_eh_return not supported on this target");
> -#endif
> +  if (rtx handler = EH_RETURN_HANDLER_RTX)

Would this be clearer as

 rtx handler = EH_RETURN_HANDLER_RTX;
 if (handler)

?  (to avoid an assignment inside a conditional)
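
For comparison, the two spellings side by side in a minimal sketch. toy_rtx and both function names are hypothetical stand-ins (not GCC's rtx or expand_eh_return); -1 stands in for the error path.

```cpp
#include <cassert>

// Hypothetical stand-in for an rtx: a pointer that may be null.
struct toy_rtx_def
{
  int regno;
};
typedef toy_rtx_def *toy_rtx;

// Form used in the patch: a C++ declaration inside the condition.  The
// variable is scoped to the if/else and is tested right after initialisation.
int toy_expand_condition_decl (toy_rtx maybe_handler)
{
  if (toy_rtx handler = maybe_handler)
    return handler->regno;
  else
    return -1;
}

// Suggested form: declare first, then test.  Behaviour is the same, but the
// variable stays in scope after the if.
int toy_expand_separate_decl (toy_rtx maybe_handler)
{
  toy_rtx handler = maybe_handler;
  if (handler)
    return handler->regno;
  return -1;
}
```

Both behave identically for null and non-null handlers, so the choice is purely one of readability and scope.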

> + emit_move_insn (handler, crtl->eh.ehr_handler);
> +  else
> + error ("__builtin_eh_return not supported on this target");
>  }
>  
>emit_label (around_label);

Re: [PATCH], Add power9 support to GCC, patch #2 (add modulus instructions)

2015-11-09 Thread Michael Meissner

On Mon, Nov 09, 2015 at 09:48:50AM -0600, Segher Boessenkool wrote:
> Hi,
> 
> On Sun, Nov 08, 2015 at 07:36:16PM -0500, Michael Meissner wrote:
> > [gcc/testsuite]
> > * lib/target-supports.exp (check_p9vector_hw_available): Add
> > checks for power9 availability.
> > (check_effective_target_powerpc_p9vector_ok): Likewise.
> 
> It's probably better not to use this for modulo; it is confusing and if
> you'll later need to untangle it it is much more work.
> 
> > +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
> > +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
> 
> Lose this line?  If Darwin cannot support modulo, the next line will
> catch that.
> 
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
> > +/* { dg-options "-mcpu=power9 -O3" } */
> 
> Is -O3 needed?  Why won't -O2 work?

Just habit.

> > +proc check_p9vector_hw_available { } {
> > +return [check_cached_effective_target p9vector_hw_available {
> > +   # Some simulators are known to not support VSX/power8 instructions.
> > +   # For now, disable on Darwin
> > +   if { [istarget powerpc-*-eabi] || [istarget powerpc*-*-eabispe] || [istarget *-*-darwin*]} {
> 
> Long line.

Cut and paste from other tests.

> > Index: gcc/config/rs6000/rs6000.md
> > ===
> > --- gcc/config/rs6000/rs6000.md (revision 229972)
> > +++ gcc/config/rs6000/rs6000.md (working copy)
> > @@ -2885,9 +2885,9 @@ (define_insn_and_split "*div3_sra_
> > (set_attr "cell_micro" "not")])
> >  
> >  (define_expand "mod<mode>3"
> > -  [(use (match_operand:GPR 0 "gpc_reg_operand" ""))
> > -   (use (match_operand:GPR 1 "gpc_reg_operand" ""))
> > -   (use (match_operand:GPR 2 "reg_or_cint_operand" ""))]
> > +  [(set (match_operand:GPR 0 "gpc_reg_operand" "")
> > +   (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "")
> > +(match_operand:GPR 2 "reg_or_cint_operand" "")))]
> 
> You could delete the empty constraint strings while you're at it.
> 
> > +;; On machines with modulo support, do a combined div/mod the old fashioned
> > +;; method, since the multiply/subtract is faster than doing the mod instruction
> > +;; after a divide.
> 
> You can instead have a "divmod" insn that is split to either of div, mod,
> or div+mul+sub depending on which of the outputs is unused.  Peepholes
> do not get all cases.

Yes, though as I recall, I couldn't get it to do what I wanted, and moved on to
other targets.
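
For reference, the "old fashioned" combined div/mod sequence mentioned in the quoted comment amounts to the following. This is just a sketch in C++ with hypothetical names, not the RTL the patch generates.

```cpp
#include <cassert>

// Quotient and remainder of one division.
struct toy_divmod
{
  long quot;
  long rem;
};

// One divide, then multiply and subtract to recover the remainder, rather
// than issuing a separate modulo operation after the divide.
toy_divmod toy_divmod_via_mul_sub (long n, long d)
{
  assert (d != 0);  // division by zero is outside the scope of this sketch
  toy_divmod r;
  r.quot = n / d;             // truncating division
  r.rem = n - r.quot * d;     // equals n % d for truncating division
  return r;
}
```

When only the remainder is wanted, a single modulo instruction can replace all three operations, which is the case the new instructions are for.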

> This can be a later improvement of course.

Yep.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Re: [PATCH 05/12] always define VMS_DEBUGGING_INFO

2015-11-09 Thread Jeff Law


On 11/09/2015 11:34 AM, Bernd Schmidt wrote:

In general I think the _DEBUGGING_INFO patches are going to be OK,
modulo Jeff's comment about stage 1. I think they shouldn't have been
split - it causes numerous unnecessary extra changes, and the
intermediate stages look very inconsistent.


-#ifdef VMS_DEBUGGING_INFO
-  else if (write_symbols == VMS_DEBUG || write_symbols == VMS_AND_DWARF2_DEBUG)
+  else if (VMS_DEBUGGING_INFO
+   && (write_symbols == VMS_DEBUG
+   || write_symbols == VMS_AND_DWARF2_DEBUG))
  debug_hooks = &vmsdbg_debug_hooks;
-#endif
  #ifdef DWARF2_LINENO_DEBUGGING_INFO
else if (write_symbols == DWARF2_DEBUG)
  debug_hooks = &dwarf2_lineno_debug_hooks;
diff --git a/gcc/vmsdbgout.c b/gcc/vmsdbgout.c
index d41d4b2..6dd6878 100644
--- a/gcc/vmsdbgout.c
+++ b/gcc/vmsdbgout.c
@@ -24,7 +24,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "coretypes.h"
  #include "tm.h"

-#ifdef VMS_DEBUGGING_INFO
+#if VMS_DEBUGGING_INFO
  #include "alias.h"
  #include "tree.h"
  #include "varasm.h"


This seems to reference vmsdbg_debug_hooks unconditionally, but as far
as I can tell the definition is still guarded by an #if? Does this compile?
There's an easy way for Trevor to find out.  Build a cross for one of 
the VMS targets (there's 3 defined in config-list.mk) :-)


jeff

Re: [PATCH 02/12] remove EXTENDED_SDB_BASIC_TYPES

2015-11-09 Thread Bernd Schmidt



The last target using this was i960, which was removed many years ago,
so there's no reason to keep it.

gcc/ChangeLog:

2015-11-09  Trevor Saunders  

* gsyms.h (enum sdb_type): Remove code for
EXTENDED_SDB_BASIC_TYPES.
(enum sdb_masks): Likewise.
* sdbout.c (plain_type_1): Likewise.


Ok if you also poison the macro name as usual.


Bernd

Re: [PATCH], Add power9 support to GCC, patch #3 (scalar count trailing zeros)

2015-11-09 Thread Michael Meissner

On Mon, Nov 09, 2015 at 09:59:43AM -0600, Segher Boessenkool wrote:
> On Sun, Nov 08, 2015 at 07:37:53PM -0500, Michael Meissner wrote:
> > This patch adds support for scalar count trailing zeros instruction that is
> > being added to ISA 3.0 (power9).
> 
> I bet you should change CTZ_DEFINED_VALUE_AT_ZERO as well.
> 
> > +(define_insn "ctz<mode>2_hw"
> > +  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
> > +   (ctz:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")))]
> > +  "TARGET_CTZ"
> > +  "cnttz<wd> %0,%1"
> > +  [(set_attr "type" "cntlz")])
> 
> We should probably rename this attr value now.  "cntz" maybe?  Could be
> later of course.

I don't see a need to add another type attribute for count trailing zeros
unless count leading zeros has a different timing than count trailing zeros.
The cntlz attribute was added because in Power7 the CNTLZ instruction became a
2 cycle instruction, and we wanted to model this in power7.md (and hence cntlz
was split from the simple integer attribute).
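
On the CTZ_DEFINED_VALUE_AT_ZERO remark quoted above: the question is what a count-trailing-zeros operation yields for a zero input. The sketch below is a portable reference version with hypothetical names; it assumes the convention that zero yields the operand width (32 here), which is an assumption of this example and not something stated in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Reference count-trailing-zeros for 32-bit values.  A zero input returns
// the bit width, the "defined value at zero" assumed for this sketch.
int toy_ctz32 (std::uint32_t x)
{
  if (x == 0)
    return 32;
  int n = 0;
  while ((x & 1u) == 0)  // shift out low zero bits one at a time
    {
      x >>= 1;
      ++n;
    }
  return n;
}
```

If the hardware result at zero is well defined, telling the compiler so lets it drop an explicit zero check around such code.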

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Re: [PATCH 12/12] always define ENABLE_OFFLOADING

2015-11-09 Thread Bernd Schmidt


On 11/09/2015 05:47 PM, tbsaunde+...@tbsaunde.org wrote:

-#ifdef ENABLE_OFFLOADING
/* If the user didn't specify any, default to all configured offload
   targets.  */
if (offload_targets == NULL)
  handle_foffload_option (OFFLOAD_TARGETS);
-#endif


This one I would keep guarded with an if.

Otherwise ok modulo stage 1 end.


Bernd

Re: [PATCH 00/12] misc conditional compilation work

2015-11-09 Thread Trevor Saunders

On Mon, Nov 09, 2015 at 10:57:10AM -0700, Jeff Law wrote:
> On 11/09/2015 09:47 AM, tbsaunde+...@tbsaunde.org wrote:
> >From: Trevor Saunders 
> >
> >Hi,
> >
> >basically $subject, making some code unconditionally compiled, and changing
> >other things from #ifdef to #if so they can be made unconditional
> >incrementally.
> >
> >patches individually bootstrapped + regtested on x86_64-linux-gnu, and a
> >slightly earlier version of the series ran through config-list.mk.  I think
> >everything here is either preapproved, or obvious so I'll commit it later
> >today if nobody complains.
> Are these the last patches of this nature planned for GCC6?  While the
> window was left slightly open by Richi this morning, I think that's more to
> allow the queues to drain rather than to allow more new work to go into the
> tree :-)

yeah, I guess I misread, I thought the end was tonight not last night (I
could easily have sent this out a day or so earlier).  Given my
incorrect assumption about timing I was considering trying to sneak in a
little more around reg-stack.c, but I suspect that isn't going to work
out anyway (turns out even after the macros reg-stack.c uses x86
specific variables).

Trev

> 
> jeff
>

Re: [PATCH v2 11/13] Test case for conversion from __seg_tls:0

2015-11-09 Thread Thomas Schwinge

Hi!

On Mon, 9 Nov 2015 15:46:20 +0100, Richard Biener wrote:
> On Tue, Oct 20, 2015 at 11:27 PM, Richard Henderson  wrote:
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/addr-space-3.c
> > @@ -0,0 +1,10 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O" } */
> > +/* { dg-final { scan-assembler "[fg]s:0" } } */
> 
> Causes
> 
> ERROR: (DejaGnu) proc "fg" does not exist.
> The error code is NONE
> The info on the error is:
> close: spawn id exp6 not open
> while executing
> "close -i exp6"
> invoked from within
> "catch "close -i $spawn_id""

In r230038, I checked in the following, as obvious:

commit a7d978247cd261d66010195908ce0e9ef0e501b9
Author: tschwinge 
Date:   Mon Nov 9 17:53:02 2015 +

Resolve DejaGnu hard stop

gcc/testsuite/
* gcc.target/i386/addr-space-3.c: Fix quoting in dg-final
scan-assembler directive.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@230038 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog  |5 +
 gcc/testsuite/gcc.target/i386/addr-space-3.c |2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git gcc/testsuite/ChangeLog gcc/testsuite/ChangeLog
index ca1991b..da4f940 100644
--- gcc/testsuite/ChangeLog
+++ gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2015-11-09  Thomas Schwinge  
+
+   * gcc.target/i386/addr-space-3.c: Fix quoting in dg-final
+   scan-assembler directive.
+
 2015-11-09  Kyrylo Tkachov  
 
PR target/68129
diff --git gcc/testsuite/gcc.target/i386/addr-space-3.c gcc/testsuite/gcc.target/i386/addr-space-3.c
index 63f1f03..2b6f47e 100644
--- gcc/testsuite/gcc.target/i386/addr-space-3.c
+++ gcc/testsuite/gcc.target/i386/addr-space-3.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O" } */
-/* { dg-final { scan-assembler "[fg]s:0" } } */
+/* { dg-final { scan-assembler "\[fg]s:0" } } */
 
 void test(int *y)
 {


Grüße
 Thomas



Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX

2015-11-09 Thread Bernd Schmidt


On 11/09/2015 07:42 PM, Jeff Law wrote:

> On 11/09/2015 11:27 AM, Bernd Schmidt wrote:
> > On 11/09/2015 05:47 PM, tbsaunde+...@tbsaunde.org wrote:
> > > From: Trevor Saunders
> > >
> > > gcc/ChangeLog:
> > >
> > > 2015-11-09  Trevor Saunders
> > >
> > > * defaults.h (EH_RETURN_HANDLER_RTX): New default definition.
> > > * df-scan.c (df_get_exit_block_use_set): Adjust.
> > > * except.c (expand_eh_return): Likewise.
> >
> > As I said for a previous patch series, if we go to the trouble of fixing
> > up stuff like this, we might as well do it properly and turn things like
> > this into a target hook.
>
> I agree that pushing hookization further is good as well.  I still think
> the patch in and of itself is a step forward, even if it doesn't hookize
> EH_RETURN_HANDLER_RTX.


Well, I was hoping that, by pointing out the issue for the last patch 
set, the next set of patches would get things right. We really shouldn't 
make sideways steps when there's a simple way to go forward.



Bernd

Re: [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion)

2015-11-09 Thread David Edelsohn

On Sun, Nov 8, 2015 at 4:42 PM, Michael Meissner
 wrote:
> This patch adds support for new fusion forms in ISA 3.0 (power9).  In
> particular, ISA 3.0 can fuse GPR loads of R0, FPR loads, GPR stores, FPR
> stores, and some constant generation that ISA 2.07 (power8) could not
> generate.
>
> I have built this patch with a bootstrap build on a power8 little endian
> system.  There were no regressions in the test suite.  Is this patch ok to
> install in the trunk once patch #1 has been installed.
>
> [gcc]
> 2015-11-08  Michael Meissner  
>
> * config/rs6000/constraints.md (wF constraint): New constraints
> for power9/toc fusion.
> (wG constraint): Likewise.
>
> * config/rs6000/predicates.md (upper16_cint_operand): New
> predicate for power9 and toc fusion.
> (fpr_reg_operand): Likewise.
> (toc_fusion_or_p9_reg_operand): Likewise.
> (toc_fusion_mem_raw): Likewise.
> (toc_fusion_mem_wrapped): Likewise.
> (fusion_gpr_addis): If power9 fusion, allow fusion for a larger
> address range.
> (fusion_gpr_mem_combo): Delete, use fusion_addis_mem_combo_load
> instead.
> (fusion_addis_mem_combo_load): Add support for power9 fusion of
> floating point loads, floating point stores, and gpr stores.
> (fusion_addis_mem_combo_store): Likewise.
> (fusion_offsettable_mem_operand): Likewise.
>
> * config/rs6000/rs6000-protos.h (emit_fusion_addis): Add
> declarations.
> (emit_fusion_load_store): Likewise.
> (fusion_p9_p): Likewise.
> (expand_fusion_p9_load): Likewise.
> (expand_fusion_p9_store): Likewise.
> (emit_fusion_p9_load): Likewise.
> (emit_fusion_p9_store): Likewise.
> (fusion_wrap_memory_address): Likewise.
>
> * config/rs6000/rs6000.c (struct rs6000_reg_addr): Add new
> elements for power9 fusion.
> (rs6000_debug_print_mode): Rework debug information to print more
> information about fusion.
> (rs6000_init_hard_regno_mode_ok): Setup for power9 fusion
> support.
> (rs6000_legitimate_address_p): Recognize toc fusion as a valid
> offsettable memory address.
> (emit_fusion_gpr_load): Move most of the code from
> emit_fusion_gpr_load into emit_fusion_addis that handles both
> power8 and power9 fusion.
> (emit_fusion_addis): Likewise.
> (emit_fusion_load_store): Likewise.
> (fusion_wrap_memory_address): Add support for TOC fusion.
> (fusion_split_address): Likewise.
> (fusion_p9_p): Add support for power9 fusion.
> (expand_fusion_p9_load): Likewise.
> (expand_fusion_p9_store): Likewise.
> (emit_fusion_p9_load): Likewise.
> (emit_fusion_p9_store): Likewise.
>
> * config/rs6000/rs6000.h (TARGET_TOC_FUSION_INT): New macros for
> power9 fusion support.
> (TARGET_TOC_FUSION_FP): Likewise.
>
> * config/rs6000/rs6000.md (UNSPEC_FUSION_P9): New power9/toc
> fusion unspecs.
> (UNSPEC_FUSION_ADDIS): Likewise.
> (QHSI mode iterator): New iterator for power9 fusion.
> (GPR_FUSION): Likewise.
> (FPR_FUSION): Likewise.
> (power9 fusion splitter): New power9/toc fusion support.
> (toc_fusionload_): Likewise.
> (toc_fusionload_di): Likewise.
> (fusion_gpr_load_): Update predicate function.
> (power9 fusion peephole2s): New power9/toc fusion support.
> (fusion_gpr___load): Likewise.
> (fusion_gpr___store): Likewise.
> (fusion_fpr___load): Likewise.
> (fusion_fpr___store): Likewise.
> (fusion_p9__constant): Likewise.
>
> [gcc/testsuite]
> 2015-11-08  Michael Meissner  
>
> * gcc.target/powerpc/fusion.c (fusion_vector): Move to fusion2.c
> and allow the test on PowerPC LE.
> * gcc.target/powerpc/fusion2.c (fusion_vector): Likewise.
>
> * gcc.target/powerpc/fusion3.c: New file, test power9 fusion.

Okay, with the changes that you and Segher discussed.

Thanks, David

Re: RFC: Incomplete Draft Patches to Correct Errors in Loop Unrolling Frequencies (bugzilla problem 68212)

2015-11-09 Thread Bernd Schmidt


On 11/07/2015 03:44 PM, Kelvin Nilsen wrote:

> This is a draft patch to partially address the concerns described in
> bugzilla problem report
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68212). The patch is
> incomplete in the sense that there are some known shortcomings with
> nested loops which I am still working on.  I am sending this out for
> comments at this time because we would like these patches to be
> integrated into the GCC 6 release and want to begin responding to
> community feedback as soon as possible in order to make the integration
> possible.


I'll mainly comment on style points right now. Your code generally looks 
mostly good but doesn't quite follow our guidelines. In terms of logic 
I'm sure there will be followup questions after the first round of 
points is addressed. Others might be better qualified to review the 
frequencies manipulation; for this first round I'm just assuming that 
the general idea is sound (but would appreciate input).



> 1. Before a loop body is unpeeled into a pre-header location, we
> temporarily adjust the loop body frequencies to represent the values
> appropriate for the context into which the loop body is to be
> copied.
>
> 2. After unrolling the loop body (by replicating the loop body (N-1)
> times within the loop), we recompute all frequencies associated with
> blocks contained within the loop.


If these are independent from each other it might be better to split up 
the patch.



>  (check_loop_frequency_integrity): new helper routine
>  (set_zero_probability): added another parameter
>  (duplicate_loop_to_header_edge): Add code to recompute loop
> body frequencies after blocks are replicated (unrolled) into the loop
> body. Introduce certain help routines because existing infrastructure
> routines are not reliable during typical executions of
> duplicate_loop_to_header_edge().


Please review our guidelines on how to write ChangeLogs - capitalize,
punctuate, and document only the what, not the why. Also, linewrap manually.



> opt_info_start_duplication (opt_info);
> +
> ok = duplicate_loop_to_header_edge (loop, loop_latch_edge (loop),


Please make sure you don't change whitespace unnecessarily. There are a 
few other occurrences in the patch, and also cases where you seem to be 
adding spaces to otherwise blank lines.



> @@ -1015,14 +1041,44 @@ unroll_loop_runtime_iterations (struct loop *loop)
> bitmap_clear_bit (wont_exit, may_exit_copy);
> opt_info_start_duplication (opt_info);
>
> +  {
> +/* Recompute the loop body frequencies. */
> +zero_loop_frequencies (loop);


No reason to start a braced block here.


> +/* Scale the incoming frequencies according to the heuristic that
> + * the loop frequency is the incoming edge frequency divided by
> + * 0.09.  This heuristic applies only to loops that iterate over a
> + * run-time value that is not known at compile time.  Note that
> + * 1/.09 equals 11..  We'll use integer arithmetic on ten
> + * thousandths, and then divide by 10,000 after we've "rounded".
> + */


Please examine the comment style in gcc - no asterisks to start new 
lines, and comment terminators don't go on their own line.



> +sum_incoming_frequencies *= 11;  /* multiply by 11. */
> +sum_incoming_frequencies += 5000;/* round by adding 0.5 */
> +sum_incoming_frequencies /= 1;  /* convert ten thousandths
> +   to ones
> +*/


These comments could also be improved, but really they should just be 
removed since they're pretty obvious and redundant with the one before.
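
To make the arithmetic under discussion concrete, here is a small sketch of scaling a frequency by roughly 1/0.09 in integer arithmetic on ten-thousandths, with rounding to nearest.  The function name and the exact factor 111111 (standing in for about 11.1111) are assumptions for illustration, not values confirmed by the quoted hunk.

```c
/* Scale FREQ by roughly 1/0.09 without floating point.  The factor is
   expressed in ten-thousandths; 111111 is an assumed stand-in for
   11.1111.  Adding half of the divisor before dividing rounds to the
   nearest integer instead of truncating.  */
static long
scale_by_inverse_0_09 (long freq)
{
  const long factor_ten_thousandths = 111111;
  const long one = 10000;

  return (freq * factor_ten_thousandths + one / 2) / one;
}
```

With named constants like these, a single comment stating the heuristic is enough and the per-line comments become unnecessary.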



> +/* Define ENABLE_CHECKING to enforce the following run-time checks.


"With checking enabled, the following run-time checks are performed:"


> + * This may report false-positive errors due to round-off errors.


That doesn't sound good as it could lead to bootstrap failures when 
checking is enabled.



> @@ -44,6 +55,543 @@ static void fix_loop_placements (struct loop *, bo
>   static bool fix_bb_placement (basic_block);
>   static void fix_bb_placements (basic_block, bool *, bitmap);
>
> +/*
> + * Return true iff block is considered to reside within the loop
> + * represented by loop_ptr.
> + */


Arguments are capitalized in function comments.


> +bool
> +in_loop_p (basic_block block, struct loop *loop_ptr)
> +{
> +  basic_block *bbs = get_loop_body (loop_ptr);
> +  bool result = false;
> +
> +  for (unsigned int i = 0; i < loop_ptr->num_nodes; i++)
> +{
> +  if (bbs[i] == block)
> +   result = true;
> +}


I think something that starts with bb->loop_father and iterates outwards 
would be more efficient.
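
A rough sketch of that suggestion, written against simplified stand-in types rather than GCC's real struct loop and basic_block.  The names loop_sketch, block_sketch and in_loop_sketch_p are hypothetical; the idea is only that the check walks from the block's innermost loop towards the root instead of materializing the whole loop body.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for GCC's loop and block types.  */
struct loop_sketch
{
  struct loop_sketch *outer;   /* Enclosing loop, or NULL at the root.  */
};

struct block_sketch
{
  struct loop_sketch *loop_father;   /* Innermost loop holding the block.  */
};

/* Return true iff BLOCK lies within LOOP, directly or through nesting.
   The cost is proportional to the nesting depth, not to the number of
   blocks in LOOP, and nothing is allocated.  */
static bool
in_loop_sketch_p (const struct block_sketch *block,
                  const struct loop_sketch *loop)
{
  for (const struct loop_sketch *l = block->loop_father;
       l != NULL;
       l = l->outer)
    if (l == loop)
      return true;

  return false;
}
```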



> +/* A list of block_ladder_rung structs is used to keep track of all the
> + * blocks visited in a depth-first recursive traversal of a control-flow
> + * graph.  This list is used to detect and prevent attempts to revisit
> + * a block that is already being visited in the recursive traversal.
> + */
> +typedef struct block_ladder_rung {
> +  basic_block block;
> +  struct

Re: [PATCH 5/6] Simplify rs6000_builtin_vectorized_function

2015-11-09 Thread David Edelsohn

On Mon, Nov 9, 2015 at 8:30 AM, Richard Sandiford
 wrote:
> After the previous patches it's no longer necessary for
> TARGET_BUILTIN_VECTORIZED_FUNCTION to return functions that
> map to the vector optab of the original operation.  We'll use
> a vector form of the internal function instead.
>
>
> gcc/
> * config/rs6000/rs6000.c (rs6000_builtin_vectorized_function): Remove
> entries that map directly to optabs.

Okay.

Thanks, David

Re: [PATCH 11/12] always define HAVE_AS_LEB128

2015-11-09 Thread Bernd Schmidt


-#ifdef HAVE_AS_LEB128
+#if HAVE_AS_LEB128


This patch doesn't seem to actually remove any conditional compilation?


Bernd

Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX

2015-11-09 Thread Jeff Law


On 11/09/2015 11:27 AM, Bernd Schmidt wrote:

> On 11/09/2015 05:47 PM, tbsaunde+...@tbsaunde.org wrote:
> > From: Trevor Saunders
> >
> > gcc/ChangeLog:
> >
> > 2015-11-09  Trevor Saunders
> >
> > * defaults.h (EH_RETURN_HANDLER_RTX): New default definition.
> > * df-scan.c (df_get_exit_block_use_set): Adjust.
> > * except.c (expand_eh_return): Likewise.
>
> As I said for a previous patch series, if we go to the trouble of fixing
> up stuff like this, we might as well do it properly and turn things like
> this into a target hook.

I agree that pushing hookization further is good as well.  I still think
the patch in and of itself is a step forward, even if it doesn't hookize
EH_RETURN_HANDLER_RTX.


jeff

Re: [PATCH][AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT

2015-11-09 Thread Mike Stump

On Nov 9, 2015, at 3:32 AM, Kyrill Tkachov  wrote:
> The aarch64 port does not define TARGET_SUPPORTS_WIDE_INT.

> Ok for trunk and GCC 5?

:-)  I’d endorse it, but, best left to the target folks.

Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX

2015-11-09 Thread Trevor Saunders

On Mon, Nov 09, 2015 at 11:42:19AM -0700, Jeff Law wrote:
> On 11/09/2015 11:27 AM, Bernd Schmidt wrote:
> >On 11/09/2015 05:47 PM, tbsaunde+...@tbsaunde.org wrote:
> >>From: Trevor Saunders 
> >>
> >>gcc/ChangeLog:
> >>
> >>2015-11-09  Trevor Saunders  
> >>
> >>* defaults.h (EH_RETURN_HANDLER_RTX): New default definition.
> >>* df-scan.c (df_get_exit_block_use_set): Adjust.
> >>* except.c (expand_eh_return): Likewise.
> >
> >As I said for a previous patch series, if we go to the trouble of fixing
> >up stuff like this, we might as well do it properly and turn things like
> >this into a target hook.
> I agree that pushing hookization further is good as well.  I still think the
> patch in and of itself is a step forward, even if it doesn't hookize
> EH_RETURN_HANDLER_RTX.

yeah, that's more or less my thought, and this makes hookization easier
since you can now mechanically add a hook for each thing in defaults.h
that invokes the macro.  Then for each target you can go through and
replace the macro with an override of the hooks.  That ends up with the
macros replaced by hooks without writing a lot of patches that need to
go through config-list.mk, and testing on multiple targets which imho is
a giant pain, and rather slow.
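
Roughly, the two steps could look like the sketch below.  Everything in it is a simplified stand-in: rtx_sketch, target_hooks_sketch, targetm_sketch and the example override are hypothetical names, and defining the default macro to NULL is an assumption about what the defaults.h entry provides.

```c
#include <stddef.h>

/* Hypothetical stand-in for an rtx value.  */
typedef const char *rtx_sketch;

/* Step 0 (assumed shape of the defaults.h entry): the macro is always
   defined, so generic code no longer needs #ifdef around its uses.  */
#ifndef EH_RETURN_HANDLER_RTX
#define EH_RETURN_HANDLER_RTX NULL
#endif

/* Step 1: a mechanically added default hook that just invokes the
   macro, so targets still using the macro keep working.  */
static rtx_sketch
default_eh_return_handler_rtx (void)
{
  return EH_RETURN_HANDLER_RTX;
}

/* Hypothetical hook table with the default installed.  */
struct target_hooks_sketch
{
  rtx_sketch (*eh_return_handler_rtx) (void);
};

static struct target_hooks_sketch targetm_sketch = {
  default_eh_return_handler_rtx
};

/* Step 2: one target at a time drops its macro definition and installs
   an override of the hook instead.  */
static rtx_sketch
example_target_eh_return_handler_rtx (void)
{
  return "(mem (plus (reg fp) (const_int 8)))";
}
```

Generic code calls through targetm_sketch.eh_return_handler_rtx in both steps, so only the per-target override in step 2 needs target-specific testing.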

Trev

> 
> jeff

Re: [Patch AArch64] Switch constant pools to separate rodata sections.

2015-11-09 Thread James Greenhalgh

On Mon, Nov 09, 2015 at 04:46:01PM +, Ramana Radhakrishnan wrote:
> 
> 
> On 08/11/15 11:42, Andreas Schwab wrote:
> > This is causing a bootstrap comparison failure in gcc/go/gogo.o.
> > 
> > Andreas.
> > 
> 
> I've had a look at this for sometime this afternoon and the trigger is the
> aarch64_use_constant_blocks_p change which appears to be causing a bootstrap
> comparison failure because of differences to offsets in add instructions when
> built with debug and without debug. For now, in the interest of go bootstraps
> continuing on trunk - I'm proposing a patch that partially rolls back the
> change in aarch64_use_constant_blocks_p and will still look into the
> underlying issue.
> 
> Bootstrapped on aarch64-none-linux-gnu including (c,c++ and go) - testing
> finished ok.
> 
> Ok ?

I agree, this seems like the timely way to get the auto-testers back
building.

OK.

Thanks,
James

> 
> 
> Ramana
> 
> 
> PR bootstrap/68256
> 
> * config/aarch64/aarch64.c (aarch64_use_constant_blocks_p): Return false.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 1b7be83..1fff878 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -5251,9 +5251,11 @@ aarch64_can_use_per_function_literal_pools_p (void)
>  static bool
>  aarch64_use_blocks_for_constant_p (machine_mode, const_rtx)
>  {
> -  /* We can't use blocks for constants when we're using a per-function
> - constant pool.  */
> -  return !aarch64_can_use_per_function_literal_pools_p ();
> +  /* Fixme:: In an ideal world this would work similar
> + to the logic in aarch64_select_rtx_section but this
> + breaks bootstrap in gcc go.  For now we workaround
> + this by returning false here.  */
> +  return false;
>  }
>  
>  /* Select appropriate section for constants depending

Re: [PATCH 00/12] misc conditional compilation work

2015-11-09 Thread Jeff Law


On 11/09/2015 09:47 AM, tbsaunde+...@tbsaunde.org wrote:

> From: Trevor Saunders
>
> Hi,
>
> basically $subject, making some code unconditionally compiled, and changing
> other things from #ifdef to #if so they can be made unconditional
> incrementally.
>
> patches individually bootstrapped + regtested on x86_64-linux-gnu, and a
> slightly earlier version of the series ran through config-list.mk.  I think
> everything here is either preapproved, or obvious so I'll commit it later
> today if nobody complains.

Are these the last patches of this nature planned for GCC6?  While the 
window was left slightly open by Richi this morning, I think that's more 
to allow the queues to drain rather than to allow more new work to go 
into the tree :-)


jeff

Re: [PATCH], Add power9 support to GCC, patch #4

2015-11-09 Thread David Edelsohn

On Sun, Nov 8, 2015 at 4:39 PM, Michael Meissner
 wrote:
> This patch adds support for the EXTSWSLI instruction that is being added to
> PowerPC ISA 3.0 (power9).
>
> I have built this patch (along with patches #2 and #3) with a bootstrap build
> on a power8 little endian system.  There were no regressions in the test
> suite.  Is this patch ok to install in the trunk once patch #1 has been
> installed.
>
> [gcc]
> 2015-11-08  Michael Meissner  
>
> * config/rs6000/predicates.md (u6bit_cint_operand): New
> predicate, recognize 0..63.
>
> * config/rs6000/rs6000.c (rs6000_rtx_costs): Adjust the costs if
> the EXTSWSLI instruction is generated.
>
> * config/rs6000/rs6000.h (TARGET_EXTSWSLI): Add support for ISA
> 3.0 EXTSWSLI instruction.
> * config/rs6000/rs6000.md (ashdi3_extswsli): Likewise.
> (ashdi3_extswsli_dot): Likewise.
> (ashdi3_extswsli_dot2): Likewise.
>
> [gcc/testsuite]
> 2015-11-08  Michael Meissner  
>
> * gcc.target/powerpc/extswsli-1.c: New file to test EXTSWSLI
> instruction generation.
> * gcc.target/powerpc/extswsli-2.c: Likewise.
> * gcc.target/powerpc/extswsli-3.c: Likewise.

Okay.

Thanks, David

Re: [PATCH][AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT

2015-11-09 Thread Kyrill Tkachov



On 09/11/15 15:34, Marcus Shawcroft wrote:

> On 9 November 2015 at 11:32, Kyrill Tkachov wrote:
>
> > 2015-11-09  Kyrylo Tkachov
> >
> >  PR target/68129
> >  * config/aarch64/aarch64.h (TARGET_SUPPORTS_WIDE_INT): Define to 1.
> >  * config/aarch64/aarch64.c (aarch64_print_operand, CONST_DOUBLE):
> >  Delete VOIDmode case.  Assert that mode is not VOIDmode.
> >  * config/aarch64/predicates.md (const0_operand): Remove const_double
> >  match.
> >
> > 2015-11-09  Kyrylo Tkachov
> >
> >  PR target/68129
> >  * gcc.target/aarch64/pr68129_1.c: New test.
>
> Hi, This test isn't aarch64 specific, does it need to be in gcc.target/aarch64 ?


Not really, here is the patch with the test in gcc.dg/ if that's preferred.

Thanks,
Kyrill

2015-11-09  Kyrylo Tkachov  

PR target/68129
* config/aarch64/aarch64.h (TARGET_SUPPORTS_WIDE_INT): Define to 1.
* config/aarch64/aarch64.c (aarch64_print_operand, CONST_DOUBLE):
Delete VOIDmode case.  Assert that mode is not VOIDmode.
* config/aarch64/predicates.md (const0_operand): Remove const_double
match.

2015-11-09  Kyrylo Tkachov  

PR target/68129
* gcc.dg/pr68129_1.c: New test.


> Cheers
> /Marcus



commit 623ffaa527b17ad01179c30c1d4a9911243f818a
Author: Kyrylo Tkachov 
Date:   Wed Oct 28 10:49:44 2015 +

[AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ce155dc..927b72a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4403,11 +4403,10 @@ aarch64_print_operand (FILE *f, rtx x, char code)
 	  break;
 
 	case CONST_DOUBLE:
-	  /* CONST_DOUBLE can represent a double-width integer.
-	 In this case, the mode of x is VOIDmode.  */
-	  if (GET_MODE (x) == VOIDmode)
-	; /* Do Nothing.  */
-	  else if (aarch64_float_const_zero_rtx_p (x))
+	  /* Since we define TARGET_SUPPORTS_WIDE_INT we shouldn't ever
+	 be getting CONST_DOUBLEs holding integers.  */
+	  gcc_assert (GET_MODE (x) != VOIDmode);
+	  if (aarch64_float_const_zero_rtx_p (x))
 	{
 	  fputc ('0', f);
 	  break;
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index b041a1e..0fac0a7 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -863,6 +863,8 @@ extern enum aarch64_code_model aarch64_cmodel;
   (aarch64_cmodel == AARCH64_CMODEL_TINY		\
|| aarch64_cmodel == AARCH64_CMODEL_TINY_PIC)
 
+#define TARGET_SUPPORTS_WIDE_INT 1
+
 /* Modes valid for AdvSIMD D registers, i.e. that fit in half a Q register.  */
 #define AARCH64_VALID_SIMD_DREG_MODE(MODE) \
   ((MODE) == V2SImode || (MODE) == V4HImode || (MODE) == V8QImode \
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 1bcbf62..8775460 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -32,7 +32,7 @@ (define_predicate "aarch64_call_insn_operand"
 
 ;; Return true if OP a (const_int 0) operand.
 (define_predicate "const0_operand"
-  (and (match_code "const_int, const_double")
+  (and (match_code "const_int")
(match_test "op == CONST0_RTX (mode)")))
 
 (define_predicate "aarch64_ccmp_immediate"
diff --git a/gcc/testsuite/gcc.dg/pr68129_1.c b/gcc/testsuite/gcc.dg/pr68129_1.c
new file mode 100644
index 000..112331e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68129_1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fno-split-wide-types" } */
+
+typedef int V __attribute__ ((vector_size (8 * sizeof (int))));
+
+void
+foo (V *p, V *q)
+{
+  *p = (*p == *q);
+}
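
For readers unfamiliar with the construct the new test exercises, here is a small standalone sketch of a GCC vector-extension comparison.  The function name compare_lanes is made up for illustration; the typedef mirrors the eight-int vector type used in the test.

```c
/* GCC/Clang vector extension: eight ints per vector, as in the test.  */
typedef int V __attribute__ ((vector_size (8 * sizeof (int))));

/* Hypothetical wrapper around the comparison used in the test.  The ==
   operator works element by element: each lane of the result is -1
   (all bits set) where the inputs are equal and 0 where they differ.  */
static void
compare_lanes (const V *p, const V *q, V *out)
{
  *out = (*p == *q);
}
```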

[PATCH, 2/16] Make create_parallel_loop return void

2015-11-09 Thread Tom de Vries


On 09/11/15 16:35, Tom de Vries wrote:

> Hi,
>
> this patch series for stage1 trunk adds support to:
> - parallelize oacc kernels regions using parloops, and
> - map the loops onto the oacc gang dimension.
>
> The patch series contains these patches:
>
>   1  Insert new exit block only when needed in
>      transform_to_exit_first_loop_alt
>   2  Make create_parallel_loop return void
>   3  Ignore reduction clause on kernels directive
>   4  Implement -foffload-alias
>   5  Add in_oacc_kernels_region in struct loop
>   6  Add pass_oacc_kernels
>   7  Add pass_dominator_oacc_kernels
>   8  Add pass_ch_oacc_kernels
>   9  Add pass_parallelize_loops_oacc_kernels
>  10  Add pass_oacc_kernels pass group in passes.def
>  11  Update testcases after adding kernels pass group
>  12  Handle acc loop directive
>  13  Add c-c++-common/goacc/kernels-*.c
>  14  Add gfortran.dg/goacc/kernels-*.f95
>  15  Add libgomp.oacc-c-c++-common/kernels-*.c
>  16  Add libgomp.oacc-fortran/kernels-*.f95
>
> The first 9 patches are more or less independent, but patches 10-16 are
> intended to be committed at the same time.
>
> Bootstrapped and reg-tested on x86_64.
>
> Build and reg-tested with nvidia accelerator, in combination with a
> patch that enables accelerator testing (which is submitted at
> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).
>
> I'll post the individual patches in reply to this message.


this patch makes create_parallel_loop return void.  The result is 
currently unused.


Thanks,
- Tom

Make create_parallel_loop return void

2015-11-09  Tom de Vries  

	* tree-parloops.c (create_parallel_loop): Return void.
---
 gcc/tree-parloops.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 6a49aa9..17415a8 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -1986,10 +1986,9 @@ transform_to_exit_first_loop (struct loop *loop,
 /* Create the parallel constructs for LOOP as described in gen_parallel_loop.
LOOP_FN and DATA are the arguments of GIMPLE_OMP_PARALLEL.
NEW_DATA is the variable that should be initialized from the argument
-   of LOOP_FN.  N_THREADS is the requested number of threads.  Returns the
-   basic block containing GIMPLE_OMP_PARALLEL tree.  */
+   of LOOP_FN.  N_THREADS is the requested number of threads.  */
 
-static basic_block
+static void
 create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
 		  tree new_data, unsigned n_threads, location_t loc)
 {
@@ -2162,8 +2161,6 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   /* After the above dom info is hosed.  Re-compute it.  */
   free_dominance_info (CDI_DOMINATORS);
   calculate_dominance_info (CDI_DOMINATORS);
-
-  return paral_bb;
 }
 
 /* Generates code to execute the iterations of LOOP in N_THREADS
-- 
1.9.1

Re: [PATCH], Add power9 support to GCC, patch #1 (revised)

2015-11-09 Thread David Edelsohn

On Sun, Nov 8, 2015 at 4:33 PM, Michael Meissner
 wrote:
> This is patch #1 that I revised.  I changed -mfusion-toc to -mtoc-fusion.  I
> changed the references to ISA 2.08 to 3.0.  I added two new debug switches for
> code in future patches that is undergoing development and is not ready to be on
> by default.
>
> I have done a bootstrap build on a little endian power8 system and there were
> no regressions in this patch.  Is it ok to install in the trunk?
>
> 2015-11-08  Michael Meissner  
>
> * config/rs6000/rs6000.opt (-mpower9-fusion): Add new switches for
> ISA 3.0 (power9).
> (-mpower9-vector): Likewise.
> (-mpower9-dform): Likewise.
> (-mpower9-minmax): Likewise.
> (-mtoc-fusion): Likewise.
> (-mmodulo): Likewise.
> (-mfloat128-hardware): Likewise.
>
> * config/rs6000/rs6000-cpus.def (ISA_3_0_MASKS_SERVER): Add option
> mask for ISA 3.0 (power9).
> (POWERPC_MASKS): Add new ISA 3.0 switches.
> (power9 cpu): Add power9 cpu.
>
> * config/rs6000/rs6000.h (ASM_CPU_POWER9_SPEC): Add support for
> power9.
> (ASM_CPU_SPEC): Likewise.
> (EXTRA_SPECS): Likewise.
>
> * config/rs6000/rs6000-opts.h (enum processor_type): Add
> PROCESSOR_POWER9.
>
> * config/rs6000/rs6000.c (power9_cost): Initial cost setup for
> power9.
> (rs6000_debug_reg_global): Add support for power9 fusion.
> (rs6000_setup_reg_addr_masks): Cache mode size.
> (rs6000_option_override_internal): Until real power9 tuning is
> added, use -mtune=power8 for -mcpu=power9.
> (rs6000_setup_reg_addr_masks): Do not allow pre-increment,
> pre-decrement, or pre-modify on SFmode/DFmode if we allow the use
> of Altivec registers.
> (rs6000_option_override_internal): Add support for ISA 3.0
> switches.
> (rs6000_loop_align): Add support for power9 cpu.
> (rs6000_file_start): Likewise.
> (rs6000_adjust_cost): Likewise.
> (rs6000_issue_rate): Likewise.
> (insn_must_be_first_in_group): Likewise.
> (insn_must_be_last_in_group): Likewise.
> (force_new_group): Likewise.
> (rs6000_register_move_cost): Likewise.
> (rs6000_opt_masks): Likewise.
>
> * config/rs6000/rs6000.md (cpu attribute): Add power9.
> * config/rs6000/rs6000-tables.opt: Regenerate.
>
> * config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
> _ARCH_PWR9 if power9 support is available.
>
> * config/rs6000/aix61.h (ASM_CPU_SPEC): Add power9.
> * config/rs6000/aix53.h (ASM_CPU_SPEC): Likewise.
>
> * configure.ac: Determine if the assembler supports the ISA 3.0
> instructions.
> * config.in (HAVE_AS_POWER9): Likewise.
> * configure: Regenerate.
>
> * doc/invoke.texi (RS/6000 and PowerPC Options): Document ISA 3.0
> switches.

Okay.

Thanks, David

[PATCH, 5/16] Add in_oacc_kernels_region in struct loop

2015-11-09 Thread Tom de Vries


On 09/11/15 16:35, Tom de Vries wrote:

> Hi,
>
> this patch series for stage1 trunk adds support to:
> - parallelize oacc kernels regions using parloops, and
> - map the loops onto the oacc gang dimension.
>
> The patch series contains these patches:
>
>   1  Insert new exit block only when needed in
>      transform_to_exit_first_loop_alt
>   2  Make create_parallel_loop return void
>   3  Ignore reduction clause on kernels directive
>   4  Implement -foffload-alias
>   5  Add in_oacc_kernels_region in struct loop
>   6  Add pass_oacc_kernels
>   7  Add pass_dominator_oacc_kernels
>   8  Add pass_ch_oacc_kernels
>   9  Add pass_parallelize_loops_oacc_kernels
>  10  Add pass_oacc_kernels pass group in passes.def
>  11  Update testcases after adding kernels pass group
>  12  Handle acc loop directive
>  13  Add c-c++-common/goacc/kernels-*.c
>  14  Add gfortran.dg/goacc/kernels-*.f95
>  15  Add libgomp.oacc-c-c++-common/kernels-*.c
>  16  Add libgomp.oacc-fortran/kernels-*.f95
>
> The first 9 patches are more or less independent, but patches 10-16 are
> intended to be committed at the same time.
>
> Bootstrapped and reg-tested on x86_64.
>
> Build and reg-tested with nvidia accelerator, in combination with a
> patch that enables accelerator testing (which is submitted at
> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).
>
> I'll post the individual patches in reply to this message.


this patch adds and initializes the in_oacc_kernels_region field
in struct loop.


The field is used to signal to subsequent passes that we're dealing with
a loop in a kernels region that we're trying to parallelize.


Note that we do not parallelize kernels regions with more than one loop
nest. [ In general, kernels regions with more than one loop nest should
be split up into separate kernels regions, but that's not supported atm. ]
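
The "exactly one outer loop" rule can be sketched independently of the CFG machinery.  In the fragment below the dominance queries are assumed to be precomputed into per-loop flags, and loop_info_sketch, loop_in_region_p and mark_loops_sketch are hypothetical names; the patch itself works on bitmaps of basic-block indices.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened view of one loop.  */
struct loop_info_sketch
{
  bool is_outermost;            /* Directly under the loop tree root.  */
  bool header_in_dominated;     /* Header dominated by the region entry.  */
  bool header_in_excluded;      /* Header dominated by the region exit.  */
  bool in_oacc_kernels_region;  /* Output flag.  */
};

/* A loop belongs to the region if its header is dominated by the
   region entry but not by the region exit.  */
static bool
loop_in_region_p (const struct loop_info_sketch *l)
{
  return l->header_in_dominated && !l->header_in_excluded;
}

/* Mark the loops of the region, but only if the region contains exactly
   one outermost loop.  Returns true if any marking was done.  */
static bool
mark_loops_sketch (struct loop_info_sketch *loops, size_t n)
{
  size_t nr_outer_loops = 0;

  for (size_t i = 0; i < n; i++)
    if (loops[i].is_outermost && loop_in_region_p (&loops[i]))
      nr_outer_loops++;

  if (nr_outer_loops != 1)
    return false;

  for (size_t i = 0; i < n; i++)
    if (loop_in_region_p (&loops[i]))
      loops[i].in_oacc_kernels_region = true;

  return true;
}
```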


Thanks,
- Tom

Add in_oacc_kernels_region in struct loop

2015-11-09  Tom de Vries  

	* cfgloop.h (struct loop): Add in_oacc_kernels_region field.
	* omp-low.c (mark_loops_in_oacc_kernels_region): New function.
	(expand_omp_target): Call mark_loops_in_oacc_kernels_region.
---
 gcc/cfgloop.h |  3 +++
 gcc/omp-low.c | 58 ++
 2 files changed, 61 insertions(+)

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 6af6893..ee73bf9 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -191,6 +191,9 @@ struct GTY ((chain_next ("%h.next"))) loop {
   /* True if we should try harder to vectorize this loop.  */
   bool force_vectorize;
 
+  /* True if the loop is part of an oacc kernels region.  */
+  bool in_oacc_kernels_region;
+
   /* For SIMD loops, this is a unique identifier of the loop, referenced
  by IFN_GOMP_SIMD_VF, IFN_GOMP_SIMD_LANE and IFN_GOMP_SIMD_LAST_LANE
  builtins.  */
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index d052c13..7121d73 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -12429,6 +12429,61 @@ get_oacc_ifn_dim_arg (const gimple *stmt)
   return (int) axis;
 }
 
+/* Mark the loops inside the kernels region starting at REGION_ENTRY and ending
+   at REGION_EXIT.  */
+
+static void
+mark_loops_in_oacc_kernels_region (basic_block region_entry,
+   basic_block region_exit)
+{
+  bitmap dominated_bitmap = BITMAP_GGC_ALLOC ();
+  bitmap excludes_bitmap = BITMAP_GGC_ALLOC ();
+  unsigned di;
+  basic_block bb;
+
+  bitmap_clear (dominated_bitmap);
+  bitmap_clear (excludes_bitmap);
+
+  /* Get all the blocks dominated by the region entry.  That will include the
+ entire region.  */
+  vec<basic_block> dominated
+= get_all_dominated_blocks (CDI_DOMINATORS, region_entry);
+  FOR_EACH_VEC_ELT (dominated, di, bb)
+  bitmap_set_bit (dominated_bitmap, bb->index);
+
+  /* Exclude all the blocks which are not in the region: the blocks dominated by
+ the region exit.  */
+  if (region_exit != NULL)
+{
+  vec<basic_block> excludes
+	= get_all_dominated_blocks (CDI_DOMINATORS, region_exit);
+  FOR_EACH_VEC_ELT (excludes, di, bb)
+	bitmap_set_bit (excludes_bitmap, bb->index);
+}
+
+  /* Don't parallelize the kernels region if it contains more than one outer
+ loop.  */
+  unsigned int nr_outer_loops = 0;
+  struct loop *loop;
+  FOR_EACH_LOOP (loop, 0)
+{
+  if (loop_outer (loop) != current_loops->tree_root)
+	continue;
+
+  if (bitmap_bit_p (dominated_bitmap, loop->header->index)
+	  && !bitmap_bit_p (excludes_bitmap, loop->header->index))
+	nr_outer_loops++;
+}
+  if (nr_outer_loops != 1)
+return;
+
+  /* Mark the loops in the region.  */
+  FOR_EACH_LOOP (loop, 0)
+if (bitmap_bit_p (dominated_bitmap, loop->header->index)
+	&& !bitmap_bit_p (excludes_bitmap, loop->header->index))
+  loop->in_oacc_kernels_region = true;
+}
+
 /* Expand the GIMPLE_OMP_TARGET starting at REGION.  */
 
 static void
@@ -12483,6 +12538,9 @@ expand_omp_target (struct omp_region *region)
   entry_bb = region->entry;
   exit_bb = region->exit;

[PATCH 6/6] Simplify aarch64_builtin_vectorized_function

2015-11-09 Thread Richard Sandiford

After the previous patches it's no longer necessary for
TARGET_BUILTIN_VECTORIZED_FUNCTION to return functions that
map to the vector optab of the original operation.  We'll use
a vector form of the internal function instead.


gcc/
* config/aarch64/aarch64-builtins.c
(aarch64_builtin_vectorized_function): Remove entries that map
directly to optabs.

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index c4cda4f..2a560a9 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1288,40 +1288,6 @@ aarch64_builtin_vectorized_function (unsigned int fn, tree type_out,
 {
 #undef AARCH64_CHECK_BUILTIN_MODE
 #define AARCH64_CHECK_BUILTIN_MODE(C, N) \
-  (out_mode == N##Fmode && out_n == C \
-   && in_mode == N##Fmode && in_n == C)
-CASE_CFN_FLOOR:
-  return AARCH64_FIND_FRINT_VARIANT (floor);
-CASE_CFN_CEIL:
-  return AARCH64_FIND_FRINT_VARIANT (ceil);
-CASE_CFN_TRUNC:
-  return AARCH64_FIND_FRINT_VARIANT (btrunc);
-CASE_CFN_ROUND:
-  return AARCH64_FIND_FRINT_VARIANT (round);
-CASE_CFN_NEARBYINT:
-  return AARCH64_FIND_FRINT_VARIANT (nearbyint);
-CASE_CFN_SQRT:
-  return AARCH64_FIND_FRINT_VARIANT (sqrt);
-#undef AARCH64_CHECK_BUILTIN_MODE
-#define AARCH64_CHECK_BUILTIN_MODE(C, N) \
-  (out_mode == SImode && out_n == C \
-   && in_mode == N##Imode && in_n == C)
-CASE_CFN_CLZ:
-  {
-   if (AARCH64_CHECK_BUILTIN_MODE (4, S))
- return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_clzv4si];
-   return NULL_TREE;
-  }
-CASE_CFN_CTZ:
-  {
-   if (AARCH64_CHECK_BUILTIN_MODE (2, S))
- return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_ctzv2si];
-   else if (AARCH64_CHECK_BUILTIN_MODE (4, S))
- return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_ctzv4si];
-   return NULL_TREE;
-  }
-#undef AARCH64_CHECK_BUILTIN_MODE
-#define AARCH64_CHECK_BUILTIN_MODE(C, N) \
   (out_mode == N##Imode && out_n == C \
&& in_mode == N##Fmode && in_n == C)
 CASE_CFN_IFLOOR:

Re: [OpenACC] declare directive

2015-11-09 Thread James Norris


Jakub,

On 11/09/2015 10:21 AM, Jakub Jelinek wrote:

On Mon, Nov 09, 2015 at 10:01:32AM -0600, James Norris wrote:

+  if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))

Here you only look up "omp declare target", not "omp declare target link".
So, what happens if you mix that (once in some copy clause, once in link),
or mention twice in link, etc.?  Needs testsuite coverage and clear rules.


Will fix.




+ DECL_ATTRIBUTES (decl) =
+   tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (decl));

Incorrect formatting, = goes already on the following line, no whitespace
at end of line, and next line is indented below CL from DECL.


Will fix.




+ t = build_omp_clause (OMP_CLAUSE_LOCATION (c) , OMP_CLAUSE_MAP);

Wrong formatting, no space before ,.


Will fix.


+if (ret_clauses)
+  {
+   tree fndecl = current_function_decl;
+   tree attrs = lookup_attribute ("oacc declare returns",
+  DECL_ATTRIBUTES (fndecl));

Why do you use an attribute for this?  I think adding the automatic
vars to hash_map during gimplification of the OACC_DECLARE is best.


See below (This doesn't scale...)




+   tree id = get_identifier ("oacc declare returns");
+   DECL_ATTRIBUTES (fndecl) =
+   tree_cons (id, ret_clauses, DECL_ATTRIBUTES (fndecl));

Formatting error.


Will fix.




--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -1065,6 +1065,7 @@ gimplify_bind_expr (tree *expr_p, gimple_seq *pre_p)
gimple_seq body, cleanup;
gcall *stack_save;
location_t start_locus = 0, end_locus = 0;
+  tree ret_clauses = NULL;
  
tree temp = voidify_wrapper_expr (bind_expr, NULL);
  
@@ -1166,9 +1167,56 @@ gimplify_bind_expr (tree *expr_p, gimple_seq *pre_p)

  clobber_stmt = gimple_build_assign (t, clobber);
  gimple_set_location (clobber_stmt, end_locus);
  gimplify_seq_add_stmt (&cleanup, clobber_stmt);
+
+ if (flag_openacc)
+   {
+ tree attrs = lookup_attribute ("oacc declare returns",
+  DECL_ATTRIBUTES (current_function_decl));
+ tree clauses, c, c_next = NULL, c_prev = NULL;
+
+ if (!attrs)
+   break;
+
+ clauses = TREE_VALUE (attrs);
+
+ for (c = clauses; c; c_prev = c, c = c_next)
+   {
+ c_next = OMP_CLAUSE_CHAIN (c);
+
+ if (t == OMP_CLAUSE_DECL (c))
+   {
+ if (ret_clauses)
+   OMP_CLAUSE_CHAIN (c) = ret_clauses;
+
+ ret_clauses = c;
+
+ if (c_prev == NULL)
+   clauses = c_next;
+ else
+   OMP_CLAUSE_CHAIN (c_prev) = c_next;
+   }
+   }

This doesn't really scale.  Consider 1 clauses on various
oacc declare constructs in a single function, and 100 automatic
variables in such a function.
So, what I'm suggesting is during gimplification of OACC_DECLARE,
if you find a clause on an automatic variable in the current function
that you want to unmap afterwards, have a
static hash_map<tree, tree> *oacc_declare_returns;
and you just add into the hash map the VAR_DECL -> the clause you want,
then in this spot you check
   if (oacc_declare_returns)
 {
   clause = lookup in hash_map (t);
   if (clause)
{
  ...
}
 }


Now I see what you were getting at in using the hash_map. I didn't
consider creating a static hash_map and populating it as you suggest.
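
For readers following along, the suggested scheme can be sketched outside of GCC roughly as follows. This is only an illustration under assumptions: it uses std::unordered_map rather than GCC's hash_map, and var_id, clause, record_return_clause and take_return_clause are hypothetical names. The point is that each variable leaving scope costs one hash lookup, instead of a walk over every clause recorded for the function.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins: an int names a variable, a string names a clause.
typedef int var_id;
typedef std::string clause;

// Filled while the declare directive is processed: one entry per automatic
// variable that needs an action when its scope ends.
static std::unordered_map<var_id, clause> declare_returns;

// Remember that VAR needs clause C applied at scope exit.
static void
record_return_clause (var_id var, const clause &c)
{
  declare_returns[var] = c;
}

// Called once per variable at scope exit.  Looks the variable up in expected
// constant time, removes the entry, and stores the clause in *OUT.
// Returns false if nothing was recorded for VAR.
static bool
take_return_clause (var_id var, clause *out)
{
  auto it = declare_returns.find (var);
  if (it == declare_returns.end ())
    return false;
  *out = it->second;
  declare_returns.erase (it);
  return true;
}
```

Erasing the entry on lookup mirrors the removal from the clause chain in the attribute-based version, so a second lookup for the same variable finds nothing.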

Thank you!




+
+ if (clauses == NULL)
+   {
+ DECL_ATTRIBUTES (current_function_decl) =
+   remove_attribute ("oacc declare returns",
+   DECL_ATTRIBUTES (current_function_decl));

Wrong formatting.


Will fix.



Jakub


Thanks for taking the time to review.

Jim

[PATCH 11/12] always define HAVE_AS_LEB128

2015-11-09 Thread tbsaunde+gcc

From: Trevor Saunders 

gcc/ChangeLog:

2015-11-09  Trevor Saunders  

* acinclude.m4: Always define HAVE_AS_LEB128.
* configure: Regenerate.
* configure.ac: Adjust.
* dwarf2asm.c (dw2_asm_output_data_uleb128): Likewise.
(dw2_asm_output_data_sleb128): Likewise.
(dw2_asm_output_delta_uleb128): Likewise.
(dw2_asm_output_delta_sleb128): Likewise.
* except.c (output_one_function_exception_table): Likewise.
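
A minimal sketch of what an always-defined 0/1 macro buys, assuming a made-up macro TOY_HAVE_AS_LEB128 and a made-up helper output_uleb128 (neither is taken from the patch): the test can be an ordinary `if`, so both branches are parsed and type-checked in every configuration, while the compiler can still discard the dead one.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a configure-provided macro that is always
// defined, to either 0 or 1.
#ifndef TOY_HAVE_AS_LEB128
#define TOY_HAVE_AS_LEB128 0
#endif

// Produce assembler text for an unsigned LEB128 value.
static std::string
output_uleb128 (unsigned long value)
{
  if (TOY_HAVE_AS_LEB128)
    // Let the assembler do the encoding.
    return "\t.uleb128 " + std::to_string (value);

  // Otherwise emit the bytes ourselves: 7 bits per byte, low bits first,
  // with the top bit set on every byte except the last.
  std::string out = "\t.byte ";
  for (;;)
    {
      unsigned byte = value & 0x7f;
      value >>= 7;
      if (value != 0)
        byte |= 0x80;
      char buf[8];
      std::snprintf (buf, sizeof buf, "0x%x", byte);
      out += buf;
      if (value == 0)
        break;
      out += ",";
    }
  return out;
}
```

With an `#ifdef` test, a typo in the branch that is not selected goes unnoticed until someone builds the other configuration; with the plain `if`, it is a compile error everywhere.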
---
 gcc/acinclude.m4 |  4 +++
 gcc/configure| 98 +++-
 gcc/configure.ac |  2 ++
 gcc/dwarf2asm.c  |  8 ++---
 gcc/except.c | 18 +--
 5 files changed, 116 insertions(+), 14 deletions(-)

diff --git a/gcc/acinclude.m4 b/gcc/acinclude.m4
index b8a4c28..e7d75c8 100644
--- a/gcc/acinclude.m4
+++ b/gcc/acinclude.m4
@@ -550,6 +550,10 @@ AC_CACHE_CHECK([assembler for $1], [$2],
 ifelse([$7],,,[dnl
 if test $[$2] = yes; then
   $7
+fi])
+ifelse([$8],,,[dnl
+if test $[$2] != yes; then
+  $8
 fi])])
 
 dnl gcc_SUN_LD_VERSION
diff --git a/gcc/configure b/gcc/configure
index de6cf13..14d828c 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -22411,6 +22411,7 @@ $as_echo "#define HAVE_GAS_BALIGN_AND_P2ALIGN 1" 
>>confdefs.h
 
 fi
 
+
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for .p2align with 
maximum skip" >&5
 $as_echo_n "checking assembler for .p2align with maximum skip... " >&6; }
 if test "${gcc_cv_as_max_skip_p2align+set}" = set; then :
@@ -22446,6 +22447,7 @@ $as_echo "#define HAVE_GAS_MAX_SKIP_P2ALIGN 1" 
>>confdefs.h
 
 fi
 
+
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for .literal16" >&5
 $as_echo_n "checking assembler for .literal16... " >&6; }
 if test "${gcc_cv_as_literal16+set}" = set; then :
@@ -22481,6 +22483,7 @@ $as_echo "#define HAVE_GAS_LITERAL16 1" >>confdefs.h
 
 fi
 
+
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for working 
.subsection -1" >&5
 $as_echo_n "checking assembler for working .subsection -1... " >&6; }
 if test "${gcc_cv_as_subsection_m1+set}" = set; then :
@@ -22528,6 +22531,7 @@ $as_echo "#define HAVE_GAS_SUBSECTION_ORDERING 1" 
>>confdefs.h
 
 fi
 
+
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for .weak" >&5
 $as_echo_n "checking assembler for .weak... " >&6; }
 if test "${gcc_cv_as_weak+set}" = set; then :
@@ -22563,6 +22567,7 @@ $as_echo "#define HAVE_GAS_WEAK 1" >>confdefs.h
 
 fi
 
+
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for .weakref" >&5
 $as_echo_n "checking assembler for .weakref... " >&6; }
 if test "${gcc_cv_as_weakref+set}" = set; then :
@@ -22598,6 +22603,7 @@ $as_echo "#define HAVE_GAS_WEAKREF 1" >>confdefs.h
 
 fi
 
+
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for .nsubspa 
comdat" >&5
 $as_echo_n "checking assembler for .nsubspa comdat... " >&6; }
 if test "${gcc_cv_as_nsubspa_comdat+set}" = set; then :
@@ -22634,6 +22640,7 @@ $as_echo "#define HAVE_GAS_NSUBSPA_COMDAT 1" 
>>confdefs.h
 
 fi
 
+
 # .hidden needs to be supported in both the assembler and the linker,
 # because GNU LD versions before 2.12.1 have buggy support for STV_HIDDEN.
 # This is irritatingly difficult to feature test for; we have to check the
@@ -22673,6 +22680,7 @@ fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_hidden" >&5
 $as_echo "$gcc_cv_as_hidden" >&6; }
 
+
 case "${target}" in
   *-*-darwin*)
 # Darwin as has some visibility support, though with a different syntax.
@@ -23125,6 +23133,11 @@ if test $gcc_cv_as_leb128 = yes; then
 $as_echo "#define HAVE_AS_LEB128 1" >>confdefs.h
 
 fi
+if test $gcc_cv_as_leb128 != yes; then
+
+$as_echo "#define HAVE_AS_LEB128 0" >>confdefs.h
+
+fi
 
 # Check if we have assembler support for unwind directives.
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for cfi 
directives" >&5
@@ -23204,6 +23217,7 @@ fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_cfi_directive" >&5
 $as_echo "$gcc_cv_as_cfi_directive" >&6; }
 
+
 if test $gcc_cv_as_cfi_directive = yes && test x$gcc_cv_objdump != x; then
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for working cfi 
advance" >&5
 $as_echo_n "checking assembler for working cfi advance... " >&6; }
@@ -23241,6 +23255,7 @@ fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: result: 
$gcc_cv_as_cfi_advance_working" >&5
 $as_echo "$gcc_cv_as_cfi_advance_working" >&6; }
 
+
 else
   # no objdump, err on the side of caution
   gcc_cv_as_cfi_advance_working=no
@@ -23284,6 +23299,7 @@ fi
 $as_echo "$gcc_cv_as_cfi_personality_directive" >&6; }
 
 
+
 cat >>confdefs.h <<_ACEOF
 #define HAVE_GAS_CFI_PERSONALITY_DIRECTIVE `if test 
$gcc_cv_as_cfi_personality_directive = yes;
 then echo 1; else echo 0; fi`
@@ -23336,6 +23352,7 @@ $as_echo "$gcc_cv_as_cfi_sections_directive" >&6; }
 
 
 
+
 cat >>confdefs.h <<_ACEOF
 #define HAVE_GAS_CFI_SECTIONS_DIRECTIVE `if test

[PATCH 3/6] Vectorize internal functions

2015-11-09 Thread Richard Sandiford

This patch tries to vectorize built-in and internal functions as
internal functions first, falling back on the current built-in
target hooks otherwise.
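
The order of attempts can be pictured with the following standalone sketch. It is only an illustration: vect_kind, vect_choice and choose_vector_form are made-up names, and the two callbacks stand in for "is there a supported vector form of the internal function?" and "does the target hook return a vectorized builtin?".

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical answer to "how can this call be vectorized?".
enum class vect_kind { none, internal_fn, target_builtin };

struct vect_choice
{
  vect_kind kind;
  std::string name;
};

// Try the generic internal-function path first; consult the target-specific
// hook only if that fails.  An empty string from TARGET_HOOK means "no".
static vect_choice
choose_vector_form (const std::string &fn,
                    const std::function<bool (const std::string &)> &internal_ok,
                    const std::function<std::string (const std::string &)> &target_hook)
{
  if (internal_ok (fn))
    return { vect_kind::internal_fn, "internal:" + fn };
  std::string decl = target_hook (fn);
  if (!decl.empty ())
    return { vect_kind::target_builtin, decl };
  return { vect_kind::none, "" };
}
```

Because the generic path is tried first, a target only needs its hook for functions that have no directly supported internal-function form.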


gcc/
* internal-fn.h (direct_internal_fn_info): Add vectorizable flag.
* internal-fn.c (direct_internal_fn_array): Update accordingly.
* tree-vectorizer.h (vectorizable_function): Delete.
* tree-vect-stmts.c: Include internal-fn.h.
(vectorizable_internal_function): New function.
(vectorizable_function): Inline into...
(vectorizable_call): ...here.  Explicitly reject calls that read
from or write to memory.  Try using an internal function before
falling back on the old vectorizable_function behavior.

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 898c83d..a5bda2f 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -69,13 +69,13 @@ init_internal_fns ()
 
 /* Create static initializers for the information returned by
direct_internal_fn.  */
-#define not_direct { -2, -2 }
-#define mask_load_direct { -1, -1 }
-#define load_lanes_direct { -1, -1 }
-#define mask_store_direct { 3, 3 }
-#define store_lanes_direct { 0, 0 }
-#define unary_direct { 0, 0 }
-#define binary_direct { 0, 0 }
+#define not_direct { -2, -2, false }
+#define mask_load_direct { -1, -1, false }
+#define load_lanes_direct { -1, -1, false }
+#define mask_store_direct { 3, 3, false }
+#define store_lanes_direct { 0, 0, false }
+#define unary_direct { 0, 0, true }
+#define binary_direct { 0, 0, true }
 
 const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct,
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 6cb123f..aea6abd 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -134,6 +134,14 @@ struct direct_internal_fn_info
  function isn't directly mapped to an optab.  */
   signed int type0 : 8;
   signed int type1 : 8;
+  /* True if the function is pointwise, so that it can be vectorized by
+ converting the return type and all argument types to vectors of the
+ same number of elements.  E.g. we can vectorize an IFN_SQRT on
+ floats as an IFN_SQRT on vectors of N floats.
+
+ This only needs 1 bit, but occupies the full 16 to ensure a nice
+ layout.  */
+  unsigned int vectorizable : 16;
 };
 
 extern const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1];
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 75389c4..1142142 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "tree-vectorizer.h"
 #include "builtins.h"
+#include "internal-fn.h"
 
 /* For lang_hooks.types.type_for_mode.  */
 #include "langhooks.h"
@@ -1632,27 +1633,32 @@ vect_finish_stmt_generation (gimple *stmt, gimple *vec_stmt,
 add_stmt_to_eh_lp (vec_stmt, lp_nr);
 }
 
-/* Checks if CALL can be vectorized in type VECTYPE.  Returns
-   a function declaration if the target has a vectorized version
-   of the function, or NULL_TREE if the function cannot be vectorized.  */
+/* We want to vectorize a call to combined function CFN with function
+   decl FNDECL, using VECTYPE_OUT as the type of the output and VECTYPE_IN
+   as the types of all inputs.  Check whether this is possible using
+   an internal function, returning its code if so or IFN_LAST if not.  */
 
-tree
-vectorizable_function (gcall *call, tree vectype_out, tree vectype_in)
+static internal_fn
+vectorizable_internal_function (combined_fn cfn, tree fndecl,
+   tree vectype_out, tree vectype_in)
 {
-  /* We only handle functions that do not read or clobber memory.  */
-  if (gimple_vuse (call))
-return NULL_TREE;
-
-  combined_fn fn = gimple_call_combined_fn (call);
-  if (fn != CFN_LAST)
-return targetm.vectorize.builtin_vectorized_function
-  (fn, vectype_out, vectype_in);
-
-  if (gimple_call_builtin_p (call, BUILT_IN_MD))
-return targetm.vectorize.builtin_md_vectorized_function
-  (gimple_call_fndecl (call), vectype_out, vectype_in);
-
-  return NULL_TREE;
+  internal_fn ifn;
+  if (internal_fn_p (cfn))
+ifn = as_internal_fn (cfn);
+  else
+ifn = associated_internal_fn (fndecl);
+  if (ifn != IFN_LAST && direct_internal_fn_p (ifn))
+{
+  const direct_internal_fn_info &info = direct_internal_fn (ifn);
+  if (info.vectorizable)
+   {
+ tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
+ tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
+ if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1)))
+   return ifn;
+   }
+}
+  return IFN_LAST;
 }
 
 
@@ -2232,15 +2238,43 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, gimple **vec_stmt,
   else
 return false;
 
+  /* We only handle functions that do not read or clobber memory.  */
+  if (gimple_vuse (stmt))
+{
+

[PATCH 07/12] always define DBX_DEBUGGING_INFO

2015-11-09 Thread tbsaunde+gcc

From: Trevor Saunders 

gcc/ChangeLog:

2015-11-09  Trevor Saunders  

* config/arc/arc.h: Define DBX_DEBUGGING_INFO to 1.
* config/pdp11/pdp11.h: Likewise.
* defaults.h (DBX_DEBUGGING_INFO): New default definition.
* config/rs6000/rs6000.c (macho_branch_islands): Adjust.
* dbxout.c (struct dbx_file): Likewise.
(debug_flush_symbol_queue): Likewise.
(default_stabs_asm_out_destructor): Likewise.
(default_stabs_asm_out_constructor): Likewise.
* dbxout.h: Likewise.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Adjust.
* final.c: Likewise.
* gcc.c: Likewise.
* opts.c (set_debug_level): Likewise.
* toplev.c (process_options): Likewise.
---
 gcc/config/arc/arc.h   |  2 +-
 gcc/config/pdp11/pdp11.h   |  2 +-
 gcc/config/rs6000/rs6000.c |  4 ++--
 gcc/dbxout.c   | 20 ++--
 gcc/dbxout.h   |  2 +-
 gcc/defaults.h |  8 ++--
 gcc/doc/tm.texi|  4 ++--
 gcc/doc/tm.texi.in |  4 ++--
 gcc/final.c|  2 +-
 gcc/gcc.c  |  4 ++--
 gcc/opts.c |  2 +-
 gcc/toplev.c   |  6 ++
 12 files changed, 31 insertions(+), 29 deletions(-)

diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index cb98bda..b40b04f 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -1441,7 +1441,7 @@ extern int arc_return_address_regs[4];
 #ifdef DBX_DEBUGGING_INFO
 #undef DBX_DEBUGGING_INFO
 #endif
-#define DBX_DEBUGGING_INFO
+#define DBX_DEBUGGING_INFO 1
 
 #ifdef DWARF2_DEBUGGING_INFO
 #undef DWARF2_DEBUGGING_INFO
diff --git a/gcc/config/pdp11/pdp11.h b/gcc/config/pdp11/pdp11.h
index 8339f1c..2c82e2c 100644
--- a/gcc/config/pdp11/pdp11.h
+++ b/gcc/config/pdp11/pdp11.h
@@ -38,7 +38,7 @@ along with GCC; see the file COPYING3.  If not see
 
 /* Generate DBX debugging information.  */
 
-#define DBX_DEBUGGING_INFO
+#define DBX_DEBUGGING_INFO 1
 
 #define TARGET_40_PLUS (TARGET_40 || TARGET_45)
 #define TARGET_10  (! TARGET_40_PLUS)
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6ed82cb..4ccee23 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -30455,7 +30455,7 @@ macho_branch_islands (void)
}
   strcpy (tmp_buf, "\n");
   strcat (tmp_buf, label);
-#if defined (DBX_DEBUGGING_INFO) || (XCOFF_DEBUGGING_INFO)
+#if (DBX_DEBUGGING_INFO) || (XCOFF_DEBUGGING_INFO)
   if (write_symbols == DBX_DEBUG || write_symbols == XCOFF_DEBUG)
dbxout_stabd (N_SLINE, bi->line_number);
 #endif /* DBX_DEBUGGING_INFO || XCOFF_DEBUGGING_INFO */
@@ -30505,7 +30505,7 @@ macho_branch_islands (void)
  strcat (tmp_buf, ")\n\tmtctr r12\n\tbctr");
}
   output_asm_insn (tmp_buf, 0);
-#if defined (DBX_DEBUGGING_INFO) || (XCOFF_DEBUGGING_INFO)
+#if (DBX_DEBUGGING_INFO) || (XCOFF_DEBUGGING_INFO)
   if (write_symbols == DBX_DEBUG || write_symbols == XCOFF_DEBUG)
dbxout_stabd (N_SLINE, bi->line_number);
 #endif /* DBX_DEBUGGING_INFO || XCOFF_DEBUGGING_INFO */
diff --git a/gcc/dbxout.c b/gcc/dbxout.c
index d9bd59f..993ceda 100644
--- a/gcc/dbxout.c
+++ b/gcc/dbxout.c
@@ -217,7 +217,7 @@ struct dbx_file
should always be 0 because we should not have needed any file numbers
yet.  */
 
-#if (defined (DBX_DEBUGGING_INFO) || (XCOFF_DEBUGGING_INFO)) \
+#if ((DBX_DEBUGGING_INFO) || (XCOFF_DEBUGGING_INFO)) \
 && defined (DBX_USE_BINCL)
 static struct dbx_file *current_file;
 #endif
@@ -250,7 +250,7 @@ static GTY(()) int lastfile_is_base;
 /* Typical USG systems don't have stab.h, and they also have
no use for DBX-format debugging info.  */
 
-#if defined (DBX_DEBUGGING_INFO) || (XCOFF_DEBUGGING_INFO)
+#if (DBX_DEBUGGING_INFO) || (XCOFF_DEBUGGING_INFO)
 
 #ifdef DBX_USE_BINCL
 /* If zero then there is no pending BINCL.  */
@@ -329,7 +329,7 @@ static void dbxout_handle_pch (unsigned);
 static void debug_free_queue (void);
 
 /* The debug hooks structure.  */
-#if defined (DBX_DEBUGGING_INFO)
+#if (DBX_DEBUGGING_INFO)
 
 static void dbxout_source_line (unsigned int, const char *, int, bool);
 static void dbxout_begin_prologue (unsigned int, const char *);
@@ -860,7 +860,7 @@ dbxout_finish_complex_stabs (tree sym, stab_code_type code,
   obstack_free (&stabstr_ob, str);
 }
 
-#if defined (DBX_DEBUGGING_INFO) || (XCOFF_DEBUGGING_INFO)
+#if (DBX_DEBUGGING_INFO) || (XCOFF_DEBUGGING_INFO)
 
 /* When -gused is used, emit debug info for only used symbols. But in
addition to the standard intercepted debug_hooks there are some
@@ -885,7 +885,7 @@ static int symbol_queue_size = 0;
 
 #endif /* DBX_DEBUGGING_INFO || XCOFF_DEBUGGING_INFO */
 
-#if defined (DBX_DEBUGGING_INFO)
+#if (DBX_DEBUGGING_INFO)
 
 static void
 dbxout_function_end (tree decl ATTRIBUTE_UNUSED)
@@ -1207,7 +1207,7 @@ dbxout_handle_pch

Re: [Patch AArch64] Switch constant pools to separate rodata sections.

2015-11-09 Thread Ramana Radhakrishnan

On 08/11/15 11:42, Andreas Schwab wrote:
> This is causing a bootstrap comparison failure in gcc/go/gogo.o.
> 
> Andreas.
> 

I had a look at this for some time this afternoon. The trigger is the 
aarch64_use_blocks_for_constant_p change, which appears to cause a bootstrap 
comparison failure because of differences in offsets in add instructions 
between builds with and without debug info. For now, in the interest of go 
bootstraps continuing on trunk, I'm proposing a patch that partially rolls 
back the change in aarch64_use_blocks_for_constant_p; I will still look into 
the underlying issue.

Bootstrapped on aarch64-none-linux-gnu including (c,c++ and go) - testing 
finished ok.

Ok ?

Ramana

PR bootstrap/68256

* config/aarch64/aarch64.c (aarch64_use_blocks_for_constant_p): Return false.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1b7be83..1fff878 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5251,9 +5251,11 @@ aarch64_can_use_per_function_literal_pools_p (void)
 static bool
 aarch64_use_blocks_for_constant_p (machine_mode, const_rtx)
 {
-  /* We can't use blocks for constants when we're using a per-function
- constant pool.  */
-  return !aarch64_can_use_per_function_literal_pools_p ();
+  /* Fixme:: In an ideal world this would work similar
+ to the logic in aarch64_select_rtx_section but this
+ breaks bootstrap in gcc go.  For now we workaround
+ this by returning false here.  */
+  return false;
 }

 /* Select appropriate section for constants depending

[PATCH series, 16] Use parloops to parallelize oacc kernels regions

2015-11-09 Thread Tom de Vries


Hi,

this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.

The patch series contains these patches:

 1  Insert new exit block only when needed in
transform_to_exit_first_loop_alt
 2  Make create_parallel_loop return void
 3  Ignore reduction clause on kernels directive
 4  Implement -foffload-alias
 5  Add in_oacc_kernels_region in struct loop
 6  Add pass_oacc_kernels
 7  Add pass_dominator_oacc_kernels
 8  Add pass_ch_oacc_kernels
 9  Add pass_parallelize_loops_oacc_kernels
10  Add pass_oacc_kernels pass group in passes.def
11  Update testcases after adding kernels pass group
12  Handle acc loop directive
13  Add c-c++-common/goacc/kernels-*.c
14  Add gfortran.dg/goacc/kernels-*.f95
15  Add libgomp.oacc-c-c++-common/kernels-*.c
16  Add libgomp.oacc-fortran/kernels-*.f95

The first 9 patches are more or less independent, but patches 10-16 are 
intended to be committed at the same time.


Bootstrapped and reg-tested on x86_64.

Built and reg-tested with nvidia accelerator, in combination with a 
patch that enables accelerator testing (which is submitted at 
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).


I'll post the individual patches in reply to this message.

Thanks,
- Tom
---

1
Insert new exit block only when needed in transform_to_exit_first_loop_alt

2015-06-30  Tom de Vries  

* tree-parloops.c (transform_to_exit_first_loop_alt): Insert new exit
block only when needed.
---

2
Make create_parallel_loop return void

2015-11-09  Tom de Vries  

* tree-parloops.c (create_parallel_loop): Return void.
---

3
Ignore reduction clause on kernels directive

2015-11-08  Tom de Vries  

* c-omp.c (c_oacc_split_loop_clauses): Don't copy OMP_CLAUSE_REDUCTION,
classify as loop clause.
---

4
Implement -foffload-alias

2015-11-03  Tom de Vries  

* common.opt (foffload-alias): New option.
* flag-types.h (enum offload_alias): New enum.
* omp-low.c (install_var_field): Handle flag_offload_alias.
* doc/invoke.texi (@item Code Generation Options): Add -foffload-alias.
(@item -foffload-alias): New item.

* c-c++-common/goacc/kernels-loop-offload-alias-none.c: New test.
* c-c++-common/goacc/kernels-loop-offload-alias-ptr.c: New test.
---

5
Add in_oacc_kernels_region in struct loop

2015-11-09  Tom de Vries  

* cfgloop.h (struct loop): Add in_oacc_kernels_region field.
* omp-low.c (mark_loops_in_oacc_kernels_region): New function.
(expand_omp_target): Call mark_loops_in_oacc_kernels_region.
---

6
Add pass_oacc_kernels

2015-11-09  Tom de Vries  

* tree-pass.h (make_pass_oacc_kernels): Declare.
* tree-ssa-loop.c (gate_oacc_kernels): New static function.
(pass_data_oacc_kernels): New pass_data.
(class pass_oacc_kernels): New pass.
(make_pass_oacc_kernels): New function.
---

7
Add pass_dominator_oacc_kernels

2015-11-09  Tom de Vries  

* tree-pass.h (make_pass_dominator_oacc_kernels): Declare.
* tree-ssa-dom.c (class dominator_base): New class.  Factor out of ...
(class pass_dominator): ... here.
(dominator_base::may_peel_loop_headers_p)
(pass_dominator::may_peel_loop_headers_p): New function.
(pass_dominator_oacc_kernels): New pass.
(make_pass_dominator_oacc_kernels): New function.
(dominator_base::execute): Use may_peel_loop_headers_p.
---

8
Add pass_ch_oacc_kernels

2015-11-09  Tom de Vries  

* tree-pass.h (make_pass_ch_oacc_kernels): Declare.
* tree-ssa-loop-ch.c (pass_ch::pass_ch (pass_data, gcc::context)): New
constructor.
(pass_data_ch_oacc_kernels): New pass_data.
(class pass_ch_oacc_kernels): New pass.
(pass_ch_oacc_kernels::process_loop_p): New function.
(make_pass_ch_oacc_kernels): New function.
---

9
Add pass_parallelize_loops_oacc_kernels

2015-11-09  Tom de Vries  

* omp-low.c (set_oacc_fn_attrib): Make extern.
* omp-low.c (expand_omp_atomic_fetch_op):  Release defs of update stmt.
* omp-low.h (set_oacc_fn_attrib): Declare.
* tree-parloops.c (struct reduction_info): Add reduc_addr field.
(create_call_for_reduction_1): Handle case that reduc_addr is non-NULL.
(create_parallel_loop, gen_parallel_loop, try_create_reduction_list):
Add and handle function parameter oacc_kernels_p.
(get_omp_data_i_param): New function.
(ref_conflicts_with_region, oacc_entry_exit_ok_1)
(oacc_entry_exit_single_gang, oacc_entry_exit_ok): New function.
(parallelize_loops): Add
