[PATCH] Fix PR68248
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-11-09  Richard Biener

        PR tree-optimization/68248
        * tree-vect-generic.c (expand_vector_operations_1): Handle
        scalar rhs2.

        * gcc.dg/torture/pr68248.c: New testcase.

Index: gcc/tree-vect-generic.c
===
*** gcc/tree-vect-generic.c     (revision 230003)
--- gcc/tree-vect-generic.c     (working copy)
*** expand_vector_operations_1 (gimple_stmt_
*** 1527,1532
--- 1528,1535
        tree srhs1, srhs2 = NULL_TREE;
        if ((srhs1 = ssa_uniform_vector_p (rhs1)) != NULL_TREE
            && (rhs2 == NULL_TREE
+               || (! VECTOR_TYPE_P (TREE_TYPE (rhs2))
+                   && (srhs2 = rhs2))
                || (srhs2 = ssa_uniform_vector_p (rhs2)) != NULL_TREE)
            /* As we query direct optabs restrict to non-convert operations.  */
            && TYPE_MODE (TREE_TYPE (type)) == TYPE_MODE (TREE_TYPE (srhs1)))
Index: gcc/testsuite/gcc.dg/torture/pr68248.c
===
*** gcc/testsuite/gcc.dg/torture/pr68248.c      (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr68248.c      (working copy)
***
*** 0
--- 1,20
+ /* { dg-do compile } */
+
+ int a, b, c, d;
+
+ int
+ fn1 (int p1)
+ {
+   return a > 0 ? p1 : p1 >> a;
+ }
+
+ void
+ fn2 ()
+ {
+   char e;
+   for (; c; c++)
+     {
+       e = fn1 (!d ^ 2);
+       b ^= e;
+     }
+ }
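For readers not familiar with this code path: the hunk lets expand_vector_operations_1 also take the "do it on scalars" route when the first operand is a uniform vector and the second operand is already a scalar, as happens for a vector shift by a scalar count. Below is a minimal sketch of the equivalence that route relies on, written with the generic vector extension. The type name v4si and both helper functions are made up for illustration and are not part of the patch.

```c
#include <stdint.h>

/* Four 32-bit lanes, GCC/Clang generic vector extension.  */
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Vector form: splat X into all lanes, then shift every lane by the
   scalar COUNT.  */
static v4si
shift_uniform_vector (int32_t x, int count)
{
  v4si v = { x, x, x, x };
  return v >> count;
}

/* Scalar form: shift once, then splat the result.  */
static v4si
shift_then_splat (int32_t x, int count)
{
  int32_t s = x >> count;
  v4si v = { s, s, s, s };
  return v;
}
```

Both helpers should produce the same lanes for any valid shift count, which is why the shift can be done once on the scalar element.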
Re: [Patch] Change to argument promotion in fixed conversion library calls
On Fri, Nov 6, 2015 at 8:14 PM, Bernd Schmidt wrote:
> On 11/06/2015 08:04 PM, Steve Ellcey wrote:
>>
>> When I made this change I had one regression in the GCC testsuite
>> (gcc.dg/fixed-point/convert-sat.c).  I tracked this down to the
>> fact that emit_library_call_value_1 does not do any argument promotion
>> because it does not have the original tree type information for library
>> calls.  It only knows about modes.  I can't change emit_library_call_value_1
>> to do the promotion because it does not know whether to do a signed or
>> unsigned promotion, but expand_fixed_convert could do the conversion
>> before calling emit_library_call_value_1 and that is what this patch does.
>
>
> Hmm, difficult. I can see how there would be a problem, but considering how
> many calls to emit_library_call_* we have, I'm slightly worried whether this
> is really is a good approach.
>
> On the other hand, there seems to be precedent in this file:
>
>   if (GET_MODE_PRECISION (GET_MODE (from)) < GET_MODE_PRECISION (SImode))
>     from = convert_to_mode (SImode, from, unsignedp);
>
>> The 'real' long term fix for this problem is to have tree types for builtin
>> functions so the proper promotions can always be done but that is a fairly
>> large change that I am not willing to tackle right now and it could probably
>> not be done in time for GCC 6.0 anyway.
>
>
> Yeah, but I agree that this is the real fix. We should aim to get rid of the
> emit_library_call functions.

Indeed.  In the "great plan" of simplifying RTL expansion by moving stuff
up to the GIMPLE level this could be done in a lowering stage lowering all
operations that we need to do via libcalls to GIMPLE calls.  Now, we'd
either need proper function declarations for all libcalls of optabs for
this or have the optab internal function stuff from Richard also provide
the libcall fallback.

In the expansion code for the as-libcall path we can then simply use the
type of the incoming argument (as we could if emit_library_call_value_1
would in addition to the RTX operands also receive the original tree ones).

Richard.

>> +  if (SCALAR_INT_MODE_P (from_mode))
>> +    {
>> +      /* If we need to promote the integer function argument we need to do
>
>
> Extra space at the start of the comment.
>
>> +         it here instead of inside emit_library_call_value because here we
>> +         know if we should be doing a signed or unsigned promotion.  */
>> +
>> +      machine_mode arg_mode;
>> +      int unsigned_p = 0;
>> +
>> +      arg_mode = promote_function_mode (NULL_TREE, from_mode,
>> +                                        &unsigned_p, NULL_TREE, 0);
>> +      if (arg_mode != from_mode)
>> +        {
>> +          from = convert_to_mode (arg_mode, from, uintp);
>> +          from_mode = arg_mode;
>> +        }
>> +    }
>
>
> Move this into a separate function (prepare_libcall_arg)? I'll think about
> it over the weekend and let others chime in if they want, but I think I'll
> probably end up approving it with that change.
>
>
> Bernd
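To make the "signed or unsigned promotion" point concrete: a mode only gives the width of a value, so the same 16 bits can widen to two different 32-bit values. The following is a plain C sketch of that ambiguity, not GCC internals; the function name promote_hi_to_si is invented for the example, and the sign-extending cast assumes the usual two's complement behaviour that GCC documents.

```c
#include <stdint.h>

/* Widen the low 16 bits of BITS to 32 bits.  UNSIGNED_P selects
   zero-extension; otherwise the value is sign-extended.  */
static int32_t
promote_hi_to_si (uint16_t bits, int unsigned_p)
{
  if (unsigned_p)
    return (int32_t) (uint32_t) bits;   /* zero-extend */
  return (int32_t) (int16_t) bits;      /* sign-extend */
}
```

For bit patterns with the top bit clear both promotions agree, so the problem only shows up for "negative" inputs, which is presumably why a saturation test was the one to regress.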
[PATCH] Improve BB vectorization dependence analysis
Currently BB vectorization computes all dependences inside a BB region and fails all vectorization if it cannot handle some of them. This is obviously not needed - BB vectorization can restrict the dependence tests to those that are needed to apply the load/store motion effectively performed by the vectorization (sinking all participating loads/stores to the place of the last one). With restructuring it that way it's also easy to not give up completely but only for the SLP instance we cannot vectorize (this gives a slight bump in my SPEC CPU 2006 testing to 756 vectorized basic block regions). But first and foremost this patch is to reduce the dependence analysis cost and somewhat mitigate the compile-time effects of the first patch. For fixing PR56118 only a cost model issue remains. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2015-11-09 Richard BienerPR tree-optimization/56118 * tree-vectorizer.h (vect_find_last_scalar_stmt_in_slp): Declare. * tree-vect-slp.c (vect_find_last_scalar_stmt_in_slp): Export. * tree-vect-data-refs.c (vect_slp_analyze_node_dependences): New function. (vect_slp_analyze_data_ref_dependences): Instead of computing all dependences of the region DRs just analyze the code motions SLP vectorization will perform. Remove SLP instances that cannot have their store/load motions applied. (vect_analyze_data_refs): Allow DRs without a vectype in BB vectorization. * gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c: Adjust. Index: gcc/tree-vectorizer.h === *** gcc/tree-vectorizer.h.orig 2015-11-09 11:01:55.688175321 +0100 --- gcc/tree-vectorizer.h 2015-11-09 11:02:18.987432840 +0100 *** extern void vect_detect_hybrid_slp (loop *** 1075,1080 --- 1075,1081 extern void vect_get_slp_defs (vec , slp_tree, vec *, int); extern bool vect_slp_bb (basic_block); + extern gimple *vect_find_last_scalar_stmt_in_slp (slp_tree); /* In tree-vect-patterns.c. */ /* Pattern recognition functions. 
Index: gcc/tree-vect-data-refs.c
===
*** gcc/tree-vect-data-refs.c.orig      2015-11-09 10:22:33.140125722 +0100
--- gcc/tree-vect-data-refs.c   2015-11-09 11:33:05.503874719 +0100
*** vect_slp_analyze_data_ref_dependence (st
*** 581,586
--- 581,629
  }


+ /* Analyze dependences involved in the transform of SLP NODE.  */
+
+ static bool
+ vect_slp_analyze_node_dependences (slp_instance instance, slp_tree node)
+ {
+   /* This walks over all stmts involved in the SLP load/store done
+      in NODE verifying we can sink them up to the last stmt in the
+      group.  */
+   gimple *last_access = vect_find_last_scalar_stmt_in_slp (node);
+   for (unsigned k = 0; k < SLP_INSTANCE_GROUP_SIZE (instance); ++k)
+     {
+       gimple *access = SLP_TREE_SCALAR_STMTS (node)[k];
+       if (access == last_access)
+         continue;
+       stmt_vec_info access_stmt_info = vinfo_for_stmt (access);
+       gimple_stmt_iterator gsi = gsi_for_stmt (access);
+       gsi_next (&gsi);
+       for (; gsi_stmt (gsi) != last_access; gsi_next (&gsi))
+         {
+           gimple *stmt = gsi_stmt (gsi);
+           stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+           if (!STMT_VINFO_DATA_REF (stmt_info)
+               || (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
+                   && DR_IS_READ (STMT_VINFO_DATA_REF (access_stmt_info))))
+             continue;
+
+           ddr_p ddr = initialize_data_dependence_relation
+               (STMT_VINFO_DATA_REF (access_stmt_info),
+                STMT_VINFO_DATA_REF (stmt_info), vNULL);
+           if (vect_slp_analyze_data_ref_dependence (ddr))
+             {
+               /* ???  If the dependence analysis failed we can resort to the
+                  alias oracle which can handle more kinds of stmts.  */
+               free_dependence_relation (ddr);
+               return false;
+             }
+           free_dependence_relation (ddr);
+         }
+     }
+   return true;
+ }
+
+
  /* Function vect_analyze_data_ref_dependences.
Examine all the data references in the basic-block, and make sure there *** vect_slp_analyze_data_ref_dependence (st *** 590,610 bool vect_slp_analyze_data_ref_dependences (bb_vec_info bb_vinfo) { - struct data_dependence_relation *ddr; - unsigned int i; - if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "=== vect_slp_analyze_data_ref_dependences ===\n"); ! if (!compute_all_dependences (BB_VINFO_DATAREFS (bb_vinfo), ! _VINFO_DDRS (bb_vinfo), ! vNULL, true)) ! return false; ! FOR_EACH_VEC_ELT (BB_VINFO_DDRS (bb_vinfo), i, ddr) ! if
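As a reading aid for the new dependence walk: the idea described above is that each access in an SLP group only has to be checked against the statements it would be sunk past, i.e. those between it and the last access of the group. The toy model below sketches that walk in standalone C. Everything in it (toy_stmt, toy_kind, toy_can_sink_group, and "equal addr means may alias") is an assumption made up for illustration; the real code uses data references and the dependence analyzer instead.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_kind { TOY_OTHER, TOY_LOAD, TOY_STORE };

struct toy_stmt
{
  enum toy_kind kind;  /* TOY_OTHER means "no data reference".  */
  int addr;            /* Abstract location; equal values alias.  */
};

/* Return true if every statement in GROUP (indices into STMTS, in
   ascending order, at least one entry) can be sunk to the position
   of the last one.  */
static bool
toy_can_sink_group (const struct toy_stmt *stmts,
                    const size_t *group, size_t group_size)
{
  size_t last = group[group_size - 1];
  for (size_t k = 0; k + 1 < group_size; ++k)
    {
      const struct toy_stmt *access = &stmts[group[k]];
      /* Only the statements between this access and the last one
         matter, since those are the ones it is moved across.  */
      for (size_t j = group[k] + 1; j < last; ++j)
        {
          const struct toy_stmt *s = &stmts[j];
          /* No data reference, or read-after-read: never a problem.  */
          if (s->kind == TOY_OTHER
              || (s->kind == TOY_LOAD && access->kind == TOY_LOAD))
            continue;
          if (s->addr == access->addr)
            return false;
        }
    }
  return true;
}
```

Note that a failure here only rejects the one group being checked, which mirrors the "remove the SLP instance, keep the rest" behaviour the patch describes.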
[PATCH] 02/N Fix memory leaks in IPA
Hi.

The following changes were discussed with Martin Jambor to properly release
memory in IPA. They fix leaks which popped up in tramp3d with -O2.

Bootstrap and regression tests are still running.

Ready after they finish?
Thanks,
Martin

From 85b63f738030dd7a901c228ba76e24f820d31c5d Mon Sep 17 00:00:00 2001
From: marxin
Date: Mon, 9 Nov 2015 12:38:27 +0100
Subject: [PATCH 2/2] Fix memory leaks in IPA.

gcc/ChangeLog:

2015-11-09  Martin Liska

        * ipa-inline-analysis.c (estimate_function_body_sizes):
        Call body_info release function.
        * ipa-prop.c (ipa_release_body_info): New function.
        (ipa_analyze_node): Call the function.
        (ipa_node_params::~ipa_node_params): Release known_csts.
        * ipa-prop.h (ipa_release_body_info): Declare.
---
 gcc/ipa-inline-analysis.c |  2 +-
 gcc/ipa-prop.c            | 20 +++-
 gcc/ipa-prop.h            |  2 +-
 3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index c07b0da..8c8b8e3 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -2853,7 +2853,7 @@ estimate_function_body_sizes (struct cgraph_node *node, bool early)
   inline_summaries->get (node)->self_time = time;
   inline_summaries->get (node)->self_size = size;
   nonconstant_names.release ();
-  fbi.bb_infos.release ();
+  ipa_release_body_info (&fbi);
   if (opt_for_fn (node->decl, optimize))
     {
       if (!early)
diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index d15f0eb..f379ea7 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -2258,6 +2258,19 @@ analysis_dom_walker::before_dom_children (basic_block bb)
   ipa_compute_jump_functions_for_bb (m_fbi, bb);
 }

+/* Release body info FBI.  */
+
+void
+ipa_release_body_info (struct ipa_func_body_info *fbi)
+{
+  int i;
+  struct ipa_bb_info *bi;
+
+  FOR_EACH_VEC_ELT (fbi->bb_infos, i, bi)
+    free_ipa_bb_info (bi);
+  fbi->bb_infos.release ();
+}
+
 /* Initialize the array describing properties of formal parameters
    of NODE, analyze their uses and compute jump functions associated
    with actual arguments of calls from within NODE.  */
@@ -2313,11 +2326,7 @@ ipa_analyze_node (struct cgraph_node *node)

   analysis_dom_walker (&fbi).walk (ENTRY_BLOCK_PTR_FOR_FN (cfun));

-  int i;
-  struct ipa_bb_info *bi;
-  FOR_EACH_VEC_ELT (fbi.bb_infos, i, bi)
-    free_ipa_bb_info (bi);
-  fbi.bb_infos.release ();
+  ipa_release_body_info (&fbi);
   free_dominance_info (CDI_DOMINATORS);
   pop_cfun ();
 }
@@ -3306,6 +3315,7 @@ ipa_node_params::~ipa_node_params ()
   free (lattices);
   /* Lattice values and their sources are deallocated with their alocation
      pool.  */
+  known_csts.release ();
   known_contexts.release ();

   lattices = NULL;
diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index b69ee8a..2fe824d 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -775,7 +775,7 @@ bool ipa_modify_expr (tree *, bool, ipa_parm_adjustment_vec);
 ipa_parm_adjustment *ipa_get_adjustment_candidate (tree **, bool *,
                                                    ipa_parm_adjustment_vec,
                                                    bool);
-
+void ipa_release_body_info (struct ipa_func_body_info *);

 /* From tree-sra.c:  */
 tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree,
--
2.6.2
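The leak pattern fixed in estimate_function_body_sizes is the common one where only the outer container is released while the per-element data stays allocated. Below is a standalone C sketch of the "release elements first, then the container" helper the patch factors out. The toy_* types and names are invented for this example and only loosely mirror ipa_func_body_info; they are not the GCC data structures.

```c
#include <stdlib.h>

/* Per-block record that owns a heap allocation of its own.  */
struct toy_bb_info
{
  int *call_data;
};

/* Container of per-block records.  */
struct toy_body_info
{
  struct toy_bb_info *bb_infos;
  size_t n_bb_infos;
};

/* Release everything owned by FBI: first what each element owns,
   then the array itself.  Safe to call twice.  */
static void
toy_release_body_info (struct toy_body_info *fbi)
{
  for (size_t i = 0; i < fbi->n_bb_infos; ++i)
    {
      free (fbi->bb_infos[i].call_data);
      fbi->bb_infos[i].call_data = NULL;
    }
  free (fbi->bb_infos);
  fbi->bb_infos = NULL;
  fbi->n_bb_infos = 0;
}
```

Having one helper means every caller gets the element cleanup, which is what the one-line change in ipa-inline-analysis.c buys.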
Re: [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh.
On 09/11/15 13:31, Christophe Lyon wrote:
> On 30 October 2015 at 16:52, Matthew Wahab wrote:
>> On 30/10/15 12:51, Christophe Lyon wrote:
>>> On 23 October 2015 at 14:26, Matthew Wahab wrote:
>>>> The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
>>>> sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah and
>>>> vqrdmlsh for these instructions. The new intrinsics are of the form
>>>> vqrdml{as}h[q]_.
>>>>
>>>> Tested the series for aarch64-none-linux-gnu with native bootstrap and
>>>> make check on an ARMv8 architecture. Also tested aarch64-none-elf with
>>>> cross-compiled check-gcc on an ARMv8.1 emulator.
>>>
>>> Is there a publicly available simulator for v8.1? QEMU or Foundation
>>> Model?
>>
>> Sorry, I don't know.
>> Matthew
>
> So, what will happen to the testsuite once this is committed?
> Are we going to see FAILs when using QEMU?

No, the check at the top of the test files

+/* { dg-require-effective-target arm_v8_1a_neon_hw } */

should make this test UNSUPPORTED if the HW/simulator can't execute it.
(Support for this check is added in patch #5 in this series.) Note that the
aarch64-none-linux make check was run on ARMv8 HW which can't execute the
test and correctly reported it as unsupported.

Matthew
Re: [vec-cmp, patch 3/6] Vectorize comparison
On Mon, Nov 9, 2015 at 1:07 PM, Ilya Enkovichwrote: > On 26 Oct 16:09, Richard Biener wrote: >> On Wed, Oct 14, 2015 at 6:12 PM, Ilya Enkovich >> wrote: >> > + >> > + ops.release (); >> > + vec_defs.release (); >> >> No need to release auto_vec<>s at the end of scope explicitely. > > Fixed > >> >> > + vec_compare = build2 (code, mask_type, vec_rhs1, vec_rhs2); >> > + new_stmt = gimple_build_assign (mask, vec_compare); >> > + new_temp = make_ssa_name (mask, new_stmt); >> > + gimple_assign_set_lhs (new_stmt, new_temp); >> >> new_temp = make_ssa_name (mask); >> gimple_build_assign (new_temp, code, vec_rhs1, vec_rhs2); >> >> for the 4 stmts above. > > Fixed > >> >> > + >> > + vec_oprnds0.release (); >> > + vec_oprnds1.release (); >> >> Please use auto_vec<>s. > > These are used to hold vecs returned by vect_get_slp_defs. Thus can't > use auto_vec. Ok. Richard. >> >> Ok with those changes. >> >> RIchard. >> > > > gcc/ > > 2015-11-09 Ilya Enkovich > > * tree-vect-data-refs.c (vect_get_new_vect_var): Support > vect_mask_var. > (vect_create_destination_var): Likewise. > * tree-vect-stmts.c (vectorizable_comparison): New. > (vect_analyze_stmt): Add vectorizable_comparison. > (vect_transform_stmt): Likewise. > * tree-vectorizer.h (enum vect_var_kind): Add vect_mask_var. > (enum stmt_vec_info_type): Add comparison_vec_info_type. > (vectorizable_comparison): New. > > > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c > index 11bce79..926752b 100644 > --- a/gcc/tree-vect-data-refs.c > +++ b/gcc/tree-vect-data-refs.c > @@ -3790,6 +3790,9 @@ vect_get_new_vect_var (tree type, enum vect_var_kind > var_kind, const char *name) >case vect_scalar_var: > prefix = "stmp"; > break; > + case vect_mask_var: > +prefix = "mask"; > +break; >case vect_pointer_var: > prefix = "vectp"; > break; > @@ -4379,7 +4382,11 @@ vect_create_destination_var (tree scalar_dest, tree > vectype) >tree type; >enum vect_var_kind kind; > > - kind = vectype ? 
vect_simple_var : vect_scalar_var; > + kind = vectype > +? VECTOR_BOOLEAN_TYPE_P (vectype) > +? vect_mask_var > +: vect_simple_var > +: vect_scalar_var; >type = vectype ? vectype : TREE_TYPE (scalar_dest); > >gcc_assert (TREE_CODE (scalar_dest) == SSA_NAME); > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c > index f1216c8..ee549f4 100644 > --- a/gcc/tree-vect-stmts.c > +++ b/gcc/tree-vect-stmts.c > @@ -7416,6 +7416,185 @@ vectorizable_condition (gimple *stmt, > gimple_stmt_iterator *gsi, >return true; > } > > +/* vectorizable_comparison. > + > + Check if STMT is comparison expression that can be vectorized. > + If VEC_STMT is also passed, vectorize the STMT: create a vectorized > + comparison, put it in VEC_STMT, and insert it at GSI. > + > + Return FALSE if not a vectorizable STMT, TRUE otherwise. */ > + > +bool > +vectorizable_comparison (gimple *stmt, gimple_stmt_iterator *gsi, > +gimple **vec_stmt, tree reduc_def, > +slp_tree slp_node) > +{ > + tree lhs, rhs1, rhs2; > + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); > + tree vectype1 = NULL_TREE, vectype2 = NULL_TREE; > + tree vectype = STMT_VINFO_VECTYPE (stmt_info); > + tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE; > + tree new_temp; > + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); > + enum vect_def_type dts[2] = {vect_unknown_def_type, vect_unknown_def_type}; > + unsigned nunits; > + int ncopies; > + enum tree_code code; > + stmt_vec_info prev_stmt_info = NULL; > + int i, j; > + bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info); > + vec vec_oprnds0 = vNULL; > + vec vec_oprnds1 = vNULL; > + gimple *def_stmt; > + tree mask_type; > + tree mask; > + > + if (!VECTOR_BOOLEAN_TYPE_P (vectype)) > +return false; > + > + mask_type = vectype; > + nunits = TYPE_VECTOR_SUBPARTS (vectype); > + > + if (slp_node || PURE_SLP_STMT (stmt_info)) > +ncopies = 1; > + else > +ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits; > + > + gcc_assert (ncopies >= 1); > + if 
(!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) > +return false; > + > + if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def > + && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle > + && reduc_def)) > +return false; > + > + if (STMT_VINFO_LIVE_P (stmt_info)) > +{ > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > +"value used after loop.\n"); > + return false; > +} > + > + if (!is_gimple_assign (stmt)) > +return false; > + > + code = gimple_assign_rhs_code (stmt); > + > + if (TREE_CODE_CLASS (code) != tcc_comparison) > +return false;
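For context on what a vectorized comparison produces: each lane of the result is a mask element, all bits set for true and zero for false, and such a mask can then drive a lane-wise select. Here is a small sketch with the generic vector extension. The names v4si, toy_vec_less and toy_vec_min are illustrative only and do not appear in the patch.

```c
#include <stdint.h>

/* Four 32-bit lanes, GCC/Clang generic vector extension.  */
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Lane-wise A < B; each result lane is -1 (true) or 0 (false).  */
static v4si
toy_vec_less (v4si a, v4si b)
{
  return a < b;
}

/* Lane-wise minimum built from the comparison mask.  */
static v4si
toy_vec_min (v4si a, v4si b)
{
  v4si mask = toy_vec_less (a, b);
  return (a & mask) | (b & ~mask);
}
```

The select via and/or only works because a true lane has every bit set, which is the property the mask types in this series have to preserve.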
Re: [vec-cmp, patch 2/6] Vectorization factor computation
On Mon, Nov 9, 2015 at 2:54 PM, Ilya Enkovichwrote: > 2015-10-20 16:45 GMT+03:00 Richard Biener : >> On Wed, Oct 14, 2015 at 1:21 PM, Ilya Enkovich >> wrote: >>> 2015-10-13 16:37 GMT+03:00 Richard Biener : On Thu, Oct 8, 2015 at 4:59 PM, Ilya Enkovich wrote: > Hi, > > This patch handles statements with boolean result in vectorization factor > computation. For comparison its operands type is used instead of restult > type to compute VF. Other boolean statements are ignored for VF. > > Vectype for comparison is computed using type of compared values. > Computed type is propagated into other boolean operations. This feels rather ad-hoc, mixing up the existing way of computing vector type and VF. I'd rather have turned the whole vector type computation around to the scheme working on the operands rather than on the lhs and then searching for smaller/larger types on the rhs'. I know this is a tricky function (heh, but you make it even worse...). And it needs a helper with knowledge about operations so one can compute the result vector type for an operation on its operands. The seeds should be PHIs (handled like now) and loads, and yes, externals need special handling. Ideally we'd do things in two stages, first compute vector types in a less constrained manner (not forcing a single vector size) and then in a 2nd run promote to a common size also computing the VF to do that. >>> >>> This sounds like a refactoring, not a functional change, right? Also I >>> don't see a reason to analyze DF to compute vectypes if we promote it >>> to a single vector size anyway. For booleans we have to do it because >>> boolean vectors of the same size may have different number of >>> elements. What is the reason to do it for other types? >> >> For conversions and operators which support different sized operands > > That's what we handle in vector patterns and use some helper functions > to determine vectypes there. Looks like this refactoring would affects > patterns significantly. 
Probably compute vectypes before searching for > patterns? > >> >>> Shouldn't it be a patch independent from comparison vectorization series? >> >> As you like. > > I'd like to move on with vector comparison and consider VF computation > refactoring when it's stabilized. This patch is the last one (except > target ones) not approved in all vector comparison related series. > Would it be OK to go on with it in a current shape? Yes. Thanks, Richard. > Thanks, > Ilya
Re: [PATCH] PR/67682, break SLP groups up if only some elements match
On 06/11/15 12:55, Richard Biener wrote: > >> + /* GROUP_GAP of the first group now has to skip over the second group >> too. */ >> + GROUP_GAP (first_vinfo) += group2_size; > > Please add a MSG_NOTE debug printf stating that we split the group and > at which element. Done. > I think you want to add && STMT_VINFO_GROUPED_ACCESS (vinfo_for_stmt (stmt)) > otherwise this could be SLP reductions where there is no way the split > group would enable vectorization. Ah, I had thought that the (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt))) check sufficed for that, as per e.g. the section above /* Create a node (a root of the SLP tree) for the packed grouped stores. */ But done. > Note that BB vectorization now also very aggressively falls back to > considering > non-matches being "external". > > Not sure why that doesn't trigger for your testcases ;) I tested against trunk r229944, on which all of my scan-tree-dump's were failing > I'm comfortable with the i < group_size half of the patch. For the other > piece > I'd like to see more compile-time / success data from, say, building > SPEC CPU 2006. Well, as a couple of quick data points, a compile of SPEC2000 on aarch64-none-linux-gnu (-Ofast -fomit-frame-pointer -mcpu=cortex-a57), I have: 3080 successes without patch; +79 successes from the "i < vectorization_factor" part of the patch (on its own) +90 successes from the (i>=vectorization_factor) && "i < group_size" part (on its own) +(79 from first) +(90 from second) + (an additional 62) from both parts together. And for SPEC2006, aarch64-linux-gnu (-O3 -fomit-frame-pointer -mcpu=cortex-a57): 11979 successes without patch; + 499 from the "i < vectorization_factor" part + 264 from the (i >= vectorization factor) && (i < group_size)" part + extra 336 if both parts combined. I haven't done any significant measurements of compile-time yet. (snipping this bit out-of-order) > Hmm. 
This is of course pretty bad for compile-time for the non-matching > case as that effectively will always split into N pieces if we feed it > garbage (that is, without being sure that at least one pice _will_ vectorize). > > OTOH with the current way of computing "matches" we'd restrict ourselves > to cases where the first stmt we look at (and match to) happens to be > the operation that in the end will form a matching group. ... > Eventually we'd want to improve the "matches" return > to include the affected stmts (ok, that'll be not very easy) so we can do > a cheap "if we split here will it fix that particular mismatch" check. Yes, I think there are a bunch of things we can do here, that would be more powerful than the simple approach I used here. The biggest limiting factor will probably be (lack of) permutation, i.e. if we only SLP stores to consecutive addresses. > So, please split the patch and I suggest to leave the controversical part > for next stage1 together with some improvement on the SLP build process > itself? Here's a reduced version with just the second case, bootstrapped+check-gcc/g++ on x86_64. gcc/ChangeLog: * tree-vect-slp.c (vect_split_slp_store_group): New. (vect_analyze_slp_instance): During basic block SLP, recurse on subgroups if vect_build_slp_tree fails after 1st vector. gcc/testsuite/ChangeLog: * gcc.dg/vect/bb-slp-7.c (main1): Make subgroups non-isomorphic. * gcc.dg/vect/bb-slp-subgroups-1.c: New. * gcc.dg/vect/bb-slp-subgroups-2.c: New. * gcc.dg/vect/bb-slp-subgroups-3.c: New. * gcc.dg/vect/bb-slp-subgroups-4.c: New. 
--- gcc/testsuite/gcc.dg/vect/bb-slp-7.c | 10 ++-- gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c | 44 +++ gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c | 42 +++ gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c | 41 ++ gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-4.c | 41 ++ gcc/tree-vect-slp.c| 74 +- 6 files changed, 246 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-4.c diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c index ab54a48..b8bef8c 100644 --- a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c @@ -16,12 +16,12 @@ main1 (unsigned int x, unsigned int y) unsigned int *pout = [0]; unsigned int a0, a1, a2, a3; - /* Non isomorphic. */ + /* Non isomorphic, even 64-bit subgroups. */ a0 = *pin++ + 23; - a1 = *pin++ + 142; + a1 = *pin++ * 142; a2 = *pin++ + 2; a3 = *pin++ * 31; - + *pout++ = a0 * x; *pout++ = a1 * y; *pout++ = a2 * x; @@ -29,7 +29,7 @@ main1 (unsigned int x, unsigned int y) /* Check results. */ if (out[0] != (in[0] +
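A rough sketch of the split rule discussed in this message, for readers following along: if building the SLP tree fails at element i, and i lies at or past the first full vector but inside the group, the group can be cut at the last vector boundary at or before i and each half retried. The helper below is an assumption-laden illustration (the name toy_split_point, the parameter names, and returning 0 for "do not split" are all made up), not the code from the patch.

```c
/* Return the index at which to split a store group of GROUP_SIZE
   elements whose first mismatching element is FIRST_MISMATCH, given
   LANES elements per vector.  Returns 0 when no split is attempted,
   i.e. when the mismatch is inside the first vector or outside the
   group.  */
static unsigned
toy_split_point (unsigned first_mismatch, unsigned lanes,
                 unsigned group_size)
{
  if (lanes == 0 || first_mismatch < lanes || first_mismatch >= group_size)
    return 0;
  /* Round down to a multiple of LANES so the first half stays
     a whole number of vectors.  */
  return first_mismatch - first_mismatch % lanes;
}
```

Under this rule the adjusted bb-slp-7.c case (a0 uses +, a1 uses *) mismatches at index 1, which is inside the first vector even with two lanes, so no split would be tried; that seems consistent with the updated "even 64-bit subgroups" comment in the test.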
[patch] backport PIE support for FreeBSD to gcc-49
Hi, any objections that I apply this patch to gcc-4.9? It is FreeBSD only. TIA, Andreas 2015-11-09 Andreas ToblerBackport from mainline 2015-05-18 Andreas Tobler * config/freebsd-spec.h (FBSD_STARTFILE_SPEC): Add the bits to build pie executables. (FBSD_ENDFILE_SPEC): Likewise. * config/i386/freebsd.h (STARTFILE_SPEC): Remove and use the one from config/freebsd-spec.h. (ENDFILE_SPEC): Likewise. 2015-11-02 Andreas Tobler * config/rs6000/freebsd64.h (ASM_SPEC32): Adust spec to handle PIE executables. Index: gcc/config/freebsd-spec.h === --- gcc/config/freebsd-spec.h (revision 230016) +++ gcc/config/freebsd-spec.h (working copy) @@ -66,8 +66,9 @@ "%{!shared: \ %{pg:gcrt1.o%s} %{!pg:%{p:gcrt1.o%s} \ %{!p:%{profile:gcrt1.o%s} \ -%{!profile:crt1.o%s \ - crti.o%s %{!shared:crtbegin.o%s} %{shared:crtbeginS.o%s}" +%{!profile: \ +%{pie: Scrt1.o%s;:crt1.o%s} \ + crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s}" /* Provide a ENDFILE_SPEC appropriate for FreeBSD. Here we tack on the magical crtend.o file (see crtstuff.c) which provides part of @@ -76,7 +77,7 @@ `crtn.o'. */ #define FBSD_ENDFILE_SPEC \ - "%{!shared:crtend.o%s} %{shared:crtendS.o%s} crtn.o%s" + "%{shared|pie:crtendS.o%s;:crtend.o%s} crtn.o%s" /* Provide a LIB_SPEC appropriate for FreeBSD as configured and as required by the user-land thread model. Before __FreeBSD_version Index: gcc/config/i386/freebsd.h === --- gcc/config/i386/freebsd.h (revision 230016) +++ gcc/config/i386/freebsd.h (working copy) @@ -59,29 +59,16 @@ #define SUBTARGET_EXTRA_SPECS \ { "fbsd_dynamic_linker", FBSD_DYNAMIC_LINKER } -/* Provide a STARTFILE_SPEC appropriate for FreeBSD. Here we add - the magical crtbegin.o file (see crtstuff.c) which provides part - of the support for getting C++ file-scope static object constructed - before entering `main'. 
*/ - -#undef STARTFILE_SPEC -#define STARTFILE_SPEC \ - "%{!shared: \ - %{pg:gcrt1.o%s} %{!pg:%{p:gcrt1.o%s} \ - %{!p:%{profile:gcrt1.o%s} \ -%{!profile:crt1.o%s \ - crti.o%s %{!shared:crtbegin.o%s} %{shared:crtbeginS.o%s}" +/* Use the STARTFILE_SPEC from config/freebsd-spec.h. */ -/* Provide a ENDFILE_SPEC appropriate for FreeBSD. Here we tack on - the magical crtend.o file (see crtstuff.c) which provides part of - the support for getting C++ file-scope static object constructed - before entering `main', followed by a normal "finalizer" file, - `crtn.o'. */ +#undef STARTFILE_SPEC +#define STARTFILE_SPEC FBSD_STARTFILE_SPEC -#undef ENDFILE_SPEC -#define ENDFILE_SPEC \ - "%{!shared:crtend.o%s} %{shared:crtendS.o%s} crtn.o%s" +/* Use the ENDFILE_SPEC from config/freebsd-spec.h. */ +#undef ENDFILE_SPEC +#define ENDFILE_SPEC FBSD_ENDFILE_SPEC + /* Provide a LINK_SPEC appropriate for FreeBSD. Here we provide support for the special GCC options -static and -shared, which allow us to link things in one of these three modes by applying the appropriate Index: gcc/config/rs6000/freebsd64.h === --- gcc/config/rs6000/freebsd64.h (revision 230016) +++ gcc/config/rs6000/freebsd64.h (working copy) @@ -130,7 +130,7 @@ #defineLINK_OS_FREEBSD_SPEC "%{m32:%(link_os_freebsd_spec32)}%{!m32:%(link_os_freebsd_spec64)}" #define ASM_SPEC32 "-a32 \ -%{mrelocatable} %{mrelocatable-lib} %{fpic:-K PIC} %{fPIC:-K PIC} \ +%{mrelocatable} %{mrelocatable-lib} %{fpic|fpie|fPIC|fPIE:-K PIC} \ %{memb} %{!memb: %{msdata=eabi: -memb}} \ %{!mlittle: %{!mlittle-endian: %{!mbig: %{!mbig-endian: \ %{mcall-freebsd: -mbig} \
Re: [PATCH 1/2] s/390: Implement "target" attribute.
On 11/02/2015 09:44 AM, Dominik Vogt wrote:
> (@Uli: I'd like to hear your opinion on this issue.
> Original message:
> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03403.html).
>
> On Fri, Oct 30, 2015 at 03:09:39PM +0100, Andreas Krebbel wrote:
>> Why do we need x_s390_arch_specified and x_s390_tune_specified?  You
>> should be able to use opts_set->x_s390_arch and opts_set->x_s390_tune
>> instead? (patch attached, your tests keep working with that change).
>
> The idea was that -mtune on the command line is *not* overridden
> by the "arch" target attribute.  This would allow to change the
> architecture for a specific function and keep the -mtune= option
> from the command line.  But as a matter of fact, the current patch
> doesn't do it either (bug?).

Your testcases even seem to check for this behavior so it looked
intentional to me.  But I agree that being able to keep the -mtune cmdline
value for a function while only changing the used instruction set would be
good.

Could you please elaborate why implementing this requires the new flags?

-Andreas-
Re: [vec-cmp, patch 2/6] Vectorization factor computation
2015-10-20 16:45 GMT+03:00 Richard Biener: > On Wed, Oct 14, 2015 at 1:21 PM, Ilya Enkovich wrote: >> 2015-10-13 16:37 GMT+03:00 Richard Biener : >>> On Thu, Oct 8, 2015 at 4:59 PM, Ilya Enkovich >>> wrote: Hi, This patch handles statements with boolean result in vectorization factor computation. For comparison its operands type is used instead of restult type to compute VF. Other boolean statements are ignored for VF. Vectype for comparison is computed using type of compared values. Computed type is propagated into other boolean operations. >>> >>> This feels rather ad-hoc, mixing up the existing way of computing >>> vector type and VF. I'd rather have turned the whole >>> vector type computation around to the scheme working on the operands >>> rather than on the lhs and then searching >>> for smaller/larger types on the rhs'. >>> >>> I know this is a tricky function (heh, but you make it even worse...). >>> And it needs a helper with knowledge about operations >>> so one can compute the result vector type for an operation on its >>> operands. The seeds should be PHIs (handled like now) >>> and loads, and yes, externals need special handling. >>> >>> Ideally we'd do things in two stages, first compute vector types in a >>> less constrained manner (not forcing a single vector size) >>> and then in a 2nd run promote to a common size also computing the VF to do >>> that. >> >> This sounds like a refactoring, not a functional change, right? Also I >> don't see a reason to analyze DF to compute vectypes if we promote it >> to a single vector size anyway. For booleans we have to do it because >> boolean vectors of the same size may have different number of >> elements. What is the reason to do it for other types? > > For conversions and operators which support different sized operands That's what we handle in vector patterns and use some helper functions to determine vectypes there. Looks like this refactoring would affects patterns significantly. 
Probably compute vectypes before searching for patterns? > >> Shouldn't it be a patch independent from comparison vectorization series? > > As you like. I'd like to move on with vector comparison and consider VF computation refactoring when it's stabilized. This patch is the last one (except target ones) not approved in all vector comparison related series. Would it be OK to go on with it in a current shape? Thanks, Ilya
Re: [PATCH] Use signed boolean type for boolean vectors
On 03 Nov 14:42, Richard Biener wrote: > On Wed, Oct 28, 2015 at 4:30 PM, Ilya Enkovichwrote: > > 2015-10-28 18:21 GMT+03:00 Richard Biener : > >> On Wed, Oct 28, 2015 at 2:13 PM, Ilya Enkovich > >> wrote: > >>> Hi, > >>> > >>> Testing boolean vector conversions I found several runtime regressions > >>> and investigation showed it's due to incorrect conversion caused by > >>> unsigned boolean type. When boolean vector is represented as an > >>> integer vector on target it's a signed integer actually. Unsigned > >>> boolean type was chosen due to possible single bit values, but for > >>> multiple bit values it causes wrong casting. The easiest way to fix > >>> it is to use signed boolean value. The following patch does this and > >>> fixes my problems with conversion. Bootstrapped and tested on > >>> x86_64-unknown-linux-gnu. Is it OK? > >> > >> Hmm. Actually formally the "boolean" vectors were always 0 or -1 > >> (all bits set). That is also true for a signed boolean with precision 1 > >> but with higher precision what makes sure to sign-extend 'true'? > >> > >> So it's far from an obvious change, esp as you don't change the > >> precision == 1 case. [I still think we should have precision == 1 > >> for all boolean types] > >> > >> Richard. > >> > > > > For 1 bit precision signed type value 1 is out of range, right? This might > > break > > in many place due to used 1 as true value. > > For vectors -1 is true. Did you try whether it breaks many places? > build_int_cst (type, 1) should still work fine. > > Richard. > I tried it and didn't find any new failures. So looks I was wrong assuming it should cause many failures. Testing is not complete because many SPEC benchmarks are failing to compile on -O3 for AVX-512 on trunk. But I think we may proceed with signed type and fix constant generation issues if any revealed. This patch was bootstrapped and regtested on x86_64-unknown-linux-gnu. OK for trunk? 
Thanks, Ilya -- gcc/ 2015-11-09 Ilya Enkovich * optabs.c (expand_vec_cond_expr): Always get sign from type. * tree.c (wide_int_to_tree): Support negative values for boolean. (build_nonstandard_boolean_type): Use signed type for booleans. diff --git a/gcc/optabs.c b/gcc/optabs.c index fdcdc6a..44971ad 100644 --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -5365,7 +5365,6 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2, op0a = TREE_OPERAND (op0, 0); op0b = TREE_OPERAND (op0, 1); tcode = TREE_CODE (op0); - unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a)); } else { @@ -5374,9 +5373,9 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2, op0a = op0; op0b = build_zero_cst (TREE_TYPE (op0)); tcode = LT_EXPR; - unsignedp = false; } cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a)); + unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a)); gcc_assert (GET_MODE_SIZE (mode) == GET_MODE_SIZE (cmp_op_mode) diff --git a/gcc/tree.c b/gcc/tree.c index 18d6544..6fb4c09 100644 --- a/gcc/tree.c +++ b/gcc/tree.c @@ -1437,7 +1437,7 @@ wide_int_to_tree (tree type, const wide_int_ref ) case BOOLEAN_TYPE: /* Cache false or true. */ limit = 2; - if (hwi < 2) + if (IN_RANGE (hwi, 0, 1)) ix = hwi; break; @@ -8069,7 +8069,7 @@ build_nonstandard_boolean_type (unsigned HOST_WIDE_INT precision) type = make_node (BOOLEAN_TYPE); TYPE_PRECISION (type) = precision; - fixup_unsigned_type (type); + fixup_signed_type (type); if (precision <= MAX_INT_CACHED_PREC) nonstandard_boolean_type_cache[precision] = type;
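The "wrong casting" discussed in this thread can be illustrated outside GCC with ordinary C integer conversions. The sketch below is not GCC code: `widen_signed` and `widen_unsigned` are hypothetical helpers that widen one 8-bit mask element to 32 bits. It assumes, as the thread states, that a "true" vector element has all bits set.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of GCC.  */

/* Element type is signed: widening sign-extends, so an all-ones
   "true" (-1) is still all-ones in the wider type.  */
int32_t
widen_signed (int8_t elt)
{
  return (int32_t) elt;
}

/* Element type is unsigned: widening zero-extends, so an all-ones
   "true" (0xFF) becomes 255, which is no longer an all-ones mask.  */
int32_t
widen_unsigned (uint8_t elt)
{
  return (int32_t) elt;
}
```

With these definitions, `widen_signed (-1)` yields -1 while `widen_unsigned (0xFF)` yields 255, which seems to be the kind of mismatch the switch to `fixup_signed_type` is meant to avoid.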
Re: OpenACC Firstprivate
On Mon, Nov 09, 2015 at 08:59:15AM -0500, Nathan Sidwell wrote:
> >This I'm afraid performs often two copies rather than just one (one to copy
> >the host value to the present_copyin mapped value, another one in the
> >region),
>
> I don't think that can be avoided. The host doesn't have control over when
> the CTAs (a gang) start -- they may even be serialized onto the same
> physical HW. So each gang has to initialize its own instance. Or did you
> mean something else?

So, what is the scope of the private and firstprivate vars in OpenACC? In OpenMP if a variable is private or firstprivate on the target construct, unless further privatized in inner constructs it is really shared among all the threads in all the teams (or one var per all CTAs/workers in PTX terms). Is that the case for OpenACC too, or are the vars e.g. private to each CTA already or to each thread in each CTA, something different?

If they are shared by all CTAs, then you should hopefully be able to use the GOMP_MAP_FIRSTPRIVATE{,_INT}, if not, then I'd say you should at least use those to provide you the initializer data to initialize your private vars from as a cheaper alternative to mapping.

Jakub
Re: [PATCH] 02/N Fix memory leaks in IPA
On Mon, Nov 9, 2015 at 2:29 PM, Martin Liška wrote:
> Hi.
>
> Following changes were consulted with Martin Jambor to properly release
> memory in IPA. It fixes leaks which popped up in tramp3d with -O2.
>
> Bootstrap and regression tests have been running.
>
> Ready after it finishes?

Ok.

Richard.

> Thanks,
> Martin
Re: Extend tree-call-cdce to calls whose result is used
Hi,

On Sat, 7 Nov 2015, Richard Sandiford wrote:

> For -fmath-errno, builtins.c currently expands calls to sqrt to:
>
>     y = sqrt_optab (x);
>     if (y != y)
>       [ sqrt (x); or errno = EDOM; ]
>
> - the call to sqrt is protected by the result of the optab rather
>   than the input. It would be better to check !(x >= 0), like
>   tree-call-cdce.c does.

It depends. With fast-math (and hence without NaNs) you can trivially optimize away a (y != y) test. You can't do so with !(x>=0) at all.

> - the branch isn't exposed at the gimple level and so gets little
>   high-level optimisation.
>
> - we do this for log too, but for log a zero input produces
>   -inf rather than a NaN, and sets errno to ERANGE rather than EDOM.
>
> This patch moves the code to tree-call-cdce.c instead,

This somehow feels wrong. Dead-code elimination doesn't have anything to do with the transformation you want, it rather is rewriting all feasible calls into something else, like fold_builtins does. Also cdce currently doesn't seem to do any checks on the fast-math flags, so I wonder if some of the conditions that you now also insert for calls whose results are used stay until final code.

> Previously the pass was only enabled by default at -O2 or above, but the
> old builtins.c code was enabled at -O. The patch therefore enables the
> pass at -O as well.

The pass is somewhat expensive in that it removes dominator info and schedules a full ssa update. The transformation is trivial enough that dominators and SSA form can be updated on the fly, I think without that it's not feasible for -O.

But as said I think this transformation should better be moved into builtin folding (or other call folding), at which point also the fast-math flags can be checked. The infrastructure routines of tree-call-cdce can be used there of course. If so moved the cdce pass would be subsumed by that btw. (because the dead call result will be trivially exposed), and that would be a good thing.

Ciao,
Michael.
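The two guards discussed in this message can be written as plain C predicates. This is only a sketch of the conditions themselves, not of what builtins.c or tree-call-cdce.c emit; the function names are hypothetical.

```c
#include <math.h>   /* Only for the NAN macro used by callers.  */

/* Guard on the result, as in the quoted builtins.c expansion:
   true when the computed value is a NaN.  */
int
result_is_nan (double y)
{
  return y != y;
}

/* Guard on the input, the !(x >= 0) form mentioned for sqrt:
   true for negative inputs and for NaN inputs.  */
int
sqrt_input_out_of_domain (double x)
{
  return !(x >= 0);
}
```

Under an assumption that NaNs never occur (as with fast-math), a compiler is free to treat `y != y` as always false, whereas `!(x >= 0)` still distinguishes negative inputs. That appears to be the asymmetry Michael is pointing at.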
Re: [PATCH v2 11/13] Test case for conversion from __seg_tls:0
On Tue, Oct 20, 2015 at 11:27 PM, Richard Henderson wrote:
> ---
>  gcc/testsuite/gcc.target/i386/addr-space-3.c | 10 ++
>  1 file changed, 10 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/addr-space-3.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/addr-space-3.c
> b/gcc/testsuite/gcc.target/i386/addr-space-3.c
> new file mode 100644
> index 000..63f1f03
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/addr-space-3.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O" } */
> +/* { dg-final { scan-assembler "[fg]s:0" } } */

Causes

ERROR: (DejaGnu) proc "fg" does not exist.
The error code is NONE
The info on the error is:
close: spawn id exp6 not open
    while executing
"close -i exp6"
    invoked from within
"catch "close -i $spawn_id""

> +
> +void test(int *y)
> +{
> +  int *x = (int __seg_tls *)0;
> +  if (x == y)
> +    asm("");
> +}
> --
> 2.4.3
>
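The error reported here is consistent with Tcl treating `[fg]` inside a double-quoted string as command substitution, so DejaGnu ends up trying to call a procedure named `fg`. A common idiom in the GCC testsuite is to backslash-escape the brackets in such patterns. The directive below is a sketch of that idiom applied to this test; it is an assumption about a workable fix, not necessarily the change that was committed.

```c
/* { dg-final { scan-assembler "\[fg\]s:0" } } */
```

With the brackets escaped, the pattern reaches the regular-expression matcher as the character class `[fg]` followed by `s:0`.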
Re: [PATCH 4b/4] [ARM] PR63870 Remove error for invalid lane numbers
On 08/11/15 00:26, charles.bay...@linaro.org wrote: > From: Charles Baylis> > Charles Baylis > > * config/arm/neon.md (neon_vld1_lane): Remove error for invalid > lane number. > (neon_vst1_lane): Likewise. > (neon_vld2_lane): Likewise. > (neon_vst2_lane): Likewise. > (neon_vld3_lane): Likewise. > (neon_vst3_lane): Likewise. > (neon_vld4_lane): Likewise. > (neon_vst4_lane): Likewise. > The only way we can get here is through the intrinsics - we do a check for lane numbers earlier. If things go horribly wrong - the assembler will complain, so it's ok to elide this internal_error here, thus OK. regards Ramana > Change-Id: Id7b4b6fa7320157e62e5bae574b4c4688d921774 > --- > gcc/config/arm/neon.md | 48 > 1 file changed, 8 insertions(+), 40 deletions(-) > > diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md > index e8db020..6574e6e 100644 > --- a/gcc/config/arm/neon.md > +++ b/gcc/config/arm/neon.md > @@ -4264,8 +4264,6 @@ if (BYTES_BIG_ENDIAN) >HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[3])); >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); >operands[3] = GEN_INT (lane); > - if (lane < 0 || lane >= max) > -error ("lane out of range"); >if (max == 1) > return "vld1.\t%P0, %A1"; >else > @@ -4286,9 +4284,7 @@ if (BYTES_BIG_ENDIAN) >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); >operands[3] = GEN_INT (lane); >int regno = REGNO (operands[0]); > - if (lane < 0 || lane >= max) > -error ("lane out of range"); > - else if (lane >= max / 2) > + if (lane >= max / 2) > { >lane -= max / 2; >regno += 2; > @@ -4372,8 +4368,6 @@ if (BYTES_BIG_ENDIAN) >HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2])); >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); >operands[2] = GEN_INT (lane); > - if (lane < 0 || lane >= max) > -error ("lane out of range"); >if (max == 1) > return "vst1.\t{%P1}, %A0"; >else > @@ -4393,9 +4387,7 @@ if (BYTES_BIG_ENDIAN) >HOST_WIDE_INT lane = ENDIAN_LANE_N(mode, INTVAL (operands[2])); >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); >int regno = 
REGNO (operands[1]); > - if (lane < 0 || lane >= max) > -error ("lane out of range"); > - else if (lane >= max / 2) > + if (lane >= max / 2) > { >lane -= max / 2; >regno += 2; > @@ -4464,8 +4456,6 @@ if (BYTES_BIG_ENDIAN) >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); >int regno = REGNO (operands[0]); >rtx ops[4]; > - if (lane < 0 || lane >= max) > -error ("lane out of range"); >ops[0] = gen_rtx_REG (DImode, regno); >ops[1] = gen_rtx_REG (DImode, regno + 2); >ops[2] = operands[1]; > @@ -4489,9 +4479,7 @@ if (BYTES_BIG_ENDIAN) >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); >int regno = REGNO (operands[0]); >rtx ops[4]; > - if (lane < 0 || lane >= max) > -error ("lane out of range"); > - else if (lane >= max / 2) > + if (lane >= max / 2) > { >lane -= max / 2; >regno += 2; > @@ -4579,8 +4567,6 @@ if (BYTES_BIG_ENDIAN) >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); >int regno = REGNO (operands[1]); >rtx ops[4]; > - if (lane < 0 || lane >= max) > -error ("lane out of range"); >ops[0] = operands[0]; >ops[1] = gen_rtx_REG (DImode, regno); >ops[2] = gen_rtx_REG (DImode, regno + 2); > @@ -4604,9 +4590,7 @@ if (BYTES_BIG_ENDIAN) >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); >int regno = REGNO (operands[1]); >rtx ops[4]; > - if (lane < 0 || lane >= max) > -error ("lane out of range"); > - else if (lane >= max / 2) > + if (lane >= max / 2) > { >lane -= max / 2; >regno += 2; > @@ -4723,8 +4707,6 @@ if (BYTES_BIG_ENDIAN) >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); >int regno = REGNO (operands[0]); >rtx ops[5]; > - if (lane < 0 || lane >= max) > -error ("lane out of range"); >ops[0] = gen_rtx_REG (DImode, regno); >ops[1] = gen_rtx_REG (DImode, regno + 2); >ops[2] = gen_rtx_REG (DImode, regno + 4); > @@ -4750,9 +4732,7 @@ if (BYTES_BIG_ENDIAN) >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); >int regno = REGNO (operands[0]); >rtx ops[5]; > - if (lane < 0 || lane >= max) > -error ("lane out of range"); > - else if (lane >= max / 2) > + if (lane >= max / 2) > { >lane -= max / 2; >regno += 2; 
> @@ -4895,8 +4875,6 @@ if (BYTES_BIG_ENDIAN) >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); >int regno = REGNO (operands[1]); >rtx ops[5]; > - if (lane < 0 || lane >= max) > -error ("lane out of range"); >ops[0] = operands[0]; >ops[1] = gen_rtx_REG (DImode, regno); >ops[2] = gen_rtx_REG (DImode, regno + 2); > @@ -4922,9 +4900,7 @@ if (BYTES_BIG_ENDIAN) >HOST_WIDE_INT max = GET_MODE_NUNITS (mode); >int regno = REGNO (operands[1]); >
Re: [vec-cmp, patch 4/6] Support vector mask invariants
On Mon, Nov 9, 2015 at 1:11 PM, Ilya Enkovichwrote: > On 26 Oct 16:21, Richard Biener wrote: >> On Wed, Oct 14, 2015 at 6:13 PM, Ilya Enkovich >> wrote: >> > - val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val); >> > + { >> > + /* Can't use VIEW_CONVERT_EXPR for booleans because >> > +of possibly different sizes of scalar value and >> > +vector element. */ >> > + if (VECTOR_BOOLEAN_TYPE_P (type)) >> > + { >> > + if (integer_zerop (val)) >> > + val = build_int_cst (TREE_TYPE (type), 0); >> > + else if (integer_onep (val)) >> > + val = build_int_cst (TREE_TYPE (type), 1); >> > + else >> > + gcc_unreachable (); >> > + } >> > + else >> > + val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), >> > val); >> >> I think the existing code is fine with using fold_convert () here >> which should also work >> for the boolean types. So does just >> >> val = fold_convert (TREE_TYPE (type), val); >> >> work? > > It seems to work OK. > >> >> > @@ -7428,13 +7459,13 @@ vectorizable_condition (gimple *stmt, >> > gimple_stmt_iterator *gsi, >> > gimple *gtemp; >> > vec_cond_lhs = >> > vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 0), >> > - stmt, NULL); >> > + stmt, NULL, comp_vectype); >> > vect_is_simple_use (TREE_OPERAND (cond_expr, 0), stmt, >> > loop_vinfo, , , [0]); >> > >> > vec_cond_rhs = >> > vect_get_vec_def_for_operand (TREE_OPERAND (cond_expr, 1), >> > - stmt, NULL); >> > + stmt, NULL, comp_vectype); >> > vect_is_simple_use (TREE_OPERAND (cond_expr, 1), stmt, >> > loop_vinfo, , , [1]); >> >> I still don't like this very much but I guess without some major >> refactoring of all >> the functions there isn't a better way to do it for now. >> >> Thus, ok with trying the change suggested above. >> >> Thanks, >> Richard. >> > > Here is an updated version. Ok. Thanks, Richard. > Thanks, > Ilya > -- > gcc/ > > 2015-11-09 Ilya Enkovich > > * expr.c (const_vector_mask_from_tree): New. 
> (const_vector_from_tree): Use const_vector_mask_from_tree > for boolean vectors. > * tree-vect-stmts.c (vect_init_vector): Support boolean vector > invariants. > (vect_get_vec_def_for_operand): Add VECTYPE arg. > (vectorizable_condition): Directly provide vectype for invariants > used in comparison. > * tree-vectorizer.h (vect_get_vec_def_for_operand): Add VECTYPE > arg. > > > diff --git a/gcc/expr.c b/gcc/expr.c > index 2b2174f..03936ee 100644 > --- a/gcc/expr.c > +++ b/gcc/expr.c > @@ -11423,6 +11423,40 @@ try_tablejump (tree index_type, tree index_expr, > tree minval, tree range, >return 1; > } > > +/* Return a CONST_VECTOR rtx representing vector mask for > + a VECTOR_CST of booleans. */ > +static rtx > +const_vector_mask_from_tree (tree exp) > +{ > + rtvec v; > + unsigned i; > + int units; > + tree elt; > + machine_mode inner, mode; > + > + mode = TYPE_MODE (TREE_TYPE (exp)); > + units = GET_MODE_NUNITS (mode); > + inner = GET_MODE_INNER (mode); > + > + v = rtvec_alloc (units); > + > + for (i = 0; i < VECTOR_CST_NELTS (exp); ++i) > +{ > + elt = VECTOR_CST_ELT (exp, i); > + > + gcc_assert (TREE_CODE (elt) == INTEGER_CST); > + if (integer_zerop (elt)) > + RTVEC_ELT (v, i) = CONST0_RTX (inner); > + else if (integer_onep (elt) > + || integer_minus_onep (elt)) > + RTVEC_ELT (v, i) = CONSTM1_RTX (inner); > + else > + gcc_unreachable (); > +} > + > + return gen_rtx_CONST_VECTOR (mode, v); > +} > + > /* Return a CONST_VECTOR rtx for a VECTOR_CST tree. 
*/ > static rtx > const_vector_from_tree (tree exp) > @@ -11438,6 +11472,9 @@ const_vector_from_tree (tree exp) >if (initializer_zerop (exp)) > return CONST0_RTX (mode); > > + if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (exp))) > +return const_vector_mask_from_tree (exp); > + >units = GET_MODE_NUNITS (mode); >inner = GET_MODE_INNER (mode); > > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c > index ee549f4..af203ab 100644 > --- a/gcc/tree-vect-stmts.c > +++ b/gcc/tree-vect-stmts.c > @@ -1300,7 +1300,7 @@ vect_init_vector (gimple *stmt, tree val, tree type, > gimple_stmt_iterator *gsi) >if (!types_compatible_p (TREE_TYPE (type), TREE_TYPE (val))) > { > if (CONSTANT_CLASS_P (val)) > - val
Re: Add a combined_fn enum
Bernd Schmidt writes:
> On 11/09/2015 11:24 AM, Richard Sandiford wrote:
>> Bernd Schmidt writes:
>>> I see it's already acked, but have you considered just doing away with
>>> the builtin/internal function distinction?
>>
>> I think they're too different to be done away with entirely. built-in
>> functions map directly to a specific C-level callable function and
>> must have an fndecl, whereas no internal function should have an fndecl.
>> Whether a built-in function is available depends on the selected
>> language and what declarations the front-end has seen, while whether
>> an internal function is available depends entirely on GCC internal
>> information.
>
> Yes... but aren't these things fixable relatively easily (compared with
> what your patches are doing)?

I'm not sure what you mean by "fix" though. I don't think we can change any of the constraints above.

> I also have the problem that I can't quite see where your patch series
> is going. Let's take "Add internal bitcount functions", it adds new
> internal functions but no users AFAICS. What is the end goal here (there
> doesn't seem to be a [0/N] description in my inbox)?

The main goal is to allow functions to be vectorised simply by defining the associated optab. At the moment you can get a scalar square root instruction simply by defining something like sqrtdf2. But if you want to have vectorised sqrt, you need to have a target-specific C-level built-in function for the vector form of sqrt, implement TARGET_BUILTIN_VECTORIZED_FUNCTION, and expand the sqrt in the same way that the target expands other directly-called built-in functions. That seems unnecessarily indirect, especially when in practice those target-specific functions tend to use patterns like sqrtv2df2 anyway. It seems better to have GCC use the vector optabs directly.

This was prompted by the patch Dave Sherwood posted to support scalar and vector fmin() and fmax() even with -fno-fast-math on aarch64. As things stood we needed the same approach: use an optab for the scalar version and TARGET_BUILTIN_VECTORIZED_FUNCTION for the vector version. The problem is that at present there's no aarch64 built-in function that does what we want, so we'd have to define a new one. And that seems silly when GCC really ought to be able to use the vector version of the optab without any more help from the target.

I'm hoping to post those patches later today. But the series has other side-benefits, like:

- allowing genmatch to fold calls to internal functions

- splitting the computational part of maths function from the fallback errno handling at an earlier point, so that they get more optimisation

- clearly separating out the call folds that we're prepared to do on gimple, rather than calling into builtins.c.

Thanks,
Richard
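One way to picture a single enum that covers both kinds of function, as the subject of this thread suggests, is to place the internal-function codes after the built-in codes. The sketch below is a toy with made-up names (everything prefixed `TOY_` or `toy_`); it does not reproduce GCC's actual definitions and only illustrates the idea of one code space with a cheap test for which side a value came from.

```c
/* Toy stand-ins; none of these names exist in GCC.  */
enum toy_builtin { TOY_BUILT_IN_SQRT, TOY_BUILT_IN_FMIN, TOY_END_BUILTINS };
enum toy_internal { TOY_IFN_SQRT, TOY_IFN_FMIN, TOY_IFN_LAST };

/* Combined code: built-ins keep their value, internal functions are
   offset past the last built-in.  */
typedef int toy_combined_fn;

toy_combined_fn
from_builtin (enum toy_builtin fn)
{
  return (int) fn;
}

toy_combined_fn
from_internal (enum toy_internal fn)
{
  return (int) TOY_END_BUILTINS + (int) fn;
}

/* Nonzero if CODE was produced by from_internal.  */
int
is_internal (toy_combined_fn code)
{
  return code >= (int) TOY_END_BUILTINS;
}

/* Recover the internal function code; only valid if is_internal.  */
enum toy_internal
to_internal (toy_combined_fn code)
{
  return (enum toy_internal) (code - (int) TOY_END_BUILTINS);
}
```

In this toy, code that only cares about "which operation" can switch on the combined value, while code that needs an fndecl can first check which side the value belongs to.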
Re: [PATCH] Fix memory leaks and use a pool_allocator
On Mon, Nov 09, 2015 at 01:11:48PM +0100, Richard Biener wrote:
> On Mon, Nov 9, 2015 at 12:22 PM, Martin Liška wrote:
> > Hi.
> >
> > This is follow-up of changes that Richi started on Friday.
> >
> > Patch can bootstrap on x86_64-linux-pc and regression tests are running.
> >
> > Ready for trunk?
>
> * tree-ssa-dom.c (free_edge_info): Make the function extern.
> ...
> * tree-ssa.h (free_edge_info): Declare function extern.
>
> declare this in tree-ssa-threadupdate.h instead and renaming it to
> sth less "public", like free_dom_edge_info.
>
> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
> index fff62de..eb6b7df 100644
> --- a/gcc/ifcvt.c
> +++ b/gcc/ifcvt.c
> @@ -3161,6 +3161,8 @@ noce_convert_multiple_sets (struct noce_if_info
> *if_info)
>        set_used_flags (targets[i]);
>      }
>
> +  temporaries.release ();
> +
>    set_used_flags (cond);
>    set_used_flags (x);
>    set_used_flags (y);
> @@ -3194,6 +3196,7 @@ noce_convert_multiple_sets (struct noce_if_info
> *if_info)
>      }
>
>    num_updated_if_blocks++;
> +  targets.release ();
>    return TRUE;
>
> suspiciously look like candidates for an auto_vec<> (didn't check).

I was about to say the same thing after a little checking (maybe the region one in tree-ssa-threadupdate.c too, but didn't check that)

>
> @@ -1240,6 +1240,7 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p)
>    dead_set = sparseset_alloc (max_regno);
>    unused_set = sparseset_alloc (max_regno);
>    curr_point = 0;
> +  point_freq_vec.release ();
>    point_freq_vec.create (get_max_uid () * 2);
>
> a truncate (0) instead of a release () should be cheaper, avoiding the
> re-allocation.

yeah, or even change it to just grow the array, afaict it doesn't expect the array to be cleared?

> @@ -674,6 +674,10 @@ sra_deinitialize (void)
>    assign_link_pool.release ();
>    obstack_free (_obstack, NULL);
>
> +  for (hash_map ::iterator it =
> +       base_access_vec->begin (); it != base_access_vec->end (); ++it)
> +    (*it).second.release ();
> +
>    delete base_access_vec;
>
> I wonder if the better fix is to provide a proper free method for the
> hash_map?
> A hash_map with 'auto_vec' looks suspicous - eventually a proper release
> was intented here via default_hash_map_traits <>?

in fact I would expect that already works, but apparently not, so I'd say that's the bug.

Trev

>
> Anyway, most of the things above can be improved as followup of course.
>
> Thanks,
> Richard.
>
> > Thanks,
> > Martin
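The "truncate (0) instead of release ()" remark in this review can be illustrated with a toy growable array in C. This is a sketch loosely modelled on the idea of a vector that tracks length and capacity separately; `toy_vec` and its functions are hypothetical and do not mirror GCC's vec<> implementation.

```c
#include <stdlib.h>

/* Toy growable array, not GCC's vec<>.  */
struct toy_vec
{
  int *data;
  size_t len;
  size_t cap;
};

/* Make sure at least CAP elements fit.  Returns 0 on success.  */
int
toy_reserve (struct toy_vec *v, size_t cap)
{
  if (cap <= v->cap)
    return 0;			/* Existing storage is large enough.  */
  int *p = realloc (v->data, cap * sizeof *p);
  if (!p)
    return -1;
  v->data = p;
  v->cap = cap;
  return 0;
}

/* Drop the contents but keep the storage for reuse.  */
void
toy_truncate (struct toy_vec *v)
{
  v->len = 0;
}

/* Drop the contents and give the storage back.  */
void
toy_release (struct toy_vec *v)
{
  free (v->data);
  v->data = NULL;
  v->len = 0;
  v->cap = 0;
}
```

After `toy_truncate`, a later `toy_reserve` of the same size returns without touching the allocator, whereas after `toy_release` it has to allocate again. That is the cost difference the review comment alludes to.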
Re: [PATCH] S/390: Fix warning in "*movstr" pattern.
On 11/04/2015 02:39 AM, Dominik Vogt wrote:
> On Tue, Nov 03, 2015 at 06:47:28PM +0100, Ulrich Weigand wrote:
>> Dominik Vogt wrote:
>>
>>> @@ -2936,7 +2936,7 @@
>>> (set (mem:BLK (match_operand:P 1 "register_operand" "0"))
>>> (mem:BLK (match_operand:P 3 "register_operand" "2")))
>>> (set (match_operand:P 0 "register_operand" "=d")
>>> - (unspec [(mem:BLK (match_dup 1))
>>> + (unspec:P [(mem:BLK (match_dup 1))
>>> (mem:BLK (match_dup 3))
>>> (reg:SI 0)] UNSPEC_MVST))
>>> (clobber (reg:CC CC_REGNUM))]
>>
>> Don't you have to change the expander too? Otherwise the
>> pattern will no longer match ...
>
> Yes, you're right. This turned out to be a bit tricky to do
> because the "movstr" expander doesn't allow variants with
> different modes. :-/
>
> New patch attached, including a test case that works on 31-bit and
> 64-bit.

Could you please check that the generated code doesn't change with a larger code base (e.g. speccpu)? It should not affect it but I really think we omitted the mode here for a reason (although I don't remember why).

-Andreas-
Re: [PATCH] Fix memory leaks and use a pool_allocator
On 11/09/2015 01:11 PM, Richard Biener wrote: > On Mon, Nov 9, 2015 at 12:22 PM, Martin Liškawrote: >> Hi. >> >> This is follow-up of changes that Richi started on Friday. >> >> Patch can bootstrap on x86_64-linux-pc and regression tests are running. >> >> Ready for trunk? > > * tree-ssa-dom.c (free_edge_info): Make the function extern. > ... > * tree-ssa.h (free_edge_info): Declare function extern. > > declare this in tree-ssa-threadupdate.h instead and renaming it to > sth less "public", like free_dom_edge_info. > > diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c > index fff62de..eb6b7df 100644 > --- a/gcc/ifcvt.c > +++ b/gcc/ifcvt.c > @@ -3161,6 +3161,8 @@ noce_convert_multiple_sets (struct noce_if_info > *if_info) >set_used_flags (targets[i]); > } > > + temporaries.release (); > + >set_used_flags (cond); >set_used_flags (x); >set_used_flags (y); > @@ -3194,6 +3196,7 @@ noce_convert_multiple_sets (struct noce_if_info > *if_info) > } > >num_updated_if_blocks++; > + targets.release (); >return TRUE; > > suspiciously look like candidates for an auto_vec<> (didn't check). > > @@ -1240,6 +1240,7 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p) >dead_set = sparseset_alloc (max_regno); >unused_set = sparseset_alloc (max_regno); >curr_point = 0; > + point_freq_vec.release (); >point_freq_vec.create (get_max_uid () * 2); > > a truncate (0) instead of a release () should be cheaper, avoiding the > re-allocation. > > @@ -674,6 +674,10 @@ sra_deinitialize (void) >assign_link_pool.release (); >obstack_free (_obstack, NULL); > > + for (hash_map ::iterator it = > + base_access_vec->begin (); it != base_access_vec->end (); ++it) > +(*it).second.release (); > + >delete base_access_vec; > > I wonder if the better fix is to provide a proper free method for the > hash_map? > A hash_map with 'auto_vec' looks suspicous - eventually a proper release > was intented here via default_hash_map_traits <>? > > Anyway, most of the things above can be improved as followup of course. 
> > Thanks, > Richard. > >> Thanks, >> Martin Hi. All suggested changes were applied, sending v2 and waiting for bootstrap and regression tests. Thanks, Martin >From c97270f2daadcca1efe6201adf1eb0df469ca91e Mon Sep 17 00:00:00 2001 From: marxin Date: Mon, 9 Nov 2015 10:49:14 +0100 Subject: [PATCH 1/2] Fix memory leaks and use a pool_allocator gcc/ChangeLog: 2015-11-09 Martin Liska * gcc.c (record_temp_file): Release name string. * ifcvt.c (noce_convert_multiple_sets): Use auto_vec instead of vec. * lra-lives.c (free_live_range_list): Utilize lra_live_range_pool for allocation and deallocation. (create_live_range): Likewise. (copy_live_range): Likewise. (lra_merge_live_ranges): Likewise. (remove_some_program_points_and_update_live_ranges): Likewise. (lra_create_live_ranges_1): Release point_freq_vec that can be not freed from previous iteration of the function. * tree-eh.c (lower_try_finally_switch): Use auto_vec instead of vec. * tree-sra.c (sra_deinitialize): Release all vectors in base_access_vec. * tree-ssa-dom.c (free_dom_edge_info): Make the function extern. * tree-ssa-threadupdate.c (remove_ctrl_stmt_and_useless_edges): Release edge_info for a removed edge. (thread_through_all_blocks): Free region vector. * tree-ssa.h (free_dom_edge_info): Declare function extern. --- gcc/gcc.c | 5 - gcc/ifcvt.c | 8 +--- gcc/lra-lives.c | 14 -- gcc/tree-eh.c | 2 +- gcc/tree-sra.c | 6 ++ gcc/tree-ssa-dom.c | 8 gcc/tree-ssa-threadupdate.c | 6 +- gcc/tree-ssa-threadupdate.h | 1 + 8 files changed, 34 insertions(+), 16 deletions(-) diff --git a/gcc/gcc.c b/gcc/gcc.c index bbc9b23..8bbf5be 100644 --- a/gcc/gcc.c +++ b/gcc/gcc.c @@ -2345,7 +2345,10 @@ record_temp_file (const char *filename, int always_delete, int fail_delete) struct temp_file *temp; for (temp = always_delete_queue; temp; temp = temp->next) if (! 
filename_cmp (name, temp->name)) - goto already1; + { + free (name); + goto already1; + } temp = XNEW (struct temp_file); temp->next = always_delete_queue; diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index fff62de..3401faa 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -3076,12 +3076,12 @@ noce_convert_multiple_sets (struct noce_if_info *if_info) rtx_code cond_code = GET_CODE (cond); /* The true targets for a conditional move. */ - vec targets = vNULL; + auto_vec targets; /* The temporaries introduced to allow us to not consider register overlap. */ - vec temporaries = vNULL; + auto_vec temporaries; /* The insns we've emitted. */ - vec unmodified_insns = vNULL; + auto_vec unmodified_insns; int count = 0; FOR_BB_INSNS (then_bb, insn) @@ -3161,6
Re: OpenACC Firstprivate
On Sat, Nov 07, 2015 at 08:50:28AM -0500, Nathan Sidwell wrote: > Index: gcc/gimplify.c > === > --- gcc/gimplify.c(revision 229892) > +++ gcc/gimplify.c(working copy) > @@ -108,9 +108,15 @@ enum omp_region_type >/* Data region with offloading. */ >ORT_TARGET = 32, >ORT_COMBINED_TARGET = 33, > + > + ORT_ACC = 0x40, /* An OpenACC region. */ > + ORT_ACC_DATA = ORT_ACC | ORT_TARGET_DATA, /* Data construct. */ > + ORT_ACC_PARALLEL = ORT_ACC | ORT_TARGET, /* Parallel construct */ > + ORT_ACC_KERNELS = ORT_ACC | ORT_TARGET | 0x80, /* Kernels construct. */ > + >/* Dummy OpenMP region, used to disable expansion of > DECL_VALUE_EXPRs in taskloop pre body. */ > - ORT_NONE = 64 > + ORT_NONE = 0x100 > }; If you want to switch to hexadecimal, you should change all values in the enum to hexadecimal for consistency. > > /* Gimplify hashtable helper. */ > @@ -377,6 +383,12 @@ new_omp_context (enum omp_region_type re >else > c->default_kind = OMP_CLAUSE_DEFAULT_UNSPECIFIED; > > + c->combined_loop = false; > + c->distribute = false; > + c->target_map_scalars_firstprivate = false; > + c->target_map_pointers_as_0len_arrays = false; > + c->target_firstprivatize_array_bases = false; Why this? c is XCNEW allocated, so zero initialized. > @@ -5667,11 +5682,13 @@ omp_add_variable (struct gimplify_omp_ct >/* We shouldn't be re-adding the decl with the same data >sharing class. */ >gcc_assert ((n->value & GOVD_DATA_SHARE_CLASS & flags) == 0); > - /* The only combination of data sharing classes we should see is > - FIRSTPRIVATE and LASTPRIVATE. */ >nflags = n->value | flags; > - gcc_assert ((nflags & GOVD_DATA_SHARE_CLASS) > - == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE) > + /* The only combination of data sharing classes we should see is > + FIRSTPRIVATE and LASTPRIVATE. However, OpenACC permits > + reduction variables to be used in data sharing clauses. 
*/ > + gcc_assert ((ctx->region_type & ORT_ACC) != 0 > + || ((nflags & GOVD_DATA_SHARE_CLASS) > + == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE)) > || (flags & GOVD_DATA_SHARE_CLASS) == 0); Are you sure you want to give up on any kind of consistency checks for OpenACC? If only reduction is special on OpenACC, perhaps you could tweak the assert for that instead? Something that can be done incrementally of course. > + > + /* OpenMP doesn't look in outer contexts to find an > + enclosing data clause. */ I'm puzzled by the comment. OpenMP does look in outer context for clauses that need that (pretty much all closes but private), that is the do_outer: recursion in omp_notice_variable. Say for firstprivate in order to copy (or copy construct) the private variable one needs the access to the outer context's var etc.). So perhaps it would help to document what you are doing here for OpenACC and why. > + struct gimplify_omp_ctx *octx = ctx->outer_context; > + if ((ctx->region_type & ORT_ACC) && octx) > + { > + omp_notice_variable (octx, decl, in_code); > + > + for (; octx; octx = octx->outer_context) > + { > + if (!(octx->region_type & (ORT_TARGET_DATA | ORT_TARGET))) > + break; > + splay_tree_node n2 > + = splay_tree_lookup (octx->variables, > + (splay_tree_key) decl); > + if (n2) > + { > + nflags |= GOVD_MAP; > + goto found_outer; > + } > + } > } > - else if (nflags == flags) > - nflags |= GOVD_MAP; > + The main issue I have is with the omp-low.c changes. I see: "2.5.9 private clause The private clause is allowed on the parallel construct; it declares that a copy of each item on the list will be created for each parallel gang. 2.5.10 firstprivate clause The firstprivate clause is allowed on the parallel construct; it declares that a copy of each item on the list will be created for each parallel gang, and that the copy will be initialized with the value of that item on the host when the parallel construct is encountered." 
but looking at what you actually emit looks like standard present_copyin clause I think with a private variable defined in the region where the value of the present_copyin mapped variable is assigned to the private one. This I'm afraid performs often two copies rather than just one (one to copy the host value to the present_copyin mapped value, another one in the region), but more importantly, if the var is already mapped, you could initialize the private var with old data. Say int arr[64]; // initialize arr #pragma acc data copyin (arr) { // modify arr on the host # pragma acc parallel firstprivate (arr)
Re: RFC: C++ delayed folding merge
On 11/09/2015 04:24 AM, Eric Botcazou wrote:
>> One question: The branch changes 'convert' to not fold its result, and
>> it's not clear to me whether that's part of the expected behavior of a
>> front end 'convert' function or not.
>
> I don't think that you should change the behavior for front-ends that have
> an internal representation distinct from the GENERIC trees and thus do a
> global translation to GENERIC at the end; e.g. in the Ada compiler we'd
> rather have *more* folding than less during this translation.

Right, the change is just to the C++ front end 'convert'.

Jason
Re: OpenACC Firstprivate
On 11/09/15 08:46, Jakub Jelinek wrote: On Sat, Nov 07, 2015 at 08:50:28AM -0500, Nathan Sidwell wrote: Index: gcc/gimplify.c === If you want to switch to hexadecimal, you should change all values in the enum to hexadecimal for consistency. ok. /* Gimplify hashtable helper. */ @@ -377,6 +383,12 @@ new_omp_context (enum omp_region_type re else c->default_kind = OMP_CLAUSE_DEFAULT_UNSPECIFIED; + c->combined_loop = false; + c->distribute = false; + c->target_map_scalars_firstprivate = false; + c->target_map_pointers_as_0len_arrays = false; + c->target_firstprivatize_array_bases = false; Why this? c is XCNEW allocated, so zero initialized. I presumed it necessary, as it was on the branch. will remove. @@ -5667,11 +5682,13 @@ omp_add_variable (struct gimplify_omp_ct /* We shouldn't be re-adding the decl with the same data sharing class. */ gcc_assert ((n->value & GOVD_DATA_SHARE_CLASS & flags) == 0); - /* The only combination of data sharing classes we should see is -FIRSTPRIVATE and LASTPRIVATE. */ nflags = n->value | flags; - gcc_assert ((nflags & GOVD_DATA_SHARE_CLASS) - == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE) + /* The only combination of data sharing classes we should see is +FIRSTPRIVATE and LASTPRIVATE. However, OpenACC permits +reduction variables to be used in data sharing clauses. */ + gcc_assert ((ctx->region_type & ORT_ACC) != 0 + || ((nflags & GOVD_DATA_SHARE_CLASS) + == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE)) || (flags & GOVD_DATA_SHARE_CLASS) == 0); Are you sure you want to give up on any kind of consistency checks for OpenACC? If only reduction is special on OpenACC, perhaps you could tweak the assert for that instead? Something that can be done incrementally of course. Will investigate (later) + + /* OpenMP doesn't look in outer contexts to find an + enclosing data clause. */ I'm puzzled by the comment. 
OpenMP does look in outer context for clauses that need that (pretty much all closes but private), that is the do_outer: recursion in omp_notice_variable. Say for firstprivate in order to copy (or copy construct) the private variable one needs the access to the outer context's var etc.). So perhaps it would help to document what you are doing here for OpenACC and why. Ok. It seemed (and it may become clearer with default handling added), that OpenACC and OpenMP scanned scopes in opposite orders. I remember trying to get the ACC code to scan in the same order, but came up blank. Anyway, you're right, it should say what OpenACC is trying. The main issue I have is with the omp-low.c changes. I see: "2.5.9 private clause The private clause is allowed on the parallel construct; it declares that a copy of each item on the list will be created for each parallel gang. 2.5.10 firstprivate clause The firstprivate clause is allowed on the parallel construct; it declares that a copy of each item on the list will be created for each parallel gang, and that the copy will be initialized with the value of that item on the host when the parallel construct is encountered." but looking at what you actually emit looks like standard present_copyin clause I think with a private variable defined in the region where the value of the present_copyin mapped variable is assigned to the private one. This I'm afraid performs often two copies rather than just one (one to copy the host value to the present_copyin mapped value, another one in the region), I don't think that can be avoided. The host doesn't have control over when the CTAs (a gang) start -- they may even be serialized onto the same physical HW. So each gang has to initialize its own instance. Or did you mean something else? but more importantly, if the var is already mapped, you could initialize the private var with old data. 
Say int arr[64]; // initialize arr #pragma acc data copyin (arr) { // modify arr on the host # pragma acc parallel firstprivate (arr) { ... } } Hm, I suspect that is either ill formed or the std does not contemplate. Is that really what you want? If not, any reason not to implement GOMP_MAP_FIRSTPRIVATE and GOMP_MAP_FIRSTPRIVATE_INT on the libgomp oacc-* side and just use the OpenMP firstprivate handling in omp-low.c? I would have to investigate ... nathan
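As a rough illustration of the stale-data concern raised above, here is a toy host-only model in C. It assumes that "present or copyin" copies from the host only when the variable is not yet mapped, and that each gang then initializes its private copy from the device-side value. The `toy_*` names are hypothetical and are not libgomp or GCC interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one device mapping; not libgomp's data structures.  */
struct toy_mapping
{
  int present;       /* Nonzero once the variable is mapped.  */
  int device_value;  /* Device-side copy of the variable.  */
};

/* Model "present or copyin": copy from the host only if the variable
   is not mapped yet.  */
static void
toy_present_or_copyin (struct toy_mapping *m, int host_value)
{
  if (!m->present)
    {
      m->device_value = host_value;
      m->present = 1;
    }
}

/* Model each gang initializing its own private copy from the
   device-side value (the second copy discussed above).  */
static void
toy_init_gang_privates (const struct toy_mapping *m, int *priv,
                        size_t ngangs)
{
  assert (m->present);
  for (size_t g = 0; g < ngangs; g++)
    priv[g] = m->device_value;
}

/* Return the value gang 0 starts with when the host modifies the
   variable between the data region and the parallel region.  */
static int
toy_stale_scenario (void)
{
  struct toy_mapping m = { 0, 0 };
  int host = 1;
  int priv[4];

  toy_present_or_copyin (&m, host);   /* enclosing data copyin */
  host = 7;                           /* host-side modification */
  toy_present_or_copyin (&m, host);   /* already present: no copy */
  toy_init_gang_privates (&m, priv, 4);
  return priv[0];
}
```

In this model the gangs start from the older device value rather than the host's current one, which is the behaviour the thread is debating; whether that is the intended OpenACC semantics is exactly the open question.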
Re: [PATCH] Use signed boolean type for boolean vectors
On Mon, Nov 9, 2015 at 3:03 PM, Ilya Enkovichwrote: > On 03 Nov 14:42, Richard Biener wrote: >> On Wed, Oct 28, 2015 at 4:30 PM, Ilya Enkovich >> wrote: >> > 2015-10-28 18:21 GMT+03:00 Richard Biener : >> >> On Wed, Oct 28, 2015 at 2:13 PM, Ilya Enkovich >> >> wrote: >> >>> Hi, >> >>> >> >>> Testing boolean vector conversions I found several runtime regressions >> >>> and investigation showed it's due to incorrect conversion caused by >> >>> unsigned boolean type. When boolean vector is represented as an >> >>> integer vector on target it's a signed integer actually. Unsigned >> >>> boolean type was chosen due to possible single bit values, but for >> >>> multiple bit values it causes wrong casting. The easiest way to fix >> >>> it is to use signed boolean value. The following patch does this and >> >>> fixes my problems with conversion. Bootstrapped and tested on >> >>> x86_64-unknown-linux-gnu. Is it OK? >> >> >> >> Hmm. Actually formally the "boolean" vectors were always 0 or -1 >> >> (all bits set). That is also true for a signed boolean with precision 1 >> >> but with higher precision what makes sure to sign-extend 'true'? >> >> >> >> So it's far from an obvious change, esp as you don't change the >> >> precision == 1 case. [I still think we should have precision == 1 >> >> for all boolean types] >> >> >> >> Richard. >> >> >> > >> > For 1 bit precision signed type value 1 is out of range, right? This might >> > break >> > in many place due to used 1 as true value. >> >> For vectors -1 is true. Did you try whether it breaks many places? >> build_int_cst (type, 1) should still work fine. >> >> Richard. >> > > I tried it and didn't find any new failures. So looks I was wrong assuming > it should cause many failures. Testing is not complete because many SPEC > benchmarks are failing to compile on -O3 for AVX-512 on trunk. But I think > we may proceed with signed type and fix constant generation issues if any > revealed. 
This patch was bootstrapped and regtested on > x86_64-unknown-linux-gnu. OK for trunk? Ok. Richard. > Thanks, > Ilya > -- > gcc/ > > 2015-11-09 Ilya Enkovich > > * optabs.c (expand_vec_cond_expr): Always get sign from type. > * tree.c (wide_int_to_tree): Support negative values for boolean. > (build_nonstandard_boolean_type): Use signed type for booleans. > > > diff --git a/gcc/optabs.c b/gcc/optabs.c > index fdcdc6a..44971ad 100644 > --- a/gcc/optabs.c > +++ b/gcc/optabs.c > @@ -5365,7 +5365,6 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, > tree op1, tree op2, >op0a = TREE_OPERAND (op0, 0); >op0b = TREE_OPERAND (op0, 1); >tcode = TREE_CODE (op0); > - unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a)); > } >else > { > @@ -5374,9 +5373,9 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, > tree op1, tree op2, >op0a = op0; >op0b = build_zero_cst (TREE_TYPE (op0)); >tcode = LT_EXPR; > - unsignedp = false; > } >cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a)); > + unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a)); > > >gcc_assert (GET_MODE_SIZE (mode) == GET_MODE_SIZE (cmp_op_mode) > diff --git a/gcc/tree.c b/gcc/tree.c > index 18d6544..6fb4c09 100644 > --- a/gcc/tree.c > +++ b/gcc/tree.c > @@ -1437,7 +1437,7 @@ wide_int_to_tree (tree type, const wide_int_ref ) > case BOOLEAN_TYPE: > /* Cache false or true. */ > limit = 2; > - if (hwi < 2) > + if (IN_RANGE (hwi, 0, 1)) > ix = hwi; > break; > > @@ -8069,7 +8069,7 @@ build_nonstandard_boolean_type (unsigned HOST_WIDE_INT > precision) > >type = make_node (BOOLEAN_TYPE); >TYPE_PRECISION (type) = precision; > - fixup_unsigned_type (type); > + fixup_signed_type (type); > >if (precision <= MAX_INT_CACHED_PREC) > nonstandard_boolean_type_cache[precision] = type;
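The signedness question in this thread can be illustrated with a small, self-contained C sketch. It is not GCC code: `zero_extend` and `sign_extend` are hypothetical helpers that interpret the low `precision` bits of a value as an unsigned or a signed quantity, which is roughly what choosing an unsigned or signed boolean type of that precision implies when a lane is widened.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low PRECISION bits of BITS as an unsigned value.  */
static uint64_t
zero_extend (uint64_t bits, unsigned precision)
{
  assert (precision >= 1 && precision <= 64);
  if (precision == 64)
    return bits;
  return bits & ((UINT64_C (1) << precision) - 1);
}

/* Interpret the low PRECISION bits of BITS as a two's complement
   signed value.  The final conversion to int64_t assumes the usual
   two's complement wraparound, as GCC implements it.  */
static int64_t
sign_extend (uint64_t bits, unsigned precision)
{
  uint64_t sign = UINT64_C (1) << (precision - 1);
  bits = zero_extend (bits, precision);
  return (int64_t) ((bits ^ sign) - sign);
}
```

Under these definitions `sign_extend (1, 1)` is -1, which matches the remark that 1 is out of range for a signed 1-bit type, and an all-ones 8-bit lane reads as -1 when signed but as 255 when unsigned, so only the signed reading stays all-ones after widening.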
RE: [RFC][PATCH] Preferred rename register in regrename pass
Hi Bernd,

Sorry for the late reply. The updated patch was bootstrapped on x86_64-unknown-linux-gnu and cross tested on mips-img-linux-gnu using r229786.

The results below were generated for the CSiBE benchmark. The numbers in the columns are bytes in the format 'net (gain/loss)' and show the difference with and without the patch when the -frename-registers switch is used.

I looked at the gains, especially for MIPS and 'teem', and it appears that renaming registers affects the rtl_dce pass, i.e. makes it less effective. However, in the average case the patch appears to reduce the code size slightly, and moves are genuinely removed. I haven't tested the performance extensively, but the SPEC benchmarks showed almost the same results, which could be just noise.

               | MIPS n64 -Os    | MIPS o32 -Os    | x86_64 -Os        |
---------------+-----------------+-----------------+-------------------+
bzip2-1.0.2    | -32  (0/-32)    | -24  (0/-24)    | -34   (1/-35)     |
cg_compiler    | -172 (0/-172)   | -156 (0/-156)   | -46   (0/-46)     |
compiler       | -36  (0/-36)    | -24  (0/-24)    | -6    (0/-6)      |
flex-2.5.31    | -68  (0/-68)    | -80  (0/-80)    | -98   (7/-105)    |
jikespg-1.3    | -284 (0/-284)   | -204 (0/-204)   | -127  (9/-136)    |
jpeg-6b        | -52  (8/-60)    | -20  (0/-20)    | -80   (11/-91)    |
libmspack      | -136 (0/-136)   | -28  (0/-28)    | -33   (23/-56)    |
libpng-1.2.5   | -72  (0/-72)    | -64  (0/-64)    | -176  (14/-190)   |
linux-2.4.23   | -700 (20/-720)  | -384 (0/-384)   | -691  (44/-735)   |
lwip-0.5.3     | -4   (0/-4)     | -4   (0/-4)     | +4    (13/-9)     |
mpeg2dec-0.3.1 | -16  (0/-16)    |                 | -142  (6/-148)    |
mpgcut-1.1     | -24  (0/-24)    | -12  (4/-16)    | -2    (0/-2)      |
OpenTCP-1.0.4  | -28  (0/-28)    | -12  (0/-12)    | -1    (0/-1)      |
replaypc-0.4.0 | -32  (0/-32)    | -12  (0/-12)    | -4    (2/-6)      |
teem-1.6.0     | -88  (480/-568) | +108 (564/-456) | -1272 (117/-1389) |
ttt-0.10.1     | -24  (0/-24)    | -20  (0/-20)    | -16   (0/-16)     |
unrarlib-0.4.0 | -20  (0/-20)    | -8   (0/-8)     | -59   (9/-68)     |
zlib-1.1.4     | -12  (0/-12)    | -4   (0/-4)     | -23   (8/-31)     |

               | MIPS n64 -O2    | MIPS o32 -O2    | x86_64 -O2        |
---------------+-----------------+-----------------+-------------------+
bzip2-1.0.2    | -104 (0/-104)   | -48  (0/-48)    | -55   (0/-55)     |
cg_compiler    | -184 (4/-188)   | -232 (0/-232)   | -31   (5/-36)     |
compiler       | -32  (0/-32)    | -12  (0/-12)    | -4    (1/-5)      |
flex-2.5.31    | -96  (0/-96)    | -112 (0/-112)   | -12   (34/-46)    |
jikespg-1.3    | -540 (20/-560)  | -476 (4/-480)   | -154  (30/-184)   |
jpeg-6b        | -112 (16/-128)  | -60  (0/-60)    | -136  (84/-220)   |
libmspack      | -164 (0/-164)   | -40  (0/-40)    | -87   (32/-119)   |
libpng-1.2.5   | -120 (8/-128)   | -92  (4/-96)    | -140  (53/-193)   |
linux-2.4.23   | -596 (12/-608)  | -320 (8/-328)   | -794  (285/-1079) |
lwip-0.5.3     | -8   (0/-8)     | -8   (0/-8)     | +2    (4/-2)      |
mpeg2dec-0.3.1 | -44  (0/-44)    | -4   (0/-4)     | -122  (8/-130)    |
mpgcut-1.1     | -8   (0/-8)     | -8   (0/-8)     | +28   (32/-4)     |
OpenTCP-1.0.4  | -4   (0/-4)     | -4   (0/-4)     | -2    (0/-2)      |
replaypc-0.4.0 | -20  (0/-20)    | -24  (0/-24)    | -13   (0/-13)     |
teem-1.6.0     | +100 (740/-640) | +84  (736/-652) | -1998 (168/-2166) |
ttt-0.10.1     | -16  (0/-16)    |                 |                   |
unrarlib-0.4.0 | -16  (0/-16)    | -8   (0/-8)     | +19   (37/-18)    |
zlib-1.1.4     | -12  (0/-12)    | -4   (0/-4)     | -15   (1/-16)     |

Regards,
Robert

> Hi Robert,
> > gcc/
> > * regrename.c (create_new_chain): Initialize terminated_dead,
> > renamed and tied_chain.
> > (find_best_rename_reg): Pick and check register from the tied chain.
> > (regrename_do_replace): Mark head as renamed.
> > (scan_rtx_reg): Tie chains in move insns. Set terminate_dead flag.
> > * regrename.h (struct du_head): Add tied_chain, renamed and
> > terminated_dead members.
>
> Thanks - this looks a lot better already. You didn't say how it was
> bootstrapped and tested; please include this information for future
> submissions. For a patch like this, some data on the improvement you got
> would also be appreciated.
>
> I'd still like to investigate the possibility of further simplification:
>
> > + {
> > + /* Find the input chain.
*/ > > + for (i = c->id - 1; id_to_chain.iterate (i, ); i--) > > + if (head->last && head->last->insn == insn > > + && head->terminated_dead) > > + { > > + gcc_assert (head->regno == REGNO (recog_data.operand[1])); > > + c->tied_chain = head; > > + head->tied_chain = c; > > + > > + if (dump_file) > > + fprintf (dump_file, "Tying chain %s (%d) with %s (%d)\n", > > + reg_names[c->regno], c->id, > > + reg_names[head->regno], head->id);
Re: [AArch64][PATCH 6/7] Add NEON intrinsics vqrdmlah and vqrdmlsh.
On 30 October 2015 at 16:52, Matthew Wahab wrote:
> On 30/10/15 12:51, Christophe Lyon wrote:
>>
>> On 23 October 2015 at 14:26, Matthew Wahab wrote:
>>>
>>> The ARMv8.1 architecture extension adds two Adv.SIMD instructions,
>>> sqrdmlah and sqrdmlsh. This patch adds the NEON intrinsics vqrdmlah and
>>> vqrdmlsh for these instructions. The new intrinsics are of the form
>>> vqrdml{as}h[q]_.
>>>
>>> Tested the series for aarch64-none-linux-gnu with native bootstrap and
>>> make check on an ARMv8 architecture. Also tested aarch64-none-elf with
>>> cross-compiled check-gcc on an ARMv8.1 emulator.
>>>
>>
>> Is there a publicly available simulator for v8.1? QEMU or Foundation
>> Model?
>>
>
> Sorry, I don't know.
> Matthew
>

So, what will happen to the testsuite once this is committed? Are we going to see FAILs when using QEMU?

Thanks,
Christophe.
Re: [PATCH] Minor refactoring in tree-ssanames.c & freelists verifier
Hi,

On Mon, 9 Nov 2015, Jeff Law wrote:

+verify_ssaname_freelists (struct function *fun)
+{
+ /* Do nothing if we are in RTL format. */
+ basic_block bb;
+ FOR_EACH_BB_FN (bb, fun)
+{
+ if (bb->flags & BB_RTL)
+ return;
+}

gimple_in_ssa_p (fun);

+ /* Then note the operands of each statement. */
+ for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+ !gsi_end_p (gsi);
+ gsi_next ())
+ {
+ ssa_op_iter iter;
+ gimple *stmt = gsi_stmt (gsi);
+ FOR_EACH_SSA_TREE_OPERAND (t, stmt, iter, SSA_OP_ALL_OPERANDS)
+ if (TREE_CODE (t) == SSA_NAME)
+ bitmap_set_bit (names_in_il, SSA_NAME_VERSION (t));
+ }

t will always be an SSA_NAME here.

Ciao,
Michael.
GCC 6 Status Report (2015-11-09)
Status
======

We've pushed back the switch to Stage 3 to the end of Saturday, Nov 14th.
This is to allow smooth draining of review queues.

Quality Data
============

Priority          #   Change from last report
--------        ---   -----------------------
P1                4   +   2
P2               84
P3              130   +  10
P4               83   -   5
P5               32
--------        ---   -----------------------
Total P1-P3     218   +  12
Total           333   +   7

Previous Report
===============

https://gcc.gnu.org/ml/gcc/2015-10/msg00113.html
[PATCH][optabs][ifcvt][1/3] Define negcc, notcc optabs
Hi all, This is a rebase of the patch I posted at: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00154.html The patch has been ok'd by Jeff but I wanted to hold off committing it until my fixes for the ifcvt regressions on sparc and x86_64 were fixed. The rebase conflicts were due to Richard's optabs splitting patch. I've also noticed that in my original patch I had a comparison of branch cost with the magic number '2'. I removed it from this version as it's not really meaningful. The transformation this patch enables is, at the moment, only supported for arm and aarch64 where it is always beneficial. If/when we have a proper ifcvt costing model (perhaps for GCC 7?) we'll update this accordingly if needed. Jeff, sorry for taking so long to commit this, I just wanted to fix the other ifcvt fallout before proceeding with more new functionality. I have also uncovered a bug in the arm implementation of these optabs (patch 3/3 in the series), so I'll post an updated version of that patch as well soon. Ok to commit this updated version instead? Bootstrapped and tested on arm, aarch64 and x86_64. It has been sitting in my tree for a couple of months now with no issues. Thanks, Kyrill 2015-11-09 Kyrylo Tkachov* ifcvt.c (noce_try_inverse_constants): New function. (noce_process_if_block): Call it. * optabs.h (emit_conditional_neg_or_complement): Declare prototype. * optabs.def (negcc_optab, notcc_optab): Declare. * optabs.c (emit_conditional_neg_or_complement): New function. * doc/tm.texi (Standard Names): Document negcc, notcc names. commit 93cd987e9ab02ac68b44b2470bb5c4c6345efeca Author: Kyrylo Tkachov Date: Thu Aug 13 18:14:52 2015 +0100 [optabs][ifcvt][1/3] Define negcc, notcc optabs diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 619259f..c4e43f3 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -5791,6 +5791,21 @@ move operand 2 or (operands 2 + operand 3) into operand 0 according to the comparison in operand 1. 
If the comparison is false, operand 2 is moved into operand 0, otherwise (operand 2 + operand 3) is moved. +@cindex @code{neg@var{mode}cc} instruction pattern +@item @samp{neg@var{mode}cc} +Similar to @samp{mov@var{mode}cc} but for conditional negation. Conditionally +move the negation of operand 2 or the unchanged operand 3 into operand 0 +according to the comparison in operand 1. If the comparison is true, the negation +of operand 2 is moved into operand 0, otherwise operand 3 is moved. + +@cindex @code{not@var{mode}cc} instruction pattern +@item @samp{not@var{mode}cc} +Similar to @samp{neg@var{mode}cc} but for conditional complement. +Conditionally move the bitwise complement of operand 2 or the unchanged +operand 3 into operand 0 according to the comparison in operand 1. +If the comparison is true, the complement of operand 2 is moved into +operand 0, otherwise operand 3 is moved. + @cindex @code{cstore@var{mode}4} instruction pattern @item @samp{cstore@var{mode}4} Store zero or nonzero in operand 0 according to whether a comparison diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c index 157a716..1e773d8 100644 --- a/gcc/ifcvt.c +++ b/gcc/ifcvt.c @@ -1179,6 +1179,83 @@ noce_try_store_flag (struct noce_if_info *if_info) } } + +/* Convert "if (test) x = -A; else x = A" into + x = A; if (test) x = -x if the machine can do the + conditional negate form of this cheaply. + Try this before noce_try_cmove that will just load the + immediates into two registers and do a conditional select + between them. If the target has a conditional negate or + conditional invert operation we can save a potentially + expensive constant synthesis. 
*/ + +static bool +noce_try_inverse_constants (struct noce_if_info *if_info) +{ + if (!noce_simple_bbs (if_info)) +return false; + + if (!CONST_INT_P (if_info->a) + || !CONST_INT_P (if_info->b) + || !REG_P (if_info->x)) +return false; + + machine_mode mode = GET_MODE (if_info->x); + + HOST_WIDE_INT val_a = INTVAL (if_info->a); + HOST_WIDE_INT val_b = INTVAL (if_info->b); + + rtx cond = if_info->cond; + + rtx x = if_info->x; + rtx target; + + start_sequence (); + + rtx_code code; + if (val_b != HOST_WIDE_INT_MIN && val_a == -val_b) +code = NEG; + else if (val_a == ~val_b) +code = NOT; + else +{ + end_sequence (); + return false; +} + + rtx tmp = gen_reg_rtx (mode); + noce_emit_move_insn (tmp, if_info->a); + + target = emit_conditional_neg_or_complement (x, code, mode, cond, tmp, tmp); + + if (target) +{ + rtx_insn *seq = get_insns (); + + if (!seq) + { + end_sequence (); + return false; + } + + if (target != if_info->x) + noce_emit_move_insn (if_info->x, target); + + seq = end_ifcvt_sequence (if_info); + + if (!seq) + return false; + + emit_insn_before_setloc (seq, if_info->jump, + INSN_LOCATION (if_info->insn_a)); + return true; +} + + end_sequence (); +
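The constant check at the heart of the patch above can be sketched in plain C. This is only an illustration under the assumptions stated in the patch comment ("if (test) x = -A; else x = A" becomes "x = A; if (test) x = -x"); `classify_constants` and `select_by_inverse` are hypothetical names, and `int64_t` stands in for `HOST_WIDE_INT`.

```c
#include <assert.h>
#include <stdint.h>

enum inverse_kind { INVERSE_NONE, INVERSE_NEG, INVERSE_NOT };

/* Decide whether A can be obtained from B by negation or by bitwise
   complement.  INT64_MIN is excluded from the negation test because
   -INT64_MIN overflows, mirroring the HOST_WIDE_INT_MIN guard.  */
static enum inverse_kind
classify_constants (int64_t a, int64_t b)
{
  if (b != INT64_MIN && a == -b)
    return INVERSE_NEG;
  if (a == ~b)
    return INVERSE_NOT;
  return INVERSE_NONE;
}

/* Load one constant, then conditionally invert it.  This stands in
   for a conditional negate or conditional complement instruction.
   KIND is expected to come from classify_constants, which never
   reports INVERSE_NEG when A is INT64_MIN.  */
static int64_t
select_by_inverse (int test, int64_t a, enum inverse_kind kind)
{
  int64_t x = a;
  assert (kind != INVERSE_NONE);
  if (test)
    x = (kind == INVERSE_NEG) ? -x : ~x;
  return x;
}
```

The point of the transformation is that only one constant has to be synthesized; the other value is derived from it by the conditional operation.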
[patch] backport remove of soft float for FreeBSD powerpc for gcc-4.9
Hi all,

any objections if I apply the patch below to gcc-4.9?

TIA,
Andreas

2015-11-09 Andreas Tobler

Backport from mainline
2015-03-04 Andreas Tobler

* config/rs6000/t-freebsd64: Remove 32-bit soft-float multilibs.

Index: gcc/config/rs6000/t-freebsd64
===
--- gcc/config/rs6000/t-freebsd64 (revision 230016)
+++ gcc/config/rs6000/t-freebsd64 (working copy)
@@ -21,11 +21,9 @@
 # On FreeBSD the 32-bit libraries are found under /usr/lib32.
 # Set MULTILIB_OSDIRNAMES according to this.
-MULTILIB_OPTIONS= m32 msoft-float
-MULTILIB_DIRNAMES = 32 nof
+MULTILIB_OPTIONS= m32
+MULTILIB_DIRNAMES = 32
 MULTILIB_EXTRA_OPTS = fPIC mstrict-align
 MULTILIB_EXCEPTIONS =
-MULTILIB_EXCLUSIONS = !m32/msoft-float
 MULTILIB_OSDIRNAMES= ../lib32
-#MULTILIB_MATCHES= $(MULTILIB_MATCHES_FLOAT)
Re: OpenACC Firstprivate
On 11/09/15 08:59, Nathan Sidwell wrote:
On 11/09/15 08:46, Jakub Jelinek wrote:
On Sat, Nov 07, 2015 at 08:50:28AM -0500, Nathan Sidwell wrote:

Say
int arr[64];
// initialize arr
#pragma acc data copyin (arr)
{
  // modify arr on the host
  # pragma acc parallel firstprivate (arr)
  {
    ...
  }
}

Hm, I suspect that is either ill formed or the std does not contemplate it.

I just realized there are two ways to consider the above.

1) It's ill formed. Once you've transferred data to the device, modifying it on the host is unspecified. I'm having trouble finding words in the std that actually say that though :(

2) On a system with shared physical global memory, the host modification would be visible on the device (possibly at an arbitrary point due to the lack of a synchronization primitive?)

I don't think this changes the 'why not use OpenMP's ...' question, because IIUC you think that can be made to DTRT anyway?

nathan
Re: [Patch AArch64] Switch constant pools to separate rodata sections.
On 08/11/15 11:42, Andreas Schwab wrote: > This is causing a bootstrap comparison failure in gcc/go/gogo.o. I'm looking into this - this is now PR68256. regards Ramana > > Andreas. >
Re: [PATCH] Fix memory leaks and use a pool_allocator
On Mon, Nov 9, 2015 at 2:26 PM, Martin Liškawrote: > On 11/09/2015 01:11 PM, Richard Biener wrote: >> On Mon, Nov 9, 2015 at 12:22 PM, Martin Liška wrote: >>> Hi. >>> >>> This is follow-up of changes that Richi started on Friday. >>> >>> Patch can bootstrap on x86_64-linux-pc and regression tests are running. >>> >>> Ready for trunk? >> >> * tree-ssa-dom.c (free_edge_info): Make the function extern. >> ... >> * tree-ssa.h (free_edge_info): Declare function extern. >> >> declare this in tree-ssa-threadupdate.h instead and renaming it to >> sth less "public", like free_dom_edge_info. >> >> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c >> index fff62de..eb6b7df 100644 >> --- a/gcc/ifcvt.c >> +++ b/gcc/ifcvt.c >> @@ -3161,6 +3161,8 @@ noce_convert_multiple_sets (struct noce_if_info >> *if_info) >>set_used_flags (targets[i]); >> } >> >> + temporaries.release (); >> + >>set_used_flags (cond); >>set_used_flags (x); >>set_used_flags (y); >> @@ -3194,6 +3196,7 @@ noce_convert_multiple_sets (struct noce_if_info >> *if_info) >> } >> >>num_updated_if_blocks++; >> + targets.release (); >>return TRUE; >> >> suspiciously look like candidates for an auto_vec<> (didn't check). >> >> @@ -1240,6 +1240,7 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p) >>dead_set = sparseset_alloc (max_regno); >>unused_set = sparseset_alloc (max_regno); >>curr_point = 0; >> + point_freq_vec.release (); >>point_freq_vec.create (get_max_uid () * 2); >> >> a truncate (0) instead of a release () should be cheaper, avoiding the >> re-allocation. >> >> @@ -674,6 +674,10 @@ sra_deinitialize (void) >>assign_link_pool.release (); >>obstack_free (_obstack, NULL); >> >> + for (hash_map ::iterator it = >> + base_access_vec->begin (); it != base_access_vec->end (); ++it) >> +(*it).second.release (); >> + >>delete base_access_vec; >> >> I wonder if the better fix is to provide a proper free method for the >> hash_map? 
>> A hash_map with 'auto_vec' looks suspicious - eventually a proper release
>> was intended here via default_hash_map_traits <>?
>>
>> Anyway, most of the things above can be improved as followup of course.
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Martin
>
> Hi.
>
> All suggested changes were applied, sending v2 and waiting for bootstrap and
> regression tests.

Ok.

Thanks,
Richard.

> Thanks,
> Martin
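The review remark that a truncate should be cheaper than a release followed by a fresh allocation can be sketched with a minimal growable buffer in C. This is not GCC's `vec`; `int_buf` and its functions are hypothetical stand-ins that only show the difference between dropping the contents and dropping the storage.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal growable int buffer; a stand-in for a vector type.  */
struct int_buf
{
  int *data;
  size_t len;
  size_t cap;
};

/* Append VALUE, growing the storage if needed.  Returns 0 on
   allocation failure and 1 on success.  */
static int
int_buf_push (struct int_buf *b, int value)
{
  if (b->len == b->cap)
    {
      size_t new_cap = b->cap ? b->cap * 2 : 4;
      int *p = realloc (b->data, new_cap * sizeof *p);
      if (!p)
        return 0;
      b->data = p;
      b->cap = new_cap;
    }
  b->data[b->len++] = value;
  return 1;
}

/* Drop elements but keep the storage for reuse.  */
static void
int_buf_truncate (struct int_buf *b, size_t n)
{
  assert (n <= b->len);
  b->len = n;
}

/* Drop elements and free the storage.  */
static void
int_buf_release (struct int_buf *b)
{
  free (b->data);
  b->data = NULL;
  b->len = 0;
  b->cap = 0;
}
```

A buffer that is refilled on every call can be truncated to zero at the start of each call and released once at the end, which avoids both the leak and the repeated allocation.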
Re: OpenACC Firstprivate
On 11/09/15 09:10, Jakub Jelinek wrote: On Mon, Nov 09, 2015 at 08:59:15AM -0500, Nathan Sidwell wrote: This I'm afraid performs often two copies rather than just one (one to copy the host value to the present_copyin mapped value, another one in the region), I don't think that can be avoided. The host doesn't have control over when the CTAs (a gang) start -- they may even be serialized onto the same physical HW. So each gang has to initialize its own instance. Or did you mean something else? So, what is the scope of the private and firstprivate vars in OpenACC? In OpenMP if a variable is private or firstprivate on the target construct, unless further privatized in inner constructs it is really shared among all the threads in all the teams (ro one var per all CTAs/workers in PTX terms). Is that the case for OpenACC too, or are the vars e.g. private to each CTA already or to each thread in each CTA, something different? If they are shared by all CTAs, then you should hopefully be able to use the GOMP_MAP_FIRSTPRIVATE{,_INT}, if not, then I'd say you should at least use those to provide you the initializer data to initialize your private vars from as a cheaper alternative to mapping. I'm going to try and get clarification, but I think the intent is to initialize with the value seen on the device. Consider: int foo = 0; #pragma acc data copyin(foo) { #pragma acc parallel present(foo) { foo = 2; } if (expr){ #pragma update host (foo) } #pragma acc parallel firstprivate (foo) { // which initialization value? } } Here we copy data to the device, then set it a distinct value there. We conditionally update the host's instance from the device. My thinking is that the intent of the firstprivate is to initialize with the value known on the device (and behave as-if copyin, if it's not there). 
Not the value most recently seen on the host -- the update clause could change that, and may well be used as a debugging aid, so it seems bizarre that it can change program semantics in such a way.
Re: Add null identifiers to genmatch
On Mon, Nov 9, 2015 at 12:17 AM, Jeff Law wrote:
> On 11/07/2015 07:31 AM, Pedro Alves wrote:
>>
>> Hi Richard,
>>
>> Passerby comment below.
>>
>> On 11/07/2015 01:21 PM, Richard Sandiford wrote:
>>>
>>> -/* Lookup the identifier ID. */
>>> +/* Lookup the identifier ID. Allow "null" if ALLOW_NULL. */
>>>
>>> id_base *
>>> -get_operator (const char *id)
>>> +get_operator (const char *id, bool allow_null = false)
>>> {
>>> + if (allow_null && strcmp (id, "null") == 0)
>>> +return null_id;
>>> +
>>> id_base tem (id_base::CODE, id);
>>
>>
>> Boolean params are best avoided if possible, IMO. In this case,
>> it seems this could instead be a new wrapper function, like:
>
> This hasn't been something we've required for GCC. I've come across this
> recommendation a few times over the last several months as I continue to
> look at refactoring and best practices for codebases such as GCC.
>
> By encoding the boolean in the function's signature, it (IMHO) does make the
> code a bit easier to read, primarily because you don't have to go look up the
> tense of the boolean. The problem is when the boolean is telling us some
> property of an argument, but there's more than one argument and other similar
> situations.
>
> I wonder if the real benefit is in the refactoring necessary to do things in
> this way without a ton of code duplication.

I think the patch is ok as-is. Thus ok.

Thanks,
Richard.

> Jeff
>
>
Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX
On Nov 9, 2015, at 11:46 AM, Jeff Law wrote:
On 11/09/2015 12:38 PM, Bernd Schmidt wrote:
>> We might want to think about making a policy decision to try waiving
>> some of the testing requirements for target macro -> hook conversions.
>> Maybe try only a "build to cc1" requirement and see whether that causes
>> too much breakage.
> A config-list.mk build is a build to cc1*, f951, gnat1, so we're not
> requiring deep tests on the affected targets. Not sure how much we're
> getting by forcing a bootstrap & regression test of that kind of change.
>
> I'm certainly open to this kind of relaxed testing to help this stuff move
> forward and complete before we're all retired :-)

Testing is a cornerstone of gcc quality. I like it. It is useful. That said, I don’t think we should always be fanatical about it. How and when we accept less than a standard bootstrap and regression test run would, I’m sure, be a big topic, but rather than make a ton of rules, I’d rather let a small handful of reviewers decide when and how to accept less, and let them do what they want. We can give them negative feedback if it impacts too many people too often, and they can adjust.

The other way would be to have an integration branch that is tested and merged post testing on a regular basis, and let people contribute less than well tested things on it. The idea is that a change still won’t hit trunk until after a bootstrap and test suite run, but that we can bundle 2-100 patches into one test suite run. This strikes me as more scalable and easier for developers, and it removes the requirement of test suite + bootstrap before checkin while retaining the useful property that everything merged to trunk is tested. The hardest part about this would be ChangeLogs, merge resolution and svn blame. git handles this gracefully. svn, as I recall, a little less so. [ quick check ] Ah, it seems svn blame -g TARGET can handle this gracefully (in theory).
Re: [Patch] Change to argument promotion in fixed conversion library calls
On Mon, 2015-11-09 at 21:47 +0100, Bernd Schmidt wrote:
> On 11/09/2015 05:59 PM, Steve Ellcey wrote:
> > Here is a version with the code moved into a new function. How does
> > this look?
> >
> > 2015-11-09 Steve Ellcey
> >
> > * optabs.c (prepare_libcall_arg): New function.
> > (expand_fixed_convert): Add call to prepare_libcall_arg.
>
> Hold on a moment - I see that emit_library_call_value_1 calls
> promote_function_mode for arguments. Can you investigate why that
> doesn't do what you need?
>
>
> Bernd

emit_library_call_value_1 has no way of knowing whether the promotion should be signed or unsigned. It has a mode (probably QImode or HImode) that it knows may need to be promoted to SImode, but it has no tree type information about the library call argument types, so it cannot tell which kind of promotion is needed. Right now it guesses based on the return type, but it may guess wrong when converting an unsigned int to a signed fixed type or vice versa.

By doing the promotion in expand_fixed_convert, GCC can use the uintp argument to ensure that the signedness of the promotion is correct. We could pass that argument into emit_library_call_value_1 so it can do the correct promotion, but that would require changing the argument list for emit_library_call and emit_library_call_value_1 and changing all the other call locations for those functions, and that seemed like overkill.

Steve Ellcey
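To make the signedness problem concrete, here is a small C sketch of promoting a 16-bit argument to 32 bits. It is an analogy rather than GCC code: `promote_hi_to_si` is a hypothetical helper, and its `unsignedp` flag plays the role that a flag such as `uintp` plays in the explanation above.

```c
#include <assert.h>
#include <stdint.h>

/* Promote a 16-bit bit pattern to 32 bits.  The bit pattern alone
   does not say which extension is right; the caller must supply the
   signedness.  The conversion through int16_t assumes the usual
   two's complement behaviour, as GCC implements it.  */
static uint32_t
promote_hi_to_si (uint16_t bits, int unsignedp)
{
  if (unsignedp)
    return (uint32_t) bits;                  /* zero extension */
  return (uint32_t) (int32_t) (int16_t) bits; /* sign extension */
}

/* The two promotions agree exactly when the top bit is clear.  */
static int
promotion_is_ambiguous (uint16_t bits)
{
  int differs = promote_hi_to_si (bits, 0) != promote_hi_to_si (bits, 1);
  assert (differs == ((bits & 0x8000u) != 0));
  return differs;
}
```

For a pattern such as 0xFFFF the two choices give 0xFFFFFFFF and 0x0000FFFF, so a callee that expects one and receives the other sees a different value; that is the kind of mismatch a guess based on the return type can produce.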
Re: [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion)
On Mon, Nov 9, 2015 at 11:57 AM, Segher Boessenkool wrote:
> On Mon, Nov 09, 2015 at 12:34:20PM -0500, Michael Meissner wrote:
>> > > +(define_insn "*toc_fusionload_"
>> > > + [(set (match_operand:QHSI 0 "int_reg_operand" "=,??r")
>> > > + (match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
>> > > + (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
>> > > + (use (match_operand:DI 2 "base_reg_operand" "r,r"))
>> > > + (clobber (match_scratch:DI 3 "=X,"))]
>> > > + "TARGET_TOC_FUSION_INT"
>> >
>> > Do you need that "??r" alternative? Same for the next define_insn.
>>
>> Yes unfortunately. The ??r catches the case where r0 is chosen. R0 is not a
>> base register, and it can't be used for power8 gpr fusion (where you use the
>> value being loaded for the ADDIS instruction), but it can be used for power9
>> fusion (where the ADDIS must be adjacent, but it no longer has to be the
>> register being loaded).
>
> If you have only "b", r0 will not be chosen. Does that help? Or are
> you generating this pattern from somewhere else where you put in r0?

Mike,

What happens if you leave out the "r" alternative? Does other code explicitly generate that pattern with r0?

Thanks,
David
Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX
On Mon, Nov 09, 2015 at 12:46:33PM -0700, Jeff Law wrote: > On 11/09/2015 12:38 PM, Bernd Schmidt wrote: > >On 11/09/2015 07:52 PM, Trevor Saunders wrote: > > > >>yeah, that's more or less my thought, and this makes hookization easier > >>since you can now mechanically add a hook for each thing in defaults.h > >>that invokes the macro. Then for each target you can go through and > >>replace the macro with an override of the hooks. That ends up with the > >>macros replaced by hooks without writing a lot of patches that need to > >>go through config-list.mk, and testing on multiple targets which imho is > >>a giant pain, and rather slow. > > > >We might want to think about making a policy decision to try waiving > >some of the testing requirements for target macro -> hook conversions. > >Maybe try only a "build to cc1" requirement and see whether that causes > >too much breakage. > A config-list.mk build is a build to cc1*, f951, gnat1, so we're not > requiring deep tests on the affected targets. Not sure how much we're > getting by forcing a bootstrap & regression test of that kind of change. So in general when I've done cross target things I think I've found more bugs with config-list.mk than with a regtest, but the regtest has found some things I think. However I actually don't mind bootstrapping and regtesting that much, its more or less a few hours for the control and then another few for each patch. On the other hand config-list.mk takes on the order of 12 hours, and setting up a cross for a quick test isn't really that quick. Which means that if you have a patch touching a number of targets you end up not checking it compiles at all until you run config-list.mk, and then its a heavy weight operation. So at least for the way I work I'd really rather write series that I can incrementally test on just one target and be reasonably confident they won't break other targets. 
The approach of adding default macro definitions, wrapping those with hooks, and then replacing the macro with hook overrides target by target seems to let you test incrementally and find most of the issues, but the approach of changing a macro everywhere at once doesn't really.

Trev

> > I'm certainly open to this kind of relaxed testing to help this stuff move
> forward and complete before we're all retired :-)
>
> Jeff
>
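The incremental macro-to-hook conversion described above can be sketched as follows. All names here (`TOY_STACK_ALIGN`, `toy_target_hooks` and so on) are hypothetical and chosen only for illustration; they are not GCC's target macros or its `targetm` structure.

```c
#include <assert.h>

/* Step 1: give the (hypothetical) target macro a default, so it is
   always defined.  */
#ifndef TOY_STACK_ALIGN
#define TOY_STACK_ALIGN 8
#endif

/* Step 2: wrap the macro in a default hook implementation.  Ports
   that still define the macro keep working unchanged.  */
static int
default_stack_align (void)
{
  return TOY_STACK_ALIGN;
}

struct toy_target_hooks
{
  int (*stack_align) (void);
};

#define TOY_TARGET_INITIALIZER { default_stack_align }

/* Step 3: a converted port stops defining the macro and overrides
   the hook instead.  */
static int
toy_port_stack_align (void)
{
  return 16;
}

/* Target-independent code only ever calls the hook.  */
static int
query_stack_align (const struct toy_target_hooks *hooks)
{
  assert (hooks && hooks->stack_align);
  return hooks->stack_align ();
}
```

Steps 1 and 2 touch only shared code and can be tested on a single target; step 3 is then done one port at a time, which is the incremental property the message argues for.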
Re: [PATCH 05/12] always define VMS_DEBUGGING_INFO
On Mon, Nov 09, 2015 at 08:37:19PM +0100, Bernd Schmidt wrote: > On 11/09/2015 08:29 PM, Trevor Saunders wrote: > >as I said in 0/12 this did go through config-list.mk, and checking again > >this does build on alpha-dec-vms. > > The question I have is - why does it build on any other target? It's the > reference that's unconditional, not the definition. Do we have enough DCE at > -O0 to eliminate the reference? It's still incorrect IMO (and should be > fixed in the other patches as well). DCE would be my guess. I guess I could go back to #if-ing the bits that reference it, and then incrementally remove the #ifs, starting with the ones defining the functions used in the structs, but given you seem to be against patches that only change #ifdef to #if you might not like that :( > > > >I'd actually really rather review them, or really deal with them in any > >way, the way they are. Smaller simpler patches that only deal with one > >thing are much better. I think the most macros that appear on one line > >are 2, so at most you could lower that to 1 change instead of 2, but who > >really cares anyway?
> > Well, I do, because I get to see this stuff: > > -#if 1 < (defined (DBX_DEBUGGING_INFO) + defined (SDB_DEBUGGING_INFO) \ > +#if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \ > + defined (DWARF2_DEBUGGING_INFO) + defined (XCOFF_DEBUGGING_INFO) > \ > + defined (VMS_DEBUGGING_INFO)) > > #if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \ > - + defined (DWARF2_DEBUGGING_INFO) + defined (XCOFF_DEBUGGING_INFO) > \ > + + defined (DWARF2_DEBUGGING_INFO) + (XCOFF_DEBUGGING_INFO) \ > + defined (VMS_DEBUGGING_INFO)) > > #if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \ >+ defined (DWARF2_DEBUGGING_INFO) + (XCOFF_DEBUGGING_INFO) \ > - + defined (VMS_DEBUGGING_INFO)) > + + (VMS_DEBUGGING_INFO)) > > #if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \ > - + defined (DWARF2_DEBUGGING_INFO) + (XCOFF_DEBUGGING_INFO) \ > + + (DWARF2_DEBUGGING_INFO) + (XCOFF_DEBUGGING_INFO) \ >+ (VMS_DEBUGGING_INFO)) > > -#if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \ > +#if 1 < ((DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \ >+ (DWARF2_DEBUGGING_INFO) + (XCOFF_DEBUGGING_INFO) \ >+ (VMS_DEBUGGING_INFO)) > > etc. other than reading this now I'm not sure what the context would be, but either way personally I really don't mind reading that, and think its simpler to reason about the correctness of one thing at a time. Trev > > > Bernd
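The hunks quoted above move from `defined (X)` tests to plain `(X)` tests one macro at a time. A minimal sketch of why the two forms have to change together, assuming hypothetical FMT_A, FMT_B and FMT_C macros in place of the real *_DEBUGGING_INFO family: once a macro is always defined to 0 or 1, `defined (X)` is always 1, so the condition has to sum the values instead.

```c
/* Hypothetical feature macros standing in for the *_DEBUGGING_INFO
   family.  A target header might define some of them to 1.  */
#define FMT_A 1
#define FMT_C 1

/* Fallbacks in the style of a defaults.h: past this point every
   macro is defined, to either 0 or 1.  */
#ifndef FMT_A
#define FMT_A 0
#endif
#ifndef FMT_B
#define FMT_B 0
#endif
#ifndef FMT_C
#define FMT_C 0
#endif

/* With always-defined macros the test sums values, not defined ()
   results.  Writing defined (FMT_B) here would count FMT_B as
   enabled even though its value is 0.  */
#if 1 < (FMT_A + FMT_B + FMT_C)
#define MULTIPLE_FORMATS 1
#else
#define MULTIPLE_FORMATS 0
#endif

/* The values are also usable in ordinary code, so every branch is
   parsed and type-checked on every target.  */
static int
count_formats (void)
{
  return FMT_A + FMT_B + FMT_C;
}
```

Here two of the three hypothetical formats are enabled, so MULTIPLE_FORMATS is 1 and count_formats returns 2.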
Re: [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion)
On Mon, Nov 09, 2015 at 01:11:41PM -0800, David Edelsohn wrote: > On Mon, Nov 9, 2015 at 11:57 AM, Segher Boessenkool >wrote: > > On Mon, Nov 09, 2015 at 12:34:20PM -0500, Michael Meissner wrote: > >> > > +(define_insn "*toc_fusionload_" > >> > > + [(set (match_operand:QHSI 0 "int_reg_operand" "=,??r") > >> > > + (match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG")) > >> > > + (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS) > >> > > + (use (match_operand:DI 2 "base_reg_operand" "r,r")) > >> > > + (clobber (match_scratch:DI 3 "=X,"))] > >> > > + "TARGET_TOC_FUSION_INT" > >> > > >> > Do you need that "??r" alternative? Same for the next define_insn. > >> > >> Yes unfortunately. The ??r catches the case where r0 is chosen. R0 is > >> not a > >> base register, and it can't be used for power8 gpr fusion (where you use > >> the > >> value being loaded for the ADDIS instruction), but it can be used for > >> power9 > >> fusion (where the ADDIS must be adjancent, but it no longer has to be the > >> register being loaded). > > > > If you have only "b", r0 will not be chosen. Does that help? Or are > > you generating this pattern from somewhere else where you put in r0? > > Mike, > > What happens if you leave out the "r" alternative? Does other code > explicitly generate that pattern with r0? Sometimes, one of the passes after reload (usually -fgcse-after-reload) decides to redo the register allocation, and I would see a failure in building things like Spec 2006. I have tried not putting the "r" in there, or using base_reg_operand instead of gpc_reg_operand, but I still got failures. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH], Add power9 support to GCC, patch #7 (direct move enhancements)
I evidently forgot to attach the patch. [gcc] 2015-11-08 Michael Meissner* config/rs6000/constraints.md (we constraint): New constraint for 64-bit power9 vector support. (wL constraint): New constraint for the element in a vector that can be addressed by the MFVSRLD instruction. * config/rs6000/rs6000.c (rs6000_debug_reg_global): Add ISA 3.0 debugging. (rs6000_init_hard_regno_mode_ok): If ISA 3.0 and 64-bit, enable we constraint. Disable the VSX<->GPR direct move helpers if we have the MFVSRLD and MTVSRDD instructions. (rs6000_secondary_reload_simple_move): Add support for doing vector direct moves directly without additional scratch registers if we have ISA 3.0 instructions. (rs6000_secondary_reload_direct_move): Update comments. (rs6000_output_move_128bit): Add support for ISA 3.0 vector instructions. * config/rs6000/vsx.md (vsx_mov): Add support for ISA 3.0 direct move instructions. (vsx_movti_64bit): Likewise. (vsx_extract_): Likewise. * config/rs6000/rs6000.h (VECTOR_ELEMENT_MFVSRLD_64BIT): New macros for ISA 3.0 direct move instructions. (TARGET_DIRECT_MOVE_128): Likewise. * config/rs6000/rs6000.md (128-bit GPR splitters): Don't split a 128-bit move that is a direct move between GPR and vector registers using ISA 3.0 direct move instructions. * doc/md.texi (RS/6000 constraints): Document we, wF, wG, wL constraints. Update wa documentation to say not to use %x on instructions that only take Altivec registers. [gcc/testsuite] 2015-11-08 Michael Meissner * gcc.target/powerpc/direct-move-vector.c: New test for 128-bit vector direct move instructions. 
-- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 Index: gcc/config/rs6000/constraints.md === --- gcc/config/rs6000/constraints.md(revision 229976) +++ gcc/config/rs6000/constraints.md(working copy) @@ -64,7 +64,8 @@ (define_register_constraint "wa" "rs6000 (define_register_constraint "wd" "rs6000_constraints[RS6000_CONSTRAINT_wd]" "VSX vector register to hold vector double data or NO_REGS.") -;; we is not currently used +(define_register_constraint "we" "rs6000_constraints[RS6000_CONSTRAINT_we]" + "VSX register if the -mpower9-vector -m64 options were used or NO_REGS.") (define_register_constraint "wf" "rs6000_constraints[RS6000_CONSTRAINT_wf]" "VSX vector register to hold vector float data or NO_REGS.") @@ -147,6 +148,12 @@ (define_memory_constraint "wG" "Memory operand suitable for TOC fusion memory references" (match_operand 0 "toc_fusion_mem_wrapped")) +(define_constraint "wL" + "Int constant that is the element number mfvsrld accesses in a vector." + (and (match_code "const_int") + (and (match_test "TARGET_DIRECT_MOVE_128") + (match_test "(ival == VECTOR_ELEMENT_MFVSRLD_64BIT)" + ;; Lq/stq validates the address for load/store quad (define_memory_constraint "wQ" "Memory operand suitable for the load/store quad instructions" Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 229977) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -2575,6 +2575,10 @@ rs6000_debug_reg_global (void) if (TARGET_VSX) fprintf (stderr, DEBUG_FMT_D, "VSX easy 64-bit scalar element", (int)VECTOR_ELEMENT_SCALAR_64BIT); + + if (TARGET_DIRECT_MOVE_128) +fprintf (stderr, DEBUG_FMT_D, "VSX easy 64-bit mfvsrld element", +(int)VECTOR_ELEMENT_MFVSRLD_64BIT); } @@ -2986,6 +2990,10 @@ rs6000_init_hard_regno_mode_ok (bool glo rs6000_constraints[RS6000_CONSTRAINT_wp] = VSX_REGS;/* TFmode */ } + /* Support for new direct moves. 
*/ + if (TARGET_DIRECT_MOVE_128) +rs6000_constraints[RS6000_CONSTRAINT_we] = VSX_REGS; + /* Set up the reload helper and direct move functions. */ if (TARGET_VSX || TARGET_ALTIVEC) { @@ -3034,7 +3042,7 @@ rs6000_init_hard_regno_mode_ok (bool glo reg_addr[TImode].reload_load = CODE_FOR_reload_ti_di_load; } - if (TARGET_DIRECT_MOVE) + if (TARGET_DIRECT_MOVE && !TARGET_DIRECT_MOVE_128) { reg_addr[TImode].reload_gpr_vsx= CODE_FOR_reload_gpr_from_vsxti; reg_addr[V1TImode].reload_gpr_vsx = CODE_FOR_reload_gpr_from_vsxv1ti; @@ -18081,6 +18089,11 @@ rs6000_secondary_reload_simple_move (enu || (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE))) return true; + else if (TARGET_DIRECT_MOVE_128 && size == 16 + && ((to_type == VSX_REG_TYPE && from_type ==
Re: Extend tree-call-cdce to calls whose result is used
Hi, On Mon, 9 Nov 2015, Richard Sandiford wrote: > +static bool > +can_use_internal_fn (gcall *call) > +{ > + /* Only replace calls that set errno. */ > + if (!gimple_vdef (call)) > +return false; Oh, I managed to confuse this in my head while reading the patch. So, hmm, you don't actually replace the builtin with an internal function (without the condition) under no-errno-math? Does something else do that? Because otherwise that seems an unnecessary restriction? > >> r229916 fixed that for the non-EH case. > > > > Ah, missed it. Even the EH case shouldn't be difficult. If the > > original dominator of the EH destination was the call block it moves, > > otherwise it remains unchanged. > > The target of the edge is easy in itself, I agree, but that isn't > necessarily the only affected block, if the EH handler doesn't > exit or rethrow. You're worried the non-EH and the EH regions merge again, right? Like so: before change: BB1: throwing-call fallthru/ \EH BB2 BBeh | /\ (stuff in EH-region) | /some path out of EH region | /--/ BB3 Here, BB3 must at least be dominated by BB1 (the throwing block), or by something further up (when there are other side-entries to the path BB2->BB3 or into the EH region). When further up, nothing changes, when it's BB1, then it's afterwards dominated by the BB containing the condition. So everything with idom==BB1 gets idom=Bcond, except for BBeh, which gets idom=Bcall. Depending on how you split BB1, either Bcond or BBcall might still be BB1 and doesn't lead to changes in the dom tree. > > Currently we have quite some of such passes (reassoc, forwprop, > > lower_vector_ssa, cse_reciprocals, cse_sincos (sigh!), optimize_bswap > > and others), but they are all handling only special situations in one > > way or the other. pass_fold_builtins is another one, but it seems > > most related to what you want (replacing a call with something else), > > so I thought that'd be the natural choice. 
> > Well, to be pedantic, it's not really replacing the call. Except for > the special case of targets that support direct assignments to errno, > it keeps the original call but ensures that it isn't usually executed. > From that point of view it doesn't really seem like a fold. > > But I suppose that's just naming again :-). And it's easily solved with > s/fold/rewrite/. Exactly, in my mind pass_fold_builtin (like many of the others I mentioned) doesn't do folding but rewriting :) > > call_cdce is also such a pass, but I think it's simply not the > > appropriate one (only in so far as its source file contains the helper > > routines you need), and in addition I think it shouldn't exist at all > > (and wouldn't need to if it had been part of DCE from the start, or if > > you implemented the conditionalizing as part of another pass). Hey, > > you could be one to remove a pass! ;-) > > It still seems a bit artificial to me to say that the transformation > with a null lhs is "DCE enough" to go in the main DCE pass (even though > like I say it doesn't actually eliminate any code from the IR, it just > adds more code) and should be kept in a separate pass from the one that > does the transformation on a non-null lhs. Oh, I agree, I might not have been clear: I'm not arguing that the normal DCE should now be changed to do the conditionalizing when it removes an call LHS; I was saying that it _would_ have been good instead of adding the call_cdce pass in the past, when it was for DCE purposes only. But now your proposal is on the plate, namely doing the conditionalizing also with an LHS. So that conditionalizing should take place in some rewriting pass (and ideally not call_cdce), no matter the LHS, and normal DCE not be changed (it will still remove LHSs of non-removable calls, just that those then are sometimes under a condition, when DCE runs after the rewriting). Ciao, Michael.
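The transformation discussed in this subthread keeps the original call but arranges for it to run only when it would set errno. A rough sketch of that shape for a call whose result is used follows. Everything in it is an assumption made for illustration: internal_sqrt and library_sqrt are hypothetical stand-ins for the errno-free internal function and the libm call, and the exact control flow the pass emits may differ.

```c
#include <errno.h>

/* Number of times the out-of-line "library" call actually ran.  */
static int library_calls;

/* Hypothetical stand-in for an errno-free internal function, such as
   a hardware square-root instruction.  Newton iteration keeps the
   sketch free of libm; negative inputs yield -1.0 in place of NaN.  */
static double
internal_sqrt (double x)
{
  double r;
  int i;

  if (x < 0.0)
    return -1.0;
  if (x == 0.0)
    return 0.0;
  r = x > 1.0 ? x : 1.0;
  for (i = 0; i < 64; i++)
    r = 0.5 * (r + x / r);
  return r;
}

/* Hypothetical stand-in for the library sqrt: same value, but a
   domain error is reported through errno.  */
static double
library_sqrt (double x)
{
  library_calls++;
  if (x < 0.0)
    errno = EDOM;
  return internal_sqrt (x);
}

/* Assumed shape of the rewritten code: the original call survives
   only on the path where it would set errno, and the common path
   uses the errno-free operation.  */
static double
conditional_sqrt (double x)
{
  if (x < 0.0)
    return library_sqrt (x);
  return internal_sqrt (x);
}
```

For a non-negative argument the counter stays at zero and errno is untouched; only a negative argument reaches the retained call.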
Remove instantiations when no concept check
Hi I just committed this trivial cleanup. 2015-11-09 François Dumont* include/bits/stl_algo.h (partial_sort_copy): Instantiate std::iterator_traits only if concept checks. (lower_bound): Likewise. (upper_bound): Likewise. (equal_range): Likewise. (binary_search): Likewise. * include/bits/stl_heap.h (pop_heap): Likewise. François diff --git libstdc++-v3/include/bits/stl_algo.h libstdc++-v3/include/bits/stl_algo.h index c90f479..6037044 100644 --- libstdc++-v3/include/bits/stl_algo.h +++ libstdc++-v3/include/bits/stl_algo.h @@ -1735,12 +1735,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _RandomAccessIterator __result_first, _RandomAccessIterator __result_last) { +#ifdef _GLIBCXX_CONCEPT_CHECKS typedef typename iterator_traits<_InputIterator>::value_type _InputValueType; typedef typename iterator_traits<_RandomAccessIterator>::value_type _OutputValueType; - typedef typename iterator_traits<_RandomAccessIterator>::difference_type - _DistanceType; +#endif // concept requirements __glibcxx_function_requires(_InputIteratorConcept<_InputIterator>) @@ -1786,12 +1786,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _RandomAccessIterator __result_last, _Compare __comp) { +#ifdef _GLIBCXX_CONCEPT_CHECKS typedef typename iterator_traits<_InputIterator>::value_type _InputValueType; typedef typename iterator_traits<_RandomAccessIterator>::value_type _OutputValueType; - typedef typename iterator_traits<_RandomAccessIterator>::difference_type - _DistanceType; +#endif // concept requirements __glibcxx_function_requires(_InputIteratorConcept<_InputIterator>) @@ -2020,13 +2020,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION lower_bound(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __val, _Compare __comp) { - typedef typename iterator_traits<_ForwardIterator>::value_type - _ValueType; - // concept requirements __glibcxx_function_requires(_ForwardIteratorConcept<_ForwardIterator>) __glibcxx_function_requires(_BinaryPredicateConcept<_Compare, - _ValueType, _Tp>) + typename 
iterator_traits<_ForwardIterator>::value_type, _Tp>) __glibcxx_requires_partitioned_lower_pred(__first, __last, __val, __comp); __glibcxx_requires_irreflexive_pred2(__first, __last, __comp); @@ -2078,12 +2075,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION upper_bound(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __val) { - typedef typename iterator_traits<_ForwardIterator>::value_type - _ValueType; - // concept requirements __glibcxx_function_requires(_ForwardIteratorConcept<_ForwardIterator>) - __glibcxx_function_requires(_LessThanOpConcept<_Tp, _ValueType>) + __glibcxx_function_requires(_LessThanOpConcept< + _Tp, typename iterator_traits<_ForwardIterator>::value_type>) __glibcxx_requires_partitioned_upper(__first, __last, __val); __glibcxx_requires_irreflexive2(__first, __last); @@ -2111,13 +2106,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION upper_bound(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __val, _Compare __comp) { - typedef typename iterator_traits<_ForwardIterator>::value_type - _ValueType; - // concept requirements __glibcxx_function_requires(_ForwardIteratorConcept<_ForwardIterator>) __glibcxx_function_requires(_BinaryPredicateConcept<_Compare, - _Tp, _ValueType>) + _Tp, typename iterator_traits<_ForwardIterator>::value_type>) __glibcxx_requires_partitioned_upper_pred(__first, __last, __val, __comp); __glibcxx_requires_irreflexive_pred2(__first, __last, __comp); @@ -2186,13 +2178,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION equal_range(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __val) { - typedef typename iterator_traits<_ForwardIterator>::value_type - _ValueType; - // concept requirements __glibcxx_function_requires(_ForwardIteratorConcept<_ForwardIterator>) - __glibcxx_function_requires(_LessThanOpConcept<_ValueType, _Tp>) - __glibcxx_function_requires(_LessThanOpConcept<_Tp, _ValueType>) + __glibcxx_function_requires(_LessThanOpConcept< + typename iterator_traits<_ForwardIterator>::value_type, _Tp>) + 
__glibcxx_function_requires(_LessThanOpConcept< + _Tp, typename iterator_traits<_ForwardIterator>::value_type>) __glibcxx_requires_partitioned_lower(__first, __last, __val); __glibcxx_requires_partitioned_upper(__first, __last, __val); __glibcxx_requires_irreflexive2(__first, __last); @@ -2224,15 +2215,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION equal_range(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __val, _Compare __comp) { - typedef typename iterator_traits<_ForwardIterator>::value_type - _ValueType; - // concept requirements
Remove unused openacc call
I've committed this to trunk. It nukes the now unused GOACC_GET_NUM_THREADS and GOACC_GET_THREAD_NUM calls. Also fixed up some comment typos I noticed. nathan 2015-11-09 Nathan Sidwell * omp-low.c: Fix some OpenACC comment typos. (lower_reduction_clauses): Remove BUILT_IN_GOACC_GET_THREAD_NUM call. * omp-builtins.def (BUILT_IN_GOACC_GET_THREAD_NUM, BUILT_IN_GOACC_GET_NUM_THREADS): Delete. Index: omp-low.c === --- omp-low.c (revision 230038) +++ omp-low.c (working copy) @@ -5559,7 +5559,7 @@ lower_reduction_clauses (tree clauses, g { gimple_seq sub_seq = NULL; gimple *stmt; - tree x, c, tid = NULL_TREE; + tree x, c; int count = 0; /* OpenACC loop reductions are handled elsewhere. */ @@ -5589,17 +5589,6 @@ lower_reduction_clauses (tree clauses, g if (count == 0) return; - /* Initialize thread info for OpenACC. */ - if (is_gimple_omp_oacc (ctx->stmt)) -{ - /* Get the current thread id. */ - tree call = builtin_decl_explicit (BUILT_IN_GOACC_GET_THREAD_NUM); - tid = create_tmp_var (TREE_TYPE (TREE_TYPE (call))); - gimple *stmt = gimple_build_call (call, 0); - gimple_call_set_lhs (stmt, tid); - gimple_seq_add_stmt (stmt_seqp, stmt); -} - for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c)) { tree var, ref, new_var, orig_var; @@ -12266,7 +12255,7 @@ expand_omp_atomic (struct omp_region *re } -/* Encode an oacc launc argument. This matches the GOMP_LAUNCH_PACK +/* Encode an oacc launch argument. This matches the GOMP_LAUNCH_PACK macro on gomp-constants.h. We do not check for overflow. */ static tree @@ -12292,7 +12281,7 @@ oacc_launch_pack (unsigned code, tree de The attribute value is a TREE_LIST. A set of dimensions is represented as a list of INTEGER_CST. Those that are runtime - expres are represented as an INTEGER_CST of zero. + exprs are represented as an INTEGER_CST of zero. TOOO. Normally the attribute will just contain a single such list. 
If however it contains a list of lists, this will represent the use of @@ -14311,7 +14300,7 @@ lower_omp_for (gimple_stmt_iterator *gsi gimple_omp_for_clauses (stmt), _head, _tail, ctx); - /* Add OpenACC partitioning markers just before the loop */ + /* Add OpenACC partitioning and reduction markers just before the loop */ if (oacc_head) gimple_seq_add_seq (, oacc_head); @@ -19524,7 +19513,7 @@ public: return execute_oacc_device_lower (); } -}; // class pass_oacc_transform +}; // class pass_oacc_device_lower } // anon namespace Index: omp-builtins.def === --- omp-builtins.def (revision 230038) +++ omp-builtins.def (working copy) @@ -47,10 +47,6 @@ DEF_GOACC_BUILTIN (BUILT_IN_GOACC_UPDATE DEF_GOACC_BUILTIN (BUILT_IN_GOACC_WAIT, "GOACC_wait", BT_FN_VOID_INT_INT_VAR, ATTR_NOTHROW_LIST) -DEF_GOACC_BUILTIN (BUILT_IN_GOACC_GET_THREAD_NUM, "GOACC_get_thread_num", - BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST) -DEF_GOACC_BUILTIN (BUILT_IN_GOACC_GET_NUM_THREADS, "GOACC_get_num_threads", - BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_GOACC_BUILTIN_COMPILER (BUILT_IN_ACC_ON_DEVICE, "acc_on_device", BT_FN_INT_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX
On 11/09/2015 02:30 PM, Trevor Saunders wrote:
> So in general when I've done cross target things I think I've found more bugs with config-list.mk than with a regtest, but the regtest has found some things I think.
I'm finding config-list.mk fairly reliable, with the notable exception of the avr-rtems issue and interix. But that may simply be a function of running it regularly.
> However I actually don't mind bootstrapping and regtesting that much, it's more or less a few hours for the control and then another few for each patch.
I usually save my results and only go back for a control build if something goes wrong. Of course I'm usually stepping forward at least once a day, so the number of new tests is usually manageable and allows me to compare the first run of the day with the last run of the prior day.
> On the other hand config-list.mk takes on the order of 12 hours, and setting up a cross for a quick test isn't really that quick. Which means that if you have a patch touching a number of targets you end up not checking it compiles at all until you run config-list.mk, and then it's a heavy weight operation.
FWIW, if we know what ports a particular patch would hit, I'd fully support folks doing builds that didn't hit all of config-list.mk. In case it's not obvious I do hope that we'll get to a point where the class of bugs like "X is unused on port PDQ because it defines/does not define FROBIT" just goes away and we can get good first level coverage with a native and perhaps a very small number of crosses (instead of the 200+ in config-list.mk now). At some point I also want to see config-list.mk extended to do things like "build the crosses and run test tree-ssa/ssa-dom-thread-11.c on all of them". I've got hacks to do that locally, but they're strictly hacks. I think this selectively deeper testing will become more important as we put the first level coverage behind us.
> So at least for the way I work I'd really rather write series that I can incrementally test on just one target and be reasonably confident they won't break other targets.
That generally works for me.
> The add default macro definitions then wrap those with hooks, then target by target replace the macro by hook overrides approach seems to provide that you can incrementally test and find most of the issues, but the change a macro everywhere approach doesn't really.
I think Bernd and I just have different approaches, preferences and priorities on some stuff which results in slightly different priorities or approaches to certain issues. I've known Bernd a long time and will say he's very reasonable and his concerns/objections are well thought out and carry a ton of weight with me. Jeff
Re: RFC: C++ delayed folding merge
> Right, the change is just to the C++ front end 'convert'. OK, thanks for the clarification. -- Eric Botcazou
Re: [1/2] OpenACC routine support
On 11/09/2015 04:31 PM, Nathan Sidwell wrote: > On 11/03/15 10:35, Jakub Jelinek wrote: >> On Mon, Nov 02, 2015 at 02:21:43PM -0500, Nathan Sidwell wrote: >>> --- gcc/c/c-parser.c(revision 229667) >>> +++ gcc/c/c-parser.c(working copy) >>> @@ -1160,7 +1160,8 @@ enum c_parser_prec { >>> static void c_parser_external_declaration (c_parser *); >>> static void c_parser_asm_definition (c_parser *); >>> static void c_parser_declaration_or_fndef (c_parser *, bool, bool, >>> bool, >>> - bool, bool, tree *, vec); >>> + bool, bool, tree *, vec, >>> + tree); >> >> Wonder if this shouldn't be tree = NULL_TREE, then you'd avoid most of >> the >> c_parser_declaration_or_fndef caller changes. >> >> Otherwise, LGTM. > > This is the patch I've just committed. It includes c parser adjustments > to detect the case of two function decls with a single type specifier. > Cesar will be applying a patch for the C++ parser for the same case. Here's the patch that Nathan was referring to. I ended up introducing a boolean variable named first in the various functions which call finalize_oacc_routines. The problem the original approach was having was that the routine clauses is only applied to the first function declarator in a declaration list. By using 'first', which is set to true if the current declarator is the first in a sequence of declarators, I was able to defer setting parser->oacc_routine to NULL. Nathan already approved this patch, so I've applied it to trunk. Cesar 2015-11-09 Cesar Philippidisgcc/cp/ * parser.c (cp_finalize_oacc_routine): New boolean first argument. (cp_ensure_no_oacc_routine): Update call to cp_finalize_oacc_routine. (cp_parser_simple_declaration): Maintain a boolean first to keep track of each new declarator. Propagate it to cp_parser_init_declarator. (cp_parser_init_declarator): New boolean first argument. Propagate it to cp_parser_save_member_function_body and cp_finalize_oacc_routine. (cp_parser_member_declaration): Likewise. 
(cp_parser_single_declaration): Update call to cp_parser_init_declarator. (cp_parser_save_member_function_body): New boolean first_decl argument. Propagate it to cp_finalize_oacc_routine. (cp_parser_finish_oacc_routine): New boolean first argument. Use it to determine if multiple declarators follow a routine construct. (cp_parser_oacc_routine): Update call to cp_parser_finish_oacc_routine. gcc/testsuite/ * c-c++-common/goacc/routine-5.c: Enable c++ tests. diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index 6fc2c6a..f3b4b46 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -246,7 +246,7 @@ static bool cp_parser_omp_declare_reduction_exprs static tree cp_parser_cilk_simd_vectorlength (cp_parser *, tree, bool); static void cp_finalize_oacc_routine - (cp_parser *, tree, bool); + (cp_parser *, tree, bool, bool); /* Manifest constants. */ #define CP_LEXER_BUFFER_SIZE ((256 * 1024) / sizeof (cp_token)) @@ -1329,7 +1329,7 @@ cp_finalize_omp_declare_simd (cp_parser *parser, tree fndecl) static inline void cp_ensure_no_oacc_routine (cp_parser *parser) { - cp_finalize_oacc_routine (parser, NULL_TREE, false); + cp_finalize_oacc_routine (parser, NULL_TREE, false, true); } /* Decl-specifiers. 
*/ @@ -2135,7 +2135,7 @@ static tree cp_parser_decltype static tree cp_parser_init_declarator (cp_parser *, cp_decl_specifier_seq *, vec *, - bool, bool, int, bool *, tree *, location_t *); + bool, bool, int, bool *, tree *, bool, location_t *); static cp_declarator *cp_parser_declarator (cp_parser *, cp_parser_declarator_kind, int *, bool *, bool, bool); static cp_declarator *cp_parser_direct_declarator @@ -2445,7 +2445,7 @@ static tree cp_parser_single_declaration static tree cp_parser_functional_cast (cp_parser *, tree); static tree cp_parser_save_member_function_body - (cp_parser *, cp_decl_specifier_seq *, cp_declarator *, tree); + (cp_parser *, cp_decl_specifier_seq *, cp_declarator *, tree, bool); static tree cp_parser_save_nsdmi (cp_parser *); static tree cp_parser_enclosed_template_argument_list @@ -11909,6 +11909,7 @@ cp_parser_simple_declaration (cp_parser* parser, bool saw_declarator; location_t comma_loc = UNKNOWN_LOCATION; location_t init_loc = UNKNOWN_LOCATION; + bool first = true; if (maybe_range_for_decl) *maybe_range_for_decl = NULL_TREE; @@ -12005,7 +12006,10 @@ cp_parser_simple_declaration (cp_parser* parser, declares_class_or_enum, _definition_p, maybe_range_for_decl, + first, _loc); + first = false; + /* If an error occurred while parsing tentatively, exit quickly. (That usually happens when in the body of a function; each statement is treated as a declaration-statement until proven @@ -12104,6 +12108,9 @@ cp_parser_simple_declaration (cp_parser* parser, done: pop_deferring_access_checks
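The description above says the routine clauses apply only to the first declarator of a declaration, and that a boolean `first` is threaded through the parser to detect additional declarators. The following is a rough sketch of that idea only. The structures and function names are hypothetical and are not taken from parser.c, and the real diagnostics and state handling may well differ.

```c
#include <stddef.h>

/* Hypothetical parser state: clauses of a routine directive that
   has been seen but not yet applied, or NULL.  */
struct parser_state
{
  const char *pending_routine;
  int errors;
};

/* Hypothetical declarator record.  */
struct decl
{
  const char *name;
  const char *routine;   /* clauses attached to this declarator */
};

/* Called once per declarator.  FIRST is nonzero only for the first
   declarator of the declaration.  */
static void
finalize_routine (struct parser_state *p, struct decl *d, int first)
{
  if (p->pending_routine == NULL)
    return;
  if (first)
    /* Attach the clauses but leave the state set, so that a second
       declarator can still be detected.  */
    d->routine = p->pending_routine;
  else
    {
      /* More than one declarator follows the directive.  */
      p->errors++;
      p->pending_routine = NULL;
    }
}

/* Walk the declarators of one declaration, then clear the state.  */
static void
parse_declaration (struct parser_state *p, struct decl *decls, size_t n)
{
  size_t i;
  int first = 1;

  for (i = 0; i < n; i++)
    {
      finalize_routine (p, &decls[i], first);
      first = 0;
    }
  p->pending_routine = NULL;
}
```

In this sketch a declaration with a single declarator gets the clauses and no error, while a declaration with several declarators records exactly one error, because the state is cleared as soon as the second declarator is seen.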
Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX
On Mon, 9 Nov 2015, Trevor Saunders wrote: > The add default macro definitions then wrap those with hooks, then > target by target replace the macro by hook overrides approach seems to > provide that you can incrementally test and find most of the issues, > but the change a macro everywhere approach doesn't really. I have this notion that once a target macro is "regular" enough - not used in code built for the target, not used in driver code, not used directly or indirectly in #if conditions except for the single default definition in defaults.h, target definitions only depend on the target architecture and not OS or other variations - it ought to be possible to do the conversion to a hook with some kind of automated refactoring tool (possibly with a little editing of its results). And so this sort of regularizing of target macros is helpful because it increases the number of target macros that could be converted in an automated manner. -- Joseph S. Myers jos...@codesourcery.com
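The three-step conversion described in this thread (a default definition, a hook that wraps the macro, then per-target overrides) can be sketched as below. This is a simplified illustration under stated assumptions: TARGET_FROB_LIMIT, struct target_hooks and the two targets are hypothetical names, not GCC's actual target hook machinery.

```c
/* Step 1: a defaults.h-style fallback, so the macro is always
   defined.  TARGET_FROB_LIMIT is a hypothetical macro.  */
#ifndef TARGET_FROB_LIMIT
#define TARGET_FROB_LIMIT 8
#endif

/* Step 2: a hook whose default implementation wraps the macro.
   Targets that still define the macro keep working unchanged.  */
struct target_hooks
{
  int (*frob_limit) (void);
};

static int
default_frob_limit (void)
{
  return TARGET_FROB_LIMIT;
}

/* Step 3: a converted target drops its macro definition and
   overrides the hook instead.  */
static int
mytarget_frob_limit (void)
{
  return 16;
}

static const struct target_hooks default_target = { default_frob_limit };
static const struct target_hooks my_target = { mytarget_frob_limit };

/* Generic code calls the hook and never sees the macro.  */
static int
frob_limit_for (const struct target_hooks *t)
{
  return t->frob_limit ();
}
```

Each step can be built and tested on a single target: after step 2 nothing changes behaviour, and step 3 touches one target at a time.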
Re: [PATCH], Add power9 support to GCC, patch #4
On Mon, Nov 09, 2015 at 10:29:10AM -0600, Segher Boessenkool wrote: > On Sun, Nov 08, 2015 at 07:39:14PM -0500, Michael Meissner wrote: > > +;; Pretend we have a memory form of extswsli until register allocation is > > done > > +;; so that we use LWZ to load the value from memory, instead of LWA. > > We generate sign_extend loads for many cases where zero_extend would be > preferable. We should deal with that generically, and then we can lose > this hack. Well, it would be nice in theory. But since we don't have that generic pass, I need to use the combiner to generate the instruction. > > +(define_insn_and_split "*ashdi3_extswsli_dot" > > ... > > > + if (REGNO (cr) == CR0_REGNO) > > +{ > > + emit_insn (gen_ashdi3_extswsli_dot2 (dest, src2, shift, cr)); > > + DONE; > > +} > > s/dot2/dot/ No, it will recurse endlessly until the stack overflows if you use dot (since it will call itself, generating the same pattern over and over again). > > +/* { dg-final { scan-assembler "extswsli\\. " } } */ > > +/* { dg-final { scan-assembler "lwz " } } */ > > +/* { dg-final { scan-assembler-not "lwa " } } */ > > "lwa" is a nasty string to search for ("always"). You can write this as > {\mlwa\M} for more sanity. > > > +/* { dg-final { scan-assembler-not "sldi "} } */ > > +/* { dg-final { scan-assembler-not "sldi\\. " } } */ > > Similarly {\msldi\M} catches both. Thanks. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
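The review comment about {\mlwa\M} is about word boundaries: a bare substring search for "lwa" also hits words such as "always". The testsuite itself uses Tcl regular expressions, so the C below is only an illustration of the difference, with hypothetical helper names; it treats letters, digits and underscore as word characters, as Tcl's \m and \M do.

```c
#include <ctype.h>
#include <string.h>

/* Nonzero if C can be part of a word: a letter, a digit or an
   underscore.  */
static int
word_char_p (int c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* Return 1 if WORD occurs in TEXT with no word character directly
   before or after it, 0 otherwise.  */
static int
contains_word (const char *text, const char *word)
{
  size_t len = strlen (word);
  const char *p = text;

  if (len == 0)
    return 0;
  while ((p = strstr (p, word)) != NULL)
    {
      int start_ok = (p == text) || !word_char_p (p[-1]);
      int end_ok = !word_char_p (p[len]);

      if (start_ok && end_ok)
        return 1;
      p++;   /* keep looking after a partial-word hit */
    }
  return 0;
}
```

With this helper "lwa" is found in an instruction line such as "lwa 9,0(3)" but not inside "always", and "sldi" matches both "sldi 9,9,2" and "sldi. 9,9,2" because '.' is not a word character.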
Re: [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion)
On Mon, Nov 09, 2015 at 11:16:27AM -0600, Segher Boessenkool wrote:
> On Sun, Nov 08, 2015 at 07:42:04PM -0500, Michael Meissner wrote:
> > -  /* Power8 currently will only do the fusion if the top 11 bits of the addis
> > -     value are all 1's or 0's.  */
> >    value = INTVAL (int_const);
> >    if ((value & (HOST_WIDE_INT)0x) != 0)
>
> Space after cast, like (HOST_WIDE_INT) 0x .

Thanks.

> > +  /* Power8 currently will only do the fusion if the top 11 bits of the addis
> > +     value are all 1's or 0's.  Ignore this restriction if we are testing
> > +     advanced fusion.  */
> > +  if (TARGET_P9_FUSION)
> > +    return 1;
>
> This comment seems out of date?

Yeah, when I first coded it, while the fusion semantics were being nailed down, I couldn't reference power9 in the branch which was kept on the FSF servers, so I just called it advanced fusion. I evidently missed a few places in doing the merge to change the name.

> > ;; Match a GPR load (lbz, lhz, lwz, ld) that uses a combined address in the
> > ;; memory field with both the addis and the memory offset.  Sign extension
> > ;; is not handled here, since lha and lwa are not fused.
> > -(define_predicate "fusion_gpr_mem_combo"
> > -  (match_code "mem,zero_extend")
> > +;; With extended fusion, also match a FPR load (lfd, lfs) and float_extend
>
> And here?

Yes.

> > --- gcc/config/rs6000/rs6000.c (revision 229975)
> > +++ gcc/config/rs6000/rs6000.c (working copy)
> > @@ -376,8 +376,18 @@ struct rs6000_reg_addr {
> >    enum insn_code reload_fpr_gpr; /* INSN to move from FPR to GPR.  */
> >    enum insn_code reload_gpr_vsx; /* INSN to move from GPR to VSX.  */
> >    enum insn_code reload_vsx_gpr; /* INSN to move from VSX to GPR.  */
> > +  enum insn_code fusion_gpr_ld; /* INSN for fusing gpr ADDIS/loads.  */
> > +  /* INSNs for fusing addi with loads
> > +     or stores for each reg. class.  */
> > +  enum insn_code fusion_addi_ld[(int)N_RELOAD_REG];
> > +  enum insn_code fusion_addi_st[(int)N_RELOAD_REG];
> > +  /* INSNs for fusing addis with loads
> > +     or stores for each reg. class.  */
>
> Trailing tabs.

Ok.

> > +/* Return true if the peephole2 can combine a load/store involving a
> > +   combination of an addis instruction and the memory operation.  This was
> > +   added to the ISA 3.0 (power9) hardware.  */
> > +
> > +bool
> > +fusion_p9_p (rtx addis_reg,    /* register set via addis.  */
> > +             rtx addis_value,  /* addis value.  */
> > +             rtx dest,         /* destination (memory or register). */
> > +             rtx src)          /* source (register or memory).  */
>
> The function header comment should explain the params, after which you
> can use the normal style for the function declaration itself.

Ok.

> > +(define_insn "*toc_fusionload_"
> > +  [(set (match_operand:QHSI 0 "int_reg_operand" "=,??r")
> > +        (match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
> > +   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
> > +   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
> > +   (clobber (match_scratch:DI 3 "=X,"))]
> > +  "TARGET_TOC_FUSION_INT"
>
> Do you need that "??r" alternative?  Same for the next define_insn.

Yes, unfortunately. The ??r catches the case where r0 is chosen. R0 is not a base register, and it can't be used for power8 gpr fusion (where you use the value being loaded for the ADDIS instruction), but it can be used for power9 fusion (where the ADDIS must be adjacent, but it no longer has to be the register being loaded).

> Big patch, most looks good :-)

Thanks.

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
[PATCH, 6/16] Add pass_oacc_kernels
On 09/11/15 16:35, Tom de Vries wrote: Hi, this patch series for stage1 trunk adds support to: - parallelize oacc kernels regions using parloops, and - map the loops onto the oacc gang dimension. The patch series contains these patches: 1Insert new exit block only when needed in transform_to_exit_first_loop_alt 2Make create_parallel_loop return void 3Ignore reduction clause on kernels directive 4Implement -foffload-alias 5Add in_oacc_kernels_region in struct loop 6Add pass_oacc_kernels 7Add pass_dominator_oacc_kernels 8Add pass_ch_oacc_kernels 9Add pass_parallelize_loops_oacc_kernels 10Add pass_oacc_kernels pass group in passes.def 11Update testcases after adding kernels pass group 12Handle acc loop directive 13Add c-c++-common/goacc/kernels-*.c 14Add gfortran.dg/goacc/kernels-*.f95 15Add libgomp.oacc-c-c++-common/kernels-*.c 16Add libgomp.oacc-fortran/kernels-*.f95 The first 9 patches are more or less independent, but patches 10-16 are intended to be committed at the same time. Bootstrapped and reg-tested on x86_64. Build and reg-tested with nvidia accelerator, in combination with a patch that enables accelerator testing (which is submitted at https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ). I'll post the individual patches in reply to this message. this patchs add a pass group pass_oacc_kernels (which will be added to the pass list as a whole in patch 10). Atm, the parallelization behaviour for the kernels region is controlled by flag_tree_parallelize_loops, which is also used to control generic auto-parallelization by autopar using omp. That is not ideal, and we may want a separate flag (or param) to control the behaviour for oacc kernels, f.i. -foacc-kernels-gang-parallelize=. I'm open to suggestions. The purpose of the pass group as a whole is to massage the offloaded function into a shape that parloops can deal with it, and then run parloops on it. Consider a testcase with a reduction, and a loop counter declared outside the offload region: ... 
unsigned int a[n]; unsigned int foo (void) { int i; unsigned int sum = 1; #pragma acc kernels copyin (a[0:n]) copy (sum) { for (i = 0; i < n; ++i) sum += a[i]; } return sum; } ... After ealias, the loop body looks like this: ... : _8 = *.omp_data_i_3(D).a; _9 = *.omp_data_i_3(D).i; _10 = *_9; _11 = *_8[_10]; _12 = *.omp_data_i_3(D).sum; sum.0_13 = *_12; sum.1_14 = _11 + sum.0_13; _15 = *.omp_data_i_3(D).sum; *_15 = sum.1_14; _17 = *.omp_data_i_3(D).i; _18 = *_17; _19 = *.omp_data_i_3(D).i; _20 = _18 + 1; *_19 = _20; goto ; ... In other words, the iteration variable is in memory, as is the reduction variable, and the body contains lots of loop invariant loads. At the end of the pass group, just before parloops, the body has been rewritten to have a local iteration variable and a local reduction variable, and all the loop invariant loads have been moved out of the loop: ... : # _27 = PHI <0(2), _20(5)> # D__lsm.7_28 = PHI_11 = *_8[_27]; sum.1_14 = _11 + D__lsm.7_28; _20 = _27 + 1; if (_20 <= ) goto ; else goto ; ... Thanks, - Tom Add pass_oacc_kernels 2015-11-09 Tom de Vries * tree-pass.h (make_pass_oacc_kernels): Declare. * tree-ssa-loop.c (gate_oacc_kernels): New static function. (pass_data_oacc_kernels): New pass_data. (class pass_oacc_kernels): New pass. (make_pass_oacc_kernels): New function. 
--- gcc/tree-pass.h | 1 + gcc/tree-ssa-loop.c | 65 + 2 files changed, 66 insertions(+) diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h index 49e22a9..4ed8da6 100644 --- a/gcc/tree-pass.h +++ b/gcc/tree-pass.h @@ -463,6 +463,7 @@ extern gimple_opt_pass *make_pass_strength_reduction (gcc::context *ctxt); extern gimple_opt_pass *make_pass_vtable_verify (gcc::context *ctxt); extern gimple_opt_pass *make_pass_ubsan (gcc::context *ctxt); extern gimple_opt_pass *make_pass_sanopt (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_oacc_kernels (gcc::context *ctxt); /* IPA Passes */ extern simple_ipa_opt_pass *make_pass_ipa_lower_emutls (gcc::context *ctxt); diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c index 8ecd140..b51cac2 100644 --- a/gcc/tree-ssa-loop.c +++ b/gcc/tree-ssa-loop.c @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-inline.h" #include "tree-scalar-evolution.h" #include "tree-vectorizer.h" +#include "omp-low.h" /* A pass making sure loops are fixed up. */ @@ -141,6 +142,70 @@ make_pass_tree_loop (gcc::context *ctxt) return new pass_tree_loop (ctxt); } +/* Gate for oacc kernels pass group. */ + +static bool +gate_oacc_kernels (function *fn) +{ + if (flag_tree_parallelize_loops <= 1) +
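
As a rough source-level picture of what the pass_oacc_kernels message above describes, the two functions below contrast the loop shape after ealias (iteration variable and reduction variable both in memory, invariant pointer loads repeated on every iteration) with the shape just before parloops (local iteration and reduction variables, invariant loads hoisted). This is a hand-written sketch, not output of the pass group; struct omp_data and both function names are hypothetical.

```c
/* Hypothetical stand-in for the .omp_data_i record in the dumps.  */
struct omp_data
{
  unsigned int *a;
  int *i;
  unsigned int *sum;
};

/* Shape after ealias: counter and sum are read from and written to
   memory on every iteration, through pointers reloaded each time.  */
void
loop_in_memory (struct omp_data *d, int n)
{
  for (*d->i = 0; *d->i < n; ++*d->i)
    *d->sum += d->a[*d->i];
}

/* Shape before parloops: the invariant pointer loads are done once,
   and the counter and the reduction variable are locals that are
   stored back after the loop.  */
void
loop_in_registers (struct omp_data *d, int n)
{
  unsigned int *a = d->a;
  unsigned int sum = *d->sum;
  int i;

  for (i = 0; i < n; ++i)
    sum += a[i];

  *d->sum = sum;
  *d->i = i;
}
```

Both functions compute the same result as long as the three pointers do not alias each other, which is the kind of fact the alias analysis has to establish before the rewrite is valid.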
[gomp4] remove IFN_GOACC_DIM handling from device_lower
I've committed this to gomp4, the relevant handling is in gimple-fold now.

nathan

2015-11-09  Nathan Sidwell

	* omp-low.c (oacc_xform_dim): Delete.
	(execute_oacc_device_lower): Remove IFN_GOACC_DIM_POS,
	IFN_GOACC_DIM_SIZE handling.

Index: omp-low.c
===
--- omp-low.c (revision 230022)
+++ omp-low.c (working copy)
@@ -18835,38 +18835,6 @@ omp_finish_file (void)
     }
 }
 
-/* Transform oacc_dim_size and oacc_dim_pos internal function calls to
-   constants, where possible.  */
-
-static bool
-oacc_xform_dim (gcall *call, const int dims[], bool is_pos)
-{
-  tree arg = gimple_call_arg (call, 0);
-  unsigned axis = (unsigned)TREE_INT_CST_LOW (arg);
-  int size = dims[axis];
-
-  if (!size)
-    /* Dimension size is dynamic.  */
-    return false;
-
-  if (is_pos)
-    {
-      if (size != 1)
-        /* Size is more than 1, so POS might be non-zero.  */
-        return false;
-      size = 0;
-    }
-
-  /* Replace the internal call with a constant.  */
-  tree lhs = gimple_call_lhs (call);
-  gimple *g = gimple_build_assign
-    (lhs, build_int_cst (integer_type_node, size));
-
-  gimple_stmt_iterator gsi = gsi_for_stmt (call);
-  gsi_replace (&gsi, g, false);
-  return true;
-}
-
 /* Find the number of threads (POS = false), or thread number (POS =
    true) for an OpenACC region partitioned as MASK.  Setup code
    required for the calculation is added to SEQ.  */
@@ -19877,15 +19845,6 @@ execute_oacc_device_lower ()
         {
         default: break;
 
-        case IFN_GOACC_DIM_POS:
-        case IFN_GOACC_DIM_SIZE:
-          if (gimple_call_lhs (call) == NULL_TREE)
-            remove = true;
-          else if (oacc_xform_dim (call, dims,
-                                   ifn_code == IFN_GOACC_DIM_POS))
-            rescan = true;
-          break;
-
         case IFN_GOACC_LOOP:
           oacc_xform_loop (call);
           rescan = true;
Re: [PATCH v3 2/2] [PR debug/67192] Further fix C loops' back-jump location
On Sat, Nov 07 2015, Jeff Law wrote:

> Also OK.  And please consider using those tests with the C++ compiler
> to see if it's suffering from the same problem.

Not really, but there's still an issue.

In the C front-end the back-jump's location of an unconditional loop was sometimes set to the token after the loop, particularly after the misleading-indent patch. This does *not* apply to C++.

Before the misleading-indent patch the location was usually set to the last line of the loop instead. This may be slightly confusing when the loop body consists of an if-else statement: breaking on that line then causes a breakpoint hit on every iteration even if the else-path is never executed. This issue does *not* apply to C++ either.

But the C++ front-end always sets the location to the "while" or "for" token. This can cause confusion when setting a breakpoint there: when hitting it for the first time, one loop iteration will already have executed. For that issue I included an informal patch in my earlier post. It mimics the C patch and seems to fix the issue:

  https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00478.html

I'll go ahead and prepare a full patch (with test case, ChangeLog, etc.) for this.
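
To make the three locations discussed above concrete, here is a small unconditional loop whose body is a single if-else statement. The function is hypothetical and is not one of the PR 67192 test cases; the comments only mark which source line each paragraph of the message refers to.

```c
/* Hypothetical example: add up leading non-negative entries of V
   until the running sum reaches LIMIT.  */
int
sum_until (const int *v, int limit)
{
  int sum = 0;

  while (1)               /* (A) the "while" token: where the C++
                             front-end places the back-jump.  */
    if (*v >= 0)
      {
        sum += *v++;
        if (sum >= limit)
          return sum;
      }
    else
      v++;                /* (B) last line of the loop: a back-jump
                             located here makes a breakpoint on this
                             line trigger on every iteration, even
                             when the else path never runs.  */
}                         /* (C) the token after the loop: the other
                             location the C front-end sometimes used.  */
```

With an input that contains only non-negative values, the else branch at (B) is never taken, which is the situation where a breakpoint on that line would be most surprising.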
Re: [PATCH], Add power9 support to GCC, patch #3 (scalar count trailing zeros)
On Sun, Nov 8, 2015 at 4:37 PM, Michael Meissner wrote:
> This patch adds support for scalar count trailing zeros instruction that is
> being added to ISA 3.0 (power9).
>
> I have built this patch (along with patches #2 and #4) with a bootstrap build
> on a power8 little endian system.  There were no regressions in the test
> suite.  Is this patch ok to install in the trunk once patch #1 has been
> installed?
>
> [gcc]
> 2015-11-08  Michael Meissner
>
>         * config/rs6000/rs6000.c (rs6000_rtx_costs): Update costs for
>         count trailing zero instruction if we have hardware support.
>
>         * config/rs6000/rs6000.h (TARGET_CTZ): Add support for count
>         trailing zero instruction in ISA 3.0.
>         * config/rs6000/rs6000.c (ctz2): Likewise.
>         (ctz2_h): Likewise.
>
> [gcc/testsuite]
> 2015-11-08  Michael Meissner
>
>         * gcc.target/powerpc/ctz-1.c: Add test for count trailing zero
>         instruction support.
>         * gcc.target/powerpc/ctz-2.c: Likewise.

This is okay.

We can address the attribute at a later time if necessary.

Please re-check CTZ_DEFINED_VALUE_AT_ZERO.

Thanks, David
Re: Extend tree-call-cdce to calls whose result is used
Michael Matz writes:
> On Mon, 9 Nov 2015, Richard Sandiford wrote:
>
>> -ffast-math would already cause us to treat the function as not setting
>> errno, so the code wouldn't be used.
>
> What is "the code"?  I don't see any checking of the relevant flags in
> tree-call-cdce.c, so I wonder what would prevent the addition of the
> unnecessary checking.

-ffast-math implies -fno-math-errno, which in turn changes how the function attributes are set. E.g.:

#undef ATTR_MATHFN_FPROUNDING_ERRNO
#define ATTR_MATHFN_FPROUNDING_ERRNO (flag_errno_math ? \
        ATTR_NOTHROW_LEAF_LIST : ATTR_MATHFN_FPROUNDING)

So with -ffast-math these functions don't set errno and don't have a vdef. The patch checks for that here:

+/* Return true if built-in function call CALL could be implemented using
+   a combination of an internal function to compute the result and a
+   separate call to set errno.  */
+
+static bool
+can_use_internal_fn (gcall *call)
+{
+  /* Only replace calls that set errno.  */
+  if (!gimple_vdef (call))
+    return false;

Checking for a vdef seemed reasonable. If we treat these libm functions as doing nothing other than producing a numerical result then we'll get better optimisation across the board if we don't add unnecessary vops. (Which we do, but see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68235 )

>> > The pass is somewhat expensive in that it removes dominator info and
>> > schedules a full ssa update.  The transformation is trivial enough
>> > that dominators and SSA form can be updated on the fly, I think
>> > without that it's not feasible for -O.
>>
>> r229916 fixed that for the non-EH case.
>
> Ah, missed it.  Even the EH case shouldn't be difficult.  If the original
> dominator of the EH destination was the call block it moves, otherwise it
> remains unchanged.

The target of the edge is easy in itself, I agree, but that isn't necessarily the only affected block, if the EH handler doesn't exit or rethrow.

>> I posted a patch to update the vops for the non-EH case as well:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03355.html
>
> I see, here the EH case is a bit more difficult as you need to
> differentiate between VOP uses in the EH and the non-EH region, but not
> insurmountable.

Well, I agree it's not insurmountable. :-)

>> But by "builtin folding", do you mean fold_builtin_n etc.?
>
> I had the pass_fold_builtins in mind.

OK.

> Currently we have quite some of such passes (reassoc, forwprop,
> lower_vector_ssa, cse_reciprocals, cse_sincos (sigh!), optimize_bswap and
> others), but they are all handling only special situations in one way or
> the other.  pass_fold_builtins is another one, but it seems most related
> to what you want (replacing a call with something else), so I thought
> that'd be the natural choice.

Well, to be pedantic, it's not really replacing the call. Except for the special case of targets that support direct assignments to errno, it keeps the original call but ensures that it isn't usually executed. From that point of view it doesn't really seem like a fold.

But I suppose that's just naming again :-). And it's easily solved with s/fold/rewrite/.

> call_cdce is also such a pass, but I think it's simply not the appropriate
> one (only in so far as its source file contains the helper routines you
> need), and in addition I think it shouldn't exist at all (and wouldn't
> need to if it had been part of DCE from the start, or if you implemented
> the conditionalizing as part of another pass).  Hey, you could be one to
> remove a pass! ;-)

It still seems a bit artificial to me to say that the transformation with a null lhs is "DCE enough" to go in the main DCE pass (even though like I say it doesn't actually eliminate any code from the IR, it just adds more code) and should be kept in a separate pass from the one that does the transformation on a non-null lhs.

Thanks, Richard
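
The idea of keeping the original call but making sure it is not usually executed can be sketched at source level as follows. This is an illustration only, under several assumptions: internal_sqrt and library_sqrt are hypothetical stand-ins for the internal function and the libm routine, the Newton iteration merely avoids a dependency on libm, and the x < 0 test is assumed to be the condition under which sqrt would set errno.

```c
#include <errno.h>

/* Counts how often the errno-setting routine actually runs.  */
int library_calls;

/* Hypothetical stand-in for the internal function: computes the
   value only and never touches errno.  The fixed iteration count is
   enough for the small inputs used here, not for all doubles.  */
double
internal_sqrt (double x)
{
  double r;
  int i;

  if (x < 0.0)
    return 0.0 / 0.0;     /* NaN on IEEE targets.  */
  if (x == 0.0)
    return 0.0;
  r = x > 1.0 ? x : 1.0;
  for (i = 0; i < 64; i++)
    r = 0.5 * (r + x / r);
  return r;
}

/* Hypothetical stand-in for the library call: same value, but it
   also sets errno on a domain error.  */
double
library_sqrt (double x)
{
  library_calls++;
  if (x < 0.0)
    errno = EDOM;
  return internal_sqrt (x);
}

/* Before: the result always comes from the library call.  */
double
sqrt_before (double x)
{
  return library_sqrt (x);
}

/* After: the result comes from the internal function, and the
   original call survives only on the path where errno would be set.  */
double
sqrt_after (double x)
{
  double y = internal_sqrt (x);

  if (x < 0.0)
    library_sqrt (x);
  return y;
}
```

In this form the call is still present in the code, which matches the remark above that the transformation adds code rather than removing any.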
[PATCH, 7/16] Add pass_dominator_oacc_kernels
On 09/11/15 16:35, Tom de Vries wrote: Hi, this patch series for stage1 trunk adds support to: - parallelize oacc kernels regions using parloops, and - map the loops onto the oacc gang dimension. The patch series contains these patches: 1Insert new exit block only when needed in transform_to_exit_first_loop_alt 2Make create_parallel_loop return void 3Ignore reduction clause on kernels directive 4Implement -foffload-alias 5Add in_oacc_kernels_region in struct loop 6Add pass_oacc_kernels 7Add pass_dominator_oacc_kernels 8Add pass_ch_oacc_kernels 9Add pass_parallelize_loops_oacc_kernels 10Add pass_oacc_kernels pass group in passes.def 11Update testcases after adding kernels pass group 12Handle acc loop directive 13Add c-c++-common/goacc/kernels-*.c 14Add gfortran.dg/goacc/kernels-*.f95 15Add libgomp.oacc-c-c++-common/kernels-*.c 16Add libgomp.oacc-fortran/kernels-*.f95 The first 9 patches are more or less independent, but patches 10-16 are intended to be committed at the same time. Bootstrapped and reg-tested on x86_64. Build and reg-tested with nvidia accelerator, in combination with a patch that enables accelerator testing (which is submitted at https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ). I'll post the individual patches in reply to this message. this patch adds pass_dominator_oacc_kernels (which we may as well call pass_dominator_no_peel_loop_headers. It doesn't do anything oacc-kernels-specific), to be used in the kernels pass group. The reason I'm adding a new pass instead of using pass_dominator is that pass_dominator uses first_pass_instance. So adding a pass_dominator instance A before a pass_dominator instance B has the unexpected consequence that it may change the behaviour of instance B. I've filed PR68247 - "Remove pass_first_instance" to note this issue. Thanks, - Tom Add pass_dominator_oacc_kernels 2015-11-09 Tom de Vries* tree-pass.h (make_pass_dominator_oacc_kernels): Declare. * tree-ssa-dom.c (class dominator_base): New class. 
Factor out of ... (class pass_dominator): ... here. (dominator_base::may_peel_loop_headers_p) (pass_dominator::may_peel_loop_headers_p): New function. (pass_dominator_oacc_kernels): New pass. (make_pass_dominator_oacc_kernels): New function. (dominator_base::execute): Use may_peel_loop_headers_p. --- gcc/tree-pass.h| 1 + gcc/tree-ssa-dom.c | 57 +- 2 files changed, 53 insertions(+), 5 deletions(-) diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h index 4ed8da6..2825aea 100644 --- a/gcc/tree-pass.h +++ b/gcc/tree-pass.h @@ -395,6 +395,7 @@ extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt); extern gimple_opt_pass *make_pass_build_alias (gcc::context *ctxt); extern gimple_opt_pass *make_pass_build_ealias (gcc::context *ctxt); extern gimple_opt_pass *make_pass_dominator (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_dominator_oacc_kernels (gcc::context *ctxt); extern gimple_opt_pass *make_pass_dce (gcc::context *ctxt); extern gimple_opt_pass *make_pass_cd_dce (gcc::context *ctxt); extern gimple_opt_pass *make_pass_call_cdce (gcc::context *ctxt); diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c index 3887bbe1..e4ff63a 100644 --- a/gcc/tree-ssa-dom.c +++ b/gcc/tree-ssa-dom.c @@ -519,6 +519,19 @@ private: namespace { +class dominator_base : public gimple_opt_pass +{ + protected: + dominator_base (pass_data data, gcc::context *ctxt) +: gimple_opt_pass (data, ctxt) + {} + + unsigned int execute (function *); + + protected: + virtual bool may_peel_loop_headers_p (void) { return true; } +}; // class dominator_base + const pass_data pass_data_dominator = { GIMPLE_PASS, /* type */ @@ -532,22 +545,23 @@ const pass_data pass_data_dominator = ( TODO_cleanup_cfg | TODO_update_ssa ), /* todo_flags_finish */ }; -class pass_dominator : public gimple_opt_pass +class pass_dominator : public dominator_base { public: pass_dominator (gcc::context *ctxt) -: gimple_opt_pass (pass_data_dominator, ctxt) +: dominator_base (pass_data_dominator, ctxt) {} /* opt_pass 
methods: */ opt_pass * clone () { return new pass_dominator (m_ctxt); } virtual bool gate (function *) { return flag_tree_dom != 0; } - virtual unsigned int execute (function *); + protected: + virtual bool may_peel_loop_headers_p (void) { return first_pass_instance; } }; // class pass_dominator unsigned int -pass_dominator::execute (function *fun) +dominator_base::execute (function *fun) { memset (_stats, 0, sizeof (opt_stats)); @@ -619,7 +633,7 @@ pass_dominator::execute (function *fun) free_all_edge_infos (); /* Thread jumps, creating duplicate blocks as needed. */ - cfg_altered |= thread_through_all_blocks (first_pass_instance); +
Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX
On 11/09/2015 05:47 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders

gcc/ChangeLog:

2015-11-09  Trevor Saunders

	* defaults.h (EH_RETURN_HANDLER_RTX): New default definition.
	* df-scan.c (df_get_exit_block_use_set): Adjust.
	* except.c (expand_eh_return): Likewise.

As I said for a previous patch series, if we go to the trouble of fixing up stuff like this, we might as well do it properly and turn things like this into a target hook.

Bernd
Re: [PATCH 05/12] always define VMS_DEBUGGING_INFO
In general I think the _DEBUGGING_INFO patches are going to be OK, modulo Jeff's comment about stage 1. I think they shouldn't have been split - it causes numerous unnecessary extra changes, and the intermediate stages look very inconsistent.

-#ifdef VMS_DEBUGGING_INFO
-  else if (write_symbols == VMS_DEBUG || write_symbols == VMS_AND_DWARF2_DEBUG)
+  else if (VMS_DEBUGGING_INFO
+           && (write_symbols == VMS_DEBUG
+               || write_symbols == VMS_AND_DWARF2_DEBUG))
     debug_hooks = &vmsdbg_debug_hooks;
-#endif
 #ifdef DWARF2_LINENO_DEBUGGING_INFO
   else if (write_symbols == DWARF2_DEBUG)
     debug_hooks = &dwarf2_lineno_debug_hooks;
diff --git a/gcc/vmsdbgout.c b/gcc/vmsdbgout.c
index d41d4b2..6dd6878 100644
--- a/gcc/vmsdbgout.c
+++ b/gcc/vmsdbgout.c
@@ -24,7 +24,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "coretypes.h"
 #include "tm.h"
-#ifdef VMS_DEBUGGING_INFO
+#if VMS_DEBUGGING_INFO
 #include "alias.h"
 #include "tree.h"
 #include "varasm.h"

This seems to reference vmsdbg_debug_hooks unconditionally, but as far as I can tell the definition is still guarded by an #if? Does this compile?

Bernd
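
The pattern under review, and the concern raised about it, can be shown in a reduced form. The sketch below is not GCC code: struct debug_hooks, the two hook objects and select_hooks are hypothetical, and the macro default of 0 is assumed to come from a defaults header. The macro is always defined, so the test becomes an ordinary C condition, but the object it refers to must then be declared and defined in every configuration.

```c
/* Assumed default, in the style of a defaults header.  */
#ifndef VMS_DEBUGGING_INFO
#define VMS_DEBUGGING_INFO 0
#endif

enum debug_info_type
{
  NO_DEBUG,
  DWARF2_DEBUG,
  VMS_DEBUG,
  VMS_AND_DWARF2_DEBUG
};

/* Hypothetical, much reduced hook table.  */
struct debug_hooks
{
  const char *name;
};

const struct debug_hooks do_nothing_hooks = { "none" };

/* Defined unconditionally.  If this definition stayed inside
   "#if VMS_DEBUGGING_INFO", the reference in select_hooks would
   depend on the compiler discarding the dead branch.  */
const struct debug_hooks vms_hooks = { "vms" };

const struct debug_hooks *
select_hooks (enum debug_info_type write_symbols)
{
  if (VMS_DEBUGGING_INFO
      && (write_symbols == VMS_DEBUG
          || write_symbols == VMS_AND_DWARF2_DEBUG))
    return &vms_hooks;
  return &do_nothing_hooks;
}
```

With the default value of 0 the VMS branch is never taken, yet it is still parsed and type-checked, which is the main benefit of replacing #ifdef with a plain condition.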
[PATCH, 8/16] Add pass_ch_oacc_kernels
On 09/11/15 16:35, Tom de Vries wrote: Hi, this patch series for stage1 trunk adds support to: - parallelize oacc kernels regions using parloops, and - map the loops onto the oacc gang dimension. The patch series contains these patches: 1Insert new exit block only when needed in transform_to_exit_first_loop_alt 2Make create_parallel_loop return void 3Ignore reduction clause on kernels directive 4Implement -foffload-alias 5Add in_oacc_kernels_region in struct loop 6Add pass_oacc_kernels 7Add pass_dominator_oacc_kernels 8Add pass_ch_oacc_kernels 9Add pass_parallelize_loops_oacc_kernels 10Add pass_oacc_kernels pass group in passes.def 11Update testcases after adding kernels pass group 12Handle acc loop directive 13Add c-c++-common/goacc/kernels-*.c 14Add gfortran.dg/goacc/kernels-*.f95 15Add libgomp.oacc-c-c++-common/kernels-*.c 16Add libgomp.oacc-fortran/kernels-*.f95 The first 9 patches are more or less independent, but patches 10-16 are intended to be committed at the same time. Bootstrapped and reg-tested on x86_64. Build and reg-tested with nvidia accelerator, in combination with a patch that enables accelerator testing (which is submitted at https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ). I'll post the individual patches in reply to this message. this patch adds a pass pass_ch_oacc_kernels, which is like pass_ch, but only runs for loops with oacc_kernels_region set. [ But... thinking about it a bit more, I think that we could use a regular pass_ch instead. We only use the kernels pass group for a single loop nest in a kernels region, and we mark all the loops in the loop nest with oacc_kernels_region. So I think that the oacc_kernels_region test in pass_ch_oacc_kernels::process_loop_p evaluates to true. ] So, I'll try to confirm with retesting that we can drop this patch. Thanks, - Tom Add pass_ch_oacc_kernels 2015-11-09 Tom de Vries* tree-pass.h (make_pass_ch_oacc_kernels): Declare. 
* tree-ssa-loop-ch.c (pass_ch::pass_ch (pass_data, gcc::context)): New constructor. (pass_data_ch_oacc_kernels): New pass_data. (class pass_ch_oacc_kernels): New pass. (pass_ch_oacc_kernels::process_loop_p): New function. (make_pass_ch_oacc_kernels): New function. --- gcc/tree-pass.h| 1 + gcc/tree-ssa-loop-ch.c | 54 +- 2 files changed, 54 insertions(+), 1 deletion(-) diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h index 2825aea..f95a820 100644 --- a/gcc/tree-pass.h +++ b/gcc/tree-pass.h @@ -389,6 +389,7 @@ extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt); extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt); extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt); extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_ch_oacc_kernels (gcc::context *ctxt); extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt); extern gimple_opt_pass *make_pass_phi_only_cprop (gcc::context *ctxt); extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt); diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c index 7e618bf..8bf47fe 100644 --- a/gcc/tree-ssa-loop-ch.c +++ b/gcc/tree-ssa-loop-ch.c @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-inline.h" #include "tree-ssa-scopedtables.h" #include "tree-ssa-threadedge.h" +#include "omp-low.h" /* Duplicates headers of loops if they are small enough, so that the statements in the loop body are always executed when the loop is entered. This @@ -124,7 +125,7 @@ do_while_loop_p (struct loop *loop) namespace { -/* Common superclass for both header-copying phases. */ +/* Common superclass for header-copying phases. 
*/ class ch_base : public gimple_opt_pass { protected: @@ -159,6 +160,10 @@ public: : ch_base (pass_data_ch, ctxt) {} + pass_ch (pass_data data, gcc::context *ctxt) +: ch_base (data, ctxt) + {} + /* opt_pass methods: */ virtual bool gate (function *) { return flag_tree_ch != 0; } @@ -414,3 +419,50 @@ make_pass_ch (gcc::context *ctxt) { return new pass_ch (ctxt); } + +namespace { + +const pass_data pass_data_ch_oacc_kernels = +{ + GIMPLE_PASS, /* type */ + "ch_oacc_kernels", /* name */ + OPTGROUP_LOOP, /* optinfo_flags */ + TV_TREE_CH, /* tv_id */ + ( PROP_cfg | PROP_ssa ), /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + TODO_cleanup_cfg, /* todo_flags_finish */ +}; + +class pass_ch_oacc_kernels : public pass_ch +{ +public: + pass_ch_oacc_kernels (gcc::context *ctxt) +: pass_ch (pass_data_ch_oacc_kernels, ctxt) + {} + + /* opt_pass methods: */ + virtual bool gate (function *) { return true; } + +protected: + /* ch_base
Re: [ping] Fix PR debug/66728
On Nov 6, 2015, at 5:06 AM, Richard Biener wrote:
>> If there are no substantial reasons to not check it in now, I’d like to
>> proceed and get it checked in.  People can refine it further in tree if they
>> want.  Any objections?
>
> Ok with a changelog entry and bootstrap/regtest.

Also committed to the release branch after waiting a few days to ensure no issue on trunk after the normal regression test and bootstrap.
Re: [PATCH 01/12] reduce conditional compilation for HARD_FRAME_POINTER_IS_ARG_POINTER
On 11/09/2015 05:47 PM, tbsaunde+...@tbsaunde.org wrote:

+++ b/gcc/dbxout.c
@@ -3076,10 +3076,8 @@ dbxout_symbol_location (tree decl, tree type, const char *suffix, rtx home)
               || (REG_P (XEXP (home, 0))
                   && REGNO (XEXP (home, 0)) != HARD_FRAME_POINTER_REGNUM
                   && REGNO (XEXP (home, 0)) != STACK_POINTER_REGNUM
-#if !HARD_FRAME_POINTER_IS_ARG_POINTER
-                  && REGNO (XEXP (home, 0)) != ARG_POINTER_REGNUM
-#endif
-                  )))
+                  && (HARD_FRAME_POINTER_IS_ARG_POINTER
+                      || REGNO (XEXP (home, 0)) != ARG_POINTER_REGNUM

This used to be

#if ARG_POINTER_REGNUM != HARD_FRAME_POINTER_REGNUM

and the whole macro seems kind of pointless - why not just make the ARG_POINTER_REGNUM test unconditional? I think the conditional compilation was originally just a "performance optimization", avoiding unnecessary tests - which means the reason to have the tests goes away if we move away from the conditional compilation.

Bernd
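
The suggestion to make the test unconditional rests on a small piece of logic that can be checked in isolation. In the sketch below the register numbers are made up, and HARD_FRAME_POINTER_IS_ARG_POINTER is assumed to be defined as equality of the two register numbers. Under that assumption, when the two names refer to the same register the extra comparison merely repeats the hard frame pointer test, so both forms accept exactly the same registers.

```c
/* Hypothetical register numbers, chosen only for illustration.  */
#define HARD_FRAME_POINTER_REGNUM 6
#define STACK_POINTER_REGNUM 7
#define ARG_POINTER_REGNUM 6

/* Assumed definition: true when both names denote one register.  */
#define HARD_FRAME_POINTER_IS_ARG_POINTER \
  (HARD_FRAME_POINTER_REGNUM == ARG_POINTER_REGNUM)

/* The form used in the patch: skip the third comparison when the
   macro says it is redundant.  */
int
test_with_macro (unsigned int regno)
{
  return (regno != HARD_FRAME_POINTER_REGNUM
          && regno != STACK_POINTER_REGNUM
          && (HARD_FRAME_POINTER_IS_ARG_POINTER
              || regno != ARG_POINTER_REGNUM));
}

/* The form suggested in the review: always do the comparison.  */
int
test_unconditional (unsigned int regno)
{
  return (regno != HARD_FRAME_POINTER_REGNUM
          && regno != STACK_POINTER_REGNUM
          && regno != ARG_POINTER_REGNUM);
}

/* Count the register numbers on which the two forms disagree.  */
int
count_mismatches (void)
{
  unsigned int regno;
  int mismatches = 0;

  for (regno = 0; regno < 32; regno++)
    if (test_with_macro (regno) != test_unconditional (regno))
      mismatches++;
  return mismatches;
}
```

Changing ARG_POINTER_REGNUM to a different number than HARD_FRAME_POINTER_REGNUM makes the macro evaluate to 0, and the two forms again agree, so the guard only ever saved a comparison.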
Re: [PATCH][AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT
On 9 November 2015 at 11:32, Kyrill Tkachov wrote:

> 2015-11-09  Kyrylo Tkachov
>
>     PR target/68129
>     * config/aarch64/aarch64.h (TARGET_SUPPORTS_WIDE_INT): Define to 1.
>     * config/aarch64/aarch64.c (aarch64_print_operand, CONST_DOUBLE):
>     Delete VOIDmode case.  Assert that mode is not VOIDmode.
>     * config/aarch64/predicates.md (const0_operand): Remove const_double
>     match.
>
> 2015-11-09  Kyrylo Tkachov
>
>     PR target/68129
>     * gcc.target/aarch64/pr68129_1.c: New test.

Hi, This test isn't aarch64 specific, does it need to be in gcc.target/aarch64?

Cheers
/Marcus
Re: [PATCH][AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT
On 9 November 2015 at 15:45, Kyrill Tkachov wrote:
>
> On 09/11/15 15:34, Marcus Shawcroft wrote:
>>
>> On 9 November 2015 at 11:32, Kyrill Tkachov
>> wrote:
>>
>>> 2015-11-09  Kyrylo Tkachov
>>>
>>>      PR target/68129
>>>      * config/aarch64/aarch64.h (TARGET_SUPPORTS_WIDE_INT): Define to 1.
>>>      * config/aarch64/aarch64.c (aarch64_print_operand, CONST_DOUBLE):
>>>      Delete VOIDmode case.  Assert that mode is not VOIDmode.
>>>      * config/aarch64/predicates.md (const0_operand): Remove const_double
>>>      match.
>>>
>>> 2015-11-09  Kyrylo Tkachov
>>>
>>>      PR target/68129
>>>      * gcc.target/aarch64/pr68129_1.c: New test.
>>
>> Hi, This test isn't aarch64 specific, does it need to be in
>> gcc.target/aarch64 ?
>
>
> Not really, here is the patch with the test in gcc.dg/ if that's preferred.

OK

/Marcus
[PATCH 2/6] Make builtin_vectorized_function take a combined_fn
This patch replaces the fndecl argument to builtin_vectorized_function with a combined_fn and gets the vectoriser to call it for internal functions too. The patch also moves vectorisation of machine-specific built-ins to a new hook, builtin_md_vectorized_function. I've attached a -b version too since that's easier to read. gcc/ * target.def (builtin_vectorized_function): Take a combined_fn (in the form of an unsigned int) rather than a function decl. (builtin_md_vectorized_function): New. * targhooks.h (default_builtin_vectorized_function): Replace the fndecl argument with an unsigned int. (default_builtin_md_vectorized_function): Declare. * targhooks.c (default_builtin_vectorized_function): Replace the fndecl argument with an unsigned int. (default_builtin_md_vectorized_function): New function. * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION): New hook. * doc/tm.texi: Regenerate. * tree-vect-stmts.c (vectorizable_function): Update call to builtin_vectorized_function, also passing internal functions. Call builtin_md_vectorized_function for target-specific builtins. * config/aarch64/aarch64-protos.h (aarch64_builtin_vectorized_function): Replace fndecl argument with an unsigned int. * config/aarch64/aarch64-builtins.c: Include case-cfn-macros.h. (aarch64_builtin_vectorized_function): Update after above changes. Use CASE_CFN_*. * config/arm/arm-protos.h (arm_builtin_vectorized_function): Replace fndecl argument with an unsigned int. * config/arm/arm-builtins.c: Include case-cfn-macros.h (arm_builtin_vectorized_function): Update after above changes. Use CASE_CFN_*. * config/i386/i386.c: Include case-cfn-macros.h (ix86_veclib_handler): Take a combined_fn rather than a built_in_function. (ix86_veclibabi_svml, ix86_veclibabi_acml): Likewise. Use mathfn_built_in rather than calling builtin_decl_implicit directly. (ix86_builtin_vectorized_function) Update after above changes. Use CASE_CFN_*. 
* config/rs6000/rs6000.c: Include case-cfn-macros.h (rs6000_builtin_vectorized_libmass): Replace fndecl argument with a combined_fn. Use CASE_CFN_*. Use mathfn_built_in rather than calling builtin_decl_implicit directly. (rs6000_builtin_vectorized_function): Update after above changes. Use CASE_CFN_*. Move BUILT_IN_MD to... (rs6000_builtin_md_vectorized_function): ...this new function. (TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION): Define. diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 6b4208f..c4cda4f 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -38,6 +38,7 @@ #include "expr.h" #include "langhooks.h" #include "gimple-iterator.h" +#include "case-cfn-macros.h" #define v8qi_UP V8QImode #define v4hi_UP V4HImode @@ -1258,7 +1259,8 @@ aarch64_expand_builtin (tree exp, } tree -aarch64_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in) +aarch64_builtin_vectorized_function (unsigned int fn, tree type_out, + tree type_in) { machine_mode in_mode, out_mode; int in_n, out_n; @@ -1282,130 +1284,119 @@ aarch64_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in) : (AARCH64_CHECK_BUILTIN_MODE (2, S) \ ? 
aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_##N##v2sf] \ : NULL_TREE))) - if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL) + switch (fn) { - enum built_in_function fn = DECL_FUNCTION_CODE (fndecl); - switch (fn) - { #undef AARCH64_CHECK_BUILTIN_MODE #define AARCH64_CHECK_BUILTIN_MODE(C, N) \ (out_mode == N##Fmode && out_n == C \ && in_mode == N##Fmode && in_n == C) - case BUILT_IN_FLOOR: - case BUILT_IN_FLOORF: - return AARCH64_FIND_FRINT_VARIANT (floor); - case BUILT_IN_CEIL: - case BUILT_IN_CEILF: - return AARCH64_FIND_FRINT_VARIANT (ceil); - case BUILT_IN_TRUNC: - case BUILT_IN_TRUNCF: - return AARCH64_FIND_FRINT_VARIANT (btrunc); - case BUILT_IN_ROUND: - case BUILT_IN_ROUNDF: - return AARCH64_FIND_FRINT_VARIANT (round); - case BUILT_IN_NEARBYINT: - case BUILT_IN_NEARBYINTF: - return AARCH64_FIND_FRINT_VARIANT (nearbyint); - case BUILT_IN_SQRT: - case BUILT_IN_SQRTF: - return AARCH64_FIND_FRINT_VARIANT (sqrt); +CASE_CFN_FLOOR: + return AARCH64_FIND_FRINT_VARIANT (floor); +CASE_CFN_CEIL: + return AARCH64_FIND_FRINT_VARIANT (ceil); +CASE_CFN_TRUNC: + return AARCH64_FIND_FRINT_VARIANT (btrunc); +CASE_CFN_ROUND: + return AARCH64_FIND_FRINT_VARIANT (round); +CASE_CFN_NEARBYINT: + return AARCH64_FIND_FRINT_VARIANT (nearbyint); +CASE_CFN_SQRT: + return AARCH64_FIND_FRINT_VARIANT (sqrt); #undef
[PATCH 03/12] remove conditional compilation of sdb debug info
From: Trevor Saunders

We need to include gsyms.h before tm.h because some targets (rl78 iirc)
define macros that conflict with identifiers in gsyms.h.  This means
sdbout.c won't produce correct output for those targets, but it previously
couldn't either because it wasn't compiled at all.

gcc/ChangeLog:

2015-11-09  Trevor Saunders

        * defaults.h: New definition of SDB_DEBUGGING_INFO.
        * doc/tm.texi: Regenerate.
        * doc/tm.texi.in: Adjust.
        * final.c (rest_of_clean_state): Remove check if
        SDB_DEBUGGING_INFO is defined.
        * function.c (number_blocks): Likewise.
        * output.h: Likewise.
        * sdbout.c: Likewise.
        * toplev.c (process_options): Likewise.
---
 gcc/defaults.h     | 8 ++--
 gcc/doc/tm.texi    | 2 +-
 gcc/doc/tm.texi.in | 2 +-
 gcc/final.c        | 6 +-
 gcc/function.c     | 2 +-
 gcc/output.h       | 2 --
 gcc/sdbout.c       | 6 +-
 gcc/toplev.c       | 6 +-
 8 files changed, 12 insertions(+), 22 deletions(-)

diff --git a/gcc/defaults.h b/gcc/defaults.h
index cee799d..ddda89a 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -914,10 +914,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define DEFAULT_GDB_EXTENSIONS 1
 #endif
 
+#ifndef SDB_DEBUGGING_INFO
+#define SDB_DEBUGGING_INFO 0
+#endif
+
 /* If more than one debugging type is supported, you must define
    PREFERRED_DEBUGGING_TYPE to choose the default.  */
 
-#if 1 < (defined (DBX_DEBUGGING_INFO) + defined (SDB_DEBUGGING_INFO) \
+#if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
          + defined (DWARF2_DEBUGGING_INFO) + defined (XCOFF_DEBUGGING_INFO) \
          + defined (VMS_DEBUGGING_INFO))
 #ifndef PREFERRED_DEBUGGING_TYPE
@@ -929,7 +933,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #elif defined DBX_DEBUGGING_INFO
 #define PREFERRED_DEBUGGING_TYPE DBX_DEBUG
 
-#elif defined SDB_DEBUGGING_INFO
+#elif SDB_DEBUGGING_INFO
 #define PREFERRED_DEBUGGING_TYPE SDB_DEBUG
 
 #elif defined DWARF2_DEBUGGING_INFO || defined DWARF2_LINENO_DEBUGGING_INFO
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 5609a98..a174e21 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -9567,7 +9567,7 @@ whose value is the highest absolute text address in the file.
 Here are macros for SDB and DWARF output.
 
 @defmac SDB_DEBUGGING_INFO
-Define this macro if GCC should produce COFF-style debugging output
+Define this macro to 1 if GCC should produce COFF-style debugging output
 for SDB in response to the @option{-g} option.
 @end defmac
 
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 96ca063a..9c13e9b 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -6992,7 +6992,7 @@ whose value is the highest absolute text address in the file.
 Here are macros for SDB and DWARF output.
 
 @defmac SDB_DEBUGGING_INFO
-Define this macro if GCC should produce COFF-style debugging output
+Define this macro to 1 if GCC should produce COFF-style debugging output
 for SDB in response to the @option{-g} option.
 @end defmac
 
diff --git a/gcc/final.c b/gcc/final.c
index 30b3826..2f57b1b 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -88,9 +88,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dbxout.h"
 #endif
 
-#ifdef SDB_DEBUGGING_INFO
 #include "sdbout.h"
-#endif
 
 /* Most ports that aren't using cc0 don't need to define CC_STATUS_INIT.
    So define a null default for it to save conditionalization later.  */
@@ -4644,10 +4642,8 @@ rest_of_clean_state (void)
   /* In case the function was not output,
      don't leave any temporary anonymous types
      queued up for sdb output.  */
-#ifdef SDB_DEBUGGING_INFO
-  if (write_symbols == SDB_DEBUG)
+  if (SDB_DEBUGGING_INFO && write_symbols == SDB_DEBUG)
     sdbout_types (NULL_TREE);
-#endif
 
   flag_rerun_cse_after_global_opts = 0;
   reload_completed = 0;
diff --git a/gcc/function.c b/gcc/function.c
index a637cb3..afc2c87 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -4671,7 +4671,7 @@ number_blocks (tree fn)
   /* For SDB and XCOFF debugging output, we start numbering the blocks
      from 1 within each function, rather than keeping a running
      count.  */
-#if defined (SDB_DEBUGGING_INFO) || defined (XCOFF_DEBUGGING_INFO)
+#if SDB_DEBUGGING_INFO || defined (XCOFF_DEBUGGING_INFO)
   if (write_symbols == SDB_DEBUG || write_symbols == XCOFF_DEBUG)
     next_block_index = 1;
 #endif
diff --git a/gcc/output.h b/gcc/output.h
index f6a576c..d485cd6 100644
--- a/gcc/output.h
+++ b/gcc/output.h
@@ -309,9 +309,7 @@ extern rtx_sequence *final_sequence;
 /* The line number of the beginning of the current function.  Various
    md code needs this so that it can output relative linenumbers.  */
 
-#ifdef SDB_DEBUGGING_INFO /* Avoid undef sym in certain broken linkers.  */
 extern int sdb_begin_function_line;
-#endif
 
 /* File in which assembler code is being written.  */
 
diff
[PATCH 08/12] always define DWARF2_LINENO_DEBUGGING_INFO
From: Trevor Saunders

gcc/ChangeLog:

2015-11-09  Trevor Saunders

        * defaults.h (DWARF2_LINENO_DEBUGGING_INFO): new default definition.
        * dwarf2out.c (dwarf2out_init): Adjust.
        * opts.c (set_debug_level): Likewise.
        * toplev.c (process_options): Likewise.
---
 gcc/defaults.h  | 6 +-
 gcc/dwarf2out.c | 4 ++--
 gcc/opts.c      | 9 -
 gcc/toplev.c    | 4 +---
 4 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/gcc/defaults.h b/gcc/defaults.h
index d1728aa..65ffe59 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -926,6 +926,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define DWARF2_DEBUGGING_INFO 0
 #endif
 
+#ifndef DWARF2_LINENO_DEBUGGING_INFO
+#define DWARF2_LINENO_DEBUGGING_INFO 0
+#endif
+
 #ifndef XCOFF_DEBUGGING_INFO
 #define XCOFF_DEBUGGING_INFO 0
 #endif
@@ -952,7 +956,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #elif SDB_DEBUGGING_INFO
 #define PREFERRED_DEBUGGING_TYPE SDB_DEBUG
 
-#elif DWARF2_DEBUGGING_INFO || defined DWARF2_LINENO_DEBUGGING_INFO
+#elif DWARF2_DEBUGGING_INFO || DWARF2_LINENO_DEBUGGING_INFO
 #define PREFERRED_DEBUGGING_TYPE DWARF2_DEBUG
 
 #elif VMS_DEBUGGING_INFO
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index cb6acc6..2d94bc3 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -23257,7 +23257,7 @@ dwarf2out_init (const char *filename ATTRIBUTE_UNUSED)
   /* Allocate the file_table.  */
   file_table = hash_table::create_ggc (50);
 
-#ifndef DWARF2_LINENO_DEBUGGING_INFO
+#if !DWARF2_LINENO_DEBUGGING_INFO
   /* Allocate the decl_die_table.  */
   decl_die_table = hash_table::create_ggc (10);
 
@@ -23379,7 +23379,7 @@ dwarf2out_init (const char *filename ATTRIBUTE_UNUSED)
   text_section_line_info = new_line_info_table ();
   text_section_line_info->end_label = text_end_label;
 
-#ifdef DWARF2_LINENO_DEBUGGING_INFO
+#if DWARF2_LINENO_DEBUGGING_INFO
   cur_line_info_table = text_section_line_info;
 #endif
 
diff --git a/gcc/opts.c b/gcc/opts.c
index 0ed9ac6..1300a92 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -2287,11 +2287,10 @@ set_debug_level (enum debug_info_type type, int extended, const char *arg,
 
       if (extended == 2)
         {
-#if DWARF2_DEBUGGING_INFO || defined DWARF2_LINENO_DEBUGGING_INFO
-          opts->x_write_symbols = DWARF2_DEBUG;
-#elif DBX_DEBUGGING_INFO
-          opts->x_write_symbols = DBX_DEBUG;
-#endif
+          if (DWARF2_DEBUGGING_INFO || DWARF2_LINENO_DEBUGGING_INFO)
+            opts->x_write_symbols = DWARF2_DEBUG;
+          else if (DBX_DEBUGGING_INFO)
+            opts->x_write_symbols = DBX_DEBUG;
         }
 
       if (opts->x_write_symbols == NO_DEBUG)
diff --git a/gcc/toplev.c b/gcc/toplev.c
index d015f0f..f318a98 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1380,10 +1380,8 @@ process_options (void)
            && (write_symbols == VMS_DEBUG
                || write_symbols == VMS_AND_DWARF2_DEBUG))
     debug_hooks = &vmsdbg_debug_hooks;
-#ifdef DWARF2_LINENO_DEBUGGING_INFO
-  else if (write_symbols == DWARF2_DEBUG)
+  else if (DWARF2_LINENO_DEBUGGING_INFO && write_symbols == DWARF2_DEBUG)
     debug_hooks = &dwarf2_lineno_debug_hooks;
-#endif
   else
     error ("target system does not support the %qs debug format",
            debug_type_names[write_symbols]);
-- 
2.5.0.rc1.5.gc07173f
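
[Editorial illustration, not part of the patch.]  The change from
"defined DWARF2_LINENO_DEBUGGING_INFO" to a plain value test matters once
defaults.h always supplies a definition: a macro defined to 0 is still
"defined".  A minimal sketch in C, assuming a hypothetical macro
LINENO_INFO_SKETCH that stands in for the real target macro:

```c
/* LINENO_INFO_SKETCH is a hypothetical stand-in for a target macro that,
   after this kind of patch, always has a default definition.  */
#ifndef LINENO_INFO_SKETCH
#define LINENO_INFO_SKETCH 0
#endif

/* Old-style test: asks only whether the macro exists.  Once a default of 0
   is always supplied, this branch is taken unconditionally.  */
int
lineno_info_by_defined (void)
{
#if defined LINENO_INFO_SKETCH
  return 1;
#else
  return 0;
#endif
}

/* New-style test: asks for the macro's value, so a default of 0 correctly
   reads as "feature disabled".  */
int
lineno_info_by_value (void)
{
#if LINENO_INFO_SKETCH
  return 1;
#else
  return 0;
#endif
}
```

With the default of 0 in effect, the first function reports the feature as
present and the second reports it as absent, which is why every remaining
"defined" test on such a macro has to be converted in the same patch that
adds the default.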
[PATCH 05/12] always define VMS_DEBUGGING_INFO
From: Trevor Saunders

gcc/ChangeLog:

2015-11-09  Trevor Saunders

        * defaults.h (VMS_DEBUGGING_INFO): New default definition.
        * doc/tm.texi: Regenerate.
        * doc/tm.texi.in: Adjust.
        * dwarf2out.c (output_file_names): Likewise.
        (add_name_and_src_coords_attributes): Likewise.
        * dwarf2out.h: Likewise.
        * toplev.c (process_options): Likewise.
        * vmsdbgout.c: Likewise.
---
 gcc/defaults.h     | 8 ++--
 gcc/doc/tm.texi    | 2 +-
 gcc/doc/tm.texi.in | 2 +-
 gcc/dwarf2out.c    | 8 
 gcc/dwarf2out.h    | 2 --
 gcc/toplev.c       | 6 +++---
 gcc/vmsdbgout.c    | 2 +-
 7 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/gcc/defaults.h b/gcc/defaults.h
index b518863..0de7899 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -922,12 +922,16 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define XCOFF_DEBUGGING_INFO 0
 #endif
 
+#ifndef VMS_DEBUGGING_INFO
+#define VMS_DEBUGGING_INFO 0
+#endif
+
 /* If more than one debugging type is supported, you must define
    PREFERRED_DEBUGGING_TYPE to choose the default.  */
 
 #if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
          + defined (DWARF2_DEBUGGING_INFO) + (XCOFF_DEBUGGING_INFO) \
-         + defined (VMS_DEBUGGING_INFO))
+         + (VMS_DEBUGGING_INFO))
 #ifndef PREFERRED_DEBUGGING_TYPE
 #error You must define PREFERRED_DEBUGGING_TYPE
 #endif /* no PREFERRED_DEBUGGING_TYPE */
@@ -943,7 +947,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #elif defined DWARF2_DEBUGGING_INFO || defined DWARF2_LINENO_DEBUGGING_INFO
 #define PREFERRED_DEBUGGING_TYPE DWARF2_DEBUG
 
-#elif defined VMS_DEBUGGING_INFO
+#elif VMS_DEBUGGING_INFO
 #define PREFERRED_DEBUGGING_TYPE VMS_AND_DWARF2_DEBUG
 
 #elif XCOFF_DEBUGGING_INFO
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 0399248..b3b684a 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -9720,7 +9720,7 @@ number @var{line} of the current source file to the stdio stream
 Here are macros for VMS debug format.
 
 @defmac VMS_DEBUGGING_INFO
-Define this macro if GCC should produce debugging output for VMS
+Define this macro to 1 if GCC should produce debugging output for VMS
 in response to the @option{-g} option.  The default behavior for VMS
 is to generate minimal debug info for a traceback in the absence of
 @option{-g} unless explicitly overridden with @option{-g0}.  This
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 84e8383..0f0a4f2 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7113,7 +7113,7 @@ number @var{line} of the current source file to the stdio stream
 Here are macros for VMS debug format.
 
 @defmac VMS_DEBUGGING_INFO
-Define this macro if GCC should produce debugging output for VMS
+Define this macro to 1 if GCC should produce debugging output for VMS
 in response to the @option{-g} option.  The default behavior for VMS
 is to generate minimal debug info for a traceback in the absence of
 @option{-g} unless explicitly overridden with @option{-g0}.  This
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 072e485..88c931c 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -101,7 +101,7 @@ static void dwarf2out_decl (tree);
 #define HAVE_XCOFF_DWARF_EXTRAS 0
 #endif
 
-#ifdef VMS_DEBUGGING_INFO
+#if VMS_DEBUGGING_INFO
 int vms_file_stats_name (const char *, long long *, long *, char *, int *);
 
 /* Define this macro to be a nonzero value if the directory specifications
@@ -10229,7 +10229,7 @@ output_file_names (void)
       int file_idx = backmap[i];
       int dir_idx = dirs[files[file_idx].dir_idx].dir_idx;
 
-#ifdef VMS_DEBUGGING_INFO
+#if VMS_DEBUGGING_INFO
 #define MAX_VMS_VERSION_LEN 6 /* ";32768" */
 
       /* Setting these fields can lead to debugger miscomparisons,
@@ -17319,7 +17319,7 @@ add_name_and_src_coords_attributes (dw_die_ref die, tree decl)
         add_linkage_name (die, decl);
     }
 
-#ifdef VMS_DEBUGGING_INFO
+#if VMS_DEBUGGING_INFO
   /* Get the function's name, as described by its RTL.  This may be different
      from the DECL_NAME name used in the source file.  */
   if (TREE_CODE (decl) == FUNCTION_DECL && TREE_ASM_WRITTEN (decl))
@@ -17331,7 +17331,7 @@ add_name_and_src_coords_attributes (dw_die_ref die, tree decl)
 #endif /* VMS_DEBUGGING_INFO */
 }
 
-#ifdef VMS_DEBUGGING_INFO
+#if VMS_DEBUGGING_INFO
 /* Output the debug main pointer die for VMS */
 
 void
diff --git a/gcc/dwarf2out.h b/gcc/dwarf2out.h
index 4fe3527..d344508 100644
--- a/gcc/dwarf2out.h
+++ b/gcc/dwarf2out.h
@@ -257,9 +257,7 @@ extern void debug_dwarf_loc_descr (dw_loc_descr_ref);
 extern void debug (die_struct );
 extern void debug (die_struct *ptr);
 extern void dwarf2out_set_demangle_name_func (const char *(*) (const char *));
-#ifdef VMS_DEBUGGING_INFO
 extern void dwarf2out_vms_debug_main_pointer (void);
-#endif
 
 enum array_descr_ordering
 {
diff --git a/gcc/toplev.c
Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX
On Mon, 2015-11-09 at 11:47 -0500, tbsaunde+...@tbsaunde.org wrote:
> From: Trevor Saunders
>
> gcc/ChangeLog:
>
> 2015-11-09  Trevor Saunders
>
>         * defaults.h (EH_RETURN_HANDLER_RTX): New default definition.
>         * df-scan.c (df_get_exit_block_use_set): Adjust.
>         * except.c (expand_eh_return): Likewise.
> ---
>  gcc/defaults.h | 4 
>  gcc/df-scan.c  | 2 --
>  gcc/except.c   | 9 -
>  3 files changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/defaults.h b/gcc/defaults.h
> index c20de44..047a0db 100644
> --- a/gcc/defaults.h
> +++ b/gcc/defaults.h
> @@ -1325,6 +1325,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>  #define TARGET_PECOFF 0
>  #endif
>
> +#ifndef EH_RETURN_HANDLER_RTX
> +#define EH_RETURN_HANDLER_RTX NULL
> +#endif
> +
>  #ifdef GCC_INSN_FLAGS_H
>  /* Dependent default target macro definitions
>
> diff --git a/gcc/df-scan.c b/gcc/df-scan.c
> index 2e5fe97..a735925 100644
> --- a/gcc/df-scan.c
> +++ b/gcc/df-scan.c
> @@ -3714,7 +3714,6 @@ df_get_exit_block_use_set (bitmap exit_block_uses)
>      }
>  #endif
>
> -#ifdef EH_RETURN_HANDLER_RTX
>    if ((!targetm.have_epilogue () || ! epilogue_completed)
>        && crtl->calls_eh_return)
>      {
> @@ -3722,7 +3721,6 @@ df_get_exit_block_use_set (bitmap exit_block_uses)
>        if (tmp && REG_P (tmp))
>          df_mark_reg (tmp, exit_block_uses);
>      }
> -#endif
>
>    /* Mark function return value.  */
>    diddle_return_value (df_mark_reg, (void*) exit_block_uses);
> diff --git a/gcc/except.c b/gcc/except.c
> index 1801fe7..1a41a34 100644
> --- a/gcc/except.c
> +++ b/gcc/except.c
> @@ -2255,11 +2255,10 @@ expand_eh_return (void)
>      emit_insn (targetm.gen_eh_return (crtl->eh.ehr_handler));
>    else
>      {
> -#ifdef EH_RETURN_HANDLER_RTX
> -      emit_move_insn (EH_RETURN_HANDLER_RTX, crtl->eh.ehr_handler);
> -#else
> -      error ("__builtin_eh_return not supported on this target");
> -#endif
> +      if (rtx handler = EH_RETURN_HANDLER_RTX)

Would this be clearer as

  rtx handler = EH_RETURN_HANDLER_RTX;
  if (handler)

?  (to avoid an assignment inside a conditional)

> +        emit_move_insn (handler, crtl->eh.ehr_handler);
> +      else
> +        error ("__builtin_eh_return not supported on this target");
>      }
>
>    emit_label (around_label);
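
[Editorial illustration, not part of the thread.]  The shape of the
suggestion above can be sketched in plain C.  Everything here is a
hypothetical stand-in: EH_RETURN_HANDLER_SKETCH plays the role of
EH_RETURN_HANDLER_RTX with its new NULL default, a plain int pointer
replaces rtx, and an error counter replaces the call to error ().

```c
#include <stddef.h>

/* Hypothetical stand-in for a target macro that defaults to NULL when the
   target does not provide a handler location.  */
#ifndef EH_RETURN_HANDLER_SKETCH
#define EH_RETURN_HANDLER_SKETCH NULL
#endif

/* Returns 1 if the handler value was stored, 0 if the target does not
   support it.  *ERROR_COUNT is incremented in the unsupported case,
   standing in for a diagnostic.  */
int
expand_eh_return_sketch (int value, int *error_count)
{
  /* Declaration kept separate from the test, as suggested in the review,
     so there is no assignment inside the condition.  */
  int *handler = EH_RETURN_HANDLER_SKETCH;
  if (handler)
    {
      *handler = value;
      return 1;
    }

  ++*error_count;
  return 0;
}
```

With the NULL default in effect, the function takes the error path, which
mirrors what the old "#else" branch did at compile time.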
Re: [PATCH], Add power9 support to GCC, patch #2 (add modulus instructions)
On Mon, Nov 09, 2015 at 09:48:50AM -0600, Segher Boessenkool wrote:
> Hi,
>
> On Sun, Nov 08, 2015 at 07:36:16PM -0500, Michael Meissner wrote:
> > [gcc/testsuite]
> >         * lib/target-supports.exp (check_p9vector_hw_available): Add
> >         checks for power9 availability.
> >         (check_effective_target_powerpc_p9vector_ok): Likewise.
>
> It's probably better not to use this for modulo; it is confusing and if
> you'll later need to untangle it it is much more work.
>
> > +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
> > +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
>
> Lose this line?  If Darwin cannot support modulo, the next line will
> catch that.
>
> > +/* { dg-require-effective-target powerpc_p9vector_ok } */
> > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
> > +/* { dg-options "-mcpu=power9 -O3" } */
>
> Is -O3 needed?  Why won't -O2 work?

Just habit.

> > +proc check_p9vector_hw_available { } {
> > +    return [check_cached_effective_target p9vector_hw_available {
> > +        # Some simulators are known to not support VSX/power8 instructions.
> > +        # For now, disable on Darwin
> > +        if { [istarget powerpc-*-eabi] || [istarget powerpc*-*-eabispe] || [istarget *-*-darwin*]} {
>
> Long line.

Cut and paste from other tests.

> > Index: gcc/config/rs6000/rs6000.md
> > ===
> > --- gcc/config/rs6000/rs6000.md (revision 229972)
> > +++ gcc/config/rs6000/rs6000.md (working copy)
> > @@ -2885,9 +2885,9 @@ (define_insn_and_split "*div3_sra_
> >     (set_attr "cell_micro" "not")])
> >
> >  (define_expand "mod3"
> > -  [(use (match_operand:GPR 0 "gpc_reg_operand" ""))
> > -   (use (match_operand:GPR 1 "gpc_reg_operand" ""))
> > -   (use (match_operand:GPR 2 "reg_or_cint_operand" ""))]
> > +  [(set (match_operand:GPR 0 "gpc_reg_operand" "")
> > +        (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "")
> > +                 (match_operand:GPR 2 "reg_or_cint_operand" "")))]
>
> You could delete the empty constraint strings while you're at it.
>
> > +;; On machines with modulo support, do a combined div/mod the old fashioned
> > +;; method, since the multiply/subtract is faster than doing the mod instruction
> > +;; after a divide.
>
> You can instead have a "divmod" insn that is split to either of div, mod,
> or div+mul+sub depending on which of the outputs is unused.  Peepholes
> do not get all cases.

Yes, though as I recall, I couldn't get it to do what I wanted, and moved on
to other targets.

> This can be a later improvement of course.

Yep.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH 05/12] always define VMS_DEBUGGING_INFO
On 11/09/2015 11:34 AM, Bernd Schmidt wrote:
> In general I think the _DEBUGGING_INFO patches are going to be OK,
> modulo Jeff's comment about stage 1.  I think they shouldn't have been
> split - it causes numerous unnecessary extra changes, and the
> intermediate stages look very inconsistent.
>
>> -#ifdef VMS_DEBUGGING_INFO
>> -  else if (write_symbols == VMS_DEBUG || write_symbols == VMS_AND_DWARF2_DEBUG)
>> +  else if (VMS_DEBUGGING_INFO
>> +           && (write_symbols == VMS_DEBUG
>> +               || write_symbols == VMS_AND_DWARF2_DEBUG))
>>      debug_hooks = &vmsdbg_debug_hooks;
>> -#endif
>>  #ifdef DWARF2_LINENO_DEBUGGING_INFO
>>    else if (write_symbols == DWARF2_DEBUG)
>>      debug_hooks = &dwarf2_lineno_debug_hooks;
>> diff --git a/gcc/vmsdbgout.c b/gcc/vmsdbgout.c
>> index d41d4b2..6dd6878 100644
>> --- a/gcc/vmsdbgout.c
>> +++ b/gcc/vmsdbgout.c
>> @@ -24,7 +24,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "coretypes.h"
>>  #include "tm.h"
>>
>> -#ifdef VMS_DEBUGGING_INFO
>> +#if VMS_DEBUGGING_INFO
>>  #include "alias.h"
>>  #include "tree.h"
>>  #include "varasm.h"
>
> This seems to reference vmsdbg_debug_hooks unconditionally, but as far
> as I can tell the definition is still guarded by an #if?  Does this
> compile?

There's an easy way for Trevor to find out.  Build a cross for one of the
VMS targets (there's 3 defined in config-list.mk) :-)

jeff
Re: [PATCH 02/12] remove EXTENDED_SDB_BASIC_TYPES
> The last target using this was i960, which was removed many years ago, so
> there's no reason to keep it.
>
> gcc/ChangeLog:
>
> 2015-11-09  Trevor Saunders
>
>         * gsyms.h (enum sdb_type): Remove code for
>         EXTENDED_SDB_BASIC_TYPES.
>         (enum sdb_masks): Likewise.
>         * sdbout.c (plain_type_1): Likewise.

Ok if you also poison the macro name as usual.

Bernd
Re: [PATCH], Add power9 support to GCC, patch #3 (scalar count trailing zeros)
On Mon, Nov 09, 2015 at 09:59:43AM -0600, Segher Boessenkool wrote:
> On Sun, Nov 08, 2015 at 07:37:53PM -0500, Michael Meissner wrote:
> > This patch adds support for scalar count trailing zeros instruction that is
> > being added to ISA 3.0 (power9).
>
> I bet you should change CTZ_DEFINED_VALUE_AT_ZERO as well.
>
> > +(define_insn "ctz2_hw"
> > +  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
> > +        (ctz:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")))]
> > +  "TARGET_CTZ"
> > +  "cnttz %0,%1"
> > +  [(set_attr "type" "cntlz")])
>
> We should probably rename this attr value now.  "cntz" maybe?  Could be
> later of course.

I don't see a need to add another type attribute for count trailing zeros
unless count leading zeros has a different timing than count trailing
zeros.  The cntlz attribute was added because in Power7 the CNTLZ
instruction became a 2 cycle instruction, and we wanted to model this in
power7.md (and hence cntlz was split from the simple integer attribute).

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
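
[Editorial illustration, not part of the thread.]  The remark about
CTZ_DEFINED_VALUE_AT_ZERO concerns what a count-trailing-zeros operation
yields for a zero input.  The sketch below is a portable C model, not the
rs6000 implementation, and it assumes, purely for illustration, that the
defined result at zero is the operand width (32 here).  The function name
is hypothetical.

```c
#include <stdint.h>

/* Portable model of a 32-bit count-trailing-zeros with a defined result
   for zero.  The value 32 for a zero input is an assumption made for this
   sketch; a real port states its own value through the target macro.  */
int
ctz32_defined_at_zero (uint32_t x)
{
  int count = 0;

  if (x == 0)
    return 32;

  /* Shift right until the lowest set bit reaches bit 0.  */
  while ((x & 1u) == 0)
    {
      x >>= 1;
      count++;
    }
  return count;
}
```

If the hardware result for zero is defined, describing it to the compiler
lets code such as "x ? ctz (x) : 32" avoid the extra comparison; if it is
left undescribed, the compiler has to keep the explicit zero check.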
Re: [PATCH 12/12] always define ENABLE_OFFLOADING
On 11/09/2015 05:47 PM, tbsaunde+...@tbsaunde.org wrote:
> -#ifdef ENABLE_OFFLOADING
>    /* If the user didn't specify any, default to all configured offload
>       targets.  */
>    if (offload_targets == NULL)
>      handle_foffload_option (OFFLOAD_TARGETS);
> -#endif

This one I would keep guarded with an if.  Otherwise ok modulo stage 1 end.

Bernd
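
[Editorial illustration, not part of the thread.]  "Guarded with an if"
presumably means replacing the preprocessor guard with a run-time test on
the now always-defined macro, so the code is always compiled but only
executed when offloading is configured.  A minimal sketch in C; the
macro ENABLE_OFFLOADING_SKETCH, the string OFFLOAD_TARGETS_SKETCH and the
function name are hypothetical stand-ins, not GCC's definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the configure-time macros.  */
#ifndef ENABLE_OFFLOADING_SKETCH
#define ENABLE_OFFLOADING_SKETCH 0
#endif
#define OFFLOAD_TARGETS_SKETCH "hypothetical-offload-target"

static const char *offload_targets;

/* If the user didn't specify any targets, default to the configured ones,
   but only when offloading is enabled.  The body is always parsed and
   type-checked, unlike code hidden behind #ifdef.  */
const char *
default_offload_targets (void)
{
  if (ENABLE_OFFLOADING_SKETCH && offload_targets == NULL)
    offload_targets = OFFLOAD_TARGETS_SKETCH;
  return offload_targets;
}
```

With the macro at its default of 0 the condition folds to false and the
function leaves offload_targets untouched, while a build that defines the
macro to 1 gets the defaulting behaviour without any source change.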
Re: [PATCH 00/12] misc conditional compilation work
On Mon, Nov 09, 2015 at 10:57:10AM -0700, Jeff Law wrote:
> On 11/09/2015 09:47 AM, tbsaunde+...@tbsaunde.org wrote:
> >From: Trevor Saunders
> >
> >Hi,
> >
> >basically $subject, making some code unconditionally compiled, and changing
> >other things from #ifdef to #if so they can be made unconditional
> >incrementally.
> >
> >patches individually bootstrapped + regtested on x86_64-linux-gnu, and a
> >slightly earlier version of the series ran through config-list.mk.  I think
> >everything here is either preapproved, or obvious so I'll commit it later
> >today if nobody complains.
> Are these the last patches of this nature planned for GCC6?  While the
> window was left slightly open by Richi this morning, I think that's more to
> allow the queues to drain rather than to allow more new work to go into the
> tree :-)

yeah, I guess I misread, I thought the end was tonight not last night (I
could easily have sent this out a day or so earlier).  Given my incorrect
assumption about timing I was considering trying to sneak in a little more
around reg-stack.c, but I suspect that isn't going to work out anyway
(turns out even after the macros reg-stack.c uses x86 specific variables).

Trev

>
> jeff
>
Re: [PATCH v2 11/13] Test case for conversion from __seg_tls:0
Hi!

On Mon, 9 Nov 2015 15:46:20 +0100, Richard Biener wrote:
> On Tue, Oct 20, 2015 at 11:27 PM, Richard Henderson wrote:
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/addr-space-3.c
> > @@ -0,0 +1,10 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O" } */
> > +/* { dg-final { scan-assembler "[fg]s:0" } } */
>
> Causes
>
> ERROR: (DejaGnu) proc "fg" does not exist.
> The error code is NONE
> The info on the error is:
> close: spawn id exp6 not open
>     while executing
> "close -i exp6"
>     invoked from within
> "catch "close -i $spawn_id""

In r230038, I checked in the following, as obvious:

commit a7d978247cd261d66010195908ce0e9ef0e501b9
Author: tschwinge
Date:   Mon Nov 9 17:53:02 2015 +

    Resolve DejaGnu hard stop

        gcc/testsuite/
        * gcc.target/i386/addr-space-3.c: Fix quoting in dg-final
        scan-assembler directive.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@230038 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog                      |    5 +
 gcc/testsuite/gcc.target/i386/addr-space-3.c |    2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git gcc/testsuite/ChangeLog gcc/testsuite/ChangeLog
index ca1991b..da4f940 100644
--- gcc/testsuite/ChangeLog
+++ gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2015-11-09  Thomas Schwinge
+
+        * gcc.target/i386/addr-space-3.c: Fix quoting in dg-final
+        scan-assembler directive.
+
 2015-11-09  Kyrylo Tkachov
 
         PR target/68129
diff --git gcc/testsuite/gcc.target/i386/addr-space-3.c gcc/testsuite/gcc.target/i386/addr-space-3.c
index 63f1f03..2b6f47e 100644
--- gcc/testsuite/gcc.target/i386/addr-space-3.c
+++ gcc/testsuite/gcc.target/i386/addr-space-3.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O" } */
-/* { dg-final { scan-assembler "[fg]s:0" } } */
+/* { dg-final { scan-assembler "\[fg]s:0" } } */
 
 void test(int *y)
 {

Grüße
 Thomas
Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX
On 11/09/2015 07:42 PM, Jeff Law wrote:
> On 11/09/2015 11:27 AM, Bernd Schmidt wrote:
>> On 11/09/2015 05:47 PM, tbsaunde+...@tbsaunde.org wrote:
>>> From: Trevor Saunders
>>>
>>> gcc/ChangeLog:
>>>
>>> 2015-11-09  Trevor Saunders
>>>
>>>         * defaults.h (EH_RETURN_HANDLER_RTX): New default definition.
>>>         * df-scan.c (df_get_exit_block_use_set): Adjust.
>>>         * except.c (expand_eh_return): Likewise.
>>
>> As I said for a previous patch series, if we go to the trouble of fixing
>> up stuff like this, we might as well do it properly and turn things like
>> this into a target hook.
>
> I agree that pushing hookization further is good as well.  I still think
> the patch in and of itself is a step forward, even if it doesn't hookize
> EH_RETURN_HANDLER_RTX.

Well, I was hoping that, by pointing out the issue for the last patch set,
the next set of patches would get things right.  We really shouldn't make
sideways steps when there's a simple way to go forward.

Bernd
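
[Editorial illustration, not part of the thread.]  The difference between
a defaulted target macro and a target hook can be sketched as a table of
function pointers with a default implementation.  This is only a rough
model under stated assumptions: the struct, the hook name and the two
example "ports" below are hypothetical and much simpler than GCC's real
targetm structure.

```c
#include <stddef.h>

/* Hypothetical hook table; a single hook is enough for the sketch.  */
struct target_hooks_sketch
{
  /* Returns the location for the EH return handler, or NULL when the
     target has none.  */
  int *(*eh_return_handler) (void);
};

/* Default hook: the target does not support the feature.  */
static int *
default_eh_return_handler (void)
{
  return NULL;
}

/* A hypothetical port that does provide a handler location.  */
static int handler_slot;

static int *
example_port_eh_return_handler (void)
{
  return &handler_slot;
}

const struct target_hooks_sketch default_hooks
  = { default_eh_return_handler };
const struct target_hooks_sketch example_port_hooks
  = { example_port_eh_return_handler };

/* Target-independent code asks the hook at run time instead of testing a
   macro.  Returns 1 when the value was stored, 0 when unsupported.  */
int
store_eh_handler (const struct target_hooks_sketch *hooks, int value)
{
  int *handler = hooks->eh_return_handler ();
  if (handler == NULL)
    return 0;
  *handler = value;
  return 1;
}
```

The point of the design is that the target-independent caller no longer
mentions any target macro at all; selecting a port means selecting a hook
table, and the default hook plays the role that the NULL default plays in
the macro version.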
Re: [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion)
On Sun, Nov 8, 2015 at 4:42 PM, Michael Meissner wrote:
> This patch adds support for new fusion forms in ISA 3.0 (power9).  In
> particular, ISA 3.0 can fuse GPR loads of R0, FPR loads, GPR stores, FPR
> stores, and some constant generation that ISA 2.07 (power8) could not
> generate.
>
> I have built this patch with a bootstrap build on a power8 little endian
> system.  There were no regressions in the test suite.  Is this patch ok to
> install in the trunk once patch #1 has been installed.
>
> [gcc]
> 2015-11-08  Michael Meissner
>
>         * config/rs6000/constraints.md (wF constraint): New constraints
>         for power9/toc fusion.
>         (wG constraint): Likewise.
>
>         * config/rs6000/predicates.md (upper16_cint_operand): New
>         predicate for power9 and toc fusion.
>         (fpr_reg_operand): Likewise.
>         (toc_fusion_or_p9_reg_operand): Likewise.
>         (toc_fusion_mem_raw): Likewise.
>         (toc_fusion_mem_wrapped): Likewise.
>         (fusion_gpr_addis): If power9 fusion, allow fusion for a larger
>         address range.
>         (fusion_gpr_mem_combo): Delete, use fusion_addis_mem_combo_load
>         instead.
>         (fusion_addis_mem_combo_load): Add support for power9 fusion of
>         floating point loads, floating point stores, and gpr stores.
>         (fusion_addis_mem_combo_store): Likewise.
>         (fusion_offsettable_mem_operand): Likewise.
>
>         * config/rs6000/rs6000-protos.h (emit_fusion_addis): Add
>         declarations.
>         (emit_fusion_load_store): Likewise.
>         (fusion_p9_p): Likewise.
>         (expand_fusion_p9_load): Likewise.
>         (expand_fusion_p9_store): Likewise.
>         (emit_fusion_p9_load): Likewise.
>         (emit_fusion_p9_store): Likewise.
>         (fusion_wrap_memory_address): Likewise.
>
>         * config/rs6000/rs6000.c (struct rs6000_reg_addr): Add new
>         elements for power9 fusion.
>         (rs6000_debug_print_mode): Rework debug information to print more
>         information about fusion.
>         (rs6000_init_hard_regno_mode_ok): Setup for power9 fusion
>         support.
>         (rs6000_legitimate_address_p): Recognize toc fusion as a valid
>         offsettable memory address.
>         (emit_fusion_gpr_load): Move most of the code from
>         emit_fusion_gpr_load into emit_fusion-addis that handles both
>         power8 and power9 fusion.
>         (emit_fusion_addis): Likewise.
>         (emit_fusion_load_store): Likewise.
>         (fusion_wrap_memory_address): Add support for TOC fusion.
>         (fusion_split_address): Likewise.
>         (fusion_p9_p): Add support for power9 fusion.
>         (expand_fusion_p9_load): Likewise.
>         (expand_fusion_p9_store): Likewise.
>         (emit_fusion_p9_load): Likewise.
>         (emit_fusion_p9_store): Likewise.
>
>         * config/rs6000/rs6000.h (TARGET_TOC_FUSION_INT): New macros for
>         power9 fusion support.
>         (TARGET_TOC_FUSION_FP): Likewise.
>
>         * config/rs6000/rs6000.md (UNSPEC_FUSION_P9): New power9/toc
>         fusion unspecs.
>         (UNSPEC_FUSION_ADDIS): Likewise.
>         (QHSI mode iterator): New iterator for power9 fusion.
>         (GPR_FUSION): Likewise.
>         (FPR_FUSION): Likewise.
>         (power9 fusion splitter): New power9/toc fusion support.
>         (toc_fusionload_): Likewise.
>         (toc_fusionload_di): Likewise.
>         (fusion_gpr_load_): Update predicate function.
>         (power9 fusion peephole2s): New power9/toc fusion support.
>         (fusion_gpr___load): Likewise.
>         (fusion_gpr___store): Likewise.
>         (fusion_fpr___load): Likewise.
>         (fusion_fpr___store): Likewise.
>         (fusion_p9__constant): Likewise.
>
> [gcc/testsuite]
> 2015-11-08  Michael Meissner
>
>         * gcc.target/powerpc/fusion.c (fusion_vector): Move to fusion2.c
>         and allow the test on PowerPC LE.
>         * gcc.target/powerpc/fusion2.c (fusion_vector): Likewise.
>
>         * gcc.target/powerpc/fusion3.c: New file, test power9 fusion.

Okay, with the changes that you and Segher discussed.

Thanks, David
Re: RFC: Incomplete Draft Patches to Correct Errors in Loop Unrolling Frequencies (bugzilla problem 68212)
On 11/07/2015 03:44 PM, Kelvin Nilsen wrote: This is a draft patch to partially address the concerns described in bugzilla problem report https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68212). The patch is incomplete in the sense that there are some known shortcomings with nested loops which I am still working on. I am sending this out for comments at this time because we would like these patches to be integrated into the GCC 6 release and want to begin responding to community feedback as soon as possible in order to make the integration possible. I'll mainly comment on style points right now. Your code generally looks mostly good but doesn't quite follow our guidelines. In terms of logic I'm sure there will be followup questions after the first round of points is addressed. Others might be better qualified to review the frequencies manipulation; for this first round I'm just assuming that the general idea is sound (but would appreciate input). 1. Before a loop body is unpeeled into a pre-header location, we temporarily adjust the loop body frequencies to represent the values appropriate for the context into which the loop body is to be copied. 2. After unrolling the loop body (by replicating the loop body (N-1) times within the loop), we recompute all frequencies associated with blocks contained within the loop. If these are independent from each other it might be better to split up the patch. (check_loop_frequency_integrity): new helper routine (set_zero_probability): added another parameter (duplicate_loop_to_header_edge): Add code to recompute loop body frequencies after blocks are replicated (unrolled) into the loop body. Introduce certain help routines because existing infrastructure routines are not reliable during typical executions of duplicate_loop_to_header_edge(). Please review our guidelines how to write ChangeLogs - capitalize, punctuate, and document only the what, not the why. Also, linewrap manually. 
opt_info_start_duplication (opt_info); + ok = duplicate_loop_to_header_edge (loop, loop_latch_edge (loop), Please make sure you don't change whitespace unnecessarily. There are a few other occurrences in the patch, and also cases where you seem to be adding spaces to otherwise blank lines. @@ -1015,14 +1041,44 @@ unroll_loop_runtime_iterations (struct loop *loop) bitmap_clear_bit (wont_exit, may_exit_copy); opt_info_start_duplication (opt_info); + { +/* Recompute the loop body frequencies. */ +zero_loop_frequencies (loop); No reason to start a braced block here. +/* Scale the incoming frequencies according to the heuristic that + * the loop frequency is the incoming edge frequency divided by + * 0.09. This heuristic applies only to loops that iterate over a + * run-time value that is not known at compile time. Note that + * 1/.09 equals 11.. We'll use integer arithmetic on ten + * thousandths, and then divide by 10,000 after we've "rounded". + */ Please examine the comment style in gcc - no asterisks to start new lines, and comment terminators don't go on their own line. +sum_incoming_frequencies *= 11; /* multiply by 11. */ +sum_incoming_frequencies += 5000;/* round by adding 0.5 */ +sum_incoming_frequencies /= 1; /* convert ten thousandths + to ones +*/ These comments could also be improved, but really they should just be removed since they're pretty obvious and redundant with the one before. +/* Define ENABLE_CHECKING to enforce the following run-time checks. "With checking enabled, the following run-time checks are performed:" + * This may report false-positive errors due to round-off errors. That doesn't sound good as it could lead to bootstrap failures when checking is enabled. @@ -44,6 +55,543 @@ static void fix_loop_placements (struct loop *, bo static bool fix_bb_placement (basic_block); static void fix_bb_placements (basic_block, bool *, bitmap); +/* + * Return true iff block is considered to reside within the loop + * represented by loop_ptr. 
+ */ Arguments are capitalized in function comments. +bool +in_loop_p (basic_block block, struct loop *loop_ptr) +{ + basic_block *bbs = get_loop_body (loop_ptr); + bool result = false; + + for (unsigned int i = 0; i < loop_ptr->num_nodes; i++) +{ + if (bbs[i] == block) + result = true; +} I think something that starts with bb->loop_father and iterates outwards would be more efficient. +/* A list of block_ladder_rung structs is used to keep track of all the + * blocks visited in a depth-first recursive traversal of a control-flow + * graph. This list is used to detect and prevent attempts to revisit + * a block that is already being visited in the recursive traversal. + */ +typedef struct block_ladder_rung { + basic_block block; + struct
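A minimal sketch of the "start at bb->loop_father and iterate outwards" suggestion above. The struct and function names (toy_loop, toy_block, toy_in_loop_p) are hypothetical stand-ins rather than GCC's real types; only the two links needed for the walk are modeled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a loop-tree node and a basic
   block.  Only the links needed for the outward walk are modeled.  */
struct toy_loop
{
  struct toy_loop *outer;	/* Enclosing loop, or NULL at the root.  */
};

struct toy_block
{
  struct toy_loop *loop_father;	/* Innermost loop containing the block.  */
};

/* Return true iff BLOCK resides within LOOP, either directly or through
   a loop nested inside LOOP.  The walk starts at the block's innermost
   loop and follows the outer links towards the root.  */
static bool
toy_in_loop_p (const struct toy_block *block, const struct toy_loop *loop)
{
  for (const struct toy_loop *l = block->loop_father; l != NULL; l = l->outer)
    if (l == loop)
      return true;
  return false;
}
```

With this shape the cost depends on the nesting depth of the block rather than on the number of blocks in the loop body, and nothing has to be allocated for the query.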
Re: [PATCH 5/6] Simplify rs6000_builtin_vectorized_function
On Mon, Nov 9, 2015 at 8:30 AM, Richard Sandiford wrote:
> After the previous patches it's no longer necessary for
> TARGET_BUILTIN_VECTORIZED_FUNCTION to return functions that
> map to the vector optab of the original operation. We'll use
> a vector form of the internal function instead.
>
>
> gcc/
> * config/rs6000/rs6000.c (rs6000_builtin_vectorized_function): Remove
> entries that map directly to optabs.

Okay.

Thanks, David
Re: [PATCH 11/12] always define HAVE_AS_LEB128
-#ifdef HAVE_AS_LEB128
+#if HAVE_AS_LEB128

This patch doesn't seem to actually remove any conditional compilation?

Bernd
Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX
On 11/09/2015 11:27 AM, Bernd Schmidt wrote:
> On 11/09/2015 05:47 PM, tbsaunde+...@tbsaunde.org wrote:
>> From: Trevor Saunders
>>
>> gcc/ChangeLog:
>>
>> 2015-11-09 Trevor Saunders
>>
>> * defaults.h (EH_RETURN_HANDLER_RTX): New default definition.
>> * df-scan.c (df_get_exit_block_use_set): Adjust.
>> * except.c (expand_eh_return): Likewise.
>
> As I said for a previous patch series, if we go to the trouble of fixing
> up stuff like this, we might as well do it properly and turn things like
> this into a target hook.

I agree that pushing hookization further is good as well. I still think the patch in and of itself is a step forward, even if it doesn't hookize EH_RETURN_HANDLER_RTX.

jeff
Re: [PATCH][AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT
On Nov 9, 2015, at 3:32 AM, Kyrill Tkachov wrote:
> The aarch64 port does not define TARGET_SUPPORTS_WIDE_INT.

> Ok for trunk and GCC 5?

:-) I’d endorse it, but it's best left to the target folks.
Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX
On Mon, Nov 09, 2015 at 11:42:19AM -0700, Jeff Law wrote: > On 11/09/2015 11:27 AM, Bernd Schmidt wrote: > >On 11/09/2015 05:47 PM, tbsaunde+...@tbsaunde.org wrote: > >>From: Trevor Saunders> >> > >>gcc/ChangeLog: > >> > >>2015-11-09 Trevor Saunders > >> > >>* defaults.h (EH_RETURN_HANDLER_RTX): New default definition. > >>* df-scan.c (df_get_exit_block_use_set): Adjust. > >>* except.c (expand_eh_return): Likewise. > > > >As I said for a previous patch series, if we go to the trouble of fixing > >up stuff like this, we might as well do it properly and turn things like > >this into a target hook. > I agree that pushing hookization further is good as well. I still think the > patch in and of itself is a step forward, even if it doesn't hookize > EH_RETURN_HANDLER_RTX. yeah, that's more or less my thought, and this makes hookization easier since you can now mechanically add a hook for each thing in defaults.h that invokes the macro. Then for each target you can go through and replace the macro with an override of the hooks. That ends up with the macros replaced by hooks without writing a lot of patches that need to go through config-list.mk, and testing on multiple targets which imho is a giant pain, and rather slow. Trev > > jeff
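A rough sketch of the mechanical step described above: once a macro always has a definition, a default hook can simply invoke it, and a port can later override the hook instead of defining the macro. Everything here (TOY_EH_RETURN_HANDLER, struct toy_target_hooks, the string return type) is hypothetical and only illustrates the shape; the real macro yields an rtx and GCC's target vector is generated differently.

```c
#include <stddef.h>

/* Hypothetical target macro.  A defaults.h-style header guarantees that
   it is always defined, even when the port says nothing.  */
#ifndef TOY_EH_RETURN_HANDLER
#define TOY_EH_RETURN_HANDLER NULL
#endif

/* Hypothetical hook table standing in for a target vector.  */
struct toy_target_hooks
{
  const char *(*eh_return_handler) (void);
};

/* Default hook: added mechanically, it just invokes the macro.  */
static const char *
toy_default_eh_return_handler (void)
{
  return TOY_EH_RETURN_HANDLER;
}

/* What a converted port might provide instead of defining the macro.  */
static const char *
toy_port_eh_return_handler (void)
{
  return "r2";
}

/* An unconverted target keeps the default; a converted one overrides.  */
static const struct toy_target_hooks toy_default_target
  = { toy_default_eh_return_handler };
static const struct toy_target_hooks toy_port_target
  = { toy_port_eh_return_handler };
```

Callers only ever go through the hook, so targets can be converted one at a time without touching the callers again.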
Re: [Patch AArch64] Switch constant pools to separate rodata sections.
On Mon, Nov 09, 2015 at 04:46:01PM +, Ramana Radhakrishnan wrote:
>
>
> On 08/11/15 11:42, Andreas Schwab wrote:
> > This is causing a bootstrap comparison failure in gcc/go/gogo.o.
> >
> > Andreas.
> >
>
> I've had a look at this for sometime this afternoon and the trigger is the
> aarch64_use_constant_blocks_p change which appears to be causing a bootstrap
> comparison failure because of differences to offsets in add instructions when
> built with debug and without debug. For now, in the interest of go bootstraps
> continuing on trunk - I'm proposing a patch that partially rolls back the
> change in aarch64_use_constant_blocks_p and will still look into the
> underlying issue.
>
> Bootstrapped on aarch64-none-linux-gnu including (c,c++ and go) - testing
> finished ok.
>
> Ok ?

I agree, this seems like the timely way to get the auto-testers back building.

OK.

Thanks,
James

>
>
> Ramana
>
>
> PR bootstrap/68256
>
> * config/aarch64/aarch64.c (aarch64_use_constant_blocks_p): Return false.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 1b7be83..1fff878 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -5251,9 +5251,11 @@ aarch64_can_use_per_function_literal_pools_p (void)
> static bool
> aarch64_use_blocks_for_constant_p (machine_mode, const_rtx)
> {
> - /* We can't use blocks for constants when we're using a per-function
> - constant pool. */
> - return !aarch64_can_use_per_function_literal_pools_p ();
> + /* Fixme:: In an ideal world this would work similar
> + to the logic in aarch64_select_rtx_section but this
> + breaks bootstrap in gcc go. For now we workaround
> + this by returning false here. */
> + return false;
> }
>
> /* Select appropriate section for constants depending
Re: [PATCH 00/12] misc conditional compilation work
On 11/09/2015 09:47 AM, tbsaunde+...@tbsaunde.org wrote:
> From: Trevor Saunders
>
> Hi,
>
> basically $subject, making some code unconditionally compiled, and changing
> other things from #ifdef to #if so they can be made unconditional
> incrementally.
>
> patches individually bootstrapped + regtested on x86_64-linux-gnu, and a
> slightly earlier version of the series ran through config-list.mk. I think
> everything here is either preapproved, or obvious so I'll commit it later
> today if nobody complains.

Are these the last patches of this nature planned for GCC6? While the window was left slightly open by Richi this morning, I think that's more to allow the queues to drain rather than to allow more new work to go into the tree :-)

jeff
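For readers following the series, a small sketch of where the #ifdef to #if change is heading. Once a feature macro is always defined to 0 or 1, the preprocessor test can eventually become an ordinary C condition, so both branches are parsed and type-checked on every host. The macro and function names below are hypothetical, and the ".byte" fallback is only a placeholder for whatever the non-LEB128 path really emits.

```c
/* Hypothetical configure result.  The point of the series is that the
   macro is always defined, to 0 or to 1, never left undefined.  */
#ifndef TOY_HAVE_AS_LEB128
#define TOY_HAVE_AS_LEB128 0
#endif

/* Return the directive a (hypothetical) emitter would use.  Because the
   macro always has a value, a plain `if' works here: both branches are
   compiled on every configuration and the dead one is optimized away.  */
static const char *
toy_uleb128_directive (void)
{
  if (TOY_HAVE_AS_LEB128)
    return ".uleb128";
  return ".byte";
}
```

The intermediate `#if TOY_HAVE_AS_LEB128` form behaves the same as the `#ifdef` it replaces, which is why that step on its own removes no conditional compilation yet.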
Re: [PATCH], Add power9 support to GCC, patch #4
On Sun, Nov 8, 2015 at 4:39 PM, Michael Meissner wrote:
> This patch adds support for the EXTSWSLI instruction that is being added to
> PowerPC ISA 3.0 (power9).
>
> I have built this patch (along with patches #2 and #3) with a bootstrap build
> on a power8 little endian system. There were no regressions in the test
> suite. Is this patch ok to install in the trunk once patch #1 has been
> installed.
>
> [gcc]
> 2015-11-08 Michael Meissner
>
> * config/rs6000/predicates.md (u6bit_cint_operand): New
> predicate, recognize 0..63.
>
> * config/rs6000/rs6000.c (rs6000_rtx_costs): Adjust the costs if
> the EXTSWSLI instruction is generated.
>
> * config/rs6000/rs6000.h (TARGET_EXTSWSLI): Add support for ISA
> 3.0 EXTSWSLI instruction.
> * config/rs6000/rs6000.md (ashdi3_extswsli): Likewise.
> (ashdi3_extswsli_dot): Likewise.
> (ashdi3_extswsli_dot2): Likewise.
>
> [gcc/testsuite]
> 2015-11-08 Michael Meissner
>
> * gcc.target/powerpc/extswsli-1.c: New file to test EXTSWSLI
> instruction generation.
> * gcc.target/powerpc/extswsli-2.c: Likewise.
> * gcc.target/powerpc/extswsli-3.c: Likewise.

Okay.

Thanks, David
Re: [PATCH][AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT
On 09/11/15 15:34, Marcus Shawcroft wrote:
> On 9 November 2015 at 11:32, Kyrill Tkachov wrote:
>
>> 2015-11-09 Kyrylo Tkachov
>>
>> PR target/68129
>> * config/aarch64/aarch64.h (TARGET_SUPPORTS_WIDE_INT): Define to 1.
>> * config/aarch64/aarch64.c (aarch64_print_operand, CONST_DOUBLE):
>> Delete VOIDmode case. Assert that mode is not VOIDmode.
>> * config/aarch64/predicates.md (const0_operand): Remove const_double
>> match.
>>
>> 2015-11-09 Kyrylo Tkachov
>>
>> PR target/68129
>> * gcc.target/aarch64/pr68129_1.c: New test.
>
> Hi, This test isn't aarch64 specific, does it need to be in
> gcc.target/aarch64 ?

Not really, here is the patch with the test in gcc.dg/ if that's preferred.

Thanks,
Kyrill

2015-11-09 Kyrylo Tkachov

PR target/68129
* config/aarch64/aarch64.h (TARGET_SUPPORTS_WIDE_INT): Define to 1.
* config/aarch64/aarch64.c (aarch64_print_operand, CONST_DOUBLE):
Delete VOIDmode case. Assert that mode is not VOIDmode.
* config/aarch64/predicates.md (const0_operand): Remove const_double
match.

2015-11-09 Kyrylo Tkachov

PR target/68129
* gcc.dg/pr68129_1.c: New test.

> Cheers
> /Marcus

commit 623ffaa527b17ad01179c30c1d4a9911243f818a
Author: Kyrylo Tkachov
Date: Wed Oct 28 10:49:44 2015 +

[AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ce155dc..927b72a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4403,11 +4403,10 @@ aarch64_print_operand (FILE *f, rtx x, char code)
   break;

 case CONST_DOUBLE:
-  /* CONST_DOUBLE can represent a double-width integer.
-     In this case, the mode of x is VOIDmode. */
-  if (GET_MODE (x) == VOIDmode)
-    ; /* Do Nothing. */
-  else if (aarch64_float_const_zero_rtx_p (x))
+  /* Since we define TARGET_SUPPORTS_WIDE_INT we shouldn't ever
+     be getting CONST_DOUBLEs holding integers. */
+  gcc_assert (GET_MODE (x) != VOIDmode);
+  if (aarch64_float_const_zero_rtx_p (x))
     {
       fputc ('0', f);
       break;
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index b041a1e..0fac0a7 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -863,6 +863,8 @@ extern enum aarch64_code_model aarch64_cmodel;
 (aarch64_cmodel == AARCH64_CMODEL_TINY \
  || aarch64_cmodel == AARCH64_CMODEL_TINY_PIC)

+#define TARGET_SUPPORTS_WIDE_INT 1
+
 /* Modes valid for AdvSIMD D registers, i.e. that fit in half a Q register. */
 #define AARCH64_VALID_SIMD_DREG_MODE(MODE) \
 ((MODE) == V2SImode || (MODE) == V4HImode || (MODE) == V8QImode \
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 1bcbf62..8775460 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -32,7 +32,7 @@ (define_predicate "aarch64_call_insn_operand"
 ;; Return true if OP a (const_int 0) operand.
 (define_predicate "const0_operand"
-  (and (match_code "const_int, const_double")
+  (and (match_code "const_int")
   (match_test "op == CONST0_RTX (mode)")))

 (define_predicate "aarch64_ccmp_immediate"
diff --git a/gcc/testsuite/gcc.dg/pr68129_1.c b/gcc/testsuite/gcc.dg/pr68129_1.c
new file mode 100644
index 000..112331e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68129_1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fno-split-wide-types" } */
+
+typedef int V __attribute__ ((vector_size (8 * sizeof (int))));
+
+void
+foo (V *p, V *q)
+{
+  *p = (*p == *q);
+}
[PATCH, 2/16] Make create_parallel_loop return void
On 09/11/15 16:35, Tom de Vries wrote:
Hi,

this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.

The patch series contains these patches:

 1 Insert new exit block only when needed in
   transform_to_exit_first_loop_alt
 2 Make create_parallel_loop return void
 3 Ignore reduction clause on kernels directive
 4 Implement -foffload-alias
 5 Add in_oacc_kernels_region in struct loop
 6 Add pass_oacc_kernels
 7 Add pass_dominator_oacc_kernels
 8 Add pass_ch_oacc_kernels
 9 Add pass_parallelize_loops_oacc_kernels
10 Add pass_oacc_kernels pass group in passes.def
11 Update testcases after adding kernels pass group
12 Handle acc loop directive
13 Add c-c++-common/goacc/kernels-*.c
14 Add gfortran.dg/goacc/kernels-*.f95
15 Add libgomp.oacc-c-c++-common/kernels-*.c
16 Add libgomp.oacc-fortran/kernels-*.f95

The first 9 patches are more or less independent, but patches 10-16 are intended to be committed at the same time.

Bootstrapped and reg-tested on x86_64.

Build and reg-tested with nvidia accelerator, in combination with a patch that enables accelerator testing (which is submitted at https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).

I'll post the individual patches in reply to this message.

this patch makes create_parallel_loop return void. The result is currently unused.

Thanks,
- Tom

Make create_parallel_loop return void

2015-11-09 Tom de Vries

* tree-parloops.c (create_parallel_loop): Return void.

---
 gcc/tree-parloops.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 6a49aa9..17415a8 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -1986,10 +1986,9 @@ transform_to_exit_first_loop (struct loop *loop,
 /* Create the parallel constructs for LOOP as described in gen_parallel_loop.
    LOOP_FN and DATA are the arguments of GIMPLE_OMP_PARALLEL.
    NEW_DATA is the variable that should be initialized from the argument
-   of LOOP_FN. N_THREADS is the requested number of threads. Returns the
-   basic block containing GIMPLE_OMP_PARALLEL tree. */
+   of LOOP_FN. N_THREADS is the requested number of threads. */

-static basic_block
+static void
 create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
		       tree new_data, unsigned n_threads, location_t loc)
 {
@@ -2162,8 +2161,6 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   /* After the above dom info is hosed. Re-compute it. */
   free_dominance_info (CDI_DOMINATORS);
   calculate_dominance_info (CDI_DOMINATORS);
-
-  return paral_bb;
 }

 /* Generates code to execute the iterations of LOOP in N_THREADS
--
1.9.1
Re: [PATCH], Add power9 support to GCC, patch #1 (revised)
On Sun, Nov 8, 2015 at 4:33 PM, Michael Meissner wrote:
> This is patch #1 that I revised. I changed -mfusion-toc to -mtoc-fusion. I
> changed the references to ISA 2.08 to 3.0. I added two new debug switches for
> code in future patches that is undergoing development and is not ready to be
> on by default.
>
> I have done a bootstrap build on a little endian power8 system and there were
> no regressions in this patch. Is it ok to install in the trunk?
>
> 2015-11-08 Michael Meissner
>
> * config/rs6000/rs6000.opt (-mpower9-fusion): Add new switches for
> ISA 3.0 (power9).
> (-mpower9-vector): Likewise.
> (-mpower9-dform): Likewise.
> (-mpower9-minmax): Likewise.
> (-mtoc-fusion): Likewise.
> (-mmodulo): Likewise.
> (-mfloat128-hardware): Likewise.
>
> * config/rs6000/rs6000-cpus.def (ISA_3_0_MASKS_SERVER): Add option
> mask for ISA 3.0 (power9).
> (POWERPC_MASKS): Add new ISA 3.0 switches.
> (power9 cpu): Add power9 cpu.
>
> * config/rs6000/rs6000.h (ASM_CPU_POWER9_SPEC): Add support for
> power9.
> (ASM_CPU_SPEC): Likewise.
> (EXTRA_SPECS): Likewise.
>
> * config/rs6000/rs6000-opts.h (enum processor_type): Add
> PROCESSOR_POWER9.
>
> * config/rs6000/rs6000.c (power9_cost): Initial cost setup for
> power9.
> (rs6000_debug_reg_global): Add support for power9 fusion.
> (rs6000_setup_reg_addr_masks): Cache mode size.
> (rs6000_option_override_internal): Until real power9 tuning is
> added, use -mtune=power8 for -mcpu=power9.
> (rs6000_setup_reg_addr_masks): Do not allow pre-increment,
> pre-decrement, or pre-modify on SFmode/DFmode if we allow the use
> of Altivec registers.
> (rs6000_option_override_internal): Add support for ISA 3.0
> switches.
> (rs6000_loop_align): Add support for power9 cpu.
> (rs6000_file_start): Likewise.
> (rs6000_adjust_cost): Likewise.
> (rs6000_issue_rate): Likewise.
> (insn_must_be_first_in_group): Likewise.
> (insn_must_be_last_in_group): Likewise.
> (force_new_group): Likewise.
> (rs6000_register_move_cost): Likewise.
> (rs6000_opt_masks): Likewise.
>
> * config/rs6000/rs6000.md (cpu attribute): Add power9.
> * config/rs6000/rs6000-tables.opt: Regenerate.
>
> * config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define
> _ARCH_PWR9 if power9 support is available.
>
> * config/rs6000/aix61.h (ASM_CPU_SPEC): Add power9.
> * config/rs6000/aix53.h (ASM_CPU_SPEC): Likewise.
>
> * configure.ac: Determine if the assembler supports the ISA 3.0
> instructions.
> * config.in (HAVE_AS_POWER9): Likewise.
> * configure: Regenerate.
>
> * doc/invoke.texi (RS/6000 and PowerPC Options): Document ISA 3.0
> switches.

Okay.

Thanks, David
[PATCH, 5/16] Add in_oacc_kernels_region in struct loop
On 09/11/15 16:35, Tom de Vries wrote:
Hi,

this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.

The patch series contains these patches:

 1 Insert new exit block only when needed in
   transform_to_exit_first_loop_alt
 2 Make create_parallel_loop return void
 3 Ignore reduction clause on kernels directive
 4 Implement -foffload-alias
 5 Add in_oacc_kernels_region in struct loop
 6 Add pass_oacc_kernels
 7 Add pass_dominator_oacc_kernels
 8 Add pass_ch_oacc_kernels
 9 Add pass_parallelize_loops_oacc_kernels
10 Add pass_oacc_kernels pass group in passes.def
11 Update testcases after adding kernels pass group
12 Handle acc loop directive
13 Add c-c++-common/goacc/kernels-*.c
14 Add gfortran.dg/goacc/kernels-*.f95
15 Add libgomp.oacc-c-c++-common/kernels-*.c
16 Add libgomp.oacc-fortran/kernels-*.f95

The first 9 patches are more or less independent, but patches 10-16 are intended to be committed at the same time.

Bootstrapped and reg-tested on x86_64.

Build and reg-tested with nvidia accelerator, in combination with a patch that enables accelerator testing (which is submitted at https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).

I'll post the individual patches in reply to this message.

this patch adds and initializes the field in_oacc_kernels_region in struct loop. The field is used to signal to subsequent passes that we're dealing with a loop in a kernels region that we're trying to parallelize.

Note that we do not parallelize kernels regions with more than one loop nest. [ In general, kernels regions with more than one loop nest should be split up into separate kernels regions, but that's not supported atm. ]

Thanks,
- Tom

Add in_oacc_kernels_region in struct loop

2015-11-09 Tom de Vries

* cfgloop.h (struct loop): Add in_oacc_kernels_region field.
* omp-low.c (mark_loops_in_oacc_kernels_region): New function.
(expand_omp_target): Call mark_loops_in_oacc_kernels_region.
---
 gcc/cfgloop.h | 3 +++
 gcc/omp-low.c | 58 ++
 2 files changed, 61 insertions(+)

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 6af6893..ee73bf9 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -191,6 +191,9 @@ struct GTY ((chain_next ("%h.next"))) loop {
   /* True if we should try harder to vectorize this loop. */
   bool force_vectorize;

+  /* True if the loop is part of an oacc kernels region. */
+  bool in_oacc_kernels_region;
+
   /* For SIMD loops, this is a unique identifier of the loop, referenced
      by IFN_GOMP_SIMD_VF, IFN_GOMP_SIMD_LANE and IFN_GOMP_SIMD_LAST_LANE
      builtins. */
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index d052c13..7121d73 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -12429,6 +12429,61 @@ get_oacc_ifn_dim_arg (const gimple *stmt)
   return (int) axis;
 }

+/* Mark the loops inside the kernels region starting at REGION_ENTRY and ending
+   at REGION_EXIT. */
+
+static void
+mark_loops_in_oacc_kernels_region (basic_block region_entry,
+				    basic_block region_exit)
+{
+  bitmap dominated_bitmap = BITMAP_GGC_ALLOC ();
+  bitmap excludes_bitmap = BITMAP_GGC_ALLOC ();
+  unsigned di;
+  basic_block bb;
+
+  bitmap_clear (dominated_bitmap);
+  bitmap_clear (excludes_bitmap);
+
+  /* Get all the blocks dominated by the region entry. That will include the
+     entire region. */
+  vec<basic_block> dominated
+    = get_all_dominated_blocks (CDI_DOMINATORS, region_entry);
+  FOR_EACH_VEC_ELT (dominated, di, bb)
+    bitmap_set_bit (dominated_bitmap, bb->index);
+
+  /* Exclude all the blocks which are not in the region: the blocks dominated by
+     the region exit. */
+  if (region_exit != NULL)
+    {
+      vec<basic_block> excludes
+	= get_all_dominated_blocks (CDI_DOMINATORS, region_exit);
+      FOR_EACH_VEC_ELT (excludes, di, bb)
+	bitmap_set_bit (excludes_bitmap, bb->index);
+    }
+
+  /* Don't parallelize the kernels region if it contains more than one outer
+     loop. */
+  unsigned int nr_outer_loops = 0;
+  struct loop *loop;
+  FOR_EACH_LOOP (loop, 0)
+    {
+      if (loop_outer (loop) != current_loops->tree_root)
+	continue;
+
+      if (bitmap_bit_p (dominated_bitmap, loop->header->index)
+	  && !bitmap_bit_p (excludes_bitmap, loop->header->index))
+	nr_outer_loops++;
+    }
+  if (nr_outer_loops != 1)
+    return;
+
+  /* Mark the loops in the region. */
+  FOR_EACH_LOOP (loop, 0)
+    if (bitmap_bit_p (dominated_bitmap, loop->header->index)
+	&& !bitmap_bit_p (excludes_bitmap, loop->header->index))
+      loop->in_oacc_kernels_region = true;
+}
+
 /* Expand the GIMPLE_OMP_TARGET starting at REGION. */

 static void
@@ -12483,6 +12538,9 @@ expand_omp_target (struct omp_region *region)
   entry_bb = region->entry;
   exit_bb = region->exit;
[PATCH 6/6] Simplify aarch64_builtin_vectorized_function
After the previous patches it's no longer necessary for TARGET_BUILTIN_VECTORIZED_FUNCTION to return functions that map to the vector optab of the original operation. We'll use a vector form of the internal function instead. gcc/ * config/aarch64/aarch64-builtins.c (aarch64_builtin_vectorized_function): Remove entries that map directly to optabs. diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index c4cda4f..2a560a9 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -1288,40 +1288,6 @@ aarch64_builtin_vectorized_function (unsigned int fn, tree type_out, { #undef AARCH64_CHECK_BUILTIN_MODE #define AARCH64_CHECK_BUILTIN_MODE(C, N) \ - (out_mode == N##Fmode && out_n == C \ - && in_mode == N##Fmode && in_n == C) -CASE_CFN_FLOOR: - return AARCH64_FIND_FRINT_VARIANT (floor); -CASE_CFN_CEIL: - return AARCH64_FIND_FRINT_VARIANT (ceil); -CASE_CFN_TRUNC: - return AARCH64_FIND_FRINT_VARIANT (btrunc); -CASE_CFN_ROUND: - return AARCH64_FIND_FRINT_VARIANT (round); -CASE_CFN_NEARBYINT: - return AARCH64_FIND_FRINT_VARIANT (nearbyint); -CASE_CFN_SQRT: - return AARCH64_FIND_FRINT_VARIANT (sqrt); -#undef AARCH64_CHECK_BUILTIN_MODE -#define AARCH64_CHECK_BUILTIN_MODE(C, N) \ - (out_mode == SImode && out_n == C \ - && in_mode == N##Imode && in_n == C) -CASE_CFN_CLZ: - { - if (AARCH64_CHECK_BUILTIN_MODE (4, S)) - return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_clzv4si]; - return NULL_TREE; - } -CASE_CFN_CTZ: - { - if (AARCH64_CHECK_BUILTIN_MODE (2, S)) - return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_ctzv2si]; - else if (AARCH64_CHECK_BUILTIN_MODE (4, S)) - return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_ctzv4si]; - return NULL_TREE; - } -#undef AARCH64_CHECK_BUILTIN_MODE -#define AARCH64_CHECK_BUILTIN_MODE(C, N) \ (out_mode == N##Imode && out_n == C \ && in_mode == N##Fmode && in_n == C) CASE_CFN_IFLOOR:
Re: [OpenACC] declare directive
Jakub,

On 11/09/2015 10:21 AM, Jakub Jelinek wrote:
> On Mon, Nov 09, 2015 at 10:01:32AM -0600, James Norris wrote:
>> + if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
>
> Here you only look up "omp declare target", not "omp declare target link".
> So, what happens if you mix that (once in some copy clause, once in link),
> or mention twice in link, etc.? Needs testsuite coverage and clear rules.

Will fix.

>> + DECL_ATTRIBUTES (decl) =
>> + tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (decl));
>
> Incorrect formatting, = goes already on the following line, no whitespace
> at end of line, and next line is indented below CL from DECL.

Will fix.

>> + t = build_omp_clause (OMP_CLAUSE_LOCATION (c) , OMP_CLAUSE_MAP);
>
> Wrong formatting, no space before ,.

Will fix.

>> + if (ret_clauses)
>> + {
>> + tree fndecl = current_function_decl;
>> + tree attrs = lookup_attribute ("oacc declare returns",
>> + DECL_ATTRIBUTES (fndecl));
>
> Why do you use an attribute for this? I think adding the automatic
> vars to hash_map during gimplification of the OACC_DECLARE is best.

See below (This doesn't scale...)

>> + tree id = get_identifier ("oacc declare returns");
>> + DECL_ATTRIBUTES (fndecl) =
>> + tree_cons (id, ret_clauses, DECL_ATTRIBUTES (fndecl));
>
> Formatting error.

Will fix.

>> --- a/gcc/gimplify.c
>> +++ b/gcc/gimplify.c
>> @@ -1065,6 +1065,7 @@ gimplify_bind_expr (tree *expr_p, gimple_seq *pre_p)
>> gimple_seq body, cleanup;
>> gcall *stack_save;
>> location_t start_locus = 0, end_locus = 0;
>> + tree ret_clauses = NULL;
>>
>> tree temp = voidify_wrapper_expr (bind_expr, NULL);
>>
>> @@ -1166,9 +1167,56 @@ gimplify_bind_expr (tree *expr_p, gimple_seq *pre_p)
>> clobber_stmt = gimple_build_assign (t, clobber);
>> gimple_set_location (clobber_stmt, end_locus);
>> gimplify_seq_add_stmt (&cleanup, clobber_stmt);
>> +
>> + if (flag_openacc)
>> + {
>> + tree attrs = lookup_attribute ("oacc declare returns",
>> + DECL_ATTRIBUTES (current_function_decl));
>> + tree clauses, c, c_next = NULL, c_prev = NULL;
>> +
>> + if (!attrs)
>> + break;
>> +
>> + clauses = TREE_VALUE (attrs);
>> +
>> + for (c = clauses; c; c_prev = c, c = c_next)
>> + {
>> + c_next = OMP_CLAUSE_CHAIN (c);
>> +
>> + if (t == OMP_CLAUSE_DECL (c))
>> + {
>> + if (ret_clauses)
>> + OMP_CLAUSE_CHAIN (c) = ret_clauses;
>> +
>> + ret_clauses = c;
>> +
>> + if (c_prev == NULL)
>> + clauses = c_next;
>> + else
>> + OMP_CLAUSE_CHAIN (c_prev) = c_next;
>> + }
>> + }
>
> This doesn't really scale. Consider 1 clauses on various
> oacc declare constructs in a single function, and 100 automatic
> variables in such a function.
> So, what I'm suggesting is during gimplification of OACC_DECLARE,
> if you find a clause on an automatic variable in the current function
> that you want to unmap afterwards, have a
> static hash_map<tree, tree> *oacc_declare_returns;
> and you just add into the hash map the VAR_DECL -> the clause you want,
> then in this spot you check
> if (oacc_declare_returns)
> {
> clause = lookup in hash_map (t);
> if (clause)
> {
> ...
> }
> }

Now I see what you were getting at in using the hash_map. I didn't consider creating a static hash_map and populating it as you suggest. Thank you!

>> +
>> + if (clauses == NULL)
>> + {
>> + DECL_ATTRIBUTES (current_function_decl) =
>> + remove_attribute ("oacc declare returns",
>> + DECL_ATTRIBUTES (current_function_decl));
>
> Wrong formatting.

Will fix.

> Jakub

Thanks for taking the time to review.

Jim
[PATCH 11/12] always define HAVE_AS_LEB128
From: Trevor Saundersgcc/ChangeLog: 2015-11-09 Trevor Saunders * acinclude.m4: Always define HAVE_AS_LEB128. * configure: Regenerate. * configure.ac: Adjust. * dwarf2asm.c (dw2_asm_output_data_uleb128): Likewise. (dw2_asm_output_data_sleb128): Likewise. (dw2_asm_output_delta_uleb128): Likewise. (dw2_asm_output_delta_sleb128): Likewise. * except.c (output_one_function_exception_table): Likewise. --- gcc/acinclude.m4 | 4 +++ gcc/configure| 98 +++- gcc/configure.ac | 2 ++ gcc/dwarf2asm.c | 8 ++--- gcc/except.c | 18 +-- 5 files changed, 116 insertions(+), 14 deletions(-) diff --git a/gcc/acinclude.m4 b/gcc/acinclude.m4 index b8a4c28..e7d75c8 100644 --- a/gcc/acinclude.m4 +++ b/gcc/acinclude.m4 @@ -550,6 +550,10 @@ AC_CACHE_CHECK([assembler for $1], [$2], ifelse([$7],,,[dnl if test $[$2] = yes; then $7 +fi]) +ifelse([$8],,,[dnl +if test $[$2] != yes; then + $8 fi])]) dnl gcc_SUN_LD_VERSION diff --git a/gcc/configure b/gcc/configure index de6cf13..14d828c 100755 --- a/gcc/configure +++ b/gcc/configure @@ -22411,6 +22411,7 @@ $as_echo "#define HAVE_GAS_BALIGN_AND_P2ALIGN 1" >>confdefs.h fi + { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for .p2align with maximum skip" >&5 $as_echo_n "checking assembler for .p2align with maximum skip... " >&6; } if test "${gcc_cv_as_max_skip_p2align+set}" = set; then : @@ -22446,6 +22447,7 @@ $as_echo "#define HAVE_GAS_MAX_SKIP_P2ALIGN 1" >>confdefs.h fi + { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for .literal16" >&5 $as_echo_n "checking assembler for .literal16... " >&6; } if test "${gcc_cv_as_literal16+set}" = set; then : @@ -22481,6 +22483,7 @@ $as_echo "#define HAVE_GAS_LITERAL16 1" >>confdefs.h fi + { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for working .subsection -1" >&5 $as_echo_n "checking assembler for working .subsection -1... 
" >&6; } if test "${gcc_cv_as_subsection_m1+set}" = set; then : @@ -22528,6 +22531,7 @@ $as_echo "#define HAVE_GAS_SUBSECTION_ORDERING 1" >>confdefs.h fi + { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for .weak" >&5 $as_echo_n "checking assembler for .weak... " >&6; } if test "${gcc_cv_as_weak+set}" = set; then : @@ -22563,6 +22567,7 @@ $as_echo "#define HAVE_GAS_WEAK 1" >>confdefs.h fi + { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for .weakref" >&5 $as_echo_n "checking assembler for .weakref... " >&6; } if test "${gcc_cv_as_weakref+set}" = set; then : @@ -22598,6 +22603,7 @@ $as_echo "#define HAVE_GAS_WEAKREF 1" >>confdefs.h fi + { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for .nsubspa comdat" >&5 $as_echo_n "checking assembler for .nsubspa comdat... " >&6; } if test "${gcc_cv_as_nsubspa_comdat+set}" = set; then : @@ -22634,6 +22640,7 @@ $as_echo "#define HAVE_GAS_NSUBSPA_COMDAT 1" >>confdefs.h fi + # .hidden needs to be supported in both the assembler and the linker, # because GNU LD versions before 2.12.1 have buggy support for STV_HIDDEN. # This is irritatingly difficult to feature test for; we have to check the @@ -22673,6 +22680,7 @@ fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_hidden" >&5 $as_echo "$gcc_cv_as_hidden" >&6; } + case "${target}" in *-*-darwin*) # Darwin as has some visibility support, though with a different syntax. @@ -23125,6 +23133,11 @@ if test $gcc_cv_as_leb128 = yes; then $as_echo "#define HAVE_AS_LEB128 1" >>confdefs.h fi +if test $gcc_cv_as_leb128 != yes; then + +$as_echo "#define HAVE_AS_LEB128 0" >>confdefs.h + +fi # Check if we have assembler support for unwind directives. 
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for cfi directives" >&5 @@ -23204,6 +23217,7 @@ fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_cfi_directive" >&5 $as_echo "$gcc_cv_as_cfi_directive" >&6; } + if test $gcc_cv_as_cfi_directive = yes && test x$gcc_cv_objdump != x; then { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for working cfi advance" >&5 $as_echo_n "checking assembler for working cfi advance... " >&6; } @@ -23241,6 +23255,7 @@ fi { $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_cfi_advance_working" >&5 $as_echo "$gcc_cv_as_cfi_advance_working" >&6; } + else # no objdump, err on the side of caution gcc_cv_as_cfi_advance_working=no @@ -23284,6 +23299,7 @@ fi $as_echo "$gcc_cv_as_cfi_personality_directive" >&6; } + cat >>confdefs.h <<_ACEOF #define HAVE_GAS_CFI_PERSONALITY_DIRECTIVE `if test $gcc_cv_as_cfi_personality_directive = yes; then echo 1; else echo 0; fi` @@ -23336,6 +23352,7 @@ $as_echo "$gcc_cv_as_cfi_sections_directive" >&6; } + cat >>confdefs.h <<_ACEOF #define HAVE_GAS_CFI_SECTIONS_DIRECTIVE `if test
[PATCH 3/6] Vectorize internal functions
This patch tries to vectorize built-in and internal functions as internal functions first, falling back on the current built-in target hooks otherwise. gcc/ * internal-fn.h (direct_internal_fn_info): Add vectorizable flag. * internal-fn.c (direct_internal_fn_array): Update accordingly. * tree-vectorizer.h (vectorizable_function): Delete. * tree-vect-stmts.c: Include internal-fn.h. (vectorizable_internal_function): New function. (vectorizable_function): Inline into... (vectorizable_call): ...here. Explicitly reject calls that read from or write to memory. Try using an internal function before falling back on the old vectorizable_function behavior. diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index 898c83d..a5bda2f 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -69,13 +69,13 @@ init_internal_fns () /* Create static initializers for the information returned by direct_internal_fn. */ -#define not_direct { -2, -2 } -#define mask_load_direct { -1, -1 } -#define load_lanes_direct { -1, -1 } -#define mask_store_direct { 3, 3 } -#define store_lanes_direct { 0, 0 } -#define unary_direct { 0, 0 } -#define binary_direct { 0, 0 } +#define not_direct { -2, -2, false } +#define mask_load_direct { -1, -1, false } +#define load_lanes_direct { -1, -1, false } +#define mask_store_direct { 3, 3, false } +#define store_lanes_direct { 0, 0, false } +#define unary_direct { 0, 0, true } +#define binary_direct { 0, 0, true } const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = { #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct, diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h index 6cb123f..aea6abd 100644 --- a/gcc/internal-fn.h +++ b/gcc/internal-fn.h @@ -134,6 +134,14 @@ struct direct_internal_fn_info function isn't directly mapped to an optab. 
*/ signed int type0 : 8; signed int type1 : 8; + /* True if the function is pointwise, so that it can be vectorized by + converting the return type and all argument types to vectors of the + same number of elements. E.g. we can vectorize an IFN_SQRT on + floats as an IFN_SQRT on vectors of N floats. + + This only needs 1 bit, but occupies the full 16 to ensure a nice + layout. */ + unsigned int vectorizable : 16; }; extern const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1]; diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index 75389c4..1142142 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -47,6 +47,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-scalar-evolution.h" #include "tree-vectorizer.h" #include "builtins.h" +#include "internal-fn.h" /* For lang_hooks.types.type_for_mode. */ #include "langhooks.h" @@ -1632,27 +1633,32 @@ vect_finish_stmt_generation (gimple *stmt, gimple *vec_stmt, add_stmt_to_eh_lp (vec_stmt, lp_nr); } -/* Checks if CALL can be vectorized in type VECTYPE. Returns - a function declaration if the target has a vectorized version - of the function, or NULL_TREE if the function cannot be vectorized. */ +/* We want to vectorize a call to combined function CFN with function + decl FNDECL, using VECTYPE_OUT as the type of the output and VECTYPE_IN + as the types of all inputs. Check whether this is possible using + an internal function, returning its code if so or IFN_LAST if not. */ -tree -vectorizable_function (gcall *call, tree vectype_out, tree vectype_in) +static internal_fn +vectorizable_internal_function (combined_fn cfn, tree fndecl, + tree vectype_out, tree vectype_in) { - /* We only handle functions that do not read or clobber memory. 
*/ - if (gimple_vuse (call)) -return NULL_TREE; - - combined_fn fn = gimple_call_combined_fn (call); - if (fn != CFN_LAST) -return targetm.vectorize.builtin_vectorized_function - (fn, vectype_out, vectype_in); - - if (gimple_call_builtin_p (call, BUILT_IN_MD)) -return targetm.vectorize.builtin_md_vectorized_function - (gimple_call_fndecl (call), vectype_out, vectype_in); - - return NULL_TREE; + internal_fn ifn; + if (internal_fn_p (cfn)) +ifn = as_internal_fn (cfn); + else +ifn = associated_internal_fn (fndecl); + if (ifn != IFN_LAST && direct_internal_fn_p (ifn)) +{ + const direct_internal_fn_info &info = direct_internal_fn (ifn); + if (info.vectorizable) + { + tree type0 = (info.type0 < 0 ? vectype_out : vectype_in); + tree type1 = (info.type1 < 0 ? vectype_out : vectype_in); + if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1))) + return ifn; + } +} + return IFN_LAST; } @@ -2232,15 +2238,43 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, gimple **vec_stmt, else return false; + /* We only handle functions that do not read or clobber memory. */ + if (gimple_vuse (stmt)) +{ +
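The lookup performed by vectorizable_internal_function can be illustrated outside of GCC with a small standalone sketch. Everything prefixed with toy_ below is hypothetical and only mimics the shape of the patch: the table mirrors the { type0, type1, vectorizable } initializers, the "direct" test is assumed to mean type0 >= -1, vector types are modelled as strings, and toy_supported_p stands in for the real target query.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for a few internal function codes (not GCC's enum).
enum toy_ifn { TOY_SQRT, TOY_MASK_LOAD, TOY_NOT_DIRECT, TOY_LAST };

// Same field layout as the patched direct_internal_fn_info.
struct toy_direct_info
{
  signed int type0 : 8;
  signed int type1 : 8;
  unsigned int vectorizable : 16;
};

// One entry per code, in the style of the *_direct macros in the patch.
static const toy_direct_info toy_direct_array[TOY_LAST + 1] = {
  { 0, 0, true },     // TOY_SQRT: unary, pointwise
  { -1, -1, false },  // TOY_MASK_LOAD: direct but not pointwise
  { -2, -2, false },  // TOY_NOT_DIRECT
  { -2, -2, false }   // TOY_LAST sentinel
};

// Assumption: pretend the target only supports operations on "v4sf".
static bool
toy_supported_p (toy_ifn, const std::string &type0, const std::string &type1)
{
  return type0 == "v4sf" && type1 == "v4sf";
}

// Return IFN if it can be used on the given vector types, else TOY_LAST.
static toy_ifn
toy_vectorizable_internal_function (toy_ifn ifn,
                                    const std::string &vectype_out,
                                    const std::string &vectype_in)
{
  // Assumed meaning of "direct": type0 is -1 or an argument index.
  if (ifn != TOY_LAST && toy_direct_array[ifn].type0 >= -1)
    {
      const toy_direct_info &info = toy_direct_array[ifn];
      if (info.vectorizable)
        {
          // A negative index selects the return type, otherwise an argument.
          const std::string &type0
            = info.type0 < 0 ? vectype_out : vectype_in;
          const std::string &type1
            = info.type1 < 0 ? vectype_out : vectype_in;
          if (toy_supported_p (ifn, type0, type1))
            return ifn;
        }
    }
  return TOY_LAST;
}
```

In this sketch a caller that gets TOY_LAST back would fall through to the older target-hook path, which is the fallback order the patch description gives.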
[PATCH 07/12] always define DBX_DEBUGGING_INFO
From: Trevor Saundersgcc/ChangeLog: 2015-11-09 Trevor Saunders * config/arc/arc.h: Define DBX_DEBUGGING_INFO to 1. * config/pdp11/pdp11.h: Likewise. * defaults.h (DBX_DEBUGGING_INFO): New default definition. * config/rs6000/rs6000.c (macho_branch_islands): Adjust. * dbxout.c (struct dbx_file): Likewise. (debug_flush_symbol_queue): Likewise. (default_stabs_asm_out_destructor): Likewise. (default_stabs_asm_out_constructor): Likewise. * dbxout.h: Likewise. * doc/tm.texi: Regenerate. * doc/tm.texi.in: Adjust. * final.c: Likewise. * gcc.c: Likewise. * opts.c (set_debug_level): Likewise. * toplev.c (process_options): Likewise. --- gcc/config/arc/arc.h | 2 +- gcc/config/pdp11/pdp11.h | 2 +- gcc/config/rs6000/rs6000.c | 4 ++-- gcc/dbxout.c | 20 ++-- gcc/dbxout.h | 2 +- gcc/defaults.h | 8 ++-- gcc/doc/tm.texi| 4 ++-- gcc/doc/tm.texi.in | 4 ++-- gcc/final.c| 2 +- gcc/gcc.c | 4 ++-- gcc/opts.c | 2 +- gcc/toplev.c | 6 ++ 12 files changed, 31 insertions(+), 29 deletions(-) diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h index cb98bda..b40b04f 100644 --- a/gcc/config/arc/arc.h +++ b/gcc/config/arc/arc.h @@ -1441,7 +1441,7 @@ extern int arc_return_address_regs[4]; #ifdef DBX_DEBUGGING_INFO #undef DBX_DEBUGGING_INFO #endif -#define DBX_DEBUGGING_INFO +#define DBX_DEBUGGING_INFO 1 #ifdef DWARF2_DEBUGGING_INFO #undef DWARF2_DEBUGGING_INFO diff --git a/gcc/config/pdp11/pdp11.h b/gcc/config/pdp11/pdp11.h index 8339f1c..2c82e2c 100644 --- a/gcc/config/pdp11/pdp11.h +++ b/gcc/config/pdp11/pdp11.h @@ -38,7 +38,7 @@ along with GCC; see the file COPYING3. If not see /* Generate DBX debugging information. */ -#define DBX_DEBUGGING_INFO +#define DBX_DEBUGGING_INFO 1 #define TARGET_40_PLUS (TARGET_40 || TARGET_45) #define TARGET_10 (! 
TARGET_40_PLUS) diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 6ed82cb..4ccee23 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -30455,7 +30455,7 @@ macho_branch_islands (void) } strcpy (tmp_buf, "\n"); strcat (tmp_buf, label); -#if defined (DBX_DEBUGGING_INFO) || (XCOFF_DEBUGGING_INFO) +#if (DBX_DEBUGGING_INFO) || (XCOFF_DEBUGGING_INFO) if (write_symbols == DBX_DEBUG || write_symbols == XCOFF_DEBUG) dbxout_stabd (N_SLINE, bi->line_number); #endif /* DBX_DEBUGGING_INFO || XCOFF_DEBUGGING_INFO */ @@ -30505,7 +30505,7 @@ macho_branch_islands (void) strcat (tmp_buf, ")\n\tmtctr r12\n\tbctr"); } output_asm_insn (tmp_buf, 0); -#if defined (DBX_DEBUGGING_INFO) || (XCOFF_DEBUGGING_INFO) +#if (DBX_DEBUGGING_INFO) || (XCOFF_DEBUGGING_INFO) if (write_symbols == DBX_DEBUG || write_symbols == XCOFF_DEBUG) dbxout_stabd (N_SLINE, bi->line_number); #endif /* DBX_DEBUGGING_INFO || XCOFF_DEBUGGING_INFO */ diff --git a/gcc/dbxout.c b/gcc/dbxout.c index d9bd59f..993ceda 100644 --- a/gcc/dbxout.c +++ b/gcc/dbxout.c @@ -217,7 +217,7 @@ struct dbx_file should always be 0 because we should not have needed any file numbers yet. */ -#if (defined (DBX_DEBUGGING_INFO) || (XCOFF_DEBUGGING_INFO)) \ +#if ((DBX_DEBUGGING_INFO) || (XCOFF_DEBUGGING_INFO)) \ && defined (DBX_USE_BINCL) static struct dbx_file *current_file; #endif @@ -250,7 +250,7 @@ static GTY(()) int lastfile_is_base; /* Typical USG systems don't have stab.h, and they also have no use for DBX-format debugging info. */ -#if defined (DBX_DEBUGGING_INFO) || (XCOFF_DEBUGGING_INFO) +#if (DBX_DEBUGGING_INFO) || (XCOFF_DEBUGGING_INFO) #ifdef DBX_USE_BINCL /* If zero then there is no pending BINCL. */ @@ -329,7 +329,7 @@ static void dbxout_handle_pch (unsigned); static void debug_free_queue (void); /* The debug hooks structure. 
*/ -#if defined (DBX_DEBUGGING_INFO) +#if (DBX_DEBUGGING_INFO) static void dbxout_source_line (unsigned int, const char *, int, bool); static void dbxout_begin_prologue (unsigned int, const char *); @@ -860,7 +860,7 @@ dbxout_finish_complex_stabs (tree sym, stab_code_type code, obstack_free (_ob, str); } -#if defined (DBX_DEBUGGING_INFO) || (XCOFF_DEBUGGING_INFO) +#if (DBX_DEBUGGING_INFO) || (XCOFF_DEBUGGING_INFO) /* When -gused is used, emit debug info for only used symbols. But in addition to the standard intercepted debug_hooks there are some @@ -885,7 +885,7 @@ static int symbol_queue_size = 0; #endif /* DBX_DEBUGGING_INFO || XCOFF_DEBUGGING_INFO */ -#if defined (DBX_DEBUGGING_INFO) +#if (DBX_DEBUGGING_INFO) static void dbxout_function_end (tree decl ATTRIBUTE_UNUSED) @@ -1207,7 +1207,7 @@ dbxout_handle_pch
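The switch from "#if defined (X)" to "#if (X)" only works if the macro always expands to a value. A minimal sketch of that pattern follows; the TOY_ macro names and both helper functions are hypothetical, and the fallback block is only an assumed shape for the kind of default the ChangeLog attributes to defaults.h, not a copy of it.

```cpp
#include <cassert>

// Assumed defaults.h-style fallback: a port that wants the feature defines
// the macro to 1 earlier; everyone else gets 0 here.
#ifndef TOY_DEBUGGING_INFO
#define TOY_DEBUGGING_INFO 0
#endif

// A second format that this hypothetical port does enable.
#define TOY_OTHER_DEBUGGING_INFO 1

// Because both macros always have a value, the preprocessor test needs
// no "defined" operator.
static int
toy_hooks_enabled ()
{
#if (TOY_DEBUGGING_INFO) || (TOY_OTHER_DEBUGGING_INFO)
  return 1;
#else
  return 0;
#endif
}

// The macro can also appear in an ordinary condition, so both arms are
// still parsed and type-checked even when the feature is disabled.
static bool
toy_debug_requested (bool user_asked)
{
  return TOY_DEBUGGING_INFO && user_asked;
}
```

The second helper shows the usual motivation for always-defined macros: conditional compilation can be replaced by a plain condition that the compiler folds away.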
Re: [Patch AArch64] Switch constant pools to separate rodata sections.
On 08/11/15 11:42, Andreas Schwab wrote:
> This is causing a bootstrap comparison failure in gcc/go/gogo.o.
>
> Andreas.
>

I spent some time looking at this this afternoon. The trigger is the change to aarch64_use_blocks_for_constant_p, which appears to cause the bootstrap comparison failure: the offsets in add instructions differ between builds with and without debug info.

For now, to keep Go bootstraps working on trunk, I'm proposing a patch that partially rolls back the change in aarch64_use_blocks_for_constant_p. I will continue to look into the underlying issue.

Bootstrapped on aarch64-none-linux-gnu (C, C++ and Go); testing finished OK.

OK?

Ramana

	PR bootstrap/68256
	* config/aarch64/aarch64.c (aarch64_use_blocks_for_constant_p): Return false.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1b7be83..1fff878 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5251,9 +5251,11 @@ aarch64_can_use_per_function_literal_pools_p (void)
 static bool
 aarch64_use_blocks_for_constant_p (machine_mode, const_rtx)
 {
-  /* We can't use blocks for constants when we're using a per-function
-     constant pool.  */
-  return !aarch64_can_use_per_function_literal_pools_p ();
+  /* Fixme:: In an ideal world this would work similar
+     to the logic in aarch64_select_rtx_section but this
+     breaks bootstrap in gcc go.  For now we workaround
+     this by returning false here.  */
+  return false;
 }
 
 /* Select appropriate section for constants depending
[PATCH series, 16] Use parloops to parallelize oacc kernels regions
Hi,

this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.

The patch series contains these patches:

 1  Insert new exit block only when needed in transform_to_exit_first_loop_alt
 2  Make create_parallel_loop return void
 3  Ignore reduction clause on kernels directive
 4  Implement -foffload-alias
 5  Add in_oacc_kernels_region in struct loop
 6  Add pass_oacc_kernels
 7  Add pass_dominator_oacc_kernels
 8  Add pass_ch_oacc_kernels
 9  Add pass_parallelize_loops_oacc_kernels
10  Add pass_oacc_kernels pass group in passes.def
11  Update testcases after adding kernels pass group
12  Handle acc loop directive
13  Add c-c++-common/goacc/kernels-*.c
14  Add gfortran.dg/goacc/kernels-*.f95
15  Add libgomp.oacc-c-c++-common/kernels-*.c
16  Add libgomp.oacc-fortran/kernels-*.f95

The first 9 patches are more or less independent, but patches 10-16 are intended to be committed at the same time.

Bootstrapped and reg-tested on x86_64.

Built and reg-tested with nvidia accelerator, in combination with a patch that enables accelerator testing (which is submitted at https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).

I'll post the individual patches in reply to this message.

Thanks,
- Tom

---

1 Insert new exit block only when needed in transform_to_exit_first_loop_alt

2015-06-30  Tom de Vries

	* tree-parloops.c (transform_to_exit_first_loop_alt): Insert new exit block only when needed.

---

2 Make create_parallel_loop return void

2015-11-09  Tom de Vries

	* tree-parloops.c (create_parallel_loop): Return void.

---

3 Ignore reduction clause on kernels directive

2015-11-08  Tom de Vries

	* c-omp.c (c_oacc_split_loop_clauses): Don't copy OMP_CLAUSE_REDUCTION, classify as loop clause.

---

4 Implement -foffload-alias

2015-11-03  Tom de Vries

	* common.opt (foffload-alias): New option.
	* flag-types.h (enum offload_alias): New enum.
	* omp-low.c (install_var_field): Handle flag_offload_alias.
* doc/invoke.texi (@item Code Generation Options): Add -foffload-alias. (@item -foffload-alias): New item. * c-c++-common/goacc/kernels-loop-offload-alias-none.c: New test. * c-c++-common/goacc/kernels-loop-offload-alias-ptr.c: New test. --- 5 Add in_oacc_kernels_region in struct loop 2015-11-09 Tom de Vries * cfgloop.h (struct loop): Add in_oacc_kernels_region field. * omp-low.c (mark_loops_in_oacc_kernels_region): New function. (expand_omp_target): Call mark_loops_in_oacc_kernels_region. --- 6 Add pass_oacc_kernels 2015-11-09 Tom de Vries * tree-pass.h (make_pass_oacc_kernels): Declare. * tree-ssa-loop.c (gate_oacc_kernels): New static function. (pass_data_oacc_kernels): New pass_data. (class pass_oacc_kernels): New pass. (make_pass_oacc_kernels): New function. --- 7 Add pass_dominator_oacc_kernels 2015-11-09 Tom de Vries * tree-pass.h (make_pass_dominator_oacc_kernels): Declare. * tree-ssa-dom.c (class dominator_base): New class. Factor out of ... (class pass_dominator): ... here. (dominator_base::may_peel_loop_headers_p) (pass_dominator::may_peel_loop_headers_p): New function. (pass_dominator_oacc_kernels): New pass. (make_pass_dominator_oacc_kernels): New function. (dominator_base::execute): Use may_peel_loop_headers_p. --- 8 Add pass_ch_oacc_kernels 2015-11-09 Tom de Vries * tree-pass.h (make_pass_ch_oacc_kernels): Declare. * tree-ssa-loop-ch.c (pass_ch::pass_ch (pass_data, gcc::context)): New constructor. (pass_data_ch_oacc_kernels): New pass_data. (class pass_ch_oacc_kernels): New pass. (pass_ch_oacc_kernels::process_loop_p): New function. (make_pass_ch_oacc_kernels): New function. --- 9 Add pass_parallelize_loops_oacc_kernels 2015-11-09 Tom de Vries * omp-low.c (set_oacc_fn_attrib): Make extern. * omp-low.c (expand_omp_atomic_fetch_op): Release defs of update stmt. * omp-low.h (set_oacc_fn_attrib): Declare. * tree-parloops.c (struct reduction_info): Add reduc_addr field. (create_call_for_reduction_1): Handle case that reduc_addr is non-NULL. 
(create_parallel_loop, gen_parallel_loop, try_create_reduction_list): Add and handle function parameter oacc_kernels_p. (get_omp_data_i_param): New function. (ref_conflicts_with_region, oacc_entry_exit_ok_1) (oacc_entry_exit_single_gang, oacc_entry_exit_ok): New function. (parallelize_loops): Add