Commit: Xstormy16: Add modes to post_inc and pre_dec patterns
Hi Guys,

  I am applying the patch below to add modes to the POST_INC and
  PRE_DEC patterns in the XStormy16 backend.  The lack of the modes
  was leading to some build problems.

Cheers
  Nick

gcc/ChangeLog
2014-02-21  Nick Clifton  <ni...@redhat.com>

	* config/stormy16/stormy16.md (pushdqi1): Add mode to post_inc.
	(pushhi1): Likewise.
	(popqi1): Add mode to pre_dec.
	(pophi1): Likewise.

Index: gcc/config/stormy16/stormy16.md
===================================================================
--- gcc/config/stormy16/stormy16.md	(revision 207983)
+++ gcc/config/stormy16/stormy16.md	(working copy)
@@ -114,7 +114,7 @@
 ;; insns like this one are never generated.
 
 (define_insn "pushqi1"
-  [(set (mem:QI (post_inc (reg:HI 15)))
+  [(set (mem:QI (post_inc:HI (reg:HI 15)))
 	(match_operand:QI 0 "register_operand" "r"))]
   ""
   "push %0"
@@ -123,7 +123,7 @@
 
 (define_insn "popqi1"
   [(set (match_operand:QI 0 "register_operand" "=r")
-	(mem:QI (pre_dec (reg:HI 15))))]
+	(mem:QI (pre_dec:HI (reg:HI 15))))]
   ""
   "pop %0"
   [(set_attr "psw_operand" "nop")
@@ -168,7 +168,7 @@
    (set_attr "psw_operand" "0,0,0,0,nop,0,nop,0,0")])
 
 (define_insn "pushhi1"
-  [(set (mem:HI (post_inc (reg:HI 15)))
+  [(set (mem:HI (post_inc:HI (reg:HI 15)))
 	(match_operand:HI 0 "register_operand" "r"))]
   ""
   "push %0"
@@ -177,7 +177,7 @@
 
 (define_insn "pophi1"
   [(set (match_operand:HI 0 "register_operand" "=r")
-	(mem:HI (pre_dec (reg:HI 15))))]
+	(mem:HI (pre_dec:HI (reg:HI 15))))]
   ""
   "pop %0"
   [(set_attr "psw_operand" "nop")
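In RTL, the mode written on an autoincrement code such as (post_inc:HI ...) is the machine mode for pointers, here HImode to match the stack pointer (reg:HI 15), independently of the QI/HI mode of the memory access itself. The push patterns store through the old stack pointer and then increment it, while the pop patterns decrement first and then load, which implies an upward-growing stack. The C model below is a hypothetical sketch of that ordering only: the `machine` type and the `push_hi`/`pop_hi` helpers are made up, and it uses slot addressing rather than the real byte-addressed XStormy16 memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an upward-growing stack.  Slot-addressed for
   simplicity: one push advances SP by 1 rather than by the access
   size in bytes.  */
#define STACK_SLOTS 256

typedef struct {
  uint16_t mem[STACK_SLOTS];
  uint16_t sp;               /* stands in for (reg:HI 15) */
} machine;

/* (set (mem (post_inc:HI sp)) x): store at the old SP, then increment.  */
static void
push_hi (machine *m, uint16_t x)
{
  assert (m->sp < STACK_SLOTS);
  m->mem[m->sp++] = x;
}

/* (set x (mem (pre_dec:HI sp))): decrement SP, then load.  */
static uint16_t
pop_hi (machine *m)
{
  assert (m->sp > 0);
  return m->mem[--m->sp];
}
```

Pushing 1 and then 2 leaves SP at 2, and two pops return 2 and then 1, restoring SP to 0.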
Re: [PATCH] Fix PR c++/60065.
On 2014-02-20 16:18, Jason Merrill wrote:
On 02/19/2014 10:00 PM, Adam Butcher wrote:
+  if (current_template_parms)
+    {
+      cp_binding_level *maybe_tmpl_scope = current_binding_level->level_chain;
+      while (maybe_tmpl_scope && maybe_tmpl_scope->kind == sk_class)
+        maybe_tmpl_scope = maybe_tmpl_scope->level_chain;
+      if (maybe_tmpl_scope && maybe_tmpl_scope->kind == sk_template_parms)
+        declaring_template_p = true;
+    }

Won't this return true for a member function of a class template?  i.e.

  template <class T> struct A { void f(auto x); };

Yes, I think you're right.  I was thinking about that yesterday but hadn't
had a chance to get to my PC to check or post a reply.

The intent is to deal with out-of-line implicit member templates.  But I
think the issue is more complex, and I think it may be true for the
synthesize code as well as this new code.

A class template with an out-of-line generic function definition will
give the same issue, I think:

  template <typename T> void A<T>::f(auto x) {}  // should inject a new list

It needs to know when to extend a function template parameter list and
when to insert a new one.  Another case:

  struct B { template <int N> void f(auto x); };
  template <int N> void B::f(auto x) {}  // should extend existing inner list

And also:

  template <typename T> struct C { template <int N> void f(auto x); };
  template <typename T> template <int N> void C<T>::f(auto x) {}  // should extend existing inner list

Obviously there is an arbitrary depth of class and class templates.  I
need to look further into it when I get some more time.

Once it's resolved, I think it'd be useful to create a new function to
determine this rather than doing the scope walk in a number of places.
Something like 'templ_parm_scope_for_fn_being_declared' --- or hopefully
some more elegant name!

Why doesn't num_template_parameter_lists work as a predicate here?
It works in the lambda case as it is updated there, but for generic
functions I think the following prevents it:

cp/parser.c:17063:

      /* Inside the function parameter list, surrounding
         template-parameter-lists do not apply.  */
      saved_num_template_parameter_lists
        = parser->num_template_parameter_lists;
      parser->num_template_parameter_lists = 0;

      begin_scope (sk_function_parms, NULL_TREE);

      /* Parse the parameter-declaration-clause.  */
      params = cp_parser_parameter_declaration_clause (parser);

      /* Restore saved template parameter lists accounting for implicit
         template parameters.  */
      parser->num_template_parameter_lists
        += saved_num_template_parameter_lists;

Cheers,
Adam
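The save/zero/restore sequence quoted above can be modelled in a few lines of C. This is a hypothetical sketch: the struct, the helper names and the rule "one implicit template parameter list if any parameter is `auto`" are simplifying assumptions, not GCC's actual data structures. It only shows why the counter reads zero while the parameters are being parsed, so that code running at that point cannot use it to tell whether an enclosing template-parameter-list exists.

```c
#include <assert.h>

/* Hypothetical, heavily simplified parser state; only the counter that
   the quoted cp/parser.c code manipulates is modelled.  */
struct parser {
  int num_template_parameter_lists;
  int seen_inside;   /* counter value observed while parsing parameters */
};

/* Stand-in for cp_parser_parameter_declaration_clause.  Assumption: if
   any parameter is declared 'auto', one implicit template parameter
   list is introduced.  */
static void
parse_parameters (struct parser *p, int n_auto_params)
{
  p->seen_inside = p->num_template_parameter_lists;
  if (n_auto_params > 0)
    p->num_template_parameter_lists++;
}

static void
parse_function_declarator (struct parser *p, int n_auto_params)
{
  /* Inside the function parameter list, surrounding
     template-parameter-lists do not apply.  */
  int saved = p->num_template_parameter_lists;
  p->num_template_parameter_lists = 0;

  parse_parameters (p, n_auto_params);

  /* Restore, keeping any implicit lists added while parsing.  */
  p->num_template_parameter_lists += saved;
}
```

Starting from one enclosing list, a declarator with an `auto` parameter sees 0 inside the parameter list and ends with 2 lists afterwards.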
[PATCH] Bound number of recursive compute_control_dep_chain calls with a param (PR tree-optimization/56490)
Hi!

As discussed in the PR, on larger functions we can end up with over
3 million nested compute_control_dep_chain calls from a single
compute_control_dep_chain call, and on that testcase all that effort
just yields zero or at most one (useless) control dep path.  The problem
is that the function is really unbounded: even with the 6-element path
length limitation (recursion depth) and the limit of 8 find_pdom calls,
everything still iterates on all the successor edges at each level.
And the function is often called on the same basic block again and
again, even at a particular depth level (e.g. over 20 times for the same
bb at the same depth level).  But the preceding edge list is slightly
different in each case, and in theory it could give different answers.

Fixed by bounding the total number of nested calls.  Additionally, I've
made a couple of cleanups: heap allocating an 8-element array instead of
using an automatic array makes no sense, the chain length is at most 6
and thus we can use a stack vector, etc.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-02-21  Jakub Jelinek  <ja...@redhat.com>

	PR tree-optimization/56490
	* params.def (PARAM_UNINIT_CONTROL_DEP_ATTEMPTS): New param.
	* tree-ssa-uninit.c: Include params.h.
	(compute_control_dep_chain): Add num_calls argument, return false
	if it exceeds PARAM_UNINIT_CONTROL_DEP_ATTEMPTS param, pass
	num_calls to recursive call.
	(find_predicates): Change dep_chain into normal array,
	cur_chain into auto_vec<edge, MAX_CHAIN_LEN + 1>, add num_calls
	variable and adjust compute_control_dep_chain caller.
	(find_def_preds): Likewise.
--- gcc/params.def.jj	2014-01-09 19:09:47.0 +0100
+++ gcc/params.def	2014-02-20 19:30:37.467597338 +0100
@@ -1078,6 +1078,12 @@ DEFPARAM (PARAM_ASAN_USE_AFTER_RETURN,
          "asan-use-after-return",
          "Enable asan builtin functions protection",
          1, 0, 1)
+
+DEFPARAM (PARAM_UNINIT_CONTROL_DEP_ATTEMPTS,
+          "uninit-control-dep-attempts",
+          "Maximum number of nested calls to search for control dependencies "
+          "during uninitialized variable analysis",
+          1000, 1, 0)
 /*
 
 Local variables:
--- gcc/tree-ssa-uninit.c.jj	2014-02-04 01:35:58.0 +0100
+++ gcc/tree-ssa-uninit.c	2014-02-20 19:31:14.198385817 +0100
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.
 #include "hashtab.h"
 #include "tree-pass.h"
 #include "diagnostic-core.h"
+#include "params.h"
 
 /* This implements the pass that does predicate aware warning on uses of
    possibly uninitialized variables. The pass first collects the set of
@@ -390,8 +391,8 @@ find_control_equiv_block (basic_block bb
 
 /* Computes the control dependence chains (paths of edges)
    for DEP_BB up to the dominating basic block BB (the head node of a
-   chain should be dominated by it).  CD_CHAINS is pointer to a
-   dynamic array holding the result chains. CUR_CD_CHAIN is the current
+   chain should be dominated by it).  CD_CHAINS is pointer to an
+   array holding the result chains.  CUR_CD_CHAIN is the current
    chain being computed.  *NUM_CHAINS is total number of chains.  The
    function returns true if the information is successfully computed,
    return false if there is no control dependence or not computed.  */
@@ -400,7 +401,8 @@ static bool
 compute_control_dep_chain (basic_block bb, basic_block dep_bb,
                            vec<edge> *cd_chains,
                            size_t *num_chains,
-                           vec<edge> *cur_cd_chain)
+                           vec<edge> *cur_cd_chain,
+                           int *num_calls)
 {
   edge_iterator ei;
   edge e;
@@ -411,6 +413,10 @@ compute_control_dep_chain (basic_block b
   if (EDGE_COUNT (bb->succs) < 2)
     return false;
 
+  if (*num_calls > PARAM_VALUE (PARAM_UNINIT_CONTROL_DEP_ATTEMPTS))
+    return false;
+  ++*num_calls;
+
   /* Could use a set instead.  */
   cur_chain_len = cur_cd_chain->length ();
   if (cur_chain_len > MAX_CHAIN_LEN)
@@ -450,7 +456,7 @@ compute_control_dep_chain (basic_block b
 
           /* Now check if DEP_BB is indirectly control dependent on BB.  */
           if (compute_control_dep_chain (cd_bb, dep_bb, cd_chains,
-                                         num_chains, cur_cd_chain))
+                                         num_chains, cur_cd_chain, num_calls))
             {
               found_cd_chain = true;
               break;
@@ -595,14 +601,12 @@ find_predicates (pred_chain_union *preds
                  basic_block use_bb)
 {
   size_t num_chains = 0, i;
-  vec<edge> *dep_chains = 0;
-  vec<edge> cur_chain = vNULL;
+  int num_calls = 0;
+  vec<edge> dep_chains[MAX_NUM_CHAINS];
+  auto_vec<edge, MAX_CHAIN_LEN + 1> cur_chain;
   bool has_valid_pred = false;
   basic_block cd_root = 0;
 
-  typedef vec<edge> vec_edge_heap;
-  dep_chains = XCNEWVEC (vec_edge_heap, MAX_NUM_CHAINS);
-
   /* First find the closest bb that is control equivalent to
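The idea of the patch, a single call budget shared by every level of the recursion in addition to the depth limit, can be shown on a toy graph search. In this hypothetical sketch, `struct node`, `find_path` and the explicit `max_calls` argument stand in for basic blocks, compute_control_dep_chain and the uninit-control-dep-attempts parameter; only the check-then-increment order of the counter mirrors the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SUCCS 4

/* Hypothetical CFG node; not GCC's basic_block.  */
struct node {
  size_t nsuccs;
  struct node *succs[MAX_SUCCS];
};

/* Depth-limited search for a path from BB to TARGET.  *NUM_CALLS is
   shared by all nested calls, so the total work is bounded by
   MAX_CALLS no matter how often the same node is revisited.  */
static bool
find_path (struct node *bb, struct node *target, int depth,
           int max_depth, int *num_calls, int max_calls)
{
  if (*num_calls > max_calls)
    return false;              /* budget exhausted: give up */
  ++*num_calls;

  if (bb == target)
    return true;
  if (depth >= max_depth)
    return false;
  for (size_t i = 0; i < bb->nsuccs; i++)
    if (find_path (bb->succs[i], target, depth + 1, max_depth,
                   num_calls, max_calls))
      return true;
  return false;
}
```

With a budget of 1000 the chain a -> b -> c is found after three calls; with a budget of 1 the search stops when it reaches c and reports that no path was found.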
Re: [RFA/dwarf v2] Add DW_AT_GNAT_use_descriptive_type flag for Ada units.
Hello,

Would anyone be able to (re-)approve this patch, please? It should be
really really straightforward, and only adds a DWARF flag to Ada
Compilation Units, so I should think that the risk is near zero.
I've tested the patch as usual regardless.

Parallel to that, we have also started working on producing standard
DWARF in place of our encodings, and small progress has been made.
But this is even more of a huge task than we thought, and in the
meantime, this little flag will help non-AdaCore users...

Thank you!

On Fri, Jan 31, 2014 at 09:09:05AM +0400, Joel Brobecker wrote:
On Tue, Feb 19, 2013 at 10:50:46PM -0500, Jason Merrill wrote:
On 02/19/2013 10:42 PM, Joel Brobecker wrote:
This is useful when a DIE does not have a descriptive type attribute.
In that case, the debugger needs to determine whether the unit was
compiled with a compiler that normally provides that information, or not.

Ah. OK, then.  But I'd prefer to call it DW_AT_GNAT_use_descriptive_type,
to follow the convention of keeping the vendor tag at the beginning of
the name.

Almost a year ago, you privately approved a small patch of mine, with
the small request above.  I'm sorry I let it drag so long!  Here is the
updated patch.

include/ChangeLog:

        * dwarf2.def: Rename DW_AT_use_GNAT_descriptive_type into
        DW_AT_GNAT_use_descriptive_type.

gcc/ChangeLog:

        * dwarf2out.c (gen_compile_unit_die): Add
        DW_AT_GNAT_use_descriptive_type attribute for Ada units.

Tested on x86_64-linux.

I should also adjust the Wiki page accordingly, but the login process
keeps timing out.  I know I have the right login and passwd since I
successfully reset them using the passwd recovery procedure, just in
case the error was due to bad credentials.  I'll try again later.

If approved, I will also take care of coordinating the dwarf2.def
change with binutils-gdb.git.

Is this patch still OK to commit?
Thank you,
-- 
Joel

From 7aae3721addf6905113d9f0287a5cbb5301a462b Mon Sep 17 00:00:00 2001
From: Joel Brobecker <brobec...@adacore.com>
Date: Thu, 3 Jan 2013 09:25:12 -0500
Subject: [PATCH] [dwarf] Add DW_AT_GNAT_use_descriptive_type flag for Ada units.

This patch first renames the DW_AT_use_GNAT_descriptive_type DWARF
attribute into DW_AT_GNAT_use_descriptive_type to better follow the
usual convention of keeping the vendor tag at the beginning of the name.

It then modifies dwarf2out to generate this attribute for Ada units.

include/ChangeLog:

        * dwarf2.def: Rename DW_AT_use_GNAT_descriptive_type into
        DW_AT_GNAT_use_descriptive_type.

gcc/ChangeLog:

        * dwarf2out.c (gen_compile_unit_die): Add
        DW_AT_GNAT_use_descriptive_type attribute for Ada units.
---
 gcc/dwarf2out.c    |    4 ++++
 include/dwarf2.def |    2 +-
 2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index d1ca4ba..057605c 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -19318,6 +19318,10 @@ gen_compile_unit_die (const char *filename)
       /* The default DW_ID_case_sensitive doesn't need to be specified.  */
       break;
     }
+
+  if (language == DW_LANG_Ada95)
+    add_AT_flag (die, DW_AT_GNAT_use_descriptive_type, 1);
+
   return die;
 }
 
diff --git a/include/dwarf2.def b/include/dwarf2.def
index 71a37b3..4dd636e 100644
--- a/include/dwarf2.def
+++ b/include/dwarf2.def
@@ -398,7 +398,7 @@ DW_AT (DW_AT_VMS_rtnbeg_pd_address, 0x2201)
 /* GNAT extensions.  */
 /* GNAT descriptive type.
    See http://gcc.gnu.org/wiki/DW_AT_GNAT_descriptive_type .  */
-DW_AT (DW_AT_use_GNAT_descriptive_type, 0x2301)
+DW_AT (DW_AT_GNAT_use_descriptive_type, 0x2301)
 DW_AT (DW_AT_GNAT_descriptive_type, 0x2302)
 /* UPC extension.  */
 DW_AT (DW_AT_upc_threads_scaled, 0x3210)
-- 
1.7.0.4

-- 
Joel
Re: [Patch, Fortran, OOP, Regression] PR 60234: ICE in generate_finalization_wrapper at fortran/class.c:1883
2014-02-21 8:25 GMT+01:00 Tobias Burnus <bur...@net-b.de>:
> Hi Janus,
>
> Janus Weil wrote:
>> What the patch does is to defer the building of the vtabs to a later
>> stage. Previously this was done only for some rare cases, now we do it
>> basically for all vtabs. This is necessary with finalization, since
>> building the vtab also implies building the finalization wrapper, for
>> which it is necessary that the finalizers have been resolved.
>>
>> Anyway, the patch regtests cleanly on x86_64-unknown-linux-gnu. Ok for
>> trunk?
>
> Looks good to me.
>
> Does
>
>>   comp_is_finalizable (gfc_component *comp)
>>   {
>> -  if (comp->attr.allocatable && comp->ts.type != BT_CLASS)
>> +  if (comp->attr.proc_pointer)
>> +    return false;
>> +  else if (comp->attr.allocatable && comp->ts.type != BT_CLASS)
>
> fix another PR - or did you just spot it when looking at the code? It is
> certainly simple, correct and should go in.

This became necessary after the vtab changes (although I don't remember
which test case triggered it).  comp_is_finalizable is called (more or
less directly) from generate_finalization_wrapper.  Since the latter was
called too early, the problem with PPCs was not triggered previously, it
seems.

I have committed the patch as r207986.  Thanks for the review!

Cheers,
Janus

2014-02-20  Janus Weil  <ja...@gcc.gnu.org>

	PR fortran/60234
	* gfortran.h (gfc_build_class_symbol): Removed argument.
	* class.c (gfc_add_component_ref): Fix up missing vtype if necessary.
	(gfc_build_class_symbol): Remove argument 'delayed_vtab'. vtab is always
	delayed now, except for unlimited polymorphics.
	(comp_is_finalizable): Procedure pointer components are not finalizable.
	* decl.c (build_sym, build_struct, attr_decl1): Removed argument of
	'gfc_build_class_symbol'.
	* match.c (copy_ts_from_selector_to_associate, select_type_set_tmp):
	Ditto.
	* symbol.c (gfc_set_default_type): Ditto.

2014-02-20  Janus Weil  <ja...@gcc.gnu.org>

	PR fortran/60234
	* gfortran.dg/finalize_23.f90: New.
Re: [PATCH] Bound number of recursive compute_control_dep_chain calls with a param (PR tree-optimization/56490)
On Fri, 21 Feb 2014, Jakub Jelinek wrote:

> Hi!
>
> As discussed in the PR, on larger functions we can end up with over
> 3 million nested compute_control_dep_chain calls from a single
> compute_control_dep_chain call [...]
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.
Re: PING: Fwd: Re: [patch] implement Cilk Plus simd loops on trunk
Hi!

On Fri, 15 Nov 2013 14:44:45 -0700, Aldy Hernandez <al...@redhat.com> wrote:
> Attached is the final version of the patch I have committed to trunk.

> --- a/gcc/gimple-pretty-print.c
> +++ b/gcc/gimple-pretty-print.c
> @@ -1118,6 +1118,8 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
>         case GF_OMP_FOR_KIND_SIMD:
>           kind = " simd";
>           break;
> +       case GF_OMP_FOR_KIND_CILKSIMD:
> +         kind = " cilksimd";
>         case GF_OMP_FOR_KIND_DISTRIBUTE:
>           kind = " distribute";
>           break;

Fixed (untested, but obvious) in r207987:

commit b12563e00026b48b817fd3532fc3df2db2a0f460
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Fri Feb 21 09:18:15 2014 +0000

    Correct TDF_RAW pretty-printing of GIMPLE_OMP_FOR's GF_OMP_FOR_KIND_CILKSIMD.
    
    	gcc/
    	* gimple-pretty-print.c (dump_gimple_omp_for) [flags & TDF_RAW]
    	<case GF_OMP_FOR_KIND_CILKSIMD>: Add missing break statement.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@207987 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git gcc/ChangeLog gcc/ChangeLog
index 67299af..cc9031b 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,8 @@
+2014-02-21  Thomas Schwinge  <tho...@codesourcery.com>
+
+	* gimple-pretty-print.c (dump_gimple_omp_for) [flags & TDF_RAW]
+	<case GF_OMP_FOR_KIND_CILKSIMD>: Add missing break statement.
+
 2014-02-21  Nick Clifton  <ni...@redhat.com>
 
 	* config/stormy16/stormy16.md (pushdqi1): Add mode to post_inc.
diff --git gcc/gimple-pretty-print.c gcc/gimple-pretty-print.c
index 2d1e1c7..741cd92 100644
--- gcc/gimple-pretty-print.c
+++ gcc/gimple-pretty-print.c
@@ -1121,6 +1121,7 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
          break;
        case GF_OMP_FOR_KIND_CILKSIMD:
          kind = " cilksimd";
+         break;
        case GF_OMP_FOR_KIND_DISTRIBUTE:
          kind = " distribute";
          break;

Grüße,
 Thomas
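The bug fixed here is an ordinary C switch fall-through: without the break, the CILKSIMD case assigns its string and then continues into the DISTRIBUTE case, which overwrites it. The reduced version below is a hypothetical sketch (made-up enum and function names, strings without GCC's leading space, and a `buggy` flag that re-enables the old behaviour for comparison). Since GCC 7, such unannotated fall-throughs can also be diagnosed with -Wimplicit-fallthrough.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enum standing in for the GF_OMP_FOR_KIND_* values.  */
enum for_kind { KIND_FOR, KIND_SIMD, KIND_CILKSIMD, KIND_DISTRIBUTE };

/* Return the name printed for K.  When BUGGY is nonzero, the CILKSIMD
   case falls through into DISTRIBUTE, as the code did before r207987.  */
static const char *
kind_name (enum for_kind k, int buggy)
{
  const char *kind = "";
  switch (k)
    {
    case KIND_FOR:
      kind = "";
      break;
    case KIND_SIMD:
      kind = "simd";
      break;
    case KIND_CILKSIMD:
      kind = "cilksimd";
      if (!buggy)
        break;
      /* FALLTHRU */
    case KIND_DISTRIBUTE:
      kind = "distribute";
      break;
    }
  return kind;
}
```

With the break in place a CILKSIMD loop is printed as "cilksimd"; without it, the same loop is printed as "distribute".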
[PATCH][1/n] Improve PR60291
This improves compile-time of PR60291 at -O1 from 210s to 85s, getting
remove unused locals out of the profile.  There, walking DECL_INITIAL of
globals is quadratic when a global is referred to from multiple
functions.  We've had the same issue with add_referenced_vars when that
still existed.

Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk and
branch?

I've verified that I can still properly debug

int **
foo (void)
{
  static int a = 0;
  static int *b = &a;
  static int **c = &b;
  return c;
}

int main() { return **foo(); }

(step into foo and print a, b and c).  Note that even with 4.8 right now

int **
foo (void)
{
  int **q;
  {
    static int a = 0;
    static int *b = &a;
    static int **c = &b;
    q = c;
  }
  return q;
}

int main() { return **foo(); }

is broken (with -O1 -fno-inline; with inlining both cases are broken).
But none of that regresses with the following, and if we fix it then we
should fix it another way, not by walking global initializers.

Thanks,
Richard.

2014-02-21  Richard Biener  <rguent...@suse.de>

	PR middle-end/60291
	* tree-ssa-live.c (mark_all_vars_used_1): Do not walk
	DECL_INITIAL.

Index: gcc/tree-ssa-live.c
===================================================================
*** gcc/tree-ssa-live.c	(revision 207960)
--- gcc/tree-ssa-live.c	(working copy)
*************** mark_all_vars_used_1 (tree *tp, int *wal
*** 432,443 ****
      /* Only need to mark VAR_DECLS; parameters and return results are not
         eliminated as unused.  */
      if (TREE_CODE (t) == VAR_DECL)
!       {
!         /* When a global var becomes used for the first time also walk its
!            initializer (non global ones don't have any).  */
!         if (set_is_used (t) && is_global_var (t))
!           mark_all_vars_used (DECL_INITIAL (t));
!       }
      /* remove_unused_scope_block_p requires information about labels
         which are not DECL_IGNORED_P to tell if they might be used in the IL.  */
      else if (TREE_CODE (t) == LABEL_DECL)
--- 432,438 ----
      /* Only need to mark VAR_DECLS; parameters and return results are not
         eliminated as unused.  */
      if (TREE_CODE (t) == VAR_DECL)
!       set_is_used (t);
      /* remove_unused_scope_block_p requires information about labels
         which are not DECL_IGNORED_P to tell if they might be used in the IL.  */
      else if (TREE_CODE (t) == LABEL_DECL)
[PATCH][2/2] Fix expansion slowness of PR60291
This fixes the slowness of RTL expansion in PR60291, which is caused by
excessive collisions in mem-attr sharing.  The issue here is that
sharing attempts happen across functions and we have a _lot_ of
functions in this testcase referencing the same lexically equivalent
memory, for example MEM[(StgWord *)_5 + -64B].  That means those get
the same hash value.  But they don't compare equal because an SSA name
_5 from function A is of course not equal to one from function B.

The following fixes that by not doing mem-attr sharing across
functions, clearing the mem-attrs hashtable in rest_of_clean_state.

Another fix may be to do what the comment in iterative_hash_expr says
for SSA names:

    case SSA_NAME:
      /* We can just compare by pointer.  */
      return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);

(probably blame me for changing that to hashing the SSA version).  But
I'm not sure that doesn't uncover issues with various hashtables and
walking them, generating code dependent on the order.  It's IMHO just
not expected that you compare function-local expressions from different
functions.

The other thing would be to discard mem-attr sharing altogether, but
that doesn't seem appropriate at this stage (but it would also simplify
quite some code).  With only one function in RTL at a time that
shouldn't be too bad (see several suggestions along that line, even
with statistics).

Bootstrap / regtest running on x86_64-unknown-linux-gnu, ok for trunk
and 4.8 branch?

Thanks,
Richard.

2014-02-21  Richard Biener  <rguent...@suse.de>

	PR middle-end/60291
	* rtl.h (clear_mem_attrs_htab): Declare.
	* emit-rtl.c (clear_mem_attrs_htab): New function.
	* final.c (rest_of_clean_state): Call clear_mem_attrs_htab to
	avoid sharing mem-attrs between functions.
Index: gcc/rtl.h
===================================================================
*** gcc/rtl.h	(revision 207960)
--- gcc/rtl.h	(working copy)
*************** extern int in_sequence_p (void);
*** 2546,2551 ****
--- 2546,2552 ----
  extern void init_emit (void);
  extern void init_emit_regs (void);
  extern void init_emit_once (void);
+ extern void clear_mem_attrs_htab (void);
  extern void push_topmost_sequence (void);
  extern void pop_topmost_sequence (void);
  extern void set_new_first_and_last_insn (rtx, rtx);
Index: gcc/emit-rtl.c
===================================================================
*** gcc/emit-rtl.c	(revision 207960)
--- gcc/emit-rtl.c	(working copy)
*************** init_emit_once (void)
*** 5913,5918 ****
--- 5913,5926 ----
    simple_return_rtx = gen_rtx_fmt_ (SIMPLE_RETURN, VOIDmode);
    cc0_rtx = gen_rtx_fmt_ (CC0, VOIDmode);
  }
+ 
+ /* Clear the mem-attrs sharing hash table.  */
+ 
+ void
+ clear_mem_attrs_htab (void)
+ {
+   htab_empty (mem_attrs_htab);
+ }
  
  /* Produce exact duplicate of insn INSN after AFTER.
     Care updating of libcall regions if present.  */
Index: gcc/final.c
===================================================================
*** gcc/final.c	(revision 207960)
--- gcc/final.c	(working copy)
*************** rest_of_clean_state (void)
*** 4678,4683 ****
--- 4678,4686 ----
  
    init_recog_no_volatile ();
  
+   /* Reset mem-attrs sharing.  */
+   clear_mem_attrs_htab ();
+ 
    /* We're done with this function.  Free up memory if we can.  */
    free_after_parsing (cfun);
    free_after_compilation (cfun);
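The pathology described above appears in any hash-consing table whose hash ignores part of what equality checks. In this hypothetical sketch (a made-up `attr` type and helper names, not GCC's mem_attrs code), the hash uses only the SSA version and offset, like hashing SSA_NAME_VERSION, while equality also requires the same function. Entries from different functions therefore pile up in one bucket; emptying the table between functions, which is what clear_mem_attrs_htab does for the real table, keeps lookups short.

```c
#include <assert.h>
#include <stdlib.h>

#define NBUCKETS 64

/* Hypothetical stand-in for a shared mem-attrs entry.  */
struct attr {
  int func, ssa_version, offset;
  struct attr *next;
};

static struct attr *buckets[NBUCKETS];

/* The hash deliberately ignores the function, as hashing only the SSA
   version does.  */
static unsigned
hash_attr (int ssa_version, int offset)
{
  return ((unsigned) ssa_version * 31u + (unsigned) offset) % NBUCKETS;
}

/* Return the shared entry for the key, creating it if needed.
   *PROBES counts the equality comparisons performed.  */
static struct attr *
get_attr (int func, int ssa_version, int offset, int *probes)
{
  unsigned h = hash_attr (ssa_version, offset);
  for (struct attr *a = buckets[h]; a; a = a->next)
    {
      ++*probes;
      if (a->func == func && a->ssa_version == ssa_version
          && a->offset == offset)
        return a;
    }
  struct attr *a = malloc (sizeof *a);
  assert (a != NULL);
  a->func = func;
  a->ssa_version = ssa_version;
  a->offset = offset;
  a->next = buckets[h];
  buckets[h] = a;
  return a;
}

/* Analogue of clear_mem_attrs_htab: drop all entries.  */
static void
clear_table (void)
{
  for (int i = 0; i < NBUCKETS; i++)
    while (buckets[i])
      {
        struct attr *next = buckets[i]->next;
        free (buckets[i]);
        buckets[i] = next;
      }
}
```

Looking up the same key (version 5, offset -64) from 100 functions without clearing costs 0 + 1 + ... + 99 = 4950 failed comparisons; clearing after each function brings that to zero.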
Re: [PATCH][2/2] Fix expansion slowness of PR60291
On Fri, 21 Feb 2014, Richard Biener wrote:

> This fixes the slowness of RTL expansion in PR60291 which is caused
> by excessive collisions in mem-attr sharing.  [...]
>
> Another fix may be to do what the comment in iterative_hash_expr
> says for SSA names:
>
>     case SSA_NAME:
>       /* We can just compare by pointer.  */
>       return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
>
> (probably blame me for changing that to hashing the SSA version).

It was lxo.

> But I'm not sure that doesn't uncover issues with various hashtables
> and walking them, generating code dependent on the order.  It's IMHO
> just not expected that you compare function-local expressions from
> different functions.

Same speedup result from

Index: gcc/tree.c
===================================================================
--- gcc/tree.c	(revision 207960)
+++ gcc/tree.c	(working copy)
@@ -7428,7 +7428,7 @@ iterative_hash_expr (const_tree t, hashv
        }
     case SSA_NAME:
       /* We can just compare by pointer.  */
-      return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
+      return iterative_hash_hashval_t ((uintptr_t)t>>3, val);
     case PLACEHOLDER_EXPR:
       /* The node itself doesn't matter.  */
       return val;

and from

Index: gcc/tree.c
===================================================================
--- gcc/tree.c	(revision 207960)
+++ gcc/tree.c	(working copy)
@@ -7428,7 +7428,9 @@ iterative_hash_expr (const_tree t, hashv
        }
     case SSA_NAME:
       /* We can just compare by pointer.  */
-      return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
+      return iterative_hash_host_wide_int
+        (DECL_UID (cfun->decl),
+         iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val));
     case PLACEHOLDER_EXPR:
       /* The node itself doesn't matter.  */
       return val;

The latter is better than hashing pointers, but requiring cfun != NULL
in this function isn't good either.  At this point I'm more comfortable
with clearing the hashtable than with changing iterative_hash_expr in
any way.  It's also along the way to getting rid of the hash completely.

Oh, btw, the speedup is going from

 expand                  : 481.98 (94%) usr   1.15 (17%) sys 481.94 (93%) wall  293891 kB (15%) ggc

to

 expand                  :   2.66 ( 7%) usr   0.13 ( 2%) sys   2.64 ( 6%) wall  262544 kB (13%) ggc

at -O0 (less dramatic slowness for -On).

> [...]
[PATCH] Fix PR60276
This attempts to fix PR60276: the vectorizer's dependence analysis is
run too early, and assumptions it makes there are invalidated later.
The specific issue in question arises when the vectorizer needs to
effectively unroll the loop: by performing all vectorized loads first
and vectorized stores last, the idea that it can ignore known
dependences with negative distance doesn't work out if that distance is
too short.

The following is the shortest (and eventually backportable) change I
could come up with: record the negative distance during dependence
analysis and re-validate it when decisions about stmt copying and group
sizes are fixed.

Bootstrapped and tested on x86_64-unknown-linux-gnu.  Does this look ok?

Thanks,
Richard.

2014-02-21  Richard Biener  <rguent...@suse.de>

	PR tree-optimization/60276
	* tree-vectorizer.h (struct _stmt_vec_info): Add min_neg_dist field.
	(STMT_VINFO_MIN_NEG_DIST): New macro.
	* tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Record
	STMT_VINFO_MIN_NEG_DIST.
	* tree-vect-stmts.c (vectorizable_load): Verify if assumptions
	made for negative dependence distances still hold.

	* gcc.dg/vect/pr60276.c: New testcase.

Index: gcc/tree-vectorizer.h
===================================================================
*** gcc/tree-vectorizer.h	(revision 207938)
--- gcc/tree-vectorizer.h	(working copy)
*************** typedef struct _stmt_vec_info {
*** 622,627 ****
--- 622,631 ----
       is 1.  */
    unsigned int gap;
  
+   /* The minimum negative dependence distance this stmt participates in
+      or zero if none.  */
+   unsigned int min_neg_dist;
+ 
    /* Not all stmts in the loop need to be vectorized. e.g, the increment
       of the loop induction variable and computation of array indexes. relevant
       indicates whether the stmt needs to be vectorized.  */
*************** typedef struct _stmt_vec_info {
*** 677,682 ****
--- 681,687 ----
  #define STMT_VINFO_GROUP_SAME_DR_STMT(S)   (S)->same_dr_stmt
  #define STMT_VINFO_GROUPED_ACCESS(S)      ((S)->first_element != NULL && (S)->data_ref_info)
  #define STMT_VINFO_LOOP_PHI_EVOLUTION_PART(S) (S)->loop_phi_evolution_part
+ #define STMT_VINFO_MIN_NEG_DIST(S)         (S)->min_neg_dist
  
  #define GROUP_FIRST_ELEMENT(S)          (S)->first_element
  #define GROUP_NEXT_ELEMENT(S)           (S)->next_element
Index: gcc/tree-vect-data-refs.c
===================================================================
*** gcc/tree-vect-data-refs.c	(revision 207938)
--- gcc/tree-vect-data-refs.c	(working copy)
*************** vect_analyze_data_ref_dependence (struct
*** 403,408 ****
--- 425,437 ----
            if (dump_enabled_p ())
              dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                               "dependence distance negative.\n");
+           /* Record a negative dependence distance to later limit the
+              amount of stmt copying / unrolling we can perform.
+              Only need to handle read-after-write dependence.  */
+           if (DR_IS_READ (drb)
+               && (STMT_VINFO_MIN_NEG_DIST (stmtinfo_b) == 0
+                   || STMT_VINFO_MIN_NEG_DIST (stmtinfo_b) > dist))
+             STMT_VINFO_MIN_NEG_DIST (stmtinfo_b) = dist;
            continue;
          }
  
Index: gcc/tree-vect-stmts.c
===================================================================
*** gcc/tree-vect-stmts.c	(revision 207938)
--- gcc/tree-vect-stmts.c	(working copy)
*************** vectorizable_load (gimple stmt, gimple_s
*** 5629,5634 ****
--- 5629,5648 ----
        return false;
      }
  
+   /* Invalidate assumptions made by dependence analysis when vectorization
+      on the unrolled body effectively re-orders stmts.  */
+   if (ncopies > 1
+       && STMT_VINFO_MIN_NEG_DIST (stmt_info) != 0
+       && ((unsigned)LOOP_VINFO_VECT_FACTOR (loop_vinfo)
+           > STMT_VINFO_MIN_NEG_DIST (stmt_info)))
+     {
+       if (dump_enabled_p ())
+         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                          "cannot perform implicit CSE when unrolling "
+                          "with negative dependence distance\n");
+       return false;
+     }
+ 
    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
      return false;
  
*************** vectorizable_load (gimple stmt, gimple_s
*** 5686,5691 ****
--- 5700,5719 ----
          else if (!vect_grouped_load_supported (vectype, group_size))
            return false;
        }
+ 
+       /* Invalidate assumptions made by dependence analysis when vectorization
+          on the unrolled body effectively re-orders stmts.  */
+       if (!PURE_SLP_STMT (stmt_info)
+           && STMT_VINFO_MIN_NEG_DIST (stmt_info) != 0
+           && ((unsigned)LOOP_VINFO_VECT_FACTOR (loop_vinfo)
+               > STMT_VINFO_MIN_NEG_DIST (stmt_info)))
+         {
+           if (dump_enabled_p ())
+             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                              "cannot
Re: [PATCH] Fix latent bug in replace_uses_by
On Thu, Feb 20, 2014 at 10:51 PM, Richard Biener <rguent...@suse.de> wrote:
>
> The following fixes an ICE I got when building libjava with a local
> patch.  This causes us to substitute MEM[a, 5] into MEM[_3, 0] to
> MEM[MEM[a, 5], 0] and then asking stmt_ends_bb_p which doesn't
> expect such bogus MEM_REFs.  The MEM_REF is canonicalized by
> calling fold_stmt on it later, but the fix is of course to move
> the marking of altered BBs before doing the actual substitution
> (only then we are sure to catch all previous bb-ending stmts).
>
> I also noticed we don't verify MEM_REFs on LHSs.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to
> trunk and branch (it's a regression uncovered by the fix for PR60115).
>
> Richard.
>
> 2014-02-20  Richard Biener  <rguent...@suse.de>
>
>         * tree-cfg.c (replace_uses_by): Mark altered BBs before
>         doing the substitution.
>         (verify_gimple_assign_single): Also verify bare MEM_REFs
>         on the lhs.
>
> Index: gcc/tree-cfg.c
> ===
> --- gcc/tree-cfg.c      (revision 207936)
> +++ gcc/tree-cfg.c      (working copy)
> @@ -1677,6 +1677,11 @@ replace_uses_by (tree name, tree val)
>
>    FOR_EACH_IMM_USE_STMT (stmt, imm_iter, name)
>      {
> +      /* Mark the block if we change the last stmt in it.  */
> +      if (cfgcleanup_altered_bbs
> +         && stmt_ends_bb_p (stmt))
> +       bitmap_set_bit (cfgcleanup_altered_bbs, gimple_bb (stmt)->index);
> +
>        FOR_EACH_IMM_USE_ON_STMT (use, imm_iter)
>          {
>           replace_exp (use, val);
> @@ -1701,11 +1706,6 @@ replace_uses_by (tree name, tree val)
>           gimple orig_stmt = stmt;
>           size_t i;
>
> -         /* Mark the block if we changed the last stmt in it.  */
> -         if (cfgcleanup_altered_bbs
> -             && stmt_ends_bb_p (stmt))
> -           bitmap_set_bit (cfgcleanup_altered_bbs, gimple_bb (stmt)->index);
> -

Hi Richard,
I also noticed this with a local patch, but would it be OK to just move the above code after fold_stmt?  In other words, do PHI nodes matter (according to the comment before cfgcleanup_altered_bbs)?

Thanks in advance.

>           /* FIXME.  It shouldn't be required to keep TREE_CONSTANT
>              on ADDR_EXPRs up-to-date on GIMPLE.  Propagation will
>              only change sth from non-invariant to invariant, and only
> @@ -3986,7 +3986,9 @@ verify_gimple_assign_single (gimple stmt
>        return true;
>      }
>
> -  if (handled_component_p (lhs))
> +  if (handled_component_p (lhs)
> +      || TREE_CODE (lhs) == MEM_REF
> +      || TREE_CODE (lhs) == TARGET_MEM_REF)
>      res |= verify_types_in_gimple_reference (lhs, true);
>
>    /* Special codes we cannot handle via their class.  */

--
Best Regards.
Re: [PATCH] Fix PR60276
On Fri, Feb 21, 2014 at 11:32:41AM +0100, Richard Biener wrote:
>
> This attempts to fix PR60276: the vectorizer's dependence analysis runs
> too early, and decisions made later invalidate assumptions it makes there.
> The specific issue arises when the vectorizer needs to effectively unroll
> the loop: because it performs all vectorized loads first and all
> vectorized stores last, the idea that it can ignore known dependences
> with negative distance no longer works out if that distance is too short.
>
> The following is the shortest (and possibly backportable) change I could
> come up with: record the negative distance during dependence analysis and
> re-validate it once decisions about stmt copying and group sizes are fixed.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu - does this look ok?

Ok, thanks.

> 2014-02-21  Richard Biener  <rguent...@suse.de>
>
> 	PR tree-optimization/60276
> 	* tree-vectorizer.h (struct _stmt_vec_info): Add min_neg_dist field.
> 	(STMT_VINFO_MIN_NEG_DIST): New macro.
> 	* tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Record
> 	STMT_VINFO_MIN_NEG_DIST.
> 	* tree-vect-stmts.c (vectorizable_load): Verify if assumptions
> 	made for negative dependence distances still hold.
>
> 	* gcc.dg/vect/pr60276.c: New testcase.

	Jakub
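The negative-distance bookkeeping in this patch can be modelled outside GCC with two small helpers: one mirroring the minimum update in vect_analyze_data_ref_dependence, and one mirroring the first check added to vectorizable_load. This is only a sketch under stated assumptions: the helper names are hypothetical, and plain unsigned values stand in for STMT_VINFO_MIN_NEG_DIST, LOOP_VINFO_VECT_FACTOR and ncopies.

```c
#include <assert.h>

/* Hypothetical model of the update in vect_analyze_data_ref_dependence.
   *min_neg_dist == 0 means "no negative distance recorded yet", so a
   recorded distance must itself be non-zero.  */
static void
record_min_neg_dist (unsigned *min_neg_dist, unsigned dist)
{
  assert (dist != 0);
  if (*min_neg_dist == 0 || *min_neg_dist > dist)
    *min_neg_dist = dist;
}

/* Hypothetical model of the first check added to vectorizable_load:
   returns 0 (reject) when the load is copied (ncopies > 1) and the
   vectorization factor exceeds the shortest recorded negative distance,
   i.e. when emitting all loads before all stores could reorder
   dependent accesses.  Returns 1 (accept) otherwise.  */
static int
unrolled_load_ok (unsigned ncopies, unsigned vf, unsigned min_neg_dist)
{
  if (ncopies > 1 && min_neg_dist != 0 && vf > min_neg_dist)
    return 0;
  return 1;
}
```

For example, after recording distances 5, 3 and 7 the minimum is 3, so a load with two copies and a vectorization factor of 4 is rejected, while a factor of 2, or a single copy, is still accepted.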
Re: [PATCH][1/n] Improve PR60291
On Fri, 21 Feb 2014, Richard Biener wrote:

> This improves compile-time of PR60291 at -O1 from 210s to 85s,
> getting remove unused locals out of the profile.  There, walking
> DECL_INITIAL of globals is quadratic when a global is referred to
> from multiple functions.  We've had the same issue with
> add_referenced_vars when that still existed.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk
> and branch?
>
> I've verified that I can still properly debug
>
> int **
> foo (void)
> {
>   static int a = 0;
>   static int *b = &a;
>   static int **c = &b;
>   return c;
> }
> int main() { return **foo(); }
>
> (step into foo and print a, b and c).  Note that even with 4.8 right now
>
> int **
> foo (void)
> {
>   int **q;
>   {
>     static int a = 0;
>     static int *b = &a;
>     static int **c = &b;
>     q = c;
>   }
>   return q;
> }
> int main() { return **foo(); }
>
> is broken (with -O1 -fno-inline; with inlining both cases are broken).
> But that all doesn't regress with the following, and if we fix it then
> we should fix it another way, not by walking global initializers.

So I checked if this all is a regression, and this particular piece is
a regression from 4.7, where we only walk global initializers for
VAR_DECLs with DECL_CONTEXT == current_function_decl.  So at this point
it's easiest and least intrusive to reinstate this restriction, which
was removed by r187800 (that was me - the change looks accidental).

Re-bootstrapping / testing on x86_64-unknown-linux-gnu; I will commit
afterwards to trunk, and to the branch a bit later.

Thanks,
Richard.

2014-02-21  Richard Biener  <rguent...@suse.de>

	PR middle-end/60291
	* tree-ssa-live.c (mark_all_vars_used_1): Do not walk
	DECL_INITIAL for globals not in the current function context.

Index: gcc/tree-ssa-live.c
===
*** gcc/tree-ssa-live.c	(revision 207960)
--- gcc/tree-ssa-live.c	(working copy)
*************** mark_all_vars_used_1 (tree *tp, int *wal
*** 435,441 ****
      {
        /* When a global var becomes used for the first time also walk its
           initializer (non global ones don't have any).  */
!       if (set_is_used (t) && is_global_var (t))
  	mark_all_vars_used (DECL_INITIAL (t));
      }
    /* remove_unused_scope_block_p requires information about labels
--- 435,442 ----
      {
        /* When a global var becomes used for the first time also walk its
           initializer (non global ones don't have any).  */
!       if (set_is_used (t) && is_global_var (t)
! 	  && DECL_CONTEXT (t) == current_function_decl)
  	mark_all_vars_used (DECL_INITIAL (t));
      }
    /* remove_unused_scope_block_p requires information about labels
Re: [PATCH] Fix latent bug in replace_uses_by
On Fri, 21 Feb 2014, Bin.Cheng wrote:

> On Thu, Feb 20, 2014 at 10:51 PM, Richard Biener <rguent...@suse.de> wrote:
> >
> > The following fixes an ICE I got when building libjava with a local
> > patch.  This causes us to substitute MEM[a, 5] into MEM[_3, 0] to
> > MEM[MEM[a, 5], 0] and then asking stmt_ends_bb_p which doesn't
> > expect such bogus MEM_REFs.  The MEM_REF is canonicalized by
> > calling fold_stmt on it later, but the fix is of course to move
> > the marking of altered BBs before doing the actual substitution
> > (only then we are sure to catch all previous bb-ending stmts).
> >
> > I also noticed we don't verify MEM_REFs on LHSs.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to
> > trunk and branch (it's a regression uncovered by the fix for PR60115).
> >
> > Richard.
> >
> > 2014-02-20  Richard Biener  <rguent...@suse.de>
> >
> >         * tree-cfg.c (replace_uses_by): Mark altered BBs before
> >         doing the substitution.
> >         (verify_gimple_assign_single): Also verify bare MEM_REFs
> >         on the lhs.
> >
> > Index: gcc/tree-cfg.c
> > ===
> > --- gcc/tree-cfg.c      (revision 207936)
> > +++ gcc/tree-cfg.c      (working copy)
> > @@ -1677,6 +1677,11 @@ replace_uses_by (tree name, tree val)
> >
> >    FOR_EACH_IMM_USE_STMT (stmt, imm_iter, name)
> >      {
> > +      /* Mark the block if we change the last stmt in it.  */
> > +      if (cfgcleanup_altered_bbs
> > +         && stmt_ends_bb_p (stmt))
> > +       bitmap_set_bit (cfgcleanup_altered_bbs, gimple_bb (stmt)->index);
> > +
> >        FOR_EACH_IMM_USE_ON_STMT (use, imm_iter)
> >          {
> >           replace_exp (use, val);
> > @@ -1701,11 +1706,6 @@ replace_uses_by (tree name, tree val)
> >           gimple orig_stmt = stmt;
> >           size_t i;
> >
> > -         /* Mark the block if we changed the last stmt in it.  */
> > -         if (cfgcleanup_altered_bbs
> > -             && stmt_ends_bb_p (stmt))
> > -           bitmap_set_bit (cfgcleanup_altered_bbs, gimple_bb (stmt)->index);
> > -
>
> Hi Richard,
> I also noticed this with a local patch, but would it be OK to just move
> the above code after fold_stmt?  In other words, do PHI nodes matter
> (according to the comment before cfgcleanup_altered_bbs)?

PHI nodes don't matter; they don't trigger stmt_ends_bb_p anyway.
It's better to do the marking before the replacement, because a stmt
that satisfied stmt_ends_bb_p before the replacement might not satisfy
it afterwards (and thus we'd miss a cfgcleanup opportunity to merge
two blocks).

Richard.

> Thanks in advance.
>
> >           /* FIXME.  It shouldn't be required to keep TREE_CONSTANT
> >              on ADDR_EXPRs up-to-date on GIMPLE.  Propagation will
> >              only change sth from non-invariant to invariant, and only
> > @@ -3986,7 +3986,9 @@ verify_gimple_assign_single (gimple stmt
> >        return true;
> >      }
> >
> > -  if (handled_component_p (lhs))
> > +  if (handled_component_p (lhs)
> > +      || TREE_CODE (lhs) == MEM_REF
> > +      || TREE_CODE (lhs) == TARGET_MEM_REF)
> >      res |= verify_types_in_gimple_reference (lhs, true);
> >
> >    /* Special codes we cannot handle via their class.  */

--
Richard Biener <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
[C++ Patch] PR 60253
Hi,

unless we have reason to believe that the diagnostic quality could regress in some circumstances, we can easily resolve this ice-on-invalid regression by always returning error_mark_node after the error (thus outside SFINAE too).  Tested x86_64-linux.

Thanks,
Paolo.

/cp
2014-02-21  Paolo Carlini  <paolo.carl...@oracle.com>

	PR c++/60253
	* call.c (convert_arg_to_ellipsis): Return error_mark_node after
	error_at.

/testsuite
2014-02-21  Paolo Carlini  <paolo.carl...@oracle.com>

	PR c++/60253
	* g++.dg/overload/ellipsis2.C: New.

Index: cp/call.c
===
--- cp/call.c	(revision 207987)
+++ cp/call.c	(working copy)
@@ -6406,8 +6406,7 @@ convert_arg_to_ellipsis (tree arg, tsubst_flags_t
 	  if (complain & tf_error)
 	    error_at (loc, "cannot pass objects of non-trivially-copyable "
 		      "type %q#T through %<...%>", arg_type);
-	  else
-	    return error_mark_node;
+	  return error_mark_node;
 	}
     }
 
Index: testsuite/g++.dg/overload/ellipsis2.C
===
--- testsuite/g++.dg/overload/ellipsis2.C	(revision 0)
+++ testsuite/g++.dg/overload/ellipsis2.C	(working copy)
@@ -0,0 +1,13 @@
+// PR c++/60253
+
+struct A
+{
+  ~A();
+};
+
+struct B
+{
+  B(...);
+};
+
+B b(0, A());  // { dg-error "cannot pass" }
Re: [PATCH][2/2] Fix expansion slowness of PR60291
On Fri, 21 Feb 2014, Richard Biener wrote:

On Fri, 21 Feb 2014, Richard Biener wrote:

This fixes the slowness of RTL expansion in PR60291, which is caused by excessive collisions in mem-attr sharing.  The issue here is that sharing attempts happen across functions, and we have a _lot_ of functions in this testcase referencing the same lexically equivalent memory, for example MEM[(StgWord *)_5 + -64B].  That means those get the same hash value.  But they don't compare equal, because an SSA name _5 from function A is of course not equal to one from function B.

The following fixes that by not doing mem-attr sharing across functions: it clears the mem-attrs hashtable in rest_of_clean_state.

Another fix may be to do what the comment in iterative_hash_expr says for SSA names:

    case SSA_NAME:
      /* We can just compare by pointer.  */
      return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);

(probably blame me for changing that to hashing the SSA version).

It was lxo.

But I'm not sure that doesn't uncover issues with various hashtables and walking them, generating code dependent on the order.  It's IMHO just not expected that you compare function-local expressions from different functions.

Same speedup result from

Index: gcc/tree.c
===
--- gcc/tree.c	(revision 207960)
+++ gcc/tree.c	(working copy)
@@ -7428,7 +7428,7 @@ iterative_hash_expr (const_tree t, hashv
 	}
     case SSA_NAME:
       /* We can just compare by pointer.  */
-      return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
+      return iterative_hash_hashval_t ((uintptr_t)t >> 3, val);
     case PLACEHOLDER_EXPR:
       /* The node itself doesn't matter.  */
       return val;

and from

Index: gcc/tree.c
===
--- gcc/tree.c	(revision 207960)
+++ gcc/tree.c	(working copy)
@@ -7428,7 +7428,9 @@ iterative_hash_expr (const_tree t, hashv
 	}
     case SSA_NAME:
       /* We can just compare by pointer.  */
-      return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
+      return iterative_hash_host_wide_int
+	(DECL_UID (cfun->decl),
+	 iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val));
     case PLACEHOLDER_EXPR:
       /* The node itself doesn't matter.  */
       return val;

The latter is better than hashing pointers, but requiring cfun != NULL in this function isn't good either.

At this point I'm more comfortable with clearing the hashtable than with changing iterative_hash_expr in any way.  It's also a step along the way to getting rid of the hash completely.

Oh, btw, the speedup is going from

 expand : 481.98 (94%) usr 1.15 (17%) sys 481.94 (93%) wall 293891 kB (15%) ggc

to

 expand : 2.66 ( 7%) usr 0.13 ( 2%) sys 2.64 ( 6%) wall 262544 kB (13%) ggc

at -O0 (less dramatic slowness for -On).

The other thing would be to discard mem-attr sharing altogether, but that doesn't seem appropriate at this stage (though it would also simplify quite some code).  With only one function in RTL at a time that shouldn't be too bad (see several suggestions along that line, even with statistics).

Last statistics: http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01784.html

Richard.
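The collision pattern described above can be reproduced in miniature. If only the per-function version number feeds the hash, `_5` from two different functions always lands in the same bucket; mixing in the owning function separates them, which is what the DECL_UID (cfun->decl) variant does. This is a toy model with a hypothetical struct and an arbitrary multiplicative combiner, not GCC's iterative_hash_expr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an SSA name: the owning function's UID plus
   the version number, which is only unique within that function.  */
struct toy_ssa_name
{
  unsigned fn_uid;
  unsigned version;
};

/* Simple hash combiner; an assumption for illustration only, since
   GCC's iterative_hash_* routines mix bits differently.  */
static uint32_t
combine (uint32_t val, uint32_t x)
{
  return val * 31u + x;
}

/* Analogue of hashing SSA_NAME_VERSION (t) alone.  */
static uint32_t
hash_version_only (const struct toy_ssa_name *n)
{
  assert (n != NULL);
  return combine (0, n->version);
}

/* Analogue of the DECL_UID (cfun->decl) variant: hash the version,
   then mix in the owning function.  */
static uint32_t
hash_with_function (const struct toy_ssa_name *n)
{
  assert (n != NULL);
  return combine (combine (0, n->version), n->fn_uid);
}
```

Clearing the table after each function, the approach preferred in the mail, avoids the problem differently: entries from function A are gone by the time function B is expanded, so equal hashes can no longer pile up across functions.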
[patch] [arm] Fix PR60169 - thumb1 far jump
Patch http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01229.html introduced this ICE:
1. thumb1 estimates whether far_jump is used based on the function's insn size.
2. During reload, after the stack layout is finalized, it does reload_as_needed.  This however increases insn size, which changes the far_jump estimate; that in turn requires saving lr and changing the stack layout again.  Since there is no chance to change it at that point, GCC crashes.

Solution: do not change the far_jump estimate if reload_in_progress or reload_completed is true.

LRA is not likely to need a fix, according to Vlad:
http://gcc.gnu.org/ml/gcc/2014-02/msg00355.html

ChangeLog:
	* config/arm/arm.c (thumb_far_jump_used_p): Don't change
	if reload in progress or completed.

	* gcc.target/arm/thumb1-far-jump-3.c: New case.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index b562986..2cf362c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -26255,6 +26255,11 @@ thumb_far_jump_used_p (void)
 	return 0;
     }
 
+  /* We should not change far_jump_used during or after reload, as there is
+     no chance to change stack frame layout.  */
+  if (reload_in_progress || reload_completed)
+    return 0;
+
   /* Check to see if the function contains a branch
      insn with the far jump attribute set.  */
   for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
diff --git a/gcc/testsuite/gcc.target/arm/thumb1-far-jump-3.c b/gcc/testsuite/gcc.target/arm/thumb1-far-jump-3.c
new file mode 100644
index 000..90559ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/thumb1-far-jump-3.c
@@ -0,0 +1,108 @@
+/* Catch reload ICE on target thumb1 with far jump optimization.
+ * It is also a valid case for non-thumb1 target.  */
+
+/* Add -mno-lra option as it is only reproducible with reload.  It will
+   be removed after reload is completely removed.  */
+/* { dg-options "-mno-lra -fomit-frame-pointer" } */
+/* { dg-do compile } */
+
+#define C  2
+#define A  4
+#define RGB   (C | A)
+#define GRAY  (A)
+
+typedef unsigned long uint_32;
+typedef unsigned char byte;
+typedef byte* bytep;
+
+typedef struct ss
+{
+   uint_32 w;
+   uint_32 r;
+   byte c;
+   byte b;
+   byte p;
+} info;
+
+typedef info * infop;
+
+void
+foo(infop info, bytep row)
+{
+   uint_32 iw = info->w;
+   if (info->c == RGB)
+   {
+      if (info->b == 8)
+      {
+         bytep sp = row + info->r;
+         bytep dp = sp;
+         byte save;
+         uint_32 i;
+
+         for (i = 0; i < iw; i++)
+         {
+            save = *(--sp);
+            *(--dp) = *(--sp);
+            *(--dp) = *(--sp);
+            *(--dp) = *(--sp);
+            *(--dp) = save;
+         }
+      }
+
+      else
+      {
+         bytep sp = row + info->r;
+         bytep dp = sp;
+         byte save[2];
+         uint_32 i;
+
+         for (i = 0; i < iw; i++)
+         {
+            save[0] = *(--sp);
+            save[1] = *(--sp);
+            *(--dp) = *(--sp);
+            *(--dp) = *(--sp);
+            *(--dp) = *(--sp);
+            *(--dp) = *(--sp);
+            *(--dp) = *(--sp);
+            *(--dp) = *(--sp);
+            *(--dp) = save[0];
+            *(--dp) = save[1];
+         }
+      }
+   }
+   else if (info->c == GRAY)
+   {
+      if (info->b == 8)
+      {
+         bytep sp = row + info->r;
+         bytep dp = sp;
+         byte save;
+         uint_32 i;
+
+         for (i = 0; i < iw; i++)
+         {
+            save = *(--sp);
+            *(--dp) = *(--sp);
+            *(--dp) = save;
+         }
+      }
+      else
+      {
+         bytep sp = row + info->r;
+         bytep dp = sp;
+         byte save[2];
+         uint_32 i;
+
+         for (i = 0; i < iw; i++)
+         {
+            save[0] = *(--sp);
+            save[1] = *(--sp);
+            *(--dp) = *(--sp);
+            *(--dp) = *(--sp);
+            *(--dp) = save[0];
+            *(--dp) = save[1];
+         }
+      }
+   }
+}
RE: [patch] [arm] Fix PR60169 - thumb1 far jump
OK to trunk and 4.8?

-----Original Message-----
From: Joey Ye [mailto:joey...@arm.com]
Sent: 21 February 2014 19:32
To: gcc-patches@gcc.gnu.org
Subject: [patch] [arm] Fix PR60169 - thumb1 far jump

Patch http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01229.html introduced this ICE:
1. thumb1 estimates whether far_jump is used based on the function's insn size.
2. During reload, after the stack layout is finalized, it does reload_as_needed.  This however increases insn size, which changes the far_jump estimate; that in turn requires saving lr and changing the stack layout again.  Since there is no chance to change it at that point, GCC crashes.

Solution: do not change the far_jump estimate if reload_in_progress or reload_completed is true.

LRA is not likely to need a fix, according to Vlad:
http://gcc.gnu.org/ml/gcc/2014-02/msg00355.html

ChangeLog:
	* config/arm/arm.c (thumb_far_jump_used_p): Don't change
	if reload in progress or completed.

	* gcc.target/arm/thumb1-far-jump-3.c: New case.
Re: [PATCH][2/2] Fix expansion slowness of PR60291
On Fri, 21 Feb 2014, Richard Biener wrote: On Fri, 21 Feb 2014, Richard Biener wrote: On Fri, 21 Feb 2014, Richard Biener wrote: This fixes the slowness of RTL expansion in PR60291 which is caused by excessive collisions in mem-attr sharing. The issue here is that sharing attempts happens across functions and we have a _lot_ of functions in this testcase referencing the same lexically equivalent memory, for example MEM[(StgWord *)_5 + -64B]. That means those get the same hash value. But they don't compare equal because an SSA name _5 from function A is of course not equal to one from function B. The following fixes that by not doing mem-attr sharing across functions by clearing the mem-attrs hashtable in rest_of_clean_state. Another fix may be to do what the comment in iterative_hash_expr says for SSA names: case SSA_NAME: /* We can just compare by pointer. */ return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val); (probably blame me for changing that to hashing the SSA version). It was lxo. But I'm not sure that doesn't uncover issues with various hashtables and walking them, generating code dependent on the order. It's IMHO just not expected that you compare function-local expressions from different functions. Same speedup result from Index: gcc/tree.c === --- gcc/tree.c (revision 207960) +++ gcc/tree.c (working copy) @@ -7428,7 +7428,7 @@ iterative_hash_expr (const_tree t, hashv } case SSA_NAME: /* We can just compare by pointer. */ - return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val); + return iterative_hash_hashval_t ((uintptr_t)t3, val); case PLACEHOLDER_EXPR: /* The node itself doesn't matter. */ return val; and from Index: gcc/tree.c === --- gcc/tree.c (revision 207960) +++ gcc/tree.c (working copy) @@ -7428,7 +7428,9 @@ iterative_hash_expr (const_tree t, hashv } case SSA_NAME: /* We can just compare by pointer. 
*/ - return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val); + return iterative_hash_host_wide_int + (DECL_UID (cfun-decl), + iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val)); case PLACEHOLDER_EXPR: /* The node itself doesn't matter. */ return val; better than hashing pointers but requring cfun != NULL in this function isn't good either. At this point I'm more comfortable with clearing the hashtable than with changing iterative_hash_expr in any way. It's also along the way to get rid of the hash completely. Oh, btw, the speedup is going from expand : 481.98 (94%) usr 1.15 (17%) sys 481.94 (93%) wall 293891 kB (15%) ggc to expand : 2.66 ( 7%) usr 0.13 ( 2%) sys 2.64 ( 6%) wall 262544 kB (13%) ggc at -O0 (less dramatic slowness for -On). The other thing would be to discard mem-attr sharing alltogether, but that doesn't seem appropriate at this stage (but it would also simplify quite some code). With only one function in RTL at a time that shouldn't be too bad (see several suggestions along that line, even with statistics). Last statistics: http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01784.html With the patch below to get some statistics we see that one important piece of sharing not covered by above measurements is RTX copying(?). 
On the testcase for this PR I get at -O1, without the patch to clear the hashtable after each function:

142489 mem_attrs created (142439 for new, 50 for modification)
1983225 mem_attrs shared (4044 for new, 820241 for modification, 1158940 by rtx copying)
0 mem_attrs dropped

and with the patch to clear after each function:

364411 mem_attrs created (144478 for new, 219933 for modification)
1761303 mem_attrs shared (2005 for new, 600358 for modification, 1158940 by rtx copying)
0 mem_attrs dropped

while for dwarf2out.c I see, without the clearing:

24399 mem_attrs created (6929 for new, 17470 for modification)
102676 mem_attrs shared (10878 for new, 29265 for modification, 62533 by rtx copying)
16 mem_attrs dropped

which means that completely dropping the sharing would result in creating 6929 + 17807 + 62533(!) vs. 24399 mem-attrs.  That's still not a lot of overhead, given that mem-attrs take 40 bytes (3MB vs. 950kB).

There is also always the possibility to explicitly ref-count mem-attrs to handle sharing by rtx copying (at least cse, fwprop, combine, ira and reload copy MEMs, probably some for no good reason because MEMs are not shared), thus making mem-attrs unshare-on-modify.

Richard.

Index: gcc/rtl.c
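The ref-counting idea mentioned above (share attrs when a MEM is copied, clone them only when a shared record is about to be modified) can be sketched as follows. The structs are hypothetical, heavily simplified stand-ins rather than GCC's mem_attrs and MEM rtx, and malloc/free replace GCC's garbage-collected allocation; the sketch only illustrates the unshare-on-modify discipline.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for mem_attrs.  */
struct toy_mem_attrs
{
  unsigned refcount;  /* number of MEMs pointing at this record */
  long offset;
  unsigned align;
};

/* Hypothetical stand-in for a MEM rtx: it only carries its attrs.  */
struct toy_mem
{
  struct toy_mem_attrs *attrs;
};

static struct toy_mem_attrs *
attrs_new (long offset, unsigned align)
{
  struct toy_mem_attrs *a = malloc (sizeof *a);
  if (a == NULL)
    abort ();
  a->refcount = 1;
  a->offset = offset;
  a->align = align;
  return a;
}

static void
attrs_release (struct toy_mem_attrs *a)
{
  assert (a->refcount > 0);
  if (--a->refcount == 0)
    free (a);
}

/* Copying a MEM shares its attrs instead of duplicating them.  */
static struct toy_mem
mem_copy (const struct toy_mem *m)
{
  struct toy_mem copy = *m;
  copy.attrs->refcount++;
  return copy;
}

/* Unshare-on-modify: clone the attrs only if another MEM still uses them.  */
static void
mem_set_offset (struct toy_mem *m, long offset)
{
  if (m->attrs->refcount > 1)
    {
      struct toy_mem_attrs *clone = attrs_new (m->attrs->offset,
                                               m->attrs->align);
      attrs_release (m->attrs);
      m->attrs = clone;
    }
  m->attrs->offset = offset;
}
```

With this scheme the "by rtx copying" cases cost only a counter increment, and a record is duplicated only when one of its users actually changes it; an unshared record is modified in place.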
Re: [PATCH][AArch64] vrnd*_f64 patch for stage-1
On 13/02/14 17:43, Richard Henderson wrote:
> On 02/13/2014 03:17 AM, Alex Velenko wrote:
>> +/* Sets rmode field of FPCR control register to
>> +   FPROUNDING_ZERO.  */
>
> Comment is wrong, or at least misleading.
>
>> +void __inline __attribute__ ((__always_inline__))
>> +set_rounding_mode (uint32_t mode)
>> +{
>> +  uint32_t r;
>> +
>> +  /* Read current FPCR.  */
>> +  asm volatile ("mrs %[r], fpcr" : [r] "=r" (r) : :);
>> +
>> +  /* Clear rmode.  */
>> +  r &= 3 << RMODE_START;
>
>   ~(3 << RMODE_START)
>
>> +  /* Calculate desired FPCR.  */
>> +  r |= mode << RMODE_START;
>> +
>> +  /* Write desired FPCR back.  */
>> +  asm volatile ("msr fpcr, %[r]" : : [r] "r" (r) :);
>> +}
>
> Fortunately for this testcase, you do always use FPROUNDING_ZERO == 3 when
> calling this function, so the bugs are hidden.
>
> r~

Hi Richard,

Thank you for pointing those issues out.  Here is a respin of the same patch with the indicated issues fixed.

The description of the patch is as follows:

This patch adds the vrnd*_f64 aarch64 intrinsics.  A testcase for those intrinsics is added.

Ran a complete LE and BE regression run with no regressions.

Is the patch OK for stage-1?

gcc/

2014-02-21  Alex Velenko  <alex.vele...@arm.com>

	* config/aarch64/aarch64-builtins.c (BUILTIN_VDQF_DF): Macro
	added.
	* config/aarch64/aarch64-simd-builtins.def (frintn): Use added
	macro.
	* config/aarch64/aarch64-simd.md (<frint_pattern>): Comment
	corrected.
	* config/aarch64/aarch64.md (<frint_pattern>): Likewise.
	* config/aarch64/arm_neon.h (vrnd_f64): Added.
	(vrnda_f64): Likewise.
	(vrndi_f64): Likewise.
	(vrndm_f64): Likewise.
	(vrndn_f64): Likewise.
	(vrndp_f64): Likewise.
	(vrndx_f64): Likewise.

gcc/testsuite/

2014-02-21  Alex Velenko  <alex.vele...@arm.com>

	* gcc.target/aarch64/vrnd_f64_1.c: New testcase.
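Richard Henderson's point about the rmode update can be checked without touching the real register. The sketch below applies the original and the corrected read-modify-write to a plain 32-bit value instead of using the mrs/msr inline asm. The function names are hypothetical, and RMODE_START = 22 is an assumption based on FPCR.RMode occupying bits [23:22] in the Arm architecture; the testcase's own definition is not shown in the mail.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of FPCR.RMode (bits [23:22]).  */
#define RMODE_START 22

/* Corrected sequence: clear only the two rmode bits, then insert MODE.  */
static uint32_t
fpcr_with_rounding_mode (uint32_t fpcr, uint32_t mode)
{
  assert (mode <= 3);                       /* RMode is a 2-bit field */
  fpcr &= ~(UINT32_C (3) << RMODE_START);   /* clear rmode */
  fpcr |= mode << RMODE_START;              /* insert the new mode */
  return fpcr;
}

/* Original sequence: the mask lacks the '~', so it keeps ONLY the old
   rmode bits and clears everything else in the register.  */
static uint32_t
fpcr_with_rounding_mode_buggy (uint32_t fpcr, uint32_t mode)
{
  fpcr &= UINT32_C (3) << RMODE_START;
  fpcr |= mode << RMODE_START;
  return fpcr;
}
```

With mode 3 (FPROUNDING_ZERO) both variants end up with rmode == 3, which is why the testcase did not expose the bug; with any other mode the buggy version can leave stale rmode bits set, and in every case it discards the other FPCR bits.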
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index ebab2ce8347a4425977c5cbd0f285c3ff1d9f2f1..7adc5fb96b6473ecde5c4f76973aff68af0ca7d4 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -307,6 +307,8 @@ aarch64_types_store1_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   VAR7 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di)
 #define BUILTIN_VDQF(T, N, MAP) \
   VAR3 (T, N, MAP, v2sf, v4sf, v2df)
+#define BUILTIN_VDQF_DF(T, N, MAP) \
+  VAR4 (T, N, MAP, v2sf, v4sf, v2df, df)
 #define BUILTIN_VDQH(T, N, MAP) \
   VAR2 (T, N, MAP, v4hi, v8hi)
 #define BUILTIN_VDQHS(T, N, MAP) \
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index e5f71b479ccfd1a9cbf84aed0f96b49762053f59..09e230c56683a0225f8760472d7137b7bac98297 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -264,7 +264,7 @@
   BUILTIN_VDQF (UNOP, nearbyint, 2)
   BUILTIN_VDQF (UNOP, rint, 2)
   BUILTIN_VDQF (UNOP, round, 2)
-  BUILTIN_VDQF (UNOP, frintn, 2)
+  BUILTIN_VDQF_DF (UNOP, frintn, 2)
 
   /* Implemented by l<fcvt_pattern><su>_<optab><VQDF:mode><vcvt_target>2.  */
   VAR1 (UNOP, lbtruncv2sf, 2, v2si)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 4dffb59e856aeaafb79007255d3b91a73ef1ef13..0c1d7de5b3f4fb0fa8fa226b81ec690d8112b849 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1427,7 +1427,7 @@
 )
 
 ;; Vector versions of the floating-point frint patterns.
-;; Expands to btrunc, ceil, floor, nearbyint, rint, round.
+;; Expands to btrunc, ceil, floor, nearbyint, rint, round, frintn.
 (define_insn "<frint_pattern><mode>2"
   [(set (match_operand:VDQF 0 "register_operand" "=w")
 	(unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")]
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 99a6ac8fcbdcd24a0ea18cc037bef9cf72070281..577aa9fe08bb445e66734bc404e94e13dc1fa65b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3187,7 +3187,7 @@
 ;; ---
 ;; frint floating-point round to integral standard patterns.
-;; Expands to btrunc, ceil, floor, nearbyint, rint, round.
+;; Expands to btrunc, ceil, floor, nearbyint, rint, round, frintn.
 
 (define_insn "<frint_pattern><mode>2"
   [(set (match_operand:GPF 0 "register_operand" "=w")
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 6af99361b8e265f66026dc506cfc23f044d153b4..797e37ad638648312ef34bcd63c463e5873c30c4 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -22481,6 +22481,12 @@ vrnd_f32 (float32x2_t __a)
   return __builtin_aarch64_btruncv2sf (__a);
 }
 
+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+vrnd_f64 (float64x1_t __a)
+{
+  return vset_lane_f64 (__builtin_trunc (vget_lane_f64 (__a, 0)), __a, 0);
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vrndq_f32 (float32x4_t __a)
 {
@@ -22501,6 +22507,12 @@ vrnda_f32 (float32x2_t __a)
   return
Re: [PATCH][2/2] Fix expansion slowness of PR60291
Richard Biener rguent...@suse.de writes: On Fri, 21 Feb 2014, Richard Biener wrote: On Fri, 21 Feb 2014, Richard Biener wrote: On Fri, 21 Feb 2014, Richard Biener wrote: This fixes the slowness of RTL expansion in PR60291 which is caused by excessive collisions in mem-attr sharing. The issue here is that sharing attempts happens across functions and we have a _lot_ of functions in this testcase referencing the same lexically equivalent memory, for example MEM[(StgWord *)_5 + -64B]. That means those get the same hash value. But they don't compare equal because an SSA name _5 from function A is of course not equal to one from function B. The following fixes that by not doing mem-attr sharing across functions by clearing the mem-attrs hashtable in rest_of_clean_state. Another fix may be to do what the comment in iterative_hash_expr says for SSA names: case SSA_NAME: /* We can just compare by pointer. */ return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val); (probably blame me for changing that to hashing the SSA version). It was lxo. But I'm not sure that doesn't uncover issues with various hashtables and walking them, generating code dependent on the order. It's IMHO just not expected that you compare function-local expressions from different functions. Same speedup result from Index: gcc/tree.c === --- gcc/tree.c (revision 207960) +++ gcc/tree.c (working copy) @@ -7428,7 +7428,7 @@ iterative_hash_expr (const_tree t, hashv } case SSA_NAME: /* We can just compare by pointer. */ - return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val); + return iterative_hash_hashval_t ((uintptr_t)t3, val); case PLACEHOLDER_EXPR: /* The node itself doesn't matter. */ return val; and from Index: gcc/tree.c === --- gcc/tree.c (revision 207960) +++ gcc/tree.c (working copy) @@ -7428,7 +7428,9 @@ iterative_hash_expr (const_tree t, hashv } case SSA_NAME: /* We can just compare by pointer. 
*/ - return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val); + return iterative_hash_host_wide_int + (DECL_UID (cfun-decl), + iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val)); case PLACEHOLDER_EXPR: /* The node itself doesn't matter. */ return val; better than hashing pointers but requring cfun != NULL in this function isn't good either. At this point I'm more comfortable with clearing the hashtable than with changing iterative_hash_expr in any way. It's also along the way to get rid of the hash completely. Oh, btw, the speedup is going from expand : 481.98 (94%) usr 1.15 (17%) sys 481.94 (93%) wall 293891 kB (15%) ggc to expand : 2.66 ( 7%) usr 0.13 ( 2%) sys 2.64 ( 6%) wall 262544 kB (13%) ggc at -O0 (less dramatic slowness for -On). The other thing would be to discard mem-attr sharing alltogether, but that doesn't seem appropriate at this stage (but it would also simplify quite some code). With only one function in RTL at a time that shouldn't be too bad (see several suggestions along that line, even with statistics). Last statistics: http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01784.html With the patch below to get some statistics we see that one important piece of sharing not covered by above measurements is RTX copying(?). 
On the testcase for this PR I get at -O1 and without the patch to clear the hashtable after each function 142489 mem_attrs created (142439 for new, 50 for modification) 1983225 mem_attrs shared (4044 for new, 820241 for modification, 1158940 by rtx copying) 0 mem_attrs dropped and with the patch to clear after each function 364411 mem_attrs created (144478 for new, 219933 for modification) 1761303 mem_attrs shared (2005 for new, 600358 for modification, 1158940 by rtx copying) 0 mem_attrs dropped while for dwarf2out.c I see without the clearing 24399 mem_attrs created (6929 for new, 17470 for modification) 102676 mem_attrs shared (10878 for new, 29265 for modification, 62533 by rtx copying) 16 mem_attrs dropped which means that completely dropping the sharing would result in creating of 6929 + 17807 + 62533(!) vs. 24399 mem-attrs. That's still not a lot overhead given that mem-attrs take 40 bytes (3MB vs. 950kB). There is also always the possibility to explicitely ref-count mem-attrs to handle sharing by rtx copying (at least cse, fwprop, combine, ira and reload copy MEMs, probably some for no good reason because MEMs are not shared), thus make mem-attrs unshare-on-modify. In a thread a few
Re: [PATCH][2/2] Fix expansion slowness of PR60291
On Fri, 21 Feb 2014, Richard Biener wrote:

> On Fri, 21 Feb 2014, Richard Biener wrote:
>
> Last statistics: http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01784.html
>
> With the patch below to get some statistics we see that one important
> piece of sharing not covered by above measurements is RTX copying(?).
>
> On the testcase for this PR I get at -O1 and without the patch
> to clear the hashtable after each function
>
> 142489 mem_attrs created (142439 for new, 50 for modification)
> 1983225 mem_attrs shared (4044 for new, 820241 for modification, 1158940 by rtx copying)
> 0 mem_attrs dropped
>
> and with the patch to clear after each function
>
> 364411 mem_attrs created (144478 for new, 219933 for modification)
> 1761303 mem_attrs shared (2005 for new, 600358 for modification, 1158940 by rtx copying)
> 0 mem_attrs dropped
>
> while for dwarf2out.c I see without the clearing
>
> 24399 mem_attrs created (6929 for new, 17470 for modification)
> 102676 mem_attrs shared (10878 for new, 29265 for modification, 62533 by rtx copying)
> 16 mem_attrs dropped

Oh, and more than half of shared-modified are actually not modified,
so they are false sharing reports (set_mem_attrs (mem, MEM_ATTRS (mem))).

24399 mem_attrs created (6929 for new, 17470 for modification)
85801 mem_attrs shared (10878 for new, 12390 for modification, 62533 by rtx copying)
16 mem_attrs dropped

When dropping sharing completely you win creations for modification
but lose shares for new and copy.  Losing the copy case makes it a loss
overall, which you can eventually offset by using a ref-counting scheme
(or better, by avoiding copying the MEM in the first place; a MEM is
currently 24 bytes while its attrs are 40 bytes).

Richard.

Index: gcc/rtl.c
===================================================================
--- gcc/rtl.c	(revision 207938)
+++ gcc/rtl.c	(working copy)
@@ -326,6 +326,8 @@ copy_rtx (rtx orig)
   return copy;
 }
 
+unsigned long mem_attrs_shared_copy;
+
 /* Create a new copy of an rtx.  Only copy just one level.  */
 
 rtx
@@ -333,6 +335,8 @@ shallow_copy_rtx_stat (const_rtx orig ME
 {
   const unsigned int size = rtx_size (orig);
   rtx const copy = ggc_alloc_rtx_def_stat (size PASS_MEM_STAT);
+  if (MEM_P (orig) && MEM_ATTRS (orig))
+    mem_attrs_shared_copy++;
   return (rtx) memcpy (copy, orig, size);
 }
 
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	(revision 207938)
+++ gcc/emit-rtl.c	(working copy)
@@ -290,6 +290,12 @@ mem_attrs_htab_eq (const void *x, const
   return mem_attrs_eq_p ((const mem_attrs *) x, (const mem_attrs *) y);
 }
 
+unsigned long mem_attrs_dropped;
+unsigned long mem_attrs_new;
+unsigned long mem_attrs_new_modified;
+unsigned long mem_attrs_shared;
+unsigned long mem_attrs_shared_modified;
+
 /* Set MEM's memory attributes so that they are the same as ATTRS.  */
 
 static void
@@ -300,6 +306,8 @@ set_mem_attrs (rtx mem, mem_attrs *attrs
   /* If everything is the default, we can just clear the attributes.  */
   if (mem_attrs_eq_p (attrs, mode_mem_attrs[(int) GET_MODE (mem)]))
     {
+      if (MEM_ATTRS (mem))
+        mem_attrs_dropped++;
       MEM_ATTRS (mem) = 0;
       return;
     }
@@ -309,6 +317,20 @@ set_mem_attrs (rtx mem, mem_attrs *attrs
     {
       *slot = ggc_alloc_mem_attrs ();
       memcpy (*slot, attrs, sizeof (mem_attrs));
+      if (MEM_ATTRS (mem))
+        mem_attrs_new_modified++;
+      else
+        mem_attrs_new++;
+    }
+  else
+    {
+      if (MEM_ATTRS (mem))
+        {
+          if (MEM_ATTRS (mem) != *slot)
+            mem_attrs_shared_modified++;
+        }
+      else
+        mem_attrs_shared++;
     }
 
   MEM_ATTRS (mem) = (mem_attrs *) *slot;
Index: gcc/toplev.c
===================================================================
--- gcc/toplev.c	(revision 207938)
+++ gcc/toplev.c	(working copy)
@@ -1989,6 +2023,26 @@ toplev_main (int argc, char **argv)
   if (!exit_after_options)
     do_compile ();
 
+  {
+    extern unsigned long mem_attrs_dropped;
+    extern unsigned long mem_attrs_new;
+    extern unsigned long mem_attrs_new_modified;
+    extern unsigned long mem_attrs_shared;
+    extern unsigned long mem_attrs_shared_modified;
+    extern unsigned long mem_attrs_shared_copy;
+    fprintf (stderr, "%lu mem_attrs created (%lu for new, %lu for "
+             "modification)\n",
+             mem_attrs_new + mem_attrs_new_modified,
+             mem_attrs_new, mem_attrs_new_modified);
+    fprintf (stderr, "%lu mem_attrs shared (%lu for new, %lu for "
+             "modification, %lu by rtx copying)\n",
+             mem_attrs_shared + mem_attrs_shared_modified
+             + mem_attrs_shared_copy,
+             mem_attrs_shared, mem_attrs_shared_modified,
+             mem_attrs_shared_copy);
+    fprintf (stderr, "%lu mem_attrs dropped\n", mem_attrs_dropped);
+  }
+
   if (warningcount || errorcount || werrorcount)
     print_ignored_options ();
How to use GCC to compile glib
Sir, I have a cross compiler and I know how to cross-compile a single file, but I am doing all of this just to compile glib, which I do not know how to do. Can anyone guide me, or more generally tell me how to compile a complete library using GCC?
Re: [PATCH][2/2] Fix expansion slowness of PR60291
On Fri, 21 Feb 2014, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: On Fri, 21 Feb 2014, Richard Biener wrote: On Fri, 21 Feb 2014, Richard Biener wrote: On Fri, 21 Feb 2014, Richard Biener wrote: This fixes the slowness of RTL expansion in PR60291 which is caused by excessive collisions in mem-attr sharing. The issue here is that sharing attempts happens across functions and we have a _lot_ of functions in this testcase referencing the same lexically equivalent memory, for example MEM[(StgWord *)_5 + -64B]. That means those get the same hash value. But they don't compare equal because an SSA name _5 from function A is of course not equal to one from function B. The following fixes that by not doing mem-attr sharing across functions by clearing the mem-attrs hashtable in rest_of_clean_state. Another fix may be to do what the comment in iterative_hash_expr says for SSA names: case SSA_NAME: /* We can just compare by pointer. */ return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val); (probably blame me for changing that to hashing the SSA version). It was lxo. But I'm not sure that doesn't uncover issues with various hashtables and walking them, generating code dependent on the order. It's IMHO just not expected that you compare function-local expressions from different functions. Same speedup result from Index: gcc/tree.c === --- gcc/tree.c (revision 207960) +++ gcc/tree.c (working copy) @@ -7428,7 +7428,7 @@ iterative_hash_expr (const_tree t, hashv } case SSA_NAME: /* We can just compare by pointer. */ - return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val); + return iterative_hash_hashval_t ((uintptr_t)t3, val); case PLACEHOLDER_EXPR: /* The node itself doesn't matter. */ return val; and from Index: gcc/tree.c === --- gcc/tree.c (revision 207960) +++ gcc/tree.c (working copy) @@ -7428,7 +7428,9 @@ iterative_hash_expr (const_tree t, hashv } case SSA_NAME: /* We can just compare by pointer. 
*/ - return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val); + return iterative_hash_host_wide_int + (DECL_UID (cfun-decl), + iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val)); case PLACEHOLDER_EXPR: /* The node itself doesn't matter. */ return val; better than hashing pointers but requring cfun != NULL in this function isn't good either. At this point I'm more comfortable with clearing the hashtable than with changing iterative_hash_expr in any way. It's also along the way to get rid of the hash completely. Oh, btw, the speedup is going from expand : 481.98 (94%) usr 1.15 (17%) sys 481.94 (93%) wall 293891 kB (15%) ggc to expand : 2.66 ( 7%) usr 0.13 ( 2%) sys 2.64 ( 6%) wall 262544 kB (13%) ggc at -O0 (less dramatic slowness for -On). The other thing would be to discard mem-attr sharing alltogether, but that doesn't seem appropriate at this stage (but it would also simplify quite some code). With only one function in RTL at a time that shouldn't be too bad (see several suggestions along that line, even with statistics). Last statistics: http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01784.html With the patch below to get some statistics we see that one important piece of sharing not covered by above measurements is RTX copying(?). 
On the testcase for this PR I get at -O1 and without the patch to clear the hashtable after each function 142489 mem_attrs created (142439 for new, 50 for modification) 1983225 mem_attrs shared (4044 for new, 820241 for modification, 1158940 by rtx copying) 0 mem_attrs dropped and with the patch to clear after each function 364411 mem_attrs created (144478 for new, 219933 for modification) 1761303 mem_attrs shared (2005 for new, 600358 for modification, 1158940 by rtx copying) 0 mem_attrs dropped while for dwarf2out.c I see without the clearing 24399 mem_attrs created (6929 for new, 17470 for modification) 102676 mem_attrs shared (10878 for new, 29265 for modification, 62533 by rtx copying) 16 mem_attrs dropped which means that completely dropping the sharing would result in creating of 6929 + 17807 + 62533(!) vs. 24399 mem-attrs. That's still not a lot overhead given that mem-attrs take 40 bytes (3MB vs. 950kB). There is also always the possibility to explicitely ref-count mem-attrs to handle sharing by rtx
Re: [PATCH][2/2] Fix expansion slowness of PR60291
On Fri, 21 Feb 2014, Richard Biener wrote: On Fri, 21 Feb 2014, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: On Fri, 21 Feb 2014, Richard Biener wrote: On Fri, 21 Feb 2014, Richard Biener wrote: On Fri, 21 Feb 2014, Richard Biener wrote: This fixes the slowness of RTL expansion in PR60291 which is caused by excessive collisions in mem-attr sharing. The issue here is that sharing attempts happens across functions and we have a _lot_ of functions in this testcase referencing the same lexically equivalent memory, for example MEM[(StgWord *)_5 + -64B]. That means those get the same hash value. But they don't compare equal because an SSA name _5 from function A is of course not equal to one from function B. The following fixes that by not doing mem-attr sharing across functions by clearing the mem-attrs hashtable in rest_of_clean_state. Another fix may be to do what the comment in iterative_hash_expr says for SSA names: case SSA_NAME: /* We can just compare by pointer. */ return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val); (probably blame me for changing that to hashing the SSA version). It was lxo. But I'm not sure that doesn't uncover issues with various hashtables and walking them, generating code dependent on the order. It's IMHO just not expected that you compare function-local expressions from different functions. Same speedup result from Index: gcc/tree.c === --- gcc/tree.c (revision 207960) +++ gcc/tree.c (working copy) @@ -7428,7 +7428,7 @@ iterative_hash_expr (const_tree t, hashv } case SSA_NAME: /* We can just compare by pointer. */ - return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val); + return iterative_hash_hashval_t ((uintptr_t)t3, val); case PLACEHOLDER_EXPR: /* The node itself doesn't matter. 
*/ return val; and from Index: gcc/tree.c === --- gcc/tree.c (revision 207960) +++ gcc/tree.c (working copy) @@ -7428,7 +7428,9 @@ iterative_hash_expr (const_tree t, hashv } case SSA_NAME: /* We can just compare by pointer. */ - return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val); + return iterative_hash_host_wide_int + (DECL_UID (cfun-decl), + iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val)); case PLACEHOLDER_EXPR: /* The node itself doesn't matter. */ return val; better than hashing pointers but requring cfun != NULL in this function isn't good either. At this point I'm more comfortable with clearing the hashtable than with changing iterative_hash_expr in any way. It's also along the way to get rid of the hash completely. Oh, btw, the speedup is going from expand : 481.98 (94%) usr 1.15 (17%) sys 481.94 (93%) wall 293891 kB (15%) ggc to expand : 2.66 ( 7%) usr 0.13 ( 2%) sys 2.64 ( 6%) wall 262544 kB (13%) ggc at -O0 (less dramatic slowness for -On). The other thing would be to discard mem-attr sharing alltogether, but that doesn't seem appropriate at this stage (but it would also simplify quite some code). With only one function in RTL at a time that shouldn't be too bad (see several suggestions along that line, even with statistics). Last statistics: http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01784.html With the patch below to get some statistics we see that one important piece of sharing not covered by above measurements is RTX copying(?). 
On the testcase for this PR I get at -O1 and without the patch to clear the hashtable after each function 142489 mem_attrs created (142439 for new, 50 for modification) 1983225 mem_attrs shared (4044 for new, 820241 for modification, 1158940 by rtx copying) 0 mem_attrs dropped and with the patch to clear after each function 364411 mem_attrs created (144478 for new, 219933 for modification) 1761303 mem_attrs shared (2005 for new, 600358 for modification, 1158940 by rtx copying) 0 mem_attrs dropped while for dwarf2out.c I see without the clearing 24399 mem_attrs created (6929 for new, 17470 for modification) 102676 mem_attrs shared (10878 for new, 29265 for modification, 62533 by rtx copying) 16 mem_attrs dropped which means that completely dropping the sharing would result in creating of 6929 + 17807 + 62533(!) vs. 24399
C++ PATCH for c++/60167 (reference template parameters)
My patch for 58606 was incomplete; there were other places that needed to change to handle dereferencing reference non-type template parameters.

Tested x86_64-pc-linux-gnu, applying to trunk.  I reverted the earlier 58606 patch on the 4.8 branch.

commit 7b1bb4515ae768ca44e192442d2578ea46c16f96
Author: Jason Merrill <ja...@redhat.com>
Date:   Thu Feb 20 23:22:21 2014 -0500

    PR c++/60167
    PR c++/60222
    PR c++/58606
    * parser.c (cp_parser_template_argument): Restore dereference.
    * pt.c (template_parm_to_arg): Dereference non-pack expansions too.
    (process_partial_specialization): Handle deref.
    (unify): Likewise.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 4673f78..d8ccd2b 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -13937,6 +13937,7 @@ cp_parser_template_argument (cp_parser* parser)
           if (INDIRECT_REF_P (argument))
             {
+              /* Strip the dereference temporarily.  */
               gcc_assert (REFERENCE_REF_P (argument));
               argument = TREE_OPERAND (argument, 0);
             }
@@ -13975,6 +13976,8 @@ cp_parser_template_argument (cp_parser* parser)
           if (address_p)
             argument = build_x_unary_op (loc, ADDR_EXPR, argument,
                                          tf_warning_or_error);
+          else
+            argument = convert_from_reference (argument);
           return argument;
         }
     }
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 6477fce..4cf387a 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -3861,6 +3861,8 @@ template_parm_to_arg (tree t)
           SET_ARGUMENT_PACK_ARGS (t, vec);
           TREE_TYPE (t) = type;
         }
+      else
+        t = convert_from_reference (t);
     }
   return t;
 }
@@ -4218,10 +4220,12 @@ process_partial_specialization (tree decl)
       if (/* These first two lines are the `non-type' bit.  */
           !TYPE_P (arg)
           && TREE_CODE (arg) != TEMPLATE_DECL
-          /* This next line is the `argument expression is not just a
+          /* This next two lines are the `argument expression is not just a
              simple identifier' condition and also the `specialized
              non-type argument' bit.  */
-          && TREE_CODE (arg) != TEMPLATE_PARM_INDEX)
+          && TREE_CODE (arg) != TEMPLATE_PARM_INDEX
+          && !(REFERENCE_REF_P (arg)
+               && TREE_CODE (TREE_OPERAND (arg, 0)) == TEMPLATE_PARM_INDEX))
         {
           if ((!packed_args && tpd.arg_uses_template_parms[i])
               || (packed_args && uses_template_parms (arg)))
@@ -17893,6 +17897,12 @@ unify (tree tparms, tree targs, tree parm, tree arg, int strict,
       /* Unification fails if we hit an error node.  */
       return unify_invalid (explain_p);
 
+    case INDIRECT_REF:
+      if (REFERENCE_REF_P (parm))
+        return unify (tparms, targs, TREE_OPERAND (parm, 0), arg,
+                      strict, explain_p);
+      /* FALLTHRU */
+
     default:
       /* An unresolved overload is a nondeduced context.  */
       if (is_overloaded_fn (parm) || type_unknown_p (parm))
diff --git a/gcc/testsuite/g++.dg/template/ref7.C b/gcc/testsuite/g++.dg/template/ref7.C
new file mode 100644
index 000..f6395e2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/ref7.C
@@ -0,0 +1,10 @@
+// PR c++/60167
+
+template <int& F>
+struct Foo {
+  typedef int Bar;
+
+  static Bar cache;
+};
+
+template <int& F> typename Foo<F>::Bar Foo<F>::cache;
diff --git a/gcc/testsuite/g++.dg/template/ref8.C b/gcc/testsuite/g++.dg/template/ref8.C
new file mode 100644
index 000..a2fc847
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/ref8.C
@@ -0,0 +1,8 @@
+// PR c++/60222
+
+template<int> struct A
+{
+  template<typename> struct B;
+
+  template<typename T> struct B<T*> {};
+};
C++ PATCH for c++/60251 (ICE with VLA capture)
is_normal_capture_proxy got confused by the contortions we go through to build up a capture proxy for a VLA capture, so it's easier to just check for variably modified type.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit 1fa864d218992c8a1b9b1fd4fae2205d5572205b
Author: Jason Merrill <ja...@redhat.com>
Date:   Thu Feb 20 23:35:28 2014 -0500

    PR c++/60251
    * lambda.c (is_normal_capture_proxy): Handle VLA capture.

diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index 8bb820d..ad993e9d 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -250,6 +250,10 @@ is_normal_capture_proxy (tree decl)
     /* It's not a capture proxy.  */
     return false;
 
+  if (variably_modified_type_p (TREE_TYPE (decl), NULL_TREE))
+    /* VLA capture.  */
+    return true;
+
   /* It is a capture proxy, is it a normal capture?  */
   tree val = DECL_VALUE_EXPR (decl);
   if (val == error_mark_node)
diff --git a/gcc/testsuite/g++.dg/cpp1y/vla11.C b/gcc/testsuite/g++.dg/cpp1y/vla11.C
new file mode 100644
index 000..c9cdade
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/vla11.C
@@ -0,0 +1,8 @@
+// PR c++/60251
+// { dg-options "-std=c++1y -pedantic-errors" }
+
+void foo(int n)
+{
+  int x[n];
+  [x]() { decltype(x) y; }; // { dg-error "decltype of array of runtime bound" }
+}
C++ PATCH for c++/60250 (ICE with invalid array bound)
A type-dependent expression can have NULL TREE_TYPE, and if we wrap it in a NOP_EXPR also with NULL type, that confuses things.  Let's not try to do that.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit 5564347b2b7b39d92f8f3b8307bc8ed8551e4d91
Author: Jason Merrill <ja...@redhat.com>
Date:   Thu Feb 20 23:46:00 2014 -0500

    PR c++/60250
    * parser.c (cp_parser_direct_declarator): Don't wrap a
    type-dependent expression in a NOP_EXPR.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d8ccd2b..d6c176f 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -17233,7 +17233,8 @@ cp_parser_direct_declarator (cp_parser* parser,
                                 "array bound is not an integer constant");
                   bounds = error_mark_node;
                 }
-              else if (processing_template_decl)
+              else if (processing_template_decl
+                       && !type_dependent_expression_p (bounds))
                 {
                   /* Remember this wasn't a constant-expression.  */
                   bounds = build_nop (TREE_TYPE (bounds), bounds);
diff --git a/gcc/testsuite/g++.dg/cpp1y/vla12.C b/gcc/testsuite/g++.dg/cpp1y/vla12.C
new file mode 100644
index 000..df47f26
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/vla12.C
@@ -0,0 +1,7 @@
+// PR c++/60250
+// { dg-options "-std=c++1y -pedantic-errors" }
+
+template<typename> void foo()
+{
+  typedef int T[ ([](){ return 1; }()) ]; // { dg-error "runtime bound" }
+}
Re: [AArch64 01/14] Use generic target, if no other default.
Hi Philipp,

On 18/02/14 21:09, Philipp Tomsich wrote:
> The default target should be generic, as Cortex-A53 includes
> optional ISA features (CRC and CRYPTO) that are not required for
> architectural compliance. The key difference between generic (which
> already uses the cortexa53 pipeline model for scheduling) is the
> absence of any optional ISA features in the generic target.
> ---
>  gcc/config/aarch64/aarch64.c | 2 +-
>  gcc/config/aarch64/aarch64.h | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 784bfa3..70dda00 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -5244,7 +5244,7 @@ aarch64_override_options (void)
>
>    /* If the user did not specify a processor, choose the default
>       one for them.  This will be the CPU set during configuration using
> -     --with-cpu, otherwise it is cortex-a53.  */
> +     --with-cpu, otherwise it is generic.  */
>    if (!selected_cpu)
>      {
>        selected_cpu = &all_cores[TARGET_CPU_DEFAULT & 0x3f];
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 13c424c..b66a6b4 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -472,10 +472,10 @@ enum target_cpus
>    TARGET_CPU_generic
>  };
>
> -/* If there is no CPU defined at configure, use cortex-a53 as default.  */
> +/* If there is no CPU defined at configure, use generic as default.  */
>  #ifndef TARGET_CPU_DEFAULT
>  #define TARGET_CPU_DEFAULT \
> -  (TARGET_CPU_cortexa53 | (AARCH64_CPU_DEFAULT_FLAGS << 6))
> +  (TARGET_CPU_generic | (AARCH64_CPU_DEFAULT_FLAGS << 6))
>  #endif
>
>  /* The processor for which instructions should be scheduled.  */

I don't think this approach will work.  The bug we have here is that in config.gcc, when processing a --with-arch directive, it will use the CPU flags of the sample CPU given for the architecture in aarch64-arches.def.  This will cause it to use cortex-a53+fp+simd+crypto+crc when asked to configure for --with-arch=armv8-a.  Instead it should be using the 4th field of the AARCH64_ARCH, which specifies the ISA flags implied by the architecture.  Then we would get cortex-a53+fp+simd.

Also, if no --with-arch or --with-cpu is specified, config.gcc will still specify TARGET_CPU_DEFAULT as TARGET_CPU_generic but without encoding the ISA flags (AARCH64_FL_FOR_ARCH8 in this case) for it in the upper bits of TARGET_CPU_DEFAULT, leading to an always defined TARGET_CPU_DEFAULT which will cause the last hunk in this patch to never be used and configuring.

I'm working on a fix for these issues.

HTH,
Kyrill
C++ PATCH for c++/60252 (ICE with VLA in lambda parameter)
While parsing the template parameter list for a lambda, we've already pushed into the closure class but haven't created the op() FUNCTION_DECL, so trying to capture 'this' by way of the 'this' pointer of op() breaks.  Avoid the ICE by not trying to capture 'this' when parsing a parameter list.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit 415022d49d1cee84b6d2085e7585e1d801d15732
Author: Jason Merrill <ja...@redhat.com>
Date:   Fri Feb 21 00:35:35 2014 -0500

    PR c++/60252
    * lambda.c (maybe_resolve_dummy): Don't try to capture this
    in declaration context.

diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index ad993e9d..7fe235b 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -749,7 +749,10 @@ maybe_resolve_dummy (tree object)
   if (type != current_class_type
       && current_class_type
       && LAMBDA_TYPE_P (current_class_type)
-      && DERIVED_FROM_P (type, current_nonlambda_class_type ()))
+      && DERIVED_FROM_P (type, current_nonlambda_class_type ())
+      /* If we get here while parsing the parameter list of a lambda, it
+         will fail, so don't even try (c++/60252).  */
+      && current_binding_level->kind != sk_function_parms)
     {
       /* In a lambda, need to go through 'this' capture.  */
       tree lam = CLASSTYPE_LAMBDA_EXPR (current_class_type);
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-ice11.C b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-ice11.C
new file mode 100644
index 000..58f0fa3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-ice11.C
@@ -0,0 +1,12 @@
+// PR c++/60252
+// { dg-require-effective-target c++11 }
+
+struct A
+{
+  int i;                        // { dg-message "" }
+
+  void foo()
+  {
+    [](){ [](int[i]){}; };      // { dg-error "" }
+  }
+};
C++ PATCH for c++/60248 (ICE with variadic template)
mangle_decl shouldn't try to make a forward-compatibility alias for a TYPE_DECL, since they don't have symbols.

Tested x86_64-pc-linux-gnu, applying to trunk, 4.7, 4.8.

commit 8d40d9322f567ba5720ac807168232ae3c5ee0e4
Author: Jason Merrill <ja...@redhat.com>
Date:   Fri Feb 21 00:39:25 2014 -0500

    PR c++/60248
    * mangle.c (mangle_decl): Don't make an alias for a TYPE_DECL.

diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 7bb6f4b..251edb1 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -3485,6 +3485,7 @@ mangle_decl (const tree decl)
   if (G.need_abi_warning
       /* Don't do this for a fake symbol we aren't going to emit anyway.  */
+      && TREE_CODE (decl) != TYPE_DECL
       && !DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (decl)
       && !DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (decl))
     {
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic149.C b/gcc/testsuite/g++.dg/cpp0x/variadic149.C
new file mode 100644
index 000..a250e7c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic149.C
@@ -0,0 +1,11 @@
+// PR c++/60248
+// { dg-options "-std=c++11 -g -fabi-version=2" }
+
+template<int...> struct A {};
+
+template<> struct A<0>
+{
+  typedef enum { e } B;
+};
+
+A<0> a;
C++ PATCH for c++/60224 (ICE initializing array with PMF)
We shouldn't treat a CONSTRUCTOR as an init-list if it already has a type.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit 8e1493a7a31ffdb1e70977c325e7d2f2686b14a7
Author: Jason Merrill <ja...@redhat.com>
Date:   Fri Feb 21 00:52:20 2014 -0500

    PR c++/60224
    * decl.c (cp_complete_array_type, maybe_deduce_size_from_array_init):
    Don't get confused by a CONSTRUCTOR that already has a type.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index b7d2d9f..04c4cf5 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -4880,7 +4880,7 @@ maybe_deduce_size_from_array_init (tree decl, tree init)
          those are not supported in GNU C++, and as the middle-end
          will crash if presented with a non-numeric designated
          initializer.  */
-      if (initializer && TREE_CODE (initializer) == CONSTRUCTOR)
+      if (initializer && BRACE_ENCLOSED_INITIALIZER_P (initializer))
         {
           vec<constructor_elt, va_gc> *v = CONSTRUCTOR_ELTS (initializer);
           constructor_elt *ce;
@@ -7099,6 +7099,11 @@ cp_complete_array_type (tree *ptype, tree initial_value, bool do_default)
   int failure;
   tree type, elt_type;
 
+  /* Don't get confused by a CONSTRUCTOR for some other type.  */
+  if (initial_value && TREE_CODE (initial_value) == CONSTRUCTOR
+      && !BRACE_ENCLOSED_INITIALIZER_P (initial_value))
+    return 1;
+
   if (initial_value)
     {
       unsigned HOST_WIDE_INT i;
diff --git a/gcc/testsuite/g++.dg/init/array36.C b/gcc/testsuite/g++.dg/init/array36.C
new file mode 100644
index 000..77e4f90
--- /dev/null
+++ b/gcc/testsuite/g++.dg/init/array36.C
@@ -0,0 +1,8 @@
+// PR c++/60224
+
+struct A {};
+
+void foo()
+{
+  bool b[] = (int (A::*)())0;   // { dg-error "" }
+}
C++ PATCH for c++/60219 (ICE with invalid variadics)
In coerce_template_parms, if we try to pack the remaining arguments into an argument pack and that fails, we should immediately stop trying to process more arguments.

Tested x86_64-pc-linux-gnu, applying to trunk and 4.8.

commit 1555baa24f537d0e724c53845e7ba2881df7a77f
Author: Jason Merrill <ja...@redhat.com>
Date:   Fri Feb 21 01:05:42 2014 -0500

    PR c++/60219
    * pt.c (coerce_template_parms): Bail if argument packing fails.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 3e464ff..0729d93 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -6808,6 +6808,8 @@ coerce_template_parms (tree parms,
           /* Store this argument.  */
           if (arg == error_mark_node)
             lost++;
+          if (lost)
+            break;
           TREE_VEC_ELT (new_inner_args, parm_idx) = arg;
 
           /* We are done with all of the arguments.  */
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic150.C b/gcc/testsuite/g++.dg/cpp0x/variadic150.C
new file mode 100644
index 000..6a30efe
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic150.C
@@ -0,0 +1,9 @@
+// PR c++/60219
+// { dg-require-effective-target c++11 }
+
+template<typename..., int> void foo();
+
+void bar()
+{
+  foo<0>;                       // { dg-error "" }
+}
C++ PATCH for c++/60216 (ICE with specialization of deleted template)
We need to propagate DECL_DELETED_FN to clones when we get a new specialization.

Tested x86_64-pc-linux-gnu, applying to trunk and 4.8.

commit eaf1689e134ff4fb364c0045965b19879bff8f32
Author: Jason Merrill <ja...@redhat.com>
Date:   Fri Feb 21 08:47:01 2014 -0500

    PR c++/60216
    * pt.c (register_specialization): Copy DECL_DELETED_FN to clones.
    (check_explicit_specialization): Don't clone.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 0729d93..f07f6e6 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -1440,6 +1440,8 @@ register_specialization (tree spec, tree tmpl, tree args, bool is_friend,
                 = DECL_DECLARED_INLINE_P (fn);
               DECL_SOURCE_LOCATION (clone)
                 = DECL_SOURCE_LOCATION (fn);
+              DECL_DELETED_FN (clone)
+                = DECL_DELETED_FN (fn);
             }
           check_specialization_namespace (tmpl);
 
@@ -2770,15 +2772,16 @@ check_explicit_specialization (tree declarator,
                It's just the name of an instantiation.  But, it's not
                a request for an instantiation, either.  */
             SET_DECL_IMPLICIT_INSTANTIATION (decl);
-          else if (DECL_CONSTRUCTOR_P (decl) || DECL_DESTRUCTOR_P (decl))
-            /* This is indeed a specialization.  In case of constructors
-               and destructors, we need in-charge and not-in-charge
-               versions in V3 ABI.  */
-            clone_function_decl (decl, /*update_method_vec_p=*/0);
 
           /* Register this specialization so that we can find it again.  */
           decl = register_specialization (decl, gen_tmpl, targs, is_friend, 0);
+
+          /* A 'structor should already have clones.  */
+          gcc_assert (decl == error_mark_node
+                      || !(DECL_CONSTRUCTOR_P (decl)
+                           || DECL_DESTRUCTOR_P (decl))
+                      || DECL_CLONED_FUNCTION_P (DECL_CHAIN (decl)));
         }
     }
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/deleted3.C b/gcc/testsuite/g++.dg/cpp0x/deleted3.C
new file mode 100644
index 000..6783677
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/deleted3.C
@@ -0,0 +1,11 @@
+// PR c++/60216
+// { dg-require-effective-target c++11 }
+
+struct A
+{
+  template<typename T> A(T) = delete;
+};
+
+template<> A::A<int>(int) {}
+
+A a(0);
Re: [PATCH][2/2] Fix expansion slowness of PR60291
On Fri, 21 Feb 2014, Richard Biener wrote:

> On Fri, 21 Feb 2014, Richard Biener wrote:
>
> > On Fri, 21 Feb 2014, Richard Sandiford wrote:
> >
> > > In a thread a few years ago you talked about the possibility of
> > > going further and folding the attributes into the MEM itself, so
> > > avoiding the indirection and separate allocation:
> > >
> > > http://thread.gmane.org/gmane.comp.gcc.patches/244464/focus=244538
> > >
> > > (and earlier posts in the thread).  Would that still be OK?
> > > I might have a go if so.
> >
> > It would work for me.  Micha just brought up the easiest incremental
> > change though, which is ...
>
> I am testing the following (and also consider it appropriate as a
> fix for the regression PR60291).
>
> Ok for trunk/branch(es)?

Now we have many variants to choose from ;)

Jakub requested statistics for a bootstrap for this one.  For r207939
and a --enable-languages=c x86_64 bootstrap I get 3609924 mem-attrs
created overall without the patch and 8268976 with the patch (that's
a factor of 2.3 and thus nothing).

Richard.
[PATCH, PR 60266] Fix problem with mixing -O0 and -O2 in propagate_constants_accross_call
Hi,

in propagate_constants_accross_call we expect a thunk to have at least one parameter and thus an ipa-prop parameter descriptor.  However, when the callee comes from a CU that was compiled with -O0, there are no parameter descriptors and we fail an index-checking assert.  This patch fixes it by bailing out early if there are no parameter descriptors, because in that case there is nothing to do in that function anyway.

Bootstrap and testing in progress, OK for trunk if it passes?

Thanks,

Martin

2014-02-21  Martin Jambor  <mjam...@suse.cz>

	PR ipa/60266
	* ipa-cp.c (propagate_constants_accross_call): Bail out early if
	there are no parameter descriptors.

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 7d8bc05..4c9ab12 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -1428,6 +1428,8 @@ propagate_constants_accross_call (struct cgraph_edge *cs)
   args = IPA_EDGE_REF (cs);
   args_count = ipa_get_cs_argument_count (args);
   parms_count = ipa_get_param_count (callee_info);
+  if (parms_count == 0)
+    return false;
 
   /* If this call goes through a thunk we must not propagate to the
      first (0th) parameter.  However, we might need to uncover a thunk
      from below a series
Re: [PATCH, ARM] Support ORN for DImode
On 19/02/14 10:18, Ian Bolton wrote:
> Hi,
>
> Patterns had previously been added to thumb2.md to support ORN, but
> only for SImode.
>
> This patch adds DImode support, to cover the full 64|64->64
> operation and the various 32|64->64 operations (see AND:DI
> variants that use NOT).
>
> The patch comes with its own execution test and looks for the correct
> number of ORN instructions in the assembly.
>
> Regressions passed.
>
> OK for stage 1?

OK.

Do you not also need a pattern for

  (ior:DI (not:DI (reg:DI)) (zero_extend:DI (reg:SI)))
    -> orn (lowpart) + mvn (highpart)

I don't think one works for sign-extension, though.

R.

> 2014-02-19  Ian Bolton  <ian.bol...@arm.com>
>
> gcc/
> 	* config/arm/thumb2.md (*iordi_notdi_di): New pattern.
> 	(*iordi_notzesidi): New pattern.
> 	(*iordi_notsesidi_di): New pattern.
> testsuite/
> 	* gcc.target/arm/iordi_notdi-1.c: New test.
[libstdc++-v3] Move shared_mutex to shared_timed_mutex - late C++14 change (n3891)
These are the patches as applied.

Built and tested x86_64-linux.

2014-02-20  Ed Smith-Rowland  3dw...@verizon.net

	Rename shared_mutex to shared_timed_mutex per C++14 acceptance of N3891.
	* include/std/shared_mutex: Rename shared_mutex to shared_timed_mutex.
	* testsuite/30_threads/shared_lock/locking/2.cc: Ditto.
	* testsuite/30_threads/shared_lock/locking/4.cc: Ditto.
	* testsuite/30_threads/shared_lock/locking/1.cc: Ditto.
	* testsuite/30_threads/shared_lock/locking/3.cc: Ditto.
	* testsuite/30_threads/shared_lock/requirements/
	explicit_instantiation.cc: Ditto.
	* testsuite/30_threads/shared_lock/requirements/typedefs.cc: Ditto.
	* testsuite/30_threads/shared_lock/cons/2.cc: Ditto.
	* testsuite/30_threads/shared_lock/cons/4.cc: Ditto.
	* testsuite/30_threads/shared_lock/cons/1.cc: Ditto.
	* testsuite/30_threads/shared_lock/cons/6.cc: Ditto.
	* testsuite/30_threads/shared_lock/cons/3.cc: Ditto.
	* testsuite/30_threads/shared_lock/cons/5.cc: Ditto.
	* testsuite/30_threads/shared_lock/modifiers/2.cc: Ditto.
	* testsuite/30_threads/shared_lock/modifiers/1.cc: Ditto.
	* testsuite/30_threads/shared_mutex/requirements/
	standard_layout.cc: Ditto.
	* testsuite/30_threads/shared_mutex/cons/copy_neg.cc: Ditto.
	* testsuite/30_threads/shared_mutex/cons/1.cc: Ditto.
	* testsuite/30_threads/shared_mutex/cons/assign_neg.cc: Ditto.
	* testsuite/30_threads/shared_mutex/try_lock/2.cc: Ditto.
	* testsuite/30_threads/shared_mutex/try_lock/1.cc: Ditto.

2014-02-21  Ed Smith-Rowland  3dw...@verizon.net

	Rename testsuite directory shared_mutex to shared_timed_mutex
	for consistency.
	* testsuite/30_threads/shared_mutex: Moved to...
	* testsuite/30_threads/shared_timed_mutex: ...here

Index: include/std/shared_mutex
===
--- include/std/shared_mutex	(revision 207061)
+++ include/std/shared_mutex	(working copy)
@@ -52,8 +52,8 @@
    */
 
 #if defined(_GLIBCXX_HAS_GTHREADS) && defined(_GLIBCXX_USE_C99_STDINT_TR1)
-  /// shared_mutex
-  class shared_mutex
+  /// shared_timed_mutex
+  class shared_timed_mutex
   {
 #if _GTHREAD_USE_MUTEX_TIMEDLOCK
     struct _Mutex : mutex, __timed_mutex_impl<_Mutex>
@@ -84,15 +84,15 @@
     static constexpr unsigned _M_n_readers = ~_S_write_entered;
 
   public:
-    shared_mutex() : _M_state(0) {}
+    shared_timed_mutex() : _M_state(0) {}
 
-    ~shared_mutex()
+    ~shared_timed_mutex()
     {
       _GLIBCXX_DEBUG_ASSERT( _M_state == 0 );
     }
 
-    shared_mutex(const shared_mutex&) = delete;
-    shared_mutex& operator=(const shared_mutex&) = delete;
+    shared_timed_mutex(const shared_timed_mutex&) = delete;
+    shared_timed_mutex& operator=(const shared_timed_mutex&) = delete;
 
     // Exclusive ownership
 
Index: testsuite/30_threads/shared_lock/locking/2.cc
===
--- testsuite/30_threads/shared_lock/locking/2.cc	(revision 205961)
+++ testsuite/30_threads/shared_lock/locking/2.cc	(working copy)
@@ -30,7 +30,7 @@
 void test01()
 {
   bool test __attribute__((unused)) = true;
-  typedef std::shared_mutex mutex_type;
+  typedef std::shared_timed_mutex mutex_type;
   typedef std::shared_lock<mutex_type> lock_type;
 
   try
@@ -66,7 +66,7 @@
 void test02()
 {
   bool test __attribute__((unused)) = true;
-  typedef std::shared_mutex mutex_type;
+  typedef std::shared_timed_mutex mutex_type;
   typedef std::shared_lock<mutex_type> lock_type;
 
   try
Index: testsuite/30_threads/shared_lock/locking/4.cc
===
--- testsuite/30_threads/shared_lock/locking/4.cc	(revision 205961)
+++ testsuite/30_threads/shared_lock/locking/4.cc	(working copy)
@@ -31,7 +31,7 @@
 int main()
 {
   bool test __attribute__((unused)) = true;
-  typedef std::shared_mutex mutex_type;
+  typedef std::shared_timed_mutex mutex_type;
   typedef std::shared_lock<mutex_type> lock_type;
   typedef std::chrono::system_clock clock_type;
 
Index: testsuite/30_threads/shared_lock/locking/1.cc
===
--- testsuite/30_threads/shared_lock/locking/1.cc	(revision 205961)
+++ testsuite/30_threads/shared_lock/locking/1.cc	(working copy)
@@ -30,7 +30,7 @@
 int main()
 {
   bool test __attribute__((unused)) = true;
-  typedef std::shared_mutex mutex_type;
+  typedef std::shared_timed_mutex mutex_type;
   typedef std::shared_lock<mutex_type> lock_type;
 
   try
Index: testsuite/30_threads/shared_lock/locking/3.cc
===
--- testsuite/30_threads/shared_lock/locking/3.cc	(revision 205961)
+++ testsuite/30_threads/shared_lock/locking/3.cc	(working copy)
@@ -31,7 +31,7 @@
 int main()
 {
   bool test
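For user code the change is purely a rename. As a hedged, single-threaded sketch (C++14 or later, assuming a standard library that ships `std::shared_timed_mutex`), the function below takes shared ownership, releases it, and then uses one of the timed members that the new name refers to; `shared_then_timed_exclusive` is a made-up name.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <shared_mutex>

// Single-threaded smoke test of the renamed type.
bool shared_then_timed_exclusive() {
  std::shared_timed_mutex m;
  {
    std::shared_lock<std::shared_timed_mutex> reader(m);  // lock_shared()
    if (!reader.owns_lock())
      return false;
  }  // unlock_shared() runs here
  // unique_lock's (mutex, duration) constructor calls try_lock_for().
  std::unique_lock<std::shared_timed_mutex> writer(m, std::chrono::milliseconds(10));
  return writer.owns_lock();
}
```

Code that spelled the type as `std::shared_mutex` against this libstdc++ only needs the type name updated; the locking calls stay the same.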
C++ PATCH for c++/60051 (ICE deducing array)
This patch benefits from the discussion of array deduction at last week's C++ standardization committee meeting, where we clarified that we should only try to deduce the array bound from an initializer-list if the array bound is deducible, i.e. if it's a non-type template parameter. We also should avoid crashing on a 0-length init-list, which would result in an invalid 0-length array.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit 8fc69de2c377470b3ae9a8ebc65b0909d626d6e3
Author: Jason Merrill  ja...@redhat.com
Date:   Fri Feb 21 00:16:52 2014 -0500

	DR 1591
	PR c++/60051
	* pt.c (unify): Only unify if deducible.  Handle 0-length list.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 4cf387a..0f576a5 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -17262,14 +17262,16 @@ unify (tree tparms, tree targs, tree parm, tree arg, int strict,
 					    explain_p);
 	}
 
-      if (TREE_CODE (parm) == ARRAY_TYPE)
+      if (TREE_CODE (parm) == ARRAY_TYPE
+	  && deducible_array_bound (TYPE_DOMAIN (parm)))
 	{
 	  /* Also deduce from the length of the initializer list.  */
 	  tree max = size_int (CONSTRUCTOR_NELTS (arg));
 	  tree idx = compute_array_index_type (NULL_TREE, max, tf_none);
-	  if (TYPE_DOMAIN (parm) != NULL_TREE)
-	    return unify_array_domain (tparms, targs, TYPE_DOMAIN (parm),
-				       idx, explain_p);
+	  if (idx == error_mark_node)
+	    return unify_invalid (explain_p);
+	  return unify_array_domain (tparms, targs, TYPE_DOMAIN (parm),
+				     idx, explain_p);
 	}
 
       /* If the std::initializer_list<T> deduction worked, replace the
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist80.C b/gcc/testsuite/g++.dg/cpp0x/initlist80.C
new file mode 100644
index 000..7947f1f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist80.C
@@ -0,0 +1,6 @@
+// PR c++/60051
+// { dg-require-effective-target c++11 }
+
+#include <initializer_list>
+
+auto x[2] = {};			// { dg-error "" }
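A small user-level illustration of the rule described above, assuming a compiler that implements DR 1591 (as GCC does with this patch); `list_length` and `first_of_pair` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>

// N is a non-type template parameter, so the array bound is deducible:
// N comes from the length of the braced list, T from its elements.
template <typename T, std::size_t N>
std::size_t list_length(const T (&)[N]) { return N; }

// Here the bound is fixed at 2, so nothing is deduced from the list
// length; T is still deduced from the elements.
template <typename T>
T first_of_pair(const T (&a)[2]) { return a[0]; }
```

An empty list, as in `list_length({})`, leaves nothing to deduce `T` or `N` from and does not compile.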
Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation
2014-02-20 22:27 GMT+04:00 Bernd Schmidt ber...@codesourcery.com:

There were still a number of things in these patches that did not make sense to me and which I've changed. Let me know if there was a good reason for the way some of these things were originally done.

* Functions and variables now go into different tables, otherwise intermixing between them could be a problem that causes tables to go out of sync between host and target (imagine one big table being generated by ptx lto1/mkoffload, and multiple small table fragments being linked together on the host side).

What do you mean by multiple small table fragments? The tables from every object file should be joined together while linking the DSO, in the same order for both host and target. If you need to join tables from multiple target images into one big table, the host tables should also be joined in the same order. In our case we obtain each target table while loading the image onto the target device, and merge it with the corresponding host table. How will splitting functions and global vars into 2 tables help to avoid intermixing?

* Is there a reason to call a register function for the host tables? The way I've set it up, we register a target function/variable table while also passing a pointer to the __OPENMP_TARGET__ symbol which holds information about the host side tables.

Suppose there is liba, that depends on libb, that depends on libc. The corresponding target image tgtimga also depends on tgtimgb, that depends on tgtimgc. When liba is going to start an offloaded function, it calls GOMP_target with a pointer to its descriptor, which contains a pointer to tgtimga. But how does GOMP_target know that it should also load tgtimgb and tgtimgc onto the target? And where would it get their descriptors from? That's why we have added host-side DSO registration. In this example they are loaded on the host in the following order: libc, libb, liba. They are registered in libgomp in the same order, and loaded onto the target device during initialization in that order. The tables received from the target are merged with the host tables from the descriptors in the same order as well.

I'm appending those parts of my current patch kit that seem relevant. This includes the ptx mkoffload tool and a patch to make a dummy GOMP_offload_register function. Most of the others are updated versions of patches I've posted before, and two adapted from Michael Zolotukhin's set (automatically generated files not included in the diffs for size reasons). How does this look?

I will take a closer look at your changes, try to run them, and send feedback next week.

-- Ilya
Re: [PATCH][i386][AVX512] Match latest spec. Add CPUID prefetchwt1.
Latest version of AVX512 spec
http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
has a few changes.

1) PREFETCHWT1 instruction now has separate CPUID bit PREFETCHWT1.

We can either support new CPUID or disable PREFETCHWT1 from generating, without removing code, and enable it in 4.9.1/latest version. I am not sure that adding new -m flag and related stuff this late is a good idea. Should we still add it?

Please submit the patch anyway. We can relax release constraints on non-algorithmic patch a bit, weighing in the benefits of having a gcc release that fully conforms to some published specification.

The patch below adds the -mprefetchwt1 flag and the corresponding TARGET_PREFETCHWT1, and uses them for the prefetchwt1 instruction. Bootstraps/passes testing. Ok for trunk?

ChangeLog:

2014-02-21  Ilya Tocar  ilya.to...@intel.com

	* common/config/i386/i386-common.c
	(OPTION_MASK_ISA_PREFETCHWT1_SET),
	(OPTION_MASK_ISA_PREFETCHWT1_UNSET): New.
	(ix86_handle_option): Handle OPT_mprefetchwt1.
	* config/i386/cpuid.h (bit_PREFETCHWT1): New.
	* config/i386/driver-i386.c (host_detect_local_cpu): Detect
	PREFETCHWT1 CPUID.
	* config/i386/i386-c.c (ix86_target_macros_internal): Handle
	OPTION_MASK_ISA_PREFETCHWT1.
	* config/i386/i386.c (ix86_target_string): Handle mprefetchwt1.
	(PTA_PREFETCHWT1): New.
	(ix86_option_override_internal): Handle PTA_PREFETCHWT1.
	(ix86_valid_target_attribute_inner_p): Handle OPT_mprefetchwt1.
	* config/i386/i386.h (TARGET_PREFETCHWT1),
	(TARGET_PREFETCHWT1_P): New.
	* config/i386/i386.md (prefetch): Check TARGET_PREFETCHWT1
	(*prefetch_avx512pf_mode_: Change into ...
	(*prefetch_prefetchwt1_mode: This.
	* config/i386/i386.opt (mprefetchwt1): New.
	* config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET1.
	(_mm_prefetch): Handle intent to write.
	* doc/invoke.texi (mprefetchwt1), (mno-prefetchwt1): Document.

And for tests:

2014-02-22  Ilya Tocar  ilya.to...@intel.com

	* gcc.target/i386/avx-1.c: Update __builtin_prefetch.
	* gcc.target/i386/prefetchwt1-1.c: New.
	* gcc.target/i386/sse-13.c: Update __builtin_prefetch.
	* gcc.target/i386/sse-23.c: Ditto.

---
 gcc/common/config/i386/i386-common.c          | 15 +++
 gcc/config/i386/cpuid.h                       |  4
 gcc/config/i386/driver-i386.c                 |  7 +--
 gcc/config/i386/i386-c.c                      |  2 ++
 gcc/config/i386/i386.c                        |  6 ++
 gcc/config/i386/i386.h                        |  2 ++
 gcc/config/i386/i386.md                       | 13 ++---
 gcc/config/i386/i386.opt                      |  4
 gcc/config/i386/xmmintrin.h                   |  6 --
 gcc/doc/invoke.texi                           |  4 +++-
 gcc/testsuite/gcc.target/i386/avx-1.c         |  2 +-
 gcc/testsuite/gcc.target/i386/prefetchwt1-1.c | 14 ++
 gcc/testsuite/gcc.target/i386/sse-13.c        |  2 +-
 gcc/testsuite/gcc.target/i386/sse-23.c        |  2 +-
 14 files changed, 68 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/prefetchwt1-1.c

diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index b7f9ff6..a6ab555 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -69,6 +69,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_PRFCHW_SET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_SET OPTION_MASK_ISA_RDSEED
 #define OPTION_MASK_ISA_ADX_SET OPTION_MASK_ISA_ADX
+#define OPTION_MASK_ISA_PREFETCHWT1_SET OPTION_MASK_ISA_PREFETCHWT1
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
    as -msse4.2.  */
@@ -154,6 +155,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_PRFCHW_UNSET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_UNSET OPTION_MASK_ISA_RDSEED
 #define OPTION_MASK_ISA_ADX_UNSET OPTION_MASK_ISA_ADX
+#define OPTION_MASK_ISA_PREFETCHWT1_UNSET OPTION_MASK_ISA_PREFETCHWT1
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
    as -mno-sse4.1. */
@@ -757,6 +759,19 @@ ix86_handle_option (struct gcc_options *opts,
 	}
       return true;
 
+    case OPT_mprefetchwt1:
+      if (value)
+	{
+	  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_PREFETCHWT1_SET;
+	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_PREFETCHWT1_SET;
+	}
+      else
+	{
+	  opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_PREFETCHWT1_UNSET;
+	  opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_PREFETCHWT1_UNSET;
+	}
+      return true;
+
   /* Comes from final.c -- no real reason to change it.  */
 #define MAX_CODE_ALIGN 16
 
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index c7a53dd..8c323ae 100644
---
Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation
On 02/21/2014 04:17 PM, Ilya Verbin wrote:

2014-02-20 22:27 GMT+04:00 Bernd Schmidt ber...@codesourcery.com:

There were still a number of things in these patches that did not make sense to me and which I've changed. Let me know if there was a good reason for the way some of these things were originally done.

* Functions and variables now go into different tables, otherwise intermixing between them could be a problem that causes tables to go out of sync between host and target (imagine one big table being generated by ptx lto1/mkoffload, and multiple small table fragments being linked together on the host side).

What do you mean by multiple small table fragments?

Well, suppose you have file1.o and file2.o compiled for the host with a .offload_func_table_section in each, and they get linked together; each provides a fragment of the whole table.

The tables from every object file should be joined together while linking DSO in the same order for both host and target. If you need to join tables from multiple target images into one big table, the host tables also should be joined in the same order.

The problem is that ptx does not have a linker, so we cannot exactly reproduce what happens on the host side. We have to process all host .o files in one single invocation of ptx lto1, and produce a single ptx assembly file, with a single function/variable table, from there. Having functions and variables separated gives us at least a small chance that the order will match that found in the host tables if the host table is produced by linking multiple fragments.

Suppose there is liba, that depends on libb, that depends on libc.

What kind of dependencies between liba and libb do you expect to be able to support on the target side? References to each other's functions and variables?

Bernd
Re: [PATCH] Fix PR c++/60065.
On 02/21/2014 03:19 AM, Adam Butcher wrote:

A class template with an out-of-line generic function definition will give the same issue I think:

  template <typename T>
  void A<T>::f(auto x) {} // should inject a new list

Right. template_class_depth should be useful here. This is basically the same question as whether a particular member function is a primary template (member template) or not, but figuring it out in the middle of the parameter list complicates things.

Once it's resolved I think it'd be useful to create a new function to determine this rather than doing the scope walk in a number of places. Something like 'templ_parm_scope_for_fn_being_declared' --- or hopefully some more elegant name!

Right.

Why doesn't num_template_parameter_lists work as a predicate here?

It works in the lambda case as it is updated there, but for generic functions I think the following prevents it:

cp/parser.c:17063:
  /* Inside the function parameter list, surrounding
     template-parameter-lists do not apply.  */
  saved_num_template_parameter_lists
    = parser->num_template_parameter_lists;
  parser->num_template_parameter_lists = 0;

Hmm, I wonder what that's for? What breaks when you remove it? :)

Jason
[jit] New API entrypoint: gcc_jit_context_dump_to_file
Committed to branch dmalcolm/jit:

Add a new gcc_jit_context_dump_to_file, which dumps a C-like representation of the context's IR to a given path.

There is also a flag update_locations, which, when true, will set up gcc_jit_location information throughout the context, pointing at the dump file as if it were a source file.

I've been using this in conjunction with GCC_JIT_BOOL_OPTION_DEBUGINFO to step through generated code in the debugger (when trying to debug my port of GNU Octave's JIT to libgccjit).

gcc/jit/
	* libgccjit.h (gcc_jit_context_dump_to_file): New.
	* libgccjit.map (gcc_jit_context_dump_to_file): New.
	* libgccjit.c (gcc_jit_context_dump_to_file): New.
	* libgccjit++.h (gccjit::context::dump_to_file): New.
	* internal-api.h (gcc::jit::dump): New class.
	(gcc::jit::recording::playback_location): Add a replayer argument,
	so that playback locations can be created before playback statements.
	(gcc::jit::recording::location::playback_location): Likewise.
	(gcc::jit::recording::statement::playback_location): Likewise.
	(gcc::jit::recording::context::dump_to_file): New.
	(gcc::jit::recording::context::m_structs): New field, for use by
	dump_to_file.
	(gcc::jit::recording::context::m_functions): Likewise.
	(gcc::jit::recording::memento::write_to_dump): New virtual function.
	(gcc::jit::recording::field::write_to_dump): New.
	(gcc::jit::recording::fields::write_to_dump): New.
	(gcc::jit::recording::function::write_to_dump): New.
	(gcc::jit::recording::function::m_locals): New field for use by
	write_to_dump.
	(gcc::jit::recording::function::m_activity): Likewise.
	(gcc::jit::recording::local::write_to_dump): New.
	(gcc::jit::recording::statement::write_to_dump): New.
	(gcc::jit::recording::place_label::write_to_dump): New.
	* internal-api.c (gcc::jit::dump::dump): New.
	(gcc::jit::dump::~dump): New.
	(gcc::jit::dump::write): New.
	(gcc::jit::dump::make_location): New.
	(gcc::jit::recording::playback_location): Add a replayer argument,
	so that playback locations can be created before playback statements.
	(gcc::jit::recording::context::context): Initialize new fields.
	(gcc::jit::recording::function::function): Likewise.
	(gcc::jit::recording::context::new_struct_type): Add struct to the
	context's m_structs vector.
	(gcc::jit::recording::context::new_function): Add function to the
	context's m_functions vector.
	(gcc::jit::recording::context::dump_to_file): New.
	(gcc::jit::recording::memento::write_to_dump): New.
	(gcc::jit::recording::field::write_to_dump): New.
	(gcc::jit::recording::fields::write_to_dump): New.
	(gcc::jit::recording::function::write_to_dump): New.
	(gcc::jit::recording::local::write_to_dump): New.
	(gcc::jit::recording::statement::write_to_dump): New.
	(gcc::jit::recording::place_label::write_to_dump): New.
	(gcc::jit::recording::array_type::replay_into): Pass on replayer to
	call to playback_location.
	(gcc::jit::recording::field::replay_into): Likewise.
	(gcc::jit::recording::struct_::replay_into): Likewise.
	(gcc::jit::recording::param::replay_into): Likewise.
	(gcc::jit::recording::function::replay_into): Likewise.
	(gcc::jit::recording::global::replay_into): Likewise.
	(gcc::jit::recording::unary_op::replay_into): Likewise.
	(gcc::jit::recording::binary_op::replay_into): Likewise.
	(gcc::jit::recording::comparison::replay_into): Likewise.
	(gcc::jit::recording::call::replay_into): Likewise.
	(gcc::jit::recording::array_access::replay_into): Likewise.
	(gcc::jit::recording::access_field_of_lvalue::replay_into): Likewise.
	(gcc::jit::recording::access_field_rvalue::replay_into): Likewise.
	(gcc::jit::recording::dereference_field_rvalue::replay_into): Likewise.
	(gcc::jit::recording::dereference_rvalue::replay_into): Likewise.
	(gcc::jit::recording::get_address_of_lvalue::replay_into): Likewise.
	(gcc::jit::recording::local::replay_into): Likewise.
	(gcc::jit::recording::eval::replay_into): Likewise.
	(gcc::jit::recording::assignment::replay_into): Likewise.
	(gcc::jit::recording::assignment_op::replay_into): Likewise.
	(gcc::jit::recording::comment::replay_into): Likewise.
	(gcc::jit::recording::conditional::replay_into): Likewise.
	(gcc::jit::recording::place_label::replay_into): Likewise.
	(gcc::jit::recording::jump::replay_into): Likewise.
	(gcc::jit::recording::return_::replay_into): Likewise.
	(gcc::jit::recording::loop::replay_into): Likewise.
	(gcc::jit::recording::loop_end::replay_into): Likewise.
	(gcc::jit::recording::function::new_local): Add to the function's
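One way to picture what `update_locations` needs: while the dump is written, the writer tracks the current line and column, and a location taken at that point refers into the dump file as if it were source. The class below is a hypothetical sketch of that bookkeeping (`dump_writer`, `dump_location` and `dump_small_function` are invented names), not the `gcc::jit::dump` implementation.

```cpp
#include <cassert>
#include <string>

// A position in the dump file, 1-based like compiler diagnostics.
struct dump_location { int line; int column; };

class dump_writer {
 public:
  // Location of the next character to be written.
  dump_location make_location() const { return {line_, column_}; }

  // Appends text and advances the line/column counters.
  void write(const std::string &text) {
    for (char c : text) {
      buf_ += c;
      if (c == '\n') {
        ++line_;
        column_ = 1;
      } else {
        ++column_;
      }
    }
  }

  const std::string &contents() const { return buf_; }

 private:
  std::string buf_;
  int line_ = 1;
  int column_ = 1;
};

// Writes a small function and returns the location of its body statement.
dump_location dump_small_function() {
  dump_writer d;
  d.write("int square (int x)\n");
  d.write("{\n  ");
  dump_location body = d.make_location();  // points at "return"
  d.write("return x * x;\n}\n");
  return body;
}
```

A debugger given such locations (together with debug info) can then step through the dump file line by line.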
[Patch, AArch64] Fix shuffle for big-endian.
Hi,

When a shuffle of more than one input happens, on NEON we end up with a 'mixed-endian' format in the register list which TBL operates on. We don't make this correction in RTL and therefore the shuffle operation gets it wrong. Here is a patch that fixes up the index table in the selector rtx in RTL to also be mixed-endian, to reflect what's happening on NEON.

As trunk stands, this patch will not be exercised, as constant vector permute for big-endian is disabled. I've tested this by locally enabling const vec_perm and it fixes some regressions we have on big-endian:

aarch64_be-none-elf:
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -fomit-frame-pointer
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -fomit-frame-pointer -funroll-loops
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution, -O3 -g
FAIL->PASS: gcc.dg/torture/vector-shuffle1.c -O0 execution test
FAIL->PASS: gcc.dg/torture/vshuf-v16qi.c -O2 execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2df.c -O2 execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2di.c -O2 execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2sf.c -O2 execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2si.c -O2 execution test
FAIL->PASS: gcc.dg/torture/vshuf-v4sf.c -O2 execution test
FAIL->PASS: gcc.dg/torture/vshuf-v4si.c -O2 execution test
FAIL->PASS: gcc.dg/torture/vshuf-v8hi.c -O2 execution test
FAIL->PASS: gcc.dg/torture/vshuf-v8qi.c -O2 execution test
FAIL->PASS: gcc.dg/vect/vect-114.c -flto -ffat-lto-objects execution test
FAIL->PASS: gcc.dg/vect/vect-114.c execution test
FAIL->PASS: gcc.dg/vect/vect-15.c -flto -ffat-lto-objects execution test
FAIL->PASS: gcc.dg/vect/vect-15.c execution test

Also regression-tested on aarch64-none-elf.

OK for stage-1?

Thanks,
Tejas.
2014-02-21  Tejas Belagod  tejas.bela...@arm.com

gcc/
	* config/aarch64/aarch64.c (aarch64_evpc_tbl): Fix index vector for
	big-endian when dealing with more than one input shuffle vector.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ea90311..fd473a3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8128,7 +8128,28 @@ aarch64_evpc_tbl (struct expand_vec_perm_d *d)
     return false;
 
   for (i = 0; i < nelt; ++i)
-    rperm[i] = GEN_INT (d->perm[i]);
+    {
+      int nunits = GET_MODE_NUNITS (vmode);
+      int elt = d->perm[i];
+
+      /* If two vectors, we end up with a wierd mixed-endian mode on NEON.  */
+      if (BYTES_BIG_ENDIAN)
+	{
+	  if (!d->one_vector_p && d->perm[i] & nunits)
+	    {
+	      /* Extract the offset.  */
+	      elt = d->perm[i] & (nunits - 1);
+	      /* Reverse the top half.  */
+	      elt = nunits - 1 - elt;
+	      /* Offset it by the bottom half.  */
+	      elt += nunits;
+	    }
+	  else
+	    elt = nunits - 1 - d->perm[i];
+	}
+
+      rperm[i] = GEN_INT (elt);
+    }
 
   sel = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, rperm));
   sel = force_reg (vmode, sel);
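The index arithmetic in the patch can be checked in isolation. Below it is transcribed into a standalone function; the name `be_tbl_index` and its parameters are made up, and `perm` is assumed to lie in [0, 2 * nunits) with `nunits` a power of two, so `perm & nunits` is non-zero exactly for lanes of the second input.

```cpp
#include <cassert>

// Maps a selector index to the TBL index used on NEON.
int be_tbl_index(int perm, int nunits, bool one_vector, bool big_endian) {
  int elt = perm;
  if (big_endian) {
    if (!one_vector && (perm & nunits)) {
      elt = perm & (nunits - 1);  // offset within the second input
      elt = nunits - 1 - elt;     // lanes are reversed within that register
      elt += nunits;              // but it still follows the first register
    } else {
      elt = nunits - 1 - perm;    // lane reversed within the first register
    }
  }
  return elt;
}
```

For V4SI on big-endian, lane 0 of the first input maps to TBL index 3 and lane 1 of the second input (selector value 5) maps to 6: each register is reversed on its own, but the register order in the table is unchanged.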
Re: [PATCH] Bound number of recursive compute_control_dep_chain calls with a param (PR tree-optimization/56490)
thanks for the fix!

David

On Fri, Feb 21, 2014 at 12:21 AM, Jakub Jelinek ja...@redhat.com wrote:

Hi!

As discussed in the PR, on larger functions we can end up with over 3 million nested compute_control_dep_chain calls from a single compute_control_dep_chain call, and on that testcase all that effort is spent just to get zero or at most one (useless) control dep path. The problem is that the function is really unbounded: even with the 6 element path length limitation (recursion depth) and the limit of 8 find_pdom calls, everything still iterates on all the successor edges at each level. And the function is often called on the same basic block again and again, even at a particular depth level (e.g. over 20 times for the same bb at the same depth level). But the preceding edge list is slightly different in each case, and in theory it could give different answers.

Fixed by bounding the total number of nested calls. Additionally, I've made a couple of cleanups: heap allocating an 8 field array instead of using an automatic array makes no sense, the chain length is at most 6 and thus we can use a stack vector, etc.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-02-21  Jakub Jelinek  ja...@redhat.com

	PR tree-optimization/56490
	* params.def (PARAM_UNINIT_CONTROL_DEP_ATTEMPTS): New param.
	* tree-ssa-uninit.c: Include params.h.
	(compute_control_dep_chain): Add num_calls argument, return false
	if it exceeds PARAM_UNINIT_CONTROL_DEP_ATTEMPTS param, pass
	num_calls to recursive call.
	(find_predicates): Change dep_chain into normal array,
	cur_chain into auto_vec<edge, MAX_CHAIN_LEN + 1>, add num_calls
	variable and adjust compute_control_dep_chain caller.
	(find_def_preds): Likewise.
--- gcc/params.def.jj	2014-01-09 19:09:47.0 +0100
+++ gcc/params.def	2014-02-20 19:30:37.467597338 +0100
@@ -1078,6 +1078,12 @@ DEFPARAM (PARAM_ASAN_USE_AFTER_RETURN,
 	  "asan-use-after-return",
 	  "Enable asan builtin functions protection",
 	  1, 0, 1)
+
+DEFPARAM (PARAM_UNINIT_CONTROL_DEP_ATTEMPTS,
+	  "uninit-control-dep-attempts",
+	  "Maximum number of nested calls to search for control dependencies "
+	  "during uninitialized variable analysis",
+	  1000, 1, 0)
 /*
 
 Local variables:
--- gcc/tree-ssa-uninit.c.jj	2014-02-04 01:35:58.0 +0100
+++ gcc/tree-ssa-uninit.c	2014-02-20 19:31:14.198385817 +0100
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.
 #include "hashtab.h"
 #include "tree-pass.h"
 #include "diagnostic-core.h"
+#include "params.h"
 
 /* This implements the pass that does predicate aware warning on uses of
    possibly uninitialized variables. The pass first collects the set of
@@ -390,8 +391,8 @@ find_control_equiv_block (basic_block bb
 
 /* Computes the control dependence chains (paths of edges)
    for DEP_BB up to the dominating basic block BB (the head node of a
-   chain should be dominated by it).  CD_CHAINS is pointer to a
-   dynamic array holding the result chains. CUR_CD_CHAIN is the current
+   chain should be dominated by it).  CD_CHAINS is pointer to an
+   array holding the result chains. CUR_CD_CHAIN is the current
    chain being computed.  *NUM_CHAINS is total number of chains.  The
    function returns true if the information is successfully computed,
    return false if there is no control dependence or not computed.  */
@@ -400,7 +401,8 @@ static bool
 compute_control_dep_chain (basic_block bb, basic_block dep_bb,
                            vec<edge> *cd_chains,
                            size_t *num_chains,
-                           vec<edge> *cur_cd_chain)
+                           vec<edge> *cur_cd_chain,
+                           int *num_calls)
 {
   edge_iterator ei;
   edge e;
@@ -411,6 +413,10 @@ compute_control_dep_chain (basic_block b
   if (EDGE_COUNT (bb->succs) < 2)
     return false;
 
+  if (*num_calls > PARAM_VALUE (PARAM_UNINIT_CONTROL_DEP_ATTEMPTS))
+    return false;
+  ++*num_calls;
+
   /* Could use a set instead.  */
   cur_chain_len = cur_cd_chain->length ();
   if (cur_chain_len > MAX_CHAIN_LEN)
@@ -450,7 +456,7 @@ compute_control_dep_chain (basic_block b
 
           /* Now check if DEP_BB is indirectly control dependent on BB.  */
           if (compute_control_dep_chain (cd_bb, dep_bb, cd_chains,
-                                         num_chains, cur_cd_chain))
+                                         num_chains, cur_cd_chain, num_calls))
             {
               found_cd_chain = true;
               break;
@@ -595,14 +601,12 @@ find_predicates (pred_chain_union *preds
                  basic_block use_bb)
 {
   size_t num_chains = 0, i;
-  vec<edge> *dep_chains = 0;
-  vec<edge> cur_chain = vNULL;
+  int num_calls = 0;
+  vec<edge> dep_chains[MAX_NUM_CHAINS];
+  auto_vec<edge, MAX_CHAIN_LEN + 1> cur_chain;
   bool has_valid_pred =
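The shape of the fix, a recursion budget shared across all nested calls in addition to the depth limit, can be sketched on a toy CFG. This is a simplified model, not tree-ssa-uninit.c: `cfg`, `find_chain`, `chain_found`, `line_cfg` and `kMaxChainLen` are hypothetical, and the real function also skips blocks with fewer than two successors and records the chains it finds rather than just reporting whether one exists.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CFG: succs[b] lists the successor blocks of block b.
using cfg = std::vector<std::vector<int>>;

const std::size_t kMaxChainLen = 5;  // stand-in for MAX_CHAIN_LEN

// Depth-first search for a path from bb to dep_bb. Depth is bounded by
// kMaxChainLen; total work is bounded by max_calls, counted in *num_calls
// across the whole recursion.
static bool find_chain(const cfg &g, int bb, int dep_bb,
                       std::vector<int> &chain, int *num_calls,
                       int max_calls) {
  if (*num_calls > max_calls)
    return false;  // budget exhausted: give up instead of searching on
  ++*num_calls;
  if (chain.size() > kMaxChainLen)
    return false;  // path already too long
  for (int succ : g[bb]) {
    chain.push_back(succ);
    if (succ == dep_bb
        || find_chain(g, succ, dep_bb, chain, num_calls, max_calls))
      return true;
    chain.pop_back();
  }
  return false;
}

// True if a chain from FROM to TO was found within the call budget.
bool chain_found(const cfg &g, int from, int to, int max_calls) {
  std::vector<int> chain;
  int num_calls = 0;
  return find_chain(g, from, to, chain, &num_calls, max_calls);
}

// Straight-line CFG 0 -> 1 -> ... -> n-1.
cfg line_cfg(int n) {
  cfg g(n);
  for (int b = 0; b + 1 < n; ++b)
    g[b].push_back(b + 1);
  return g;
}
```

Because `*num_calls` is shared, the budget caps the total work of one top-level query however the recursion fans out, which is what keeps the pathological testcase in the PR bounded.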
Re: [PATCH] Fix PR 60268
On 2/21/2014, 2:22 AM, Andrey Belevantsev wrote:

Hello,

While fixing PR 58960 I forgot about single-block regions and placed the initialization of the new nr_regions_initial variable in the wrong place. Thus for single-block regions we ended up with nr_regions = 1 and nr_regions_initial = 0, which effectively turned off sched-pressure immediately. That is harmless on the usual scheduling path, but with -flive-range-shrinkage it broke an assert that sched-pressure is in the specific mode.

Fixed by placing the initialization properly at the end of sched_rgn_init, and also by moving the check for sched_pressure != NONE outside of the if statement in schedule_region, as discussed in the PR trail with Jakub.

Bootstrapped and tested on x86-64, ok?

Ok. Thanks, Andrey.

2014-02-21  Andrey Belevantsev  a...@ispras.ru

	PR rtl-optimization/60268
	* sched-rgn.c (haifa_find_rgns): Move the nr_regions_initial init to ...
	(sched_rgn_init) ... here.
	(schedule_region): Check for SCHED_PRESSURE_NONE earlier.

testsuite/
2014-02-21  Andrey Belevantsev  a...@ispras.ru

	PR rtl-optimization/60268
	* gcc.c-torture/compile/pr60268.c: New test.
Re: [PATCH, PR 60266] Fix problem with mixing -O0 and -O2 in propagate_constants_accross_call
Hi,

in propagate_constants_accross_call we expect a thunk to have at least one parameter and thus an ipa-prop parameter descriptor. However, when the callee comes from a CU that was compiled with -O0, there are no parameter descriptors and we fail an index checking assert.

This patch fixes it by bailing out early if there are no parameter descriptors, because in that case there is nothing to do in that function anyway.

Bootstrap and testing in progress, OK for trunk if it passes?

Thanks,

Martin

2014-02-21  Martin Jambor  mjam...@suse.cz

	PR ipa/60266
	* ipa-cp.c (propagate_constants_accross_call): Bail out early if
	there are no parameter descriptors.

Actually I have had a similar patch in my tree for a few days, since I hit the problem while building libreoffice.

OK.

Honza

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 7d8bc05..4c9ab12 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -1428,6 +1428,8 @@ propagate_constants_accross_call (struct cgraph_edge *cs)
   args = IPA_EDGE_REF (cs);
   args_count = ipa_get_cs_argument_count (args);
   parms_count = ipa_get_param_count (callee_info);
+  if (parms_count == 0)
+    return false;
 
   /* If this call goes through a thunk we must not propagate to the first (0th)
      parameter.  However, we might need to uncover a thunk from below a series
Re: [PATCH][i386][AVX512] Match latest spec. Add CPUID prefetchwt1.
On Fri, Feb 21, 2014 at 4:25 PM, Ilya Tocar tocarip.in...@gmail.com wrote:

Latest version of AVX512 spec
http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
has a few changes.

1) PREFETCHWT1 instruction now has separate CPUID bit PREFETCHWT1.

We can either support new CPUID or disable PREFETCHWT1 from generating, without removing code, and enable it in 4.9.1/latest version. I am not sure that adding new -m flag and related stuff this late is a good idea. Should we still add it?

Please submit the patch anyway. We can relax release constraints on non-algorithmic patch a bit, weighing in the benefits of having a gcc release that fully conforms to some published specification.

The patch below adds the -mprefetchwt1 flag and the corresponding TARGET_PREFETCHWT1, and uses them for the prefetchwt1 instruction. Bootstraps/passes testing. Ok for trunk?

ChangeLog:

2014-02-21  Ilya Tocar  ilya.to...@intel.com

	* common/config/i386/i386-common.c
	(OPTION_MASK_ISA_PREFETCHWT1_SET),
	(OPTION_MASK_ISA_PREFETCHWT1_UNSET): New.
	(ix86_handle_option): Handle OPT_mprefetchwt1.
	* config/i386/cpuid.h (bit_PREFETCHWT1): New.
	* config/i386/driver-i386.c (host_detect_local_cpu): Detect
	PREFETCHWT1 CPUID.
	* config/i386/i386-c.c (ix86_target_macros_internal): Handle
	OPTION_MASK_ISA_PREFETCHWT1.
	* config/i386/i386.c (ix86_target_string): Handle mprefetchwt1.
	(PTA_PREFETCHWT1): New.
	(ix86_option_override_internal): Handle PTA_PREFETCHWT1.
	(ix86_valid_target_attribute_inner_p): Handle OPT_mprefetchwt1.
	* config/i386/i386.h (TARGET_PREFETCHWT1),
	(TARGET_PREFETCHWT1_P): New.
	* config/i386/i386.md (prefetch): Check TARGET_PREFETCHWT1
	(*prefetch_avx512pf_mode_: Change into ...
	(*prefetch_prefetchwt1_mode: This.
	* config/i386/i386.opt (mprefetchwt1): New.
	* config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET1.
	(_mm_prefetch): Handle intent to write.
	* doc/invoke.texi (mprefetchwt1), (mno-prefetchwt1): Document.
And for tests: 2014-02-22 Ilya Tocar ilya.to...@intel.com * gcc.target/i386/avx-1.c: Update __builtin_prefetch. * gcc.target/i386/prefetchwt1-1.c: New. * gcc.target/i386/sse-13.c: Update __builtin_prefetch. * gcc.target/i386/sse-23.c: Ditto. Please also add the new switch to gcc.target/i386/sse-{12,13,14}.c and g++.dg/other/i386-{2,3}, and the new options to gcc.target/i386/sse-{22,23}.c. Please re-test with the new additions and repost the patch. @@ -17867,8 +17867,8 @@ supported by SSE counterpart or the SSE prefetch is not available (K6 machines). Otherwise use SSE prefetch as it allows specifying of locality. */ - if (TARGET_AVX512PF && write) -operands[2] = const1_rtx; + if (TARGET_PREFETCHWT1 && write) +operands[2] = GEN_INT (2); You can use const2_rtx here. Uros.
[PATCH, rs6000] vec_sums must define all result vector elements
Hi, The little-endian implementation of vec_sums is incorrect. I had misread the specification and thought that the fields not containing the result value were undefined, but in fact they are defined to contain zero. My previous implementation used a vector splat to copy the field from BE element 3 to LE element 3. The corrected implementation will use a vector shift left to move the field and fill the remaining fields with zeros. When I fixed this, I discovered I had also missed a use of gen_altivec_vsumsws, which should now use gen_altivec_vsumsws_direct instead. This is fixed in this patch as well. Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. Bootstrap and regression test on powerpc64-unknown-linux-gnu is in progress. If no big-endian regressions are found, is this ok for trunk? Thanks, Bill gcc: 2014-02-21 Bill Schmidt wschm...@linux.vnet.ibm.com * config/rs6000/altivec.md (altivec_vsumsws): Replace second vspltw with vsldoi. (reduc_uplus_v16qi): Use gen_altivec_vsumsws_direct instead of gen_altivec_vsumsws. gcc/testsuite: 2014-02-21 Bill Schmidt wschm...@linux.vnet.ibm.com * gcc.dg/vmx/vsums.c: Check entire result vector. * gcc.dg/vmx/vsums-be-order.c: Likewise. 
Index: gcc/config/rs6000/altivec.md === --- gcc/config/rs6000/altivec.md (revision 207967) +++ gcc/config/rs6000/altivec.md (working copy) @@ -1651,7 +1651,7 @@ if (VECTOR_ELT_ORDER_BIG) return "vsumsws %0,%1,%2"; else -return "vspltw %3,%2,0\n\tvsumsws %3,%1,%3\n\tvspltw %0,%3,3"; +return "vspltw %3,%2,0\n\tvsumsws %3,%1,%3\n\tvsldoi %0,%3,%3,12"; } [(set_attr "type" "veccomplex") (set (attr "length") @@ -2483,7 +2539,7 @@ emit_insn (gen_altivec_vspltisw (vzero, const0_rtx)); emit_insn (gen_altivec_vsum4ubs (vtmp1, operands[1], vzero)); - emit_insn (gen_altivec_vsumsws (dest, vtmp1, vzero)); + emit_insn (gen_altivec_vsumsws_direct (dest, vtmp1, vzero)); DONE; }) Index: gcc/testsuite/gcc.dg/vmx/vsums-be-order.c === --- gcc/testsuite/gcc.dg/vmx/vsums-be-order.c (revision 207967) +++ gcc/testsuite/gcc.dg/vmx/vsums-be-order.c (working copy) @@ -8,12 +8,13 @@ static void test() #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ vector signed int vb = {128,0,0,0}; + vector signed int evd = {136,0,0,0}; #else vector signed int vb = {0,0,0,128}; + vector signed int evd = {0,0,0,136}; #endif vector signed int vd = vec_sums (va, vb); - signed int r = vec_extract (vd, 3); - check (r == 136, "sums"); + check (vec_all_eq (vd, evd), "sums"); } Index: gcc/testsuite/gcc.dg/vmx/vsums.c === --- gcc/testsuite/gcc.dg/vmx/vsums.c (revision 207967) +++ gcc/testsuite/gcc.dg/vmx/vsums.c (working copy) @@ -4,9 +4,9 @@ static void test() { vector signed int va = {-7,11,-13,17}; vector signed int vb = {0,0,0,128}; + vector signed int evd = {0,0,0,136}; vector signed int vd = vec_sums (va, vb); - signed int r = vec_extract (vd, 3); - check (r == 136, "sums"); + check (vec_all_eq (vd, evd), "sums"); }
Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation
2014-02-21 19:41 GMT+04:00 Bernd Schmidt ber...@codesourcery.com: The problem is that ptx does not have a linker, so we cannot exactly reproduce what happens on the host side. We have to process all host .o files in one single invocation of ptx lto1, and produce a single ptx assembly file, with a single function/variable table, from there. Having functions and variables separated gives us at least a small chance that the order will match that found in the host tables if the host table is produced by linking multiple fragments. If ptx lto1 processes all .o files in the order in which they were passed to it, the resulting table should be consistent with the table produced by the host's lto1. What kind of dependencies between liba and libb do you expect to be able to support on the target side? References to each other's functions and variables? Yes, references to global variables and calls to functions, marked with omp declare target.
Re: [PATCH, rs6000] vec_sums must define all result vector elements
On Fri, Feb 21, 2014 at 12:56 PM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, The little-endian implementation of vec_sums is incorrect. I had misread the specification and thought that the fields not containing the result value were undefined, but in fact they are defined to contain zero. My previous implementation used a vector splat to copy the field from BE element 3 to LE element 3. The corrected implementation will use a vector shift left to move the field and fill the remaining fields with zeros. When I fixed this, I discovered I had also missed a use of gen_altivec_vsumsws, which should now use gen_altivec_vsumsws_direct instead. This is fixed in this patch as well. Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. Bootstrap and regression test on powerpc64-unknown-linux-gnu is in progress. If no big-endian regressions are found, is this ok for trunk? Okay. Thanks, David
[PATCH, testsuite]: Add some missing avx512 options to g++.dg/other/i386-{2,3}.C and gcc.target/i386/sse-{12,13}.c
Hello! No additional testsuite failures. 2014-02-21 Uros Bizjak ubiz...@gmail.com * g++.dg/other/i386-2.C (dg-options): Add -mavx512pf. * g++.dg/other/i386-3.C (dg-options): Ditto. * gcc.target/i386/sse-12.c (dg-options): Add -msha. * gcc.target/i386/sse-13.c (dg-options): Add -mavx512er, -mavx512cd, -mavx512pf and -msha. Tested on x86_64-pc-linux-gnu and committed to mainline SVN. Uros. Index: g++.dg/other/i386-2.C === --- g++.dg/other/i386-2.C (revision 208010) +++ g++.dg/other/i386-2.C (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ -/* { dg-options -O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -msha } */ +/* { dg-options -O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha } */ /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h, Index: g++.dg/other/i386-3.C === --- g++.dg/other/i386-3.C (revision 208010) +++ g++.dg/other/i386-3.C (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ -/* { dg-options -O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -msha } */ +/* { dg-options -O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f 
-mavx512er -mavx512cd -mavx512pf -msha } */ /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h, Index: gcc.target/i386/sse-12.c === --- gcc.target/i386/sse-12.c(revision 208010) +++ gcc.target/i386/sse-12.c(working copy) @@ -3,7 +3,7 @@ popcntintrin.h and mm_malloc.h are usable with -O -std=c89 -pedantic-errors. */ /* { dg-do compile } */ -/* { dg-options -O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512cd -mavx512er -mavx512pf } */ +/* { dg-options -O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha } */ #include x86intrin.h Index: gcc.target/i386/sse-13.c === --- gcc.target/i386/sse-13.c(revision 208010) +++ gcc.target/i386/sse-13.c(working copy) @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options -O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f } */ +/* { dg-options -O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha } */ #include mm_malloc.h
Re: [PATCH, rs6000] Add -maltivec=be semantics in LE mode for vec_ld and vec_st
On Thu, Feb 20, 2014 at 2:46 PM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, For compatibility with the XL compilers, we need to support -maltivec=be for vec_ld, vec_ldl, vec_st, and vec_stl. (A later patch will also handle vec_lde and vec_ste.) This is a much simpler patch than its size would indicate. The original implementation of these built-ins treated them all as always loading and storing V4SI values, relying on subregs to adjust type mismatches. For this work we need to have the true type so that we know how to reverse the order of vector elements. So most of this patch is the busy-work of adding new built-in definitions for all the supported types (six types for each of the four built-ins). The real work is done in altivec.md to call altivec_expand_{lvx,stvx}_be for these built-ins when -maltivec=be is selected for a little endian target, and in rs6000.c where these functions are defined. For the loads, the usual load insn is generated followed by a permute to reverse the order of the vector elements. For the stores, the usual store insn is generated preceded by a permute to reverse the order of the vector elements. A common routine swap_selector_for_mode is used to generate the permute control vector for the permute. There are 16 new tests, 4 for each built-in. These cover the VMX and VSX built-ins for big-endian, little-endian, and little-endian with -maltivec=be. Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no regressions. All the new tests pass in all endian environments. Is this ok for trunk? Thanks, Bill gcc: 2014-02-20 Bill Schmidt wschm...@linux.vnet.ibm.com * config/rs6000/altivec.md (altivec_lvxl): Rename as *altivec_lvxl_mode_internal and use VM2 iterator instead of V4SI. (altivec_lvxl_mode): New define_expand incorporating -maltivec=be semantics where needed. (altivec_lvx): Rename as *altivec_lvx_mode_internal. (altivec_lvx_mode): New define_expand incorporating -maltivec=be semantics where needed. 
(altivec_stvx): Rename as *altivec_stvx_mode_internal. (altivec_stvx_mode): New define_expand incorporating -maltivec=be semantics where needed. (altivec_stvxl): Rename as *altivec_stvxl_mode_internal and use VM2 iterator instead of V4SI. (altivec_stvxl_mode): New define_expand incorporating -maltivec=be semantics where needed. * config/rs6000/rs6000-builtin.def: Add new built-in definitions LVXL_V2DF, LVXL_V2DI, LVXL_V4SF, LVXL_V4SI, LVXL_V8HI, LVXL_V16QI, LVX_V2DF, LVX_V2DI, LVX_V4SF, LVX_V4SI, LVX_V8HI, LVX_V16QI, STVX_V2DF, STVX_V2DI, STVX_V4SF, STVX_V4SI, STVX_V8HI, STVX_V16QI, STVXL_V2DF, STVXL_V2DI, STVXL_V4SF, STVXL_V4SI, STVXL_V8HI, STVXL_V16QI. * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Replace ALTIVEC_BUILTIN_LVX with ALTIVEC_BUILTIN_LVX_MODE throughout; similarly for ALTIVEC_BUILTIN_LVXL, ALTIVEC_BUILTIN_STVX, and ALTIVEC_BUILTIN_STVXL. * config/rs6000/rs6000-protos.h (altivec_expand_lvx_be): New prototype. (altivec_expand_stvx_be): Likewise. * config/rs6000/rs6000.c (swap_selector_for_mode): New function. (altivec_expand_lvx_be): Likewise. (altivec_expand_stvx_be): Likewise. (altivec_expand_builtin): Add cases for ALTIVEC_BUILTIN_STVX_MODE, ALTIVEC_BUILTIN_STVXL_MODE, ALTIVEC_BUILTIN_LVXL_MODE, and ALTIVEC_BUILTIN_LVX_MODE. (altivec_init_builtins): Add definitions for __builtin_altivec_lvxl_mode, __builtin_altivec_lvx_mode, __builtin_altivec_stvx_mode, and __builtin_altivec_stvxl_mode. gcc/testsuite: 2014-02-20 Bill Schmidt wschm...@linux.vnet.ibm.com * gcc.dg/vmx/ld.c: New test. * gcc.dg/vmx/ld-be-order.c: New test. * gcc.dg/vmx/ld-vsx.c: New test. * gcc.dg/vmx/ld-vsx-be-order.c: New test. * gcc.dg/vmx/ldl.c: New test. * gcc.dg/vmx/ldl-be-order.c: New test. * gcc.dg/vmx/ldl-vsx.c: New test. * gcc.dg/vmx/ldl-vsx-be-order.c: New test. * gcc.dg/vmx/st.c: New test. * gcc.dg/vmx/st-be-order.c: New test. * gcc.dg/vmx/st-vsx.c: New test. * gcc.dg/vmx/st-vsx-be-order.c: New test. * gcc.dg/vmx/stl.c: New test. 
* gcc.dg/vmx/stl-be-order.c: New test. * gcc.dg/vmx/stl-vsx.c: New test. * gcc.dg/vmx/stl-vsx-be-order.c: New test. Okay. Thanks, David
[GOMP4] gimple_code_is_oacc - is_gimple_omp_oacc_specifically (was: [PATCH 4/6] [GOMP4] OpenACC 1.0+ support in fortran front-end)
Hi! On Tue, 11 Feb 2014 17:51:15 +0100, I wrote: On Fri, 31 Jan 2014 15:16:07 +0400, Ilmir Usmanov i.usma...@samsung.com wrote: --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -1491,6 +1491,18 @@ fixup_child_record_type (omp_context *ctx) TREE_TYPE (ctx-receiver_decl) = build_pointer_type (type); } +static bool +gimple_code_is_oacc (const_gimple g) +{ + switch (gimple_code (g)) +{ +case GIMPLE_OACC_PARALLEL: + return true; +default: + return false; +} +} + Eventually, this will probably end up next to CASE_GIMPLE_OMP/is_gimple_omp in gimple.h (or the latter be reworked to be able to ask for is_omp vs. is_oacc vs. is_omp_or_oacc), but it's fine to do that once we actually need it in files other than just omp-low.c, and once we support more GIMPLE_OACC_* codes. Ah, well, I'm now in the situation that I need to do such a check in another file, so I have applied the following to gomp-4_0-branch in r208013. I have also renamed the function to is_gimple_omp_oacc_specifically, building on the existing is_gimple_omp name. (Don't worry about the unwieldy name, as all this is to disappear as the development progresses.) commit 25aab0dd39a57661e9d7f3a5f405f4647977b9de Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Fri Feb 21 19:26:01 2014 + gimple_code_is_oacc - is_gimple_omp_oacc_specifically. gcc/ * omp-low.c (gimple_code_is_oacc): Move to... * gimple.h (is_gimple_omp_oacc_specifically): ... here. Update users, and also use it in more places where currently we've only been checking for GIMPLE_OACC_PARALLEL. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@208013 138bc75d-0d04-0410-961f-82ee72b054a4 diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index 14d8805..1ce952d 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,3 +1,10 @@ +2014-02-21 Thomas Schwinge tho...@codesourcery.com + + * omp-low.c (gimple_code_is_oacc): Move to... + * gimple.h (is_gimple_omp_oacc_specifically): ... here. 
Update + users, and also use it in more places where currently we've only + been checking for GIMPLE_OACC_PARALLEL. + 2014-02-18 Thomas Schwinge tho...@codesourcery.com * omp-low.c (diagnose_sb_0, diagnose_sb_1, diagnose_sb_2): Handle diff --git gcc/gimple.h gcc/gimple.h index 5b5a0ee..0d250ef 100644 --- gcc/gimple.h +++ gcc/gimple.h @@ -5670,6 +5670,25 @@ is_gimple_omp (const_gimple stmt) } } +/* Return true if STMT is any of the OpenACC types specifically. + + TODO: This function should go away eventually, once all its callers have + either been fixed, changed into more specific checks, or verified to not + need any special handling for OpenACC. */ + +static inline bool +is_gimple_omp_oacc_specifically (const_gimple stmt) +{ + gcc_assert (is_gimple_omp (stmt)); + switch (gimple_code (stmt)) +{ +case GIMPLE_OACC_PARALLEL: + return true; +default: + return false; +} +} + /* Returns TRUE if statement G is a GIMPLE_NOP. */ diff --git gcc/omp-low.c gcc/omp-low.c index 110ea63..b975dad 100644 --- gcc/omp-low.c +++ gcc/omp-low.c @@ -863,7 +863,7 @@ use_pointer_for_field (tree decl, omp_context *shared_ctx) when we know the value is not accessible from an outer scope. */ if (shared_ctx) { - gcc_assert (gimple_code (shared_ctx->stmt) != GIMPLE_OACC_PARALLEL); + gcc_assert (!is_gimple_omp_oacc_specifically (shared_ctx->stmt)); /* ??? Trivially accessible from anywhere. But why would we even be passing an address in this case?
Should we simply assert @@ -1006,7 +1006,7 @@ build_receiver_ref (tree var, bool by_ref, omp_context *ctx) static tree build_outer_var_ref (tree var, omp_context *ctx) { - gcc_assert (gimple_code (ctx->stmt) != GIMPLE_OACC_PARALLEL); + gcc_assert (!is_gimple_omp_oacc_specifically (ctx->stmt)); tree x; @@ -1072,7 +1072,7 @@ install_var_field (tree var, bool by_ref, int mask, omp_context *ctx) gcc_assert ((mask & 2) == 0 || !ctx->sfield_map || !splay_tree_lookup (ctx->sfield_map, (splay_tree_key) var)); gcc_assert ((mask & 3) == 3 - || gimple_code (ctx->stmt) != GIMPLE_OACC_PARALLEL); + || !is_gimple_omp_oacc_specifically (ctx->stmt)); type = TREE_TYPE (var); if (mask & 4) @@ -1491,18 +1491,6 @@ fixup_child_record_type (omp_context *ctx) TREE_TYPE (ctx->receiver_decl) = build_pointer_type (type); } -static bool -gimple_code_is_oacc (const_gimple g) -{ - switch (gimple_code (g)) -{ -case GIMPLE_OACC_PARALLEL: - return true; -default: - return false; -} -} - /* Instantiate decls as necessary in CTX to satisfy the data sharing specified by CLAUSES. */ @@ -1519,7 +1507,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) switch (OMP_CLAUSE_CODE (c)) { case OMP_CLAUSE_PRIVATE: -
Re: [gomp4 3/6] Initial support for OpenACC memory mapping semantics.
Hi! On Tue, 14 Jan 2014 16:10:05 +0100, I wrote: --- gcc/gimplify.c +++ gcc/gimplify.c @@ -86,7 +92,11 @@ enum omp_region_type ORT_UNTIED_TASK = 5, ORT_TEAMS = 8, ORT_TARGET_DATA = 16, - ORT_TARGET = 32 + ORT_TARGET = 32, + + /* Flags for ORT_TARGET. */ + /* Default to GOVD_MAP_FORCE for implicit mappings in this region. */ + ORT_TARGET_MAP_FORCE = 64 }; Continuing on that route, I have now applied the following to gomp-4_0-branch in r208014: commit dee2965ae547af0bc90d618e7fa40fbf2f5292b4 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Fri Feb 21 19:45:12 2014 + Gimplification: New flag ORT_TARGET_OFFLOAD replaces !ORT_TARGET_DATA. gcc/ * gimplify.c (enum omp_region_type): Make ORT_TARGET_OFFLOAD a flag for ORT_TARGET, in its negation replacing ORT_TARGET_DATA. Update all users. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@208014 138bc75d-0d04-0410-961f-82ee72b054a4 diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index 1ce952d..bf8ec96 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,5 +1,9 @@ 2014-02-21 Thomas Schwinge tho...@codesourcery.com + * gimplify.c (enum omp_region_type): Make ORT_TARGET_OFFLOAD a + flag for ORT_TARGET, in its negation replacing ORT_TARGET_DATA. + Update all users. + * omp-low.c (gimple_code_is_oacc): Move to... * gimple.h (is_gimple_omp_oacc_specifically): ... here. Update users, and also use it in more places where currently we've only diff --git gcc/gimplify.c gcc/gimplify.c index 51a1b73..9aa9301c 100644 --- gcc/gimplify.c +++ gcc/gimplify.c @@ -100,10 +100,11 @@ enum omp_region_type ORT_TASK = 4, ORT_UNTIED_TASK = 5, ORT_TEAMS = 8, - ORT_TARGET_DATA = 16, - ORT_TARGET = 32, + ORT_TARGET = 16, /* Flags for ORT_TARGET. */ + /* Prepare this region for offloading. */ + ORT_TARGET_OFFLOAD = 32, /* Default to GOVD_MAP_FORCE for implicit mappings in this region. 
*/ ORT_TARGET_MAP_FORCE = 64 }; @@ -2202,7 +2203,7 @@ gimplify_arg (tree *arg_p, gimple_seq *pre_p, location_t call_location) return gimplify_expr (arg_p, pre_p, NULL, test, fb); } -/* Don't fold STMT inside ORT_TARGET, because it can break code by adding decl +/* Don't fold inside offloading regsion: it can break code by adding decl references that weren't in the source. We'll do it during omplower pass instead. */ @@ -2211,7 +2212,8 @@ maybe_fold_stmt (gimple_stmt_iterator *gsi) { struct gimplify_omp_ctx *ctx; for (ctx = gimplify_omp_ctxp; ctx; ctx = ctx->outer_context) -if (ctx->region_type & ORT_TARGET) +if (ctx->region_type & ORT_TARGET +&& ctx->region_type & ORT_TARGET_OFFLOAD) return false; return fold_stmt (gsi); } @@ -5388,10 +5390,12 @@ omp_firstprivatize_variable (struct gimplify_omp_ctx *ctx, tree decl) return; } else if (ctx->region_type & ORT_TARGET) - omp_add_variable (ctx, decl, GOVD_MAP | GOVD_MAP_TO_ONLY); + { + if (ctx->region_type & ORT_TARGET_OFFLOAD) + omp_add_variable (ctx, decl, GOVD_MAP | GOVD_MAP_TO_ONLY); + } else if (ctx->region_type != ORT_WORKSHARE - && ctx->region_type != ORT_SIMD - && ctx->region_type != ORT_TARGET_DATA) + && ctx->region_type != ORT_SIMD) omp_add_variable (ctx, decl, GOVD_FIRSTPRIVATE); ctx = ctx->outer_context; @@ -5580,7 +5584,8 @@ omp_notice_threadprivate_variable (struct gimplify_omp_ctx *ctx, tree decl, struct gimplify_omp_ctx *octx; for (octx = ctx; octx; octx = octx->outer_context) -if (octx->region_type & ORT_TARGET) +if ((octx->region_type & ORT_TARGET) +&& (octx->region_type & ORT_TARGET_OFFLOAD)) { gcc_assert (!(octx->region_type & ORT_TARGET_MAP_FORCE)); @@ -5643,7 +5648,8 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree decl, bool in_code) } n = splay_tree_lookup (ctx->variables, (splay_tree_key)decl); - if (ctx->region_type & ORT_TARGET) + if ((ctx->region_type & ORT_TARGET) + && (ctx->region_type & ORT_TARGET_OFFLOAD)) { unsigned map_force; if (ctx->region_type & ORT_TARGET_MAP_FORCE) @@ -5695,7 +5701,8 @@ 
omp_notice_variable (struct gimplify_omp_ctx *ctx,
tree decl, bool in_code) if (ctx->region_type == ORT_WORKSHARE || ctx->region_type == ORT_SIMD - || ctx->region_type == ORT_TARGET_DATA) + || ((ctx->region_type & ORT_TARGET) + && !(ctx->region_type & ORT_TARGET_OFFLOAD))) goto do_outer; /* ??? Some compiler-generated variables (like SAVE_EXPRs) could be @@ -5746,7 +5753,7 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree decl, bool in_code) { splay_tree_node n2; - if ((octx->region_type & (ORT_TARGET_DATA | ORT_TARGET)) != 0) + if (octx->region_type & ORT_TARGET) continue; n2 =
[gomp4 1/3] Clarify to/from/map clauses usage in context of GF_OMP_TARGET_KIND_UPDATE.
From: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 gcc/ * omp-low.c (scan_sharing_clauses): Catch unexpected occurrences of OMP_CLAUSE_TO, OMP_CLAUSE_FROM, OMP_CLAUSE_MAP. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@208015 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp | 3 +++ gcc/omp-low.c | 25 + 2 files changed, 28 insertions(+) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index bf8ec96..bd46f2e 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,5 +1,8 @@ 2014-02-21 Thomas Schwinge tho...@codesourcery.com + * omp-low.c (scan_sharing_clauses): Catch unexpected occurrences + of OMP_CLAUSE_TO, OMP_CLAUSE_FROM, OMP_CLAUSE_MAP. + * gimplify.c (enum omp_region_type): Make ORT_TARGET_OFFLOAD a flag for ORT_TARGET, in its negation replacing ORT_TARGET_DATA. Update all users. diff --git gcc/omp-low.c gcc/omp-low.c index 9fef4c1..bca4599 100644 --- gcc/omp-low.c +++ gcc/omp-low.c @@ -1630,6 +1630,26 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) case OMP_CLAUSE_FROM: gcc_assert (!is_gimple_omp_oacc_specifically (ctx->stmt)); case OMP_CLAUSE_MAP: + switch (OMP_CLAUSE_CODE (c)) + { + case OMP_CLAUSE_TO: + case OMP_CLAUSE_FROM: + /* The to and from clauses are only ever seen with OpenMP target +update constructs. */ + gcc_assert (gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET + && (gimple_omp_target_kind (ctx->stmt) + == GF_OMP_TARGET_KIND_UPDATE)); + break; + case OMP_CLAUSE_MAP: + /* The map clause is never seen with OpenMP target update +constructs. */ + gcc_assert (gimple_code (ctx->stmt) != GIMPLE_OMP_TARGET + || (gimple_omp_target_kind (ctx->stmt) + != GF_OMP_TARGET_KIND_UPDATE)); + break; + default: + gcc_unreachable (); + } if (ctx->outer) scan_omp_op (OMP_CLAUSE_SIZE (c), ctx->outer); decl = OMP_CLAUSE_DECL (c); @@ -1799,6 +1819,11 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) break; case OMP_CLAUSE_MAP: + /* The map clause is never seen with OpenMP target update +constructs.
*/ + gcc_assert (gimple_code (ctx->stmt) != GIMPLE_OMP_TARGET + || (gimple_omp_target_kind (ctx->stmt) + != GF_OMP_TARGET_KIND_UPDATE)); if (!gimple_code_is_oacc (ctx->stmt) && gimple_omp_target_kind (ctx->stmt) == GF_OMP_TARGET_KIND_DATA) break; -- 1.8.1.1
[gomp4 2/3] OpenACC data construct implementation in terms of GF_OMP_TARGET_KIND_OACC_DATA.
From: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 gcc/ * gimple.h (enum gf_mask): Add GF_OMP_TARGET_KIND_OACC_DATA. (is_gimple_omp_oacc_specifically): Handle it. * gimple-pretty-print.c (dump_gimple_omp_target): Likewise. * gimplify.c (gimplify_omp_workshare, gimplify_expr): Likewise. * omp-low.c (scan_sharing_clauses, scan_omp_target) (expand_omp_target, lower_omp_target, lower_omp_1): Likewise. * gimple.def (GIMPLE_OMP_TARGET): Update comment. * gimple.c (gimple_build_omp_target): Likewise. (gimple_copy): Catch unimplemented case. * tree-inline.c (remap_gimple_stmt): Likewise. * tree-nested.c (convert_nonlocal_reference_stmt) (convert_local_reference_stmt, convert_gimple_call): Likewise. * oacc-builtins.def (BUILT_IN_GOACC_DATA_START) (BUILT_IN_GOACC_DATA_END): New builtins. libgomp/ * libgomp.map (GOACC_2.0): Add GOACC_data_end, GOACC_data_start. * libgomp_g.h (GOACC_data_start, GOACC_data_end): New prototypes. * oacc-parallel.c (GOACC_data_start, GOACC_data_end): New functions. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@208016 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp| 15 ++ gcc/gimple-pretty-print.c | 3 ++ gcc/gimple.c | 4 +- gcc/gimple.def| 1 + gcc/gimple.h | 9 gcc/gimplify.c| 33 +--- gcc/oacc-builtins.def | 6 ++- gcc/omp-low.c | 132 -- gcc/tree-inline.c | 1 + gcc/tree-nested.c | 3 ++ libgomp/ChangeLog.gomp| 7 +++ libgomp/libgomp.map | 2 + libgomp/libgomp_g.h | 3 ++ libgomp/oacc-parallel.c | 34 +++- 14 files changed, 213 insertions(+), 40 deletions(-) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index bd46f2e..824ec94 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,5 +1,20 @@ 2014-02-21 Thomas Schwinge tho...@codesourcery.com + * gimple.h (enum gf_mask): Add GF_OMP_TARGET_KIND_OACC_DATA. + (is_gimple_omp_oacc_specifically): Handle it. + * gimple-pretty-print.c (dump_gimple_omp_target): Likewise. + * gimplify.c (gimplify_omp_workshare, gimplify_expr): Likewise. 
+ * omp-low.c (scan_sharing_clauses, scan_omp_target) + (expand_omp_target, lower_omp_target, lower_omp_1): Likewise. + * gimple.def (GIMPLE_OMP_TARGET): Update comment. + * gimple.c (gimple_build_omp_target): Likewise. + (gimple_copy): Catch unimplemented case. + * tree-inline.c (remap_gimple_stmt): Likewise. + * tree-nested.c (convert_nonlocal_reference_stmt) + (convert_local_reference_stmt, convert_gimple_call): Likewise. + * oacc-builtins.def (BUILT_IN_GOACC_DATA_START) + (BUILT_IN_GOACC_DATA_END): New builtins. + * omp-low.c (scan_sharing_clauses): Catch unexpected occurrences of OMP_CLAUSE_TO, OMP_CLAUSE_FROM, OMP_CLAUSE_MAP. diff --git gcc/gimple-pretty-print.c gcc/gimple-pretty-print.c index 91a3eb2..ad9369c 100644 --- gcc/gimple-pretty-print.c +++ gcc/gimple-pretty-print.c @@ -1289,6 +1289,9 @@ dump_gimple_omp_target (pretty_printer *buffer, gimple gs, int spc, int flags) case GF_OMP_TARGET_KIND_UPDATE: kind = update; break; +case GF_OMP_TARGET_KIND_OACC_DATA: + kind = oacc_data; + break; default: gcc_unreachable (); } diff --git gcc/gimple.c gcc/gimple.c index 2a967aa..30561b1 100644 --- gcc/gimple.c +++ gcc/gimple.c @@ -1051,7 +1051,8 @@ gimple_build_omp_single (gimple_seq body, tree clauses) /* Build a GIMPLE_OMP_TARGET statement. BODY is the sequence of statements that will be executed. - CLAUSES are any of the OMP target construct's clauses. */ + KIND is the kind of target region. + CLAUSES are any of the construct's clauses. 
*/ gimple gimple_build_omp_target (gimple_seq body, int kind, tree clauses) @@ -1747,6 +1748,7 @@ gimple_copy (gimple stmt) case GIMPLE_OMP_TASKGROUP: case GIMPLE_OMP_ORDERED: copy_omp_body: + gcc_assert (!is_gimple_omp_oacc_specifically (stmt)); new_seq = gimple_seq_copy (gimple_omp_body (stmt)); gimple_omp_set_body (copy, new_seq); break; diff --git gcc/gimple.def gcc/gimple.def index 2b78c06..ce800bd 100644 --- gcc/gimple.def +++ gcc/gimple.def @@ -360,6 +360,7 @@ DEFGSCODE(GIMPLE_OMP_SECTIONS_SWITCH, gimple_omp_sections_switch, GSS_BASE) DEFGSCODE(GIMPLE_OMP_SINGLE, gimple_omp_single, GSS_OMP_SINGLE_LAYOUT) /* GIMPLE_OMP_TARGET BODY, CLAUSES, CHILD_FN represents + #pragma acc data #pragma omp target {,data,update} BODY is the sequence of statements inside the target construct (NULL for target update). diff --git gcc/gimple.h gcc/gimple.h index 0d250ef..b4ee9fa 100644 --- gcc/gimple.h +++ gcc/gimple.h @@
[gomp4 3/3] OpenACC data construct support in the C front end.
From: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 gcc/c-family/ * c-pragma.c (oacc_pragmas): Add data. * c-pragma.h (enum pragma_kind): Add PRAGMA_OACC_DATA. gcc/c/ * c-parser.c (OACC_DATA_CLAUSE_MASK): New macro definition. (c_parser_oacc_data): New function. (c_parser_omp_construct): Handle PRAGMA_OACC_DATA. * c-tree.h (c_finish_oacc_data): New prototype. * c-typeck.c (c_finish_oacc_data): New function. gcc/testsuite/ * c-c++-common/goacc-gomp/nesting-fail-1.c: Extend for OpenACC data construct. * c-c++-common/goacc/nesting-fail-1.c: Likewise. * c-c++-common/goacc/parallel-fail-1.c: Rename to... * c-c++-common/goacc/clauses-fail.c: ... this new file. Extend for OpenACC data construct. * c-c++-common/goacc/data-1.c: New file. libgomp/ * testsuite/libgomp.oacc-c/data-1.c: New file. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@208017 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/c-family/ChangeLog.gomp| 5 + gcc/c-family/c-pragma.c| 1 + gcc/c-family/c-pragma.h| 1 + gcc/c/ChangeLog.gomp | 8 + gcc/c/c-parser.c | 42 + gcc/c/c-tree.h | 1 + gcc/c/c-typeck.c | 19 +++ gcc/testsuite/ChangeLog.gomp | 10 ++ .../c-c++-common/goacc-gomp/nesting-fail-1.c | 92 ++- gcc/testsuite/c-c++-common/goacc/clauses-fail.c| 9 ++ gcc/testsuite/c-c++-common/goacc/data-1.c | 6 + gcc/testsuite/c-c++-common/goacc/nesting-fail-1.c | 18 ++- gcc/testsuite/c-c++-common/goacc/parallel-fail-1.c | 6 - libgomp/ChangeLog.gomp | 2 + libgomp/testsuite/libgomp.oacc-c/data-1.c | 170 + 15 files changed, 380 insertions(+), 10 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/goacc/clauses-fail.c create mode 100644 gcc/testsuite/c-c++-common/goacc/data-1.c delete mode 100644 gcc/testsuite/c-c++-common/goacc/parallel-fail-1.c create mode 100644 libgomp/testsuite/libgomp.oacc-c/data-1.c diff --git gcc/c-family/ChangeLog.gomp gcc/c-family/ChangeLog.gomp index e092d53..3da377f 100644 --- gcc/c-family/ChangeLog.gomp +++ gcc/c-family/ChangeLog.gomp @@ -1,3 +1,8 @@ +2014-02-21 
Thomas Schwinge tho...@codesourcery.com + + * c-pragma.c (oacc_pragmas): Add data. + * c-pragma.h (enum pragma_kind): Add PRAGMA_OACC_DATA. + 2014-01-28 Thomas Schwinge tho...@codesourcery.com * c-pragma.h (pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_COPY, diff --git gcc/c-family/c-pragma.c gcc/c-family/c-pragma.c index f69486a..08374aa 100644 --- gcc/c-family/c-pragma.c +++ gcc/c-family/c-pragma.c @@ -1169,6 +1169,7 @@ static vec<pragma_ns_name> registered_pp_pragmas; struct omp_pragma_def { const char *name; unsigned int id; }; static const struct omp_pragma_def oacc_pragmas[] = { + { "data", PRAGMA_OACC_DATA }, { "parallel", PRAGMA_OACC_PARALLEL }, }; static const struct omp_pragma_def omp_pragmas[] = { diff --git gcc/c-family/c-pragma.h gcc/c-family/c-pragma.h index 1ea5b1d..d092f9f 100644 --- gcc/c-family/c-pragma.h +++ gcc/c-family/c-pragma.h @@ -27,6 +27,7 @@ along with GCC; see the file COPYING3. If not see typedef enum pragma_kind { PRAGMA_NONE = 0, + PRAGMA_OACC_DATA, PRAGMA_OACC_PARALLEL, PRAGMA_OMP_ATOMIC, PRAGMA_OMP_BARRIER, diff --git gcc/c/ChangeLog.gomp gcc/c/ChangeLog.gomp index b199957..9b95725 100644 --- gcc/c/ChangeLog.gomp +++ gcc/c/ChangeLog.gomp @@ -1,3 +1,11 @@ +2014-02-21 Thomas Schwinge tho...@codesourcery.com + + * c-parser.c (OACC_DATA_CLAUSE_MASK): New macro definition. + (c_parser_oacc_data): New function. + (c_parser_omp_construct): Handle PRAGMA_OACC_DATA. + * c-tree.h (c_finish_oacc_data): New prototype. + * c-typeck.c (c_finish_oacc_data): New function.
+ 2014-02-17 Thomas Schwinge tho...@codesourcery.com * c-parser.c (c_parser_omp_clause_name): Accept pcopy, pcopyin, diff --git gcc/c/c-parser.c gcc/c/c-parser.c index 7850eab..4643722 100644 --- gcc/c/c-parser.c +++ gcc/c/c-parser.c @@ -4776,10 +4776,14 @@ c_parser_label (c_parser *parser) openacc-construct: parallel-construct + data-construct parallel-construct: parallel-directive structured-block + data-construct: + data-directive structured-block + OpenMP: statement: @@ -11362,6 +11366,41 @@ c_parser_omp_structured_block (c_parser *parser) } /* OpenACC 2.0: + # pragma acc data oacc-data-clause[optseq] new-line + structured-block + + LOC is the location of the #pragma token. +*/ + +#define OACC_DATA_CLAUSE_MASK \ + ( (OMP_CLAUSE_MASK_1 PRAGMA_OMP_CLAUSE_COPY)
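To illustrate the `data-construct: data-directive structured-block` grammar added above, here is a minimal user-level sketch. It assumes an OpenACC-capable compiler (later GCC releases accept `-fopenacc`). The function `scale` and the `present` clause on the inner compute construct are illustrative assumptions and not part of this patch. A compiler without OpenACC support ignores both pragmas, so the loop then runs on the host with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the patch): multiply a[0:n] by factor.
// The "data" construct maps a[0:n] for the duration of its structured block
// and copies it back at the end; the compute construct inside reuses that
// mapping through "present".  Without OpenACC support both pragmas are
// ignored and the loop simply runs on the host.
void scale(float *a, std::size_t n, float factor) {
#pragma acc data copy(a[0:n])
  {
#pragma acc parallel loop present(a[0:n])
    for (std::size_t i = 0; i < n; ++i)
      a[i] *= factor;
  }
}
```

The structured block after `#pragma acc data` is the region during which the listed data stays mapped, which is why the compute construct can rely on `present` instead of copying again.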
patch to fix PR60298
The following patch fixes

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60298

The patch was successfully bootstrapped on x86/x86-64.

Committed as rev. 208023.

2014-02-21  Vladimir Makarov  <vmaka...@redhat.com>

        PR target/60298
        * lra-constraints.c (inherit_reload_reg): Use lra_emit_move
        instead of emit_move_insn.

Index: lra-constraints.c
===
--- lra-constraints.c   (revision 207787)
+++ lra-constraints.c   (working copy)
@@ -4473,9 +4473,9 @@ inherit_reload_reg (bool def_p, int orig
                                 rclass, "inheritance");
   start_sequence ();
   if (def_p)
-    emit_move_insn (original_reg, new_reg);
+    lra_emit_move (original_reg, new_reg);
   else
-    emit_move_insn (new_reg, original_reg);
+    lra_emit_move (new_reg, original_reg);
   new_insns = get_insns ();
   end_sequence ();
   if (NEXT_INSN (new_insns) != NULL_RTX)
C++ PATCH for c++/60241 (ICE with specialization of member class template)
We already have the code to reassign instances to the appropriate template when we see a specialization of a partial instantiation of a member template, but it wasn't firing properly in this case, for two reasons:

1) We were attaching the instances to the most general template and then looking for them on the partial instantiation.
2) We were only reassigning explicit specializations.

Tested x86_64-pc-linux-gnu, applying to trunk.  It should be appropriate for backporting later if it doesn't cause trouble.

commit 667bae7d1bfeea4e881cf6236d8679fc0c11c49e
Author: Jason Merrill <ja...@redhat.com>
Date:   Fri Feb 21 13:51:18 2014 -0500

        PR c++/60241
        * pt.c (lookup_template_class_1): Update DECL_TEMPLATE_INSTANTIATIONS
        of the partial instantiation, not the most general template.
        (maybe_process_partial_specialization): Reassign everything on
        that list.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index a394441..91a8840 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -914,11 +914,13 @@ maybe_process_partial_specialization (tree type)
                t; t = TREE_CHAIN (t))
             {
               tree inst = TREE_VALUE (t);
-              if (CLASSTYPE_TEMPLATE_SPECIALIZATION (inst))
+              if (CLASSTYPE_TEMPLATE_SPECIALIZATION (inst)
+                  || !COMPLETE_OR_OPEN_TYPE_P (inst))
                 {
                   /* We already have a full specialization of this partial
-                     instantiation.  Reassign it to the new member
-                     specialization template.  */
+                     instantiation, or a full specialization has been
+                     looked up but not instantiated.  Reassign it to the
+                     new member specialization template.  */
                   spec_entry elt;
                   spec_entry *entry;
                   void **slot;
@@ -937,7 +939,7 @@ maybe_process_partial_specialization (tree type)
                   *entry = elt;
                   *slot = entry;
                 }
-              else if (COMPLETE_OR_OPEN_TYPE_P (inst))
+              else
                 /* But if we've had an implicit instantiation, that's a
                    problem ([temp.expl.spec]/6).  */
                 error ("specialization %qT after instantiation %qT",
@@ -7596,7 +7598,7 @@ lookup_template_class_1 (tree d1, tree arglist, tree in_decl, tree context,
         }
 
       /* Let's consider the explicit specialization of a member
-         of a class template specialization that is implicitely instantiated,
+         of a class template specialization that is implicitly instantiated,
          e.g.:
              template<class T>
              struct S
@@ -7694,9 +7696,9 @@ lookup_template_class_1 (tree d1, tree arglist, tree in_decl, tree context,
 
       /* Note this use of the partial instantiation so we can check it
          later in maybe_process_partial_specialization.  */
-      DECL_TEMPLATE_INSTANTIATIONS (templ)
+      DECL_TEMPLATE_INSTANTIATIONS (found)
         = tree_cons (arglist, t,
-                     DECL_TEMPLATE_INSTANTIATIONS (templ));
+                     DECL_TEMPLATE_INSTANTIATIONS (found));
 
       if (TREE_CODE (template_type) == ENUMERAL_TYPE && !is_dependent_type
           && !DECL_ALIAS_TEMPLATE_P (gen_tmpl))
diff --git a/gcc/testsuite/g++.dg/template/memclass5.C b/gcc/testsuite/g++.dg/template/memclass5.C
new file mode 100644
index 000..eb32f13
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/memclass5.C
@@ -0,0 +1,26 @@
+// PR c++/60241
+
+template <typename T>
+struct x
+{
+    template <typename U>
+    struct y
+    {
+        typedef T result2;
+    };
+
+    typedef y<int> zy;
+};
+
+template<>
+template<class T>
+struct x<int>::y
+{
+    typedef double result2;
+};
+
+int main()
+{
+    x<int>::zy::result2 xxx;
+    x<int>::y<int>::result2 xxx2;
+}
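memclass5.C only checks that the compiler no longer ICEs. The sketch below reuses its shape and adds `static_assert`s (our addition, not part of the patch) spelling out what the member specialization should mean. It assumes a C++11 compiler that includes this fix. The point is that `x<int>::zy` looks up `x<int>::y<int>` before `x<int>::y` is specialized, without instantiating it, so that instance must be reassigned to the new member specialization.

```cpp
#include <type_traits>

template <typename T>
struct x {
  template <typename U>
  struct y { typedef T result2; };

  // Names x<T>::y<int> without instantiating it.
  typedef y<int> zy;
};

// Explicit specialization of the member template y for the enclosing x<int>.
// Naming x<int> here instantiates x<int> (and thus looks up x<int>::y<int>
// via zy) before the specialization is seen.
template <>
template <class T>
struct x<int>::y { typedef double result2; };

// Every x<int>::y<...>, including the one already named through zy, must use
// the member specialization; other enclosing specializations are unaffected.
static_assert(std::is_same<x<int>::zy::result2, double>::value, "zy -> specialization");
static_assert(std::is_same<x<int>::y<char>::result2, double>::value, "y<char> -> specialization");
static_assert(std::is_same<x<long>::y<int>::result2, long>::value, "x<long> uses primary");
```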
C++ PATCH for c++/59347 (ICE with ill-formed typedef in template)
An earlier patch of mine changed the compiler to retain erroneous declarations to provide better error-recovery behavior.  But that's causing problems with nested typedefs, so let's not bother in that case.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit 85cffc1cc3fe706d61a417cf6a1139f546a458e9
Author: Jason Merrill <ja...@redhat.com>
Date:   Fri Feb 21 13:59:45 2014 -0500

        PR c++/59347
        * pt.c (tsubst_decl) [TYPE_DECL]: Don't try to instantiate an
        erroneous typedef.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 91a8840..2dc5f32 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -10824,6 +10824,9 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
         tree type = NULL_TREE;
         bool local_p;
 
+        if (TREE_TYPE (t) == error_mark_node)
+          RETURN (error_mark_node);
+
         if (TREE_CODE (t) == TYPE_DECL
             && t == TYPE_MAIN_DECL (TREE_TYPE (t)))
           {
diff --git a/gcc/testsuite/g++.dg/template/typedef41.C b/gcc/testsuite/g++.dg/template/typedef41.C
new file mode 100644
index 000..dc25518
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/typedef41.C
@@ -0,0 +1,8 @@
+// PR c++/59347
+
+template<int> struct A
+{
+  typedef int ::X; // { dg-error "" }
+};
+
+A<0> a;
C++ PATCH for c++/60187 (ICE with bare parameter pack in enum-base)
Yet another place where we need to check for bare parameter packs.

Tested x86_64-pc-linux-gnu, applying to trunk and 4.8.

commit 4e02d1498063b3ffa31d3fe35682b0c94667360c
Author: Jason Merrill <ja...@redhat.com>
Date:   Fri Feb 21 14:03:36 2014 -0500

        PR c++/60187
        * parser.c (cp_parser_enum_specifier): Call
        check_for_bare_parameter_packs.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 6f19ae2..7bbdf90 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -15376,7 +15376,8 @@ cp_parser_enum_specifier (cp_parser* parser)
         {
           underlying_type = grokdeclarator (NULL, &type_specifiers, TYPENAME,
                                             /*initialized=*/0, NULL);
-          if (underlying_type == error_mark_node)
+          if (underlying_type == error_mark_node
+              || check_for_bare_parameter_packs (underlying_type))
             underlying_type = NULL_TREE;
         }
     }
diff --git a/gcc/testsuite/g++.dg/cpp0x/enum_base2.C b/gcc/testsuite/g++.dg/cpp0x/enum_base2.C
new file mode 100644
index 000..8c6a901
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/enum_base2.C
@@ -0,0 +1,9 @@
+// PR c++/60187
+// { dg-require-effective-target c++11 }
+
+template<typename... T> struct A
+{
+  enum E : T {}; // { dg-error "parameter pack" }
+};
+
+A<int> a;
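For contrast with enum_base2.C, here is a sketch of well-formed enum-bases in templates. It is our illustration, not part of the patch, and `A` and `P` are hypothetical names. A single dependent type parameter is fine as an enum-base. If the type has to come from a pack, the enum-base must name one element of it rather than the unexpanded pack. Picking the first element via `std::tuple_element` is one arbitrary choice.

```cpp
#include <tuple>
#include <type_traits>

// Well-formed: the enum-base is a single dependent type.
template <typename T>
struct A {
  enum E : T { first, second };
};

// With a pack, name one element (here, arbitrarily, the first) instead of
// the bare pack T, which is what enum_base2.C expects to be rejected.
template <typename... T>
struct P {
  enum E : typename std::tuple_element<0, std::tuple<T...>>::type { only };
};

static_assert(std::is_same<std::underlying_type<A<short>::E>::type, short>::value,
              "A<short>::E is based on short");
static_assert(std::is_same<std::underlying_type<P<int, long>::E>::type, int>::value,
              "P<int, long>::E is based on int");
```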
C++ PATCH for c++/60186 (ICE with constexpr and init-list in template)
My earlier massage_init_elt patch neglected to call fold_non_dependent_expr before maybe_constant_init.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit b77241e3be8b3eb4247d07e2f2967cbb585e08bc
Author: Jason Merrill <ja...@redhat.com>
Date:   Fri Feb 21 14:37:17 2014 -0500

        PR c++/60186
        * typeck2.c (massage_init_elt): Call fold_non_dependent_expr_sfinae.

diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index 546b83f..8877286 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -1131,7 +1131,10 @@ massage_init_elt (tree type, tree init, tsubst_flags_t complain)
     init = TARGET_EXPR_INITIAL (init);
   /* When we defer constant folding within a statement, we may want to
      defer this folding as well.  */
-  init = maybe_constant_init (init);
+  tree t = fold_non_dependent_expr_sfinae (init, complain);
+  t = maybe_constant_value (t);
+  if (TREE_CONSTANT (t))
+    init = t;
   return init;
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-initlist7.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-initlist7.C
new file mode 100644
index 000..6fea82f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-initlist7.C
@@ -0,0 +1,7 @@
+// PR c++/60186
+// { dg-require-effective-target c++11 }
+
+template<typename> void foo(int i)
+{
+  constexpr int a[] = { i }; // { dg-error "" }
+}
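constexpr-initlist7.C is ill-formed because a function parameter is never a constant expression. A minimal well-formed counterpart is sketched below (our illustration; `first_plus_size` is a hypothetical name). It shows a constexpr array inside a template whose elements mix a non-dependent constant and a dependent one, which is the kind of initializer massage_init_elt has to fold.

```cpp
#include <cassert>

// Well-formed: both elements are constant expressions, one non-dependent (1)
// and one dependent (sizeof (T)), so the array may be constexpr.
template <typename T>
int first_plus_size() {
  constexpr int a[] = { 1, static_cast<int>(sizeof(T)) };
  return a[0] + a[1];
}

// Ill-formed, as in the testcase (it must be diagnosed, not ICE):
//   template <typename> void foo(int i) { constexpr int a[] = { i }; }
```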
Re: C++ PATCH for c++/60252 (ICE with VLA in lambda parameter)
On 02/21/2014 09:10 AM, Jason Merrill wrote:
> While parsing the template parameter list for a lambda, we've already pushed into the closure class but haven't created the op() FUNCTION_DECL, so trying to capture 'this' by way of the 'this' pointer of op() breaks.  Avoid the ICE by not trying to capture 'this' when parsing a parameter list.

On second thought, I'd rather not depend on the parsing state here, since we don't always update current_binding_level during template instantiation.  So let's check for the actual problem instead.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit 5ca06118071f28b060b751415d18f8af4968a0a4
Author: Jason Merrill <ja...@redhat.com>
Date:   Fri Feb 21 15:06:47 2014 -0500

        PR c++/60252
        * lambda.c (maybe_resolve_dummy): Check lambda_function rather
        than current_binding_level.

diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index 7fe235b..277dec6 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -749,10 +749,8 @@ maybe_resolve_dummy (tree object)
   if (type != current_class_type
       && current_class_type
       && LAMBDA_TYPE_P (current_class_type)
-      && DERIVED_FROM_P (type, current_nonlambda_class_type ())
-      /* If we get here while parsing the parameter list of a lambda, it
-         will fail, so don't even try (c++/60252).  */
-      && current_binding_level->kind != sk_function_parms)
+      && lambda_function (current_class_type)
+      && DERIVED_FROM_P (type, current_nonlambda_class_type ()))
     {
       /* In a lambda, need to go through 'this' capture.  */
       tree lam = CLASSTYPE_LAMBDA_EXPR (current_class_type);
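As we read the hunk, maybe_resolve_dummy covers calls to a member of the enclosing class, or one of its bases, made from inside a lambda body. The implicit object has to be reached through the lambda's 'this' capture, and that is only possible once the op() FUNCTION_DECL exists, which is what the new lambda_function check tests. Below is a minimal valid example of that situation. `Base`, `Derived`, and `run` are hypothetical names.

```cpp
#include <cassert>

struct Base {
  int value = 40;
  int get() const { return value; }
};

struct Derived : Base {
  int run() const {
    // Unqualified call to a member of a base of the enclosing class from a
    // lambda body; the implicit object is reached via the captured 'this'.
    auto f = [this] { return get() + 2; };
    return f();
  }
};
```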
C++ PATCH for c++/60185 (ICE with invalid default arg in template)
To avoid problems trying to resolve an invalid use of 'this' before diagnosing it later, let's do the same thing we do in tsubst_default_argument, namely clear current_class_{ptr,ref}.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit f1051ca23020746350bacff3c499b2a9d1ec0dff
Author: Jason Merrill <ja...@redhat.com>
Date:   Fri Feb 21 15:08:28 2014 -0500

        PR c++/60185
        * parser.c (cp_parser_default_argument): Clear
        current_class_ptr/current_class_ref like tsubst_default_argument.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 7bbdf90..47a67c4 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -18633,8 +18633,24 @@ cp_parser_default_argument (cp_parser *parser, bool template_parm_p)
   /* Parse the assignment-expression.  */
   if (template_parm_p)
     push_deferring_access_checks (dk_no_deferred);
+  tree saved_class_ptr = NULL_TREE;
+  tree saved_class_ref = NULL_TREE;
+  /* The "this" pointer is not valid in a default argument.  */
+  if (cfun)
+    {
+      saved_class_ptr = current_class_ptr;
+      cp_function_chain->x_current_class_ptr = NULL_TREE;
+      saved_class_ref = current_class_ref;
+      cp_function_chain->x_current_class_ref = NULL_TREE;
+    }
   default_argument
     = cp_parser_initializer (parser, &is_direct_init, &non_constant_p);
+  /* Restore the "this" pointer.  */
+  if (cfun)
+    {
+      cp_function_chain->x_current_class_ptr = saved_class_ptr;
+      cp_function_chain->x_current_class_ref = saved_class_ref;
+    }
   if (BRACE_ENCLOSED_INITIALIZER_P (default_argument))
     maybe_warn_cpp0x (CPP0X_INITIALIZER_LISTS);
   if (template_parm_p)
diff --git a/gcc/testsuite/g++.dg/overload/defarg5.C b/gcc/testsuite/g++.dg/overload/defarg5.C
index 06ea6bf..d022b0c 100644
--- a/gcc/testsuite/g++.dg/overload/defarg5.C
+++ b/gcc/testsuite/g++.dg/overload/defarg5.C
@@ -2,6 +2,6 @@
 
 struct A
 {
-  int i;
-  A() { void foo(int=i); } // { dg-error "this" }
+  int i; // { dg-message "" }
+  A() { void foo(int=i); } // { dg-error "" }
 };
diff --git a/gcc/testsuite/g++.dg/template/defarg17.C b/gcc/testsuite/g++.dg/template/defarg17.C
new file mode 100644
index 000..38d68d4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/defarg17.C
@@ -0,0 +1,9 @@
+// PR c++/60185
+
+template<int> struct A
+{
+  int i; // { dg-message "" }
+  A() { void foo(int=i); } // { dg-error "" }
+};
+
+A<0> a;
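The rule behind these tests is that `this` may not appear in a default argument. A non-static data member such as `i` names `this->i` and is therefore rejected, while a static member is fine. The short sketch below illustrates this, with hypothetical names; it is not taken from the patch.

```cpp
#include <cassert>

struct A {
  static int default_step;  // static members may appear in default arguments
  int i = 5;

  int advance(int step = default_step) const { return i + step; }

  // Ill-formed: 'i' would mean this->i, and 'this' is not allowed here.
  // int bad(int step = i) const;
};

int A::default_step = 1;
```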
C++ PATCH for c++/60108 (ICE with defaulted virtual in template)
emit_associated_thunks expects DECL_INTERFACE_KNOWN to be set, but we weren't setting it in this case (as opposed to the case where the destructor is implicitly declared) because it has DECL_TEMPLATE_INSTANTIATION set.  Fixed by checking for DECL_DEFAULTED_FN as well.

Tested x86_64-pc-linux-gnu, applying to trunk and 4.8.

commit 670511e83f8bb5df8dd87bfbd3b8a9625ba9963f
Author: Jason Merrill <ja...@redhat.com>
Date:   Fri Feb 21 15:37:45 2014 -0500

        PR c++/60108
        * semantics.c (expand_or_defer_fn_1): Check DECL_DEFAULTED_FN.

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 6f32496..85d6807 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -3986,7 +3986,7 @@ expand_or_defer_fn_1 (tree fn)
          linkage of all functions, and as that causes writes to
          the data mapped in from the PCH file, it's advantageous
          to mark the functions at this point.  */
-      if (!DECL_IMPLICIT_INSTANTIATION (fn))
+      if (!DECL_IMPLICIT_INSTANTIATION (fn) || DECL_DEFAULTED_FN (fn))
         {
           /* This function must have external linkage, as
              otherwise DECL_INTERFACE_KNOWN would have been
diff --git a/gcc/testsuite/g++.dg/cpp0x/defaulted48.C b/gcc/testsuite/g++.dg/cpp0x/defaulted48.C
new file mode 100644
index 000..727afc5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/defaulted48.C
@@ -0,0 +1,17 @@
+// PR c++/60108
+// { dg-require-effective-target c++11 }
+
+template<int> struct A
+{
+  virtual ~A();
+};
+
+template<typename> struct B : A<0>, A<1>
+{
+  ~B() = default;
+};
+
+struct C : B<bool>
+{
+  C() {}
+};
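defaulted48.C only needs to compile, since `A<N>::~A` is declared but never defined. The runnable variant below is a sketch with our own additions: a defined destructor and a destruction counter. It deletes a `C` through its second base. On Itanium-ABI targets such as x86_64-pc-linux-gnu that call typically goes through a this-adjusting thunk of the kind emit_associated_thunks produces for the defaulted virtual destructor.

```cpp
#include <cassert>

// Counts base-subobject destructions so the result can be observed.
static int destroyed = 0;

template <int> struct A {
  virtual ~A() { ++destroyed; }  // defined here, unlike defaulted48.C
};

template <typename> struct B : A<0>, A<1> {
  ~B() = default;  // defaulted (implicitly virtual) destructor in a template
};

struct C : B<bool> {
  C() {}
};

// Deleting through the second base needs a this-adjusting entry point.
inline void destroy_via_second_base() {
  A<1> *p = new C;
  delete p;  // runs ~C, ~B<bool>, ~A<1>, ~A<0>
}
```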
Re: [google gcc-4_8] not split bb for machine dependent builtins
Ok.  I expect this to be submitted to trunk later as well.

David

On Fri, Feb 21, 2014 at 2:08 PM, Rong Xu <x...@google.com> wrote:
> Hi,
>
> For builtins without the nothrow attribute, we currently split the bb by
> adding a fake edge to func_exit when instrumenting profile counters.
> While this is safe, the resulting control flow and additional counters
> drastically increase the compile time for programs with lots of builtin
> calls.
>
> This patch suppresses the adding of the fake edges for machine-dependent
> builtins.
>
> This is for the google branch only.  Tested with SPEC2006, google internal
> benchmarks and bootstrap.
>
> OK to commit?
>
> Thanks,
>
> -Rong
C++ PATCH for c++/58170 (ICE with alias template)
There's no reason why we wouldn't check for dependent scopes when parsing the target of an alias declaration, and indeed not doing so led to the ICE here.  The rest of the patch improves the diagnostic for this testcase (and some others).

Tested x86_64-pc-linux-gnu, applying to trunk.  Also applying the cp_parser_type_name hunk to 4.8.

commit 21f4a8a5550498513e1235239b69aa5bc537687b
Author: Jason Merrill <ja...@redhat.com>
Date:   Fri Feb 21 16:58:21 2014 -0500

        PR c++/58170
        * parser.c (cp_parser_type_name): Always check dependency.
        (cp_parser_type_specifier_seq): Call
        cp_parser_parse_and_diagnose_invalid_type_name.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 47a67c4..1e98032 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -14763,7 +14763,7 @@ cp_parser_type_name (cp_parser* parser)
          instantiation of an alias template...  */
       type_decl = cp_parser_template_id (parser,
                                          /*template_keyword_p=*/false,
-                                         /*check_dependency_p=*/false,
+                                         /*check_dependency_p=*/true,
                                          none_type,
                                          /*is_declaration=*/false);
       /* Note that this must be an instantiation of an alias template
@@ -18083,7 +18083,16 @@ cp_parser_type_specifier_seq (cp_parser* parser,
      type-specifier-seq at all.  */
   if (!seen_type_specifier)
     {
-      cp_parser_error (parser, "expected type-specifier");
+      /* Set in_declarator_p to avoid skipping to the semicolon.  */
+      int in_decl = parser->in_declarator_p;
+      parser->in_declarator_p = true;
+
+      if (cp_parser_uncommitted_to_tentative_parse_p (parser)
+          || !cp_parser_parse_and_diagnose_invalid_type_name (parser))
+        cp_parser_error (parser, "expected type-specifier");
+
+      parser->in_declarator_p = in_decl;
+
       type_specifier_seq->type = error_mark_node;
       return;
     }
diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-40.C b/gcc/testsuite/g++.dg/cpp0x/alias-decl-40.C
new file mode 100644
index 000..f8bff78
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-40.C
@@ -0,0 +1,33 @@
+// PR c++/58170
+// { dg-require-effective-target c++11 }
+// { dg-prune-output "not declared" }
+// { dg-prune-output "expected" }
+
+template <typename T, typename U>
+struct base {
+  template <typename V>
+  struct derived;
+};
+
+template <typename T, typename U>
+  template <typename V>
+  struct base<T, U>::derived : public base<T, V> {
+};
+
+// This (wrong?) alias declaration provokes the crash.
+template <typename T, typename U, typename V>
+using alias = base<T, U>::derived<V>; // { dg-error "template|typename" }
+
+// This one works:
+// template <typename T, typename U, typename V>
+// using alias = typename base<T, U>::template derived<V>;
+
+template <typename T>
+void f() {
+  alias<T, bool, char> m{};
+  (void) m;
+}
+
+int main() {
+  f<int>();
+}
diff --git a/gcc/testsuite/g++.dg/cpp0x/error8.C b/gcc/testsuite/g++.dg/cpp0x/error8.C
index cc4f877..a992077 100644
--- a/gcc/testsuite/g++.dg/cpp0x/error8.C
+++ b/gcc/testsuite/g++.dg/cpp0x/error8.C
@@ -3,5 +3,5 @@
 
 struct A
 {
-  int* p = new foo; // { dg-error "16:expected type-specifier" }
+  int* p = new foo; // { dg-error "16:foo. does not name a type" }
 };
diff --git a/gcc/testsuite/g++.dg/cpp0x/override4.C b/gcc/testsuite/g++.dg/cpp0x/override4.C
index aec5c2c..695f9a3 100644
--- a/gcc/testsuite/g++.dg/cpp0x/override4.C
+++ b/gcc/testsuite/g++.dg/cpp0x/override4.C
@@ -16,12 +16,12 @@ struct B2
 
 struct B3
 {
-  virtual auto f() -> final void; // { dg-error "expected type-specifier" }
+  virtual auto f() -> final void; // { dg-error "type" }
 };
 
 struct B4
 {
-  virtual auto f() -> final void {} // { dg-error "expected type-specifier" }
+  virtual auto f() -> final void {} // { dg-error "type" }
 };
 
 struct D : B
@@ -36,10 +36,10 @@ struct D2 : B
 
 struct D3 : B
 {
-  virtual auto g() -> override void; // { dg-error "expected type-specifier" }
+  virtual auto g() -> override void; // { dg-error "type" }
 };
 
 struct D4 : B
 {
-  virtual auto g() -> override void {} // { dg-error "expected type-specifier" }
+  virtual auto g() -> override void {} // { dg-error "type" }
 };
diff --git a/gcc/testsuite/g++.dg/ext/underlying_type1.C b/gcc/testsuite/g++.dg/ext/underlying_type1.C
index a8f68d3..999cd9f 100644
--- a/gcc/testsuite/g++.dg/ext/underlying_type1.C
+++ b/gcc/testsuite/g++.dg/ext/underlying_type1.C
@@ -8,7 +8,7 @@ template<typename T>
   { typedef __underlying_type(T) type; }; // { dg-error "not an enumeration" }
 
 __underlying_type(int) i1; // { dg-error "not an enumeration|invalid" }
-__underlying_type(A) i2; // { dg-error "expected" }
+__underlying_type(A) i2; // { dg-error "expected|type" }
 __underlying_type(B) i3; // { dg-error "not an enumeration|invalid" }
 __underlying_type(U) i4; // { dg-error "not an enumeration|invalid" }
 
diff --git a/gcc/testsuite/g++.dg/parse/crash48.C b/gcc/testsuite/g++.dg/parse/crash48.C
index 4541548..020ddf0 100644
--- a/gcc/testsuite/g++.dg/parse/crash48.C
+++ b/gcc/testsuite/g++.dg/parse/crash48.C
@@ -5,5 +5,5 @@ void foo (bool b) {
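The testcase keeps the well-formed spelling only as a comment ("This one works"). A compilable version of that form is sketched below, with `static_assert`s we added to show what the alias denotes. Because `base<T, U>` is a dependent scope, the alias needs `typename` to say the member is a type and `template` to say `derived` is a template.

```cpp
#include <type_traits>

template <typename T, typename U>
struct base {
  template <typename V>
  struct derived;
};

// Out-of-class definition of the member class template.
template <typename T, typename U>
template <typename V>
struct base<T, U>::derived : public base<T, V> {};

// The form the testcase marks as working: both 'typename' and 'template'
// are required because base<T, U> is a dependent scope.
template <typename T, typename U, typename V>
using alias = typename base<T, U>::template derived<V>;

static_assert(std::is_same<alias<int, bool, char>,
                           base<int, bool>::derived<char>>::value,
              "alias names the member template specialization");
static_assert(std::is_base_of<base<int, char>, alias<int, bool, char>>::value,
              "derived<V> inherits from base<T, V>");
```

With this spelling, `alias<T, bool, char> m{};` inside `f<T>()` from the testcase should also compile.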