Re: [CHKP, PATCH] Fix LTO cgraph merge for instrumented functions
2015-04-02 20:01 GMT+03:00 Jan Hubicka hubi...@ucw.cz: Ping It would help if you add hubi...@ucw.cz to CC for IPA related patches. 2015-03-19 11:34 GMT+03:00 Ilya Enkovich enkovich@gmail.com: On 12 Mar 13:27, Ilya Enkovich wrote: Hi, Currently cgraph merge has several issues with instrumented code: - original function node may be removed = no assembler name conflict is detected between function and variable Why do you need to detect this one? Assembler name of instrumented function is a transparent alias of original function's name. Alias chains are not taken into account by analysis. Thus we see no conflict between instrumented function's name and a variable name but emit them with the same assembler name. To detect name conflict I keep original function node alive. - only orig_decl name is privatized for instrumented function = node still shares assembler name which causes infinite privatization loop - information about changed name is stored in file_data of instrumented node = original section name may be not found for original function - chkp reference is not fixed when nodes are merged This patch should fix theese problems by keeping instrumentation thunks reachable, privatizing both nodes and fixing chkp references. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk? Next stage1 I definitly hope we will be able to reduce number of special cases we need for chck nodes. I will try to read the code and your design document and give it some tought. Original design didn't take into account several LTO aspect and additional patches appeared to fix various LTO issues. It would be nice to improve it with your help. I'll need to update a design document to reflect recent problems and used fixes. 2015-03-19 Ilya Enkovich ilya.enkov...@intel.com * ipa-chkp.h (chkp_maybe_fix_chkp_ref): New. * ipa-chkp.c (chkp_maybe_fix_chkp_ref): New. * ipa.c (symbol_table::remove_unreachable_nodes): Don't remove instumentation thunks calling reachable functions. * lto-cgraph.c: Include ipa-chkp.h. (input_symtab): Fix chkp references for boundary nodes. * lto/lto-partition.c (privatize_symbol_name_1): New. (privatize_symbol_name): Privatize both decl and orig_decl names for instrumented functions. * lto/lto-symtab.c: Include ipa-chkp.h. (lto_cgraph_replace_node): Fix chkp references for merged function nodes. gcc/testsuite/ 2015-03-19 Ilya Enkovich ilya.enkov...@intel.com * gcc.dg/lto/chkp-privatize-1_0.c: New. * gcc.dg/lto/chkp-privatize-1_1.c: New. * gcc.dg/lto/chkp-privatize-2_0.c: New. * gcc.dg/lto/chkp-privatize-2_1.c: New. diff --git a/gcc/ipa-chkp.c b/gcc/ipa-chkp.c index 0b857ff..7a7fc3c 100644 --- a/gcc/ipa-chkp.c +++ b/gcc/ipa-chkp.c @@ -414,6 +414,40 @@ chkp_instrumentable_p (tree fndecl) (!fn || !copy_forbidden (fn, fndecl))); } +/* Check NODE has a correct IPA_REF_CHKP reference. + Create a new reference if required. */ + +void +chkp_maybe_fix_chkp_ref (cgraph_node *node) OK, so you have the chkp references that links to instrumented_version. You do not stream them for boundary symbols but for some reason they are needed, but when you can end up with wrong CHKP link? Wrong links may appear when cgraph nodes are merged. Thanks Ilya
Re: [CHKP] Support returned bounds in thunks expand
2015-04-02 23:55 GMT+03:00 Jan Hubicka hubi...@ucw.cz: Ping 2015-03-10 13:12 GMT+03:00 Ilya Enkovich enkovich@gmail.com: Hi, Currentl we loose returned bounds when functions are merged. This patch fixes it by adding returne bounds support for cgraph_node::expand_thunk. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk? 2015-03-06 Ilya Enkovich ilya.enkov...@intel.com * cgraphunit.c (cgraph_node::expand_thunk): Build returned bounds for instrumented functions. I think if the extra bultin call is needed here, the same andling needs to be added to ipa-split. We really ought to unify that code - it is surprisingly difficult to produce a wrapper call these days. Yep, there is such code in place. It is done by insert_bndret_call_after. You are right, I should use the same interface for both these cases. + if (instrumentation_clone + !DECL_BY_REFERENCE (resdecl) + restmp + BOUNDED_P (restmp)) + { + tree fn = targetm.builtin_chkp_function (BUILT_IN_CHKP_BNDRET); + gcall *retbnd = gimple_build_call (fn, 1, restmp); + + resbnd = create_tmp_reg (pointer_bounds_type_node, retbnd); + gimple_call_set_lhs (retbnd, resbnd); + + gsi_insert_after (bsi, retbnd, GSI_NEW_STMT); + create_edge (get_create (fn), retbnd, callees-count, callees-frequency); + } I am not sure why we need to check here: Originaly we have two bounded functions, A and B. We turn function B to a wrapper of function A. If callers of bounded functions need to call a builtin, I would expect all callers of B to do it already, so I would expect that wrapper is safe here? The problem with instrumented call is that instrumented function returns two values and call lhs gets only the first one. Thus we generate bndret call to get the second one to build own return with two values. Or do we mix instrumented and non-instrumented versions somehow? Addding code handling return value is going to turn the call to non-tailcall, so you probably want to drop the tailcall flag, too. No, function still return what the last call return, thus may keep tailcall. Thanks, Ilya Honza
[PATCH] Fix PR ipa/65665
Hello. Following patch fixes a new issue, ICE has disappeared on aarch64-linux machine and I was able to boostrap on the machine. Moreover, no new regression seen on x86_64-linux-pc. Ready for trunk? Thanks, Martin From b8463b8a7b2f6b652d6d9c3a2ece814ec5554619 Mon Sep 17 00:00:00 2001 From: mliska mli...@suse.cz Date: Fri, 3 Apr 2015 09:30:50 +0200 Subject: [PATCH] Fix PR ipa/65665 gcc/ChangeLog: 2015-04-03 Martin Liska mli...@suse.cz PR ipa/65665 * ipa-icf.c (sem_function::equals_wpa): Verify that IPA CP has computed data structure. (sem_item_optimizer::update_hash_by_addr_refs): Likewise. --- gcc/ipa-icf.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c index 8626730..8f8a0cf 100644 --- a/gcc/ipa-icf.c +++ b/gcc/ipa-icf.c @@ -535,7 +535,8 @@ sem_function::equals_wpa (sem_item *item, (TREE_CODE (TREE_TYPE (decl)) == METHOD_TYPE || TREE_CODE (TREE_TYPE (item-decl)) == METHOD_TYPE) (ipa_node_params_sum == NULL - || ipa_is_param_used (IPA_NODE_REF (dyn_cast cgraph_node *(node)), + || IPA_NODE_REF (get_node ())-descriptors.is_empty () + || ipa_is_param_used (IPA_NODE_REF (get_node ()), 0)) compare_polymorphic_p ()) { @@ -2501,14 +2502,15 @@ sem_item_optimizer::update_hash_by_addr_refs () m_items[i]-update_hash_by_addr_refs (m_symtab_node_map); if (m_items[i]-type == FUNC) { + cgraph_node *cnode = dyn_cast cgraph_node * (m_items[i]-node); + if (TREE_CODE (TREE_TYPE (m_items[i]-decl)) == METHOD_TYPE contains_polymorphic_type_p (method_class_type (TREE_TYPE (m_items[i]-decl))) (DECL_CXX_CONSTRUCTOR_P (m_items[i]-decl) || ((ipa_node_params_sum == NULL - || ipa_is_param_used ( - IPA_NODE_REF - (dyn_cast cgraph_node *(m_items[i]-node)), 0)) + || IPA_NODE_REF (cnode)-descriptors.is_empty () + || ipa_is_param_used (IPA_NODE_REF (cnode), 0)) static_castsem_function * (m_items[i]) -compare_polymorphic_p ( { -- 2.1.4
Re: [PATCH, CHKP] Fix ipa-comdats for instrumentation thunks
2015-04-02 20:04 GMT+03:00 Jan Hubicka hubi...@ucw.cz: Hi, With r221574 (https://gcc.gnu.org/ml/gcc-cvs/2015-03/msg00495.html) thunks don't get comdat groups assigned and this causes a failure in cgraph checker for instrumentation thunks. It happens because instrumentation thunk may reference local symbol in comdat not being in comdat by itself. This patch fixes the problem. Doesn't affect not instrumented code. Testing is in progress. Does it look OK? Thanks, Ilya -- gcc/ 2015-04-02 Ilya Enkovich ilya.enkov...@intel.com * ipa-comdats.c (ipa_comdats): Visit all instrumentation thunks to set proper comdat group. gcc/testsuite/ 2015-04-02 Ilya Enkovich ilya.enkov...@intel.com * gcc.target/i386/mpx/chkp-thunk-comdat-1.cc: New. * gcc.target/i386/mpx/chkp-thunk-comdat-2.cc: New. diff --git a/gcc/ipa-comdats.c b/gcc/ipa-comdats.c index f349f9f..30bcad8 100644 --- a/gcc/ipa-comdats.c +++ b/gcc/ipa-comdats.c @@ -348,10 +348,9 @@ ipa_comdats (void) } /* Finally assign symbols to the sections. */ - + cgraph_node *fun; FOR_EACH_DEFINED_SYMBOL (symbol) { - struct cgraph_node *fun; symbol-aux = NULL; if (!symbol-get_comdat_group () !symbol-alias @@ -388,6 +387,20 @@ ipa_comdats (void) true); } } + + /* Instrumentation thunks reference original node and thus + need to be in the same comdat group. Otherwise we may + get a local instrumented symbol in a comdat group and + the referencing original node outside of it. */ + FOR_EACH_DEFINED_FUNCTION (fun) +if (fun-instrumentation_clone + fun-instrumented_version + !fun-instrumented_version-alias + fun-get_comdat_group () + !fun-instrumented_version-get_comdat_group ()) + fun-instrumented_version-call_for_symbol_and_aliases + (set_comdat_group_1, fun, true); I think you want to handle them same way as the aliasesthunks are handled. This fix is symptomatic: the code may assign them to different comdat groups and may propagate that furhter. Currently ipa_comdats doesn't set comdat groups for thunks. At the same time all references to local symbol should be within one comdat group. That's why I need this fix. Assigning function and its instrumentation thunks to different comdat groups would be an error because both nodes represent the same function. Thanks, Ilya Honza
[PING, AArch64] Add long-call attribute and pragma interfaces
Patch ping: https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01148.html Thanks.
Re: [CHKP] Support returned bounds in thunks expand
2015-04-02 22:42 GMT+03:00 Jeff Law l...@redhat.com: On 04/02/2015 08:49 AM, Ilya Enkovich wrote: Ping 2015-03-10 13:12 GMT+03:00 Ilya Enkovich enkovich@gmail.com: Hi, Currentl we loose returned bounds when functions are merged. This patch fixes it by adding returne bounds support for cgraph_node::expand_thunk. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk? Thanks, Ilya -- gcc/ 2015-03-06 Ilya Enkovich ilya.enkov...@intel.com * cgraphunit.c (cgraph_node::expand_thunk): Build returned bounds for instrumented functions. gcc/testsuite/ 2015-03-06 Ilya Enkovich ilya.enkov...@intel.com * gcc/testsuite/gcc.target/i386/thunk-retbnd.c: New. I really dislike the amount of gimple and bounded pointer knowledge in this code. It seems like a significant modularity violation and while you didn't introduce the gimple stuff, we probably shouldn't be making it worse. Is it possible to let this code build up the thunk, then pass it off as a whole to the chkp code to add the instrumentation, particularly for the return value? OK, will rework the patch. ALso, is this critical for stage4? It looks like this is strictly a QofI change, correct? Actually having just a tail call in a function we don't lose bounds and I don't expect it affects QofI. But we get instrumented function with no returned bounds for a pointer which triggers an assert somewhere (don't remember exact place). I'll revisit the original problem and will probably make a simpler stability fix for now, leaving thunk modification for stage 1. Thanks, Ilya jeff
[PATCH, i386] PR63211 broken type-punning in avx* tests.
Hi, I've looked into avx* tests and many of them (even those that don't fail in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63211) use invalid type punning. Properly fixing them looks like a lot of work, so I propose just adding -fno-strict-aliasing to them. This patch was obtained by running sed -i s/-O2/-O2 -fno-strict-aliasing/g ../gcc/testsuite/gcc.target/i386/avx*-2.c Ok for stage1? Changelog below: testsuite/ 2015-04-03 Ilya Tocar ilya.to...@intel.com PR target/63211 * gcc.target/i386/avx-cmpsd-2.c: Update test. * gcc.target/i386/avx-cmpss-2.c: Update test. * gcc.target/i386/avx-vbroadcastf128-256-2.c: Update test. * gcc.target/i386/avx-vbroadcastss-2.c: Update test. * gcc.target/i386/avx-vcomisd-2.c: Update test. * gcc.target/i386/avx-vcomiss-2.c: Update test. * gcc.target/i386/avx-vcvtsd2si-2.c: Update test. * gcc.target/i386/avx-vcvtsi2sd-2.c: Update test. * gcc.target/i386/avx-vcvtsi2ss-2.c: Update test. * gcc.target/i386/avx-vcvtss2si-2.c: Update test. * gcc.target/i386/avx-vcvttsd2si-2.c: Update test. * gcc.target/i386/avx-vcvttss2si-2.c: Update test. * gcc.target/i386/avx-vdppd-2.c: Update test. * gcc.target/i386/avx-vdpps-2.c: Update test. * gcc.target/i386/avx-vextractf128-256-2.c: Update test. * gcc.target/i386/avx-vinsertf128-256-2.c: Update test. * gcc.target/i386/avx-vinsertps-2.c: Update test. * gcc.target/i386/avx-vmaskmovpd-2.c: Update test. * gcc.target/i386/avx-vmaskmovpd-256-2.c: Update test. * gcc.target/i386/avx-vmaskmovps-2.c: Update test. * gcc.target/i386/avx-vmaskmovps-256-2.c: Update test. * gcc.target/i386/avx-vmovapd-2.c: Update test. * gcc.target/i386/avx-vmovapd-256-2.c: Update test. * gcc.target/i386/avx-vmovaps-2.c: Update test. * gcc.target/i386/avx-vmovaps-256-2.c: Update test. * gcc.target/i386/avx-vmovd-2.c: Update test. * gcc.target/i386/avx-vmovdqa-2.c: Update test. * gcc.target/i386/avx-vmovdqa-256-2.c: Update test. * gcc.target/i386/avx-vmovdqu-2.c: Update test. * gcc.target/i386/avx-vmovdqu-256-2.c: Update test. * gcc.target/i386/avx-vmovhpd-2.c: Update test. * gcc.target/i386/avx-vmovhps-2.c: Update test. * gcc.target/i386/avx-vmovlpd-2.c: Update test. * gcc.target/i386/avx-vmovq-2.c: Update test. * gcc.target/i386/avx-vmovsd-2.c: Update test. * gcc.target/i386/avx-vmovss-2.c: Update test. * gcc.target/i386/avx-vmovupd-2.c: Update test. * gcc.target/i386/avx-vmovupd-256-2.c: Update test. * gcc.target/i386/avx-vmovups-2.c: Update test. * gcc.target/i386/avx-vmovups-256-2.c: Update test. * gcc.target/i386/avx-vpcmpestri-2.c: Update test. * gcc.target/i386/avx-vpcmpestrm-2.c: Update test. * gcc.target/i386/avx-vpcmpistri-2.c: Update test. * gcc.target/i386/avx-vpcmpistrm-2.c: Update test. * gcc.target/i386/avx-vperm2f128-256-2.c: Update test. * gcc.target/i386/avx-vpermilpd-2.c: Update test. * gcc.target/i386/avx-vpermilpd-256-2.c: Update test. * gcc.target/i386/avx-vpermilps-2.c: Update test. * gcc.target/i386/avx-vpermilps-256-2.c: Update test. * gcc.target/i386/avx-vpslld-2.c: Update test. * gcc.target/i386/avx-vpsllq-2.c: Update test. * gcc.target/i386/avx-vpsllw-2.c: Update test. * gcc.target/i386/avx-vpsrad-2.c: Update test. * gcc.target/i386/avx-vpsraw-2.c: Update test. * gcc.target/i386/avx-vpsrld-2.c: Update test. * gcc.target/i386/avx-vpsrlq-2.c: Update test. * gcc.target/i386/avx-vpsrlw-2.c: Update test. * gcc.target/i386/avx-vptest-2.c: Update test. * gcc.target/i386/avx-vptest-256-2.c: Update test. * gcc.target/i386/avx-vroundpd-2.c: Update test. * gcc.target/i386/avx-vroundpd-256-2.c: Update test. * gcc.target/i386/avx-vtestpd-2.c: Update test. * gcc.target/i386/avx-vtestpd-256-2.c: Update test. * gcc.target/i386/avx-vtestps-2.c: Update test. * gcc.target/i386/avx-vtestps-256-2.c: Update test. * gcc.target/i386/avx-vucomisd-2.c: Update test. * gcc.target/i386/avx-vucomiss-2.c: Update test. * gcc.target/i386/avx2-i32gatherd-2.c: Update test. * gcc.target/i386/avx2-i32gatherd256-2.c: Update test. * gcc.target/i386/avx2-i32gatherpd-2.c: Update test. * gcc.target/i386/avx2-i32gatherpd256-2.c: Update test. * gcc.target/i386/avx2-i32gatherps-2.c: Update test. * gcc.target/i386/avx2-i32gatherps256-2.c: Update test. * gcc.target/i386/avx2-i32gatherq-2.c: Update test. * gcc.target/i386/avx2-i32gatherq256-2.c: Update test. * gcc.target/i386/avx2-i64gatherd-2.c: Update test. * gcc.target/i386/avx2-i64gatherd256-2.c: Update test.
Re: [PATCH, CHKP] Restore transparent alias chains
2015-04-02 21:48 GMT+03:00 Jan Hubicka hubi...@ucw.cz: On 03/20/2015 02:20 AM, Ilya Enkovich wrote: Hi, Identifiers read with input_identifier miss IDENTIFIER_TRANSPARENT_ALIAS bit. We always expect it for instrumentation clones, thus restore it input_cgraph_1. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk? Thanks, Ilya -- 2015-03-20 Ilya Enkovich ilya.enkov...@intel.com * lto-cgraph.c (input_cgraph_1): Always link instrumented assembler name with original one. This appears to be a code path that only triggers when MPX is enabled and is roughly analogous to the code in chkp_build_instrumented_fndecl links things up. OK for the trunk. I think this will lead to wrong code. At this time, we may have multple declarations sharing single assembler name (and thus IDENTIFIER_TRANSPARENT_ALIAS link). Think of case where one unit defines static function and other global function of the same name. We may end up renaming the symbol but keeping bogus transparent alias link that will trigger on other symbol. That is the reason for fixing chains in privatize_symbol_name. During WPA the assembler names are never fully unique, but we also probably do not need to set IDENTIFIER_TRANSPARENT_ALIAS. I think IDENTIFIER_TRANSPARENT_ALIAS should be set in rename_statics and separately in ltrans on the place we skip lto_symtab merging. At least it is the place I link it with my patch for supporting transparent aliases in cgraph. I will try to find time to audit chkp code - I missed the addition of transparent aliases in the initial chkp patches. Symbol table code expect 1-1 correspondence between symbol table entries and real symbols. Basically all places we special case aliases or weakrefs needs to be revisited for transparent aliases. I don't see how 1-1 matching may be achieved now for instrumented functions. We have two cgraph_nodes which actually represent the same function. Thus single real symbol for two nodes. Thanks, Ilya Honza
Re: [CHKP] Never expand instrumentation thunks
2015-04-03 0:13 GMT+03:00 Jan Hubicka hubi...@ucw.cz: On 04/02/2015 02:11 PM, Jan Hubicka wrote: 2015-03-18 Ilya Enkovich ilya.enkov...@intel.com * cgraphunit.c (cgraph_node::expand_thunk): Don't expand instrumentation thunks. diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c index e640907..abc9cfe 100644 --- a/gcc/cgraphunit.c +++ b/gcc/cgraphunit.c @@ -1508,6 +1508,10 @@ cgraph_node::expand_thunk (bool output_asm_thunks, bool force_gimple_thunk) tree thunk_fndecl = decl; tree a; + /* Instrumentation thunk is the same function with + a different signature. Never need to expand it. */ + if (thunk.add_pointer_bounds_args) +return false; Yeah, this is another case where we hit problem with transparent alias pretending to be thunk :) I guess the patch is OK for GCC-5 and for next stage1 we can clean this up. I was actually building a compiler so I could take a look at this one under a debugger ;-) I think it is really the transparent alias issue. The comment seems pretty clear about it. What is confusing is that instrumentation thunks are called thunks when they are really not - thunk is a small hunk of code, while instrumentation thunk is a transparent alias. Too bad I did not notice we introduced transparent aliases, i would push out my code for that. I will compare Ilya's changes with mine and hopefully we can catch more bugs and unify the code next stage1. Thunk was the best I could find to represent the same function but with different signature. It would be great to have a more specialized way for that. Thanks, Ilya Honza
Re: [PATCH, i386] PR63211 broken type-punning in avx* tests.
On Fri, Apr 3, 2015 at 1:02 PM, Ilya Tocar tocarip.in...@gmail.com wrote: I've looked into avx* tests and many of them (even those that don't fail in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63211) use invalid type punning. Properly fixing them looks like a lot of work, so I propose just adding -fno-strict-aliasing to them. This patch was obtained by running sed -i s/-O2/-O2 -fno-strict-aliasing/g ../gcc/testsuite/gcc.target/i386/avx*-2.c Ok for stage1? I don't like this approach. If the testcase is broken, then it should be fixed, not worked around. Uros.
Re: [PATCH, stage1][PR65443] Add transform_to_exit_first_loop_alt
On 27-03-15 15:10, Tom de Vries wrote: Hi, this patch fixes PR65443, a todo in the parloops pass for function transform_to_exit_first_loop: ... TODO: the common case is that latch of the loop is empty and immediately follows the loop exit. In this case, it would be better not to copy the body of the loop, but only move the entry of the loop directly before the exit check and increase the number of iterations of the loop by one. This may need some additional preconditioning in case NIT = ~0. ... The current implementation of transform_to_exit_first_loop transforms the loop by moving and duplicating the loop body. This patch transforms the loop into the same normal form, but without the duplication, if that's possible (determined by try_transform_to_exit_first_loop_alt). The actual transformation, done by transform_to_exit_first_loop_alt transforms this loop: ... bb preheader: ... goto bb header bb header: ... if (ivtmp n) goto bb latch; else goto bb exit; bb latch: ivtmp = ivtmp + inc; goto bb header ... into this one: ... bb preheader: ... goto bb newheader bb header: ... goto bb latch; bb newheader: if (ivtmp n + 1) goto bb header; else goto bb exit; bb latch: ivtmp = ivtmp + inc; goto bb newheader ... Updated patch, now using redirect_edge_var_map and flush_pending_stmts. Bootstrapped and reg-tested on x86_64. OK for stage1 trunk? Thanks, - Tom Add transform_to_exit_first_loop_alt 2015-03-27 Tom de Vries t...@codesourcery.com PR tree-optimization/65443 * tree-parloops.c (replace_imm_uses, replace_uses_in_bb_by) (replace_uses_in_bbs_by, transform_to_exit_first_loop_alt) (try_transform_to_exit_first_loop_alt): New function. (transform_to_exit_first_loop): Use try_transform_to_exit_first_loop_alt. * gcc.dg/parloops-exit-first-loop-alt-2.c: New test. * gcc.dg/parloops-exit-first-loop-alt-3.c: New test. * gcc.dg/parloops-exit-first-loop-alt.c: New test. * testsuite/libgomp.c/parloops-exit-first-loop-alt-2.c: New test. * testsuite/libgomp.c/parloops-exit-first-loop-alt-3.c: New test. * testsuite/libgomp.c/parloops-exit-first-loop-alt.c: New test. --- .../gcc.dg/parloops-exit-first-loop-alt-2.c| 27 ++ .../gcc.dg/parloops-exit-first-loop-alt-3.c| 26 ++ .../gcc.dg/parloops-exit-first-loop-alt.c | 27 ++ gcc/tree-parloops.c| 341 - .../libgomp.c/parloops-exit-first-loop-alt-2.c | 48 +++ .../libgomp.c/parloops-exit-first-loop-alt-3.c | 29 ++ .../libgomp.c/parloops-exit-first-loop-alt.c | 48 +++ 7 files changed, 534 insertions(+), 12 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-2.c create mode 100644 gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c create mode 100644 gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c create mode 100644 libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt-2.c create mode 100644 libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt-3.c create mode 100644 libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt.c diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-2.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-2.c new file mode 100644 index 000..a4d6cb2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-2.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target pthread } */ +/* { dg-options -O2 -ftree-parallelize-loops=2 -fdump-tree-parloops } */ + +#define N 1000 + +unsigned int a[N]; +unsigned int b[N]; +unsigned int c[N]; + +void __attribute__((noclone,noinline)) +f (unsigned int n) +{ + int i; + +for (i = 0; i N; ++i) + c[i] = a[i] + b[i]; +} + +/* Three times three array accesses: + - three in f._loopfn.0 + - three in the parallel + - three in the low iteration count loop + Crucially, none for a peeled off last iteration following the parallel. */ +/* { dg-final { scan-tree-dump-times (?n)\\\[i 9 parloops } } */ + +/* { dg-final { cleanup-tree-dump parloops } } */ diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c new file mode 100644 index 000..c7ab51f88 --- /dev/null +++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target pthread } */ +/* { dg-options -O2 -ftree-parallelize-loops=2 -fdump-tree-parloops } */ + +unsigned int *a; + +unsigned int __attribute__((noclone,noinline)) +f (unsigned int n) +{ + int i; + unsigned int sum = 1; + + for (i = 0; i n; ++i) +sum += a[i]; + + return sum; +} + +/* Three array accesses: + - one in f._loopfn.0 + - one in the parallel + - one in the low iteration count loop + Crucially, none for a peeled off last
Re: [RS6000] Fix 65576 regression
On Thu, Apr 02, 2015 at 08:02:57PM -0400, David Edelsohn wrote: If the final alternative cannot occur, it should be removed as well. Alan, would you please test that change also? Tested powerpc64-linux and powerpc-linux no regressions. * config/rs6000/rs6000.md (extenddftf2_internal): Remove last alternative. Index: gcc/config/rs6000/rs6000.md === --- gcc/config/rs6000/rs6000.md (revision 221805) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -8369,9 +8369,9 @@ }) (define_insn_and_split *extenddftf2_internal - [(set (match_operand:TF 0 nonimmediate_operand =m,Y,ws,d,d,r) - (float_extend:TF (match_operand:DF 1 input_operand d,r,md,md,md,rm))) - (use (match_operand:DF 2 zero_reg_mem_operand d,r,j,m,d,n))] + [(set (match_operand:TF 0 nonimmediate_operand =m,Y,ws,d,d) + (float_extend:TF (match_operand:DF 1 input_operand d,r,md,md,md))) + (use (match_operand:DF 2 zero_reg_mem_operand d,r,j,m,d))] !TARGET_IEEEQUAD TARGET_HARD_FLOAT TARGET_FPRS TARGET_DOUBLE_FLOAT TARGET_LONG_DOUBLE_128 -- Alan Modra Australia Development Lab, IBM
Re: [C++ Patch/RFC] PR 64085
OK. Jason
Re: [PATCH, CHKP] Fix cdtor merge for instrumented functions
On 02 Apr 22:21, Jan Hubicka wrote: Hi, This patch doesn't allow instrumentation thunks calls while merging constructors and destructors. Not isntrumented code is not affeceted. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk? Thanks, Ilya -- gcc/ 2015-04-02 Ilya Enkovich ilya.enkov...@intel.com * ipa.c (ipa_cdtor_merge): Skip instrumentation thunks. So the problem here is that you do have two names for the function, one that is not instrumented and other that is instrumented? I am bit surprised we get instrumentation on ctors that should not take or return pointer parameter, but I see one can trigger that at least by manually adding constructor attribute. I think what you need is to drop DECL_STATIC_CONSTRUCTOR/DESTRUCTURO flags when producing the transparent alias. Honza Dropping flag is a good option. Here is a corresponding patch. Bootstrapped and tested on x86_64-unknown-linux-gnu. Does it look OK? Thanks, Ilya -- gcc/ 2015-04-03 Ilya Enkovich ilya.enkov...@intel.com * ipa-chkp.c (chkp_maybe_create_clone): Reset cdtor flags for instrumentation thunk. (chkp_produce_thunks): Likewise. gcc/testsuite/ 2015-04-03 Ilya Enkovich ilya.enkov...@intel.com * gcc.dg/lto/chkp-ctor-merge_0.c: New. diff --git a/gcc/ipa-chkp.c b/gcc/ipa-chkp.c index a9933e2..0c16f71 100644 --- a/gcc/ipa-chkp.c +++ b/gcc/ipa-chkp.c @@ -550,6 +550,9 @@ chkp_maybe_create_clone (tree fndecl) clone-thunk.thunk_p = true; clone-thunk.add_pointer_bounds_args = true; clone-create_edge (node, NULL, 0, CGRAPH_FREQ_BASE); + /* Thunk shouldn't be a cdtor. */ + DECL_STATIC_CONSTRUCTOR (clone-decl) = 0; + DECL_STATIC_DESTRUCTOR (clone-decl) = 0; } else { @@ -713,6 +716,9 @@ chkp_produce_thunks (bool early) 0, CGRAPH_FREQ_BASE); node-create_reference (node-instrumented_version, IPA_REF_CHKP, NULL); + /* Thunk shouldn't be a cdtor. */ + DECL_STATIC_CONSTRUCTOR (node-decl) = 0; + DECL_STATIC_DESTRUCTOR (node-decl) = 0; } } diff --git a/gcc/testsuite/gcc.dg/lto/chkp-ctor-merge_0.c b/gcc/testsuite/gcc.dg/lto/chkp-ctor-merge_0.c new file mode 100644 index 000..ac4095b --- /dev/null +++ b/gcc/testsuite/gcc.dg/lto/chkp-ctor-merge_0.c @@ -0,0 +1,23 @@ +/* { dg-lto-do run } */ +/* { dg-require-effective-target mpx } */ +/* { dg-lto-options { { -O2 -flto -fcheck-pointer-bounds -mmpx -nodefaultlibs -lc } } } */ + +int glob = 1; + +void __attribute__((constructor)) +ctor1 () +{ + glob += 1; +} + + +void __attribute__((constructor)) +ctor2 () +{ + glob -= 2; +} + +int main (int argc, const char **argv) +{ + return glob; +}
Re: [debug-early] equate new DIE with DW_AT_specificationto a previous declaration
On 03/18/2015 11:51 AM, Aldy Hernandez wrote: On 03/17/2015 07:12 PM, Jason Merrill wrote: Why are we outlining a DECL_EXTERNAL function? SRA has no restrictions on whether a function is DECL_EXTERNAL. Aha. So it seems that DECL_EXTERNAL is the wrong gate for equate_decl_number_to_die here. I think the rule we want is that if we don't already have a non-declaration DIE for a function, we should equate the new DIE. Let's remove the existing calls and replace them with a single conditional call before the if (declaration). Incidentally, /* A declaration that has been previously dumped needs no additional information. */ if (dumped_early declaration) return; Do we need to check dumped_early here? I would think that any declaration that has an old_die needs no additional information. Jason
Re: [debug-early] Handle specification of class scoped static functions
On 03/20/2015 08:11 PM, Aldy Hernandez wrote: + /* For class scoped static functions, the dumped early + version was the declaration, whereas the next time + around with a different context should be the + specification. In this case, avoid reusing the DIE, but + generate a specification below. E.g.: + + class C { + public: + static void moo () {} + }; */ + || !is_cu_die (context_die)) Why do we still need this added (relative to trunk)? Are we getting here multiple times with class context_die? Also, the comment seems redundant with the comment immediately above. Jason
Re: [PATCH, i386] PR63211 broken type-punning in avx* tests.
On 03 Apr 13:39, Uros Bizjak wrote: On Fri, Apr 3, 2015 at 1:02 PM, Ilya Tocar tocarip.in...@gmail.com wrote: I've looked into avx* tests and many of them (even those that don't fail in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63211) use invalid type punning. Properly fixing them looks like a lot of work, so I propose just adding -fno-strict-aliasing to them. This patch was obtained by running sed -i s/-O2/-O2 -fno-strict-aliasing/g ../gcc/testsuite/gcc.target/i386/avx*-2.c Ok for stage1? I don't like this approach. If the testcase is broken, then it should be fixed, not worked around. IMHO those tests don't need to be alias conformant. There are plenty of tests for aliasing rules, and avx tests verify intrinsics implementaion. There are plenty of real programs braking alias rules, so why can't we have non-conformant tests?
Re: [PATCH] Fix a test with bogus size_t type
On Thu, Apr 02, 2015 at 10:35:35AM +0200, Marek Polacek wrote: We are now more strict when accepting user-defined initializer_lists; in particular, we now require sizetype, not just any integral type. The following test failed with -m32, because it had a bogus type of size_t, with -m32 it usually should be unsigned int, not unsigned long. Test passes now with both -m32/-m64, ok for trunk? That is obvious. 2015-04-02 Marek Polacek pola...@redhat.com * g++.dg/cpp0x/pr57101.C: Use proper type for size_t. diff --git gcc/testsuite/g++.dg/cpp0x/pr57101.C gcc/testsuite/g++.dg/cpp0x/pr57101.C index 94b576f..c0fc966 100644 --- gcc/testsuite/g++.dg/cpp0x/pr57101.C +++ gcc/testsuite/g++.dg/cpp0x/pr57101.C @@ -1,7 +1,7 @@ // { dg-do compile { target c++11 } } // { dg-options -fcompare-debug } -typedef long unsigned size_t; +typedef __SIZE_TYPE__ size_t; namespace { template typename _Tp, _Tp __v struct integral_constant Jakub
Re: [libstdc++/65033] Give alignment info to libatomic
On Thu, 2 Apr 2015, Hans-Peter Nilsson wrote: Why then use __alignof(_M_i) (the object-alignment) instead of _S_alignment (the deduced alas insufficiently increased type-alignment)? (The immediate reason is that _S_alignment wasn't there until a later revision, but the gist of the question remains. :-) making sure that atomic_is_lock_free returns the same value for all objects of a given type, (That would work but it doesn't seem to be the case.) we probably should have changed the interface so that we would pass size and alignment rather than size and object pointer. Instead, we decided that passing null for the object pointer would be sufficient. But as this PR shows, we really do need to take alignment into account. Regarding what's actually needed, alignment of an atomic type should always be *forced to be at least the natural alignment of of the object* (with non-power-of-two sized-objects rounded up) and until then atomic types won't work for targets where the non-atomic equivalents have less alignment (as straddling a page-boundary won't be lock-less-atomic anywhere where straddling a page-boundary may cause a non-atomic-access...) So, not target-specific except for targets that require even higher-than-natural alignment. So, can we do something like this instead for gcc5? (Completely untested and may be syntactically, namespacingly and cxxstandardversionly flawed.) Index: include/std/atomic === --- include/std/atomic (revision 221849) +++ include/std/atomic (working copy) @@ -165,16 +165,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION struct atomic { private: - // Align 1/2/4/8/16-byte types the same as integer types of that size. + // Align 1/2/4/8/16-byte types to the natural alignment of that size. // This matches the alignment effects of the C11 _Atomic qualifier. static constexpr int _S_min_alignment - = sizeof(_Tp) == sizeof(char) ? alignof(char) - : sizeof(_Tp) == sizeof(short) ? alignof(short) - : sizeof(_Tp) == sizeof(int) ? alignof(int) - : sizeof(_Tp) == sizeof(long) ? alignof(long) - : sizeof(_Tp) == sizeof(long long) ? alignof(long long) + = sizeof(_Tp) == sizeof(char) ? max(sizeof(char), alignof(char)) + : sizeof(_Tp) == sizeof(short) ? max(sizeof(short), alignof(short)) + : sizeof(_Tp) == sizeof(int) ? max(sizeof(int), alignof(int)) + : sizeof(_Tp) == sizeof(long) ? max(sizeof(long), alignof(long)) + : sizeof(_Tp) == sizeof(long long) ? max(sizeof(long long), alignof(long long)) #ifdef _GLIBCXX_USE_INT128 - : sizeof(_Tp) == sizeof(__int128) ? alignof(__int128) + : sizeof(_Tp) == sizeof(__int128) ? max(sizeof(__int128), alignof(__int128)) #endif : 0; @@ -216,7 +216,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION is_lock_free() const noexcept { // Produce a fake, minimally aligned pointer. - void *__a = reinterpret_castvoid *(-__alignof(_M_i)); + void *__a = reinterpret_castvoid *(-_S_alignment); return __atomic_is_lock_free(sizeof(_M_i), __a); } @@ -224,7 +224,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION is_lock_free() const volatile noexcept { // Produce a fake, minimally aligned pointer. - void *__a = reinterpret_castvoid *(-__alignof(_M_i)); + void *__a = reinterpret_castvoid *(-_S_alignment); return __atomic_is_lock_free(sizeof(_M_i), __a); } Index: include/bits/atomic_base.h === --- include/bits/atomic_base.h (revision 221849) +++ include/bits/atomic_base.h (working copy) @@ -240,7 +240,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION private: typedef _ITp __int_type; - __int_type _M_i; + // Align 1/2/4/8/16-byte types to the natural alignment of that size. + // This matches the alignment effects of the C11 _Atomic qualifier. + static constexpr int _S_min_alignment + = sizeof(_Tp) == sizeof(char) ? max(sizeof(char), __alignof(char)) + : sizeof(_Tp) == sizeof(short) ? max(sizeof(short), __alignof(short)) + : sizeof(_Tp) == sizeof(int) ? max(sizeof(int), __alignof(int)) + : sizeof(_Tp) == sizeof(long) ? max(sizeof(long), __alignof(long)) + : sizeof(_Tp) == sizeof(long long) ? max(sizeof(long long), __alignof(long long)) +#ifdef _GLIBCXX_USE_INT128 + : sizeof(_Tp) == sizeof(__int128) ? max(sizeof(__int128), __alignof(__int128)) +#endif + : 0; + + static constexpr int _S_alignment += _S_min_alignment alignof(_Tp) ? _S_min_alignment : __alignof(_Tp); + + __int_type _M_i __attribute ((__aligned(_S_alignment))); public: __atomic_base() noexcept = default; @@ -348,7 +364,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION is_lock_free() const noexcept
Re: ipa-cp heuristic tweek
2015-03-31 15:32 GMT+03:00 Ilya Enkovich enkovich@gmail.com: 2015-03-29 18:43 GMT+03:00 Jan Hubicka hubi...@ucw.cz: +static bool +set_single_call_flag (cgraph_node *node, void *) +{ + cgraph_edge *cs = node-callers; + /* Local thunks can be handled transparently, skip them. */ + while (cs cs-caller-thunk.thunk_p cs-caller-local.local) +cs = cs-next_caller; + if (cs) +{ + gcc_assert (!cs-next_caller); This assert assumes the only non-thunk caller is always at the end of a callers list. Is it actually guaranteed? + IPA_NODE_REF (cs-caller)-node_calling_single_call = true; + return true; +} + return false; +} + /* Initialize ipcp_lattices. */ Thanks, Ilya Hi Honza, For chkp testing I see cases when gcc asserts in set_single_call_flag because instrumentation thunk is not the last one in a callers list. I want to install following patch to fix it. Is it OK? Thanks, Ilya -- 2015-04-03 Ilya Enkovich ilya.enkov...@intel.com * ipa-cp (set_single_call_flag): Remove too restrictive assert. diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c index d9aa92e..bfe4821 100644 --- a/gcc/ipa-cp.c +++ b/gcc/ipa-cp.c @@ -839,7 +839,6 @@ set_single_call_flag (cgraph_node *node, void *) cs = cs-next_caller; if (cs) { - gcc_assert (!cs-next_caller); IPA_NODE_REF (cs-caller)-node_calling_single_call = true; return true; }
Re: ipa-cp heuristic tweek
2015-03-31 15:32 GMT+03:00 Ilya Enkovich enkovich@gmail.com: 2015-03-29 18:43 GMT+03:00 Jan Hubicka hubi...@ucw.cz: +static bool +set_single_call_flag (cgraph_node *node, void *) +{ + cgraph_edge *cs = node-callers; + /* Local thunks can be handled transparently, skip them. */ + while (cs cs-caller-thunk.thunk_p cs-caller-local.local) +cs = cs-next_caller; + if (cs) +{ + gcc_assert (!cs-next_caller); This assert assumes the only non-thunk caller is always at the end of a callers list. Is it actually guaranteed? + IPA_NODE_REF (cs-caller)-node_calling_single_call = true; + return true; +} + return false; +} + /* Initialize ipcp_lattices. */ Thanks, Ilya Hi Honza, For chkp testing I see cases when gcc asserts in set_single_call_flag because instrumentation thunk is not the last one in a callers list. I want to install following patch to fix it. Is it OK? Thanks, Ilya -- 2015-04-03 Ilya Enkovich ilya.enkov...@intel.com * ipa-cp (set_single_call_flag): Remove too restrictive assert. OK (this is probably artifact of earlier version of the code, thanks for pointing it out) Honza
Re: [CHKP] Support returned bounds in thunks expand
Yep, there is such code in place. It is done by insert_bndret_call_after. You are right, I should use the same interface for both these cases. That would be nice indeed. We should separate out and unify the code for porducing a wrapper call next stage1 (as Jeff mentions too). + if (instrumentation_clone + !DECL_BY_REFERENCE (resdecl) + restmp + BOUNDED_P (restmp)) + { + tree fn = targetm.builtin_chkp_function (BUILT_IN_CHKP_BNDRET); + gcall *retbnd = gimple_build_call (fn, 1, restmp); + + resbnd = create_tmp_reg (pointer_bounds_type_node, retbnd); + gimple_call_set_lhs (retbnd, resbnd); + + gsi_insert_after (bsi, retbnd, GSI_NEW_STMT); + create_edge (get_create (fn), retbnd, callees-count, callees-frequency); + } I am not sure why we need to check here: Originaly we have two bounded functions, A and B. We turn function B to a wrapper of function A. If callers of bounded functions need to call a builtin, I would expect all callers of B to do it already, so I would expect that wrapper is safe here? The problem with instrumented call is that instrumented function returns two values and call lhs gets only the first one. Thus we generate bndret call to get the second one to build own return with two values. I see, patch is OK then (preferably merging as much as possible with ipa-split) Honza Or do we mix instrumented and non-instrumented versions somehow? Addding code handling return value is going to turn the call to non-tailcall, so you probably want to drop the tailcall flag, too. No, function still return what the last call return, thus may keep tailcall. Thanks, Ilya Honza
Re: [CHKP, PATCH] Fix LTO cgraph merge for instrumented functions
Assembler name of instrumented function is a transparent alias of original function's name. Alias chains are not taken into account by analysis. Thus we see no conflict between instrumented function's name and a variable name but emit them with the same assembler name. To detect name conflict I keep original function node alive. Hmm, so lets see if I understand your situation. You always have transparent alias TA and function F connected by IPA_REF_CHKP and because TA is represented as thunk you also have a fake call from TA to F? I do not understand how you can miss the link these days. I extended lto-cgraph to always keep thunk and its target in every unit that reffers to thunk. That should solve you problem? How IPA_REF_CHKP can link get lost? I suppose we still need to 1) write_symbol and lto-symtab should know that transparent aliases are not real symbols and ignore them. We have predicate symtab_node::real_symbol_p, I suppose it should return true. I think cgraph_merge and varpool_merge in lto-symtab needs to know what to do when merging two symbols with transparent aliases. 2) At some point we need to populate transparent alias links as discussed in the other thread. 3) For safety, I guess we want symbol_table::change_decl_assembler_name to either sanity check there are no trasparent alias links pointing to old assembler names or update it. 4) I think we want verify_node to check that transparent alias links and chkp points as intended and only on those symbols where they need to be There is also logic in lto-partition to always insert alias target OK, so you have the chkp references that links to instrumented_version. You do not stream them for boundary symbols but for some reason they are needed, but when you can end up with wrong CHKP link? Wrong links may appear when cgraph nodes are merged. I see, I think you really want to fix these at merging time as sugested in 1). In this case even the node-instrumented_version pointer may be out of date and point to a node that lost in merging? Honza Thanks Ilya
Fix ICE in speculative_call_info
Hi, this patch fixes ordering issue where we try to resolve speculation while we are mid of its duplication. Bootstrapped/regtested x86_64-liinux, will commit it shortly. Honza PR ipa/65655 * ipa-inline-analysis.c (edge_set_predicate): Do not redirect speculative indirect edges to avoid ordering issue. * g++.dg/torture/pr65655.C: New testcase. Index: ipa-inline-analysis.c === --- ipa-inline-analysis.c (revision 221845) +++ ipa-inline-analysis.c (working copy) @@ -793,7 +793,11 @@ edge_set_predicate (struct cgraph_edge * { /* If the edge is determined to be never executed, redirect it to BUILTIN_UNREACHABLE to save inliner from inlining into it. */ - if (predicate false_predicate_p (predicate)) + if (predicate false_predicate_p (predicate) + /* When handling speculative edges, we need to do the redirection + just once. Do it always on the direct edge, so we do not +attempt to resolve speculation while duplicating the edge. */ + (!e-speculative || e-callee)) e = redirect_to_unreachable (e); struct inline_edge_summary *es = inline_edge_summary (e); Index: testsuite/g++.dg/torture/pr65655.C === --- testsuite/g++.dg/torture/pr65655.C (revision 0) +++ testsuite/g++.dg/torture/pr65655.C (working copy) @@ -0,0 +1,51 @@ +/* { dg-do compile } */ +// { dg-additional-options -std=c++11 -fsanitize=undefined -O2 } +class ECoordinate { }; +class EPoint { +public: + inline ECoordinate y (); +}; +ECoordinate EPoint::y () { } +template class KEY, class CONTENT class AVLTree; +template class KEY, class CONTENT class AVLTreeNode { + friend class +AVLTree KEY, CONTENT ; + KEY key; + void set_rthread (unsigned char b); + void set_lthread (unsigned char b); +}; +template class KEY, class CONTENT class AVLTree { +public: + AVLTree (); + void insert (const KEY key, const CONTENT c); +AVLTreeNode KEY, CONTENT *root; + const KEY * _target_key; + virtual int compare (const KEY k1, const KEY k2) const; + void _add (AVLTreeNode KEY, CONTENT *t); + virtual void _status (unsigned int) { } +}; +template class KEY, class CONTENT void AVLTree KEY, CONTENT ::_add (AVLTreeNode KEY, CONTENT *t) { + int cmp = compare (*_target_key, t-key); + if (cmp == 0) +{ _status (1); } +} +template class KEY, class CONTENT void AVLTree KEY, CONTENT ::insert (const KEY key, const CONTENT c) { + if (root == 0) { + root-set_rthread (1); + root-set_lthread (1); +} +else { _target_key = key; _add (root); } +} +template class KEY, class CONTENT AVLTree KEY, CONTENT ::AVLTree () +: root (0) { } +class ContactRepository { + void insertContact (EPoint pt, int val); +}; +void ContactRepository::insertContact (EPoint pt, int val) { + AVLTreeNode ECoordinate, AVLTree ECoordinate, int **cont_x_node; + if (cont_x_node == __null) +{ + AVLTree ECoordinate, int *insert_tree = new AVLTree ECoordinate, int ; + insert_tree-insert (pt.y (), val); +} +}
Re: [PATCH] [ARM] Add support for the Samsung Exynos M1 processor
On Fri, Apr 3, 2015 at 5:17 PM, Sebastian Pop seb...@gmail.com wrote: Hi, On Thu, Apr 2, 2015 at 5:51 PM, James Greenhalgh james.greenha...@arm.com wrote: Trunk is currently in Stage 4 development, these patches are fairly low-risk, but they are certainly not regression fixes. I'll defer to port maintainers and release managers for the final say, but in my opinion it would not be appropriate to commit them until Stage 1 development for GCC 6.0 opens (hopefully in a few weeks). I thought that adding flags for new processors was ok at any time, even to backport. It's usually risk vs reward on a per patch basis and I don't think of it as a general rule. We've always avoided the CPU tuning backport rule to the FSF branches. The smaller the CPU tuning patch - the better it is and in this case I'm comfortable with the patch going in as it is adding another tuning option, using existing constructs and is not invasive in the backend. Ok for the ARM port if no regressions and there are no objections from the RMs. regards Ramana
Re: [PATCH, CHKP] Fix static const bounds creation in LTO
On 02 Apr 22:45, Jan Hubicka wrote: Hi, Current in LTO static const bounds are created as external symbol. It doesn't work in case original symbols were removed and privatized. This patch introduces a separate comdat symbol to be used in LTO. Bootstrapped and tested on x86_64-unknown-linux-gnu. Does this approach look OK? As I understand it, you want to create a static variables, like __chkp_zero_bounds which should be shared across translation units. Currently the function does: static tree chkp_make_static_const_bounds (HOST_WIDE_INT lb, HOST_WIDE_INT ub, const char *name) { tree var; /* With LTO we may have constant bounds already in varpool. Try to find it. */ var = chkp_find_const_bounds_var (lb, ub); if (var) return var; var = build_decl (UNKNOWN_LOCATION, VAR_DECL, get_identifier (name), pointer_bounds_type_node); TREE_PUBLIC (var) = 1; TREE_USED (var) = 1; TREE_READONLY (var) = 1; TREE_STATIC (var) = 1; TREE_ADDRESSABLE (var) = 0; DECL_ARTIFICIAL (var) = 1; DECL_READ_P (var) = 1; /* We may use this symbol during ctors generation in chkp_finish_file when all symbols are emitted. Force output to avoid undefined symbols in ctors. */ if (!in_lto_p) { DECL_INITIAL (var) = targetm.chkp_make_bounds_constant (lb, ub); DECL_COMDAT (var) = 1; varpool_node::get_create (var)-set_comdat_group (DECL_ASSEMBLER_NAME (var)); varpool_node::get_create (var)-force_output = 1; } else DECL_EXTERNAL (var) = 1; varpool_node::finalize_decl (var); } You should not set comdat group by hand, or we get into troubles on non-ELF targets. Just call make_decl_one_only (var, DECL_ASSEMBLER_NAME (var)); Now in LTO I guess you want to check if the symbol is provided by the source compilation unit, i.e. call symtab_node::get_for_asmname (DECL_ASSEMBLER_NAME (var)) and if it is there, check it have proper type and fatal_error otherwise and if it does, discar var and use existing variable This avoid a duplicate declaration of a symbol that is invalid in symtab. (once we solve the transparent alias thing, I will add verification for that) Because you already set force_output, the symbol should not be privatized and renamed in any way. If there are cases the symbol can get legally optimized out, I guess you can also avoid setting force_output and we can stream references to the decls into optimization summary in a case we want ot bind for it in WPA, but that can wait for next stage1. Honza Thanks a lot for looking into it! Seems originally I mostly got problems due to no DECL_WEAK for generated var, make_decl_one_only seems to work for my tests. Will run more testing to make sure it's OK. Here is a new patch version. Is it OK? Thanks, Ilya -- gcc/ 2015-04-03 Ilya Enkovich ilya.enkov...@intel.com * tree-chkp.c (chkp_find_const_bounds_var): Search by assembler name. (chkp_make_static_const_bounds): Use make_decl_one_only. gcc/testsuite/ 2015-04-03 Ilya Enkovich ilya.enkov...@intel.com * gcc.dg/lto/chkp-static-bounds_0.c: New. diff --git a/gcc/testsuite/gcc.dg/lto/chkp-static-bounds_0.c b/gcc/testsuite/gcc.dg/lto/chkp-static-bounds_0.c new file mode 100644 index 000..596e551 --- /dev/null +++ b/gcc/testsuite/gcc.dg/lto/chkp-static-bounds_0.c @@ -0,0 +1,26 @@ +/* { dg-lto-do link } */ +/* { dg-require-effective-target mpx } */ +/* { dg-lto-options { { -flto -flto-partition=max -fcheck-pointer-bounds -mmpx } } } */ + +const char *cc; + +int test1 (const char *c) +{ + c = __builtin___bnd_init_ptr_bounds (c); + cc = c; + return c[0] * 2; +} + +struct S +{ + int (*fnptr) (const char *); +} S; + +struct S s1 = {test1}; +struct S s2 = {test1}; +struct S s3 = {test1}; + +int main (int argc, const char **argv) +{ + return s1.fnptr (argv[0]) + s2.fnptr (argv[1]); +} diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c index 03f75b3..4daeab6 100644 --- a/gcc/tree-chkp.c +++ b/gcc/tree-chkp.c @@ -1876,26 +1876,17 @@ chkp_add_bounds_to_call_stmt (gimple_stmt_iterator *gsi) /* Return constant static bounds var with specified LB and UB if such var exists in varpool. Return NULL otherwise. */ static tree -chkp_find_const_bounds_var (HOST_WIDE_INT lb, - HOST_WIDE_INT ub) +chkp_find_const_bounds_var (tree id) { - tree val = targetm.chkp_make_bounds_constant (lb, ub); - struct varpool_node *node; - - /* We expect bounds constant is represented as a complex value - of two pointer sized integers. */ - gcc_assert (TREE_CODE (val) == COMPLEX_CST); + struct symtab_node *node = symtab_node::get_for_asmname (id); - FOR_EACH_VARIABLE (node) -if (POINTER_BOUNDS_P (node-decl) -TREE_READONLY (node-decl) -DECL_INITIAL
Re: [PATCH, CHKP] Fix cdtor merge for instrumented functions
On 02 Apr 22:21, Jan Hubicka wrote: Hi, This patch doesn't allow instrumentation thunks calls while merging constructors and destructors. Not isntrumented code is not affeceted. Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk? Thanks, Ilya -- gcc/ 2015-04-02 Ilya Enkovich ilya.enkov...@intel.com * ipa.c (ipa_cdtor_merge): Skip instrumentation thunks. So the problem here is that you do have two names for the function, one that is not instrumented and other that is instrumented? I am bit surprised we get instrumentation on ctors that should not take or return pointer parameter, but I see one can trigger that at least by manually adding constructor attribute. I think what you need is to drop DECL_STATIC_CONSTRUCTOR/DESTRUCTURO flags when producing the transparent alias. Honza Dropping flag is a good option. Here is a corresponding patch. Bootstrapped and tested on x86_64-unknown-linux-gnu. Does it look OK? Thanks, Ilya -- gcc/ 2015-04-03 Ilya Enkovich ilya.enkov...@intel.com * ipa-chkp.c (chkp_maybe_create_clone): Reset cdtor flags for instrumentation thunk. (chkp_produce_thunks): Likewise. OK, Thanks! Honza
Re: [PATCH] Fix PR ipa/65665
Hello. Following patch fixes a new issue, ICE has disappeared on aarch64-linux machine and I was able to boostrap on the machine. Moreover, no new regression seen on x86_64-linux-pc. Ready for trunk? Thanks, Martin From b8463b8a7b2f6b652d6d9c3a2ece814ec5554619 Mon Sep 17 00:00:00 2001 From: mliska mli...@suse.cz Date: Fri, 3 Apr 2015 09:30:50 +0200 Subject: [PATCH] Fix PR ipa/65665 gcc/ChangeLog: 2015-04-03 Martin Liska mli...@suse.cz PR ipa/65665 * ipa-icf.c (sem_function::equals_wpa): Verify that IPA CP has computed data structure. (sem_item_optimizer::update_hash_by_addr_refs): Likewise. OK, we should clean this up next stage1 ideally making passmanager to do the ipa-prop analysis and making them trigger for all ipa passes consuming them (ipa-icf/ipa-cp/inliner). Currently the way they are initialized either by inliner or ipa-cp is quite ugly. Honza --- gcc/ipa-icf.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c index 8626730..8f8a0cf 100644 --- a/gcc/ipa-icf.c +++ b/gcc/ipa-icf.c @@ -535,7 +535,8 @@ sem_function::equals_wpa (sem_item *item, (TREE_CODE (TREE_TYPE (decl)) == METHOD_TYPE || TREE_CODE (TREE_TYPE (item-decl)) == METHOD_TYPE) (ipa_node_params_sum == NULL - || ipa_is_param_used (IPA_NODE_REF (dyn_cast cgraph_node *(node)), + || IPA_NODE_REF (get_node ())-descriptors.is_empty () + || ipa_is_param_used (IPA_NODE_REF (get_node ()), 0)) compare_polymorphic_p ()) { @@ -2501,14 +2502,15 @@ sem_item_optimizer::update_hash_by_addr_refs () m_items[i]-update_hash_by_addr_refs (m_symtab_node_map); if (m_items[i]-type == FUNC) { + cgraph_node *cnode = dyn_cast cgraph_node * (m_items[i]-node); + if (TREE_CODE (TREE_TYPE (m_items[i]-decl)) == METHOD_TYPE contains_polymorphic_type_p (method_class_type (TREE_TYPE (m_items[i]-decl))) (DECL_CXX_CONSTRUCTOR_P (m_items[i]-decl) || ((ipa_node_params_sum == NULL -|| ipa_is_param_used ( - IPA_NODE_REF - (dyn_cast cgraph_node *(m_items[i]-node)), 0)) +|| IPA_NODE_REF (cnode)-descriptors.is_empty () +|| ipa_is_param_used (IPA_NODE_REF (cnode), 0)) static_castsem_function * (m_items[i]) -compare_polymorphic_p ( { -- 2.1.4
Re: [PATCH, CHKP] Fix static const bounds creation in LTO
Thanks a lot for looking into it! Seems originally I mostly got problems due to no DECL_WEAK for generated var, make_decl_one_only seems to work for my tests. Will run more testing to make sure it's OK. Here is a new patch version. Is it OK? Good ;) diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c index 03f75b3..4daeab6 100644 --- a/gcc/tree-chkp.c +++ b/gcc/tree-chkp.c @@ -1876,26 +1876,17 @@ chkp_add_bounds_to_call_stmt (gimple_stmt_iterator *gsi) /* Return constant static bounds var with specified LB and UB if such var exists in varpool. Return NULL otherwise. */ static tree -chkp_find_const_bounds_var (HOST_WIDE_INT lb, - HOST_WIDE_INT ub) +chkp_find_const_bounds_var (tree id) { - tree val = targetm.chkp_make_bounds_constant (lb, ub); - struct varpool_node *node; - - /* We expect bounds constant is represented as a complex value - of two pointer sized integers. */ - gcc_assert (TREE_CODE (val) == COMPLEX_CST); + struct symtab_node *node = symtab_node::get_for_asmname (id); Sadly I think this won't work on targets that adds prefix to assembler name, like djgpp/mingw. To obtain mangled assembler name you need to call id = targetm.mangle_decl_assembler_name (decl, DECL_NAME (decl)); that in turn requires decl to already exist. That is why I guess you may want to just build the new var and see if already exists Honza - FOR_EACH_VARIABLE (node) -if (POINTER_BOUNDS_P (node-decl) - TREE_READONLY (node-decl) - DECL_INITIAL (node-decl) - TREE_CODE (DECL_INITIAL (node-decl)) == COMPLEX_CST - tree_int_cst_equal (TREE_REALPART (DECL_INITIAL (node-decl)), -TREE_REALPART (val)) - tree_int_cst_equal (TREE_IMAGPART (DECL_INITIAL (node-decl)), -TREE_IMAGPART (val))) + if (node) +{ + /* We don't allow this symbol usage for non bounds. */ + gcc_assert (node-type == SYMTAB_VARIABLE); + gcc_assert (POINTER_BOUNDS_P (node-decl)); return node-decl; +} return NULL; } @@ -1907,37 +1898,34 @@ chkp_make_static_const_bounds (HOST_WIDE_INT lb, HOST_WIDE_INT ub, const char *name) { + tree id = get_identifier (name); tree var; + varpool_node *node; /* With LTO we may have constant bounds already in varpool. Try to find it. */ - var = chkp_find_const_bounds_var (lb, ub); + var = chkp_find_const_bounds_var (id); if (var) return var; - var = build_decl (UNKNOWN_LOCATION, VAR_DECL, - get_identifier (name), pointer_bounds_type_node); + var = build_decl (UNKNOWN_LOCATION, VAR_DECL, id, + pointer_bounds_type_node); - TREE_PUBLIC (var) = 1; TREE_USED (var) = 1; TREE_READONLY (var) = 1; TREE_STATIC (var) = 1; TREE_ADDRESSABLE (var) = 0; DECL_ARTIFICIAL (var) = 1; DECL_READ_P (var) = 1; + DECL_INITIAL (var) = targetm.chkp_make_bounds_constant (lb, ub); + make_decl_one_only (var, DECL_ASSEMBLER_NAME (var)); /* We may use this symbol during ctors generation in chkp_finish_file when all symbols are emitted. Force output to avoid undefined symbols in ctors. */ - if (!in_lto_p) -{ - DECL_INITIAL (var) = targetm.chkp_make_bounds_constant (lb, ub); - DECL_COMDAT (var) = 1; - varpool_node::get_create (var)-set_comdat_group (DECL_ASSEMBLER_NAME (var)); - varpool_node::get_create (var)-force_output = 1; -} - else -DECL_EXTERNAL (var) = 1; + node = varpool_node::get_create (var); + node-force_output = 1; + varpool_node::finalize_decl (var); return var;
Re: [PATCH, CHKP] Restore transparent alias chains
I think this will lead to wrong code. At this time, we may have multple declarations sharing single assembler name (and thus IDENTIFIER_TRANSPARENT_ALIAS link). Think of case where one unit defines static function and other global function of the same name. We may end up renaming the symbol but keeping bogus transparent alias link that will trigger on other symbol. That is the reason for fixing chains in privatize_symbol_name. OK, so with your proposed patch you produce the links during lto-stream-in but because the links may be wrong due to multiple different symbol sharing same assembler name you rely on not using it until you fixup in privatize_symbol_name. What happen when you have two symbols with different links sharing assembler name originally, but you rename one and do not rename other during privatization? It still looks much safer to simply install the links only after names become unique and probably don't do that during WPA compilation at all, since we never output any assembly. During WPA the assembler names are never fully unique, but we also probably do not need to set IDENTIFIER_TRANSPARENT_ALIAS. I think IDENTIFIER_TRANSPARENT_ALIAS should be set in rename_statics and separately in ltrans on the place we skip lto_symtab merging. At least it is the place I link it with my patch for supporting transparent aliases in cgraph. I will try to find time to audit chkp code - I missed the addition of transparent aliases in the initial chkp patches. Symbol table code expect 1-1 correspondence between symbol table entries and real symbols. Basically all places we special case aliases or weakrefs needs to be revisited for transparent aliases. I don't see how 1-1 matching may be achieved now for instrumented functions. We have two cgraph_nodes which actually represent the same function. Thus single real symbol for two nodes. Yeah, this is quite important change in the design assumptions for symtab that is why we need so many places to fix... Honza Thanks, Ilya Honza
Re: [PATCH, CHKP] Fix ipa-comdats for instrumentation thunks
2015-04-02 20:04 GMT+03:00 Jan Hubicka hubi...@ucw.cz: Hi, With r221574 (https://gcc.gnu.org/ml/gcc-cvs/2015-03/msg00495.html) thunks don't get comdat groups assigned and this causes a failure in cgraph checker for instrumentation thunks. It happens because instrumentation thunk may reference local symbol in comdat not being in comdat by itself. This patch fixes the problem. Doesn't affect not instrumented code. Testing is in progress. Does it look OK? Thanks, Ilya -- gcc/ 2015-04-02 Ilya Enkovich ilya.enkov...@intel.com * ipa-comdats.c (ipa_comdats): Visit all instrumentation thunks to set proper comdat group. gcc/testsuite/ 2015-04-02 Ilya Enkovich ilya.enkov...@intel.com * gcc.target/i386/mpx/chkp-thunk-comdat-1.cc: New. * gcc.target/i386/mpx/chkp-thunk-comdat-2.cc: New. diff --git a/gcc/ipa-comdats.c b/gcc/ipa-comdats.c index f349f9f..30bcad8 100644 --- a/gcc/ipa-comdats.c +++ b/gcc/ipa-comdats.c @@ -348,10 +348,9 @@ ipa_comdats (void) } /* Finally assign symbols to the sections. */ - + cgraph_node *fun; FOR_EACH_DEFINED_SYMBOL (symbol) { - struct cgraph_node *fun; symbol-aux = NULL; if (!symbol-get_comdat_group () !symbol-alias @@ -388,6 +387,20 @@ ipa_comdats (void) true); } } + + /* Instrumentation thunks reference original node and thus + need to be in the same comdat group. Otherwise we may + get a local instrumented symbol in a comdat group and + the referencing original node outside of it. */ + FOR_EACH_DEFINED_FUNCTION (fun) +if (fun-instrumentation_clone + fun-instrumented_version + !fun-instrumented_version-alias + fun-get_comdat_group () + !fun-instrumented_version-get_comdat_group ()) + fun-instrumented_version-call_for_symbol_and_aliases + (set_comdat_group_1, fun, true); I think you want to handle them same way as the aliasesthunks are handled. This fix is symptomatic: the code may assign them to different comdat groups and may propagate that furhter. Currently ipa_comdats doesn't set comdat groups for thunks. At the I see, that is a bug. It is supposed to keep thunks in the same section as their target (thunks doesn't really work across sections on some target, like PPC, because there is no way to produce a tailcall) Does the following fix the problem? Index: ipa-comdats.c === --- ipa-comdats.c (revision 221857) +++ ipa-comdats.c (working copy) @@ -377,12 +377,12 @@ fprintf (dump_file, To group: %s\n, IDENTIFIER_POINTER (group)); } if (is_a cgraph_node * (symbol)) - dyn_cast cgraph_node *(symbol)-call_for_symbol_and_aliases + dyn_cast cgraph_node *(symbol)-call_for_symbol_thunks_and_aliases (set_comdat_group_1, *comdat_head_map.get (group), true); else - symbol-call_for_symbol_and_aliases + symbol-call_for_symbol_thunks_and_aliases (set_comdat_group, *comdat_head_map.get (group), true); same time all references to local symbol should be within one comdat group. That's why I need this fix. Assigning function and its instrumentation thunks to different comdat groups would be an error because both nodes represent the same function. Thanks, Ilya Honza
Re: [CHKP] Never expand instrumentation thunks
2015-04-03 0:13 GMT+03:00 Jan Hubicka hubi...@ucw.cz: On 04/02/2015 02:11 PM, Jan Hubicka wrote: 2015-03-18 Ilya Enkovich ilya.enkov...@intel.com * cgraphunit.c (cgraph_node::expand_thunk): Don't expand instrumentation thunks. diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c index e640907..abc9cfe 100644 --- a/gcc/cgraphunit.c +++ b/gcc/cgraphunit.c @@ -1508,6 +1508,10 @@ cgraph_node::expand_thunk (bool output_asm_thunks, bool force_gimple_thunk) tree thunk_fndecl = decl; tree a; + /* Instrumentation thunk is the same function with + a different signature. Never need to expand it. */ + if (thunk.add_pointer_bounds_args) +return false; Yeah, this is another case where we hit problem with transparent alias pretending to be thunk :) I guess the patch is OK for GCC-5 and for next stage1 we can clean this up. I was actually building a compiler so I could take a look at this one under a debugger ;-) I think it is really the transparent alias issue. The comment seems pretty clear about it. What is confusing is that instrumentation thunks are called thunks when they are really not - thunk is a small hunk of code, while instrumentation thunk is a transparent alias. Too bad I did not notice we introduced transparent aliases, i would push out my code for that. I will compare Ilya's changes with mine and hopefully we can catch more bugs and unify the code next stage1. Thunk was the best I could find to represent the same function but with different signature. It would be great to have a more specialized way for that. Yeah, that is also what we need for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61886 I just did not notice you are independently solving the same issue :( Honza
Re: [PATCH] [ARM] Add support for the Samsung Exynos M1 processor
Hi, On Thu, Apr 2, 2015 at 5:51 PM, James Greenhalgh james.greenha...@arm.com wrote: Trunk is currently in Stage 4 development, these patches are fairly low-risk, but they are certainly not regression fixes. I'll defer to port maintainers and release managers for the final say, but in my opinion it would not be appropriate to commit them until Stage 1 development for GCC 6.0 opens (hopefully in a few weeks). I thought that adding flags for new processors was ok at any time, even to backport. For the AArch64 patch, I was expecting to see a respin after Junmo's comment on Wednesday [1]. In particular: +AARCH64_CORE(exynos-m1, exynosm1, exynosm1, 8, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1) As there has not been a scheduling model contributed for the exynos-m1, this will use *no* scheduling model on AArch64. This is unlikely to give good performance. Fixed in the attached patch. +static const struct tune_params exynosm1_tunings = +{ + cortexa57_extra_costs, + cortexa57_addrcost_table, + cortexa57_regmove_cost, + cortexa57_vector_cost, + 4, /* memmov_cost */ + 3, /* issue_rate */ + (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD + | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops */ + 16, /* function_align. */ + 8, /* jump_align. */ + 4, /* loop_align. */ + 2, /* int_reassoc_width. */ + 4, /* fp_reassoc_width. */ + 1/* vec_reassoc_width. */ +}; + As these are identical to the Cortex-A57 tuning, is there any reason to add them? I'd prefer if we took a copy-on-modify policy for these tuning structs, only adding them where there is a difference. Agreed. Fixed. I'm testing the two patches with bootstrap and make check on Juno aarch64-linux and Arndale arm-linux. Ok for trunk after regstrap passes? Thanks, Sebastian 0001-ARM-add-option-for-the-Samsung-Exynos-M1-core.patch Description: Binary data 0002-AArch64-add-option-for-the-Samsung-Exynos-M1-core-fo.patch Description: Binary data
Re: [libstdc++/65033] Give alignment info to libatomic
On 04/03/2015 07:13 AM, Jonathan Wakely wrote: +++ b/libstdc++-v3/include/std/atomic @@ -165,9 +165,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION struct atomic { private: - // Align 1/2/4/8/16-byte types the same as integer types of that size. + // Align 1/2/4/8/16-byte types the natural alignment of that size. // This matches the alignment effects of the C11 _Atomic qualifier. - static constexpr int _S_min_alignment + static constexpr int _S_int_alignment = sizeof(_Tp) == sizeof(char) ? alignof(char) : sizeof(_Tp) == sizeof(short) ? alignof(short) : sizeof(_Tp) == sizeof(int) ? alignof(int) @@ -178,6 +178,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #endif : 0; + static constexpr int _S_min_alignment + = _S_int_alignment sizeof(_Tp) ? _S_int_alignment : sizeof(_Tp); + This doesn't work for non-power-of-two sized _Tp. Again, we don't have *any* target for which alignof(integer) sizeof(integer). So if you care about forcing natural alignment, don't bother with the alignof on the integrals, as you're doing with _S_int_alignment at the moment. r~
Re: [PING, AArch64] Add long-call attribute and pragma interfaces
On 04/02/2015 11:59 PM, Yangfei (Felix) wrote: Patch ping: https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01148.html This patch needs documentation for the new attributes and pragmas before it can be committed. (Since this is a new feature I think it has to wait until stage 1, too, but that's not my call.) -Sandra
Another tweak of inline badness metric
Hi, this patch adds combined_size (i.e. size of caller after inlining) to the denominator of the badness metric. This is based on observations in PR 65076. Old metric did use relative time speedup to get into sane range with fixed point math. I removed this after switching to sreals and this caused quite noticeable compile time slowdowns on tramp3d and a regression on dromaeo benchmark of Firefox. The issue turns out to be the fact that new metric preffer building very large functions with are not completely cool for optimization and definitely bad for compile time. I am comitting the attached patch and plan to watch benchmark results over the long weekend. If everything goes well, I plan to also increase somewhat the inline-unit-growth. In GCC 4.9 it is 30%, I dropped it to 15% that seems too tight for firefox. Given that large applications was the main motivator for the drop, I think it is good enough reason to retreat and do not be so agressive about code size at -O2/-O3. Bootstrapped/regtested x86_64-linux, will commit it shortly. Honza PR ipa/65076 * ipa-inline.c (edge_badness): Add combined size to the denominator. Index: ipa-inline.c === --- ipa-inline.c(revision 221806) +++ ipa-inline.c(working copy) @@ -1077,8 +1077,8 @@ edge_badness (struct cgraph_edge *edge, /* When profile is available. Compute badness as: time_saved * caller_count - goodness = - -growth_of_caller * overall_growth + goodness = - +growth_of_caller * overall_growth * combined_size badness = - goodness @@ -1167,6 +1167,7 @@ edge_badness (struct cgraph_edge *edge, overall_growth += 256 * 256 - 256; denominator *= overall_growth; } + denominator *= inline_summaries-get (caller)-self_size + growth; badness = - numerator / denominator;
Re: [PATCH, libmpx, i386, PR driver/65444] Pass '-z bndplt' when building dynamic objects with MPX
On Tue, 31 Mar 2015, Ilya Enkovich wrote: +library. It also passes '-z bndplt' to a linker in case it supports this +option (which is checked on libmpx configuration). Note that old versions +of linker may ignore option. Gold linker doesn't support '-z bndplt' +option. With no '-z bndplt' support in linker all calls to dynamic libraries +lose passed bounds reducing overall protection level. It's highly +recommended to use linker with '-z bndplt' support. In case such linker +is not available it is adviced to always use @option{-static-libmpxwrappers} +for better protection level or use @option{-static} to completely avoid +external calls to dynamic libraries. MPX-based instrumentation Use @samp{-z bndplt} rather than '' quoting (but Sandra may have further advice on the substance of this documentation). -- Joseph S. Myers jos...@codesourcery.com
Work around ICE in PR 65648
Hi, the dealII compilation ICEs on check that verifies that projected size match the size after inlining. This check suffers from roundoff errors in corner cases and I believe it is what triggers here. Given that I have a reorg of this code to sreals scheduled for next stage1, I think we can just disable the sanity check in GCC 5. It is non-critical thoguh very useful to fix varoius bug that causes propagation to diverge. Will commit it shortly as obvious. PR ipa/65648 * ipa-inline-transform.c (inline_call): Skip sanity check to work around the ICE Index: ipa-inline-transform.c === --- ipa-inline-transform.c (revision 221859) +++ ipa-inline-transform.c (working copy) @@ -304,7 +304,8 @@ inline_call (struct cgraph_edge *e, bool struct cgraph_node *callee = e-callee-ultimate_alias_target (); bool new_edges_found = false; -#ifdef ENABLE_CHECKING + /* This is used only for assert bellow. */ +#if 0 int estimated_growth = estimate_edge_growth (e); bool predicated = inline_edge_summary (e)-predicate != NULL; #endif @@ -375,7 +376,10 @@ inline_call (struct cgraph_edge *e, bool to-calls_comdat_local = false; } -#ifdef ENABLE_CHECKING + /* FIXME: This assert suffers from roundoff errors, disable it for GCC 5 + and revisit it after conversion to sreals in GCC 6. + See PR 65654. */ +#if 0 /* Verify that estimated growth match real growth. Allow off-by-one error due to INLINE_SIZE_SCALE roudoff errors. */ gcc_assert (!update_overall_summary || !overall_size || new_edges_found
[PATCH], PR target/65614, Prefer to use LFD on powerpc for constants
In my fix for PR target/65240, I removed the special -ffast-math code that delayed dealing with constants until reload time. In this patch, constants are now pushed to memory earlier, and the compiler uses LFS (load floating point single) to load double precision constants. When you use the LRA register allocator (-mlra), it uses the Altivec registers for scalar data more frequently, and there appears to be interactions between values loaded up as single constants that are moved to the Altivec registers via XXLOR. This patch makes (float_extend (mem)) slightly more costly than just (mem) and the code in expr.c will not compress the constant. In addition, for scalar single precision moves it uses copy sign instead of or to move the data. The copy sign instruction deals with single precision values that would create denormals. While working in the code, I also noticed that truncdfsf2 did not have support for ISA 2.07, so I added support for it. I have done bootstraps and make check with no regressions (after fixing the two tests that were checking that LFS was used). I have also built and run the Spec 2006 benchmark bwaves with the patch, and it now runs when compiled with -mlra and upper register support. Is the patch ok to commit to trunk? [gcc] 2015-04-03 Michael Meissner meiss...@linux.vnet.ibm.com PR target/65614 * config/rs6000/rs6000.c (rs6000_rtx_costs): Make FLOAT_EXTEND more expensive, so that LFD is used to load double constants, and not LFS. * config/rs6000/rs6000.md (extendsfdf2_fpr): Generate XSCPSGNDP instead of XXLOR to copy SFmode to clear out dirty bits created when SFmode denormals are generated. (movmode_hardfloat, FMOVE32 case): Likewise. (truncdfsf2_fpr): Add support for ISA 2.07 XSRSP instruction. [gcc/testsuite] 2015-04-03 Michael Meissner meiss...@linux.vnet.ibm.com PR target/65614 * gcc.target/powerpc/compress-float-ppc-pic.c: Run test on power5 to get floating point compression. * gcc.target/powerpc/compress-foat-ppc.c: Likewise. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 221802) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -30479,8 +30479,10 @@ rs6000_rtx_costs (rtx x, int code, int o return false; case FLOAT_EXTEND: + /* Make converts on newer machines slightly more expensive to encourage +expr.c to not use a LFS instead of LFD to load constants. */ if (mode == DFmode) - *total = 0; + *total = (TARGET_VSX || TARGET_POPCNTD) ? 1 : 0; else *total = rs6000_cost-fp; return false; Index: gcc/config/rs6000/rs6000.md === --- gcc/config/rs6000/rs6000.md (revision 221802) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -5222,7 +5222,7 @@ (define_insn_and_split *extendsfdf2_fpr fmr %0,%1 lfs%U1%X1 %0,%1 # - xxlor %x0,%x1,%x1 + xscpsgndp %x0,%x1,%x1 lxsspx %x0,%y1 reload_completed REG_P (operands[1]) REGNO (operands[0]) == REGNO (operands[1]) [(const_int 0)] @@ -5230,7 +5230,7 @@ (define_insn_and_split *extendsfdf2_fpr emit_note (NOTE_INSN_DELETED); DONE; } - [(set_attr type fp,fp,fpload,fp,vecsimple,fpload)]) + [(set_attr type fp,fp,fpload,fp,fp,fpload)]) (define_expand truncdfsf2 [(set (match_operand:SF 0 gpc_reg_operand ) @@ -5239,10 +5239,12 @@ (define_expand truncdfsf2 ) (define_insn *truncdfsf2_fpr - [(set (match_operand:SF 0 gpc_reg_operand =f) - (float_truncate:SF (match_operand:DF 1 gpc_reg_operand d)))] + [(set (match_operand:SF 0 gpc_reg_operand =f,wy) + (float_truncate:SF (match_operand:DF 1 gpc_reg_operand d,ws)))] TARGET_HARD_FLOAT TARGET_FPRS TARGET_DOUBLE_FLOAT - frsp %0,%1 + @ + frsp %0,%1 + xsrsp %x0,%x1 [(set_attr type fp)]) ;; This expander is here to avoid FLOAT_WORDS_BIGENDIAN tests in @@ -8058,7 +8060,7 @@ (define_insn movmode_hardfloat lwz%U1%X1 %0,%1 stw%U0%X0 %1,%0 fmr %0,%1 - xxlor %x0,%x1,%x1 + xscpsgndp %x0,%x1,%x1 xxlxor %x0,%x0,%x0 li %0,0 f32_li @@ -8070,7 +8072,7 @@ (define_insn movmode_hardfloat mt%0 %1 mf%1 %0 nop - [(set_attr type *,load,store,fp,vecsimple,vecsimple,integer,fpload,fpstore,fpload,fpstore,mftgpr,mffgpr,mtjmpr,mfjmpr,*) + [(set_attr type *,load,store,fp,fp,vecsimple,integer,fpload,fpstore,fpload,fpstore,mftgpr,mffgpr,mtjmpr,mfjmpr,*) (set_attr length 4)]) (define_insn *movmode_softfloat Index: gcc/testsuite/gcc.target/powerpc/compress-float-ppc.c === --- gcc/testsuite/gcc.target/powerpc/compress-float-ppc.c (revision 221802) +++
Re: [PATCH], PR target/65614, Prefer to use LFD on powerpc for constants
On Fri, Apr 3, 2015 at 3:07 PM, Michael Meissner meiss...@linux.vnet.ibm.com wrote: In my fix for PR target/65240, I removed the special -ffast-math code that delayed dealing with constants until reload time. In this patch, constants are now pushed to memory earlier, and the compiler uses LFS (load floating point single) to load double precision constants. When you use the LRA register allocator (-mlra), it uses the Altivec registers for scalar data more frequently, and there appears to be interactions between values loaded up as single constants that are moved to the Altivec registers via XXLOR. This patch makes (float_extend (mem)) slightly more costly than just (mem) and the code in expr.c will not compress the constant. In addition, for scalar single precision moves it uses copy sign instead of or to move the data. The copy sign instruction deals with single precision values that would create denormals. While working in the code, I also noticed that truncdfsf2 did not have support for ISA 2.07, so I added support for it. I have done bootstraps and make check with no regressions (after fixing the two tests that were checking that LFS was used). I have also built and run the Spec 2006 benchmark bwaves with the patch, and it now runs when compiled with -mlra and upper register support. Is the patch ok to commit to trunk? The FLOAT_EXTEND cost should be based on the processor tuning, not the ISA. - David
Re: [PATCH] [ARM] Add support for the Samsung Exynos M1 processor
On Fri, Apr 03, 2015 at 07:53:12PM +0100, Ramana Radhakrishnan wrote: On Fri, Apr 3, 2015 at 5:17 PM, Sebastian Pop seb...@gmail.com wrote: Hi, On Thu, Apr 2, 2015 at 5:51 PM, James Greenhalgh james.greenha...@arm.com wrote: Trunk is currently in Stage 4 development, these patches are fairly low-risk, but they are certainly not regression fixes. I'll defer to port maintainers and release managers for the final say, but in my opinion it would not be appropriate to commit them until Stage 1 development for GCC 6.0 opens (hopefully in a few weeks). I thought that adding flags for new processors was ok at any time, even to backport. It's usually risk vs reward on a per patch basis and I don't think of it as a general rule. We've always avoided the CPU tuning backport rule to the FSF branches. The smaller the CPU tuning patch - the better it is and in this case I'm comfortable with the patch going in as it is adding another tuning option, using existing constructs and is not invasive in the backend. Thanks for the clarification Ramana. In which case, and now that I've seen that binutils support has also been accepted, the AArch64 part is OK to commit (assuming no regressions and no objections from Richard or Jakub). It would be great if you could follow these up with a patch for changes.html for GCC 5 for both ARM and AArch64. Cheers, James
Re: [PATCH] [ARM] Add support for the Samsung Exynos M1 processor
On Fri, Apr 3, 2015 at 4:09 PM, James Greenhalgh james.greenha...@arm.com wrote: On Fri, Apr 03, 2015 at 07:53:12PM +0100, Ramana Radhakrishnan wrote: On Fri, Apr 3, 2015 at 5:17 PM, Sebastian Pop seb...@gmail.com wrote: Hi, On Thu, Apr 2, 2015 at 5:51 PM, James Greenhalgh james.greenha...@arm.com wrote: Trunk is currently in Stage 4 development, these patches are fairly low-risk, but they are certainly not regression fixes. I'll defer to port maintainers and release managers for the final say, but in my opinion it would not be appropriate to commit them until Stage 1 development for GCC 6.0 opens (hopefully in a few weeks). I thought that adding flags for new processors was ok at any time, even to backport. It's usually risk vs reward on a per patch basis and I don't think of it as a general rule. We've always avoided the CPU tuning backport rule to the FSF branches. The smaller the CPU tuning patch - the better it is and in this case I'm comfortable with the patch going in as it is adding another tuning option, using existing constructs and is not invasive in the backend. Thanks for the clarification Ramana. In which case, and now that I've seen that binutils support has also been accepted, the AArch64 part is OK to commit (assuming no regressions and no objections from Richard or Jakub). I will wait to hear from Richi or Jakub before committing the two patches. It would be great if you could follow these up with a patch for changes.html for GCC 5 for both ARM and AArch64. Attached. I will commit this after the two patches adding the exynos-m1 flags. Thanks, Sebastian *** changes.html.~1.92.~2015-03-27 09:42:10.0 -0500 --- changes.html2015-04-03 21:57:33.550001606 -0500 *** *** 585,591 (codecortex-a72/code) and initial support for its big.LITTLE combination with the ARM Cortex-A53 (codecortex-a72.cortex-a53/code), Cavium ThunderX (codethunderx/code), Applied Micro X-Gene 1 ! (codexgene1/code). The GCC identifiers can be used as arguments to the code-mcpu/code or code-mtune/code options, for example: code-mcpu=xgene1/code or --- 585,591 (codecortex-a72/code) and initial support for its big.LITTLE combination with the ARM Cortex-A53 (codecortex-a72.cortex-a53/code), Cavium ThunderX (codethunderx/code), Applied Micro X-Gene 1 ! (codexgene1/code), and Samsung Exynos M1 (codeexynos-m1/code). The GCC identifiers can be used as arguments to the code-mcpu/code or code-mtune/code options, for example: code-mcpu=xgene1/code or *** *** 624,630 (codecortex-a72/code) and initial support for its big.LITTLE combination with the ARM Cortex-A53 (codecortex-a72.cortex-a53/code), ARM Cortex-M7 (codecortex-m7/code), Applied Micro X-Gene 1 !(codexgene1/code). The GCC identifiers can be used as arguments to the code-mcpu/code or code-mtune/code options, for example: code-mcpu=xgene1/code or code-mtune=cortex-a72.cortex-a53/code. --- 624,631 (codecortex-a72/code) and initial support for its big.LITTLE combination with the ARM Cortex-A53 (codecortex-a72.cortex-a53/code), ARM Cortex-M7 (codecortex-m7/code), Applied Micro X-Gene 1 !(codexgene1/code), and Samsung Exynos M1 (codeexynos-m1/code). !The GCC identifiers can be used as arguments to the code-mcpu/code or code-mtune/code options, for example: code-mcpu=xgene1/code or code-mtune=cortex-a72.cortex-a53/code.
Re: [libstdc++/65033] Give alignment info to libatomic
On 03/04/15 05:24 -0400, Hans-Peter Nilsson wrote: On Thu, 2 Apr 2015, Hans-Peter Nilsson wrote: Why then use __alignof(_M_i) (the object-alignment) instead of _S_alignment (the deduced alas insufficiently increased type-alignment)? Isn't the object aligned to _S_alignment? (The immediate reason is that _S_alignment wasn't there until a later revision, but the gist of the question remains. :-) making sure that atomic_is_lock_free returns the same value for all objects of a given type, (That would work but it doesn't seem to be the case.) Because we haven't done anything to increase the alignment for the __atomic_base_ITp specialization yet, see the additional patch at: https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01664.html (Although that's insufficient as you point out, because it should depend on the size too.) we probably should have changed the interface so that we would pass size and alignment rather than size and object pointer. Instead, we decided that passing null for the object pointer would be sufficient. But as this PR shows, we really do need to take alignment into account. Regarding what's actually needed, alignment of an atomic type should always be *forced to be at least the natural alignment of of the object* (with non-power-of-two sized-objects rounded up) and until then atomic types won't work for targets where the non-atomic equivalents have less alignment (as straddling a page-boundary won't be lock-less-atomic anywhere where straddling a page-boundary may cause a non-atomic-access...) So, not target-specific except for targets that require even higher-than-natural alignment. So, can we do something like this instead for gcc5? (Completely untested and may be syntactically, namespacingly and cxxstandardversionly flawed.) Index: include/std/atomic === --- include/std/atomic (revision 221849) +++ include/std/atomic (working copy) @@ -165,16 +165,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION struct atomic { private: - // Align 1/2/4/8/16-byte types the same as integer types of that size. + // Align 1/2/4/8/16-byte types to the natural alignment of that size. // This matches the alignment effects of the C11 _Atomic qualifier. static constexpr int _S_min_alignment - = sizeof(_Tp) == sizeof(char) ? alignof(char) - : sizeof(_Tp) == sizeof(short) ? alignof(short) - : sizeof(_Tp) == sizeof(int) ? alignof(int) - : sizeof(_Tp) == sizeof(long) ? alignof(long) - : sizeof(_Tp) == sizeof(long long) ? alignof(long long) + = sizeof(_Tp) == sizeof(char) ? max(sizeof(char), alignof(char)) + : sizeof(_Tp) == sizeof(short) ? max(sizeof(short), alignof(short)) + : sizeof(_Tp) == sizeof(int) ? max(sizeof(int), alignof(int)) + : sizeof(_Tp) == sizeof(long) ? max(sizeof(long), alignof(long)) + : sizeof(_Tp) == sizeof(long long) ? max(sizeof(long long), alignof(long long)) #ifdef _GLIBCXX_USE_INT128 - : sizeof(_Tp) == sizeof(__int128) ? alignof(__int128) + : sizeof(_Tp) == sizeof(__int128) ? max(sizeof(__int128), alignof(__int128)) #endif : 0; Instead of changing every case in the condition to include sizeof why not just do it afterwards using sizeof(_Tp), in the _S_alignment calculation? We know sizeof(_Tp) == sizeof(corresponding integer type) because that's the whole point of the conditionals! See attachment. @@ -216,7 +216,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION is_lock_free() const noexcept { // Produce a fake, minimally aligned pointer. - void *__a = reinterpret_castvoid *(-__alignof(_M_i)); + void *__a = reinterpret_castvoid *(-_S_alignment); return __atomic_is_lock_free(sizeof(_M_i), __a); } If _M_i is aligned to _S_alignment then what difference does the change above make? It doesn't matter if the value is per-object if we've forced all such objects to have the same alignment, does it? Or is it different if a std::atomicT is included in some other struct and the user forces a different alignment on it? I don't think we really need to support that, users shouldn't be doing that. Index: include/bits/atomic_base.h === --- include/bits/atomic_base.h (revision 221849) +++ include/bits/atomic_base.h (working copy) @@ -240,7 +240,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION private: typedef _ITp __int_type; - __int_type _M_i; + // Align 1/2/4/8/16-byte types to the natural alignment of that size. + // This matches the alignment effects of the C11 _Atomic qualifier. + static constexpr int _S_min_alignment + = sizeof(_Tp) == sizeof(char) ? max(sizeof(char), __alignof(char)) + : sizeof(_Tp) == sizeof(short) ? max(sizeof(short), __alignof(short)) + : sizeof(_Tp) == sizeof(int) ?
Re: [PR64164] drop copyrename, integrate into expand
On Mar 31, 2015, Jeff Law l...@redhat.com wrote: - if (var1 != var2) + if (var1 != var2 !optimize) return false; So when when the base variables are different and we are optimizing, this allows coalescing, right? Yeah. What I don't see is a corresponding change to var_map_base_init to ensure we build a conflict graph which includes objects when SSA_NAME_VARs are not the same. I see a vague reference in var_map_base_init's header comment that refers us to coalesce_ssa_name. It appears that compute_optimized_partition_bases handles this by creating a partitions of things that are related by copies/phis regardless of their underlying named object, type, etc. Right? Correct. I guess it makes sense to move partition base computation to a single location. Since compute_optimized_partition_bases relies on data structures local to this source file, I'm moving the non-optimized version to tree-ssa-coalesce.c, and dropping support for basevar initialization from tree-ssa-live.c. Hard to argue with removing a pass that gets called 5 times! :-) @@ -890,6 +900,36 @@ build_ssa_conflict_graph (tree_live_info_p liveinfo) live_track_process_def (live, result, graph); } + /* Pretend there are defs for params' default defs at the start + of the (post-)entry block. We run after abnormal coalescing, + so we can't assume the leader variable is the default + definition, but because of SSA_NAME_VAR adjustments in + attempt_coalesce, we can assume that if there is any + PARM_DECL in the partition, it will be the leader's + SSA_NAME_VAR. */ This comment is outdated. Since we no longer have abnormal coalescing before building the conflict graph, we can just test whether each SSA_NAME is a default def for a PARM_DECL and be done with it. So the issue here is you want to iterate over the objects live at the entry block, which would include any SSA_NAMEs which result from PARM_DECLs. I don't guess there's an easier way to do that other than iterating over everything live in that initial block? We could iterate over all SSA_NAMEs, but that would probably be more costly. There shouldn't be very many live variables at the function entry, so using the live bitmaps is likely to save us time, especially on functions with lots of SSA_NAMEs. And the second second EXECUTE_IF_SET_IN_BITMAP iterates over everything in the partitions associated with the SSA_NAMES that are live at the the entry block, right? Yeah, we iterate over the bases in live_base_var, because the per-base bitmaps are only accurate when the corresponding live_base_var bit is iset. @@ -1126,11 +1166,12 @@ create_outofssa_var_map (coalesce_list_p cl, bitmap used_in_copy) static inline bool attempt_coalesce (var_map map, ssa_conflicts_p graph, int x, int y, - FILE *debug) + bitmap param_defaults, FILE *debug) [ ... ] So the bulk of the changes into this routine are really about picking a good leader, which presumably is how we're able to get the desired effects on debuginfo that we used to get from tree-ssa-copyrename.c? This has nothing to do with debuginfo, I'm afraid. We just had to keep track of parm and result decls to avoid coalescing them together, and parm decl default defs to promote them to leaders, because expand copies incoming REGs to pseudos in PARM_DECL's DECL_RTL. We should fill that in with the RTL created for the default def for the PARM_DECL. At the end, I believe we also copy the RESULT_DECL DECL_RTL to the actual return register or rtl. I didn't want to tackle the reworking of these expanders to avoid problems out of copying incoming parms to one pseudo and then reading it from another, as I observed before I made this change. I'm now tackling that, so that we can refrain from touching the base vars altogether, and not refrain from coalescing parms and results. So some comments about the various cases here might help. I can sort them out if I read the code, but one could argue that a block comment on the rules for how to select the partition leader would be better. *nod*. I won't bother, though, if this code ends up gone in the next iteration of the patch ;-) Is the special casing of PARM_DECLs + RESULT_DECLs really a failing of not handling one or both properly when computing liveness information? No, it's about RTL assignment and copying to/from hard regs. We assign RTL to PARM_DECLs and RESULT_DECLs explicitly in the expander, but we can't assign different RTL to them if they are coalesced in a single partition. I'm not aware of an inherent reason why a PARM_DECL couldn't coalesce with a related RESULT_DECL if they are otherwise non-conflicting and related by a copy/phi. Indeed, there isn't any inherent reason. It was just a restriction I carried over from copyrename, and that I postponed cleaning up. Presumably ordering of unioning of the partitions doesn't matter here as we're looking at coalesce
Re: Silence merge warnings on artificial types
On Thu, Apr 02, 2015 at 09:05:53PM +0200, Jakub Jelinek wrote: On Thu, Apr 02, 2015 at 09:23:03PM +0300, Ilya Verbin wrote: Hmm, libgomp.c++/target-3.C still fails. Here is what I see in need_assembler_name_p: Guess we should make the .omp_data_s.* types TYPE_ARTIFICIAL too. Will take care of that tomorrow. Tested on x86_64-linux, committed to trunk. 2015-04-03 Jakub Jelinek ja...@redhat.com * omp-low.c (scan_omp_parallel, scan_omp_task, scan_omp_target): Set TYPE_ARTIFICIAL on the .omp_data* types. --- gcc/omp-low.c.jj2015-03-23 08:47:51.0 +0100 +++ gcc/omp-low.c 2015-04-02 22:06:30.547850197 +0200 @@ -2351,6 +2351,7 @@ scan_omp_parallel (gimple_stmt_iterator DECL_ARTIFICIAL (name) = 1; DECL_NAMELESS (name) = 1; TYPE_NAME (ctx-record_type) = name; + TYPE_ARTIFICIAL (ctx-record_type) = 1; create_omp_child_function (ctx, false); gimple_omp_parallel_set_child_fn (stmt, ctx-cb.dst_fn); @@ -2391,6 +2392,7 @@ scan_omp_task (gimple_stmt_iterator *gsi DECL_ARTIFICIAL (name) = 1; DECL_NAMELESS (name) = 1; TYPE_NAME (ctx-record_type) = name; + TYPE_ARTIFICIAL (ctx-record_type) = 1; create_omp_child_function (ctx, false); gimple_omp_task_set_child_fn (stmt, ctx-cb.dst_fn); @@ -2404,6 +2406,7 @@ scan_omp_task (gimple_stmt_iterator *gsi DECL_ARTIFICIAL (name) = 1; DECL_NAMELESS (name) = 1; TYPE_NAME (ctx-srecord_type) = name; + TYPE_ARTIFICIAL (ctx-srecord_type) = 1; create_omp_child_function (ctx, true); } @@ -2671,6 +2674,7 @@ scan_omp_target (gomp_target *stmt, omp_ DECL_ARTIFICIAL (name) = 1; DECL_NAMELESS (name) = 1; TYPE_NAME (ctx-record_type) = name; + TYPE_ARTIFICIAL (ctx-record_type) = 1; if (offloaded) { if (is_gimple_omp_oacc (stmt)) Jakub
Re: [PR64164] drop copyrename, integrate into expand
On Mar 31, 2015, Richard Biener richard.guent...@gmail.com wrote: + || !(SSA_NAME_IS_DEFAULT_DEF (var) + || (param_defaults + bitmap_bit_p (param_defaults, part This looks somewhat awkward to me ;) Is it really important to allow coalescing PARM_DECL-based SSA vars with sth else? It's a valid optimization. I can't say it's really important, but if the only objection is to param_defaults, I'm getting rid of it. At least abnormal coalescing doesn't need to do that, so just walking over the function decls parameters and making their default-def live should be enough? It should. We'd have to duplicate logic of parameters, including static chain and whatnot. I figured this would make it more resilient to changes elsewhere. + else if (TREE_CODE (SSA_NAME_VAR (var2)) == VAR_DECL) + leader = SSA_NAME_VAR (var2); + else /* What else could it be? */ + gcc_unreachable (); + definitely comments missing in this spaghetti... I'm trying to remove the spaghetting now. or seeing this, why coalesce default-defs at all? Why not? (the referenced code is gone from my local tree, BTW) Either they are param values or they have indetermined values (and thus we can and do pick whatever is available at expansion time)? If they are param values, we want to have them available; if they aren't, whatever we coalesce with is good. So the above does full coalescing ignoring conflicts. Yeah. We want to tell what we'd get if all coalesce possibilities are taken, so that we can assign the same basevar to all partitions so that we detect conflicts. Did you do any statistics on how the number of basevars changes with your patch compared to trunk? 'fraid I didn't run any statistics whatsoever. I didn't think it was important, since it's pretty much just doing copyrename during coalesce. Truth be told, this has since relaxed some of the constraints from copyrename, and I'm going to relax some more in the next iteration, so I guess some statistics wouldn't be a bad idea. Is there any specific testcase you're interested in, or something like a GCC bootstrap or somesuch? -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
Re: [RS6000] Fix 65576 regression
On Fri, Apr 3, 2015 at 8:51 AM, Alan Modra amo...@gmail.com wrote: On Thu, Apr 02, 2015 at 08:02:57PM -0400, David Edelsohn wrote: If the final alternative cannot occur, it should be removed as well. Alan, would you please test that change also? Tested powerpc64-linux and powerpc-linux no regressions. * config/rs6000/rs6000.md (extenddftf2_internal): Remove last alternative. Okay with the two patches combined. Thanks, David
Re: [libcpp] Fix ICE with conditional macros (PR preprocessor/61977)
OK. Jason