Re: [RFC] propagate malloc attribute in ipa-pure-const pass
On Fri, 27 Oct 2017, Jan Hubicka wrote: > > On 25 October 2017 at 20:44, Jan Hubickawrote: > > >> On 24 October 2017 at 16:26, Jan Hubicka wrote: > > >> >> 2017-10-13 Prathamesh Kulkarni > > >> >> > > >> >> * cgraph.h (set_malloc_flag): Declare. > > >> >> * cgraph.c (set_malloc_flag_1): New function. > > >> >> (set_malloc_flag): Likewise. > > >> >> * ipa-fnsummary.h (ipa_call_summary): Add new field > > >> >> is_return_callee. > > >> >> * ipa-fnsummary.c (ipa_call_summary::reset): Set > > >> >> is_return_callee to > > >> >> false. > > >> >> (read_ipa_call_summary): Add support for reading > > >> >> is_return_callee. > > >> >> (write_ipa_call_summary): Stream is_return_callee. > > >> >> * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary. > > >> >> * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, > > >> >> symbol-summary.h, > > >> >> ipa-prop.h, ipa-fnsummary.h. > > >> >> (pure_const_names): Change to static. > > >> >> (malloc_state_e): Define. > > >> >> (malloc_state_names): Define. > > >> >> (funct_state_d): Add field malloc_state. > > >> >> (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM. > > >> >> (check_retval_uses): New function. > > >> >> (malloc_candidate_p): Likewise. > > >> >> (analyze_function): Add support for malloc attribute. > > >> >> (pure_const_write_summary): Stream malloc_state. > > >> >> (pure_const_read_summary): Add support for reading malloc_state. > > >> >> (dump_malloc_lattice): New function. > > >> >> (propagate_malloc): New function. > > >> >> (ipa_pure_const::execute): Call propagate_malloc and > > >> >> ipa_free_fn_summary. > > >> >> (pass_local_pure_const::execute): Add support for malloc > > >> >> attribute. > > >> >> * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro. > > >> >> > > >> >> testsuite/ > > >> >> * gcc.dg/ipa/propmalloc-1.c: New test-case. > > >> >> * gcc.dg/ipa/propmalloc-2.c: Likewise. > > >> >> * gcc.dg/ipa/propmalloc-3.c: Likewise. > > >> > > > >> > OK. > > >> > Perhaps we could also add -Wsuggest-sttribute=malloc and mention it in > > >> > changes.html? > > >> Done in this version. > > >> In warn_function_malloc(), I passed false for known_finite param to > > >> suggest_attribute(). > > >> Does that look OK ? > > >> Validation in progress. OK to commit if passes ? > > > > > > OK, thanks! > > Thanks, committed as r254140 after following validation: > > 1] Bootstrap+test with --enable-languages=all,ada,go on > > x86_64-unknown-linux-gnu and ppc64le-linux-gnu. > > 2] LTO bootstrap+test on x86_64-unknown-linux-gnu and ppc64le-linux-gnu > > 3] Cross tested on arm*-*-* and aarch64*-*-*. > > > > Would it be a good idea to extend ipa-pure-const to propagate > > alloc_size/alloc_align and returns_nonnull attributes ? > > Which other attributes would be useful to propagate in ipa-pure-const ? > > Also one extension I was considering was TBAA mod-ref pass. I.e. propagate > what > types are read/stored by the call, rather than having just pure/const (no > stores, > no reads at all). > > This will be bit fun to implement in IPA, but it may be useful. If you would > be interested in looking into this, we can discuss it (I wanted to implement > it > this stage1 but I think I have way too many other plans). > > LLVM also has nocapture that seems useful for PTA. Richi may have useful > opinions on that ;) I once tried to prototype "fn spec" attribute autodetection and IPA propagation (also into IPA pure const). Didn't get very far though. That tracks per function argument properties like whether memory is written to through this pointer or whether a pointer possibly escapes from the function (including through its returned value). Richard.
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
> On 25 October 2017 at 20:44, Jan Hubickawrote: > >> On 24 October 2017 at 16:26, Jan Hubicka wrote: > >> >> 2017-10-13 Prathamesh Kulkarni > >> >> > >> >> * cgraph.h (set_malloc_flag): Declare. > >> >> * cgraph.c (set_malloc_flag_1): New function. > >> >> (set_malloc_flag): Likewise. > >> >> * ipa-fnsummary.h (ipa_call_summary): Add new field > >> >> is_return_callee. > >> >> * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee > >> >> to > >> >> false. > >> >> (read_ipa_call_summary): Add support for reading is_return_callee. > >> >> (write_ipa_call_summary): Stream is_return_callee. > >> >> * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary. > >> >> * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, > >> >> symbol-summary.h, > >> >> ipa-prop.h, ipa-fnsummary.h. > >> >> (pure_const_names): Change to static. > >> >> (malloc_state_e): Define. > >> >> (malloc_state_names): Define. > >> >> (funct_state_d): Add field malloc_state. > >> >> (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM. > >> >> (check_retval_uses): New function. > >> >> (malloc_candidate_p): Likewise. > >> >> (analyze_function): Add support for malloc attribute. > >> >> (pure_const_write_summary): Stream malloc_state. > >> >> (pure_const_read_summary): Add support for reading malloc_state. > >> >> (dump_malloc_lattice): New function. > >> >> (propagate_malloc): New function. > >> >> (ipa_pure_const::execute): Call propagate_malloc and > >> >> ipa_free_fn_summary. > >> >> (pass_local_pure_const::execute): Add support for malloc > >> >> attribute. > >> >> * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro. > >> >> > >> >> testsuite/ > >> >> * gcc.dg/ipa/propmalloc-1.c: New test-case. > >> >> * gcc.dg/ipa/propmalloc-2.c: Likewise. > >> >> * gcc.dg/ipa/propmalloc-3.c: Likewise. > >> > > >> > OK. > >> > Perhaps we could also add -Wsuggest-sttribute=malloc and mention it in > >> > changes.html? > >> Done in this version. > >> In warn_function_malloc(), I passed false for known_finite param to > >> suggest_attribute(). > >> Does that look OK ? > >> Validation in progress. OK to commit if passes ? > > > > OK, thanks! > Thanks, committed as r254140 after following validation: > 1] Bootstrap+test with --enable-languages=all,ada,go on > x86_64-unknown-linux-gnu and ppc64le-linux-gnu. > 2] LTO bootstrap+test on x86_64-unknown-linux-gnu and ppc64le-linux-gnu > 3] Cross tested on arm*-*-* and aarch64*-*-*. > > Would it be a good idea to extend ipa-pure-const to propagate > alloc_size/alloc_align and returns_nonnull attributes ? > Which other attributes would be useful to propagate in ipa-pure-const ? Also one extension I was considering was TBAA mod-ref pass. I.e. propagate what types are read/stored by the call, rather than having just pure/const (no stores, no reads at all). This will be bit fun to implement in IPA, but it may be useful. If you would be interested in looking into this, we can discuss it (I wanted to implement it this stage1 but I think I have way too many other plans). LLVM also has nocapture that seems useful for PTA. Richi may have useful opinions on that ;) Honza
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
Prathamesh > > OK, thanks! > Thanks, committed as r254140 after following validation: > 1] Bootstrap+test with --enable-languages=all,ada,go on > x86_64-unknown-linux-gnu and ppc64le-linux-gnu. > 2] LTO bootstrap+test on x86_64-unknown-linux-gnu and ppc64le-linux-gnu > 3] Cross tested on arm*-*-* and aarch64*-*-*. > > Would it be a good idea to extend ipa-pure-const to propagate > alloc_size/alloc_align and returns_nonnull attributes ? > Which other attributes would be useful to propagate in ipa-pure-const ? Good I did not discourage you by slow review rate (will try to do something about it). I guess alloc_size/alloc_align are obvious candidates. returns_nonnull is a special case of VRP over returns so I would rather like to see ipa-prop/ipa-cp extended to handle return values and this done as a consequence. One interesting case I think we could try to track is what types of exceptions are thrown. Currently if function throws specific exception which is handled by caller, we still think caller may throw because we do not take types into consideration at all. I think that may mark significant portion of functions nothrow as this seems like common coding practice. It would be also nice to cleanup ipa-pure-const. I think one can implement propagation template where one just feeds the basic parameters (what data to store, what edges to consider) rahter than copying the rather outdated code again and again. > > Also, I would be grateful for suggestions on how to propagate malloc > attribute through indirect calls. > Currently, I have left it as FIXME, and simply drop the lattice to > MALLOC_BOTTOM if there's an indirect call :( > > I am not able to understand how attribute propagation across indirect > calls works. > For example consider propagation of nothrow attribute in following test-case: > > __attribute__((noinline, noclone, nothrow)) > int f1(int k) { return k; } > > __attribute__((noinline, noclone)) > static int foo(int (*p)(int)) > { > return p(10); > } > > __attribute__((noinline, noclone)) > int bar(void) > { > return foo(f1); > } > > Shouldn't foo and bar be also marked as nothrow ? > Since foo indirectly calls f1 which is nothrow and bar only calls foo ? Well, foo may have other uses where it calls something else than f1 so one needs to track "nothrow in the context when f1 is called". In general I think this reduces to may edges in the callgraph (for given indirect calls we want to track whether we know complete list of possible targets). We do that for polymorphic calls via ipa-polymorphic-call analysis but I did not get around implementing anything for indirect calls. To ge the list of targets, you call possible_polymorphic_call_targets which also tells you whether the list is complete (final flag) or whether there may be some callees invisible to compiler (such as derrivate types from other compilation unit). The lists may be large and for that reason there is cache token which tells you if you see same list again. Extending ipa-pure-const to work across final lists of polymorphic targets may be first step to handle indirect calls in general. Honza > > The local-pure-const2 dump shows function is locally throwing for > "foo" and "bar", and ipa-pure-const dump doesn't appear to show foo and bar > marked as nothrow. > > Thanks, > Prathamesh > > Honza
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
On 25 October 2017 at 20:44, Jan Hubickawrote: >> On 24 October 2017 at 16:26, Jan Hubicka wrote: >> >> 2017-10-13 Prathamesh Kulkarni >> >> >> >> * cgraph.h (set_malloc_flag): Declare. >> >> * cgraph.c (set_malloc_flag_1): New function. >> >> (set_malloc_flag): Likewise. >> >> * ipa-fnsummary.h (ipa_call_summary): Add new field >> >> is_return_callee. >> >> * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to >> >> false. >> >> (read_ipa_call_summary): Add support for reading is_return_callee. >> >> (write_ipa_call_summary): Stream is_return_callee. >> >> * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary. >> >> * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, >> >> symbol-summary.h, >> >> ipa-prop.h, ipa-fnsummary.h. >> >> (pure_const_names): Change to static. >> >> (malloc_state_e): Define. >> >> (malloc_state_names): Define. >> >> (funct_state_d): Add field malloc_state. >> >> (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM. >> >> (check_retval_uses): New function. >> >> (malloc_candidate_p): Likewise. >> >> (analyze_function): Add support for malloc attribute. >> >> (pure_const_write_summary): Stream malloc_state. >> >> (pure_const_read_summary): Add support for reading malloc_state. >> >> (dump_malloc_lattice): New function. >> >> (propagate_malloc): New function. >> >> (ipa_pure_const::execute): Call propagate_malloc and >> >> ipa_free_fn_summary. >> >> (pass_local_pure_const::execute): Add support for malloc attribute. >> >> * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro. >> >> >> >> testsuite/ >> >> * gcc.dg/ipa/propmalloc-1.c: New test-case. >> >> * gcc.dg/ipa/propmalloc-2.c: Likewise. >> >> * gcc.dg/ipa/propmalloc-3.c: Likewise. >> > >> > OK. >> > Perhaps we could also add -Wsuggest-sttribute=malloc and mention it in >> > changes.html? >> Done in this version. >> In warn_function_malloc(), I passed false for known_finite param to >> suggest_attribute(). >> Does that look OK ? >> Validation in progress. OK to commit if passes ? > > OK, thanks! Thanks, committed as r254140 after following validation: 1] Bootstrap+test with --enable-languages=all,ada,go on x86_64-unknown-linux-gnu and ppc64le-linux-gnu. 2] LTO bootstrap+test on x86_64-unknown-linux-gnu and ppc64le-linux-gnu 3] Cross tested on arm*-*-* and aarch64*-*-*. Would it be a good idea to extend ipa-pure-const to propagate alloc_size/alloc_align and returns_nonnull attributes ? Which other attributes would be useful to propagate in ipa-pure-const ? Also, I would be grateful for suggestions on how to propagate malloc attribute through indirect calls. Currently, I have left it as FIXME, and simply drop the lattice to MALLOC_BOTTOM if there's an indirect call :( I am not able to understand how attribute propagation across indirect calls works. For example consider propagation of nothrow attribute in following test-case: __attribute__((noinline, noclone, nothrow)) int f1(int k) { return k; } __attribute__((noinline, noclone)) static int foo(int (*p)(int)) { return p(10); } __attribute__((noinline, noclone)) int bar(void) { return foo(f1); } Shouldn't foo and bar be also marked as nothrow ? Since foo indirectly calls f1 which is nothrow and bar only calls foo ? The local-pure-const2 dump shows function is locally throwing for "foo" and "bar", and ipa-pure-const dump doesn't appear to show foo and bar marked as nothrow. Thanks, Prathamesh > Honza
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
> On 24 October 2017 at 16:26, Jan Hubickawrote: > >> 2017-10-13 Prathamesh Kulkarni > >> > >> * cgraph.h (set_malloc_flag): Declare. > >> * cgraph.c (set_malloc_flag_1): New function. > >> (set_malloc_flag): Likewise. > >> * ipa-fnsummary.h (ipa_call_summary): Add new field is_return_callee. > >> * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to > >> false. > >> (read_ipa_call_summary): Add support for reading is_return_callee. > >> (write_ipa_call_summary): Stream is_return_callee. > >> * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary. > >> * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, > >> symbol-summary.h, > >> ipa-prop.h, ipa-fnsummary.h. > >> (pure_const_names): Change to static. > >> (malloc_state_e): Define. > >> (malloc_state_names): Define. > >> (funct_state_d): Add field malloc_state. > >> (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM. > >> (check_retval_uses): New function. > >> (malloc_candidate_p): Likewise. > >> (analyze_function): Add support for malloc attribute. > >> (pure_const_write_summary): Stream malloc_state. > >> (pure_const_read_summary): Add support for reading malloc_state. > >> (dump_malloc_lattice): New function. > >> (propagate_malloc): New function. > >> (ipa_pure_const::execute): Call propagate_malloc and > >> ipa_free_fn_summary. > >> (pass_local_pure_const::execute): Add support for malloc attribute. > >> * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro. > >> > >> testsuite/ > >> * gcc.dg/ipa/propmalloc-1.c: New test-case. > >> * gcc.dg/ipa/propmalloc-2.c: Likewise. > >> * gcc.dg/ipa/propmalloc-3.c: Likewise. > > > > OK. > > Perhaps we could also add -Wsuggest-sttribute=malloc and mention it in > > changes.html? > Done in this version. > In warn_function_malloc(), I passed false for known_finite param to > suggest_attribute(). > Does that look OK ? > Validation in progress. OK to commit if passes ? OK, thanks! Honza
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
On 24 October 2017 at 16:26, Jan Hubickawrote: >> 2017-10-13 Prathamesh Kulkarni >> >> * cgraph.h (set_malloc_flag): Declare. >> * cgraph.c (set_malloc_flag_1): New function. >> (set_malloc_flag): Likewise. >> * ipa-fnsummary.h (ipa_call_summary): Add new field is_return_callee. >> * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to >> false. >> (read_ipa_call_summary): Add support for reading is_return_callee. >> (write_ipa_call_summary): Stream is_return_callee. >> * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary. >> * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, symbol-summary.h, >> ipa-prop.h, ipa-fnsummary.h. >> (pure_const_names): Change to static. >> (malloc_state_e): Define. >> (malloc_state_names): Define. >> (funct_state_d): Add field malloc_state. >> (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM. >> (check_retval_uses): New function. >> (malloc_candidate_p): Likewise. >> (analyze_function): Add support for malloc attribute. >> (pure_const_write_summary): Stream malloc_state. >> (pure_const_read_summary): Add support for reading malloc_state. >> (dump_malloc_lattice): New function. >> (propagate_malloc): New function. >> (ipa_pure_const::execute): Call propagate_malloc and >> ipa_free_fn_summary. >> (pass_local_pure_const::execute): Add support for malloc attribute. >> * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro. >> >> testsuite/ >> * gcc.dg/ipa/propmalloc-1.c: New test-case. >> * gcc.dg/ipa/propmalloc-2.c: Likewise. >> * gcc.dg/ipa/propmalloc-3.c: Likewise. > > OK. > Perhaps we could also add -Wsuggest-sttribute=malloc and mention it in > changes.html? Done in this version. In warn_function_malloc(), I passed false for known_finite param to suggest_attribute(). Does that look OK ? Validation in progress. OK to commit if passes ? Thanks, Prathamesh > > Thanks, > Honza 2017-10-13 Prathamesh Kulkarni * cgraph.h (set_malloc_flag): Declare. * cgraph.c (set_malloc_flag_1): New function. (set_malloc_flag): Likewise. * ipa-fnsummary.h (ipa_call_summary): Add new field is_return_callee. * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to false. (read_ipa_call_summary): Add support for reading is_return_callee. (write_ipa_call_summary): Stream is_return_callee. * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary. * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, symbol-summary.h, ipa-prop.h, ipa-fnsummary.h. (pure_const_names): Change to static. (malloc_state_e): Define. (malloc_state_names): Define. (funct_state_d): Add field malloc_state. (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM. (check_retval_uses): New function. (malloc_candidate_p): Likewise. (analyze_function): Add support for malloc attribute. (pure_const_write_summary): Stream malloc_state. (pure_const_read_summary): Add support for reading malloc_state. (dump_malloc_lattice): New function. (propagate_malloc): New function. (warn_function_malloc): New function. (ipa_pure_const::execute): Call propagate_malloc and ipa_free_fn_summary. (pass_local_pure_const::execute): Add support for malloc attribute. * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro. * doc/invoke.texi: Document Wsuggest-attribute=malloc. testsuite/ * gcc.dg/ipa/propmalloc-1.c: New test-case. * gcc.dg/ipa/propmalloc-2.c: Likewise. * gcc.dg/ipa/propmalloc-3.c: Likewise. diff --git a/gcc/cgraph.c b/gcc/cgraph.c index 3d0cefbd46b..0aad90d59ea 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -2530,6 +2530,53 @@ cgraph_node::set_nothrow_flag (bool nothrow) return changed; } +/* Worker to set malloc flag. */ +static void +set_malloc_flag_1 (cgraph_node *node, bool malloc_p, bool *changed) +{ + if (malloc_p && !DECL_IS_MALLOC (node->decl)) +{ + DECL_IS_MALLOC (node->decl) = true; + *changed = true; +} + + ipa_ref *ref; + FOR_EACH_ALIAS (node, ref) +{ + cgraph_node *alias = dyn_cast (ref->referring); + if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE) + set_malloc_flag_1 (alias, malloc_p, changed); +} + + for (cgraph_edge *e = node->callers; e; e = e->next_caller) +if (e->caller->thunk.thunk_p + && (!malloc_p || e->caller->get_availability () > AVAIL_INTERPOSABLE)) + set_malloc_flag_1 (e->caller, malloc_p, changed); +} + +/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any. */ + +bool +cgraph_node::set_malloc_flag (bool malloc_p) +{ + bool changed = false; + +
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
> 2017-10-13 Prathamesh Kulkarni> > * cgraph.h (set_malloc_flag): Declare. > * cgraph.c (set_malloc_flag_1): New function. > (set_malloc_flag): Likewise. > * ipa-fnsummary.h (ipa_call_summary): Add new field is_return_callee. > * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to > false. > (read_ipa_call_summary): Add support for reading is_return_callee. > (write_ipa_call_summary): Stream is_return_callee. > * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary. > * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, symbol-summary.h, > ipa-prop.h, ipa-fnsummary.h. > (pure_const_names): Change to static. > (malloc_state_e): Define. > (malloc_state_names): Define. > (funct_state_d): Add field malloc_state. > (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM. > (check_retval_uses): New function. > (malloc_candidate_p): Likewise. > (analyze_function): Add support for malloc attribute. > (pure_const_write_summary): Stream malloc_state. > (pure_const_read_summary): Add support for reading malloc_state. > (dump_malloc_lattice): New function. > (propagate_malloc): New function. > (ipa_pure_const::execute): Call propagate_malloc and > ipa_free_fn_summary. > (pass_local_pure_const::execute): Add support for malloc attribute. > * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro. > > testsuite/ > * gcc.dg/ipa/propmalloc-1.c: New test-case. > * gcc.dg/ipa/propmalloc-2.c: Likewise. > * gcc.dg/ipa/propmalloc-3.c: Likewise. OK. Perhaps we could also add -Wsuggest-sttribute=malloc and mention it in changes.html? Thanks, Honza
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
On 14 October 2017 at 03:20, Prathamesh Kulkarniwrote: > On 7 October 2017 at 12:35, Prathamesh Kulkarni > wrote: >> On 7 October 2017 at 11:23, Jan Hubicka wrote: On 6 October 2017 at 06:04, Jan Hubicka wrote: >> Hi Honza, >> Thanks for the detailed suggestions, I have updated the patch >> accordingly. >> I have following questions on call_summary: >> 1] I added field bool is_return_callee in ipa_call_summary to track >> whether the caller possibly returns value returned by callee, which >> gets rid of return_callees_map. I assume ipa_call_summary_t::remove() >> and ipa_call_summary_t::duplicate() will already take care of handling >> late insertion/removal of cgraph nodes ? I just initialized >> is_return_callee to false in ipa_call_summary::reset and that seems to >> work. I am not sure though if I have handled it correctly. Could you >> please check that ? > > I was actually thinking to introduce separate summary for ipa-pure-const > pass, > but this seems fine to me too (for one bit definitly more effecient) > ipa_call_summary_t::duplicate copies all the fields, so indeed you > should be > safe here. > > Also it is possible for functions to be inserted late. Updating of call > summaries > is currently handled by ipa_fn_summary_t::insert >> >> 2] ipa_inline() called ipa_free_fn_summary, which made >> ipa_call_summaries unavailable during ipa-pure-const pass. I removed >> call to ipa_free_fn_summary from ipa_inline, and moved it to >> ipa_pure_const::execute(). Is that OK ? > > Seems OK to me. >> >> Patch passes bootstrap+test and lto bootstrap+test on >> x86_64-unknown-linux-gnu. >> Verfiied SPEC2k6 compiles and runs without miscompares with LTO >> enabled on aarch64-linux-gnu. >> Cross-tested on arm*-*-* and aarch64*-*-*. I will additionally test >> the patch by building chromium or firefox. >> Would it be OK to commit if it passes above validations ? >> >> Thanks, >> Prathamesh >> > >> > Thanks, >> > Honza > >> 2017-10-05 Prathamesh Kulkarni >> >> * cgraph.h (set_malloc_flag): Declare. >> * cgraph.c (set_malloc_flag_1): New function. >> (set_malloc_flag): Likewise. >> * ipa-fnsummary.h (ipa_call_summary): Add new field >> is_return_callee. >> * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee >> to >> false. >> (read_ipa_call_summary): Add support for reading is_return_callee. >> (write_ipa_call_summary): Stream is_return_callee. >> * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary. >> * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, >> symbol-summary.h, >> ipa-prop.h, ipa-fnsummary.h. >> (malloc_state_e): Define. >> (malloc_state_names): Define. >> (funct_state_d): Add field malloc_state. >> (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM. >> (check_retval_uses): New function. >> (malloc_candidate_p): Likewise. >> (analyze_function): Add support for malloc attribute. >> (pure_const_write_summary): Stream malloc_state. >> (pure_const_read_summary): Add support for reading malloc_state. >> (dump_malloc_lattice): New function. >> (propagate_malloc): New function. >> (ipa_pure_const::execute): Call propagate_malloc and >> ipa_free_fn_summary. >> (pass_local_pure_const::execute): Add support for malloc >> attribute. >> * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro. >> >> testsuite/ >> * gcc.dg/ipa/propmalloc-1.c: New test-case. >> * gcc.dg/ipa/propmalloc-2.c: Likewise. >> * gcc.dg/ipa/propmalloc-3.c: Likewise. >> >> diff --git a/gcc/cgraph.c b/gcc/cgraph.c >> index 3d0cefbd46b..0aad90d59ea 100644 >> --- a/gcc/cgraph.c >> +++ b/gcc/cgraph.c >> @@ -2530,6 +2530,53 @@ cgraph_node::set_nothrow_flag (bool nothrow) >>return changed; >> } >> >> +/* Worker to set malloc flag. */ > New line here I guess (it is below) >> +static void >> +set_malloc_flag_1 (cgraph_node *node, bool malloc_p, bool *changed) >> +{ >> + if (malloc_p && !DECL_IS_MALLOC (node->decl)) >> +{ >> + DECL_IS_MALLOC (node->decl) = true; >> + *changed = true; >> +} >> + >> + ipa_ref *ref; >> + FOR_EACH_ALIAS (node, ref) >> +{ >> + cgraph_node *alias = dyn_cast (ref->referring); >> + if (!malloc_p ||
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
On 7 October 2017 at 12:35, Prathamesh Kulkarniwrote: > On 7 October 2017 at 11:23, Jan Hubicka wrote: >>> On 6 October 2017 at 06:04, Jan Hubicka wrote: >>> >> Hi Honza, >>> >> Thanks for the detailed suggestions, I have updated the patch >>> >> accordingly. >>> >> I have following questions on call_summary: >>> >> 1] I added field bool is_return_callee in ipa_call_summary to track >>> >> whether the caller possibly returns value returned by callee, which >>> >> gets rid of return_callees_map. I assume ipa_call_summary_t::remove() >>> >> and ipa_call_summary_t::duplicate() will already take care of handling >>> >> late insertion/removal of cgraph nodes ? I just initialized >>> >> is_return_callee to false in ipa_call_summary::reset and that seems to >>> >> work. I am not sure though if I have handled it correctly. Could you >>> >> please check that ? >>> > >>> > I was actually thinking to introduce separate summary for ipa-pure-const >>> > pass, >>> > but this seems fine to me too (for one bit definitly more effecient) >>> > ipa_call_summary_t::duplicate copies all the fields, so indeed you should >>> > be >>> > safe here. >>> > >>> > Also it is possible for functions to be inserted late. Updating of call >>> > summaries >>> > is currently handled by ipa_fn_summary_t::insert >>> >> >>> >> 2] ipa_inline() called ipa_free_fn_summary, which made >>> >> ipa_call_summaries unavailable during ipa-pure-const pass. I removed >>> >> call to ipa_free_fn_summary from ipa_inline, and moved it to >>> >> ipa_pure_const::execute(). Is that OK ? >>> > >>> > Seems OK to me. >>> >> >>> >> Patch passes bootstrap+test and lto bootstrap+test on >>> >> x86_64-unknown-linux-gnu. >>> >> Verfiied SPEC2k6 compiles and runs without miscompares with LTO >>> >> enabled on aarch64-linux-gnu. >>> >> Cross-tested on arm*-*-* and aarch64*-*-*. I will additionally test >>> >> the patch by building chromium or firefox. >>> >> Would it be OK to commit if it passes above validations ? >>> >> >>> >> Thanks, >>> >> Prathamesh >>> >> > >>> >> > Thanks, >>> >> > Honza >>> > >>> >> 2017-10-05 Prathamesh Kulkarni >>> >> >>> >> * cgraph.h (set_malloc_flag): Declare. >>> >> * cgraph.c (set_malloc_flag_1): New function. >>> >> (set_malloc_flag): Likewise. >>> >> * ipa-fnsummary.h (ipa_call_summary): Add new field >>> >> is_return_callee. >>> >> * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee >>> >> to >>> >> false. >>> >> (read_ipa_call_summary): Add support for reading is_return_callee. >>> >> (write_ipa_call_summary): Stream is_return_callee. >>> >> * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary. >>> >> * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, >>> >> symbol-summary.h, >>> >> ipa-prop.h, ipa-fnsummary.h. >>> >> (malloc_state_e): Define. >>> >> (malloc_state_names): Define. >>> >> (funct_state_d): Add field malloc_state. >>> >> (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM. >>> >> (check_retval_uses): New function. >>> >> (malloc_candidate_p): Likewise. >>> >> (analyze_function): Add support for malloc attribute. >>> >> (pure_const_write_summary): Stream malloc_state. >>> >> (pure_const_read_summary): Add support for reading malloc_state. >>> >> (dump_malloc_lattice): New function. >>> >> (propagate_malloc): New function. >>> >> (ipa_pure_const::execute): Call propagate_malloc and >>> >> ipa_free_fn_summary. >>> >> (pass_local_pure_const::execute): Add support for malloc attribute. >>> >> * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro. >>> >> >>> >> testsuite/ >>> >> * gcc.dg/ipa/propmalloc-1.c: New test-case. >>> >> * gcc.dg/ipa/propmalloc-2.c: Likewise. >>> >> * gcc.dg/ipa/propmalloc-3.c: Likewise. >>> >> >>> >> diff --git a/gcc/cgraph.c b/gcc/cgraph.c >>> >> index 3d0cefbd46b..0aad90d59ea 100644 >>> >> --- a/gcc/cgraph.c >>> >> +++ b/gcc/cgraph.c >>> >> @@ -2530,6 +2530,53 @@ cgraph_node::set_nothrow_flag (bool nothrow) >>> >>return changed; >>> >> } >>> >> >>> >> +/* Worker to set malloc flag. */ >>> > New line here I guess (it is below) >>> >> +static void >>> >> +set_malloc_flag_1 (cgraph_node *node, bool malloc_p, bool *changed) >>> >> +{ >>> >> + if (malloc_p && !DECL_IS_MALLOC (node->decl)) >>> >> +{ >>> >> + DECL_IS_MALLOC (node->decl) = true; >>> >> + *changed = true; >>> >> +} >>> >> + >>> >> + ipa_ref *ref; >>> >> + FOR_EACH_ALIAS (node, ref) >>> >> +{ >>> >> + cgraph_node *alias = dyn_cast (ref->referring); >>> >> + if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE) >>> >> + set_malloc_flag_1 (alias, malloc_p, changed); >>> >> +} >>> >> + >>> >> + for (cgraph_edge *e = node->callers; e; e = e->next_caller) >>> >> +if
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
On 7 October 2017 at 11:23, Jan Hubickawrote: >> On 6 October 2017 at 06:04, Jan Hubicka wrote: >> >> Hi Honza, >> >> Thanks for the detailed suggestions, I have updated the patch accordingly. >> >> I have following questions on call_summary: >> >> 1] I added field bool is_return_callee in ipa_call_summary to track >> >> whether the caller possibly returns value returned by callee, which >> >> gets rid of return_callees_map. I assume ipa_call_summary_t::remove() >> >> and ipa_call_summary_t::duplicate() will already take care of handling >> >> late insertion/removal of cgraph nodes ? I just initialized >> >> is_return_callee to false in ipa_call_summary::reset and that seems to >> >> work. I am not sure though if I have handled it correctly. Could you >> >> please check that ? >> > >> > I was actually thinking to introduce separate summary for ipa-pure-const >> > pass, >> > but this seems fine to me too (for one bit definitly more effecient) >> > ipa_call_summary_t::duplicate copies all the fields, so indeed you should >> > be >> > safe here. >> > >> > Also it is possible for functions to be inserted late. Updating of call >> > summaries >> > is currently handled by ipa_fn_summary_t::insert >> >> >> >> 2] ipa_inline() called ipa_free_fn_summary, which made >> >> ipa_call_summaries unavailable during ipa-pure-const pass. I removed >> >> call to ipa_free_fn_summary from ipa_inline, and moved it to >> >> ipa_pure_const::execute(). Is that OK ? >> > >> > Seems OK to me. >> >> >> >> Patch passes bootstrap+test and lto bootstrap+test on >> >> x86_64-unknown-linux-gnu. >> >> Verfiied SPEC2k6 compiles and runs without miscompares with LTO >> >> enabled on aarch64-linux-gnu. >> >> Cross-tested on arm*-*-* and aarch64*-*-*. I will additionally test >> >> the patch by building chromium or firefox. >> >> Would it be OK to commit if it passes above validations ? >> >> >> >> Thanks, >> >> Prathamesh >> >> > >> >> > Thanks, >> >> > Honza >> > >> >> 2017-10-05 Prathamesh Kulkarni >> >> >> >> * cgraph.h (set_malloc_flag): Declare. >> >> * cgraph.c (set_malloc_flag_1): New function. >> >> (set_malloc_flag): Likewise. >> >> * ipa-fnsummary.h (ipa_call_summary): Add new field >> >> is_return_callee. >> >> * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to >> >> false. >> >> (read_ipa_call_summary): Add support for reading is_return_callee. >> >> (write_ipa_call_summary): Stream is_return_callee. >> >> * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary. >> >> * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, >> >> symbol-summary.h, >> >> ipa-prop.h, ipa-fnsummary.h. >> >> (malloc_state_e): Define. >> >> (malloc_state_names): Define. >> >> (funct_state_d): Add field malloc_state. >> >> (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM. >> >> (check_retval_uses): New function. >> >> (malloc_candidate_p): Likewise. >> >> (analyze_function): Add support for malloc attribute. >> >> (pure_const_write_summary): Stream malloc_state. >> >> (pure_const_read_summary): Add support for reading malloc_state. >> >> (dump_malloc_lattice): New function. >> >> (propagate_malloc): New function. >> >> (ipa_pure_const::execute): Call propagate_malloc and >> >> ipa_free_fn_summary. >> >> (pass_local_pure_const::execute): Add support for malloc attribute. >> >> * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro. >> >> >> >> testsuite/ >> >> * gcc.dg/ipa/propmalloc-1.c: New test-case. >> >> * gcc.dg/ipa/propmalloc-2.c: Likewise. >> >> * gcc.dg/ipa/propmalloc-3.c: Likewise. >> >> >> >> diff --git a/gcc/cgraph.c b/gcc/cgraph.c >> >> index 3d0cefbd46b..0aad90d59ea 100644 >> >> --- a/gcc/cgraph.c >> >> +++ b/gcc/cgraph.c >> >> @@ -2530,6 +2530,53 @@ cgraph_node::set_nothrow_flag (bool nothrow) >> >>return changed; >> >> } >> >> >> >> +/* Worker to set malloc flag. */ >> > New line here I guess (it is below) >> >> +static void >> >> +set_malloc_flag_1 (cgraph_node *node, bool malloc_p, bool *changed) >> >> +{ >> >> + if (malloc_p && !DECL_IS_MALLOC (node->decl)) >> >> +{ >> >> + DECL_IS_MALLOC (node->decl) = true; >> >> + *changed = true; >> >> +} >> >> + >> >> + ipa_ref *ref; >> >> + FOR_EACH_ALIAS (node, ref) >> >> +{ >> >> + cgraph_node *alias = dyn_cast (ref->referring); >> >> + if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE) >> >> + set_malloc_flag_1 (alias, malloc_p, changed); >> >> +} >> >> + >> >> + for (cgraph_edge *e = node->callers; e; e = e->next_caller) >> >> +if (e->caller->thunk.thunk_p >> >> + && (!malloc_p || e->caller->get_availability () > >> >> AVAIL_INTERPOSABLE)) >> >> + set_malloc_flag_1 (e->caller, malloc_p, changed); >> >> +} >> >> + >> >> +/* Set
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
> On 6 October 2017 at 06:04, Jan Hubickawrote: > >> Hi Honza, > >> Thanks for the detailed suggestions, I have updated the patch accordingly. > >> I have following questions on call_summary: > >> 1] I added field bool is_return_callee in ipa_call_summary to track > >> whether the caller possibly returns value returned by callee, which > >> gets rid of return_callees_map. I assume ipa_call_summary_t::remove() > >> and ipa_call_summary_t::duplicate() will already take care of handling > >> late insertion/removal of cgraph nodes ? I just initialized > >> is_return_callee to false in ipa_call_summary::reset and that seems to > >> work. I am not sure though if I have handled it correctly. Could you > >> please check that ? > > > > I was actually thinking to introduce separate summary for ipa-pure-const > > pass, > > but this seems fine to me too (for one bit definitly more effecient) > > ipa_call_summary_t::duplicate copies all the fields, so indeed you should be > > safe here. > > > > Also it is possible for functions to be inserted late. Updating of call > > summaries > > is currently handled by ipa_fn_summary_t::insert > >> > >> 2] ipa_inline() called ipa_free_fn_summary, which made > >> ipa_call_summaries unavailable during ipa-pure-const pass. I removed > >> call to ipa_free_fn_summary from ipa_inline, and moved it to > >> ipa_pure_const::execute(). Is that OK ? > > > > Seems OK to me. > >> > >> Patch passes bootstrap+test and lto bootstrap+test on > >> x86_64-unknown-linux-gnu. > >> Verfiied SPEC2k6 compiles and runs without miscompares with LTO > >> enabled on aarch64-linux-gnu. > >> Cross-tested on arm*-*-* and aarch64*-*-*. I will additionally test > >> the patch by building chromium or firefox. > >> Would it be OK to commit if it passes above validations ? > >> > >> Thanks, > >> Prathamesh > >> > > >> > Thanks, > >> > Honza > > > >> 2017-10-05 Prathamesh Kulkarni > >> > >> * cgraph.h (set_malloc_flag): Declare. > >> * cgraph.c (set_malloc_flag_1): New function. > >> (set_malloc_flag): Likewise. > >> * ipa-fnsummary.h (ipa_call_summary): Add new field is_return_callee. > >> * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to > >> false. > >> (read_ipa_call_summary): Add support for reading is_return_callee. > >> (write_ipa_call_summary): Stream is_return_callee. > >> * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary. > >> * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, > >> symbol-summary.h, > >> ipa-prop.h, ipa-fnsummary.h. > >> (malloc_state_e): Define. > >> (malloc_state_names): Define. > >> (funct_state_d): Add field malloc_state. > >> (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM. > >> (check_retval_uses): New function. > >> (malloc_candidate_p): Likewise. > >> (analyze_function): Add support for malloc attribute. > >> (pure_const_write_summary): Stream malloc_state. > >> (pure_const_read_summary): Add support for reading malloc_state. > >> (dump_malloc_lattice): New function. > >> (propagate_malloc): New function. > >> (ipa_pure_const::execute): Call propagate_malloc and > >> ipa_free_fn_summary. > >> (pass_local_pure_const::execute): Add support for malloc attribute. > >> * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro. > >> > >> testsuite/ > >> * gcc.dg/ipa/propmalloc-1.c: New test-case. > >> * gcc.dg/ipa/propmalloc-2.c: Likewise. > >> * gcc.dg/ipa/propmalloc-3.c: Likewise. > >> > >> diff --git a/gcc/cgraph.c b/gcc/cgraph.c > >> index 3d0cefbd46b..0aad90d59ea 100644 > >> --- a/gcc/cgraph.c > >> +++ b/gcc/cgraph.c > >> @@ -2530,6 +2530,53 @@ cgraph_node::set_nothrow_flag (bool nothrow) > >>return changed; > >> } > >> > >> +/* Worker to set malloc flag. */ > > New line here I guess (it is below) > >> +static void > >> +set_malloc_flag_1 (cgraph_node *node, bool malloc_p, bool *changed) > >> +{ > >> + if (malloc_p && !DECL_IS_MALLOC (node->decl)) > >> +{ > >> + DECL_IS_MALLOC (node->decl) = true; > >> + *changed = true; > >> +} > >> + > >> + ipa_ref *ref; > >> + FOR_EACH_ALIAS (node, ref) > >> +{ > >> + cgraph_node *alias = dyn_cast (ref->referring); > >> + if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE) > >> + set_malloc_flag_1 (alias, malloc_p, changed); > >> +} > >> + > >> + for (cgraph_edge *e = node->callers; e; e = e->next_caller) > >> +if (e->caller->thunk.thunk_p > >> + && (!malloc_p || e->caller->get_availability () > > >> AVAIL_INTERPOSABLE)) > >> + set_malloc_flag_1 (e->caller, malloc_p, changed); > >> +} > >> + > >> +/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any. */ > >> + > >> +bool > >> +cgraph_node::set_malloc_flag (bool malloc_p) > >> +{ > >> + bool changed = false; > >> + > >> + if (!malloc_p
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
On 6 October 2017 at 06:04, Jan Hubickawrote: >> Hi Honza, >> Thanks for the detailed suggestions, I have updated the patch accordingly. >> I have following questions on call_summary: >> 1] I added field bool is_return_callee in ipa_call_summary to track >> whether the caller possibly returns value returned by callee, which >> gets rid of return_callees_map. I assume ipa_call_summary_t::remove() >> and ipa_call_summary_t::duplicate() will already take care of handling >> late insertion/removal of cgraph nodes ? I just initialized >> is_return_callee to false in ipa_call_summary::reset and that seems to >> work. I am not sure though if I have handled it correctly. Could you >> please check that ? > > I was actually thinking to introduce separate summary for ipa-pure-const pass, > but this seems fine to me too (for one bit definitly more effecient) > ipa_call_summary_t::duplicate copies all the fields, so indeed you should be > safe here. > > Also it is possible for functions to be inserted late. Updating of call > summaries > is currently handled by ipa_fn_summary_t::insert >> >> 2] ipa_inline() called ipa_free_fn_summary, which made >> ipa_call_summaries unavailable during ipa-pure-const pass. I removed >> call to ipa_free_fn_summary from ipa_inline, and moved it to >> ipa_pure_const::execute(). Is that OK ? > > Seems OK to me. >> >> Patch passes bootstrap+test and lto bootstrap+test on >> x86_64-unknown-linux-gnu. >> Verfiied SPEC2k6 compiles and runs without miscompares with LTO >> enabled on aarch64-linux-gnu. >> Cross-tested on arm*-*-* and aarch64*-*-*. I will additionally test >> the patch by building chromium or firefox. >> Would it be OK to commit if it passes above validations ? >> >> Thanks, >> Prathamesh >> > >> > Thanks, >> > Honza > >> 2017-10-05 Prathamesh Kulkarni >> >> * cgraph.h (set_malloc_flag): Declare. >> * cgraph.c (set_malloc_flag_1): New function. >> (set_malloc_flag): Likewise. >> * ipa-fnsummary.h (ipa_call_summary): Add new field is_return_callee. >> * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to >> false. >> (read_ipa_call_summary): Add support for reading is_return_callee. >> (write_ipa_call_summary): Stream is_return_callee. >> * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary. >> * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, symbol-summary.h, >> ipa-prop.h, ipa-fnsummary.h. >> (malloc_state_e): Define. >> (malloc_state_names): Define. >> (funct_state_d): Add field malloc_state. >> (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM. >> (check_retval_uses): New function. >> (malloc_candidate_p): Likewise. >> (analyze_function): Add support for malloc attribute. >> (pure_const_write_summary): Stream malloc_state. >> (pure_const_read_summary): Add support for reading malloc_state. >> (dump_malloc_lattice): New function. >> (propagate_malloc): New function. >> (ipa_pure_const::execute): Call propagate_malloc and >> ipa_free_fn_summary. >> (pass_local_pure_const::execute): Add support for malloc attribute. >> * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro. >> >> testsuite/ >> * gcc.dg/ipa/propmalloc-1.c: New test-case. >> * gcc.dg/ipa/propmalloc-2.c: Likewise. >> * gcc.dg/ipa/propmalloc-3.c: Likewise. >> >> diff --git a/gcc/cgraph.c b/gcc/cgraph.c >> index 3d0cefbd46b..0aad90d59ea 100644 >> --- a/gcc/cgraph.c >> +++ b/gcc/cgraph.c >> @@ -2530,6 +2530,53 @@ cgraph_node::set_nothrow_flag (bool nothrow) >>return changed; >> } >> >> +/* Worker to set malloc flag. */ > New line here I guess (it is below) >> +static void >> +set_malloc_flag_1 (cgraph_node *node, bool malloc_p, bool *changed) >> +{ >> + if (malloc_p && !DECL_IS_MALLOC (node->decl)) >> +{ >> + DECL_IS_MALLOC (node->decl) = true; >> + *changed = true; >> +} >> + >> + ipa_ref *ref; >> + FOR_EACH_ALIAS (node, ref) >> +{ >> + cgraph_node *alias = dyn_cast (ref->referring); >> + if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE) >> + set_malloc_flag_1 (alias, malloc_p, changed); >> +} >> + >> + for (cgraph_edge *e = node->callers; e; e = e->next_caller) >> +if (e->caller->thunk.thunk_p >> + && (!malloc_p || e->caller->get_availability () > AVAIL_INTERPOSABLE)) >> + set_malloc_flag_1 (e->caller, malloc_p, changed); >> +} >> + >> +/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any. */ >> + >> +bool >> +cgraph_node::set_malloc_flag (bool malloc_p) >> +{ >> + bool changed = false; >> + >> + if (!malloc_p || get_availability () > AVAIL_INTERPOSABLE) >> +set_malloc_flag_1 (this, malloc_p, ); >> + else >> +{ >> + ipa_ref *ref; >> + >> + FOR_EACH_ALIAS (this, ref) >> + { >> + cgraph_node *alias = dyn_cast (ref->referring); >> +
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
> Hi Honza, > Thanks for the detailed suggestions, I have updated the patch accordingly. > I have following questions on call_summary: > 1] I added field bool is_return_callee in ipa_call_summary to track > whether the caller possibly returns value returned by callee, which > gets rid of return_callees_map. I assume ipa_call_summary_t::remove() > and ipa_call_summary_t::duplicate() will already take care of handling > late insertion/removal of cgraph nodes ? I just initialized > is_return_callee to false in ipa_call_summary::reset and that seems to > work. I am not sure though if I have handled it correctly. Could you > please check that ? I was actually thinking to introduce separate summary for ipa-pure-const pass, but this seems fine to me too (for one bit definitly more effecient) ipa_call_summary_t::duplicate copies all the fields, so indeed you should be safe here. Also it is possible for functions to be inserted late. Updating of call summaries is currently handled by ipa_fn_summary_t::insert > > 2] ipa_inline() called ipa_free_fn_summary, which made > ipa_call_summaries unavailable during ipa-pure-const pass. I removed > call to ipa_free_fn_summary from ipa_inline, and moved it to > ipa_pure_const::execute(). Is that OK ? Seems OK to me. > > Patch passes bootstrap+test and lto bootstrap+test on > x86_64-unknown-linux-gnu. > Verfiied SPEC2k6 compiles and runs without miscompares with LTO > enabled on aarch64-linux-gnu. > Cross-tested on arm*-*-* and aarch64*-*-*. I will additionally test > the patch by building chromium or firefox. > Would it be OK to commit if it passes above validations ? > > Thanks, > Prathamesh > > > > Thanks, > > Honza > 2017-10-05 Prathamesh Kulkarni> > * cgraph.h (set_malloc_flag): Declare. > * cgraph.c (set_malloc_flag_1): New function. > (set_malloc_flag): Likewise. > * ipa-fnsummary.h (ipa_call_summary): Add new field is_return_callee. > * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to > false. > (read_ipa_call_summary): Add support for reading is_return_callee. > (write_ipa_call_summary): Stream is_return_callee. > * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary. > * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, symbol-summary.h, > ipa-prop.h, ipa-fnsummary.h. > (malloc_state_e): Define. > (malloc_state_names): Define. > (funct_state_d): Add field malloc_state. > (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM. > (check_retval_uses): New function. > (malloc_candidate_p): Likewise. > (analyze_function): Add support for malloc attribute. > (pure_const_write_summary): Stream malloc_state. > (pure_const_read_summary): Add support for reading malloc_state. > (dump_malloc_lattice): New function. > (propagate_malloc): New function. > (ipa_pure_const::execute): Call propagate_malloc and > ipa_free_fn_summary. > (pass_local_pure_const::execute): Add support for malloc attribute. > * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro. > > testsuite/ > * gcc.dg/ipa/propmalloc-1.c: New test-case. > * gcc.dg/ipa/propmalloc-2.c: Likewise. > * gcc.dg/ipa/propmalloc-3.c: Likewise. > > diff --git a/gcc/cgraph.c b/gcc/cgraph.c > index 3d0cefbd46b..0aad90d59ea 100644 > --- a/gcc/cgraph.c > +++ b/gcc/cgraph.c > @@ -2530,6 +2530,53 @@ cgraph_node::set_nothrow_flag (bool nothrow) >return changed; > } > > +/* Worker to set malloc flag. */ New line here I guess (it is below) > +static void > +set_malloc_flag_1 (cgraph_node *node, bool malloc_p, bool *changed) > +{ > + if (malloc_p && !DECL_IS_MALLOC (node->decl)) > +{ > + DECL_IS_MALLOC (node->decl) = true; > + *changed = true; > +} > + > + ipa_ref *ref; > + FOR_EACH_ALIAS (node, ref) > +{ > + cgraph_node *alias = dyn_cast (ref->referring); > + if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE) > + set_malloc_flag_1 (alias, malloc_p, changed); > +} > + > + for (cgraph_edge *e = node->callers; e; e = e->next_caller) > +if (e->caller->thunk.thunk_p > + && (!malloc_p || e->caller->get_availability () > AVAIL_INTERPOSABLE)) > + set_malloc_flag_1 (e->caller, malloc_p, changed); > +} > + > +/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any. */ > + > +bool > +cgraph_node::set_malloc_flag (bool malloc_p) > +{ > + bool changed = false; > + > + if (!malloc_p || get_availability () > AVAIL_INTERPOSABLE) > +set_malloc_flag_1 (this, malloc_p, ); > + else > +{ > + ipa_ref *ref; > + > + FOR_EACH_ALIAS (this, ref) > + { > + cgraph_node *alias = dyn_cast (ref->referring); > + if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE) > + set_malloc_flag_1 (alias, malloc_p, ); > + } > +} > + return changed; > +} > + > diff --git
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
On 29 September 2017 at 12:28, Jan Hubickawrote: >> > I wonder what happens here when, say, ipa-icf redirect the call to >> > eqivaelnt >> > function and removes the callee? Perhaps we realy want to have set of call >> > sites rahter than nodes stored from analysis to execution. Call sites have >> > unique stmts and uids, so it will be possible to map them back and forth. >> IIUC, call site is represented using cgraph_edge ? > > Yes, there is call_stmt pointer when gimple is read and uid in WPA mode which > are unique. There is call_summary class which lets you to associate info with > a call site. While it is not most memory effecient to store one bit of > information > this way, i guess it may be easiest to use it in anticipation of it becoming > more useful in foreseeable future. We have some bits in call edge itself that > may be shuffled to the summary as well. > >> So change return_callees_map to be the mapping from node to subset of >> it's call-sites where >> node returns the value of one of it's callees: >> static hash_map< cgraph_node *, vec *> *return_callees_map; ? > > This should work too, though storing direct pointers to edges is something we > probably want to avoid to keep memory representation of edges in bounds. > > I would go with call_summary - everything else seems like bit of premature > optimization. If we will decide to optimize it later, we may invent a variant > of summary datatype for that. Hi Honza, Thanks for the detailed suggestions, I have updated the patch accordingly. I have following questions on call_summary: 1] I added field bool is_return_callee in ipa_call_summary to track whether the caller possibly returns value returned by callee, which gets rid of return_callees_map. I assume ipa_call_summary_t::remove() and ipa_call_summary_t::duplicate() will already take care of handling late insertion/removal of cgraph nodes ? I just initialized is_return_callee to false in ipa_call_summary::reset and that seems to work. I am not sure though if I have handled it correctly. Could you please check that ? 2] ipa_inline() called ipa_free_fn_summary, which made ipa_call_summaries unavailable during ipa-pure-const pass. I removed call to ipa_free_fn_summary from ipa_inline, and moved it to ipa_pure_const::execute(). Is that OK ? Patch passes bootstrap+test and lto bootstrap+test on x86_64-unknown-linux-gnu. Verfiied SPEC2k6 compiles and runs without miscompares with LTO enabled on aarch64-linux-gnu. Cross-tested on arm*-*-* and aarch64*-*-*. I will additionally test the patch by building chromium or firefox. Would it be OK to commit if it passes above validations ? Thanks, Prathamesh > > Thanks, > Honza 2017-10-05 Prathamesh Kulkarni * cgraph.h (set_malloc_flag): Declare. * cgraph.c (set_malloc_flag_1): New function. (set_malloc_flag): Likewise. * ipa-fnsummary.h (ipa_call_summary): Add new field is_return_callee. * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to false. (read_ipa_call_summary): Add support for reading is_return_callee. (write_ipa_call_summary): Stream is_return_callee. * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary. * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, symbol-summary.h, ipa-prop.h, ipa-fnsummary.h. (malloc_state_e): Define. (malloc_state_names): Define. (funct_state_d): Add field malloc_state. (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM. (check_retval_uses): New function. (malloc_candidate_p): Likewise. (analyze_function): Add support for malloc attribute. (pure_const_write_summary): Stream malloc_state. (pure_const_read_summary): Add support for reading malloc_state. (dump_malloc_lattice): New function. (propagate_malloc): New function. (ipa_pure_const::execute): Call propagate_malloc and ipa_free_fn_summary. (pass_local_pure_const::execute): Add support for malloc attribute. * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro. testsuite/ * gcc.dg/ipa/propmalloc-1.c: New test-case. * gcc.dg/ipa/propmalloc-2.c: Likewise. * gcc.dg/ipa/propmalloc-3.c: Likewise. diff --git a/gcc/cgraph.c b/gcc/cgraph.c index 3d0cefbd46b..0aad90d59ea 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -2530,6 +2530,53 @@ cgraph_node::set_nothrow_flag (bool nothrow) return changed; } +/* Worker to set malloc flag. */ +static void +set_malloc_flag_1 (cgraph_node *node, bool malloc_p, bool *changed) +{ + if (malloc_p && !DECL_IS_MALLOC (node->decl)) +{ + DECL_IS_MALLOC (node->decl) = true; + *changed = true; +} + + ipa_ref *ref; + FOR_EACH_ALIAS (node, ref) +{ + cgraph_node *alias = dyn_cast (ref->referring); + if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE) +
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
> > I wonder what happens here when, say, ipa-icf redirect the call to eqivaelnt > > function and removes the callee? Perhaps we realy want to have set of call > > sites rahter than nodes stored from analysis to execution. Call sites have > > unique stmts and uids, so it will be possible to map them back and forth. > IIUC, call site is represented using cgraph_edge ? Yes, there is call_stmt pointer when gimple is read and uid in WPA mode which are unique. There is call_summary class which lets you to associate info with a call site. While it is not most memory effecient to store one bit of information this way, i guess it may be easiest to use it in anticipation of it becoming more useful in foreseeable future. We have some bits in call edge itself that may be shuffled to the summary as well. > So change return_callees_map to be the mapping from node to subset of > it's call-sites where > node returns the value of one of it's callees: > static hash_map< cgraph_node *, vec *> *return_callees_map; ? This should work too, though storing direct pointers to edges is something we probably want to avoid to keep memory representation of edges in bounds. I would go with call_summary - everything else seems like bit of premature optimization. If we will decide to optimize it later, we may invent a variant of summary datatype for that. Thanks, Honza
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
On 25 September 2017 at 17:24, Jan Hubickawrote: >> Hi Honza, >> Could you please have a look at this patch ? >> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html > > I can and I should have done long time ago. I really apologize for slow > response > and I will try to be more timely from now on. The reason was that I had some > patches that I was thinking I would like to push out first, but I guess since > they are still not ready it is better to go other way around. No worries, and thanks for the feedback! > > +/* A map from node to subset of callees. The subset contains those callees > + * whose return-value is returned by the node. */ > +static hash_map< cgraph_node *, vec* > *return_callees_map; > > Extra * at the beggining of line. It would make more sense to put those > and the other bits into function_summary rather than using the hooks > but that is something we co do incrementally. > > I wonder what happens here when, say, ipa-icf redirect the call to eqivaelnt > function and removes the callee? Perhaps we realy want to have set of call > sites rahter than nodes stored from analysis to execution. Call sites have > unique stmts and uids, so it will be possible to map them back and forth. IIUC, call site is represented using cgraph_edge ? So change return_callees_map to be the mapping from node to subset of it's call-sites where node returns the value of one of it's callees: static hash_map< cgraph_node *, vec *> *return_callees_map; ? > > +static bool > +check_retval_uses (tree retval, gimple *stmt) > +{ > > there is missing toplevel comment on those. > > +/* > + * Currently this function does a very conservative analysis to check if > + * function could be a malloc candidate. > + * > + * The function is considered to be a candidate if > + * 1) The function returns a value of pointer type. > + * 2) SSA_NAME_DEF_STMT (return_value) is either a function call or > + *a phi, and element of phi is either NULL or > + *SSA_NAME_DEF_STMT(element) is function call. > + * 3) The return-value has immediate uses only within comparisons (gcond or > gassign) > + *and return_stmt (and likewise a phi arg has immediate use only within > comparison > + *or the phi stmt). > + */ > > Now * in begginig of lines. Theoretically by coding standards the comment > should start with description of what function does and what are the > parameters. > I believe Richi already commented on this part - which is more of his domain, > but it seems fine to me. > > Pehraps with -details dump it would be nice to dump reason why the malloc > candidate was rejected. > > +DEBUG_FUNCTION > +static void > +dump_malloc_lattice (FILE *dump_file, const char *s) > > +static void > +propagate_malloc (void) > > For coding standards, please add block comments. Thanks for the suggestions, I will try to address them in the next version of the patch. Regards, Prathamesh > > With these changes the patch looks good to me! > Honza > >> >> I tested it with SPEC2006 on AArch64 Cortex-a57 processor and saw some >> improvement for >> 433.milc (+1.79%), 437.leslie3d (+2.84%) and 470.lbm (+4%) and not >> much differences for other benchmarks. >> I don't expect them to be precise though, it was run with only one >> iteration of SPEC. >> Thanks! >> >> Regards, >> Prathamesh >> > >> > Thanks, >> > Prathamesh >> >> >> >> Thanks, >> >> Prathamesh >> >>> >> >>> Thanks, >> >>> Prathamesh >> >> Thanks, >> Prathamesh >> > >> > Regards, >> > Prathamesh >> >> >> >> Thanks, >> >> Prathamesh >> >>> >> >>> Honza
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
> Hi Honza, > Could you please have a look at this patch ? > https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html I can and I should have done long time ago. I really apologize for slow response and I will try to be more timely from now on. The reason was that I had some patches that I was thinking I would like to push out first, but I guess since they are still not ready it is better to go other way around. +/* A map from node to subset of callees. The subset contains those callees + * whose return-value is returned by the node. */ +static hash_map< cgraph_node *, vec* > *return_callees_map; Extra * at the beggining of line. It would make more sense to put those and the other bits into function_summary rather than using the hooks but that is something we co do incrementally. I wonder what happens here when, say, ipa-icf redirect the call to eqivaelnt function and removes the callee? Perhaps we realy want to have set of call sites rahter than nodes stored from analysis to execution. Call sites have unique stmts and uids, so it will be possible to map them back and forth. +static bool +check_retval_uses (tree retval, gimple *stmt) +{ there is missing toplevel comment on those. +/* + * Currently this function does a very conservative analysis to check if + * function could be a malloc candidate. + * + * The function is considered to be a candidate if + * 1) The function returns a value of pointer type. + * 2) SSA_NAME_DEF_STMT (return_value) is either a function call or + *a phi, and element of phi is either NULL or + *SSA_NAME_DEF_STMT(element) is function call. + * 3) The return-value has immediate uses only within comparisons (gcond or gassign) + *and return_stmt (and likewise a phi arg has immediate use only within comparison + *or the phi stmt). + */ Now * in begginig of lines. Theoretically by coding standards the comment should start with description of what function does and what are the parameters. I believe Richi already commented on this part - which is more of his domain, but it seems fine to me. Pehraps with -details dump it would be nice to dump reason why the malloc candidate was rejected. +DEBUG_FUNCTION +static void +dump_malloc_lattice (FILE *dump_file, const char *s) +static void +propagate_malloc (void) For coding standards, please add block comments. With these changes the patch looks good to me! Honza > > I tested it with SPEC2006 on AArch64 Cortex-a57 processor and saw some > improvement for > 433.milc (+1.79%), 437.leslie3d (+2.84%) and 470.lbm (+4%) and not > much differences for other benchmarks. > I don't expect them to be precise though, it was run with only one > iteration of SPEC. > Thanks! > > Regards, > Prathamesh > > > > Thanks, > > Prathamesh > >> > >> Thanks, > >> Prathamesh > >>> > >>> Thanks, > >>> Prathamesh > > Thanks, > Prathamesh > > > > Regards, > > Prathamesh > >> > >> Thanks, > >> Prathamesh > >>> > >>> Honza
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
On 15 September 2017 at 17:49, Prathamesh Kulkarniwrote: > On 1 September 2017 at 08:09, Prathamesh Kulkarni > wrote: >> On 17 August 2017 at 18:02, Prathamesh Kulkarni >> wrote: >>> On 8 August 2017 at 09:50, Prathamesh Kulkarni >>> wrote: On 31 July 2017 at 23:53, Prathamesh Kulkarni wrote: > On 23 May 2017 at 19:10, Prathamesh Kulkarni > wrote: >> On 19 May 2017 at 19:02, Jan Hubicka wrote: * LTO and memory management This is a general question about LTO and memory management. IIUC the following sequence takes place during normal LTO: LGEN: generate_summary, write_summary WPA: read_summary, execute ipa passes, write_opt_summary So I assumed it was OK in LGEN to allocate return_callees_map in generate_summary and free it in write_summary and during WPA, allocate return_callees_map in read_summary and free it after execute (since write_opt_summary does not require return_callees_map). However with fat LTO, it seems the sequence changes for LGEN with execute phase takes place after write_summary. However since return_callees_map is freed in pure_const_write_summary and propagate_malloc() accesses it in execute stage, it results in segmentation fault. To work around this, I am using the following hack in pure_const_write_summary: // FIXME: Do not free if -ffat-lto-objects is enabled. if (!global_options.x_flag_fat_lto_objects) free_return_callees_map (); Is there a better approach for handling this ? >>> >>> I think most passes just do not free summaries with -flto. We probably >>> want >>> to fix it to make it possible to compile multiple units i.e. from >>> plugin by >>> adding release_summaries method... >>> So I would say it is OK to do the same as others do and leak it with >>> -flto. diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c index e457166ea39..724c26e03f6 100644 --- a/gcc/ipa-pure-const.c +++ b/gcc/ipa-pure-const.c @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-scalar-evolution.h" #include "intl.h" #include "opts.h" +#include "ssa.h" /* Lattice values for const and pure functions. Everything starts out being const, then may drop to pure and then neither depending on @@ -69,6 +70,15 @@ enum pure_const_state_e const char *pure_const_names[3] = {"const", "pure", "neither"}; +enum malloc_state_e +{ + PURE_CONST_MALLOC_TOP, + PURE_CONST_MALLOC, + PURE_CONST_MALLOC_BOTTOM +}; >>> >>> It took me a while to work out what PURE_CONST means here :) >>> I would just call it something like STATE_MALLOC_TOP... or so. >>> ipa_pure_const is outdated name from the time pass was doing only >>> those two. @@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state; static vec funct_state_vec; +/* A map from node to subset of callees. The subset contains those callees + * whose return-value is returned by the node. */ +static hash_map< cgraph_node *, vec* > *return_callees_map; + >>> >>> Hehe, a special case of return jump function. We ought to support >>> those more generally. >>> How do you keep it up to date over callgraph changes? @@ -921,6 +1055,23 @@ end: if (TREE_NOTHROW (decl)) l->can_throw = false; + if (ipa) +{ + vec v = vNULL; + l->malloc_state = PURE_CONST_MALLOC_BOTTOM; + if (DECL_IS_MALLOC (decl)) + l->malloc_state = PURE_CONST_MALLOC; + else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v)) + { + l->malloc_state = PURE_CONST_MALLOC_TOP; + vec *callees_p = new vec (vNULL); + for (unsigned i = 0; i < v.length (); ++i) + callees_p->safe_push (v[i]); + return_callees_map->put (fn, callees_p); + } + v.release (); +} + >>> >>> I would do non-ipa variant, too. I think most attributes can be >>> detected that way >>> as well. >>> >>> The patch generally makes sense to me. It would be nice to make it >>> easier to write such >>> a basic propagators across callgraph (perhaps adding a template doing >>> the basic >>>
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
On 1 September 2017 at 08:09, Prathamesh Kulkarniwrote: > On 17 August 2017 at 18:02, Prathamesh Kulkarni > wrote: >> On 8 August 2017 at 09:50, Prathamesh Kulkarni >> wrote: >>> On 31 July 2017 at 23:53, Prathamesh Kulkarni >>> wrote: On 23 May 2017 at 19:10, Prathamesh Kulkarni wrote: > On 19 May 2017 at 19:02, Jan Hubicka wrote: >>> >>> * LTO and memory management >>> This is a general question about LTO and memory management. >>> IIUC the following sequence takes place during normal LTO: >>> LGEN: generate_summary, write_summary >>> WPA: read_summary, execute ipa passes, write_opt_summary >>> >>> So I assumed it was OK in LGEN to allocate return_callees_map in >>> generate_summary and free it in write_summary and during WPA, allocate >>> return_callees_map in read_summary and free it after execute (since >>> write_opt_summary does not require return_callees_map). >>> >>> However with fat LTO, it seems the sequence changes for LGEN with >>> execute phase takes place after write_summary. However since >>> return_callees_map is freed in pure_const_write_summary and >>> propagate_malloc() accesses it in execute stage, it results in >>> segmentation fault. >>> >>> To work around this, I am using the following hack in >>> pure_const_write_summary: >>> // FIXME: Do not free if -ffat-lto-objects is enabled. >>> if (!global_options.x_flag_fat_lto_objects) >>> free_return_callees_map (); >>> Is there a better approach for handling this ? >> >> I think most passes just do not free summaries with -flto. We probably >> want >> to fix it to make it possible to compile multiple units i.e. from plugin >> by >> adding release_summaries method... >> So I would say it is OK to do the same as others do and leak it with >> -flto. >>> diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c >>> index e457166ea39..724c26e03f6 100644 >>> --- a/gcc/ipa-pure-const.c >>> +++ b/gcc/ipa-pure-const.c >>> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3. If not see >>> #include "tree-scalar-evolution.h" >>> #include "intl.h" >>> #include "opts.h" >>> +#include "ssa.h" >>> >>> /* Lattice values for const and pure functions. Everything starts out >>> being const, then may drop to pure and then neither depending on >>> @@ -69,6 +70,15 @@ enum pure_const_state_e >>> >>> const char *pure_const_names[3] = {"const", "pure", "neither"}; >>> >>> +enum malloc_state_e >>> +{ >>> + PURE_CONST_MALLOC_TOP, >>> + PURE_CONST_MALLOC, >>> + PURE_CONST_MALLOC_BOTTOM >>> +}; >> >> It took me a while to work out what PURE_CONST means here :) >> I would just call it something like STATE_MALLOC_TOP... or so. >> ipa_pure_const is outdated name from the time pass was doing only >> those two. >>> @@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state; >>> >>> static vec funct_state_vec; >>> >>> +/* A map from node to subset of callees. The subset contains those >>> callees >>> + * whose return-value is returned by the node. */ >>> +static hash_map< cgraph_node *, vec* > >>> *return_callees_map; >>> + >> >> Hehe, a special case of return jump function. We ought to support those >> more generally. >> How do you keep it up to date over callgraph changes? >>> @@ -921,6 +1055,23 @@ end: >>>if (TREE_NOTHROW (decl)) >>> l->can_throw = false; >>> >>> + if (ipa) >>> +{ >>> + vec v = vNULL; >>> + l->malloc_state = PURE_CONST_MALLOC_BOTTOM; >>> + if (DECL_IS_MALLOC (decl)) >>> + l->malloc_state = PURE_CONST_MALLOC; >>> + else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v)) >>> + { >>> + l->malloc_state = PURE_CONST_MALLOC_TOP; >>> + vec *callees_p = new vec (vNULL); >>> + for (unsigned i = 0; i < v.length (); ++i) >>> + callees_p->safe_push (v[i]); >>> + return_callees_map->put (fn, callees_p); >>> + } >>> + v.release (); >>> +} >>> + >> >> I would do non-ipa variant, too. I think most attributes can be >> detected that way >> as well. >> >> The patch generally makes sense to me. It would be nice to make it >> easier to write such >> a basic propagators across callgraph (perhaps adding a template doing >> the basic >> propagation logic). Also I think you need to solve the problem with >> keeping your >> summaries up to date across callgraph node removal and duplications. > Thanks for the suggestions, I will try to
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
On 17 August 2017 at 18:02, Prathamesh Kulkarniwrote: > On 8 August 2017 at 09:50, Prathamesh Kulkarni > wrote: >> On 31 July 2017 at 23:53, Prathamesh Kulkarni >> wrote: >>> On 23 May 2017 at 19:10, Prathamesh Kulkarni >>> wrote: On 19 May 2017 at 19:02, Jan Hubicka wrote: >> >> * LTO and memory management >> This is a general question about LTO and memory management. >> IIUC the following sequence takes place during normal LTO: >> LGEN: generate_summary, write_summary >> WPA: read_summary, execute ipa passes, write_opt_summary >> >> So I assumed it was OK in LGEN to allocate return_callees_map in >> generate_summary and free it in write_summary and during WPA, allocate >> return_callees_map in read_summary and free it after execute (since >> write_opt_summary does not require return_callees_map). >> >> However with fat LTO, it seems the sequence changes for LGEN with >> execute phase takes place after write_summary. However since >> return_callees_map is freed in pure_const_write_summary and >> propagate_malloc() accesses it in execute stage, it results in >> segmentation fault. >> >> To work around this, I am using the following hack in >> pure_const_write_summary: >> // FIXME: Do not free if -ffat-lto-objects is enabled. >> if (!global_options.x_flag_fat_lto_objects) >> free_return_callees_map (); >> Is there a better approach for handling this ? > > I think most passes just do not free summaries with -flto. We probably > want > to fix it to make it possible to compile multiple units i.e. from plugin > by > adding release_summaries method... > So I would say it is OK to do the same as others do and leak it with > -flto. >> diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c >> index e457166ea39..724c26e03f6 100644 >> --- a/gcc/ipa-pure-const.c >> +++ b/gcc/ipa-pure-const.c >> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3. If not see >> #include "tree-scalar-evolution.h" >> #include "intl.h" >> #include "opts.h" >> +#include "ssa.h" >> >> /* Lattice values for const and pure functions. Everything starts out >> being const, then may drop to pure and then neither depending on >> @@ -69,6 +70,15 @@ enum pure_const_state_e >> >> const char *pure_const_names[3] = {"const", "pure", "neither"}; >> >> +enum malloc_state_e >> +{ >> + PURE_CONST_MALLOC_TOP, >> + PURE_CONST_MALLOC, >> + PURE_CONST_MALLOC_BOTTOM >> +}; > > It took me a while to work out what PURE_CONST means here :) > I would just call it something like STATE_MALLOC_TOP... or so. > ipa_pure_const is outdated name from the time pass was doing only > those two. >> @@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state; >> >> static vec funct_state_vec; >> >> +/* A map from node to subset of callees. The subset contains those >> callees >> + * whose return-value is returned by the node. */ >> +static hash_map< cgraph_node *, vec* > >> *return_callees_map; >> + > > Hehe, a special case of return jump function. We ought to support those > more generally. > How do you keep it up to date over callgraph changes? >> @@ -921,6 +1055,23 @@ end: >>if (TREE_NOTHROW (decl)) >> l->can_throw = false; >> >> + if (ipa) >> +{ >> + vec v = vNULL; >> + l->malloc_state = PURE_CONST_MALLOC_BOTTOM; >> + if (DECL_IS_MALLOC (decl)) >> + l->malloc_state = PURE_CONST_MALLOC; >> + else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v)) >> + { >> + l->malloc_state = PURE_CONST_MALLOC_TOP; >> + vec *callees_p = new vec (vNULL); >> + for (unsigned i = 0; i < v.length (); ++i) >> + callees_p->safe_push (v[i]); >> + return_callees_map->put (fn, callees_p); >> + } >> + v.release (); >> +} >> + > > I would do non-ipa variant, too. I think most attributes can be detected > that way > as well. > > The patch generally makes sense to me. It would be nice to make it > easier to write such > a basic propagators across callgraph (perhaps adding a template doing the > basic > propagation logic). Also I think you need to solve the problem with > keeping your > summaries up to date across callgraph node removal and duplications. Thanks for the suggestions, I will try to address them in a follow-up patch. IIUC, I would need to modify ipa-pure-const cgraph hooks - add_new_function, remove_node_data, duplicate_node_data to keep return_callees_map
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
On 8 August 2017 at 09:50, Prathamesh Kulkarniwrote: > On 31 July 2017 at 23:53, Prathamesh Kulkarni > wrote: >> On 23 May 2017 at 19:10, Prathamesh Kulkarni >> wrote: >>> On 19 May 2017 at 19:02, Jan Hubicka wrote: > > * LTO and memory management > This is a general question about LTO and memory management. > IIUC the following sequence takes place during normal LTO: > LGEN: generate_summary, write_summary > WPA: read_summary, execute ipa passes, write_opt_summary > > So I assumed it was OK in LGEN to allocate return_callees_map in > generate_summary and free it in write_summary and during WPA, allocate > return_callees_map in read_summary and free it after execute (since > write_opt_summary does not require return_callees_map). > > However with fat LTO, it seems the sequence changes for LGEN with > execute phase takes place after write_summary. However since > return_callees_map is freed in pure_const_write_summary and > propagate_malloc() accesses it in execute stage, it results in > segmentation fault. > > To work around this, I am using the following hack in > pure_const_write_summary: > // FIXME: Do not free if -ffat-lto-objects is enabled. > if (!global_options.x_flag_fat_lto_objects) > free_return_callees_map (); > Is there a better approach for handling this ? I think most passes just do not free summaries with -flto. We probably want to fix it to make it possible to compile multiple units i.e. from plugin by adding release_summaries method... So I would say it is OK to do the same as others do and leak it with -flto. > diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c > index e457166ea39..724c26e03f6 100644 > --- a/gcc/ipa-pure-const.c > +++ b/gcc/ipa-pure-const.c > @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3. If not see > #include "tree-scalar-evolution.h" > #include "intl.h" > #include "opts.h" > +#include "ssa.h" > > /* Lattice values for const and pure functions. Everything starts out > being const, then may drop to pure and then neither depending on > @@ -69,6 +70,15 @@ enum pure_const_state_e > > const char *pure_const_names[3] = {"const", "pure", "neither"}; > > +enum malloc_state_e > +{ > + PURE_CONST_MALLOC_TOP, > + PURE_CONST_MALLOC, > + PURE_CONST_MALLOC_BOTTOM > +}; It took me a while to work out what PURE_CONST means here :) I would just call it something like STATE_MALLOC_TOP... or so. ipa_pure_const is outdated name from the time pass was doing only those two. > @@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state; > > static vec funct_state_vec; > > +/* A map from node to subset of callees. The subset contains those > callees > + * whose return-value is returned by the node. */ > +static hash_map< cgraph_node *, vec* > > *return_callees_map; > + Hehe, a special case of return jump function. We ought to support those more generally. How do you keep it up to date over callgraph changes? > @@ -921,6 +1055,23 @@ end: >if (TREE_NOTHROW (decl)) > l->can_throw = false; > > + if (ipa) > +{ > + vec v = vNULL; > + l->malloc_state = PURE_CONST_MALLOC_BOTTOM; > + if (DECL_IS_MALLOC (decl)) > + l->malloc_state = PURE_CONST_MALLOC; > + else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v)) > + { > + l->malloc_state = PURE_CONST_MALLOC_TOP; > + vec *callees_p = new vec (vNULL); > + for (unsigned i = 0; i < v.length (); ++i) > + callees_p->safe_push (v[i]); > + return_callees_map->put (fn, callees_p); > + } > + v.release (); > +} > + I would do non-ipa variant, too. I think most attributes can be detected that way as well. The patch generally makes sense to me. It would be nice to make it easier to write such a basic propagators across callgraph (perhaps adding a template doing the basic propagation logic). Also I think you need to solve the problem with keeping your summaries up to date across callgraph node removal and duplications. >>> Thanks for the suggestions, I will try to address them in a follow-up patch. >>> IIUC, I would need to modify ipa-pure-const cgraph hooks - >>> add_new_function, remove_node_data, duplicate_node_data >>> to keep return_callees_map up-to-date across callgraph node insertions >>> and removal ? >>> >>> Also, if instead of having a separate data-structure like >>> return_callees_map, >>> should we rather have a flag within cgraph_edge, which marks that the >>>
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
On 31 July 2017 at 23:53, Prathamesh Kulkarniwrote: > On 23 May 2017 at 19:10, Prathamesh Kulkarni > wrote: >> On 19 May 2017 at 19:02, Jan Hubicka wrote: * LTO and memory management This is a general question about LTO and memory management. IIUC the following sequence takes place during normal LTO: LGEN: generate_summary, write_summary WPA: read_summary, execute ipa passes, write_opt_summary So I assumed it was OK in LGEN to allocate return_callees_map in generate_summary and free it in write_summary and during WPA, allocate return_callees_map in read_summary and free it after execute (since write_opt_summary does not require return_callees_map). However with fat LTO, it seems the sequence changes for LGEN with execute phase takes place after write_summary. However since return_callees_map is freed in pure_const_write_summary and propagate_malloc() accesses it in execute stage, it results in segmentation fault. To work around this, I am using the following hack in pure_const_write_summary: // FIXME: Do not free if -ffat-lto-objects is enabled. if (!global_options.x_flag_fat_lto_objects) free_return_callees_map (); Is there a better approach for handling this ? >>> >>> I think most passes just do not free summaries with -flto. We probably want >>> to fix it to make it possible to compile multiple units i.e. from plugin by >>> adding release_summaries method... >>> So I would say it is OK to do the same as others do and leak it with -flto. diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c index e457166ea39..724c26e03f6 100644 --- a/gcc/ipa-pure-const.c +++ b/gcc/ipa-pure-const.c @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-scalar-evolution.h" #include "intl.h" #include "opts.h" +#include "ssa.h" /* Lattice values for const and pure functions. Everything starts out being const, then may drop to pure and then neither depending on @@ -69,6 +70,15 @@ enum pure_const_state_e const char *pure_const_names[3] = {"const", "pure", "neither"}; +enum malloc_state_e +{ + PURE_CONST_MALLOC_TOP, + PURE_CONST_MALLOC, + PURE_CONST_MALLOC_BOTTOM +}; >>> >>> It took me a while to work out what PURE_CONST means here :) >>> I would just call it something like STATE_MALLOC_TOP... or so. >>> ipa_pure_const is outdated name from the time pass was doing only >>> those two. @@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state; static vec funct_state_vec; +/* A map from node to subset of callees. The subset contains those callees + * whose return-value is returned by the node. */ +static hash_map< cgraph_node *, vec* > *return_callees_map; + >>> >>> Hehe, a special case of return jump function. We ought to support those >>> more generally. >>> How do you keep it up to date over callgraph changes? @@ -921,6 +1055,23 @@ end: if (TREE_NOTHROW (decl)) l->can_throw = false; + if (ipa) +{ + vec v = vNULL; + l->malloc_state = PURE_CONST_MALLOC_BOTTOM; + if (DECL_IS_MALLOC (decl)) + l->malloc_state = PURE_CONST_MALLOC; + else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v)) + { + l->malloc_state = PURE_CONST_MALLOC_TOP; + vec *callees_p = new vec (vNULL); + for (unsigned i = 0; i < v.length (); ++i) + callees_p->safe_push (v[i]); + return_callees_map->put (fn, callees_p); + } + v.release (); +} + >>> >>> I would do non-ipa variant, too. I think most attributes can be detected >>> that way >>> as well. >>> >>> The patch generally makes sense to me. It would be nice to make it easier >>> to write such >>> a basic propagators across callgraph (perhaps adding a template doing the >>> basic >>> propagation logic). Also I think you need to solve the problem with keeping >>> your >>> summaries up to date across callgraph node removal and duplications. >> Thanks for the suggestions, I will try to address them in a follow-up patch. >> IIUC, I would need to modify ipa-pure-const cgraph hooks - >> add_new_function, remove_node_data, duplicate_node_data >> to keep return_callees_map up-to-date across callgraph node insertions >> and removal ? >> >> Also, if instead of having a separate data-structure like return_callees_map, >> should we rather have a flag within cgraph_edge, which marks that the >> caller may return the value of the callee ? > Hi, > Sorry for the very late response. I have attached an updated version > of the prototype patch, > which adds a non-ipa variant, and keeps return_callees_map up-to-date > across
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
On 23 May 2017 at 19:10, Prathamesh Kulkarniwrote: > On 19 May 2017 at 19:02, Jan Hubicka wrote: >>> >>> * LTO and memory management >>> This is a general question about LTO and memory management. >>> IIUC the following sequence takes place during normal LTO: >>> LGEN: generate_summary, write_summary >>> WPA: read_summary, execute ipa passes, write_opt_summary >>> >>> So I assumed it was OK in LGEN to allocate return_callees_map in >>> generate_summary and free it in write_summary and during WPA, allocate >>> return_callees_map in read_summary and free it after execute (since >>> write_opt_summary does not require return_callees_map). >>> >>> However with fat LTO, it seems the sequence changes for LGEN with >>> execute phase takes place after write_summary. However since >>> return_callees_map is freed in pure_const_write_summary and >>> propagate_malloc() accesses it in execute stage, it results in >>> segmentation fault. >>> >>> To work around this, I am using the following hack in >>> pure_const_write_summary: >>> // FIXME: Do not free if -ffat-lto-objects is enabled. >>> if (!global_options.x_flag_fat_lto_objects) >>> free_return_callees_map (); >>> Is there a better approach for handling this ? >> >> I think most passes just do not free summaries with -flto. We probably want >> to fix it to make it possible to compile multiple units i.e. from plugin by >> adding release_summaries method... >> So I would say it is OK to do the same as others do and leak it with -flto. >>> diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c >>> index e457166ea39..724c26e03f6 100644 >>> --- a/gcc/ipa-pure-const.c >>> +++ b/gcc/ipa-pure-const.c >>> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3. If not see >>> #include "tree-scalar-evolution.h" >>> #include "intl.h" >>> #include "opts.h" >>> +#include "ssa.h" >>> >>> /* Lattice values for const and pure functions. Everything starts out >>> being const, then may drop to pure and then neither depending on >>> @@ -69,6 +70,15 @@ enum pure_const_state_e >>> >>> const char *pure_const_names[3] = {"const", "pure", "neither"}; >>> >>> +enum malloc_state_e >>> +{ >>> + PURE_CONST_MALLOC_TOP, >>> + PURE_CONST_MALLOC, >>> + PURE_CONST_MALLOC_BOTTOM >>> +}; >> >> It took me a while to work out what PURE_CONST means here :) >> I would just call it something like STATE_MALLOC_TOP... or so. >> ipa_pure_const is outdated name from the time pass was doing only >> those two. >>> @@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state; >>> >>> static vec funct_state_vec; >>> >>> +/* A map from node to subset of callees. The subset contains those callees >>> + * whose return-value is returned by the node. */ >>> +static hash_map< cgraph_node *, vec* > *return_callees_map; >>> + >> >> Hehe, a special case of return jump function. We ought to support those >> more generally. >> How do you keep it up to date over callgraph changes? >>> @@ -921,6 +1055,23 @@ end: >>>if (TREE_NOTHROW (decl)) >>> l->can_throw = false; >>> >>> + if (ipa) >>> +{ >>> + vec v = vNULL; >>> + l->malloc_state = PURE_CONST_MALLOC_BOTTOM; >>> + if (DECL_IS_MALLOC (decl)) >>> + l->malloc_state = PURE_CONST_MALLOC; >>> + else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v)) >>> + { >>> + l->malloc_state = PURE_CONST_MALLOC_TOP; >>> + vec *callees_p = new vec (vNULL); >>> + for (unsigned i = 0; i < v.length (); ++i) >>> + callees_p->safe_push (v[i]); >>> + return_callees_map->put (fn, callees_p); >>> + } >>> + v.release (); >>> +} >>> + >> >> I would do non-ipa variant, too. I think most attributes can be detected >> that way >> as well. >> >> The patch generally makes sense to me. It would be nice to make it easier >> to write such >> a basic propagators across callgraph (perhaps adding a template doing the >> basic >> propagation logic). Also I think you need to solve the problem with keeping >> your >> summaries up to date across callgraph node removal and duplications. > Thanks for the suggestions, I will try to address them in a follow-up patch. > IIUC, I would need to modify ipa-pure-const cgraph hooks - > add_new_function, remove_node_data, duplicate_node_data > to keep return_callees_map up-to-date across callgraph node insertions > and removal ? > > Also, if instead of having a separate data-structure like return_callees_map, > should we rather have a flag within cgraph_edge, which marks that the > caller may return the value of the callee ? Hi, Sorry for the very late response. I have attached an updated version of the prototype patch, which adds a non-ipa variant, and keeps return_callees_map up-to-date across callgraph node insertions and removal. For the non-ipa variant, malloc_candidate_p() additionally checks that all the "return callees" have DECL_IS_MALLOC set to true. Bootstrapped+tested and LTO bootstrapped+tested
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
On 19 May 2017 at 19:02, Jan Hubickawrote: >> >> * LTO and memory management >> This is a general question about LTO and memory management. >> IIUC the following sequence takes place during normal LTO: >> LGEN: generate_summary, write_summary >> WPA: read_summary, execute ipa passes, write_opt_summary >> >> So I assumed it was OK in LGEN to allocate return_callees_map in >> generate_summary and free it in write_summary and during WPA, allocate >> return_callees_map in read_summary and free it after execute (since >> write_opt_summary does not require return_callees_map). >> >> However with fat LTO, it seems the sequence changes for LGEN with >> execute phase takes place after write_summary. However since >> return_callees_map is freed in pure_const_write_summary and >> propagate_malloc() accesses it in execute stage, it results in >> segmentation fault. >> >> To work around this, I am using the following hack in >> pure_const_write_summary: >> // FIXME: Do not free if -ffat-lto-objects is enabled. >> if (!global_options.x_flag_fat_lto_objects) >> free_return_callees_map (); >> Is there a better approach for handling this ? > > I think most passes just do not free summaries with -flto. We probably want > to fix it to make it possible to compile multiple units i.e. from plugin by > adding release_summaries method... > So I would say it is OK to do the same as others do and leak it with -flto. >> diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c >> index e457166ea39..724c26e03f6 100644 >> --- a/gcc/ipa-pure-const.c >> +++ b/gcc/ipa-pure-const.c >> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3. If not see >> #include "tree-scalar-evolution.h" >> #include "intl.h" >> #include "opts.h" >> +#include "ssa.h" >> >> /* Lattice values for const and pure functions. Everything starts out >> being const, then may drop to pure and then neither depending on >> @@ -69,6 +70,15 @@ enum pure_const_state_e >> >> const char *pure_const_names[3] = {"const", "pure", "neither"}; >> >> +enum malloc_state_e >> +{ >> + PURE_CONST_MALLOC_TOP, >> + PURE_CONST_MALLOC, >> + PURE_CONST_MALLOC_BOTTOM >> +}; > > It took me a while to work out what PURE_CONST means here :) > I would just call it something like STATE_MALLOC_TOP... or so. > ipa_pure_const is outdated name from the time pass was doing only > those two. >> @@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state; >> >> static vec funct_state_vec; >> >> +/* A map from node to subset of callees. The subset contains those callees >> + * whose return-value is returned by the node. */ >> +static hash_map< cgraph_node *, vec* > *return_callees_map; >> + > > Hehe, a special case of return jump function. We ought to support those more > generally. > How do you keep it up to date over callgraph changes? >> @@ -921,6 +1055,23 @@ end: >>if (TREE_NOTHROW (decl)) >> l->can_throw = false; >> >> + if (ipa) >> +{ >> + vec v = vNULL; >> + l->malloc_state = PURE_CONST_MALLOC_BOTTOM; >> + if (DECL_IS_MALLOC (decl)) >> + l->malloc_state = PURE_CONST_MALLOC; >> + else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v)) >> + { >> + l->malloc_state = PURE_CONST_MALLOC_TOP; >> + vec *callees_p = new vec (vNULL); >> + for (unsigned i = 0; i < v.length (); ++i) >> + callees_p->safe_push (v[i]); >> + return_callees_map->put (fn, callees_p); >> + } >> + v.release (); >> +} >> + > > I would do non-ipa variant, too. I think most attributes can be detected > that way > as well. > > The patch generally makes sense to me. It would be nice to make it easier to > write such > a basic propagators across callgraph (perhaps adding a template doing the > basic > propagation logic). Also I think you need to solve the problem with keeping > your > summaries up to date across callgraph node removal and duplications. Thanks for the suggestions, I will try to address them in a follow-up patch. IIUC, I would need to modify ipa-pure-const cgraph hooks - add_new_function, remove_node_data, duplicate_node_data to keep return_callees_map up-to-date across callgraph node insertions and removal ? Also, if instead of having a separate data-structure like return_callees_map, should we rather have a flag within cgraph_edge, which marks that the caller may return the value of the callee ? Thanks, Prathamesh > > Honza
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
> > * LTO and memory management > This is a general question about LTO and memory management. > IIUC the following sequence takes place during normal LTO: > LGEN: generate_summary, write_summary > WPA: read_summary, execute ipa passes, write_opt_summary > > So I assumed it was OK in LGEN to allocate return_callees_map in > generate_summary and free it in write_summary and during WPA, allocate > return_callees_map in read_summary and free it after execute (since > write_opt_summary does not require return_callees_map). > > However with fat LTO, it seems the sequence changes for LGEN with > execute phase takes place after write_summary. However since > return_callees_map is freed in pure_const_write_summary and > propagate_malloc() accesses it in execute stage, it results in > segmentation fault. > > To work around this, I am using the following hack in > pure_const_write_summary: > // FIXME: Do not free if -ffat-lto-objects is enabled. > if (!global_options.x_flag_fat_lto_objects) > free_return_callees_map (); > Is there a better approach for handling this ? I think most passes just do not free summaries with -flto. We probably want to fix it to make it possible to compile multiple units i.e. from plugin by adding release_summaries method... So I would say it is OK to do the same as others do and leak it with -flto. > diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c > index e457166ea39..724c26e03f6 100644 > --- a/gcc/ipa-pure-const.c > +++ b/gcc/ipa-pure-const.c > @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3. If not see > #include "tree-scalar-evolution.h" > #include "intl.h" > #include "opts.h" > +#include "ssa.h" > > /* Lattice values for const and pure functions. Everything starts out > being const, then may drop to pure and then neither depending on > @@ -69,6 +70,15 @@ enum pure_const_state_e > > const char *pure_const_names[3] = {"const", "pure", "neither"}; > > +enum malloc_state_e > +{ > + PURE_CONST_MALLOC_TOP, > + PURE_CONST_MALLOC, > + PURE_CONST_MALLOC_BOTTOM > +}; It took me a while to work out what PURE_CONST means here :) I would just call it something like STATE_MALLOC_TOP... or so. ipa_pure_const is outdated name from the time pass was doing only those two. > @@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state; > > static vec funct_state_vec; > > +/* A map from node to subset of callees. The subset contains those callees > + * whose return-value is returned by the node. */ > +static hash_map< cgraph_node *, vec* > *return_callees_map; > + Hehe, a special case of return jump function. We ought to support those more generally. How do you keep it up to date over callgraph changes? > @@ -921,6 +1055,23 @@ end: >if (TREE_NOTHROW (decl)) > l->can_throw = false; > > + if (ipa) > +{ > + vec v = vNULL; > + l->malloc_state = PURE_CONST_MALLOC_BOTTOM; > + if (DECL_IS_MALLOC (decl)) > + l->malloc_state = PURE_CONST_MALLOC; > + else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v)) > + { > + l->malloc_state = PURE_CONST_MALLOC_TOP; > + vec *callees_p = new vec (vNULL); > + for (unsigned i = 0; i < v.length (); ++i) > + callees_p->safe_push (v[i]); > + return_callees_map->put (fn, callees_p); > + } > + v.release (); > +} > + I would do non-ipa variant, too. I think most attributes can be detected that way as well. The patch generally makes sense to me. It would be nice to make it easier to write such a basic propagators across callgraph (perhaps adding a template doing the basic propagation logic). Also I think you need to solve the problem with keeping your summaries up to date across callgraph node removal and duplications. Honza
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
> > struct D: B { > > char buf[32]; > > virtual void* f (unsigned n) { > > if (n < 32) > > return n <= 32 ? buf : B::f (n); > > } > > > > Breaking foo's attribute malloc constraint. > > > > In other words, I think virtual functions need to be excluded > > from the list (unless they're defined in a class marked final, > > or unless we know they're not overridden to break the constraint > > like above). > > But we are annotating the actual decl, not the type in the class > struct. Yep and we will be able to use the information in case we devirtualize the call. So this seems OK to me. Honza > > Richard.
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
On Wed, 17 May 2017, Martin Sebor wrote: > > The patch passes bootstrap+test on x86_64 and found a few functions in > > the source tree (attached func_names.txt) that could be annotated with > > malloc (I gave a brief look at some of the functions and didn't appear > > to be false positives but I will recheck thoroughly) > > virtual char* libcp1::compiler::find(std::__cxx11::string&) const > > The virtual on the list of your candidates gave me pause. Consider > this completely contrived example: > > struct B { > virtual void* f (unsigned n) { > return new char [n]; > } > }; > > void* foo (B , unsigned n) > { > return b.f (n); > } > > Based on these definitions alone both functions are candidates > for attribute malloc. > > But suppose foo is called with an object of a type derived from > B that overrides f() to do something wacky (but strictly not > invalid) like: > > struct D: B { > char buf[32]; > virtual void* f (unsigned n) { > if (n < 32) > return n <= 32 ? buf : B::f (n); > } > > Breaking foo's attribute malloc constraint. > > In other words, I think virtual functions need to be excluded > from the list (unless they're defined in a class marked final, > or unless we know they're not overridden to break the constraint > like above). But we are annotating the actual decl, not the type in the class struct. Richard.
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
The patch passes bootstrap+test on x86_64 and found a few functions in the source tree (attached func_names.txt) that could be annotated with malloc (I gave a brief look at some of the functions and didn't appear to be false positives but I will recheck thoroughly) virtual char* libcp1::compiler::find(std::__cxx11::string&) const The virtual on the list of your candidates gave me pause. Consider this completely contrived example: struct B { virtual void* f (unsigned n) { return new char [n]; } }; void* foo (B , unsigned n) { return b.f (n); } Based on these definitions alone both functions are candidates for attribute malloc. But suppose foo is called with an object of a type derived from B that overrides f() to do something wacky (but strictly not invalid) like: struct D: B { char buf[32]; virtual void* f (unsigned n) { if (n < 32) return n <= 32 ? buf : B::f (n); } Breaking foo's attribute malloc constraint. In other words, I think virtual functions need to be excluded from the list (unless they're defined in a class marked final, or unless we know they're not overridden to break the constraint like above). Martin
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
On Mon, 15 May 2017, Prathamesh Kulkarni wrote: > Hi, > I have attached a bare-bones prototype patch that propagates malloc attribute > in > ipa-pure-const. As far as I understand, from the doc a function could > be annotated > with malloc attribute if it returns a pointer to a newly allocated > memory block (uninitialized or zeroed) and the pointer does not alias > any other valid pointer ? > > * Analysis > The analysis used by the patch in malloc_candidate_p is too conservative, > and I would be grateful for suggestions on improving it. > Currently it only checks if: > 1) The function returns a value of pointer type. > 2) SSA_NAME_DEF_STMT (return_value) is either a function call or a phi, and > SSA_NAME_DEF_STMT(each element of phi) is function call. > 3) The return-value has immediate uses only within comparisons (gcond > or gassign) and return_stmt (and likewise a phi arg has immediate use only > within comparison or the phi stmt). > > The intent is that malloc_candidate_p(node) returns true if node > returns the return value of it's callee, and the return value is only > used for comparisons within node. > Then I assume it's safe (although conservative) to decide that node > could possibly be a malloc function if callee is found to be malloc ? > > for eg: > void *f(size_t n) > { > void *p = __builtin_malloc (n); > if (p == 0) > __builtin_abort (); > return p; > } > > malloc_candidate_p would determine f to be a candidate for malloc > attribute since it returns the return-value of it's callee > (__builtin_malloc) and the return value is used only within comparison > and return_stmt. > > However due to the imprecise analysis it misses this case: > char **g(size_t n) > { > char **p = malloc (sizeof(char *) * 10); > for (int i = 0; i < 10; i++) > p[i] = malloc (sizeof(char) * n); > return p; > } > I suppose g() could be annotated with malloc here ? It cannot because what p points to is initialized. If you do then char **x = g(10); **x = 1; will be optimized away. Now what would be interesting is to do "poor mans IPA PTA", namely identify functions that are a) small, b) likely return a newly allocated pointer. At PTA time then "inline" all such wrappers, but only for the PTA constraints. That will give more precise points-to results (and can handle more cases, esp. initialization) than just annotating functions with 'malloc'. + /* With non-call exceptions we can't say for sure if other function +body was not possibly optimized to still throw. */ + if (!non_call || node->binds_to_current_def_p ()) + { + DECL_IS_MALLOC (node->decl) = true; + *changed = true; I don't see how malloc throwing would be an issue. + FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, retval) +if (gcond *cond = dyn_cast (use_stmt)) + { + enum tree_code code = gimple_cond_code (cond); + if (TREE_CODE_CLASS (code) != tcc_comparison) always false + RETURN_FROM_IMM_USE_STMT (use_iter, false); I think you want to disallow eq/ne_expr against sth not integer_zerop. +else if (gassign *ga = dyn_cast (use_stmt)) + { + enum tree_code code = gimple_assign_rhs_code (ga); + if (TREE_CODE_CLASS (code) != tcc_comparison) + RETURN_FROM_IMM_USE_STMT (use_iter, false); likewise. +static bool +malloc_candidate_p (function *fun, vec& callees) +{ + basic_block bb; + FOR_EACH_BB_FN (bb, fun) +{ + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); + !gsi_end_p (gsi); + gsi_next ()) + { + greturn *ret_stmt = dyn_cast (gsi_stmt (gsi)); + if (ret_stmt) I think you want do do this much cheaper by only walking the last stmt of predecessor(s) of EXIT_BLOCK_FOR_FN. (s) because we have multiple return stmts for int foo (int i) { if (i) return; else return i; } but all return stmts will be the last stmt in one of the exit block predecessors. + if (!check_retval_uses (retval, ret_stmt)) + return false; + + gimple *def = SSA_NAME_DEF_STMT (retval); + if (gcall *call_stmt = dyn_cast (def)) + { + tree callee_decl = gimple_call_fndecl (call_stmt); + /* FIXME: In case of indirect call, callee_decl might be NULL +Not sure what to do in that case, punting for now. */ + if (!callee_decl) + return false; + cgraph_node *callee = cgraph_node::get_create (callee_decl); + callees.safe_push (callee); + } + else if (gphi *phi = dyn_cast (def)) + for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i) + { + tree arg = gimple_phi_arg_def (phi, i); + if (TREE_CODE (arg) != SSA_NAME) + return false; + if (!check_retval_uses (arg, phi)) + return
Re: [RFC] propagate malloc attribute in ipa-pure-const pass
On 05/15/2017 04:39 AM, Prathamesh Kulkarni wrote: Hi, I have attached a bare-bones prototype patch that propagates malloc attribute in ipa-pure-const. As far as I understand, from the doc a function could be annotated with malloc attribute if it returns a pointer to a newly allocated memory block (uninitialized or zeroed) and the pointer does not alias any other valid pointer ? * Analysis The analysis used by the patch in malloc_candidate_p is too conservative, and I would be grateful for suggestions on improving it. Currently it only checks if: 1) The function returns a value of pointer type. 2) SSA_NAME_DEF_STMT (return_value) is either a function call or a phi, and SSA_NAME_DEF_STMT(each element of phi) is function call. 3) The return-value has immediate uses only within comparisons (gcond or gassign) and return_stmt (and likewise a phi arg has immediate use only within comparison or the phi stmt). ISTM the return value *must* be either the result of a call to a function with the malloc attribute or NULL. It can't be a mix of malloc'd objects or something else (such as a passed-in object). malloc_candidate_p would determine f to be a candidate for malloc attribute since it returns the return-value of it's callee (__builtin_malloc) and the return value is used only within comparison and return_stmt. However due to the imprecise analysis it misses this case: char **g(size_t n) { char **p = malloc (sizeof(char *) * 10); for (int i = 0; i < 10; i++) p[i] = malloc (sizeof(char) * n); return p; } I suppose g() could be annotated with malloc here ? I think that's up to us to decide. So the question becomes what makes the most sense from a user and optimization standpoint. I would start relatively conservatively and look to add further analysis to detect more malloc functions over time. You've already identified comparisons as valid uses, but ISTM the pointer could be passed as an argument, stored into memory and all kinds of other things. You'll probably want instrumentation to log cases where something looked like a potential candidate, but had to be rejected for some reason. Jeff
[RFC] propagate malloc attribute in ipa-pure-const pass
Hi, I have attached a bare-bones prototype patch that propagates malloc attribute in ipa-pure-const. As far as I understand, from the doc a function could be annotated with malloc attribute if it returns a pointer to a newly allocated memory block (uninitialized or zeroed) and the pointer does not alias any other valid pointer ? * Analysis The analysis used by the patch in malloc_candidate_p is too conservative, and I would be grateful for suggestions on improving it. Currently it only checks if: 1) The function returns a value of pointer type. 2) SSA_NAME_DEF_STMT (return_value) is either a function call or a phi, and SSA_NAME_DEF_STMT(each element of phi) is function call. 3) The return-value has immediate uses only within comparisons (gcond or gassign) and return_stmt (and likewise a phi arg has immediate use only within comparison or the phi stmt). The intent is that malloc_candidate_p(node) returns true if node returns the return value of it's callee, and the return value is only used for comparisons within node. Then I assume it's safe (although conservative) to decide that node could possibly be a malloc function if callee is found to be malloc ? for eg: void *f(size_t n) { void *p = __builtin_malloc (n); if (p == 0) __builtin_abort (); return p; } malloc_candidate_p would determine f to be a candidate for malloc attribute since it returns the return-value of it's callee (__builtin_malloc) and the return value is used only within comparison and return_stmt. However due to the imprecise analysis it misses this case: char **g(size_t n) { char **p = malloc (sizeof(char *) * 10); for (int i = 0; i < 10; i++) p[i] = malloc (sizeof(char) * n); return p; } I suppose g() could be annotated with malloc here ? The patch uses return_calles_map which is a hash_mapsuch that the return value of node is return value of one of these callees. For above eg it would be The analysis phase populates return_callees_map, and the propagation phase uses it to take the "meet" of callees. * LTO and memory management This is a general question about LTO and memory management. IIUC the following sequence takes place during normal LTO: LGEN: generate_summary, write_summary WPA: read_summary, execute ipa passes, write_opt_summary So I assumed it was OK in LGEN to allocate return_callees_map in generate_summary and free it in write_summary and during WPA, allocate return_callees_map in read_summary and free it after execute (since write_opt_summary does not require return_callees_map). However with fat LTO, it seems the sequence changes for LGEN with execute phase takes place after write_summary. However since return_callees_map is freed in pure_const_write_summary and propagate_malloc() accesses it in execute stage, it results in segmentation fault. To work around this, I am using the following hack in pure_const_write_summary: // FIXME: Do not free if -ffat-lto-objects is enabled. if (!global_options.x_flag_fat_lto_objects) free_return_callees_map (); Is there a better approach for handling this ? Also I am not sure if I have written cgraph_node::set_malloc_flag[_1] correctly. I tried to imitate cgraph_node::set_const_flag. The patch passes bootstrap+test on x86_64 and found a few functions in the source tree (attached func_names.txt) that could be annotated with malloc (I gave a brief look at some of the functions and didn't appear to be false positives but I will recheck thoroughly) Does the patch look in the right direction ? I would be grateful for suggestions on improving it. Thanks, Prathamesh virtual char* libcp1::compiler::find(std::__cxx11::string&) const gomp_malloc virtual char* libcc1::compiler::find(std::__cxx11::string&) const void* GTM::xrealloc(void*, size_t, bool) concat char* gen_internal_sym(const char*) void* ira_allocate(size_t) char* gen_fake_label() char* selftest::locate_file(const char*) const char* find_plugindir_spec_function(int, const char**) reconcat xvasprintf char* rtx_reader::read_until(const char*, bool) _Tp* __gnu_cxx::__detail::__mini_vector<_Tp>::allocate(__gnu_cxx::__detail::__mini_vector<_Tp>::size_type) [with _Tp = long unsigned int*] xstrndup const char* replace_extension_spec_func(int, const char**) void* GTM::xcalloc(size_t, bool) _Tp* __gnu_cxx::__detail::__mini_vector<_Tp>::allocate(__gnu_cxx::__detail::__mini_vector<_Tp>::size_type) [with _Tp = unsigned int*] xstrdup void* GTM::xmalloc(size_t, bool) void* base_pool_allocator::allocate() [with TBlockAllocator = memory_block_pool] char* ix86_offload_options() basic_block_def* alloc_block() xmemdup char* build_message_string(const char*, ...) make_relative_prefix gomp_malloc_cleared make_relative_prefix_ignore_links choose_temp_base make_temp_file xasprintf char* file_name_as_prefix(diagnostic_context*, const char*) void* yyalloc(yy_size_t) void* ggc_internal_cleared_alloc(size_t, void (*)(void*), size_t, size_t) void*