Re: More of ipa-inline housekeeping
Honza, I will also look into the estimate_size ICE reported today. This has been fixed by revision 172603 and now the thresholds are the same with/without -flto. For the fine tuning I have posted some results on pr48636. Cheers, Dominique
Re: More of ipa-inline housekeeping
On Sun, Apr 17, 2011 at 10:35 AM, Jan Hubicka hubi...@ucw.cz wrote: AFAICT revision 172430 fixed the original problem in pr45810: gfc -Ofast -fwhole-program fatigue.f90 : 6.301u 0.003s 0:06.30 gfc -Ofast -fwhole-program -flto fatigue.f90 : 6.263u 0.003s 0:06.26 However if I play with --param max-inline-insns-auto=*, I get gfc -Ofast -fwhole-program --param max-inline-insns-auto=124 -fstack-arrays fatigue.f90 : 4.870u 0.002s 0:04.87 gfc -Ofast -fwhole-program --param max-inline-insns-auto=125 -fstack-arrays fatigue.f90 : 2.872u 0.002s 0:02.87 and gfc -Ofast -fwhole-program -flto --param max-inline-insns-auto=515 -fstack-arrays fatigue.f90 : 4.965u 0.003s 0:04.97 gfc -Ofast -fwhole-program -flto --param max-inline-insns-auto=516 -fstack-arrays fatigue.f90 : 2.732u 0.002s 0:02.73 while I get the same threshold=125 with/without -flto at revision 172429. Note that I get the same thresholds without -fstack-arrays, the run times are only larger. Thanks for notice. This was not really expected, but seems to give some insight. I just tested a new cleanup patch of mine where I fixed few minor bugs in side corners. One of those bugs I noticed was introduced by this patch (an overlook while converting the code to new accesor). In case of nested inlining, the stack usage got misaccounted and consequently we allowed more inlining than --param large-stack-frame-growth would allow normally. The vortex and wupwise improvement seems to be gone, so I think they are due to this issue. I never really tuned the stack frame growth heuristics since it did not cause any problems in the benchmarks. On fortran this is quite different because of the large i/o blocks hitting it very commonly, so I will look into making it more permissive. We definitely can just bump up the limits and/or we can also teach it that if call dominates the return there is not really much to save of stack usage by preventing inlining since both stack frames will wind up on the stack anyway. I think Micha has a fix for the I/O block issue. Richard. This means adding new bit whether call edge dominate exit and using this info. Also simple noreturn IPA discovery can be based on this and I recently noticed it might be important for Mozilla. So I will give it a try soonish. I will also look into the estimate_size ICE reported today. Honza
Re: More of ipa-inline housekeeping
I never really tuned the stack frame growth heuristics since it did not cause any problems in the benchmarks. On fortran this is quite different because of the large i/o blocks hitting it very commonly, so I will look into making it more permissive. We definitely can just bump up the limits and/or we can also teach it that if call dominates the return there is not really much to save of stack usage by preventing inlining since both stack frames will wind up on the stack anyway. I think Micha has a fix for the I/O block issue. Good, how he fixed this? In any case I sort of like the idea of tracking whether function call happens always and bypassing limits in that case. There seems to be cases of functions called once that are not inlined for this reason and there is no danger for doing so. I saw that in combine.c dump as well as in the ac polyhedron benchmark I just looked that. In case of combine.c it is definitely becuase of standard non-scalar local vars. Honza Richard. This means adding new bit whether call edge dominate exit and using this info. Also simple noreturn IPA discovery can be based on this and I recently noticed it might be important for Mozilla. So I will give it a try soonish. I will also look into the estimate_size ICE reported today. Honza
Re: More of ipa-inline housekeeping
I fixed this on the and added sanity check that the fields are initialized. This has shown problem with early inliner iteration fixed thusly and fact that early inliner is attempting to compute overall growth at a time the inline parameters are not computed for functions not visited by early optimizations yet. We previously agreed that early inliner should not try to do that (as this leads to early inliner inlining functions called once that should be deferred for later consieration). I just hope it won't cause benchmarks to regress too much ;) Yeah, we agreed to that. And I forgot about it as it wasn't part of the early inliner reorg (which was supposed to be a 1:1 transform). Today C++ results shows some regressions, but nothing earthshaking. So I think it is good idea to drop this feature of early inliner since it is not really systematic. There is also great improvement on LTO SPEC2000, but I tend to hope it is unrelated change. Perhaps your aliasing? Honza
Re: More of ipa-inline housekeeping
2011/4/15 Jan Hubicka hubi...@ucw.cz: I fixed this on the and added sanity check that the fields are initialized. This has shown problem with early inliner iteration fixed thusly and fact that early inliner is attempting to compute overall growth at a time the inline parameters are not computed for functions not visited by early optimizations yet. We previously agreed that early inliner should not try to do that (as this leads to early inliner inlining functions called once that should be deferred for later consieration). I just hope it won't cause benchmarks to regress too much ;) Yeah, we agreed to that. And I forgot about it as it wasn't part of the early inliner reorg (which was supposed to be a 1:1 transform). Today C++ results shows some regressions, but nothing earthshaking. So I think it is good idea to drop this feature of early inliner since it is not really systematic. There is also great improvement on LTO SPEC2000, but I tend to hope it is unrelated change. Perhaps your aliasing? I doubt SPEC2k uses VLAs or alloca, does it? Might be the DSE improvements, but I'm not sure. Richard. Honza
Re: More of ipa-inline housekeeping
I have forgotten to say that the 125 and 516 thresholds are for x86_64-apple-darwin10. On powerpc-apple-darwin9 they are repsectively 172 and 638. Dominique
More of ipa-inline housekeeping
Hi, this patch moves inline_summary from field in cgraph_node into its own on side datastructure. This moves it from arcane decision of mine to split all IPA data into global/local datas stored in common datastructure into the scheme we developed for new IPA passes some time ago. The advantage is that the code is more contained and less spread across the compiler. We also make cgraph_node smaller and dumps more compact that never hurts. While working on it I noticed that Richi's patch to introduce cgraph_edge times/sizes is bit iffy in computing data when they are missing in the datastructure. Also it computes incomming edge costs instead of outgoing that leads to fact that not all edges gets their info computed for IPA inliner (think of newly discovered direct calls or IPA merging). I fixed this on the and added sanity check that the fields are initialized. This has shown problem with early inliner iteration fixed thusly and fact that early inliner is attempting to compute overall growth at a time the inline parameters are not computed for functions not visited by early optimizations yet. We previously agreed that early inliner should not try to do that (as this leads to early inliner inlining functions called once that should be deferred for later consieration). I just hope it won't cause benchmarks to regress too much ;) Having place to pile inline analysis info in, there is more to cleanup. The cgraph_local/cgraph_global fields probably should go and the stuff from global info should go into inline_summary datastructure, too (the lifetimes are essentially the same so there is no need for the split). I will handle this incrementally. Bootstrapped/regtested x86_64-linux with slightly modified version of the patch. Re-testing with final version and intend to commit the patch tomorrow. Honza * cgraph.c (dump_cgraph_node): Do not dump inline summaries. * cgraph.h (struct inline_summary): Move to ipa-inline.h (cgraph_local_info): Remove inline_summary. * ipa-cp.c: Include ipa-inline.h. (ipcp_cloning_candidate_p, ipcp_estimate_growth, ipcp_estimate_cloning_cost, ipcp_insert_stage): Use inline_summary accesor. * lto-cgraph.c (lto_output_node): Do not stream inline summary. (input_overwrite_node): Do not set inline summary. (input_node): Do not stream inline summary. * ipa-inline.c (cgraph_decide_inlining): Dump inline summaries. (cgraph_decide_inlining_incrementally): Do not try to estimate overall growth; we do not have inline parameters computed for that anyway. (cgraph_early_inlining): After inlining compute call_stmt_sizes. * ipa-inline.h (struct inline_summary): Move here from ipa-inline.h (inline_summary_t): New type and VECtor. (debug_inline_summary, dump_inline_summaries): Declare. (inline_summary): Use VOCtor. (estimate_edge_growth): Kill hack computing call stmt size directly. * lto-section-in.c (lto_section_name): Add inline section. * ipa-inline-analysis.c: Include lto-streamer.h (node_removal_hook_holder, node_duplication_hook_holder): New holders (inline_node_removal_hook, inline_node_duplication_hook): New functions. (inline_summary_vec): Define. (inline_summary_alloc, dump_inline_summary, debug_inline_summary, dump_inline_summaries): New functions. (estimate_function_body_sizes): Properly compute size/time of outgoing calls. (compute_inline_parameters): Alloc inline_summary; do not compute size/time of incomming calls. (estimate_edge_time): Avoid missing time summary hack. (inline_read_summary): Read inline summary info. (inline_write_summary): Write inline summary info. (inline_free_summary): Free all hooks and inline summary vector. * lto-streamer.h: Add LTO_section_inline_summary section. * Makefile.in (ipa-cp.o, ipa-inline-analysis.o): Update dependencies. * ipa.c (cgraph_remove_unreachable_nodes): Fix dump file formating. * lto.c: Include ipa-inline.h (add_cgraph_node_to_partition, undo_partition): Use inline_summary accessor. (ipa_node_duplication_hook): Fix declaration. * Make-lang.in (lto.o): Update dependencies. Index: cgraph.c === --- cgraph.c(revision 172396) +++ cgraph.c(working copy) @@ -1876,22 +1876,6 @@ dump_cgraph_node (FILE *f, struct cgraph if (node-count) fprintf (f, executed HOST_WIDEST_INT_PRINT_DECx, (HOST_WIDEST_INT)node-count); - if (node-local.inline_summary.self_time) -fprintf (f, %i time, %i benefit, node-local.inline_summary.self_time, - node-local.inline_summary.time_inlining_benefit); - if (node-global.time node-global.time - != node-local.inline_summary.self_time) -fprintf (f, (%i after