Re: More of ipa-inline housekeeping

2011-04-17 Thread Dominique Dhumieres
Honza,

 I will also look into the estimate_size ICE reported today.

This has been fixed by revision 172603 and now the thresholds are
the same with/without -flto. For the fine tuning I have posted
some results on pr48636.

Cheers,

Dominique


Re: More of ipa-inline housekeeping

2011-04-17 Thread Richard Guenther
On Sun, Apr 17, 2011 at 10:35 AM, Jan Hubicka hubi...@ucw.cz wrote:
 AFAICT revision 172430 fixed the original problem in pr45810:

 gfc -Ofast -fwhole-program fatigue.f90       : 6.301u 0.003s 0:06.30
 gfc -Ofast -fwhole-program -flto fatigue.f90 : 6.263u 0.003s 0:06.26

 However if I play with --param max-inline-insns-auto=*, I get

 gfc -Ofast -fwhole-program --param max-inline-insns-auto=124 -fstack-arrays 
 fatigue.f90 : 4.870u 0.002s 0:04.87
 gfc -Ofast -fwhole-program --param max-inline-insns-auto=125 -fstack-arrays 
 fatigue.f90 : 2.872u 0.002s 0:02.87

 and

 gfc -Ofast -fwhole-program -flto --param max-inline-insns-auto=515 
 -fstack-arrays fatigue.f90 : 4.965u 0.003s 0:04.97
 gfc -Ofast -fwhole-program -flto --param max-inline-insns-auto=516 
 -fstack-arrays fatigue.f90 : 2.732u 0.002s 0:02.73

 while I get the same threshold=125 with/without -flto at revision 172429.
 Note that I get the same thresholds without -fstack-arrays, the run times
 are only larger.

 Thanks for notice.   This was not really expected, but seems to give some
 insight.  I just tested a new cleanup patch of mine where I fixed few minor
 bugs in side corners.  One of those bugs I noticed was introduced by this 
 patch
 (an overlook while converting the code to new accesor).

 In case of nested inlining, the stack usage got misaccounted and consequently
 we allowed more inlining than --param large-stack-frame-growth would allow 
 normally.
 The vortex and wupwise improvement seems to be gone, so I think they are due 
 to this
 issue.

 I never really tuned the stack frame growth heuristics since it did not cause 
 any problems
 in the benchmarks. On fortran this is quite different because of the large 
 i/o blocks
 hitting it very commonly, so I will look into making it more permissive.  We 
 definitely
 can just bump up the limits and/or we can also teach it that if call 
 dominates the return
 there is not really much to save of stack usage by preventing inlining since 
 both stack
 frames will wind up on the stack anyway.

I think Micha has a fix for the I/O block issue.

Richard.

 This means adding new bit whether call edge dominate exit and using this 
 info. Also simple
 noreturn IPA discovery can be based on this and I recently noticed it might 
 be important
 for Mozilla. So I will give it a try soonish.

 I will also look into the estimate_size ICE reported today.

 Honza



Re: More of ipa-inline housekeeping

2011-04-17 Thread Jan Hubicka
 
  I never really tuned the stack frame growth heuristics since it did not 
  cause any problems
  in the benchmarks. On fortran this is quite different because of the large 
  i/o blocks
  hitting it very commonly, so I will look into making it more permissive.  
  We definitely
  can just bump up the limits and/or we can also teach it that if call 
  dominates the return
  there is not really much to save of stack usage by preventing inlining 
  since both stack
  frames will wind up on the stack anyway.
 
 I think Micha has a fix for the I/O block issue.

Good, how he fixed this?
In any case I sort of like the idea of tracking whether function call happens 
always and
bypassing limits in that case.  There seems to be cases of functions called 
once that
are not inlined for this reason and there is no danger for doing so.  I saw 
that in combine.c
dump as well as in the ac polyhedron benchmark I just looked that.  In case of 
combine.c it is
definitely becuase of standard non-scalar local vars.

Honza
 
 Richard.
 
  This means adding new bit whether call edge dominate exit and using this 
  info. Also simple
  noreturn IPA discovery can be based on this and I recently noticed it might 
  be important
  for Mozilla. So I will give it a try soonish.
 
  I will also look into the estimate_size ICE reported today.
 
  Honza
 


Re: More of ipa-inline housekeeping

2011-04-15 Thread Jan Hubicka
 
  I fixed this on the and added sanity check that the fields are initialized.
  This has shown problem with early inliner iteration fixed thusly and fact 
  that
  early inliner is attempting to compute overall growth at a time the inline
  parameters are not computed for functions not visited by early optimizations
  yet. We previously agreed that early inliner should not try to do that (as 
  this
  leads to early inliner inlining functions called once that should be 
  deferred
  for later consieration).  I just hope it won't cause benchmarks to
  regress too much ;)
 
 Yeah, we agreed to that.  And I forgot about it as it wasn't part of the
 early inliner reorg (which was supposed to be a 1:1 transform).

Today C++ results shows some regressions, but nothing earthshaking.  So I think 
it is good
idea to drop this feature of early inliner since it is not really systematic.
There is also great improvement on LTO SPEC2000, but I tend to hope it is 
unrelated change.
Perhaps your aliasing?

Honza


Re: More of ipa-inline housekeeping

2011-04-15 Thread Richard Guenther
2011/4/15 Jan Hubicka hubi...@ucw.cz:
 
  I fixed this on the and added sanity check that the fields are initialized.
  This has shown problem with early inliner iteration fixed thusly and fact 
  that
  early inliner is attempting to compute overall growth at a time the inline
  parameters are not computed for functions not visited by early 
  optimizations
  yet. We previously agreed that early inliner should not try to do that (as 
  this
  leads to early inliner inlining functions called once that should be 
  deferred
  for later consieration).  I just hope it won't cause benchmarks to
  regress too much ;)

 Yeah, we agreed to that.  And I forgot about it as it wasn't part of the
 early inliner reorg (which was supposed to be a 1:1 transform).

 Today C++ results shows some regressions, but nothing earthshaking.  So I 
 think it is good
 idea to drop this feature of early inliner since it is not really systematic.
 There is also great improvement on LTO SPEC2000, but I tend to hope it is 
 unrelated change.
 Perhaps your aliasing?

I doubt SPEC2k uses VLAs or alloca, does it?  Might be the DSE
improvements, but I'm not sure.

Richard.

 Honza



Re: More of ipa-inline housekeeping

2011-04-15 Thread Dominique Dhumieres
I have forgotten to say that the 125 and 516 thresholds are for
x86_64-apple-darwin10. On powerpc-apple-darwin9 they are
repsectively 172 and 638.

Dominique


More of ipa-inline housekeeping

2011-04-13 Thread Jan Hubicka
Hi,
this patch moves inline_summary from field in cgraph_node into its own on side
datastructure. This moves it from arcane decision of mine to split all IPA data
into global/local datas stored in common datastructure into the scheme we
developed for new IPA passes some time ago.

The advantage is that the code is more contained and less spread across the
compiler. We also make cgraph_node smaller and dumps more compact that never
hurts.

While working on it I noticed that Richi's patch to introduce cgraph_edge
times/sizes is bit iffy in computing data when they are missing in the
datastructure. Also it computes incomming edge costs instead of outgoing that
leads to fact that not all edges gets their info computed for IPA inliner
(think of newly discovered direct calls or IPA merging).

I fixed this on the and added sanity check that the fields are initialized.
This has shown problem with early inliner iteration fixed thusly and fact that
early inliner is attempting to compute overall growth at a time the inline
parameters are not computed for functions not visited by early optimizations
yet. We previously agreed that early inliner should not try to do that (as this
leads to early inliner inlining functions called once that should be deferred
for later consieration).  I just hope it won't cause benchmarks to
regress too much ;)

Having place to pile inline analysis info in, there is more to cleanup. The
cgraph_local/cgraph_global fields probably should go and the stuff from global
info should go into inline_summary datastructure, too (the lifetimes are
essentially the same so there is no need for the split).  I will handle this
incrementally.

Bootstrapped/regtested x86_64-linux with slightly modified version of the patch.
Re-testing with final version and intend to commit the patch tomorrow.

Honza

* cgraph.c (dump_cgraph_node): Do not dump inline summaries.
* cgraph.h (struct inline_summary): Move to ipa-inline.h
(cgraph_local_info): Remove inline_summary.
* ipa-cp.c: Include ipa-inline.h.
(ipcp_cloning_candidate_p, ipcp_estimate_growth,
ipcp_estimate_cloning_cost, ipcp_insert_stage): Use inline_summary
accesor.
* lto-cgraph.c (lto_output_node): Do not stream inline summary.
(input_overwrite_node): Do not set inline summary.
(input_node): Do not stream inline summary.
* ipa-inline.c (cgraph_decide_inlining): Dump inline summaries.
(cgraph_decide_inlining_incrementally): Do not try to estimate overall
growth; we do not have inline parameters computed for that anyway.
(cgraph_early_inlining): After inlining compute call_stmt_sizes.
* ipa-inline.h (struct inline_summary): Move here from ipa-inline.h
(inline_summary_t): New type and VECtor.
(debug_inline_summary, dump_inline_summaries): Declare.
(inline_summary): Use VOCtor.
(estimate_edge_growth): Kill hack computing call stmt size directly.
* lto-section-in.c (lto_section_name): Add inline section.
* ipa-inline-analysis.c: Include lto-streamer.h
(node_removal_hook_holder, node_duplication_hook_holder): New holders
(inline_node_removal_hook, inline_node_duplication_hook): New functions.
(inline_summary_vec): Define.
(inline_summary_alloc, dump_inline_summary, debug_inline_summary,
dump_inline_summaries): New functions.
(estimate_function_body_sizes): Properly compute size/time of outgoing 
calls.
(compute_inline_parameters): Alloc inline_summary; do not compute 
size/time
of incomming calls.
(estimate_edge_time): Avoid missing time summary hack.
(inline_read_summary): Read inline summary info.
(inline_write_summary): Write inline summary info.
(inline_free_summary): Free all hooks and inline summary vector.
* lto-streamer.h: Add LTO_section_inline_summary section.
* Makefile.in (ipa-cp.o, ipa-inline-analysis.o): Update dependencies.
* ipa.c (cgraph_remove_unreachable_nodes): Fix dump file formating.

* lto.c: Include ipa-inline.h
(add_cgraph_node_to_partition, undo_partition): Use inline_summary 
accessor.
(ipa_node_duplication_hook): Fix declaration.
* Make-lang.in (lto.o): Update dependencies.
Index: cgraph.c
===
--- cgraph.c(revision 172396)
+++ cgraph.c(working copy)
@@ -1876,22 +1876,6 @@ dump_cgraph_node (FILE *f, struct cgraph
   if (node-count)
 fprintf (f,  executed HOST_WIDEST_INT_PRINT_DECx,
 (HOST_WIDEST_INT)node-count);
-  if (node-local.inline_summary.self_time)
-fprintf (f,  %i time, %i benefit, node-local.inline_summary.self_time,
-   
node-local.inline_summary.time_inlining_benefit);
-  if (node-global.time  node-global.time
-  != node-local.inline_summary.self_time)
-fprintf (f,  (%i after