Re: Fix IPA profile updates in inlining and ipa-split

2017-11-18 Thread Jan Hubicka
> On 2017.11.18 at 00:39 +0100, Jan Hubicka wrote:
> > Hi,
> > this patch fixes remaining IPA profile updating issues I am aware of.  With
> > this change there is 1 profile mismatch after inlining for tramp3d -Ofast
> > and 112 of them in .optimized dump which is about 10 times less than before.
> > I did not inspect them all but they seems mostly justified and not very
> > important.
> > 
> > First patch adds new profile quality stage when global profile is 0 but not
> > known precisely.  This is useful for bb partitioning where we do not want to
> > move adjusted 0 to cold sections.
> > 
> > Second the patch adds little infrastructure for turning IPA info to local
> > info (profile_count::combine_with_ipa_count) and commonizes all places that
> > do this kind of operation.
> > 
> > Finally it fixes profile updating bug in ipa-split where entry and return
> > blocks got wrong profile.
> > 
> > Bootstrapped/regtested x86_64-linux. I am profilebootstrapping overnight
> > and plan to commit tomorrow.
> 
> This fixes the tramp3d compile time regression with LTO/PGO bootstrapped
> gcc, that I have reported earlier. 
> In fact gcc-8 now compiles tramp3d-v4 ~8.5% faster than gcc-7.
> 
> The patch also fixes PR83037 and PR83039.

Thanks, I will commit it soonish (my overnight profiledbootstrap with lto
died on disk space and I was on hike).

I still plan to check for more profile updating errors and do re-tunning of
inliner next, it is good to know there are fewer issues to look at. :)

Honza

> 
> -- 
> Markus


Re: Fix IPA profile updates in inlining and ipa-split

2017-11-18 Thread Markus Trippelsdorf
On 2017.11.18 at 00:39 +0100, Jan Hubicka wrote:
> Hi,
> this patch fixes remaining IPA profile updating issues I am aware of.  With
> this change there is 1 profile mismatch after inlining for tramp3d -Ofast
> and 112 of them in .optimized dump which is about 10 times less than before.
> I did not inspect them all but they seems mostly justified and not very
> important.
> 
> First patch adds new profile quality stage when global profile is 0 but not
> known precisely.  This is useful for bb partitioning where we do not want to
> move adjusted 0 to cold sections.
> 
> Second the patch adds little infrastructure for turning IPA info to local
> info (profile_count::combine_with_ipa_count) and commonizes all places that
> do this kind of operation.
> 
> Finally it fixes profile updating bug in ipa-split where entry and return
> blocks got wrong profile.
> 
> Bootstrapped/regtested x86_64-linux. I am profilebootstrapping overnight
> and plan to commit tomorrow.

This fixes the tramp3d compile time regression with LTO/PGO bootstrapped
gcc, that I have reported earlier. 
In fact gcc-8 now compiles tramp3d-v4 ~8.5% faster than gcc-7.

The patch also fixes PR83037 and PR83039.

-- 
Markus


Fix IPA profile updates in inlining and ipa-split

2017-11-17 Thread Jan Hubicka
Hi,
this patch fixes remaining IPA profile updating issues I am aware of.  With
this change there is 1 profile mismatch after inlining for tramp3d -Ofast
and 112 of them in .optimized dump which is about 10 times less than before.
I did not inspect them all but they seems mostly justified and not very
important.

First patch adds new profile quality stage when global profile is 0 but not
known precisely.  This is useful for bb partitioning where we do not want to
move adjusted 0 to cold sections.

Second the patch adds little infrastructure for turning IPA info to local
info (profile_count::combine_with_ipa_count) and commonizes all places that
do this kind of operation.

Finally it fixes profile updating bug in ipa-split where entry and return
blocks got wrong profile.

Bootstrapped/regtested x86_64-linux. I am profilebootstrapping overnight
and plan to commit tomorrow.

* cgraphclones.c (cgraph_edge::clone): Rename gcov_count to prof_count.
(cgraph_edge::clone): Cleanup updating of profile.
* ipa-cp.c (update_profiling_info): Likewise.
* ipa-inline-transform.c (inline_transform): Likewise.
* ipa-inline.c (inline_small_functions): Add missing space to dump.
* ipa-split.c (execute_split_functions): Do not split when function
is cold.
* predict.c (estimate_bb_frequencies): Cleanup updating of profile.
* profile-count.c (profile_count::dump): Add global0.
(profile_count::to_cgraph_frequency): Do not ICE when entry is
undefined.
(profile_count::to_sreal_scale): Likewise.
(profile_count::adjust_for_ipa_scaling): Fix typo in comment.
(profile_count::combine_with_ipa_count): New function.
* profile-count.h (profile_guessed_global0adjusted): New.
(profile_count::adjusted_zero): New.
(profile_count::global0adjusted): New.
(profile_count::combine_with_ipa_count): New.
* tree-inline.c (copy_edges_for_bb): Add NUM/DEN arugment;
correct profile of return block of split functions.
(copy_cfg_body): Remove unused profile_count.
(copy_body): Likewise.
(expand_call_inline): Update.
(tree_function_versioning): Update.

Index: cgraphclones.c
===
--- cgraphclones.c  (revision 254888)
+++ cgraphclones.c  (working copy)
@@ -91,7 +91,7 @@ cgraph_edge::clone (cgraph_node *n, gcal
 {
   cgraph_edge *new_edge;
   profile_count::adjust_for_ipa_scaling (, );
-  profile_count gcov_count = count.apply_scale (num, den);
+  profile_count prof_count = count.apply_scale (num, den);
 
   if (indirect_unknown_callee)
 {
@@ -104,19 +104,19 @@ cgraph_edge::clone (cgraph_node *n, gcal
{
  cgraph_node *callee = cgraph_node::get (decl);
  gcc_checking_assert (callee);
- new_edge = n->create_edge (callee, call_stmt, gcov_count);
+ new_edge = n->create_edge (callee, call_stmt, prof_count);
}
   else
{
  new_edge = n->create_indirect_edge (call_stmt,
  indirect_info->ecf_flags,
- gcov_count, false);
+ prof_count, false);
  *new_edge->indirect_info = *indirect_info;
}
 }
   else
 {
-  new_edge = n->create_edge (callee, call_stmt, gcov_count);
+  new_edge = n->create_edge (callee, call_stmt, prof_count);
   if (indirect_info)
{
  new_edge->indirect_info
@@ -135,12 +135,9 @@ cgraph_edge::clone (cgraph_node *n, gcal
   new_edge->in_polymorphic_cdtor = in_polymorphic_cdtor;
 
   /* Update IPA profile.  Local profiles need no updating in original.  */
-  if (update_original
-  && count.ipa () == count && new_edge->count.ipa () == new_edge->count)
-count -= new_edge->count;
-  else if (caller->count.global0 () == caller->count
-  && !(count == profile_count::zero ()))
-count = count.global0 ();
+  if (update_original)
+count = count.combine_with_ipa_count (count.ipa () 
+ - new_edge->count.ipa ());
   symtab->call_edge_duplication_hooks (this, new_edge);
   return new_edge;
 }
@@ -431,22 +428,12 @@ cgraph_node::create_clone (tree new_decl
   if (new_inlined_to)
 dump_callgraph_transformation (this, new_inlined_to, "inlining to");
 
-  if (prof_count == profile_count::zero ()
-  && !(count == profile_count::zero ()))
-prof_count = count.global0 ();
-
+  prof_count = count.combine_with_ipa_count (prof_count);
   new_node->count = prof_count;
 
   /* Update IPA profile.  Local profiles need no updating in original.  */
-  if (update_original && !(count == profile_count::zero ())
-  && count.ipa () == count && prof_count.ipa () == prof_count)
-{
-  if (count.nonzero_p ()
- && !(count - prof_count).nonzero_p ())
-   count = count.global0 ();
-