Re: Added better handling of outdated profile data. (issue 5989046)

2012-07-09 Thread asharif

On 2012/04/24 22:14:17, asharif wrote:

Ping?


Ping.

I also filed a bug here:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53547



Here is the formatted ChangeLog in case you want a summary of what I

did:


* gcc/ipa-inline.c (edge_badness): Make sure profile is valid before
using it to compute badness.
* gcc/predict.c (maybe_hot_frequency_p): Ditto.
(cgraph_maybe_hot_edge_p): Ditto.
(maybe_hot_edge_p): Ditto.
(probably_never_executed_bb_p): Ditto.
(compute_function_frequency): Ditto.
* gcc/profile.c (compute_branch_probabilities): Return without setting
the profile read flag if get_exec_counts returns NULL.
* gcc/tree.h: Added macro for accessing profile status.



The patch is in the original email and on the codereview issue
(http://codereview.appspot.com/5989046/).



http://codereview.appspot.com/5989046/


Re: Added better handling of outdated profile data. (issue 5989046)

2012-07-09 Thread Steven Bosscher
On Mon, Jul 9, 2012 at 3:23 PM,  asha...@chromium.org wrote:
 I also filed a bug here:
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53547

Why are you compiling from a different source code base with
profile-use than with profile-generate? I'm surprised this works at
all, I would have expected GCC to just bail out.

The patch at the codereview link doesn't work for me, it gives errors
about broken chunks.

Some general comments on the patch:

You should make your patch GCC code style conforming. That means
DECL_READ_PROFILE_P (blah) i.e. with the space.

You should also add a comment in ipa-inline.c why the extra
DECL_READ_PROFILE_P tests are needed there. It's not immediately
obvious to me that it's even reasonable to expect things to work if
!DECL_READ_PROFILE_P. I think a warning, conditional on OPT_Winline or
OPT_Wdisabled_optimization, is also necessary for this kind of
work-arounds.

I would make this DECL_READ_PROFILE_P a static inline function in
profile.h. I don't think you should ever have a NULL
DECL_STRUCT_FUNCTION. Can you try with a gcc_assert in place for that,
or eliminate DECL_READ_PROFILE_P altogether and just use
profile_status_for_function directly?

But first of all, I think you should explain why you're changing the
source code between profile-generate and profile-use.

Ciao!
Steven


Re: Added better handling of outdated profile data. (issue 5989046)

2012-04-24 Thread asharif
Ping?

Here is the formatted ChangeLog in case you want a summary of what I did:

* gcc/ipa-inline.c (edge_badness): Make sure profile is valid before
using it to compute badness.
* gcc/predict.c (maybe_hot_frequency_p): Ditto.
(cgraph_maybe_hot_edge_p): Ditto.
(maybe_hot_edge_p): Ditto.
(probably_never_executed_bb_p): Ditto.
(compute_function_frequency): Ditto.
* gcc/profile.c (compute_branch_probabilities): Return without setting
the profile read flag if get_exec_counts returns NULL.
* gcc/tree.h: Added macro for accessing profile status.

The patch is in the original email and on the codereview issue
(http://codereview.appspot.com/5989046/).


Re: Added better handling of outdated profile data. (issue 5989046)

2012-04-10 Thread asharif

On 2012/04/05 18:53:28, asharif wrote:

Jan, please take a look and provide some feedback.



Thanks,


Ping?

http://codereview.appspot.com/5989046/


Added better handling of outdated profile data. (issue 5989046)

2012-04-05 Thread asharif

Reviewers: jh_suse.cz, davidxl, bjanakiraman_google.com,

Message:
Jan, please take a look and provide some feedback.

Thanks,

Description:
Added better handling of outdated profiles.

gcc dumps profile information in .gcda files that contain the function
id and
the corresponding counter information. The function cfg and line number
checksums are also stored along with the ids. When the program source
changes
between when the instrumentation run is done vs. when the -fprofile-use
is
passed, gcc will have functions that do not have a valid profile. These
functions could be hot and may not be inlined because the summary of the
profile
is still valid (sum_max, etc.). This can lead to a situation where the
performance with -fprofile-use is lower than with -fno-profile-use.

It is easy to have a situation where the profile is outdated. A single
line
change in the header file can potentially renumber all functions and
none of the
ids will match the profile generated from the old source.

This patch makes the performance degradation in the outdated case much
less
severe. It adds checks that the profile has been read for a certain
function
before using the counter values (which are 0 in the case where the
profile is
invalid).

Without this patch we see up to 15% performance degradation by using
outdated
profiles vs. not using any profile information for Chrome. This patch
decreases
that degradation down to (0% to 4%).


Please review this at http://codereview.appspot.com/5989046/

Affected files:
  M gcc/ipa-inline.c
  M gcc/predict.c
  M gcc/profile.c
  M gcc/tree.h


From 88d8094b6408b38d05241ecf574a51edabeabb3a Mon Sep 17 00:00:00 2001
From: Ahmad Sharif asha...@chromium.org
Date: Wed, 4 Apr 2012 19:09:23 -0700
Subject: [PATCH] Added better handling of outdated profiles.

gcc dumps profile information in .gcda files that contain the function id  
and

the corresponding counter information. The function cfg and line number
checksums are also stored along with the ids. When the program source  
changes

between when the instrumentation run is done vs. when the -fprofile-use is
passed, gcc will have functions that do not have a valid profile. These
functions could be hot and may not be inlined because the summary of the  
profile

is still valid (sum_max, etc.). This can lead to a situation where the
performance with -fprofile-use is lower than with -fno-profile-use.

It is easy to have a situation where the profile is outdated. A single line
change in the header file can potentially renumber all functions and none  
of the

ids will match the profile generated from the old source.

This patch makes the performance degradation in the outdated case much less
severe. It adds checks that the profile has been read for a certain function
before using the counter values (which are 0 in the case where the profile  
is

invalid).

Without this patch we see up to 15% performance degradation by using  
outdated
profiles vs. not using any profile information for Chrome. This patch  
decreases

that degradation down to (0% to 4%).
---
 gcc/ipa-inline.c |4 +++-
 gcc/predict.c|   13 +
 gcc/profile.c|2 +-
 gcc/tree.h   |5 +
 4 files changed, 18 insertions(+), 6 deletions(-)

Index: gcc/ipa-inline.c
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index d7ccf68..fbc6c4a 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -780,7 +780,9 @@ edge_badness (struct cgraph_edge *edge, bool dump)
 The fraction is upside down, becuase on edge counts and time beneits
 the bounds are known. Edge growth is essentially unlimited.  */

-  else if (max_count)
+  else if (max_count
+DECL_READ_PROFILE_P(edge-caller-decl)
+DECL_READ_PROFILE_P(edge-callee-decl))
 {
   int relbenefit = relative_time_benefit (callee_info, edge,  
time_growth);

   badness =
Index: gcc/predict.c
diff --git a/gcc/predict.c b/gcc/predict.c
index c12b45f..94af954 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -114,7 +114,8 @@ static inline bool
 maybe_hot_frequency_p (int freq)
 {
   struct cgraph_node *node = cgraph_get_node (current_function_decl);
-  if (!profile_info || !flag_branch_probabilities)
+  if (!profile_info || !flag_branch_probabilities
+  || !DECL_READ_PROFILE_P(current_function_decl))
 {
   if (node-frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED)
 return false;
@@ -162,6 +163,7 @@ bool
 cgraph_maybe_hot_edge_p (struct cgraph_edge *edge)
 {
   if (profile_info  flag_branch_probabilities
+   DECL_READ_PROFILE_P(edge-caller-decl)
(edge-count
  = profile_info-sum_max / PARAM_VALUE (HOT_BB_COUNT_FRACTION)))
 return false;
@@ -202,9 +204,11 @@ maybe_hot_edge_p (edge e)
 bool
 probably_never_executed_bb_p (const_basic_block bb)
 {
-  if (profile_info  flag_branch_probabilities)
+  if (profile_info  flag_branch_probabilities
+   DECL_READ_PROFILE_P(current_function_decl))
 return ((bb-count + profile_info-runs / 2) /