Re: [PATCH 1/8] perf, tools: Support handling complete branch stacks as histograms

2014-11-17 Thread Andi Kleen
> > I considered this. For this example it doesn't make much difference
> > because the functions are so small.
> >
> > But for anything larger I really need the line numbers to make
> > sense of it. 
> >
> > So I prefer to keep them. I'll look into some easy switch
> > to turn them off though.
> 
> Oh, I'm not just removing line numbers - it also removed duplicates (f1
> and f2).  But having both from/to entries, I'm not sure it's worth tho..

The duplicate removal is only for the LBRs. I think it's a sensible
default there.

What would be nice in the future would be to add some kind
of annotation support to the hist entries, so we could say
"removed N iterations" and display it (and possibly some more
LBR information, like mispredict rate). But that's more
work and definitely would be a new patchkit.

> 
> 
> >> > +		if (sort__has_parent && !*parent &&
> >> > +		    symbol__match_regex(al.sym, &parent_regex))
> >> > +			*parent = al.sym;
> >> > +		else if (have_ignore_callees && root_al &&
> >> > +		  symbol__match_regex(al.sym, &ignore_callees_regex)) {
> >> > +			/* Treat this symbol as the root,
> >> > +			   forgetting its callees. */
> >> > +			*root_al = al;
> >> > +			callchain_cursor_reset(&callchain_cursor);
> >> > +		}
> >> > +		if (!symbol_conf.use_callchain)
> >> > +			return -EINVAL;
> >> 
> >> This check already went away.
> >> 
> >> And, to remove duplicates, I think we need to check last callchain
> >> cursor node wrt the callchain_param.key here.
> >
> > I don't understand the comment. I'm not modifying anything
> > that has been already added to the callchain. Just things
> > to be added in the future. So why would I need to check
> > or change the cursor?
> 
> But didn't you already do it (with ips[first_call]) to remove overlaps
> between LBR and normal callchain?

I added the LBRs, but I didn't add the normal call entries yet.


> 
> 
> >
> >> 
> >> Also, by comparing 'from' address, I'd expect you add the from address
> >> alone but you add both of 'from' and 'to'.  Do we really need to do
> >> that?
> >
> > Adding from and to makes it much clearer to the user what happens,
> > especially with conditional branches, so they can follow the 
> > control flow.
> 
> But it could be confusing too - esp. when it moves from LBR to normal
> callchains?  Hmm.. maybe we can print them bit differently.

Yes that would be nice.

> 
> 
> >
> >
> >> And the first address saved in normal callchain is address of the
> >> function itself so it might be 'to' you need to check if sampled before
> >> any branch in a function.
> >
> > I'm checking against the CALL, not the target.
> 
> Yeah, but I'm afraid that it'd always fail to find a match.

It seems to work as far as I can tell.
> >> > +err = add_callchain_ip(machine, thread,
> >> > +   parent, root_al,
> >> > +   -1, be[i].from);
> >> > +if (err == -EINVAL)
> >> > +break;
> >> > +if (err)
> >> > +return err;
> >> > +}
> >> > +chain_nr -= nr;
> >> 
> >> I'm not sure this line is needed.
> >
> > Without that i could exceed the limit.
> 
> What limit?

The limit of max history entries.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/8] perf, tools: Support handling complete branch stacks as histograms

2014-11-16 Thread Namhyung Kim
Hi Andi,

On Wed, 12 Nov 2014 00:31:53 +0100, Andi Kleen wrote:
> Sorry for the long delay. Just revisiting that.
>
> On Wed, Oct 22, 2014 at 10:03:51AM +0900, Namhyung Kim wrote:
>> > |  |  f2 tcall.c:5
>> > |  |  f1 tcall.c:12
>> > |  |  f1 tcall.c:12
>> > |  |  f2 tcall.c:7
>> > |  |  f2 tcall.c:5
>> > |  |  f1 tcall.c:11
>> 
>> I think it'd be better if it just prints function names as normal
>> callchain does (and optionally srcline with a switch) and duplicates
>> removed like below:
>> 
>>  54.91%  tcall.c:6  [.] f2  tcall
>>  |
>>  |--65.53%-- f2 tcall.c:5
>>  |  |
>>  |  |--70.83%-- f1
>>  |  |  main
>>  |  |  f1
>>  |  |  f2
>>  |  |  f1
>>  |  |  f2
>
> I considered this. For this example it doesn't make much difference
> because the functions are so small.
>
> But for anything larger I really need the line numbers to make
> sense of it. 
>
> So I prefer to keep them. I'll look into some easy switch
> to turn them off though.

Oh, I'm not just removing line numbers - it also removed duplicates (f1
and f2).  But having both from/to entries, I'm not sure it's worth tho..


>
>
>> > +		if (sort__has_parent && !*parent &&
>> > +		    symbol__match_regex(al.sym, &parent_regex))
>> > +			*parent = al.sym;
>> > +		else if (have_ignore_callees && root_al &&
>> > +		  symbol__match_regex(al.sym, &ignore_callees_regex)) {
>> > +			/* Treat this symbol as the root,
>> > +			   forgetting its callees. */
>> > +			*root_al = al;
>> > +			callchain_cursor_reset(&callchain_cursor);
>> > +		}
>> > +		if (!symbol_conf.use_callchain)
>> > +			return -EINVAL;
>> 
>> This check already went away.
>> 
>> And, to remove duplicates, I think we need to check last callchain
>> cursor node wrt the callchain_param.key here.
>
> I don't understand the comment. I'm not modifying anything
> that has been already added to the callchain. Just things
> to be added in the future. So why would I need to check
> or change the cursor?

But didn't you already do it (with ips[first_call]) to remove overlaps
between LBR and normal callchain?


>
>> 
>> Also, by comparing 'from' address, I'd expect you add the from address
>> alone but you add both of 'from' and 'to'.  Do we really need to do
>> that?
>
> Adding from and to makes it much clearer to the user what happens,
> especially with conditional branches, so they can follow the 
> control flow.

But it could be confusing too, especially when the output moves from LBR
to normal callchains.  Hmm.. maybe we can print them a bit differently.


>
>
>> And the first address saved in normal callchain is address of the
>> function itself so it might be 'to' you need to check if sampled before
>> any branch in a function.
>
> I'm checking against the CALL, not the target.

Yeah, but I'm afraid that it'd always fail to find a match.


>
>> 
>> > +  } else
>> > +  be[i] = branch->entries[branch->nr - i - 1];
>> > +  }
>> > +
>> > +  nr = remove_loops(be, nr);
>> > +
>> > +  for (i = 0; i < nr; i++) {
>> > +  err = add_callchain_ip(machine, thread, parent,
>> > + root_al,
>> > + -1, be[i].to);
>> > +  if (!err)
>> > +  err = add_callchain_ip(machine, thread,
>> > + parent, root_al,
>> > + -1, be[i].from);
>> > +  if (err == -EINVAL)
>> > +  break;
>> > +  if (err)
>> > +  return err;
>> > +  }
>> > +  chain_nr -= nr;
>> 
>> I'm not sure this line is needed.
>
> Without that i could exceed the limit.

What limit?

Let's say we have the callchain and LBR records below:

callchain:
  f1 <- f2 <- f3 <- f4 <- f5

LBR
  (f1<-f1) <- (f1<-f2)

So two entries are matched, and we have nr = 2, first_call = 2 and
chain_nr = 5, right?  So IIUC the above code will print callchains like this:

  - f1
  - f1
  - f1
  - f2
  - f3

while I'd expect the following (with duplicates for now):

  - f1
  - f1
  - f1
  - f2
  - f3
  - f4
  - f5

Am I missing something?

Thanks,
Namhyung
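
The example above can be modelled with a minimal, hypothetical sketch (this is not the patch itself). The helper `merge_chain`, the `lbr_entry` struct and the use of function names instead of addresses are assumptions made for illustration; only the `to`-then-`from` ordering, `first_call` and the `chain_nr -= nr` step come from the quoted code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct branch_entry: names instead of addresses. */
struct lbr_entry {
	const char *from;
	const char *to;
};

/*
 * Build the merged chain: every LBR entry contributes 'to' then 'from',
 * followed by the normal callchain starting at first_call.
 * If shrink is non-zero, chain_nr is reduced by nr first, as in the
 * quoted patch. Returns the number of entries written to out.
 */
static size_t merge_chain(const struct lbr_entry *be, size_t nr,
			  const char *const *ips, size_t chain_nr,
			  size_t first_call, int shrink,
			  const char **out, size_t out_max)
{
	size_t n = 0, i;

	assert(nr <= chain_nr);	/* guards the unsigned subtraction below */
	for (i = 0; i < nr && n + 2 <= out_max; i++) {
		out[n++] = be[i].to;
		out[n++] = be[i].from;
	}
	if (shrink)
		chain_nr -= nr;
	for (i = first_call; i < chain_nr && n < out_max; i++)
		out[n++] = ips[i];
	return n;
}
```

With the callchain f1..f5, the two LBR entries (f1<-f1) and (f1<-f2), and first_call = 2, the model yields five entries ending in f3 when `shrink` is set and seven entries ending in f5 when it is not, which matches the two listings in the mail.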


Re: [PATCH 1/8] perf, tools: Support handling complete branch stacks as histograms

2014-11-11 Thread Andi Kleen
Sorry for the long delay. Just revisiting that.

On Wed, Oct 22, 2014 at 10:03:51AM +0900, Namhyung Kim wrote:
> > |  |  f2 tcall.c:5
> > |  |  f1 tcall.c:12
> > |  |  f1 tcall.c:12
> > |  |  f2 tcall.c:7
> > |  |  f2 tcall.c:5
> > |  |  f1 tcall.c:11
> 
> I think it'd be better if it just prints function names as normal
> callchain does (and optionally srcline with a switch) and duplicates
> removed like below:
> 
>  54.91%  tcall.c:6  [.] f2  tcall
>  |
>  |--65.53%-- f2 tcall.c:5
>  |  |
>  |  |--70.83%-- f1
>  |  |  main
>  |  |  f1
>  |  |  f2
>  |  |  f1
>  |  |  f2

I considered this. For this example it doesn't make much difference
because the functions are so small.

But for anything larger I really need the line numbers to make
sense of it. 

So I prefer to keep them. I'll look into some easy switch
to turn them off though.


> > +		if (sort__has_parent && !*parent &&
> > +		    symbol__match_regex(al.sym, &parent_regex))
> > +			*parent = al.sym;
> > +		else if (have_ignore_callees && root_al &&
> > +		  symbol__match_regex(al.sym, &ignore_callees_regex)) {
> > +			/* Treat this symbol as the root,
> > +			   forgetting its callees. */
> > +			*root_al = al;
> > +			callchain_cursor_reset(&callchain_cursor);
> > +		}
> > +		if (!symbol_conf.use_callchain)
> > +			return -EINVAL;
> 
> This check already went away.
> 
> And, to remove duplicates, I think we need to check last callchain
> cursor node wrt the callchain_param.key here.

I don't understand the comment. I'm not modifying anything
that has been already added to the callchain. Just things
to be added in the future. So why would I need to check
or change the cursor?

> 
> Also, by comparing 'from' address, I'd expect you add the from address
> alone but you add both of 'from' and 'to'.  Do we really need to do
> that?

Adding from and to makes it much clearer to the user what happens,
especially with conditional branches, so they can follow the 
control flow.


> And the first address saved in normal callchain is address of the
> function itself so it might be 'to' you need to check if sampled before
> any branch in a function.

I'm checking against the CALL, not the target.

> 
> > +   } else
> > +   be[i] = branch->entries[branch->nr - i - 1];
> > +   }
> > +
> > +   nr = remove_loops(be, nr);
> > +
> > +   for (i = 0; i < nr; i++) {
> > +   err = add_callchain_ip(machine, thread, parent,
> > +  root_al,
> > +  -1, be[i].to);
> > +   if (!err)
> > +   err = add_callchain_ip(machine, thread,
> > +  parent, root_al,
> > +  -1, be[i].from);
> > +   if (err == -EINVAL)
> > +   break;
> > +   if (err)
> > +   return err;
> > +   }
> > +   chain_nr -= nr;
> 
> I'm not sure this line is needed.

Without that i could exceed the limit.

-Andi


Re: [PATCH 1/8] perf, tools: Support handling complete branch stacks as histograms

2014-10-21 Thread Namhyung Kim
Hi Andi,

On Fri, 26 Sep 2014 16:37:09 -0700, Andi Kleen wrote:
> From: Andi Kleen <a...@linux.intel.com>
>
> Currently branch stacks can be only shown as edge histograms for
> individual branches. I never found this display particularly useful.
>
> This implements an alternative mode that creates histograms over complete
> branch traces, instead of individual branches, similar to how normal
> callgraphs are handled. This is done by putting it in
> front of the normal callgraph and then using the normal callgraph
> histogram infrastructure to unify them.
>
> This way in complex functions we can understand the control flow
> that led to a particular sample, and may even see some control
> flow in the caller for short functions.
>
> Example (simplified, of course for such simple code this
> is usually not needed):
>
> tcall.c:
>
> volatile a = 1, b = 10, c;
>
> __attribute__((noinline)) f2()
> {
>   c = a / b;
> }
>
> __attribute__((noinline)) f1()
> {
>   f2();
>   f2();
> }
> main()
> {
>   int i;
>   for (i = 0; i < 100; i++)
>   f1();
> }
>
> % perf record -b -g ./tsrc/tcall
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
> % perf report --branch-history
> ...
> 54.91%  tcall.c:6  [.] f2  tcall
> |
> |--65.53%-- f2 tcall.c:5
> |  |
> |  |--70.83%-- f1 tcall.c:11
> |  |  f1 tcall.c:10
> |  |  main tcall.c:18
> |  |  main tcall.c:18
> |  |  main tcall.c:17
> |  |  main tcall.c:17
> |  |  f1 tcall.c:13
> |  |  f1 tcall.c:13
> |  |  f2 tcall.c:7
> |  |  f2 tcall.c:5
> |  |  f1 tcall.c:12
> |  |  f1 tcall.c:12
> |  |  f2 tcall.c:7
> |  |  f2 tcall.c:5
> |  |  f1 tcall.c:11

I think it'd be better if it just prints function names as normal
callchain does (and optionally srcline with a switch) and duplicates
removed like below:

 54.91%  tcall.c:6  [.] f2  tcall
 |
 |--65.53%-- f2 tcall.c:5
 |  |
 |  |--70.83%-- f1
 |  |  main
 |  |  f1
 |  |  f2
 |  |  f1
 |  |  f2
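
One possible reading of "duplicates removed" is to collapse runs of adjacent entries that resolve to the same function, in the spirit of checking the last callchain cursor node before appending. The sketch below shows that reading only; `collapse_adjacent` is a hypothetical helper working on plain name strings, not a perf function, and the thread does not settle on this exact rule.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Collapse runs of identical adjacent names in place.
 * Returns the new number of entries.
 */
static size_t collapse_adjacent(const char **names, size_t n)
{
	size_t in, out = 0;

	assert(names != NULL || n == 0);
	for (in = 0; in < n; in++) {
		/* Skip an entry that names the same function as the last kept one. */
		if (out > 0 && strcmp(names[out - 1], names[in]) == 0)
			continue;
		names[out++] = names[in];
	}
	return out;
}
```

For example, the sequence f1, f1, main, main, f1, f2, f2 collapses to f1, main, f1, f2. Comparing by address instead of by name would keep the entries that differ only in source line.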


[SNIP]
> +static int add_callchain_ip(struct machine *machine,
> +			    struct thread *thread,
> +			    struct symbol **parent,
> +			    struct addr_location *root_al,
> +			    int cpumode,
> +			    u64 ip)
> +{
> +	struct addr_location al;
> +
> +	al.filtered = 0;
> +	al.sym = NULL;
> +	if (cpumode == -1)
> +		thread__find_cpumode_addr_location(thread, machine, MAP__FUNCTION, ip, &al);
> +	else
> +		thread__find_addr_location(thread, machine, cpumode, MAP__FUNCTION,
> +					   ip, &al);
> +	if (al.sym != NULL) {
> +		if (sort__has_parent && !*parent &&
> +		    symbol__match_regex(al.sym, &parent_regex))
> +			*parent = al.sym;
> +		else if (have_ignore_callees && root_al &&
> +		  symbol__match_regex(al.sym, &ignore_callees_regex)) {
> +			/* Treat this symbol as the root,
> +			   forgetting its callees. */
> +			*root_al = al;
> +			callchain_cursor_reset(&callchain_cursor);
> +		}
> +		if (!symbol_conf.use_callchain)
> +			return -EINVAL;

This check already went away.

And, to remove duplicates, I think we need to check last callchain
cursor node wrt the callchain_param.key here.


> +	}
> +
> +	return callchain_cursor_append(&callchain_cursor, ip, al.map, al.sym);
> +}
> +
> +#define CHASHSZ 127
> +#define CHASHBITS 7
> +#define NO_ENTRY 0xff
> +
> +#define PERF_MAX_BRANCH_DEPTH 127
> +
> +/* Remove loops. */
> +static int remove_loops(struct branch_entry *l, int nr)
> +{
> +	int i, j, off;
> +	unsigned char chash[CHASHSZ];
> +	memset(chash, NO_ENTRY, sizeof(chash));
> +
> +	BUG_ON(nr >= 256);
> +	for (i = 0; i < nr; i++) {
> +		int h = hash_64(l[i].from, CHASHBITS) % CHASHSZ;
> +
> +		/* no collision handling for now */
> +		if (chash[h] == NO_ENTRY) {
> +			chash[h] = i;
> +		} else if (l[chash[h]].from == l[i].from) {
> +			bool is_loop = true;
> +			/* check if it is a real loop */
> +			off = 0;
> +			for (j = chash[h]; j < i && i + off < nr; j++, off++)
> +
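
The quote above is truncated partway through the function. As a rough, self-contained sketch of the loop-removal idea it starts to describe: remember where each branch source was first seen, and when the same source shows up again, check whether the entries in between repeat; if so, drop the repeated iteration. Everything past the truncation point (the comparison body and the `memmove`) is an assumption about how such a pass could be finished, and `hash_from` is a simple stand-in for the kernel's `hash_64()`. The names carry a `_sketch` suffix to mark them as hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CHASHSZ   127
#define NO_ENTRY  0xff

struct branch_entry_sketch {
	uint64_t from;
	uint64_t to;
};

/* Stand-in for hash_64(): multiplicative hash reduced to a table slot. */
static unsigned int hash_from(uint64_t addr)
{
	return (unsigned int)((addr * 0x9E3779B97F4A7C15ULL) >> 57) % CHASHSZ;
}

/* Remove repeated loop iterations in place; returns the new entry count. */
static int remove_loops_sketch(struct branch_entry_sketch *l, int nr)
{
	unsigned char chash[CHASHSZ];
	int i, j, off;

	assert(nr >= 0 && nr < NO_ENTRY);	/* indices must stay below the marker */
	memset(chash, NO_ENTRY, sizeof(chash));

	for (i = 0; i < nr; i++) {
		unsigned int h = hash_from(l[i].from);

		if (chash[h] == NO_ENTRY) {
			chash[h] = (unsigned char)i;	/* first sighting of this slot */
		} else if (l[chash[h]].from == l[i].from) {
			bool is_loop = true;

			/* Compare the run at the earlier hit with the run at i. */
			off = 0;
			for (j = chash[h]; j < i && i + off < nr; j++, off++) {
				if (l[j].from != l[i + off].from) {
					is_loop = false;
					break;
				}
			}
			if (is_loop) {
				/* Drop the repeated run by shifting the tail down. */
				memmove(l + i, l + i + off,
					(size_t)(nr - (i + off)) * sizeof(*l));
				nr -= off;
			}
		}
		/* Hash collisions between different sources are ignored here too. */
	}
	return nr;
}
```

On a branch list whose sources are A, B, A, B, C this leaves A, B, C. Like the quoted code, the sketch does no collision handling, so two different sources landing in the same slot simply go undetected as a loop.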

Re: [PATCH 1/8] perf, tools: Support handling complete branch stacks as histograms

2014-10-20 Thread Jiri Olsa
On Fri, Sep 26, 2014 at 04:37:09PM -0700, Andi Kleen wrote:
> From: Andi Kleen <a...@linux.intel.com>

SNIP

>  
>  struct callchain_list {
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index b2ec38b..8ba32ce 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -12,6 +12,7 @@
>  #include 
>  #include 
>  #include "unwind.h"
> +#include "linux/hash.h"
>  
>  int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
>  {
> @@ -1364,9 +1365,84 @@ struct branch_info *sample__resolve_bstack(struct 
> perf_sample *sample,
>   return bi;
>  }
>  
> +static int add_callchain_ip(struct machine *machine,
> +			    struct thread *thread,
> +			    struct symbol **parent,
> +			    struct addr_location *root_al,
> +			    int cpumode,
> +			    u64 ip)
> +{
> +	struct addr_location al;
> +
> +	al.filtered = 0;
> +	al.sym = NULL;
> +	if (cpumode == -1)
> +		thread__find_cpumode_addr_location(thread, machine, MAP__FUNCTION, ip, &al);
> +	else
> +		thread__find_addr_location(thread, machine, cpumode, MAP__FUNCTION,
> +					   ip, &al);

this cpumode condition is new (wrt below comment)

> +	if (al.sym != NULL) {
> +		if (sort__has_parent && !*parent &&
> +		    symbol__match_regex(al.sym, &parent_regex))
> +			*parent = al.sym;
> +		else if (have_ignore_callees && root_al &&
> +		  symbol__match_regex(al.sym, &ignore_callees_regex)) {
> +			/* Treat this symbol as the root,
> +			   forgetting its callees. */
> +			*root_al = al;
> +			callchain_cursor_reset(&callchain_cursor);
> +		}
> +		if (!symbol_conf.use_callchain)
> +			return -EINVAL;

why is this condition here?

could you please split this change into
  - adding add_callchain_ip function
  - adding more functionality to add_callchain_ip function?

IMO it'd make it cleaner and easier to understand

thanks,
jirka


Re: [PATCH 1/8] perf, tools: Support handling complete branch stacks as histograms

2014-10-20 Thread Jiri Olsa
On Fri, Sep 26, 2014 at 04:37:09PM -0700, Andi Kleen wrote:

SNIP

>   OPT_BOOLEAN('x', "exclude-other", &symbol_conf.exclude_other,
>   "Only display entries with parent-match"),
> - OPT_CALLBACK_DEFAULT('g', "call-graph", &report,
> "output_type,min_percent[,print_limit],call_order",
> -  "Display callchains using output_type (graph, flat,
> fractal, or none) , min percent threshold, optional print limit, callchain
> order, key (function or address). "
> + OPT_CALLBACK_DEFAULT('g', "call-graph", &report,
> "output_type,min_percent[,print_limit],call_order[,branch]",
> +  "Display callchains using output_type (graph, flat,
> fractal, or none) , min percent threshold, optional print limit, callchain
> order, key (function or address), add branches. "
>    "Default: fractal,0.5,callee,function",
> &report_parse_callchain_opt, callchain_default_opt),
>   OPT_BOOLEAN(0, "children", &symbol_conf.cumulate_callchain,
>   "Accumulate callchains of children and show total overhead
> as well"),
> diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> index 08f0fbf..265457c 100644
> --- a/tools/perf/util/callchain.c
> +++ b/tools/perf/util/callchain.c
> @@ -61,6 +61,8 @@ parse_callchain_report_opt(const char *arg)
>   callchain_param.key = CCKEY_FUNCTION;
>   else if (!strncmp(tok, "address", strlen(tok)))
>   callchain_param.key = CCKEY_ADDRESS;
> + else if (!strncmp(tok, "branch", strlen(tok)))
> + callchain_param.branch_callstack = 1;

this needs to be rebased onto Namhyung's latest changes,
which have already been merged..

could you please rebase on Arnaldo's perf/core?

jirka
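
For readers following the hunk quoted above: the new "branch" keyword is matched the same way as "function" and "address", i.e. with strncmp() limited to strlen(tok), so any non-empty prefix of a keyword is accepted. Below is a minimal standalone sketch of that matching. It is not the perf code: struct cc_param and parse_token() are hypothetical stand-ins for callchain_param and the real parser, and the empty-token check is an addition of this sketch.

```c
#include <string.h>

/* Hypothetical stand-ins for the callchain_param fields used in the patch. */
enum cckey { CCKEY_FUNCTION, CCKEY_ADDRESS };

struct cc_param {
	enum cckey key;
	int branch_callstack;
};

/*
 * Match one option token against the known keywords.
 * Comparing only strlen(tok) bytes means a prefix such as "br"
 * matches "branch"; a longer token such as "branchx" does not,
 * because the comparison reaches the keyword's terminating NUL.
 * Returns 0 if the token was recognized, -1 otherwise.
 */
static int parse_token(struct cc_param *p, const char *tok)
{
	size_t len = strlen(tok);

	if (len == 0)
		return -1;	/* assumption of this sketch: reject empty tokens */

	if (!strncmp(tok, "function", len))
		p->key = CCKEY_FUNCTION;
	else if (!strncmp(tok, "address", len))
		p->key = CCKEY_ADDRESS;
	else if (!strncmp(tok, "branch", len))
		p->branch_callstack = 1;
	else
		return -1;

	return 0;
}
```

With this scheme the order of the checks matters once two keywords share a prefix; "function", "address" and "branch" start with different letters, so even a single-letter token is unambiguous here.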

