Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries

2014-08-19 Thread Namhyung Kim
On Tue, 19 Aug 2014 09:10:24 +0200, Jiri Olsa wrote:
> On Tue, Aug 19, 2014 at 02:51:31PM +0900, Namhyung Kim wrote:
> SNIP
>
>> IMHO, it's harder to find out the right info.  Even if we might add
>> self/children tag to the callchains, I still think it'd be better just
>> to trim down irrelavant info from the output.
>> 
>> After applying this patch, it'll only show the callees of the symbol and
>> I think this is what most users expect to see:
>> 
>>   +   99.98% 0.00%  abc2 libc-2.17.so   [.] __libc_start_main
>>   -   79.75%11.29%  abc2 abc2   [.] main
>>  - main
>> + 38.01% a
>> + 35.75% c
>> + 26.22% b
>>   -   53.07%52.99%  abc2 abc2   [.] c
>>  - c
>> + 100.00% apic_timer_interrupt
>>   -   34.12%28.75%  abc2 abc2   [.] b
>>  - b
>>   98.85% c
>> + 1.15% apic_timer_interrupt
>>   -   30.56% 6.78%  abc2 abc2   [.] a
>>  - a
>> + 69.15% c
>> + 30.80% b
>>   +0.17% 0.01%  abc2 [kernel.kallsyms]  [k] apic_timer_interrupt
>>   +0.15% 0.00%  abc2 [kernel.kallsyms]  [k] 
>> smp_apic_timer_interrupt
>>   +0.11% 0.00%  abc2 [kernel.kallsyms]  [k] hrtimer_interrupt
>> 
>> 
>> If you want to see the callers of 'c', you may go to above entries (main
>> and __libc_start_main in this case) and then follow the callchains.  If
>> it's not enough, you can re-run perf report with -g callee option.
>
> but only if you know who called them, like in this simple example

Nope.  The --children option tends to collect callers above a callee.
This is not guaranteed though, but for most cases it'll work.  If it's
not case, one can use -g callee.


>
> IMHO this part of callchain is not confusing enough to be cut,
> for me it's usefull info for which I'd need to make more steps
> to get it

Well, that might be true for this simple case, but I'm not sure for
large and complex data.  I'd really like to keep the code and the output
simpler, more consistent and more intuitive.


>
> anyway, this is from review point of view.. as a user I dont use
> '-g caller' often enough to actually care ;-)

Yeah, me too.  But with --children option, it becomes very powerful and
can give another viewpoint of the performance analysis IMHO.

Thanks,
Namhyung
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries

2014-08-19 Thread Jiri Olsa
On Tue, Aug 19, 2014 at 02:51:31PM +0900, Namhyung Kim wrote:
SNIP

> IMHO, it's harder to find out the right info.  Even if we might add
> self/children tag to the callchains, I still think it'd be better just
> to trim down irrelavant info from the output.
> 
> After applying this patch, it'll only show the callees of the symbol and
> I think this is what most users expect to see:
> 
>   +   99.98% 0.00%  abc2 libc-2.17.so   [.] __libc_start_main
>   -   79.75%11.29%  abc2 abc2   [.] main
>  - main
> + 38.01% a
> + 35.75% c
> + 26.22% b
>   -   53.07%52.99%  abc2 abc2   [.] c
>  - c
> + 100.00% apic_timer_interrupt
>   -   34.12%28.75%  abc2 abc2   [.] b
>  - b
>   98.85% c
> + 1.15% apic_timer_interrupt
>   -   30.56% 6.78%  abc2 abc2   [.] a
>  - a
> + 69.15% c
> + 30.80% b
>   +0.17% 0.01%  abc2 [kernel.kallsyms]  [k] apic_timer_interrupt
>   +0.15% 0.00%  abc2 [kernel.kallsyms]  [k] 
> smp_apic_timer_interrupt
>   +0.11% 0.00%  abc2 [kernel.kallsyms]  [k] hrtimer_interrupt
> 
> 
> If you want to see the callers of 'c', you may go to above entries (main
> and __libc_start_main in this case) and then follow the callchains.  If
> it's not enough, you can re-run perf report with -g callee option.

but only if you know who called them, like in this simple example

IMHO this part of callchain is not confusing enough to be cut,
for me it's usefull info for which I'd need to make more steps
to get it

anyway, this is from review point of view.. as a user I dont use
'-g caller' often enough to actually care ;-)

thanks,
jirka
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries

2014-08-19 Thread Jiri Olsa
On Tue, Aug 19, 2014 at 02:51:31PM +0900, Namhyung Kim wrote:
SNIP

 IMHO, it's harder to find out the right info.  Even if we might add
 self/children tag to the callchains, I still think it'd be better just
 to trim down irrelavant info from the output.
 
 After applying this patch, it'll only show the callees of the symbol and
 I think this is what most users expect to see:
 
   +   99.98% 0.00%  abc2 libc-2.17.so   [.] __libc_start_main
   -   79.75%11.29%  abc2 abc2   [.] main
  - main
 + 38.01% a
 + 35.75% c
 + 26.22% b
   -   53.07%52.99%  abc2 abc2   [.] c
  - c
 + 100.00% apic_timer_interrupt
   -   34.12%28.75%  abc2 abc2   [.] b
  - b
   98.85% c
 + 1.15% apic_timer_interrupt
   -   30.56% 6.78%  abc2 abc2   [.] a
  - a
 + 69.15% c
 + 30.80% b
   +0.17% 0.01%  abc2 [kernel.kallsyms]  [k] apic_timer_interrupt
   +0.15% 0.00%  abc2 [kernel.kallsyms]  [k] 
 smp_apic_timer_interrupt
   +0.11% 0.00%  abc2 [kernel.kallsyms]  [k] hrtimer_interrupt
 
 
 If you want to see the callers of 'c', you may go to above entries (main
 and __libc_start_main in this case) and then follow the callchains.  If
 it's not enough, you can re-run perf report with -g callee option.

but only if you know who called them, like in this simple example

IMHO this part of callchain is not confusing enough to be cut,
for me it's usefull info for which I'd need to make more steps
to get it

anyway, this is from review point of view.. as a user I dont use
'-g caller' often enough to actually care ;-)

thanks,
jirka
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries

2014-08-19 Thread Namhyung Kim
On Tue, 19 Aug 2014 09:10:24 +0200, Jiri Olsa wrote:
 On Tue, Aug 19, 2014 at 02:51:31PM +0900, Namhyung Kim wrote:
 SNIP

 IMHO, it's harder to find out the right info.  Even if we might add
 self/children tag to the callchains, I still think it'd be better just
 to trim down irrelavant info from the output.
 
 After applying this patch, it'll only show the callees of the symbol and
 I think this is what most users expect to see:
 
   +   99.98% 0.00%  abc2 libc-2.17.so   [.] __libc_start_main
   -   79.75%11.29%  abc2 abc2   [.] main
  - main
 + 38.01% a
 + 35.75% c
 + 26.22% b
   -   53.07%52.99%  abc2 abc2   [.] c
  - c
 + 100.00% apic_timer_interrupt
   -   34.12%28.75%  abc2 abc2   [.] b
  - b
   98.85% c
 + 1.15% apic_timer_interrupt
   -   30.56% 6.78%  abc2 abc2   [.] a
  - a
 + 69.15% c
 + 30.80% b
   +0.17% 0.01%  abc2 [kernel.kallsyms]  [k] apic_timer_interrupt
   +0.15% 0.00%  abc2 [kernel.kallsyms]  [k] 
 smp_apic_timer_interrupt
   +0.11% 0.00%  abc2 [kernel.kallsyms]  [k] hrtimer_interrupt
 
 
 If you want to see the callers of 'c', you may go to above entries (main
 and __libc_start_main in this case) and then follow the callchains.  If
 it's not enough, you can re-run perf report with -g callee option.

 but only if you know who called them, like in this simple example

Nope.  The --children option tends to collect callers above a callee.
This is not guaranteed though, but for most cases it'll work.  If it's
not case, one can use -g callee.



 IMHO this part of callchain is not confusing enough to be cut,
 for me it's usefull info for which I'd need to make more steps
 to get it

Well, that might be true for this simple case, but I'm not sure for
large and complex data.  I'd really like to keep the code and the output
simpler, more consistent and more intuitive.



 anyway, this is from review point of view.. as a user I dont use
 '-g caller' often enough to actually care ;-)

Yeah, me too.  But with --children option, it becomes very powerful and
can give another viewpoint of the performance analysis IMHO.

Thanks,
Namhyung
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries

2014-08-18 Thread Namhyung Kim
Hi Jiri,

On Mon, 18 Aug 2014 13:31:47 +0200, Jiri Olsa wrote:
> On Sat, Aug 16, 2014 at 11:26:31AM +0900, Namhyung Kim wrote:
>
> SNIP
>
>> > 
>> > hum, where is it callee/caller mixed?  with following example:
>> > 
>> > ---
>> > void c(void)
>> > {
>> > }
>> > 
>> > void b(void)
>> > {
>> > c();
>> > }
>> > 
>> > void a(void)
>> > {
>> > b();
>> > }
>> > 
>> > int main(void)
>> > {
>> > while(1) {
>> > a();
>> > b();
>> > c();
>> > }
>> > }
>> > ---
>> > 
>> > for 'c' the current code will display:
>> > 
>> > -   43.74%43.74%  tt [.] c 
>> >▒
>> >- __libc_start_main 
>> >▒
>> >   - 86.33% main
>> >▒
>> >67.08% c
>> >▒
>> >  - 32.91% a
>> >▒
>> >   99.44% c 
>> >▒
>> > - 0.56% b  
>> >▒
>> >  c 
>> >▒
>> > 13.67% c   
>> >▒
>> > 
>> > and with this patch:
>> > 
>> > -   43.74%43.74%  tt [.] c 
>> >▒
>> >  c 
>> >▒
>> > 
>> > 
>> > The 'c' callchain is still in caller order. IMO we should
>> > keep whole callchain here.
>> 
>> The problem is not in pure self entry (that has self overhead = children
>> overhead) and pure cumulative entry (self overhead = 0).  It's in mixed
>> entries, please see last two examples in the description 0/3.
>
> right, but it still affects pure entries as well
> anyway, let's see the mixed entry
>
> for 'a' the current code will display:
> -   31.97%17.16%  tt  [.] a   
>  ▒
>- __libc_start_main
>  ◆
> 81.08% a  
>  ▒
>   - 18.92% main   
>  ▒
>a  
>  ▒
>- a
>  ▒
> 85.05% c  
>  ▒
>   - 14.91% b  
>  ▒
>100.00% c  
>  ▒
>
> and with this patch:
> -   31.97%17.16%  tt  [.] a   
>  ▒
>- a
>  ▒
> 85.05% c  
>  ▒
>   - 14.91% b  
>  ▒
>100.00% c  
>  ▒
>
> so we'll miss the 'self' callchain of 'a' symbol
>
> if we want to avoid the confusion about 2 different callchains, how
> about marking them with 'self' and 'children' tags, instead of removing
> one of them, like:
>
> for 'a' the current code will display:
> -   31.97%17.16%  tt  [.] a   
>  ▒
>- [self]
>  __libc_start_main
>  ◆
> 81.08% a

Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries

2014-08-18 Thread Jiri Olsa
On Sat, Aug 16, 2014 at 11:26:31AM +0900, Namhyung Kim wrote:

SNIP

> > 
> > hum, where is it callee/caller mixed?  with following example:
> > 
> > ---
> > void c(void)
> > {
> > }
> > 
> > void b(void)
> > {
> > c();
> > }
> > 
> > void a(void)
> > {
> > b();
> > }
> > 
> > int main(void)
> > {
> > while(1) {
> > a();
> > b();
> > c();
> > }
> > }
> > ---
> > 
> > for 'c' the current code will display:
> > 
> > -   43.74%43.74%  tt [.] c  
> >   ▒
> >- __libc_start_main  
> >   ▒
> >   - 86.33% main 
> >   ▒
> >67.08% c 
> >   ▒
> >  - 32.91% a 
> >   ▒
> >   99.44% c  
> >   ▒
> > - 0.56% b   
> >   ▒
> >  c  
> >   ▒
> > 13.67% c
> >   ▒
> > 
> > and with this patch:
> > 
> > -   43.74%43.74%  tt [.] c  
> >   ▒
> >  c  
> >   ▒
> > 
> > 
> > The 'c' callchain is still in caller order. IMO we should
> > keep whole callchain here.
> 
> The problem is not in pure self entry (that has self overhead = children
> overhead) and pure cumulative entry (self overhead = 0).  It's in mixed
> entries, please see last two examples in the description 0/3.

right, but it still affects pure entries as well
anyway, let's see the mixed entry

for 'a' the current code will display:
-   31.97%17.16%  tt  [.] a 
   ▒
   - __libc_start_main  
   ◆
81.08% a
   ▒
  - 18.92% main 
   ▒
   a
   ▒
   - a  
   ▒
85.05% c
   ▒
  - 14.91% b
   ▒
   100.00% c
   ▒

and with this patch:
-   31.97%17.16%  tt  [.] a 
   ▒
   - a  
   ▒
85.05% c
   ▒
  - 14.91% b
   ▒
   100.00% c
   ▒

so we'll miss the 'self' callchain of 'a' symbol

if we want to avoid the confusion about 2 different callchains, how
about marking them with 'self' and 'children' tags, instead of removing
one of them, like:

for 'a' the current code will display:
-   31.97%17.16%  tt  [.] a 
   ▒
   - [self]
 __libc_start_main  
   ◆
81.08% a
   ▒
  - 18.92% main 
   ▒
   a

Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries

2014-08-18 Thread Jiri Olsa
On Sat, Aug 16, 2014 at 11:26:31AM +0900, Namhyung Kim wrote:

SNIP

  
  hum, where is it callee/caller mixed?  with following example:
  
  ---
  void c(void)
  {
  }
  
  void b(void)
  {
  c();
  }
  
  void a(void)
  {
  b();
  }
  
  int main(void)
  {
  while(1) {
  a();
  b();
  c();
  }
  }
  ---
  
  for 'c' the current code will display:
  
  -   43.74%43.74%  tt [.] c  
▒
 - __libc_start_main  
▒
- 86.33% main 
▒
 67.08% c 
▒
   - 32.91% a 
▒
99.44% c  
▒
  - 0.56% b   
▒
   c  
▒
  13.67% c
▒
  
  and with this patch:
  
  -   43.74%43.74%  tt [.] c  
▒
   c  
▒
  
  
  The 'c' callchain is still in caller order. IMO we should
  keep whole callchain here.
 
 The problem is not in pure self entry (that has self overhead = children
 overhead) and pure cumulative entry (self overhead = 0).  It's in mixed
 entries, please see last two examples in the description 0/3.

right, but it still affects pure entries as well
anyway, let's see the mixed entry

for 'a' the current code will display:
-   31.97%17.16%  tt  [.] a 
   ▒
   - __libc_start_main  
   ◆
81.08% a
   ▒
  - 18.92% main 
   ▒
   a
   ▒
   - a  
   ▒
85.05% c
   ▒
  - 14.91% b
   ▒
   100.00% c
   ▒

and with this patch:
-   31.97%17.16%  tt  [.] a 
   ▒
   - a  
   ▒
85.05% c
   ▒
  - 14.91% b
   ▒
   100.00% c
   ▒

so we'll miss the 'self' callchain of 'a' symbol

if we want to avoid the confusion about 2 different callchains, how
about marking them with 'self' and 'children' tags, instead of removing
one of them, like:

for 'a' the current code will display:
-   31.97%17.16%  tt  [.] a 
   ▒
   - [self]
 __libc_start_main  
   ◆
81.08% a
   ▒
  - 18.92% main 
   ▒
   a
   ▒
   - [children]
 a  

Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries

2014-08-18 Thread Namhyung Kim
Hi Jiri,

On Mon, 18 Aug 2014 13:31:47 +0200, Jiri Olsa wrote:
 On Sat, Aug 16, 2014 at 11:26:31AM +0900, Namhyung Kim wrote:

 SNIP

  
  hum, where is it callee/caller mixed?  with following example:
  
  ---
  void c(void)
  {
  }
  
  void b(void)
  {
  c();
  }
  
  void a(void)
  {
  b();
  }
  
  int main(void)
  {
  while(1) {
  a();
  b();
  c();
  }
  }
  ---
  
  for 'c' the current code will display:
  
  -   43.74%43.74%  tt [.] c 
 ▒
 - __libc_start_main 
 ▒
- 86.33% main
 ▒
 67.08% c
 ▒
   - 32.91% a
 ▒
99.44% c 
 ▒
  - 0.56% b  
 ▒
   c 
 ▒
  13.67% c   
 ▒
  
  and with this patch:
  
  -   43.74%43.74%  tt [.] c 
 ▒
   c 
 ▒
  
  
  The 'c' callchain is still in caller order. IMO we should
  keep whole callchain here.
 
 The problem is not in pure self entry (that has self overhead = children
 overhead) and pure cumulative entry (self overhead = 0).  It's in mixed
 entries, please see last two examples in the description 0/3.

 right, but it still affects pure entries as well
 anyway, let's see the mixed entry

 for 'a' the current code will display:
 -   31.97%17.16%  tt  [.] a   
  ▒
- __libc_start_main
  ◆
 81.08% a  
  ▒
   - 18.92% main   
  ▒
a  
  ▒
- a
  ▒
 85.05% c  
  ▒
   - 14.91% b  
  ▒
100.00% c  
  ▒

 and with this patch:
 -   31.97%17.16%  tt  [.] a   
  ▒
- a
  ▒
 85.05% c  
  ▒
   - 14.91% b  
  ▒
100.00% c  
  ▒

 so we'll miss the 'self' callchain of 'a' symbol

 if we want to avoid the confusion about 2 different callchains, how
 about marking them with 'self' and 'children' tags, instead of removing
 one of them, like:

 for 'a' the current code will display:
 -   31.97%17.16%  tt  [.] a   
  ▒
- [self]
  __libc_start_main
  ◆
 81.08% a  
  ▒
   - 18.92% main   
  ▒
a  

Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries

2014-08-15 Thread Namhyung Kim
2014-08-15 (금), 21:51 +0200, Jiri Olsa:
> On Fri, Aug 15, 2014 at 10:57:14AM +0900, Namhyung Kim wrote:
> > Hi Jiri,
> > 
> > 2014-08-14 (목), 16:10 +0200, Jiri Olsa:
> > > On Thu, Aug 14, 2014 at 03:01:40PM +0900, Namhyung Kim wrote:
> > > 
> > > SNIP
> > > 
> > > > However, with --children feature added, it now can show all callees of
> > > > the entry.  For example, "start_kernel" entry now can display it calls
> > > > rest_init and in turn cpu_idle and then cpuidle_idle_call (95.72%).
> > > > 
> > > >  6.14% 0.00%  swapper  [kernel.kallsyms]   [k] 
> > > > start_kernel
> > > >|
> > > > --- start_kernel
> > > > rest_init
> > > > cpu_idle
> > > >|
> > > >|--97.52%-- cpuidle_idle_call
> > > >|  cpuidle_enter_tk
> > > >|  |
> > > >|  |--99.91%-- cpuidle_wrap_enter
> > > >|  |  cpuidle_enter
> > > >|  |  intel_idle
> > > >|   --0.09%-- [...]
> > > > --2.48%-- [...]
> > > > 
> > > > Note that start_kernel has no self overhead - meaning that it never
> > > > get sampled by itself but constructs such a nice callgraph.  But,
> > > > sadly, if an entry has self overhead, callchain will get confused with
> > > > generated callchain (like above) and self callchains (which reversed
> > > > order) like the eariler example.
> > > > 
> > > > To be consistent with other entries, I'd like to make it just to show
> > > > a single entry - itself - like below since it doesn't have callees
> > > > (children) at all.  But still use the whole callchain to construct
> > > > children entries (like the start_kernel) as usual.
> > > > 
> > > > 40.53%40.53%  swapper  [kernel.kallsyms]   [k] 
> > > > intel_idle
> > > > |
> > > > --- intel_idle
> > > 
> > > I understand the consistency point, but I think we'd loose
> > > usefull info by cutting this off
> > > 
> > > I guess I can run 'report -g callee' to find out who called intel_idle
> > > instead.. but I would not need to if the callchain stays here
> > 
> > Yeah, but current behavior intermixes caller-callchains and
> > callee-callchains together so adds confusion to users.  This is a
> > problem IMHO.
> 
> hum, where is it callee/caller mixed?  with following example:
> 
> ---
> void c(void)
> {
> }
> 
> void b(void)
> {
> c();
> }
> 
> void a(void)
> {
> b();
> }
> 
> int main(void)
> {
> while(1) {
> a();
> b();
> c();
> }
> }
> ---
> 
> for 'c' the current code will display:
> 
> -   43.74%43.74%  tt [.] c
> ▒
>- __libc_start_main
> ▒
>   - 86.33% main   
> ▒
>67.08% c   
> ▒
>  - 32.91% a   
> ▒
>   99.44% c
> ▒
> - 0.56% b 
> ▒
>  c
> ▒
> 13.67% c  
> ▒
> 
> and with this patch:
> 
> -   43.74%43.74%  tt [.] c
> ▒
>  c
> ▒
> 
> 
> The 'c' callchain is still in caller order. IMO we should
> keep whole callchain here.

The problem is not in pure self entry (that has self overhead = children
overhead) and pure cumulative entry (self overhead = 0).  It's in mixed
entries, please see last two examples in the description 0/3.

Thanks,
Namhyung


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  

Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries

2014-08-15 Thread Jiri Olsa
On Fri, Aug 15, 2014 at 10:57:14AM +0900, Namhyung Kim wrote:
> Hi Jiri,
> 
> 2014-08-14 (목), 16:10 +0200, Jiri Olsa:
> > On Thu, Aug 14, 2014 at 03:01:40PM +0900, Namhyung Kim wrote:
> > 
> > SNIP
> > 
> > > However, with --children feature added, it now can show all callees of
> > > the entry.  For example, "start_kernel" entry now can display it calls
> > > rest_init and in turn cpu_idle and then cpuidle_idle_call (95.72%).
> > > 
> > >  6.14% 0.00%  swapper  [kernel.kallsyms]   [k] 
> > > start_kernel
> > >|
> > > --- start_kernel
> > > rest_init
> > > cpu_idle
> > >|
> > >|--97.52%-- cpuidle_idle_call
> > >|  cpuidle_enter_tk
> > >|  |
> > >|  |--99.91%-- cpuidle_wrap_enter
> > >|  |  cpuidle_enter
> > >|  |  intel_idle
> > >|   --0.09%-- [...]
> > > --2.48%-- [...]
> > > 
> > > Note that start_kernel has no self overhead - meaning that it never
> > > get sampled by itself but constructs such a nice callgraph.  But,
> > > sadly, if an entry has self overhead, callchain will get confused with
> > > generated callchain (like above) and self callchains (which reversed
> > > order) like the eariler example.
> > > 
> > > To be consistent with other entries, I'd like to make it just to show
> > > a single entry - itself - like below since it doesn't have callees
> > > (children) at all.  But still use the whole callchain to construct
> > > children entries (like the start_kernel) as usual.
> > > 
> > > 40.53%40.53%  swapper  [kernel.kallsyms]   [k] intel_idle
> > > |
> > > --- intel_idle
> > 
> > I understand the consistency point, but I think we'd loose
> > usefull info by cutting this off
> > 
> > I guess I can run 'report -g callee' to find out who called intel_idle
> > instead.. but I would not need to if the callchain stays here
> 
> Yeah, but current behavior intermixes caller-callchains and
> callee-callchains together so adds confusion to users.  This is a
> problem IMHO.

hum, where is it callee/caller mixed?  with following example:

---
void c(void)
{
}

void b(void)
{
c();
}

void a(void)
{
b();
}

int main(void)
{
while(1) {
a();
b();
c();
}
}
---

for 'c' the current code will display:

-   43.74%43.74%  tt [.] c  
  ▒
   - __libc_start_main  
  ▒
  - 86.33% main 
  ▒
   67.08% c 
  ▒
 - 32.91% a 
  ▒
  99.44% c  
  ▒
- 0.56% b   
  ▒
 c  
  ▒
13.67% c
  ▒

and with this patch:

-   43.74%43.74%  tt [.] c  
  ▒
 c  
  ▒


The 'c' callchain is still in caller order. IMO we should
keep whole callchain here.

jirka
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries

2014-08-15 Thread Jiri Olsa
On Fri, Aug 15, 2014 at 10:57:14AM +0900, Namhyung Kim wrote:
 Hi Jiri,
 
 2014-08-14 (목), 16:10 +0200, Jiri Olsa:
  On Thu, Aug 14, 2014 at 03:01:40PM +0900, Namhyung Kim wrote:
  
  SNIP
  
   However, with --children feature added, it now can show all callees of
   the entry.  For example, start_kernel entry now can display it calls
   rest_init and in turn cpu_idle and then cpuidle_idle_call (95.72%).
   
6.14% 0.00%  swapper  [kernel.kallsyms]   [k] 
   start_kernel
  |
   --- start_kernel
   rest_init
   cpu_idle
  |
  |--97.52%-- cpuidle_idle_call
  |  cpuidle_enter_tk
  |  |
  |  |--99.91%-- cpuidle_wrap_enter
  |  |  cpuidle_enter
  |  |  intel_idle
  |   --0.09%-- [...]
   --2.48%-- [...]
   
   Note that start_kernel has no self overhead - meaning that it never
   get sampled by itself but constructs such a nice callgraph.  But,
   sadly, if an entry has self overhead, callchain will get confused with
   generated callchain (like above) and self callchains (which reversed
   order) like the eariler example.
   
   To be consistent with other entries, I'd like to make it just to show
   a single entry - itself - like below since it doesn't have callees
   (children) at all.  But still use the whole callchain to construct
   children entries (like the start_kernel) as usual.
   
   40.53%40.53%  swapper  [kernel.kallsyms]   [k] intel_idle
   |
   --- intel_idle
  
  I understand the consistency point, but I think we'd loose
  usefull info by cutting this off
  
  I guess I can run 'report -g callee' to find out who called intel_idle
  instead.. but I would not need to if the callchain stays here
 
 Yeah, but current behavior intermixes caller-callchains and
 callee-callchains together so adds confusion to users.  This is a
 problem IMHO.

hum, where is it callee/caller mixed?  with following example:

---
void c(void)
{
}

void b(void)
{
c();
}

void a(void)
{
b();
}

int main(void)
{
while(1) {
a();
b();
c();
}
}
---

for 'c' the current code will display:

-   43.74%43.74%  tt [.] c  
  ▒
   - __libc_start_main  
  ▒
  - 86.33% main 
  ▒
   67.08% c 
  ▒
 - 32.91% a 
  ▒
  99.44% c  
  ▒
- 0.56% b   
  ▒
 c  
  ▒
13.67% c
  ▒

and with this patch:

-   43.74%43.74%  tt [.] c  
  ▒
 c  
  ▒


The 'c' callchain is still in caller order. IMO we should
keep whole callchain here.

jirka
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries

2014-08-15 Thread Namhyung Kim
2014-08-15 (금), 21:51 +0200, Jiri Olsa:
 On Fri, Aug 15, 2014 at 10:57:14AM +0900, Namhyung Kim wrote:
  Hi Jiri,
  
  2014-08-14 (목), 16:10 +0200, Jiri Olsa:
   On Thu, Aug 14, 2014 at 03:01:40PM +0900, Namhyung Kim wrote:
   
   SNIP
   
However, with --children feature added, it now can show all callees of
the entry.  For example, start_kernel entry now can display it calls
rest_init and in turn cpu_idle and then cpuidle_idle_call (95.72%).

 6.14% 0.00%  swapper  [kernel.kallsyms]   [k] 
start_kernel
   |
--- start_kernel
rest_init
cpu_idle
   |
   |--97.52%-- cpuidle_idle_call
   |  cpuidle_enter_tk
   |  |
   |  |--99.91%-- cpuidle_wrap_enter
   |  |  cpuidle_enter
   |  |  intel_idle
   |   --0.09%-- [...]
--2.48%-- [...]

Note that start_kernel has no self overhead - meaning that it never
get sampled by itself but constructs such a nice callgraph.  But,
sadly, if an entry has self overhead, callchain will get confused with
generated callchain (like above) and self callchains (which reversed
order) like the eariler example.

To be consistent with other entries, I'd like to make it just to show
a single entry - itself - like below since it doesn't have callees
(children) at all.  But still use the whole callchain to construct
children entries (like the start_kernel) as usual.

40.53%40.53%  swapper  [kernel.kallsyms]   [k] 
intel_idle
|
--- intel_idle
   
   I understand the consistency point, but I think we'd loose
   usefull info by cutting this off
   
   I guess I can run 'report -g callee' to find out who called intel_idle
   instead.. but I would not need to if the callchain stays here
  
  Yeah, but current behavior intermixes caller-callchains and
  callee-callchains together so adds confusion to users.  This is a
  problem IMHO.
 
 hum, where is it callee/caller mixed?  with following example:
 
 ---
 void c(void)
 {
 }
 
 void b(void)
 {
 c();
 }
 
 void a(void)
 {
 b();
 }
 
 int main(void)
 {
 while(1) {
 a();
 b();
 c();
 }
 }
 ---
 
 for 'c' the current code will display:
 
 -   43.74%43.74%  tt [.] c
 ▒
- __libc_start_main
 ▒
   - 86.33% main   
 ▒
67.08% c   
 ▒
  - 32.91% a   
 ▒
   99.44% c
 ▒
 - 0.56% b 
 ▒
  c
 ▒
 13.67% c  
 ▒
 
 and with this patch:
 
 -   43.74%43.74%  tt [.] c
 ▒
  c
 ▒
 
 
 The 'c' callchain is still in caller order. IMO we should
 keep whole callchain here.

The problem is not in pure self entry (that has self overhead = children
overhead) and pure cumulative entry (self overhead = 0).  It's in mixed
entries, please see last two examples in the description 0/3.

Thanks,
Namhyung


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries

2014-08-14 Thread Namhyung Kim
Hi Jiri,

2014-08-14 (목), 16:10 +0200, Jiri Olsa:
> On Thu, Aug 14, 2014 at 03:01:40PM +0900, Namhyung Kim wrote:
> 
> SNIP
> 
> > However, with --children feature added, it now can show all callees of
> > the entry.  For example, "start_kernel" entry now can display it calls
> > rest_init and in turn cpu_idle and then cpuidle_idle_call (95.72%).
> > 
> >  6.14% 0.00%  swapper  [kernel.kallsyms]   [k] start_kernel
> >|
> > --- start_kernel
> > rest_init
> > cpu_idle
> >|
> >|--97.52%-- cpuidle_idle_call
> >|  cpuidle_enter_tk
> >|  |
> >|  |--99.91%-- cpuidle_wrap_enter
> >|  |  cpuidle_enter
> >|  |  intel_idle
> >|   --0.09%-- [...]
> > --2.48%-- [...]
> > 
> > Note that start_kernel has no self overhead - meaning that it never
> > get sampled by itself but constructs such a nice callgraph.  But,
> > sadly, if an entry has self overhead, callchain will get confused with
> > generated callchain (like above) and self callchains (which reversed
> > order) like the eariler example.
> > 
> > To be consistent with other entries, I'd like to make it just to show
> > a single entry - itself - like below since it doesn't have callees
> > (children) at all.  But still use the whole callchain to construct
> > children entries (like the start_kernel) as usual.
> > 
> > 40.53%40.53%  swapper  [kernel.kallsyms]   [k] intel_idle
> > |
> > --- intel_idle
> 
> I understand the consistency point, but I think we'd loose
> usefull info by cutting this off
> 
> I guess I can run 'report -g callee' to find out who called intel_idle
> instead.. but I would not need to if the callchain stays here

Yeah, but current behavior intermixes caller-callchains and
callee-callchains together so adds confusion to users.  This is a
problem IMHO.

And with --children you can easily see the callers right above the entry
as they likely to have same or higher children overhead.

Thanks,
Namhyung


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries

2014-08-14 Thread Jiri Olsa
On Thu, Aug 14, 2014 at 03:01:40PM +0900, Namhyung Kim wrote:

SNIP

> However, with --children feature added, it now can show all callees of
> the entry.  For example, "start_kernel" entry now can display it calls
> rest_init and in turn cpu_idle and then cpuidle_idle_call (95.72%).
> 
>  6.14% 0.00%  swapper  [kernel.kallsyms]   [k] start_kernel
>|
> --- start_kernel
> rest_init
> cpu_idle
>|
>|--97.52%-- cpuidle_idle_call
>|  cpuidle_enter_tk
>|  |
>|  |--99.91%-- cpuidle_wrap_enter
>|  |  cpuidle_enter
>|  |  intel_idle
>|   --0.09%-- [...]
> --2.48%-- [...]
> 
> Note that start_kernel has no self overhead - meaning that it never
> get sampled by itself but constructs such a nice callgraph.  But,
> sadly, if an entry has self overhead, callchain will get confused with
> generated callchain (like above) and self callchains (which reversed
> order) like the eariler example.
> 
> To be consistent with other entries, I'd like to make it just to show
> a single entry - itself - like below since it doesn't have callees
> (children) at all.  But still use the whole callchain to construct
> children entries (like the start_kernel) as usual.
> 
> 40.53%40.53%  swapper  [kernel.kallsyms]   [k] intel_idle
> |
> --- intel_idle

I understand the consistency point, but I think we'd loose
usefull info by cutting this off

I guess I can run 'report -g callee' to find out who called intel_idle
instead.. but I would not need to if the callchain stays here

jirka
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/3] perf callchain: Prune misleading callchains for self entries

2014-08-14 Thread Namhyung Kim
The "perf report -g" displays callchains callee-first order.  That
means it can see the callers of the sample in a reverse order.  For
example, "intel_idle" entry shows following callchain on my data.

  Children  Self  Command  Shared Object   Symbol
      ---  --  
40.53%40.53%  swapper  [kernel.kallsyms]   [k] intel_idle
  |
   --- intel_idle
   cpuidle_enter
   cpuidle_wrap_enter
   cpuidle_enter_tk
   cpuidle_idle_call
   cpu_idle
  |
  |--85.25%-- start_secondary
  |
   --14.75%-- rest_init
 start_kernel
 x86_64_start_reservations
 x86_64_start_kernel

So "intel_idle" was called by "cpuidle_enter", and in turn, it's
called by "cpuidle_wrap_enter" and so on.

When with -g "caller", it shows callchains in a reversed order -
"caller-first".  I know it's sometimes useful but never used it.

40.53%40.53%  swapper  [kernel.kallsyms]   [k] intel_idle
|
|--85.25%-- start_secondary
|  cpu_idle
|  cpuidle_idle_call
|  cpuidle_enter_tk
|  cpuidle_wrap_enter
|  cpuidle_enter
|  intel_idle
|
 --14.75%-- x86_64_start_kernel
   x86_64_start_reservations
   start_kernel
   rest_init
   cpu_idle
   cpuidle_idle_call
   cpuidle_enter_tk
   cpuidle_wrap_enter
   cpuidle_enter
   intel_idle

However, with --children feature added, it now can show all callees of
the entry.  For example, "start_kernel" entry now can display it calls
rest_init and in turn cpu_idle and then cpuidle_idle_call (95.72%).

 6.14% 0.00%  swapper  [kernel.kallsyms]   [k] start_kernel
   |
--- start_kernel
rest_init
cpu_idle
   |
   |--97.52%-- cpuidle_idle_call
   |  cpuidle_enter_tk
   |  |
   |  |--99.91%-- cpuidle_wrap_enter
   |  |  cpuidle_enter
   |  |  intel_idle
   |   --0.09%-- [...]
--2.48%-- [...]

Note that start_kernel has no self overhead - meaning that it never
get sampled by itself but constructs such a nice callgraph.  But,
sadly, if an entry has self overhead, callchain will get confused with
generated callchain (like above) and self callchains (which reversed
order) like the eariler example.

To be consistent with other entries, I'd like to make it just to show
a single entry - itself - like below since it doesn't have callees
(children) at all.  But still use the whole callchain to construct
children entries (like the start_kernel) as usual.

40.53%40.53%  swapper  [kernel.kallsyms]   [k] intel_idle
|
--- intel_idle

Note that this is just for --children is enabled.  Without that, it
will do the original behavior.

Cc: Frederic Weisbecker 
Signed-off-by: Namhyung Kim 
---
 tools/perf/util/hist.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 30df6187ee02..67f249daf34e 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -716,7 +716,28 @@ iter_add_single_cumulative_entry(struct hist_entry_iter 
*iter,
iter->he = he;
he_cache[iter->curr++] = he;
 
-   callchain_append(he->callchain, _cursor, sample->period);
+   if (callchain_param.order == ORDER_CALLER) {
+   struct callchain_cursor cursor;
+   int nr = callchain_cursor.nr;
+
+   /*
+* When --children with -g caller, it just adds noise to
+* self entries.  Just adding last node (self) is enough
+* and it'd privide a consistent view with other (cumulative)
+* entries.
+*/
+   while (--nr) {
+   callchain_cursor_advance(_cursor);
+   }
+
+   BUG_ON(callchain_cursor.nr != callchain_cursor.pos + 1);
+
+   callchain_cursor_snapshot(, _cursor);
+
+   callchain_append(he->callchain, , sample->period);
+   } else 

[PATCH 3/3] perf callchain: Prune misleading callchains for self entries

2014-08-14 Thread Namhyung Kim
The perf report -g displays callchains callee-first order.  That
means it can see the callers of the sample in a reverse order.  For
example, intel_idle entry shows following callchain on my data.

  Children  Self  Command  Shared Object   Symbol
      ---  --  
40.53%40.53%  swapper  [kernel.kallsyms]   [k] intel_idle
  |
   --- intel_idle
   cpuidle_enter
   cpuidle_wrap_enter
   cpuidle_enter_tk
   cpuidle_idle_call
   cpu_idle
  |
  |--85.25%-- start_secondary
  |
   --14.75%-- rest_init
 start_kernel
 x86_64_start_reservations
 x86_64_start_kernel

So intel_idle was called by cpuidle_enter, and in turn, it's
called by cpuidle_wrap_enter and so on.

When with -g caller, it shows callchains in a reversed order -
caller-first.  I know it's sometimes useful but never used it.

40.53%40.53%  swapper  [kernel.kallsyms]   [k] intel_idle
|
|--85.25%-- start_secondary
|  cpu_idle
|  cpuidle_idle_call
|  cpuidle_enter_tk
|  cpuidle_wrap_enter
|  cpuidle_enter
|  intel_idle
|
 --14.75%-- x86_64_start_kernel
   x86_64_start_reservations
   start_kernel
   rest_init
   cpu_idle
   cpuidle_idle_call
   cpuidle_enter_tk
   cpuidle_wrap_enter
   cpuidle_enter
   intel_idle

However, with --children feature added, it now can show all callees of
the entry.  For example, start_kernel entry now can display it calls
rest_init and in turn cpu_idle and then cpuidle_idle_call (95.72%).

 6.14% 0.00%  swapper  [kernel.kallsyms]   [k] start_kernel
   |
--- start_kernel
rest_init
cpu_idle
   |
   |--97.52%-- cpuidle_idle_call
   |  cpuidle_enter_tk
   |  |
   |  |--99.91%-- cpuidle_wrap_enter
   |  |  cpuidle_enter
   |  |  intel_idle
   |   --0.09%-- [...]
--2.48%-- [...]

Note that start_kernel has no self overhead - meaning that it never
get sampled by itself but constructs such a nice callgraph.  But,
sadly, if an entry has self overhead, callchain will get confused with
generated callchain (like above) and self callchains (which reversed
order) like the eariler example.

To be consistent with other entries, I'd like to make it just to show
a single entry - itself - like below since it doesn't have callees
(children) at all.  But still use the whole callchain to construct
children entries (like the start_kernel) as usual.

40.53%40.53%  swapper  [kernel.kallsyms]   [k] intel_idle
|
--- intel_idle

Note that this is just for --children is enabled.  Without that, it
will do the original behavior.

Cc: Frederic Weisbecker fweis...@gmail.com
Signed-off-by: Namhyung Kim namhy...@kernel.org
---
 tools/perf/util/hist.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 30df6187ee02..67f249daf34e 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -716,7 +716,28 @@ iter_add_single_cumulative_entry(struct hist_entry_iter 
*iter,
iter-he = he;
he_cache[iter-curr++] = he;
 
-   callchain_append(he-callchain, callchain_cursor, sample-period);
+   if (callchain_param.order == ORDER_CALLER) {
+   struct callchain_cursor cursor;
+   int nr = callchain_cursor.nr;
+
+   /*
+* When --children with -g caller, it just adds noise to
+* self entries.  Just adding last node (self) is enough
+* and it'd privide a consistent view with other (cumulative)
+* entries.
+*/
+   while (--nr) {
+   callchain_cursor_advance(callchain_cursor);
+   }
+
+   BUG_ON(callchain_cursor.nr != callchain_cursor.pos + 1);
+
+   callchain_cursor_snapshot(cursor, callchain_cursor);
+
+   

Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries

2014-08-14 Thread Jiri Olsa
On Thu, Aug 14, 2014 at 03:01:40PM +0900, Namhyung Kim wrote:

SNIP

 However, with --children feature added, it now can show all callees of
 the entry.  For example, start_kernel entry now can display it calls
 rest_init and in turn cpu_idle and then cpuidle_idle_call (95.72%).
 
  6.14% 0.00%  swapper  [kernel.kallsyms]   [k] start_kernel
|
 --- start_kernel
 rest_init
 cpu_idle
|
|--97.52%-- cpuidle_idle_call
|  cpuidle_enter_tk
|  |
|  |--99.91%-- cpuidle_wrap_enter
|  |  cpuidle_enter
|  |  intel_idle
|   --0.09%-- [...]
 --2.48%-- [...]
 
 Note that start_kernel has no self overhead - meaning that it never
 get sampled by itself but constructs such a nice callgraph.  But,
 sadly, if an entry has self overhead, callchain will get confused with
 generated callchain (like above) and self callchains (which reversed
 order) like the eariler example.
 
 To be consistent with other entries, I'd like to make it just to show
 a single entry - itself - like below since it doesn't have callees
 (children) at all.  But still use the whole callchain to construct
 children entries (like the start_kernel) as usual.
 
 40.53%40.53%  swapper  [kernel.kallsyms]   [k] intel_idle
 |
 --- intel_idle

I understand the consistency point, but I think we'd loose
usefull info by cutting this off

I guess I can run 'report -g callee' to find out who called intel_idle
instead.. but I would not need to if the callchain stays here

jirka
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] perf callchain: Prune misleading callchains for self entries

2014-08-14 Thread Namhyung Kim
Hi Jiri,

2014-08-14 (목), 16:10 +0200, Jiri Olsa:
 On Thu, Aug 14, 2014 at 03:01:40PM +0900, Namhyung Kim wrote:
 
 SNIP
 
  However, with --children feature added, it now can show all callees of
  the entry.  For example, start_kernel entry now can display it calls
  rest_init and in turn cpu_idle and then cpuidle_idle_call (95.72%).
  
   6.14% 0.00%  swapper  [kernel.kallsyms]   [k] start_kernel
 |
  --- start_kernel
  rest_init
  cpu_idle
 |
 |--97.52%-- cpuidle_idle_call
 |  cpuidle_enter_tk
 |  |
 |  |--99.91%-- cpuidle_wrap_enter
 |  |  cpuidle_enter
 |  |  intel_idle
 |   --0.09%-- [...]
  --2.48%-- [...]
  
  Note that start_kernel has no self overhead - meaning that it never
  get sampled by itself but constructs such a nice callgraph.  But,
  sadly, if an entry has self overhead, callchain will get confused with
  generated callchain (like above) and self callchains (which reversed
  order) like the eariler example.
  
  To be consistent with other entries, I'd like to make it just to show
  a single entry - itself - like below since it doesn't have callees
  (children) at all.  But still use the whole callchain to construct
  children entries (like the start_kernel) as usual.
  
  40.53%40.53%  swapper  [kernel.kallsyms]   [k] intel_idle
  |
  --- intel_idle
 
 I understand the consistency point, but I think we'd loose
 usefull info by cutting this off
 
 I guess I can run 'report -g callee' to find out who called intel_idle
 instead.. but I would not need to if the callchain stays here

Yeah, but current behavior intermixes caller-callchains and
callee-callchains together so adds confusion to users.  This is a
problem IMHO.

And with --children you can easily see the callers right above the entry
as they likely to have same or higher children overhead.

Thanks,
Namhyung


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/