Re: [PATCH RFC 1/1] perf,tool: partial callgraph and time support

2015-06-30 Thread Jiri Olsa
On Sun, Jun 28, 2015 at 01:47:21PM -0400, kan.li...@intel.com wrote:
> From: Kan Liang 
> 
> When multiple events are sampled, it may not be necessary to collect
> callgraphs for all of them. The sample sites are usually nearby, and
> it's enough to collect the callgraphs on a reference event (such as
> precise cycles or precise instructions). Similarly, we don't need
> fine-grained time stamps on all events, as it's enough to have time
> stamps on the regular reference events.
> This patchkit adds the ability to turn off callgraphs and time stamps
> per event. This in turn can reduce the sampling overhead and the size of
> perf.data. Furthermore, it makes collecting back traces and timestamps
> possible when the PEBS threshold is > 1, which significantly reduces the
> sampling overhead, especially for frequently occurring events
> (https://lkml.org/lkml/2015/5/10/196). For example, a slower event with
> a larger period collects back traces/timestamps, while the other events
> run fast with multi-PEBS. The time stamps from the slower events can be
> used to order the faster events, and their back traces can give the user
> enough hints to find the right spot.

SNIP

> 
>       7.13%     0.03%  tchain_edit  [kernel.vmlinux]  [k] do_nmi
>                    |
>                    ---do_nmi
>                       end_repeat_nmi
>                       f3
>                       f2
>                       f1
>                       main
>                       __libc_start_main
> 
> Signed-off-by: Kan Liang 
> ---
>  tools/perf/Documentation/perf-record.txt | 13 
>  tools/perf/builtin-record.c              |  7 ++--
>  tools/perf/perf.h                        |  2 ++
>  tools/perf/util/evsel.c                  | 55 ++--
>  tools/perf/util/parse-events.c           | 33 +++
>  tools/perf/util/parse-events.h           |  3 ++
>  tools/perf/util/parse-events.l           |  3 ++
>  tools/perf/util/parse-options.c          |  2 ++
>  tools/perf/util/parse-options.h          |  4 +++
>  9 files changed, 116 insertions(+), 6 deletions(-)

could you please break up this patch into:
  - parse option changes (plus a description of what the new macros are good for)
  - callchains terms addition
  - time term addition

thanks,
jirka
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 1/1] perf,tool: partial callgraph and time support

2015-06-30 Thread Jiri Olsa
On Sun, Jun 28, 2015 at 01:47:21PM -0400, kan.li...@intel.com wrote:

SNIP

> Signed-off-by: Kan Liang 
> ---
>  tools/perf/Documentation/perf-record.txt | 13 
>  tools/perf/builtin-record.c              |  7 ++--
>  tools/perf/perf.h                        |  2 ++
>  tools/perf/util/evsel.c                  | 55 ++--
>  tools/perf/util/parse-events.c           | 33 +++
>  tools/perf/util/parse-events.h           |  3 ++
>  tools/perf/util/parse-events.l           |  3 ++
>  tools/perf/util/parse-options.c          |  2 ++
>  tools/perf/util/parse-options.h          |  4 +++
>  9 files changed, 116 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> index 9b9d9d0..f945b01 100644
> --- a/tools/perf/Documentation/perf-record.txt
> +++ b/tools/perf/Documentation/perf-record.txt
> @@ -45,6 +45,19 @@ OPTIONS
>param1 and param2 are defined as formats for the PMU in:
>/sys/bus/event_sources/devices//format/*
>  
> +  There are also some params which are not defined in ...//format/*.
> +  These params can be used to set event defaults.
> +  Here is a list of the params.
> +  - 'period': Set event sampling period
> +  - 'callgraph': Disable/enable callgraph. Acceptable values are
> + 1 for FP mode, 2 for dwarf mode, 3 for LBR mode,
> + 0 for disabling callgraph.

why not use strings like 'fp,dwarf,lbr' to identify callchains?
you'd have:

  '{cpu/cpu-cycles,callgraph=fp,time=1,period=10/,

also you don't need to assign 1 to time, it's the default, so this is equivalent:

  '{cpu/cpu-cycles,callgraph=fp,time,period=10/,

come to that, 'fp' could be default for callgraph, like:

  '{cpu/cpu-cycles,callgraph,time,period=10/,

we already have string terms translation support
in pmu_config_term, but I guess this is not PMU specific
and can stay in config_term

also please add appropriate tests to tests/parse-events.c

thanks,
jirka


[PATCH RFC 1/1] perf,tool: partial callgraph and time support

2015-06-28 Thread kan . liang
From: Kan Liang 

When multiple events are sampled, it may not be necessary to collect
callgraphs for all of them. The sample sites are usually nearby, and
it's enough to collect the callgraphs on a reference event (such as
precise cycles or precise instructions). Similarly, we don't need
fine-grained time stamps on all events, as it's enough to have time
stamps on the regular reference events.
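
To make the reference-event idea concrete, here is a sketch of how a per-event choice could map onto perf_event_attr.sample_type from <linux/perf_event.h>: only the reference event requests PERF_SAMPLE_CALLCHAIN and PERF_SAMPLE_TIME. The helper setup_event() and the sample periods are hypothetical and chosen for illustration. The sketch only fills in the attributes (it never calls perf_event_open()), leaves out precise/PEBS settings, and is not how this patch implements the terms in evsel.c.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: describe one hardware sampling event. Every event
 * records IP, TID and PERIOD; only the reference event also asks the
 * kernel for a call chain and a time stamp in each sample. */
static void setup_event(struct perf_event_attr *attr, __u64 config,
			__u64 period, int is_reference)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = config;
	attr->sample_period = period;
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_PERIOD;
	if (is_reference)
		attr->sample_type |= PERF_SAMPLE_CALLCHAIN | PERF_SAMPLE_TIME;
}

int main(void)
{
	struct perf_event_attr cycles, instr;

	/* slower reference event vs. frequent event (illustrative periods) */
	setup_event(&cycles, PERF_COUNT_HW_CPU_CYCLES, 100000, 1);
	setup_event(&instr, PERF_COUNT_HW_INSTRUCTIONS, 20000, 0);

	assert(!(instr.sample_type & (PERF_SAMPLE_CALLCHAIN | PERF_SAMPLE_TIME)));
	printf("cycles:       callchain=%d time=%d\n",
	       !!(cycles.sample_type & PERF_SAMPLE_CALLCHAIN),
	       !!(cycles.sample_type & PERF_SAMPLE_TIME));
	printf("instructions: callchain=%d time=%d\n",
	       !!(instr.sample_type & PERF_SAMPLE_CALLCHAIN),
	       !!(instr.sample_type & PERF_SAMPLE_TIME));
	return 0;
}
```

Samples from the non-reference event then carry neither a call chain nor a time field, so each of their records is smaller.
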
This patchkit adds the ability to turn off callgraphs and time stamps
per event. This in turn can reduce the sampling overhead and the size of
perf.data. Furthermore, it makes collecting back traces and timestamps
possible when the PEBS threshold is > 1, which significantly reduces the
sampling overhead, especially for frequently occurring events
(https://lkml.org/lkml/2015/5/10/196). For example, a slower event with
a larger period collects back traces/timestamps, while the other events
run fast with multi-PEBS. The time stamps from the slower events can be
used to order the faster events, and their back traces can give the user
enough hints to find the right spot.

Here are some examples and test results.

1. Comparing the elapsed time and perf.data size from "kernbench -M -H".

 The test command for FULL callgraph and time support.
   "perf record -e
   '{cpu/cpu-cycles,period=10/,cpu/instructions,period=2/p}'
   --call-graph fp --time"

 The test command for PARTIAL callgraph and time support.
   "perf record -e
   '{cpu/cpu-cycles,callgraph=1,time=1,period=10/,
 cpu/instructions,callgraph=0,time=0,period=2/p}'"

 The elapsed time is 24.3 seconds for FULL and 16.9 seconds for PARTIAL.
 The perf.data size is 22.1 GB for FULL and 12.4 GB for PARTIAL.

2. Comparing the perf.data size and callgraph results.

 The test command for FULL callgraph and time support.
   "perf record -e
   '{cpu/cpu-cycles,period=10/pp,cpu/instructions,period=2/p}'
   --call-graph fp -- ./tchain_edit"

 The test command for PARTIAL callgraph and time support.
   "perf record -e
   '{cpu/cpu-cycles,callgraph=1,time=1,period=10/pp,
 cpu/instructions,callgraph=0,time=0,period=2/p}'
   -- ./tchain_edit"

 The perf.data size is 43.2 MB for FULL and 21.1 MB for PARTIAL.
 The callgraphs are roughly the same.

 The callgraph from FULL
 # Samples: 87K of event 'cpu/cpu-cycles,callgraph=1,time=1,period=10/pp'
 # Event count (approx.): 876000
 #
 # Children      Self  Command      Shared Object     Symbol
 # ........  ........  ...........  ................  .......................
 #
     99.98%     0.00%  tchain_edit  libc-2.15.so      [.] __libc_start_main
                   |
                   ---__libc_start_main

     99.97%     0.00%  tchain_edit  tchain_edit       [.] main
                   |
                   ---main
                      __libc_start_main

     99.97%     0.00%  tchain_edit  tchain_edit       [.] f1
                   |
                   ---f1
                      main
                      __libc_start_main

     99.85%    87.01%  tchain_edit  tchain_edit       [.] f3
                   |
                   ---f3
                      |
                      |--99.74%-- f2
                      |           f1
                      |           main
                      |           __libc_start_main
                       --0.26%-- [...]

     99.71%     0.12%  tchain_edit  tchain_edit       [.] f2
                   |
                   ---f2
                      f1
                      main
                      __libc_start_main

 The callgraph from PARTIAL
 # Samples: 417K of event 'cpu/instructions,callgraph=0,time=0,period=2/p'
 # Event count (approx.): 834698
 #
 # Children      Self  Command      Shared Object     Symbol
 # ........  ........  ...........  ................  .......................
 #
     98.82%     0.00%  tchain_edit  libc-2.15.so      [.] __libc_start_main
                   |
                   ---__libc_start_main

     98.82%     0.00%  tchain_edit  tchain_edit       [.] main
                   |
                   ---main
                      __libc_start_main

     98.82%     0.00%  tchain_edit  tchain_edit       [.] f1
                   |
                   ---f1
                      main
                      __libc_start_main

     98.82%    98.28%  tchain_edit  tchain_edit       [.] f3
                   |
                   ---f3
                      |
                      |--0.53%-- f2
                      |          f1
                      |          main
                      |          __libc_start_main
                      |
                      |--0.01%-- f1
                      |          main
                      |          __libc_start_main
                       --99.46%-- [...]

     97.63%     0.03%  tchain_edit  tchain_edit       [.] f2
                   |
                   ---f2
                      f1
                      main
                      __libc_start_main

      7.13%     0.03%  tchain_edit  [kernel.vmlinux]  [k] do_nmi
                   |
                   ---do_nmi
                      end_repeat_nmi
                      f3
                      f2
                      f1
                      main
                      __libc_start_main

Signed-off-by: Kan Liang 
---
 tools/perf/Documentation/perf-record.txt | 13 
 tools/perf/built