[tip:perf/core] perf bpf examples: Convert etcsnoop to use bpf_map()

2019-01-26 Thread tip-bot for Arnaldo Carvalho de Melo
Commit-ID:  f52fdd64f6046ab121688155aebd66b52ce0077d
Gitweb: https://git.kernel.org/tip/f52fdd64f6046ab121688155aebd66b52ce0077d
Author: Arnaldo Carvalho de Melo 
AuthorDate: Thu, 24 Jan 2019 15:48:05 +0100
Committer:  Arnaldo Carvalho de Melo 
CommitDate: Fri, 25 Jan 2019 15:12:11 +0100

perf bpf examples: Convert etcsnoop to use bpf_map()

Make the code more compact; the end result is the same:

  # trace -e /home/acme/git/perf/tools/perf/examples/bpf/etcsnoop.c
     0.000 ( ): sed/7385 openat(dfd: CWD, filename: "/etc/ld.so.cache", flags: RDONLY|CLOEXEC) ...
  2727.723 ( ): cat/7389 openat(dfd: CWD, filename: "/etc/ld.so.cache", flags: RDONLY|CLOEXEC) ...
  2728.543 ( ): cat/7389 openat(dfd: CWD, filename: "/etc/passwd") ...
  ^C

Cc: Adrian Hunter 
Cc: Jiri Olsa 
Cc: Luis Cláudio Gonçalves 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-znhgz24p0daux2kay200o...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/examples/bpf/etcsnoop.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/tools/perf/examples/bpf/etcsnoop.c b/tools/perf/examples/bpf/etcsnoop.c
index 550e69c2e8d1..e81b535346c0 100644
--- a/tools/perf/examples/bpf/etcsnoop.c
+++ b/tools/perf/examples/bpf/etcsnoop.c
@@ -21,12 +21,8 @@
 
 #include 
 
-struct bpf_map SEC("maps") __augmented_syscalls__ = {
-   .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
-   .key_size = sizeof(int),
-   .value_size = sizeof(u32),
-   .max_entries = __NR_CPUS__,
-};
+/* bpf-output associated map */
+bpf_map(__augmented_syscalls__, PERF_EVENT_ARRAY, int, u32, __NR_CPUS__);
 
 struct augmented_filename {
int size;
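
The bpf_map() one-liner stands in for the four-field initializer it replaces. As a rough illustration only, here is one way such a macro could be written so that it expands to the same initializer, assuming the argument order seen in the diff (name, map type suffix, key type, value type, max entries). The actual definition lives in perf's own BPF headers and is not reproduced here; the local struct bpf_map, the BPF_MAP_TYPE_PERF_EVENT_ARRAY value and the __NR_CPUS__ fallback are stand-ins so the sketch compiles as ordinary user-space C.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Put map definitions in a "maps" ELF section, as BPF object files
 * expect; on non-ELF targets just keep the symbol. */
#ifdef __ELF__
#define SEC(NAME) __attribute__((section(NAME), used))
#else
#define SEC(NAME) __attribute__((used))
#endif

/* Local stand-in for the kernel's map type constant (assumed value,
 * only meant to make this sketch compile on its own). */
enum {
	BPF_MAP_TYPE_PERF_EVENT_ARRAY = 4,
};

/* The four fields the removed open-coded initializer filled in. */
struct bpf_map {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

/* perf passes __NR_CPUS__ when building the BPF object; pick an
 * arbitrary fallback so the sketch builds stand-alone. */
#ifndef __NR_CPUS__
#define __NR_CPUS__ 8
#endif

/* bpf_map(name, TYPE, key_type, value_type, max_entries): paste the type
 * suffix onto BPF_MAP_TYPE_ and turn the key/value types into sizes. */
#define bpf_map(name, _type, type_key, type_val, _max_entries)	\
	struct bpf_map SEC("maps") name = {			\
		.type        = BPF_MAP_TYPE_##_type,		\
		.key_size    = sizeof(type_key),		\
		.value_size  = sizeof(type_val),		\
		.max_entries = _max_entries,			\
	}

/* The one-liner added by the patch ... */
bpf_map(__augmented_syscalls__, PERF_EVENT_ARRAY, int, u32, __NR_CPUS__);

/* ... and the open-coded form it replaces. */
struct bpf_map SEC("maps") open_coded_map = {
	.type        = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
	.key_size    = sizeof(int),
	.value_size  = sizeof(u32),
	.max_entries = __NR_CPUS__,
};

/* Both definitions should describe exactly the same map. */
static void check_maps_match(void)
{
	assert(__augmented_syscalls__.type == open_coded_map.type);
	assert(__augmented_syscalls__.key_size == open_coded_map.key_size);
	assert(__augmented_syscalls__.value_size == open_coded_map.value_size);
	assert(__augmented_syscalls__.max_entries == open_coded_map.max_entries);
}
```

Token pasting (BPF_MAP_TYPE_##_type) is what lets callers write PERF_EVENT_ARRAY instead of the full constant, and taking sizeof() of the key and value types keeps the sizes from drifting away from the types.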


[PATCH 28/29] perf bpf examples: Convert etcsnoop to use bpf_map()

2019-01-25 Thread Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo 

Make the code more compact; the end result is the same:

  # trace -e /home/acme/git/perf/tools/perf/examples/bpf/etcsnoop.c
     0.000 ( ): sed/7385 openat(dfd: CWD, filename: "/etc/ld.so.cache", flags: RDONLY|CLOEXEC) ...
  2727.723 ( ): cat/7389 openat(dfd: CWD, filename: "/etc/ld.so.cache", flags: RDONLY|CLOEXEC) ...
  2728.543 ( ): cat/7389 openat(dfd: CWD, filename: "/etc/passwd") ...
  ^C

Cc: Adrian Hunter 
Cc: Jiri Olsa 
Cc: Luis Cláudio Gonçalves 
Cc: Namhyung Kim 
Cc: Wang Nan 
Link: https://lkml.kernel.org/n/tip-znhgz24p0daux2kay200o...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/examples/bpf/etcsnoop.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/tools/perf/examples/bpf/etcsnoop.c b/tools/perf/examples/bpf/etcsnoop.c
index 550e69c2e8d1..e81b535346c0 100644
--- a/tools/perf/examples/bpf/etcsnoop.c
+++ b/tools/perf/examples/bpf/etcsnoop.c
@@ -21,12 +21,8 @@
 
 #include 
 
-struct bpf_map SEC("maps") __augmented_syscalls__ = {
-   .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
-   .key_size = sizeof(int),
-   .value_size = sizeof(u32),
-   .max_entries = __NR_CPUS__,
-};
+/* bpf-output associated map */
+bpf_map(__augmented_syscalls__, PERF_EVENT_ARRAY, int, u32, __NR_CPUS__);
 
 struct augmented_filename {
int size;
-- 
2.20.1



Re: perf bpf examples

2016-07-08 Thread Brendan Gregg
On Fri, Jul 8, 2016 at 3:46 AM, Wangnan (F)  wrote:
>
>
> On 2016/7/8 15:57, Brendan Gregg wrote:
>>
[...]
>> I mean just an -F99 that executes a BPF program on each sample. My
>> most common use for perf is:
>>
>> perf record -F 99 -a -g -- sleep 30
>> perf report (or perf script, for making flame graphs)
>>
>> But this uses perf.data as an intermediate file. With the recent
>> BPF_MAP_TYPE_STACK_TRACE, we could frequency count stack traces in
>> kernel context, and just dump a report. Much more efficient. And
>> improving a very common perf one-liner.
>
>
> You can't attach a BPF script to samples other than kprobes and tracepoints.
> When you use 'perf record -F99 -a -g -- sleep 30', you are sampling on
> the 'cycles:ppp' event, which is a hardware PMU event.

Sure, either cycles:ppp or cpu-clock (my Xen guests have no PMU,
sadly). But these ultimately call perf_swevent_hrtimer()/etc,
so I was wondering if someone was already looking at enhancing this
code to support BPF? I.e., BPF should be able to attach to kprobes,
uprobes, tracepoints, and timer-based samples.

> If we find a kprobe or tracepoint event that is triggered 99 times
> per second, we can use BPF_MAP_TYPE_STACK_TRACE and
> bpf_get_stackid().

Yes, that should work as a workaround. It's annoying, as some functions
like perf_swevent_hrtimer() can't be kprobed (inlined?). I found that
perf_misc_flags(struct pt_regs *regs) was called, but passing that
regs to bpf_get_stackid() returned "type=inv expected=ctx"
errors, despite casting. I'm guessing the BPF ctx type is special and
can't be cast, but I need to dig more.

Brendan


Re: perf bpf examples

2016-07-08 Thread Wangnan (F)



On 2016/7/8 15:57, Brendan Gregg wrote:
> On Thu, Jul 7, 2016 at 9:18 PM, Wangnan (F)  wrote:
>>
>> On 2016/7/8 1:58, Brendan Gregg wrote:
>>>
>>> On Thu, Jul 7, 2016 at 10:54 AM, Brendan Gregg
>>>  wrote:
>>>>
>>>> On Wed, Jul 6, 2016 at 6:49 PM, Wangnan (F)  wrote:
> [...]
>>> ... Also, has anyone looked into perf sampling (-F 99) with bpf yet?
>>> Thanks,
>>
>>
>> Theoretically, a BPF program is an additional filter that decides
>> whether an event should be filtered out or passed to perf. -F 99
>> is another filter, which drops samples to ensure the frequency.
>> The filters work together. The full graph should be:
>>
>>  BPF --> traditional filter --> proc (system wide or proc specific) --> period
>>
>> See the example at the end of this mail. The BPF program returns 0 for
>> half of the events, and the result should be symmetrical. We can get
>> similar results without -F:
>>
>> # ~/perf record -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
>> 8388480+0 records in
>> 8388480+0 records out
>> 4294901760 bytes (4.3 GB) copied, 11.9908 s, 358 MB/s
>> [ perf record: Woken up 28 times to write data ]
>> [ perf record: Captured and wrote 303.915 MB perf.data (4194449 samples) ]
>> #
>> root@wn-Lenovo-Product:~# ~/perf record -a --clang-opt '-DCATCH_EVEN' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
>> 8388480+0 records in
>> 8388480+0 records out
>> 4294901760 bytes (4.3 GB) copied, 12.1154 s, 355 MB/s
>> [ perf record: Woken up 54 times to write data ]
>> [ perf record: Captured and wrote 303.933 MB perf.data (4194347 samples) ]
>>
>>
>> With -F99 added:
>>
>> # ~/perf record -F99 -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
>> 8388480+0 records in
>> 8388480+0 records out
>> 4294901760 bytes (4.3 GB) copied, 9.60126 s, 447 MB/s
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.402 MB perf.data (35 samples) ]
>> # ~/perf record -F99 -a --clang-opt '-DCATCH_EVEN' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
>> 8388480+0 records in
>> 8388480+0 records out
>> 4294901760 bytes (4.3 GB) copied, 9.76719 s, 440 MB/s
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.399 MB perf.data (37 samples) ]
>
> That looks like it's doing two different things: -F99, and a
> sampling.c script (SEC("func=sys_read")).
>
> I mean just an -F99 that executes a BPF program on each sample. My
> most common use for perf is:
>
> perf record -F 99 -a -g -- sleep 30
> perf report (or perf script, for making flame graphs)
>
> But this uses perf.data as an intermediate file. With the recent
> BPF_MAP_TYPE_STACK_TRACE, we could frequency count stack traces in
> kernel context, and just dump a report. Much more efficient. And
> improving a very common perf one-liner.


You can't attach a BPF script to samples other than kprobes and tracepoints.
When you use 'perf record -F99 -a -g -- sleep 30', you are sampling on
the 'cycles:ppp' event, which is a hardware PMU event.

If we find a kprobe or tracepoint event that is triggered 99 times
per second, we can use BPF_MAP_TYPE_STACK_TRACE and
bpf_get_stackid().


Thank you.



Re: perf bpf examples

2016-07-08 Thread Brendan Gregg
On Thu, Jul 7, 2016 at 9:18 PM, Wangnan (F)  wrote:
>
>
> On 2016/7/8 1:58, Brendan Gregg wrote:
>>
>> On Thu, Jul 7, 2016 at 10:54 AM, Brendan Gregg
>>  wrote:
>>>
>>> On Wed, Jul 6, 2016 at 6:49 PM, Wangnan (F)  wrote:
[...]
>> ... Also, has anyone looked into perf sampling (-F 99) with bpf yet?
>> Thanks,
>
>
> Theoretically, a BPF program is an additional filter that decides
> whether an event should be filtered out or passed to perf. -F 99
> is another filter, which drops samples to ensure the frequency.
> The filters work together. The full graph should be:
>
>  BPF --> traditional filter --> proc (system wide or proc specific) --> period
>
> See the example at the end of this mail. The BPF program returns 0 for
> half of the events, and the result should be symmetrical. We can get
> similar results without -F:
>
> # ~/perf record -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
> 8388480+0 records in
> 8388480+0 records out
> 4294901760 bytes (4.3 GB) copied, 11.9908 s, 358 MB/s
> [ perf record: Woken up 28 times to write data ]
> [ perf record: Captured and wrote 303.915 MB perf.data (4194449 samples) ]
> #
> root@wn-Lenovo-Product:~# ~/perf record -a --clang-opt '-DCATCH_EVEN' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
> 8388480+0 records in
> 8388480+0 records out
> 4294901760 bytes (4.3 GB) copied, 12.1154 s, 355 MB/s
> [ perf record: Woken up 54 times to write data ]
> [ perf record: Captured and wrote 303.933 MB perf.data (4194347 samples) ]
>
>
> With -F99 added:
>
> # ~/perf record -F99 -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
> 8388480+0 records in
> 8388480+0 records out
> 4294901760 bytes (4.3 GB) copied, 9.60126 s, 447 MB/s
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.402 MB perf.data (35 samples) ]
> # ~/perf record -F99 -a --clang-opt '-DCATCH_EVEN' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
> 8388480+0 records in
> 8388480+0 records out
> 4294901760 bytes (4.3 GB) copied, 9.76719 s, 440 MB/s
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.399 MB perf.data (37 samples) ]

That looks like it's doing two different things: -F99, and a
sampling.c script (SEC("func=sys_read")).

I mean just an -F99 that executes a BPF program on each sample. My
most common use for perf is:

perf record -F 99 -a -g -- sleep 30
perf report (or perf script, for making flame graphs)

But this uses perf.data as an intermediate file. With the recent
BPF_MAP_TYPE_STACK_TRACE, we could frequency count stack traces in
kernel context, and just dump a report. Much more efficient. And
improving a very common perf one-liner.
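
The in-kernel aggregation suggested here boils down to a table from stack id to sample count that is bumped on every sample and dumped once at the end, instead of writing each sample to perf.data. The following user-space C model sketches that table under stated assumptions: the ids are taken to come from something like bpf_get_stackid() (which returns a negative error code on failure), and in a real BPF program the table would be a BPF hash map next to the BPF_MAP_TYPE_STACK_TRACE map. The fixed-size open-addressing table, TABLE_SIZE and all function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define TABLE_SIZE 64	/* hypothetical capacity; must be a power of two */

/* One slot of the stack id -> sample count table. */
struct slot {
	int used;
	long stack_id;
	unsigned long count;
};

struct stack_counts {
	struct slot slots[TABLE_SIZE];	/* zero-initialise before use */
};

/* Per-sample hook: bump the count for this stack id using linear probing.
 * Returns 0 on success, -1 if the id is an error code or the table is full. */
static int count_sample(struct stack_counts *t, long stack_id)
{
	unsigned long h;

	assert(t != NULL);
	if (stack_id < 0)
		return -1;	/* bpf_get_stackid()-style error: skip it */
	h = (unsigned long)stack_id & (TABLE_SIZE - 1);
	for (int probe = 0; probe < TABLE_SIZE; probe++) {
		struct slot *s = &t->slots[(h + probe) & (TABLE_SIZE - 1)];

		if (!s->used) {
			s->used = 1;
			s->stack_id = stack_id;
			s->count = 1;
			return 0;
		}
		if (s->stack_id == stack_id) {
			s->count++;
			return 0;
		}
	}
	return -1;	/* table full: this sample is lost */
}

/* Read back the count for one stack id (0 if it was never seen). */
static unsigned long get_count(const struct stack_counts *t, long stack_id)
{
	unsigned long h = (unsigned long)stack_id & (TABLE_SIZE - 1);

	for (int probe = 0; probe < TABLE_SIZE; probe++) {
		const struct slot *s = &t->slots[(h + probe) & (TABLE_SIZE - 1)];

		if (!s->used)
			return 0;
		if (s->stack_id == stack_id)
			return s->count;
	}
	return 0;
}

/* End-of-run report: one line per distinct stack. */
static void dump_report(const struct stack_counts *t)
{
	for (int i = 0; i < TABLE_SIZE; i++)
		if (t->slots[i].used)
			printf("stack %ld: %lu samples\n",
			       t->slots[i].stack_id, t->slots[i].count);
}
```

When the table is full, count_sample() drops the sample rather than growing, roughly like a BPF hash map created with a fixed max_entries; the user-space side only has to read the table once, which is where the saving over perf.data comes from.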

Brendan


Re: perf bpf examples

2016-07-07 Thread Wangnan (F)



On 2016/7/8 1:58, Brendan Gregg wrote:
> On Thu, Jul 7, 2016 at 10:54 AM, Brendan Gregg
>  wrote:
>>
>> On Wed, Jul 6, 2016 at 6:49 PM, Wangnan (F)  wrote:
>>>
>>> On 2016/7/7 4:29, Brendan Gregg wrote:
>>>>
>>>> G'Day,
>>>>
>>>> Are perf bpf examples shared anywhere? I've seen many posted to lkml
>>>> (by Wang Nan), but don't see them in the linux source, or
>>>> documentation. Would be very handy to throw them all up somewhere for
>>>> searching/learning, if that hasn't already happened, eg, github.
>>>>
>>>> I was also looking to see if perf bpf supports sampling yet, but I
>>>> don't think it does. Eg, imagine a:
>>>>
>>>> perf record -F 99 -e bpf_process_samples.c -a -- sleep 10
>>>>
>>>> which would require BPF attaching to perf_swevent_hrtimer()/etc, and
>>>> also emitting a map (eg, sampled instruction pointer counts). I don't
>>>> think perf currently does either, but was hoping for a collection of
>>>> examples to double check.
>>>
>>>
>>> Currently perf-bpf doesn't support dumping the resulting maps, but
>>> we are working on it. I think you have read about our uBPF approach:
>>>
>>> http://article.gmane.org/gmane.linux.kernel/2203717
>>>
>>> and
>>>
>>> http://article.gmane.org/gmane.linux.kernel/2253579
>>>
>>> In them we embedded a uBPF virtual machine into perf and gave it
>>> the ability to operate on the results in maps.
>>>
>>> Now we are trying another approach: introducing LLVM into perf to
>>> compile the data analysis and reporting into code. It would be much
>>> more powerful.
>>
>>
>> Great, thanks!
>>
>> But what about a set of examples covering the existing perf+bpf
>> capabilities so far? I know you've emailed them to lkml, but has
>> someone put them all in one place yet? If not, I can go through lkml
>> and at least put them on github so we can search and learn from them.


Great. Thanks a lot.


> ... Also, has anyone looked into perf sampling (-F 99) with bpf yet? Thanks,


Theoretically, a BPF program is an additional filter that decides
whether an event should be filtered out or passed to perf. -F 99
is another filter, which drops samples to ensure the frequency.
The filters work together. The full graph should be:

 BPF --> traditional filter --> proc (system wide or proc specific) --> period
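
The chain above can be modelled in a few lines of user-space C. This is only a sketch: it keeps two of the stages, a BPF-like predicate that presumably behaves like the CATCH_ODD/CATCH_EVEN variants of sampling.c (a shared counter in a one-entry array map, passing every other event; the program body is cut off in this archive), followed by a period stage that keeps every Nth survivor. The "traditional filter" and "proc" stages are left out, the fixed period is a simplification of perf's adaptive -F frequency mode, and every name in it is hypothetical.

```c
#include <assert.h>

/* Stage 1: a BPF-like predicate. The counter plays the role of the
 * one-entry array map; catch_odd selects -DCATCH_ODD or -DCATCH_EVEN. */
struct parity_filter {
	unsigned long counter;
	int catch_odd;
};

/* Returns 1 if the event passes, 0 if it is filtered out. */
static int parity_pass(struct parity_filter *f)
{
	f->counter++;
	return (f->counter & 1) ? f->catch_odd : !f->catch_odd;
}

/* Stage 2: keep every Nth event that reaches this stage. A fixed
 * period is a simplification; perf's -F mode retunes it at run time. */
struct period_filter {
	unsigned long period;
	unsigned long seen;
};

static int period_pass(struct period_filter *p)
{
	p->seen++;
	return p->seen % p->period == 0;
}

/* Push n events through BPF stage -> period stage; return samples kept. */
static unsigned long run_pipeline(unsigned long n, int catch_odd,
				  unsigned long period)
{
	struct parity_filter bpf = { 0, catch_odd };
	struct period_filter per = { period, 0 };
	unsigned long kept = 0;

	assert(period > 0);
	for (unsigned long i = 0; i < n; i++) {
		if (!parity_pass(&bpf))
			continue;	/* dropped before the period stage sees it */
		if (period_pass(&per))
			kept++;
	}
	return kept;
}
```

For example, run_pipeline(8388480, 1, 1) returns 4194240: 8388480 events, one per read by dd (assuming dd's default 512-byte block, which matches the 4294901760 bytes copied), halved by the parity stage. That is close to the 4194449 and 4194347 samples recorded in the runs without -F; the small surplus presumably comes from other processes calling sys_read, since -a records system-wide.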


See the example at the end of this mail. The BPF program returns 0 for
half of the events, and the result should be symmetrical. We can get
similar results without -F:

# ~/perf record -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
8388480+0 records in
8388480+0 records out
4294901760 bytes (4.3 GB) copied, 11.9908 s, 358 MB/s
[ perf record: Woken up 28 times to write data ]
[ perf record: Captured and wrote 303.915 MB perf.data (4194449 samples) ]
#
root@wn-Lenovo-Product:~# ~/perf record -a --clang-opt '-DCATCH_EVEN' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
8388480+0 records in
8388480+0 records out
4294901760 bytes (4.3 GB) copied, 12.1154 s, 355 MB/s
[ perf record: Woken up 54 times to write data ]
[ perf record: Captured and wrote 303.933 MB perf.data (4194347 samples) ]


With -F99 added:

# ~/perf record -F99 -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd 
if=/dev/zero of=/dev/null count=8388480

8388480+0 records in
8388480+0 records out
4294901760 bytes (4.3 GB) copied, 9.60126 s, 447 MB/s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.402 MB perf.data (35 samples) ]
# ~/perf record -F99 -a --clang-opt '-DCATCH_EVEN' -e ./sampling.c dd 
if=/dev/zero of=/dev/null count=8388480

8388480+0 records in
8388480+0 records out
4294901760 bytes (4.3 GB) copied, 9.76719 s, 440 MB/s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.399 MB perf.data (37 samples) ]

However, there must be something I don't understand. It takes nearly 10
seconds to finish the record, so we should get nearly 1000 samples.
Sometimes I get about 500 samples:

# ~/perf record -F99 -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
8388480+0 records in
8388480+0 records out
4294901760 bytes (4.3 GB) copied, 9.60536 s, 447 MB/s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.431 MB perf.data (555 samples) ]

/
#include 
#define SEC(NAME) __attribute__((section(NAME), used))

struct bpf_map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

struct bpf_map_def SEC("maps") m = {
	.type = BPF_MAP_TYPE_ARRAY,
	.key_size = sizeof(int),
	.value_size = sizeof(int),
	.max_entries = 1,
};

static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
	(void *)BPF_FUNC_map_lookup_elem;
static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
	(void *)BPF_FUNC_trace_printk;

char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;

#ifdef CATCH_ODD
# define RET_ODD  1
# define RET_EVEN 0
#endif
#ifdef CATCH_EVEN
# define RET_ODD  0
# define RET_EVEN 1
#endif
SEC("func=sy