[tip:perf/core] perf bpf examples: Convert etcsnoop to use bpf_map()
Commit-ID:  f52fdd64f6046ab121688155aebd66b52ce0077d
Gitweb:     https://git.kernel.org/tip/f52fdd64f6046ab121688155aebd66b52ce0077d
Author:     Arnaldo Carvalho de Melo
AuthorDate: Thu, 24 Jan 2019 15:48:05 +0100
Committer:  Arnaldo Carvalho de Melo
CommitDate: Fri, 25 Jan 2019 15:12:11 +0100

perf bpf examples: Convert etcsnoop to use bpf_map()

Making the code more compact, end result is the same:

  # trace -e /home/acme/git/perf/tools/perf/examples/bpf/etcsnoop.c
     0.000 ( ): sed/7385 openat(dfd: CWD, filename: "/etc/ld.so.cache", flags: RDONLY|CLOEXEC) ...
  2727.723 ( ): cat/7389 openat(dfd: CWD, filename: "/etc/ld.so.cache", flags: RDONLY|CLOEXEC) ...
  2728.543 ( ): cat/7389 openat(dfd: CWD, filename: "/etc/passwd") ...
  ^C

Cc: Adrian Hunter
Cc: Jiri Olsa
Cc: Luis Cláudio Gonçalves
Cc: Namhyung Kim
Cc: Wang Nan
Link: https://lkml.kernel.org/n/tip-znhgz24p0daux2kay200o...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/examples/bpf/etcsnoop.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/tools/perf/examples/bpf/etcsnoop.c b/tools/perf/examples/bpf/etcsnoop.c
index 550e69c2e8d1..e81b535346c0 100644
--- a/tools/perf/examples/bpf/etcsnoop.c
+++ b/tools/perf/examples/bpf/etcsnoop.c
@@ -21,12 +21,8 @@
 
 #include
 
-struct bpf_map SEC("maps") __augmented_syscalls__ = {
-	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
-	.key_size = sizeof(int),
-	.value_size = sizeof(u32),
-	.max_entries = __NR_CPUS__,
-};
+/* bpf-output associated map */
+bpf_map(__augmented_syscalls__, PERF_EVENT_ARRAY, int, u32, __NR_CPUS__);
 
 struct augmented_filename {
 	int size;
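
For readers unfamiliar with the helper, here is a minimal user-space sketch of how a bpf_map()-style macro could expand to the same initializer that the patch removes. The macro body, the stand-in struct bpf_map, the empty SEC() and the value picked for __NR_CPUS__ are all assumptions made so the sketch builds outside perf; the real definitions presumably live in perf's own BPF headers and may differ.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins so the sketch compiles in user space (assumptions, not perf's headers). */
enum { BPF_MAP_TYPE_PERF_EVENT_ARRAY = 4 }; /* value as in the uapi enum bpf_map_type */
#define __NR_CPUS__ 4 /* hypothetical; perf supplies the real value at build time */
#define SEC(NAME)     /* ELF section placement omitted in this demo */
typedef unsigned int u32;

struct bpf_map {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

/* Hypothetical macro: builds the long-form definition from five arguments. */
#define bpf_map(name, _type, type_key, type_val, _max_entries) \
struct bpf_map SEC("maps") name = {                            \
	.type        = BPF_MAP_TYPE_##_type,                   \
	.key_size    = sizeof(type_key),                       \
	.value_size  = sizeof(type_val),                       \
	.max_entries = _max_entries,                           \
}

/* Compact form, as added by the patch. */
bpf_map(__augmented_syscalls__, PERF_EVENT_ARRAY, int, u32, __NR_CPUS__);

/* Long form, as removed by the patch (renamed here to avoid a clash). */
struct bpf_map long_form = {
	.type        = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
	.key_size    = sizeof(int),
	.value_size  = sizeof(u32),
	.max_entries = __NR_CPUS__,
};

/* Returns 1 when both definitions describe the same map, 0 otherwise. */
int same_map_def(const struct bpf_map *a, const struct bpf_map *b)
{
	assert(a && b);
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

Comparing the two objects with same_map_def() is one way to check the commit message's claim that the end result is the same, at least for the four fields shown in the diff.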
[PATCH 28/29] perf bpf examples: Convert etcsnoop to use bpf_map()
From: Arnaldo Carvalho de Melo

Making the code more compact, end result is the same:

  # trace -e /home/acme/git/perf/tools/perf/examples/bpf/etcsnoop.c
     0.000 ( ): sed/7385 openat(dfd: CWD, filename: "/etc/ld.so.cache", flags: RDONLY|CLOEXEC) ...
  2727.723 ( ): cat/7389 openat(dfd: CWD, filename: "/etc/ld.so.cache", flags: RDONLY|CLOEXEC) ...
  2728.543 ( ): cat/7389 openat(dfd: CWD, filename: "/etc/passwd") ...
  ^C

Cc: Adrian Hunter
Cc: Jiri Olsa
Cc: Luis Cláudio Gonçalves
Cc: Namhyung Kim
Cc: Wang Nan
Link: https://lkml.kernel.org/n/tip-znhgz24p0daux2kay200o...@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/examples/bpf/etcsnoop.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/tools/perf/examples/bpf/etcsnoop.c b/tools/perf/examples/bpf/etcsnoop.c
index 550e69c2e8d1..e81b535346c0 100644
--- a/tools/perf/examples/bpf/etcsnoop.c
+++ b/tools/perf/examples/bpf/etcsnoop.c
@@ -21,12 +21,8 @@
 
 #include
 
-struct bpf_map SEC("maps") __augmented_syscalls__ = {
-	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
-	.key_size = sizeof(int),
-	.value_size = sizeof(u32),
-	.max_entries = __NR_CPUS__,
-};
+/* bpf-output associated map */
+bpf_map(__augmented_syscalls__, PERF_EVENT_ARRAY, int, u32, __NR_CPUS__);
 
 struct augmented_filename {
 	int size;
-- 
2.20.1
Re: perf bpf examples
On Fri, Jul 8, 2016 at 3:46 AM, Wangnan (F) wrote:
>
> On 2016/7/8 15:57, Brendan Gregg wrote:
>> [...]
>> I mean just an -F99 that executes a BPF program on each sample. My
>> most common use for perf is:
>>
>> perf record -F 99 -a -g -- sleep 30
>> perf report (or perf script, for making flame graphs)
>>
>> But this uses perf.data as an intermediate file. With the recent
>> BPF_MAP_TYPE_STACK_TRACE, we could frequency count stack traces in
>> kernel context, and just dump a report. Much more efficient. And
>> improving a very common perf one-liner.
>
> You can't attach BPF script to samples other than kprobe and tracepoints.
> When you use 'perf record -F99 -a -g -- sleep 30', you are sampling on
> 'cycles:ppp' event. This is a hardware PMU event.

Sure, either cycles:ppp or cpu-clock (my Xen guests have no PMU,
sadly). But these are ultimately calling perf_swevent_hrtimer()/etc, so
I was wondering if someone was already looking at enhancing this code
to support BPF? Ie, BPF should be able to attach to kprobes, uprobes,
tracepoints, and timer-based samples.

> If we find a kprobe or tracepoint event which would be triggered 99 times
> in each second, we can utilize BPF_MAP_TYPE_STACK_TRACE and
> bpf_get_stackid().

Yes, that should be a workaround. It's annoying as some like
perf_swevent_hrtimer() can't be kprobed (inlined?), but I found
perf_misc_flags(struct pt_regs *regs) was called, but passing in that
regs to bpf_get_stackid() was returning "type=inv expected=ctx"
errors, despite casting. I'm guessing the BPF ctx type is special and
can't be cast, but need to dig more.

Brendan
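
The idea of frequency counting stack traces instead of writing every sample to perf.data can be illustrated with a small user-space sketch. It only models the counting step: each sample is reduced to an integer stack id (in a real BPF program that id would come from bpf_get_stackid(), which reports failure as a negative value) and a fixed-size table keeps one counter per id. The table layout, MAX_STACKS and the function names are hypothetical and are not taken from perf or the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STACKS 64 /* hypothetical table capacity */

struct stack_count {
	int used;            /* 0 while the slot is empty */
	int stack_id;        /* id of the stack trace counted in this slot */
	unsigned long count; /* number of samples seen with this stack */
};

struct stack_counts {
	struct stack_count slot[MAX_STACKS];
};

/* Count one sample. Returns 0 on success, -1 for an invalid id or a full table. */
int count_sample(struct stack_counts *t, int stack_id)
{
	assert(t != NULL);
	if (stack_id < 0)
		return -1; /* mirrors a failed stack lookup */

	unsigned int start = (unsigned int)stack_id % MAX_STACKS;
	for (unsigned int i = 0; i < MAX_STACKS; i++) {
		/* Linear probing: walk forward until the id or a free slot is found. */
		struct stack_count *s = &t->slot[(start + i) % MAX_STACKS];
		if (!s->used) {
			s->used = 1;
			s->stack_id = stack_id;
			s->count = 1;
			return 0;
		}
		if (s->stack_id == stack_id) {
			s->count++;
			return 0;
		}
	}
	return -1;
}

/* Return how many samples were counted for stack_id (0 if never seen). */
unsigned long samples_for(const struct stack_counts *t, int stack_id)
{
	assert(t != NULL);
	if (stack_id < 0)
		return 0;

	unsigned int start = (unsigned int)stack_id % MAX_STACKS;
	for (unsigned int i = 0; i < MAX_STACKS; i++) {
		const struct stack_count *s = &t->slot[(start + i) % MAX_STACKS];
		if (!s->used)
			return 0;
		if (s->stack_id == stack_id)
			return s->count;
	}
	return 0;
}
```

With this shape, the report is a single pass over the table at the end of the run, so only one record per distinct stack is emitted rather than one record per sample.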
Re: perf bpf examples
On 2016/7/8 15:57, Brendan Gregg wrote:
> On Thu, Jul 7, 2016 at 9:18 PM, Wangnan (F) wrote:
>> On 2016/7/8 1:58, Brendan Gregg wrote:
>>> On Thu, Jul 7, 2016 at 10:54 AM, Brendan Gregg wrote:
>>>> On Wed, Jul 6, 2016 at 6:49 PM, Wangnan (F) wrote:
[...]
>>> ... Also, has anyone looked into perf sampling (-F 99) with bpf yet?
>>> Thanks,
>>
>> Theoretically, a BPF program is an additional filter that decides
>> whether an event should be filtered out or passed to perf. -F 99
>> is another filter, which drops samples to ensure the frequency.
>> The filters work together. The full graph should be:
>>
>>  BPF --> traditional filter --> proc (system wide or proc specific) --> period
>>
>> See the example at the end of this mail. The BPF program returns 0 for
>> half of the events, and the result should be symmetrical. We can get a
>> similar result without -F:
>>
>> # ~/perf record -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
>> 8388480+0 records in
>> 8388480+0 records out
>> 4294901760 bytes (4.3 GB) copied, 11.9908 s, 358 MB/s
>> [ perf record: Woken up 28 times to write data ]
>> [ perf record: Captured and wrote 303.915 MB perf.data (4194449 samples) ]
>> #
>> root@wn-Lenovo-Product:~# ~/perf record -a --clang-opt '-DCATCH_EVEN' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
>> 8388480+0 records in
>> 8388480+0 records out
>> 4294901760 bytes (4.3 GB) copied, 12.1154 s, 355 MB/s
>> [ perf record: Woken up 54 times to write data ]
>> [ perf record: Captured and wrote 303.933 MB perf.data (4194347 samples) ]
>>
>> With -F99 added:
>>
>> # ~/perf record -F99 -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
>> 8388480+0 records in
>> 8388480+0 records out
>> 4294901760 bytes (4.3 GB) copied, 9.60126 s, 447 MB/s
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.402 MB perf.data (35 samples) ]
>> # ~/perf record -F99 -a --clang-opt '-DCATCH_EVEN' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
>> 8388480+0 records in
>> 8388480+0 records out
>> 4294901760 bytes (4.3 GB) copied, 9.76719 s, 440 MB/s
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.399 MB perf.data (37 samples) ]
>
> That looks like it's doing two different things: -F99, and a
> sampling.c script (SEC("func=sys_read")).
>
> I mean just an -F99 that executes a BPF program on each sample. My
> most common use for perf is:
>
> perf record -F 99 -a -g -- sleep 30
> perf report (or perf script, for making flame graphs)
>
> But this uses perf.data as an intermediate file. With the recent
> BPF_MAP_TYPE_STACK_TRACE, we could frequency count stack traces in
> kernel context, and just dump a report. Much more efficient. And
> improving a very common perf one-liner.

You can't attach BPF script to samples other than kprobe and tracepoints.
When you use 'perf record -F99 -a -g -- sleep 30', you are sampling on
'cycles:ppp' event. This is a hardware PMU event.

If we find a kprobe or tracepoint event which would be triggered 99 times
in each second, we can utilize BPF_MAP_TYPE_STACK_TRACE and
bpf_get_stackid().

Thank you.
Re: perf bpf examples
On Thu, Jul 7, 2016 at 9:18 PM, Wangnan (F) wrote:
>
> On 2016/7/8 1:58, Brendan Gregg wrote:
>>
>> On Thu, Jul 7, 2016 at 10:54 AM, Brendan Gregg wrote:
>>>
>>> On Wed, Jul 6, 2016 at 6:49 PM, Wangnan (F) wrote:
[...]
>> ... Also, has anyone looked into perf sampling (-F 99) with bpf yet?
>> Thanks,
>
> Theoretically, a BPF program is an additional filter that decides
> whether an event should be filtered out or passed to perf. -F 99
> is another filter, which drops samples to ensure the frequency.
> The filters work together. The full graph should be:
>
>  BPF --> traditional filter --> proc (system wide or proc specific) --> period
>
> See the example at the end of this mail. The BPF program returns 0 for
> half of the events, and the result should be symmetrical. We can get a
> similar result without -F:
>
> # ~/perf record -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
> 8388480+0 records in
> 8388480+0 records out
> 4294901760 bytes (4.3 GB) copied, 11.9908 s, 358 MB/s
> [ perf record: Woken up 28 times to write data ]
> [ perf record: Captured and wrote 303.915 MB perf.data (4194449 samples) ]
> #
> root@wn-Lenovo-Product:~# ~/perf record -a --clang-opt '-DCATCH_EVEN' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
> 8388480+0 records in
> 8388480+0 records out
> 4294901760 bytes (4.3 GB) copied, 12.1154 s, 355 MB/s
> [ perf record: Woken up 54 times to write data ]
> [ perf record: Captured and wrote 303.933 MB perf.data (4194347 samples) ]
>
> With -F99 added:
>
> # ~/perf record -F99 -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
> 8388480+0 records in
> 8388480+0 records out
> 4294901760 bytes (4.3 GB) copied, 9.60126 s, 447 MB/s
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.402 MB perf.data (35 samples) ]
> # ~/perf record -F99 -a --clang-opt '-DCATCH_EVEN' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
> 8388480+0 records in
> 8388480+0 records out
> 4294901760 bytes (4.3 GB) copied, 9.76719 s, 440 MB/s
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.399 MB perf.data (37 samples) ]

That looks like it's doing two different things: -F99, and a
sampling.c script (SEC("func=sys_read")).

I mean just an -F99 that executes a BPF program on each sample. My
most common use for perf is:

perf record -F 99 -a -g -- sleep 30
perf report (or perf script, for making flame graphs)

But this uses perf.data as an intermediate file. With the recent
BPF_MAP_TYPE_STACK_TRACE, we could frequency count stack traces in
kernel context, and just dump a report. Much more efficient. And
improving a very common perf one-liner.

Brendan
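
The quoted filter chain (a BPF filter first, a frequency filter last) can be modelled with a toy user-space simulation. Everything here is an assumption made for illustration: the odd/even rule stands in for whatever sampling.c does with CATCH_ODD and CATCH_EVEN, and the rate limiter is a crude fixed-interval stand-in for -F, whereas perf adjusts its sampling period dynamically. The sketch shows only that the two stages compose; it does not try to explain the 35 and 37 sample counts quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Stage 1: stand-in for the BPF program. Returns 1 to keep the event. */
int bpf_filter(uint64_t event_nr, int catch_odd)
{
	return (event_nr & 1) ? catch_odd : !catch_odd;
}

/* Stage 2: crude stand-in for -F: at most one sample per interval. */
struct rate_limiter {
	uint64_t interval_ns; /* minimum spacing between kept samples */
	uint64_t next_ok_ns;  /* earliest time the next sample may pass */
};

int rate_filter(struct rate_limiter *r, uint64_t now_ns)
{
	if (now_ns < r->next_ok_ns)
		return 0;
	r->next_ok_ns = now_ns + r->interval_ns;
	return 1;
}

/*
 * Push `events` evenly spaced events over `duration_ns` through both stages
 * and return how many survive. freq_hz == 0 disables the second stage.
 */
uint64_t simulate(uint64_t events, uint64_t duration_ns, int catch_odd,
		  unsigned int freq_hz)
{
	assert(events > 0);
	struct rate_limiter r = {
		.interval_ns = freq_hz ? 1000000000ULL / freq_hz : 0,
		.next_ok_ns = 0,
	};
	uint64_t spacing_ns = duration_ns / events;
	uint64_t kept = 0;

	for (uint64_t i = 0; i < events; i++) {
		if (!bpf_filter(i, catch_odd))
			continue; /* dropped by the first filter */
		if (freq_hz && !rate_filter(&r, i * spacing_ns))
			continue; /* dropped by the frequency filter */
		kept++;
	}
	return kept;
}
```

In this model the odd and even variants keep the same number of events with the frequency stage disabled, and the frequency stage caps both at roughly freq_hz times the run length in seconds.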
Re: perf bpf examples
On 2016/7/8 1:58, Brendan Gregg wrote:
> On Thu, Jul 7, 2016 at 10:54 AM, Brendan Gregg wrote:
>> On Wed, Jul 6, 2016 at 6:49 PM, Wangnan (F) wrote:
>>> On 2016/7/7 4:29, Brendan Gregg wrote:
>>>> G'Day,
>>>>
>>>> Are perf bpf examples shared anywhere? I've seen many posted to lkml
>>>> (by Wang Nan), but don't see them in the linux source, or
>>>> documentation. Would be very handy to throw them all up somewhere for
>>>> searching/learning, if that hasn't already happened, eg, github.
>>>>
>>>> I was also looking to see if perf bpf supports sampling yet, but I
>>>> don't think it does. Eg, imagine a:
>>>>
>>>> perf record -F 99 -e bpf_process_samples.c -a -- sleep 10
>>>>
>>>> which would require BPF attaching to perf_swevent_hrtimer()/etc, and
>>>> also emitting a map (eg, sampled instruction pointer counts). I don't
>>>> think perf currently does either, but was hoping for a collection of
>>>> examples to double check.
>>>
>>> Currently perf-bpf doesn't support dumping resulting maps, but we are
>>> working on it. I think you have read our uBPF approach:
>>>
>>> http://article.gmane.org/gmane.linux.kernel/2203717
>>>
>>> and
>>>
>>> http://article.gmane.org/gmane.linux.kernel/2253579
>>>
>>> in them we embedded a uBPF virtual machine to perf and give it the
>>> ability to operate the result in maps.
>>>
>>> Now we are trying another approach, introduce LLVM to perf, compile
>>> data analysis and report to code. It would be much more powerful.
>>
>> Great, thanks! But what about a set of examples covering the existing
>> perf+bpf capabilities so far? I know you've emailed them to lkml, but
>> has someone put them all in one place yet? If not, I can go through
>> lkml and at least put them on github so we can search and learn from
>> them.

Great. Thanks a lot.

> ... Also, has anyone looked into perf sampling (-F 99) with bpf yet?
> Thanks,

Theoretically, a BPF program is an additional filter that decides
whether an event should be filtered out or passed to perf. -F 99
is another filter, which drops samples to ensure the frequency.
The filters work together. The full graph should be:

 BPF --> traditional filter --> proc (system wide or proc specific) --> period

See the example at the end of this mail. The BPF program returns 0 for
half of the events, and the result should be symmetrical. We can get a
similar result without -F:

# ~/perf record -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
8388480+0 records in
8388480+0 records out
4294901760 bytes (4.3 GB) copied, 11.9908 s, 358 MB/s
[ perf record: Woken up 28 times to write data ]
[ perf record: Captured and wrote 303.915 MB perf.data (4194449 samples) ]
#
root@wn-Lenovo-Product:~# ~/perf record -a --clang-opt '-DCATCH_EVEN' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
8388480+0 records in
8388480+0 records out
4294901760 bytes (4.3 GB) copied, 12.1154 s, 355 MB/s
[ perf record: Woken up 54 times to write data ]
[ perf record: Captured and wrote 303.933 MB perf.data (4194347 samples) ]

With -F99 added:

# ~/perf record -F99 -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
8388480+0 records in
8388480+0 records out
4294901760 bytes (4.3 GB) copied, 9.60126 s, 447 MB/s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.402 MB perf.data (35 samples) ]
# ~/perf record -F99 -a --clang-opt '-DCATCH_EVEN' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
8388480+0 records in
8388480+0 records out
4294901760 bytes (4.3 GB) copied, 9.76719 s, 440 MB/s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.399 MB perf.data (37 samples) ]

However, there must be something I don't understand. It takes nearly 10
seconds to finish the record, so we should get nearly 1000 samples.
Sometimes I can get about 500 samples:

# ~/perf record -F99 -a --clang-opt '-DCATCH_ODD' -e ./sampling.c dd if=/dev/zero of=/dev/null count=8388480
8388480+0 records in
8388480+0 records out
4294901760 bytes (4.3 GB) copied, 9.60536 s, 447 MB/s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.431 MB perf.data (555 samples) ]

#include
#define SEC(NAME) __attribute__((section(NAME), used))
struct bpf_map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};
struct bpf_map_def SEC("maps") m = {
	.type = BPF_MAP_TYPE_ARRAY,
	.key_size = sizeof(int),
	.value_size = sizeof(int),
	.max_entries = 1,
};
static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
	(void *)BPF_FUNC_map_lookup_elem;
static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
	(void *)BPF_FUNC_trace_printk;
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
#ifdef CATCH_ODD
# define RET_ODD 1
# define RET_EVEN 0
#endif
#ifdef CATCH_EVEN
# define RET_ODD 0
# define RET_EVEN 1
#endif
SEC("func=sy