[PATCH bpf-next v2] bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON
When CONFIG_BPF_JIT_ALWAYS_ON is enabled, the kernel restricts bpf_jit_enable to the
fixed value 1, so it cannot be set to 2 for JIT opcode dumping; update the
documentation to reflect this.

Suggested-by: Daniel Borkmann <dan...@iogearbox.net>
Signed-off-by: Leo Yan <leo@linaro.org>
---
 Documentation/networking/filter.txt | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index fd55c7d..5032e12 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -483,6 +483,12 @@ Example output from dmesg:
 [ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
 [ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3

+When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set to 1 and
+setting any other value than that will return in failure. This is even the case for
+setting bpf_jit_enable to 2, since dumping the final JIT image into the kernel log
+is discouraged and introspection through bpftool (under tools/bpf/bpftool/) is the
+generally recommended approach instead.
+
 In the kernel source tree under tools/bpf/, there's bpf_jit_disasm for
 generating disassembly out of the kernel log's hexdump:
--
1.9.1
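For reference, the bpftool-based introspection the patch points to looks roughly like the commands below. This is a usage sketch: program IDs are placeholders, the commands need a privileged user, and availability depends on the running kernel supporting BPF_OBJ_GET_INFO_BY_FD and on bpftool being built from tools/bpf/bpftool/.

```shell
# List loaded BPF programs to find the ID of interest
bpftool prog show

# Dump the JITed image of program ID 1 (the ID here is an example)
bpftool prog dump jited id 1

# The translated BPF bytecode can be dumped the same way
bpftool prog dump xlated id 1
```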
Re: [PATCH bpf-next] bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON
On Fri, Apr 27, 2018 at 11:44:44AM +0200, Daniel Borkmann wrote:
> On 04/26/2018 04:26 AM, Leo Yan wrote:
> > When CONFIG_BPF_JIT_ALWAYS_ON is enabled, kernel has limitation for
> > bpf_jit_enable, so it has fixed value 1 and we cannot set it to 2
> > for JIT opcode dumping; this patch is to update the doc for it.
> >
> > Signed-off-by: Leo Yan <leo@linaro.org>
> > ---
> >  Documentation/networking/filter.txt | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
> > index fd55c7d..feddab9 100644
> > --- a/Documentation/networking/filter.txt
> > +++ b/Documentation/networking/filter.txt
> > @@ -483,6 +483,12 @@ Example output from dmesg:
> >  [ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
> >  [ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3
> >
> > +When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is set to 1 by default
> > +and it returns failure if change to any other value from proc node; this is
> > +for security consideration to avoid leaking info to unprivileged users. In this
> > +case, we can't directly dump JIT opcode image from kernel log, alternatively we
> > +need to use bpf tool for the dumping.
> > +
>
> Could you change this doc text a bit, I think it's slightly misleading. From the first
> sentence one could also interpret that value 0 would leaking info to unprivileged users
> whereas here we're only talking about the case of value 2. Maybe something roughly like
> this to make it more clear:
>
> When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set to 1 and
> setting any other value than that will return in failure. This is even the case for
> setting bpf_jit_enable to 2, since dumping the final JIT image into the kernel log
> is discouraged and introspection through bpftool (under tools/bpf/bpftool/) is the
> generally recommended approach instead.
Yeah, your rephrasing is clearer and better. Will do this and send a new patch soon. Thanks for your help.

> Thanks,
> Daniel
Re: [PATCH bpf-next] bpf: Allow bpf_jit_enable = 2 with BPF_JIT_ALWAYS_ON config
On Wed, Apr 25, 2018 at 05:37:39PM +0200, Daniel Borkmann wrote:
> On 04/25/2018 04:14 PM, Alexei Starovoitov wrote:
> > On Wed, Apr 25, 2018 at 05:25:47PM +0800, Leo Yan wrote:
> >>
> >> If we have concern for security issue, should we remove support for
> >> 'bpf_jit_enable = 2' and modify the doc to reflect this change?
> >
> > I suggest to fix the doc.
>
> Agree, lets do that instead. Leo, could you cook a patch for that?

Sure, have sent new patch for this. Thanks for suggestion!

> Thanks,
> Daniel
[PATCH bpf-next] bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON
When CONFIG_BPF_JIT_ALWAYS_ON is enabled, kernel has limitation for
bpf_jit_enable, so it has fixed value 1 and we cannot set it to 2
for JIT opcode dumping; this patch is to update the doc for it.

Signed-off-by: Leo Yan <leo@linaro.org>
---
 Documentation/networking/filter.txt | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index fd55c7d..feddab9 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -483,6 +483,12 @@ Example output from dmesg:
 [ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
 [ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3

+When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is set to 1 by default
+and it returns failure if change to any other value from proc node; this is
+for security consideration to avoid leaking info to unprivileged users. In this
+case, we can't directly dump JIT opcode image from kernel log, alternatively we
+need to use bpf tool for the dumping.
+
 In the kernel source tree under tools/bpf/, there's bpf_jit_disasm for
 generating disassembly out of the kernel log's hexdump:
--
1.9.1
Re: [PATCH bpf-next 1/5] samples/bpf: Fix typo in comment
On Fri, Apr 20, 2018 at 03:52:13PM +0200, Jesper Dangaard Brouer wrote:
> On Fri, 20 Apr 2018 14:21:16 +0100
> Daniel Thompson <daniel.thomp...@linaro.org> wrote:
>
> > On Fri, Apr 20, 2018 at 02:10:04PM +0200, Jesper Dangaard Brouer wrote:
> > >
> > > On Thu, 19 Apr 2018 09:34:02 +0800 Leo Yan <leo@linaro.org> wrote:
> > >
> > > > Fix typo by replacing 'iif' with 'if'.
> > > >
> > > > Signed-off-by: Leo Yan <leo@linaro.org>
> > > > ---
> > > >  samples/bpf/bpf_load.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
> > > > index bebe418..28e4678 100644
> > > > --- a/samples/bpf/bpf_load.c
> > > > +++ b/samples/bpf/bpf_load.c
> > > > @@ -393,7 +393,7 @@ static int load_elf_maps_section(struct bpf_map_data *maps, int maps_shndx,
> > > >  			continue;
> > > >  		if (sym[nr_maps].st_shndx != maps_shndx)
> > > >  			continue;
> > > > -		/* Only increment iif maps section */
> > > > +		/* Only increment if maps section */
> > > >  		nr_maps++;
> > > >  	}
> > >
> > > This was actually not a typo from my side.
> > >
> > > With 'iif' I mean 'if and only if' ... but it doesn't matter much.
> >
> > I think 'if and only if' is more commonly abbreviated 'iff' isn't it?
>
> Ah, yes![1] -- then it *is* actually a typo! - LOL
>
> I'm fine with changing this to "if" :-)

Thanks for the review, Daniel & Jesper. I also learned it from the discussion :)

> [1] https://en.wikipedia.org/wiki/If_and_only_if
>
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH bpf-next] bpf: Allow bpf_jit_enable = 2 with BPF_JIT_ALWAYS_ON config
Hi Daniel,

On Wed, Apr 25, 2018 at 11:12:21AM +0200, Daniel Borkmann wrote:
> On 04/25/2018 10:18 AM, Leo Yan wrote:
> > After enabled BPF_JIT_ALWAYS_ON config, bpf_jit_enable always equals to
> > 1; it is impossible to set 'bpf_jit_enable = 2' and the kernel has no
> > chance to call bpf_jit_dump().
> >
> > This patch relaxes bpf_jit_enable range to [1..2] when kernel config
> > BPF_JIT_ALWAYS_ON is enabled so can invoke jit dump.
> >
> > Signed-off-by: Leo Yan <leo@linaro.org>
>
> Is there a specific reason why you need this here instead of retrieving the
> dump from the newer interface available from bpftool (tools/bpf/bpftool/)?
> The bpf_jit_enable = 2 is not recommended these days since it dumps into the
> kernel log which is often readable from unpriv as well. bpftool makes use
> of the BPF_OBJ_GET_INFO_BY_FD interface via bpf syscall to get the JIT dump
> instead when bpf_jit_enable is set.

Thanks for reviewing.

When I read the doc Documentation/networking/filter.txt and the section
"JIT compiler" it suggests as below. So I tried to set 'bpf_jit_enable = 2'
to dump JIT code, but it failed.

If we have concern for security issue, should we remove support for
'bpf_jit_enable = 2' and modify the doc to reflect this change?

---8<---
For JIT developers, doing audits etc, each compile run can output the
generated opcode image into the kernel log via:

echo 2 > /proc/sys/net/core/bpf_jit_enable

Example output from dmesg:

[ 3389.935842] flen=6 proglen=70 pass=3 image=ffffffffa0069c8f
[ 3389.935847] JIT code: 00000000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 68
[ 3389.935849] JIT code: 00000010: 44 2b 4f 6c 4c 8b 87 d8 00 00 00 be 0c 00 00 00
[ 3389.935850] JIT code: 00000020: e8 1d 94 ff e0 3d 00 08 00 00 75 16 be 17 00 00
[ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
[ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3

> > ---
> >  net/core/sysctl_net_core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
> > index b1a2c5e..6a39b22 100644
> > --- a/net/core/sysctl_net_core.c
> > +++ b/net/core/sysctl_net_core.c
> > @@ -371,7 +371,7 @@ static int proc_dointvec_minmax_bpf_enable(struct ctl_table *table, int write,
> >  		.proc_handler	= proc_dointvec_minmax_bpf_enable,
> >  # ifdef CONFIG_BPF_JIT_ALWAYS_ON
> >  		.extra1		= &one,
> > -		.extra2		= &one,
> > +		.extra2		= &two,
> >  # else
> >  		.extra1		= &zero,
> >  		.extra2		= &two,
> >
[PATCH bpf-next] bpf: Allow bpf_jit_enable = 2 with BPF_JIT_ALWAYS_ON config
After enabled BPF_JIT_ALWAYS_ON config, bpf_jit_enable always equals to
1; it is impossible to set 'bpf_jit_enable = 2' and the kernel has no
chance to call bpf_jit_dump().

This patch relaxes bpf_jit_enable range to [1..2] when kernel config
BPF_JIT_ALWAYS_ON is enabled so can invoke jit dump.

Signed-off-by: Leo Yan <leo@linaro.org>
---
 net/core/sysctl_net_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index b1a2c5e..6a39b22 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -371,7 +371,7 @@ static int proc_dointvec_minmax_bpf_enable(struct ctl_table *table, int write,
 		.proc_handler	= proc_dointvec_minmax_bpf_enable,
 # ifdef CONFIG_BPF_JIT_ALWAYS_ON
 		.extra1		= &one,
-		.extra2		= &one,
+		.extra2		= &two,
 # else
 		.extra1		= &zero,
 		.extra2		= &two,
--
1.9.1
Re: [PATCH bpf-next 4/5] samples/bpf: Refine printing symbol for sampleip
On Wed, Apr 18, 2018 at 10:21:25PM -0700, Alexei Starovoitov wrote:
> On Thu, Apr 19, 2018 at 01:12:49PM +0800, Leo Yan wrote:
> > On Wed, Apr 18, 2018 at 09:47:45PM -0700, Alexei Starovoitov wrote:
> > > On Thu, Apr 19, 2018 at 09:34:05AM +0800, Leo Yan wrote:
> > > > The code defines macro 'PAGE_OFFSET' and uses it to decide if the
> > > > address is in kernel space or not. But different architecture has
> > > > different 'PAGE_OFFSET' so this program cannot be used for all
> > > > platforms.
> > > >
> > > > This commit changes to check returned pointer from ksym_search() to
> > > > judge if the address falls into kernel space or not, and removes
> > > > macro 'PAGE_OFFSET' as it isn't used anymore. As result, this program
> > > > has no architecture dependency.
> > > >
> > > > Signed-off-by: Leo Yan <leo@linaro.org>
> > > > ---
> > > >  samples/bpf/sampleip_user.c | 8 +++-----
> > > >  1 file changed, 3 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/samples/bpf/sampleip_user.c b/samples/bpf/sampleip_user.c
> > > > index 4ed690b..0eea1b3 100644
> > > > --- a/samples/bpf/sampleip_user.c
> > > > +++ b/samples/bpf/sampleip_user.c
> > > > @@ -26,7 +26,6 @@
> > > >  #define DEFAULT_FREQ	99
> > > >  #define DEFAULT_SECS	5
> > > >  #define MAX_IPS		8192
> > > > -#define PAGE_OFFSET	0xffff880000000000
> > > >
> > > >  static int nr_cpus;
> > > >
> > > > @@ -107,14 +106,13 @@ static void print_ip_map(int fd)
> > > >  	/* sort and print */
> > > >  	qsort(counts, max, sizeof(struct ipcount), count_cmp);
> > > >  	for (i = 0; i < max; i++) {
> > > > -		if (counts[i].ip > PAGE_OFFSET) {
> > > > -			sym = ksym_search(counts[i].ip);
> > >
> > > yes. it is x64 specific, since it's a sample code,
> > > but simply removing it is not a fix.
> > > It makes this sampleip code behaving incorrectly.
> > > To do such 'cleanup of ksym' please refactor it in the true generic way,
> > > so these ksym* helpers can work on all archs and put this new
> > > functionality into selftests.
> >
> > Just want to check, are you suggesting to create a standalone
> > testing for kallsyms (like a folder tools/testing/selftests/ksym) or
> > do you mean to place the generic code into folder
> > tools/testing/selftests/bpf/?
>
> I think the minimal first step is to cleanup ksym bits into something
> that bpf selftests can use and keep it as new .c in
> tools/testing/selftests/bpf/
> Later if it becomes useful for other tests in selftests it can be moved.

Thanks for explanation, now it's clear for me :)

> Thanks!
Re: [PATCH bpf-next 4/5] samples/bpf: Refine printing symbol for sampleip
On Wed, Apr 18, 2018 at 09:47:45PM -0700, Alexei Starovoitov wrote:
> On Thu, Apr 19, 2018 at 09:34:05AM +0800, Leo Yan wrote:
> > The code defines macro 'PAGE_OFFSET' and uses it to decide if the
> > address is in kernel space or not. But different architecture has
> > different 'PAGE_OFFSET' so this program cannot be used for all
> > platforms.
> >
> > This commit changes to check returned pointer from ksym_search() to
> > judge if the address falls into kernel space or not, and removes
> > macro 'PAGE_OFFSET' as it isn't used anymore. As result, this program
> > has no architecture dependency.
> >
> > Signed-off-by: Leo Yan <leo@linaro.org>
> > ---
> >  samples/bpf/sampleip_user.c | 8 +++-----
> >  1 file changed, 3 insertions(+), 5 deletions(-)
> >
> > diff --git a/samples/bpf/sampleip_user.c b/samples/bpf/sampleip_user.c
> > index 4ed690b..0eea1b3 100644
> > --- a/samples/bpf/sampleip_user.c
> > +++ b/samples/bpf/sampleip_user.c
> > @@ -26,7 +26,6 @@
> >  #define DEFAULT_FREQ	99
> >  #define DEFAULT_SECS	5
> >  #define MAX_IPS		8192
> > -#define PAGE_OFFSET	0xffff880000000000
> >
> >  static int nr_cpus;
> >
> > @@ -107,14 +106,13 @@ static void print_ip_map(int fd)
> >  	/* sort and print */
> >  	qsort(counts, max, sizeof(struct ipcount), count_cmp);
> >  	for (i = 0; i < max; i++) {
> > -		if (counts[i].ip > PAGE_OFFSET) {
> > -			sym = ksym_search(counts[i].ip);
>
> yes. it is x64 specific, since it's a sample code,
> but simply removing it is not a fix.
> It makes this sampleip code behaving incorrectly.
> To do such 'cleanup of ksym' please refactor it in the true generic way,
> so these ksym* helpers can work on all archs and put this new
> functionality into selftests.

Just want to check, are you suggesting to create a standalone
testing for kallsyms (like a folder tools/testing/selftests/ksym) or
do you mean to place the generic code into folder
tools/testing/selftests/bpf/?

> If you can convert these examples into proper self-checking tests
> that run out of selftests that would be awesome.

Thank you for suggestions, Alexei.

> Thanks!
[PATCH bpf-next 0/5] samples/bpf: Minor fixes and cleanup
This patch series is minor fixes and cleanup for bpf load and samples code.

The first patch is a typo fix; patch 0002 refactors the code to dynamically
allocate memory for the kernel symbol structures; the last three patches are
mainly related to refactoring function ksym_search(). The main benefit of this
refactoring is that the sampleip program can be used without architecture
dependency.

The patch series has been tested on ARM64 Hikey960 boards.

Leo Yan (5):
  samples/bpf: Fix typo in comment
  samples/bpf: Dynamically allocate structure 'syms'
  samples/bpf: Use NULL for failed to find symbol
  samples/bpf: Refine printing symbol for sampleip
  samples/bpf: Handle NULL pointer returned by ksym_search()

 samples/bpf/bpf_load.c         | 29 +++++++++++++++++++++++------
 samples/bpf/offwaketime_user.c |  5 +++++
 samples/bpf/sampleip_user.c    |  8 +++-----
 samples/bpf/spintest_user.c    |  5 ++++-
 samples/bpf/trace_event_user.c |  5 +++++
 5 files changed, 40 insertions(+), 12 deletions(-)

--
1.9.1
[PATCH bpf-next 1/5] samples/bpf: Fix typo in comment
Fix typo by replacing 'iif' with 'if'.

Signed-off-by: Leo Yan <leo@linaro.org>
---
 samples/bpf/bpf_load.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index bebe418..28e4678 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -393,7 +393,7 @@ static int load_elf_maps_section(struct bpf_map_data *maps, int maps_shndx,
 			continue;
 		if (sym[nr_maps].st_shndx != maps_shndx)
 			continue;
-		/* Only increment iif maps section */
+		/* Only increment if maps section */
 		nr_maps++;
 	}
--
1.9.1
[PATCH bpf-next 3/5] samples/bpf: Use NULL for failed to find symbol
Function ksym_search() is used to parse address and return the symbol
structure, when the address is out of range for kernel symbols it
returns the symbol structure of kernel '_stext' entry; this introduces
confusion and it misses the chance to intuitively tell the address is
out of range.

This commit changes to use NULL pointer for failed to find symbol, user
functions need to check the pointer is NULL and get to know the address
has no corresponding kernel symbol for it.

Signed-off-by: Leo Yan <leo@linaro.org>
---
 samples/bpf/bpf_load.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index c2bf7ca..0c0584f 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -726,7 +726,7 @@ struct ksym *ksym_search(long key)
 		/* valid ksym */
 		return &syms[start - 1];

-	/* out of range. return _stext */
-	return &syms[0];
+	/* out of range. return NULL */
+	return NULL;
 }
--
1.9.1
[PATCH bpf-next 2/5] samples/bpf: Dynamically allocate structure 'syms'
Structure 'syms' is used to store kernel symbol info by reading proc fs
node '/proc/kallsyms', this structure is declared with 300000 entries
and static linked into bss section. For most case the kernel symbols
has less than 300000 entries, so it's safe to define so large array,
but the side effect is bss section is big introduced by this structure
and it isn't flexible.

To fix this, this patch dynamically allocates memory for structure
'syms' based on parsing '/proc/kallsyms' line number at the runtime,
which can save elf file required memory significantly.

Before:
   text	   data	    bss	    dec	    hex	filename
  18841	   1172	5199776	5219789	 4fa5cd	samples/bpf/sampleip

After:
   text	   data	    bss	    dec	    hex	filename
  19101	   1188	 399792	 420081	  668f1	samples/bpf/sampleip

Signed-off-by: Leo Yan <leo@linaro.org>
---
 samples/bpf/bpf_load.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index 28e4678..c2bf7ca 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -651,8 +651,7 @@ void read_trace_pipe(void)
 	}
 }

-#define MAX_SYMS 300000
-static struct ksym syms[MAX_SYMS];
+static struct ksym *syms;
 static int sym_cnt;

 static int ksym_cmp(const void *p1, const void *p2)
@@ -678,12 +677,30 @@ int load_kallsyms(void)
 			break;
 		if (!addr)
 			continue;
+		sym_cnt++;
+	}
+
+	syms = calloc(sym_cnt, sizeof(*syms));
+	if (!syms) {
+		fclose(f);
+		return -ENOMEM;
+	}
+
+	rewind(f);
+	while (!feof(f)) {
+		if (!fgets(buf, sizeof(buf), f))
+			break;
+		if (sscanf(buf, "%p %c %s", &addr, &symbol, func) != 3)
+			break;
+		if (!addr)
+			continue;
 		syms[i].addr = (long) addr;
 		syms[i].name = strdup(func);
 		i++;
 	}
-	sym_cnt = i;
 	qsort(syms, sym_cnt, sizeof(struct ksym), ksym_cmp);
+
+	fclose(f);
 	return 0;
 }
--
1.9.1
[PATCH bpf-next 4/5] samples/bpf: Refine printing symbol for sampleip
The code defines macro 'PAGE_OFFSET' and uses it to decide if the
address is in kernel space or not. But different architecture has
different 'PAGE_OFFSET' so this program cannot be used for all
platforms.

This commit changes to check returned pointer from ksym_search() to
judge if the address falls into kernel space or not, and removes
macro 'PAGE_OFFSET' as it isn't used anymore. As result, this program
has no architecture dependency.

Signed-off-by: Leo Yan <leo@linaro.org>
---
 samples/bpf/sampleip_user.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/samples/bpf/sampleip_user.c b/samples/bpf/sampleip_user.c
index 4ed690b..0eea1b3 100644
--- a/samples/bpf/sampleip_user.c
+++ b/samples/bpf/sampleip_user.c
@@ -26,7 +26,6 @@
 #define DEFAULT_FREQ	99
 #define DEFAULT_SECS	5
 #define MAX_IPS		8192
-#define PAGE_OFFSET	0xffff880000000000

 static int nr_cpus;

@@ -107,14 +106,13 @@ static void print_ip_map(int fd)
 	/* sort and print */
 	qsort(counts, max, sizeof(struct ipcount), count_cmp);
 	for (i = 0; i < max; i++) {
-		if (counts[i].ip > PAGE_OFFSET) {
-			sym = ksym_search(counts[i].ip);
+		sym = ksym_search(counts[i].ip);
+		if (sym)
 			printf("0x%-17llx %-32s %u\n", counts[i].ip, sym->name,
 			       counts[i].count);
-		} else {
+		else
 			printf("0x%-17llx %-32s %u\n", counts[i].ip, "(user)",
 			       counts[i].count);
-		}
 	}

 	if (max == MAX_IPS) {
--
1.9.1
[PATCH bpf-next 5/5] samples/bpf: Handle NULL pointer returned by ksym_search()
This commit handles NULL pointer returned by ksym_search() to directly
print address hexadecimal value, the change is applied in
'trace_event', 'spintest' and 'offwaketime' programs.

Signed-off-by: Leo Yan <leo@linaro.org>
---
 samples/bpf/offwaketime_user.c | 5 +++++
 samples/bpf/spintest_user.c    | 5 ++++-
 samples/bpf/trace_event_user.c | 5 +++++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/samples/bpf/offwaketime_user.c b/samples/bpf/offwaketime_user.c
index 512f87a..fce2113 100644
--- a/samples/bpf/offwaketime_user.c
+++ b/samples/bpf/offwaketime_user.c
@@ -27,6 +27,11 @@ static void print_ksym(__u64 addr)
 	if (!addr)
 		return;
 	sym = ksym_search(addr);
+	if (!sym) {
+		printf("%llx;", addr);
+		return;
+	}
+
 	if (PRINT_RAW_ADDR)
 		printf("%s/%llx;", sym->name, addr);
 	else
diff --git a/samples/bpf/spintest_user.c b/samples/bpf/spintest_user.c
index 3d73621..3140803 100644
--- a/samples/bpf/spintest_user.c
+++ b/samples/bpf/spintest_user.c
@@ -36,7 +36,10 @@ int main(int ac, char **argv)
 			bpf_map_lookup_elem(map_fd[0], &next_key, &value);
 			assert(next_key == value);
 			sym = ksym_search(value);
-			printf(" %s", sym->name);
+			if (!sym)
+				printf(" %lx", value);
+			else
+				printf(" %s", sym->name);
 			key = next_key;
 		}
 		if (key)
diff --git a/samples/bpf/trace_event_user.c b/samples/bpf/trace_event_user.c
index 56f7a25..d2ab33e 100644
--- a/samples/bpf/trace_event_user.c
+++ b/samples/bpf/trace_event_user.c
@@ -33,6 +33,11 @@ static void print_ksym(__u64 addr)
 	if (!addr)
 		return;
 	sym = ksym_search(addr);
+	if (!sym) {
+		printf("%llx;", addr);
+		return;
+	}
+
 	printf("%s;", sym->name);
 	if (!strcmp(sym->name, "sys_read"))
 		sys_read_seen = true;
--
1.9.1
Re: [PATCH bpf-next v2] samples/bpf: Add program for CPU state statistics
On Mon, Feb 26, 2018 at 11:26:52AM +0100, Daniel Borkmann wrote:
> On 02/26/2018 02:19 AM, Leo Yan wrote:

[...]

> > CPU states statistics:
> > state(ms)  cstate-0  cstate-1  cstate-2  pstate-0  pstate-1  pstate-2  pstate-3  pstate-4
> > CPU-0      767       6111      111863    561       31        756       853       190
> > CPU-1      241       10606     107956    484       125       646       990       85
> > CPU-2      413       19721     98735     636       84        696       757       89
> > CPU-3      84        11711     79989     17516     909       4811      5773      341
> > CPU-4      152       19610     98229     444       53        649       708       1283
> > CPU-5      185       8781      108697    666       91        671       677       1365
> > CPU-6      157       21964     95825     581       67        566       684       1284
> > CPU-7      125       15238     102704    398       20        665       786       1197
> >
> > Cc: Daniel Lezcano <daniel.lezc...@linaro.org>
> > Cc: Vincent Guittot <vincent.guit...@linaro.org>
> > Signed-off-by: Leo Yan <leo@linaro.org>
>
> Applied to bpf-next, thanks Leo!

Thanks for help, Daniel.
[PATCH bpf-next v2] samples/bpf: Add program for CPU state statistics
CPU is active when have running tasks on it and CPUFreq governor can
select different operating points (OPP) according to different workload;
we use 'pstate' to present CPU state which have running tasks with one
specific OPP. On the other hand, CPU is idle which only idle task on
it, CPUIdle governor can select one specific idle state to power off
hardware logics; we use 'cstate' to present CPU idle state.

Based on trace events 'cpu_idle' and 'cpu_frequency' we can accomplish
the duration statistics for every state. Every time when CPU enters
into or exits from idle states, the trace event 'cpu_idle' is recorded;
trace event 'cpu_frequency' records the event for CPU OPP changing, so
it's easily to know how long time the CPU stays in the specified OPP,
and the CPU must be not in any idle state.

This patch is to utilize the mentioned trace events for pstate and
cstate statistics. To achieve more accurate profiling data, the program
uses below sequence to insure CPU running/idle time aren't missed:

- Before profiling the user space program wakes up all CPUs for once, so
  can avoid to missing account time for CPU staying in idle state for
  long time; the program forces to set 'scaling_max_freq' to lowest
  frequency and then restore 'scaling_max_freq' to highest frequency,
  this can ensure the frequency to be set to lowest frequency and later
  after start to run workload the frequency can be easily to be changed
  to higher frequency;

- User space program reads map data and update statistics for every 5s,
  so this is same with other sample bpf programs for avoiding big
  overload introduced by bpf program self;

- When send signal to terminate program, the signal handler wakes up all
  CPUs, set lowest frequency and restore highest frequency to
  'scaling_max_freq'; this is exactly same with the first step so avoid
  to missing account CPU pstate and cstate time during last stage.
  Finally it reports the latest statistics.

The program has been tested on Hikey board with octa CA53 CPUs, below
is one example for statistics result, the format mainly follows up
Jesper Dangaard Brouer suggestion.

Jesper reminds to 'get printf to pretty print with thousands separators
use %' and setlocale(LC_NUMERIC, "en_US")', tried three different arm64
GCC toolchains (5.4.0 20160609, 6.2.1 20161016, 6.3.0 20170516) but all
of them cannot support printf flag character %' on arm64 platform, so
go back print number without grouping mode.

CPU states statistics:
state(ms)  cstate-0  cstate-1  cstate-2  pstate-0  pstate-1  pstate-2  pstate-3  pstate-4
CPU-0      767       6111      111863    561       31        756       853       190
CPU-1      241       10606     107956    484       125       646       990       85
CPU-2      413       19721     98735     636       84        696       757       89
CPU-3      84        11711     79989     17516     909       4811      5773      341
CPU-4      152       19610     98229     444       53        649       708       1283
CPU-5      185       8781      108697    666       91        671       677       1365
CPU-6      157       21964     95825     581       67        566       684       1284
CPU-7      125       15238     102704    398       20        665       786       1197

Cc: Daniel Lezcano <daniel.lezc...@linaro.org>
Cc: Vincent Guittot <vincent.guit...@linaro.org>
Signed-off-by: Leo Yan <leo@linaro.org>
---
 samples/bpf/Makefile       |   4 +
 samples/bpf/cpustat_kern.c | 281 ++++++++++++++++++++++++++++++++++++++
 samples/bpf/cpustat_user.c | 219 ++++++++++++++++++++++++++++
 3 files changed, 504 insertions(+)
 create mode 100644 samples/bpf/cpustat_kern.c
 create mode 100644 samples/bpf/cpustat_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index ec3fc8d..2c2a587 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -43,6 +43,7 @@ hostprogs-y += xdp_redirect_cpu
 hostprogs-y += xdp_monitor
 hostprogs-y += xdp_rxq_info
 hostprogs-y += syscall_tp
+hostprogs-y += cpustat

 # Libbpf dependencies
 LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o
@@ -93,6 +94,7 @@ xdp_redirect_cpu-objs := bpf_load.o $(LIBBPF) xdp_redirect_cpu_user.o
 xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o
 xdp_rxq_info-objs := bpf_load.o $(LIBBPF) xdp_rxq_info_user.o
 syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
+cpustat-objs := bpf_load.o $(LIBBPF) cpustat_user.o

 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -144,6 +146,7 @@ always += xdp_monitor_kern.o
 always += xdp_rxq_info_kern.o
 always += xdp2skb_meta_kern.o
 always += syscall_tp_kern.o
+always += cpustat_kern.o

 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
@@ -188,6 +191,7 @@ HOSTLOADLIBES_xdp_redirect_cpu += -lelf
 HOSTLO
Re: [PATCH] samples/bpf: Add program for CPU state statistics
Hi Jesper,

On Wed, Jan 31, 2018 at 10:14:27AM +0100, Jesper Dangaard Brouer wrote:
> On Wed, 31 Jan 2018 02:29:59 +0800
> Leo Yan <leo@linaro.org> wrote:
>
> > CPU 0
> > State: Duration(ms)  Distribution
> > cstate 0 : 47555     |*        |
> > cstate 1 : 0         |         |
> > cstate 2 : 0         |         |
> > pstate 0 : 15239     |*        |
> > pstate 1 : 1521      |         |
> > pstate 2 : 3188      |*        |
> > pstate 3 : 1836      |         |
> > pstate 4 : 94        |         |
> >
> > CPU 1
> > State: Duration(ms)  Distribution
> > cstate 0 : 87        |         |
> > cstate 1 : 16264     |**       |
> > cstate 2 : 50458     |***      |
> > pstate 0 : 832       |         |
> > pstate 1 : 131       |         |
> > pstate 2 : 825       |         |
> > pstate 3 : 787       |         |
> > pstate 4 : 4         |         |
> >
> > CPU 2
> > State: Duration(ms)  Distribution
> > cstate 0 : 177       |         |
> > cstate 1 : 9363      |*        |
> > cstate 2 : 55835     |***      |
> > pstate 0 : 1468      |         |
> > pstate 1 : 350       |         |
> > pstate 2 : 1062      |         |
> > pstate 3 : 1164      |         |
> > pstate 4 : 7         |         |
>
> The output gets very long as the number of CPUs grow...
> What about using the following output:
>
> state(ms)  cstate-0  cstate-1  cstate-2  pstate-0  pstate-1  pstate-2  pstate-3  pstate-4
> CPU-0      47,555    0         0         15,239    1,521     3,188     1,836     94
> CPU-1      87        16,264    50,458    832       131       825       787       4
> CPU-2      177       9,363     55,835    1,468     350       1,062     1,164     7
>
> Look at the code samples/bpf/xdp_redirect_cpu_user.c for an examples of
> howto align the columns, and the trick to get printf to pretty print
> with thousands separators use %' and setlocale(LC_NUMERIC, "en_US").

Thanks a lot for suggestion. Using row/columns looks good for me, will
change code for this way.

> P.S. net-next and bpf-next is closed at the moment.
> Next time you submit read [1] and [2], especially howto indicate which
> tree (bpf vs. bpf-next) the patch is against, as this helps the
> workflow of the maintainers.

Yeah, will read the docs and follow up the workflow and send new patch
to target next merge window.

Thanks,
Leo Yan

> [1] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/Documentation/bpf/bpf_devel_QA.txt
> [2] Documentation/networking/netdev-FAQ.txt
>
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> LinkedIn: http://www.linkedin.com/in/brouer
[PATCH] samples/bpf: Add program for CPU state statistics
: Duration(ms)  Distribution
cstate 0 : 95        |         |
cstate 1 : 18377     |         |
cstate 2 : 47609     |*        |
pstate 0 : 1165      |         |
pstate 1 : 243       |         |
pstate 2 : 818       |         |
pstate 3 : 1007      |         |
pstate 4 : 9         |         |

CPU 6
State: Duration(ms)  Distribution
cstate 0 : 102       |         |
cstate 1 : 16629     |**       |
cstate 2 : 49335     |**       |
pstate 0 : 836       |         |
pstate 1 : 253       |         |
pstate 2 : 895       |         |
pstate 3 : 1275      |         |
pstate 4 : 6         |         |

CPU 7
State: Duration(ms)  Distribution
cstate 0 : 88        |         |
cstate 1 : 16070     |**       |
cstate 2 : 50279     |***      |
pstate 0 : 948       |         |
pstate 1 : 214       |         |
pstate 2 : 873       |         |
pstate 3 : 952       |         |
pstate 4 : 0         |         |

Cc: Daniel Lezcano <daniel.lezc...@linaro.org>
Cc: Vincent Guittot <vincent.guit...@linaro.org>
Signed-off-by: Leo Yan <leo@linaro.org>
---
 samples/bpf/Makefile       |   4 +
 samples/bpf/cpustat_kern.c | 281 ++++++++++++++++++++++++++++++++++++++
 samples/bpf/cpustat_user.c | 234 +++++++++++++++++++++++++++++
 3 files changed, 519 insertions(+)
 create mode 100644 samples/bpf/cpustat_kern.c
 create mode 100644 samples/bpf/cpustat_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index adeaa13..e5d747f 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -41,6 +41,7 @@ hostprogs-y += xdp_redirect_map
 hostprogs-y += xdp_redirect_cpu
 hostprogs-y += xdp_monitor
 hostprogs-y += syscall_tp
+hostprogs-y += cpustat

 # Libbpf dependencies
 LIBBPF := ../../tools/lib/bpf/bpf.o
@@ -89,6 +90,7 @@ xdp_redirect_map-objs := bpf_load.o $(LIBBPF) xdp_redirect_map_user.o
 xdp_redirect_cpu-objs := bpf_load.o $(LIBBPF) xdp_redirect_cpu_user.o
 xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o
 syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
+cpustat-objs := bpf_load.o $(LIBBPF) cpustat_user.o

 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -137,6 +139,7 @@ always += xdp_redirect_map_kern.o
 always += xdp_redirect_cpu_kern.o
 always += xdp_monitor_kern.o
 always += syscall_tp_kern.o
+always += cpustat_kern.o

 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
@@ -179,6 +182,7 @@ HOSTLOADLIBES_xdp_redirect_map += -lelf
 HOSTLOADLIBES_xdp_redirect_cpu += -lelf
 HOSTLOADLIBES_xdp_monitor += -lelf
 HOSTLOADLIBES_syscall_tp += -lelf
+HOSTLOADLIBES_cpustat += -lelf

 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
 # make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
diff --git a/samples/bpf/cpustat_kern.c b/samples/bpf/cpustat_kern.c
new file mode 100644
index 0000000..68c84da
--- /dev/null
+++ b/samples/bpf/cpustat_kern.c
@@ -0,0 +1,281 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/version.h>
+#include <linux/ptrace.h>
+#include <uapi/linux/bpf.h>
+#include "bpf_helpers.h"
+
+/*
+ * The CPU number, cstate number and pstate number are based
+ * on 96boards Hikey with octa CA53 CPUs.
+ *
+ * Every CPU have three idle states for cstate:
+ *   WFI, CPU_OFF, CLUSTER_OFF
+ *
+ * Every CPU have 5 operating points:
+ *   208MHz, 432MHz, 729MHz, 960MHz, 1200MHz
+ *
+ * This code is based on these assumption and other platforms
+ * need to adjust these definitions.
+ */
+#define MAX_CPU			8
+#define MAX_PSTATE_ENTRIES	5
+#define MAX_CSTATE_ENTRIES	3
+
+static int cpu_opps[] = { 208000, 432000, 729000, 960000, 1200000 };
+
+/*
+ * my_map structure is used to record cstate and pstate index and
+ * timestamp (Idx, Ts), when new event incoming we need to update
+ * combination for new state index and timestamp (Idx`, Ts`).
+ *
+ * Based on (Idx, Ts) and (Idx`, Ts`) we can calculate the time
+ * interval for the previous state: Duration(Idx) = Ts` - Ts.
+ *
+ * Every CPU has one below array for recording state index and
+ * timestamp, and record for cstate and pstate saperately:
+ *
+ * +--