[PATCH bpf-next v2] bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON

2018-04-27 Thread Leo Yan
When CONFIG_BPF_JIT_ALWAYS_ON is enabled, the kernel restricts
bpf_jit_enable to the fixed value 1, so we cannot set it to 2
for JIT opcode dumping; this patch updates the doc accordingly.

Suggested-by: Daniel Borkmann <dan...@iogearbox.net>
Signed-off-by: Leo Yan <leo@linaro.org>
---
 Documentation/networking/filter.txt | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index fd55c7d..5032e12 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -483,6 +483,12 @@ Example output from dmesg:
 [ 3389.935851] JIT code: 0030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
 [ 3389.935852] JIT code: 0040: eb 02 31 c0 c9 c3
 
+When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set to 1 and
+setting any other value than that will return in failure. This is even the case for
+setting bpf_jit_enable to 2, since dumping the final JIT image into the kernel log
+is discouraged and introspection through bpftool (under tools/bpf/bpftool/) is the
+generally recommended approach instead.
+
 In the kernel source tree under tools/bpf/, there's bpf_jit_disasm for
 generating disassembly out of the kernel log's hexdump:
 
-- 
1.9.1
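[Archive annotation, not part of the patch: the bpftool introspection flow
recommended above looks roughly like the following; the program id 42 is a
placeholder and the exact output format varies by kernel and bpftool version.]

```shell
# List the loaded BPF programs and note the id of the one of interest.
bpftool prog show

# Dump the JITed image of program id 42 (placeholder id).
bpftool prog dump jited id 42

# Optionally include the raw opcodes alongside the disassembly.
bpftool prog dump jited id 42 opcodes
```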



Re: [PATCH bpf-next] bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON

2018-04-27 Thread Leo Yan
On Fri, Apr 27, 2018 at 11:44:44AM +0200, Daniel Borkmann wrote:
> On 04/26/2018 04:26 AM, Leo Yan wrote:
> > When CONFIG_BPF_JIT_ALWAYS_ON is enabled, the kernel restricts
> > bpf_jit_enable to the fixed value 1, so we cannot set it to 2
> > for JIT opcode dumping; this patch updates the doc accordingly.
> > 
> > Signed-off-by: Leo Yan <leo@linaro.org>
> > ---
> >  Documentation/networking/filter.txt | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
> > index fd55c7d..feddab9 100644
> > --- a/Documentation/networking/filter.txt
> > +++ b/Documentation/networking/filter.txt
> > @@ -483,6 +483,12 @@ Example output from dmesg:
> >  [ 3389.935851] JIT code: 0030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
> >  [ 3389.935852] JIT code: 0040: eb 02 31 c0 c9 c3
> >  
> > +When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is set to 1 by default
> > +and it returns failure if change to any other value from proc node; this is
> > +for security consideration to avoid leaking info to unprivileged users. In this
> > +case, we can't directly dump JIT opcode image from kernel log, alternatively we
> > +need to use bpf tool for the dumping.
> > +
> 
> Could you change this doc text a bit, I think it's slightly misleading. From the first
> sentence one could also interpret that value 0 would leak info to unprivileged users,
> whereas here we're only talking about the case of value 2. Maybe something roughly like
> this to make it more clear:
> 
>   When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set to 1 and
>   setting any other value than that will return in failure. This is even the case for
>   setting bpf_jit_enable to 2, since dumping the final JIT image into the kernel log
>   is discouraged and introspection through bpftool (under tools/bpf/bpftool/) is the
>   generally recommended approach instead.

Yeah, your rephrasing is clearer.  Will do this and send a new patch
soon.  Thanks for your help.

> Thanks,
> Daniel


Re: [PATCH bpf-next] bpf: Allow bpf_jit_enable = 2 with BPF_JIT_ALWAYS_ON config

2018-04-25 Thread Leo Yan
On Wed, Apr 25, 2018 at 05:37:39PM +0200, Daniel Borkmann wrote:
> On 04/25/2018 04:14 PM, Alexei Starovoitov wrote:
> > On Wed, Apr 25, 2018 at 05:25:47PM +0800, Leo Yan wrote:
> >>
> >> If we have concern for security issue, should we remove support for
> >> 'bpf_jit_enable = 2' and modify the doc to reflect this change?
> > 
> > I suggest to fix the doc.
> 
> Agree, lets do that instead. Leo, could you cook a patch for that?

Sure, have sent a new patch for this.

Thanks for the suggestion!

> Thanks,
> Daniel


[PATCH bpf-next] bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON

2018-04-25 Thread Leo Yan
When CONFIG_BPF_JIT_ALWAYS_ON is enabled, the kernel restricts
bpf_jit_enable to the fixed value 1, so we cannot set it to 2
for JIT opcode dumping; this patch updates the doc accordingly.

Signed-off-by: Leo Yan <leo@linaro.org>
---
 Documentation/networking/filter.txt | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index fd55c7d..feddab9 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -483,6 +483,12 @@ Example output from dmesg:
 [ 3389.935851] JIT code: 0030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
 [ 3389.935852] JIT code: 0040: eb 02 31 c0 c9 c3
 
+When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is set to 1 by default
+and it returns failure if change to any other value from proc node; this is
+for security consideration to avoid leaking info to unprivileged users. In this
+case, we can't directly dump JIT opcode image from kernel log, alternatively we
+need to use bpf tool for the dumping.
+
 In the kernel source tree under tools/bpf/, there's bpf_jit_disasm for
 generating disassembly out of the kernel log's hexdump:
 
-- 
1.9.1



Re: [PATCH bpf-next 1/5] samples/bpf: Fix typo in comment

2018-04-25 Thread Leo Yan
On Fri, Apr 20, 2018 at 03:52:13PM +0200, Jesper Dangaard Brouer wrote:
> On Fri, 20 Apr 2018 14:21:16 +0100
> Daniel Thompson <daniel.thomp...@linaro.org> wrote:
> 
> > On Fri, Apr 20, 2018 at 02:10:04PM +0200, Jesper Dangaard Brouer wrote:
> > > 
> > > On Thu, 19 Apr 2018 09:34:02 +0800 Leo Yan <leo@linaro.org> wrote:
> > >   
> > > > Fix typo by replacing 'iif' with 'if'.
> > > > 
> > > > Signed-off-by: Leo Yan <leo@linaro.org>
> > > > ---
> > > >  samples/bpf/bpf_load.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
> > > > index bebe418..28e4678 100644
> > > > --- a/samples/bpf/bpf_load.c
> > > > +++ b/samples/bpf/bpf_load.c
> > > > @@ -393,7 +393,7 @@ static int load_elf_maps_section(struct bpf_map_data *maps, int maps_shndx,
> > > > continue;
> > > > if (sym[nr_maps].st_shndx != maps_shndx)
> > > > continue;
> > > > -   /* Only increment iif maps section */
> > > > +   /* Only increment if maps section */
> > > > nr_maps++;
> > > > }  
> > > 
> > > This was actually not a typo from my side.
> > > 
> > > With 'iif' I mean 'if and only if' ... but it doesn't matter much.  
> > 
> > I think 'if and only if' is more commonly abbreviated 'iff' isn't it?
> 
> Ah, yes![1]  -- then it *is* actually a typo! - LOL
> 
> I'm fine with changing this to "if" :-)

Thanks for the review, Daniel & Jesper.
I also learned something from the discussion :)

> [1] https://en.wikipedia.org/wiki/If_and_only_if
> 
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH bpf-next] bpf: Allow bpf_jit_enable = 2 with BPF_JIT_ALWAYS_ON config

2018-04-25 Thread Leo Yan
Hi Daniel,

On Wed, Apr 25, 2018 at 11:12:21AM +0200, Daniel Borkmann wrote:
> On 04/25/2018 10:18 AM, Leo Yan wrote:
> > After the BPF_JIT_ALWAYS_ON config is enabled, bpf_jit_enable always
> > equals 1; it is impossible to set 'bpf_jit_enable = 2' and the kernel
> > has no chance to call bpf_jit_dump().
> > 
> > This patch relaxes the bpf_jit_enable range to [1..2] when the kernel
> > config BPF_JIT_ALWAYS_ON is enabled, so the JIT dump can be invoked.
> > 
> > Signed-off-by: Leo Yan <leo@linaro.org>
> 
> Is there a specific reason why you need this here instead of retrieving the
> dump from the newer interface available from bpftool (tools/bpf/bpftool/)?
> The bpf_jit_enable = 2 is not recommended these days since it dumps into the
> kernel log which is often readable from unpriv as well. bpftool makes use
> of the BPF_OBJ_GET_INFO_BY_FD interface via bpf syscall to get the JIT dump
> instead when bpf_jit_enable is set.

Thanks for reviewing.

When I read the doc Documentation/networking/filter.txt, the section
"JIT compiler" suggests the steps below.  So I tried to set
'bpf_jit_enable = 2' to dump the JIT code, but it failed.

If we have concerns about the security issue, should we remove support
for 'bpf_jit_enable = 2' and modify the doc to reflect this change?

---8<---

For JIT developers, doing audits etc, each compile run can output the generated
opcode image into the kernel log via: 

  echo 2 > /proc/sys/net/core/bpf_jit_enable

Example output from dmesg:

[ 3389.935842] flen=6 proglen=70 pass=3 image=a0069c8f
[ 3389.935847] JIT code: 0000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 68
[ 3389.935849] JIT code: 0010: 44 2b 4f 6c 4c 8b 87 d8 00 00 00 be 0c 00 00 00
[ 3389.935850] JIT code: 0020: e8 1d 94 ff e0 3d 00 08 00 00 75 16 be 17 00 00
[ 3389.935851] JIT code: 0030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
[ 3389.935852] JIT code: 0040: eb 02 31 c0 c9 c3

> > ---
> >  net/core/sysctl_net_core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
> > index b1a2c5e..6a39b22 100644
> > --- a/net/core/sysctl_net_core.c
> > +++ b/net/core/sysctl_net_core.c
> > @@ -371,7 +371,7 @@ static int proc_dointvec_minmax_bpf_enable(struct ctl_table *table, int write,
> >  		.proc_handler	= proc_dointvec_minmax_bpf_enable,
> >  # ifdef CONFIG_BPF_JIT_ALWAYS_ON
> >  		.extra1		= &one,
> > -		.extra2		= &one,
> > +		.extra2		= &two,
> >  # else
> >  		.extra1		= &zero,
> >  		.extra2		= &two,
> > 
> 


[PATCH bpf-next] bpf: Allow bpf_jit_enable = 2 with BPF_JIT_ALWAYS_ON config

2018-04-25 Thread Leo Yan
After the BPF_JIT_ALWAYS_ON config is enabled, bpf_jit_enable always
equals 1; it is impossible to set 'bpf_jit_enable = 2' and the kernel
has no chance to call bpf_jit_dump().

This patch relaxes the bpf_jit_enable range to [1..2] when the kernel
config BPF_JIT_ALWAYS_ON is enabled, so the JIT dump can be invoked.

Signed-off-by: Leo Yan <leo@linaro.org>
---
 net/core/sysctl_net_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index b1a2c5e..6a39b22 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -371,7 +371,7 @@ static int proc_dointvec_minmax_bpf_enable(struct ctl_table *table, int write,
 		.proc_handler	= proc_dointvec_minmax_bpf_enable,
 # ifdef CONFIG_BPF_JIT_ALWAYS_ON
 		.extra1		= &one,
-		.extra2		= &one,
+		.extra2		= &two,
 # else
 		.extra1		= &zero,
 		.extra2		= &two,
-- 
1.9.1



Re: [PATCH bpf-next 4/5] samples/bpf: Refine printing symbol for sampleip

2018-04-18 Thread Leo Yan
On Wed, Apr 18, 2018 at 10:21:25PM -0700, Alexei Starovoitov wrote:
> On Thu, Apr 19, 2018 at 01:12:49PM +0800, Leo Yan wrote:
> > On Wed, Apr 18, 2018 at 09:47:45PM -0700, Alexei Starovoitov wrote:
> > > On Thu, Apr 19, 2018 at 09:34:05AM +0800, Leo Yan wrote:
> > > > The code defines the macro 'PAGE_OFFSET' and uses it to decide whether
> > > > an address is in kernel space.  But different architectures have
> > > > different 'PAGE_OFFSET' values, so this program cannot be used on all
> > > > platforms.
> > > > 
> > > > This commit instead checks the pointer returned by ksym_search() to
> > > > judge whether the address falls into kernel space, and removes the
> > > > macro 'PAGE_OFFSET' as it isn't used anymore.  As a result, this
> > > > program has no architecture dependency.
> > > > 
> > > > Signed-off-by: Leo Yan <leo@linaro.org>
> > > > ---
> > > >  samples/bpf/sampleip_user.c | 8 +++-----
> > > >  1 file changed, 3 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/samples/bpf/sampleip_user.c b/samples/bpf/sampleip_user.c
> > > > index 4ed690b..0eea1b3 100644
> > > > --- a/samples/bpf/sampleip_user.c
> > > > +++ b/samples/bpf/sampleip_user.c
> > > > @@ -26,7 +26,6 @@
> > > >  #define DEFAULT_FREQ   99
> > > >  #define DEFAULT_SECS   5
> > > >  #define MAX_IPS		8192
> > > > -#define PAGE_OFFSET	0xffff880000000000
> > > >  
> > > >  static int nr_cpus;
> > > >  
> > > > @@ -107,14 +106,13 @@ static void print_ip_map(int fd)
> > > > /* sort and print */
> > > > qsort(counts, max, sizeof(struct ipcount), count_cmp);
> > > > for (i = 0; i < max; i++) {
> > > > -   if (counts[i].ip > PAGE_OFFSET) {
> > > > -   sym = ksym_search(counts[i].ip);
> > > 
> > > yes. it is x64 specific, since it's a sample code,
> > > but simply removing it is not a fix.
> > > It makes this sampleip code behaving incorrectly.
> > > To do such 'cleanup of ksym' please refactor it in the true generic way,
> > > so these ksym* helpers can work on all archs and put this new
> > > functionality into selftests.
> > 
> > Just want to check: are you suggesting creating a standalone test
> > for kallsyms (like a folder tools/testing/selftests/ksym), or do
> > you mean placing the generic code into the folder
> > tools/testing/selftests/bpf/?
> 
> I think the minimal first step is to cleanup ksym bits into something
> that bpf selftests can use and keep it as new .c in
> tools/testing/selftests/bpf/
> Later if it becomes useful for other tests in selftests it can be moved.

Thanks for explanation, now it's clear for me :)

> Thanks!
> 


Re: [PATCH bpf-next 4/5] samples/bpf: Refine printing symbol for sampleip

2018-04-18 Thread Leo Yan
On Wed, Apr 18, 2018 at 09:47:45PM -0700, Alexei Starovoitov wrote:
> On Thu, Apr 19, 2018 at 09:34:05AM +0800, Leo Yan wrote:
> > The code defines the macro 'PAGE_OFFSET' and uses it to decide whether
> > an address is in kernel space.  But different architectures have
> > different 'PAGE_OFFSET' values, so this program cannot be used on all
> > platforms.
> > 
> > This commit instead checks the pointer returned by ksym_search() to
> > judge whether the address falls into kernel space, and removes the
> > macro 'PAGE_OFFSET' as it isn't used anymore.  As a result, this
> > program has no architecture dependency.
> > 
> > Signed-off-by: Leo Yan <leo@linaro.org>
> > ---
> >  samples/bpf/sampleip_user.c | 8 +++-----
> >  1 file changed, 3 insertions(+), 5 deletions(-)
> > 
> > diff --git a/samples/bpf/sampleip_user.c b/samples/bpf/sampleip_user.c
> > index 4ed690b..0eea1b3 100644
> > --- a/samples/bpf/sampleip_user.c
> > +++ b/samples/bpf/sampleip_user.c
> > @@ -26,7 +26,6 @@
> >  #define DEFAULT_FREQ   99
> >  #define DEFAULT_SECS   5
> >  #define MAX_IPS		8192
> > -#define PAGE_OFFSET	0xffff880000000000
> >  
> >  static int nr_cpus;
> >  
> > @@ -107,14 +106,13 @@ static void print_ip_map(int fd)
> > /* sort and print */
> > qsort(counts, max, sizeof(struct ipcount), count_cmp);
> > for (i = 0; i < max; i++) {
> > -   if (counts[i].ip > PAGE_OFFSET) {
> > -   sym = ksym_search(counts[i].ip);
> 
> yes. it is x64 specific, since it's a sample code,
> but simply removing it is not a fix.
> It makes this sampleip code behaving incorrectly.
> To do such 'cleanup of ksym' please refactor it in the true generic way,
> so these ksym* helpers can work on all archs and put this new
> functionality into selftests.

Just want to check: are you suggesting creating a standalone test for
kallsyms (like a folder tools/testing/selftests/ksym), or do you mean
placing the generic code into the folder
tools/testing/selftests/bpf/?

> If you can convert these examples into proper self-checking tests
> that run out of selftests that would be awesome.

Thank you for suggestions, Alexei.

> Thanks!
> 


[PATCH bpf-next 0/5] samples/bpf: Minor fixes and cleanup

2018-04-18 Thread Leo Yan
This patch series contains minor fixes and cleanup for the bpf load and
samples code.  The first patch is a typo fix; patch 0002 is a refactor
to dynamically allocate memory for the kernel symbol structures; the
last three patches mainly refactor the function ksym_search().  The
main benefit of this refactor is that the program sampleip can be used
without architecture dependency.

The patch series has been tested on ARM64 Hikey960 boards.

Leo Yan (5):
  samples/bpf: Fix typo in comment
  samples/bpf: Dynamically allocate structure 'syms'
  samples/bpf: Use NULL for failed to find symbol
  samples/bpf: Refine printing symbol for sampleip
  samples/bpf: Handle NULL pointer returned by ksym_search()

 samples/bpf/bpf_load.c         | 29 +++++++++++++++++++++++------
 samples/bpf/offwaketime_user.c |  5 +++++
 samples/bpf/sampleip_user.c    |  8 +++-----
 samples/bpf/spintest_user.c    |  5 ++++-
 samples/bpf/trace_event_user.c |  5 +++++
 5 files changed, 40 insertions(+), 12 deletions(-)

-- 
1.9.1



[PATCH bpf-next 1/5] samples/bpf: Fix typo in comment

2018-04-18 Thread Leo Yan
Fix typo by replacing 'iif' with 'if'.

Signed-off-by: Leo Yan <leo@linaro.org>
---
 samples/bpf/bpf_load.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index bebe418..28e4678 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -393,7 +393,7 @@ static int load_elf_maps_section(struct bpf_map_data *maps, int maps_shndx,
continue;
if (sym[nr_maps].st_shndx != maps_shndx)
continue;
-   /* Only increment iif maps section */
+   /* Only increment if maps section */
nr_maps++;
}
 
-- 
1.9.1



[PATCH bpf-next 3/5] samples/bpf: Use NULL for failed to find symbol

2018-04-18 Thread Leo Yan
Function ksym_search() is used to look up an address and return its
symbol structure; when the address is out of range for kernel symbols
it returns the symbol structure of the kernel '_stext' entry.  This
introduces confusion and misses the chance to intuitively tell that
the address is out of range.

This commit changes the function to return a NULL pointer when the
symbol lookup fails; callers need to check the pointer for NULL to
learn that the address has no corresponding kernel symbol.

Signed-off-by: Leo Yan <leo@linaro.org>
---
 samples/bpf/bpf_load.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index c2bf7ca..0c0584f 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -726,7 +726,7 @@ struct ksym *ksym_search(long key)
/* valid ksym */
	return &syms[start - 1];
 
-   /* out of range. return _stext */
-	return &syms[0];
+   /* out of range. return NULL */
+   return NULL;
 }
 
-- 
1.9.1



[PATCH bpf-next 2/5] samples/bpf: Dynamically allocate structure 'syms'

2018-04-18 Thread Leo Yan
Structure 'syms' is used to store kernel symbol info by reading the
proc fs node '/proc/kallsyms'; this structure is declared with 300000
entries and statically linked into the bss section.  In most cases the
kernel has fewer than 300000 symbols, so the array is large enough,
but the side effect is a big bss section introduced by this structure,
and it isn't flexible.

To fix this, this patch dynamically allocates memory for 'syms' based
on the line count of '/proc/kallsyms' parsed at runtime, which reduces
the program's memory footprint significantly.

Before:
   text    data     bss     dec     hex filename
  18841    1172 5199776 5219789  4fa5cd samples/bpf/sampleip

After:
   text    data     bss     dec     hex filename
  19101    1188  399792  420081   668f1 samples/bpf/sampleip

Signed-off-by: Leo Yan <leo@linaro.org>
---
 samples/bpf/bpf_load.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index 28e4678..c2bf7ca 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -651,8 +651,7 @@ void read_trace_pipe(void)
}
 }
 
-#define MAX_SYMS 300000
-static struct ksym syms[MAX_SYMS];
+static struct ksym *syms;
 static int sym_cnt;
 
 static int ksym_cmp(const void *p1, const void *p2)
@@ -678,12 +677,30 @@ int load_kallsyms(void)
break;
if (!addr)
continue;
+   sym_cnt++;
+   }
+
+   syms = calloc(sym_cnt, sizeof(*syms));
+   if (!syms) {
+   fclose(f);
+   return -ENOMEM;
+   }
+
+   rewind(f);
+   while (!feof(f)) {
+   if (!fgets(buf, sizeof(buf), f))
+   break;
+	if (sscanf(buf, "%p %c %s", &addr, &symbol, func) != 3)
+   break;
+   if (!addr)
+   continue;
syms[i].addr = (long) addr;
syms[i].name = strdup(func);
i++;
}
-   sym_cnt = i;
qsort(syms, sym_cnt, sizeof(struct ksym), ksym_cmp);
+
+   fclose(f);
return 0;
 }
 
-- 
1.9.1



[PATCH bpf-next 4/5] samples/bpf: Refine printing symbol for sampleip

2018-04-18 Thread Leo Yan
The code defines the macro 'PAGE_OFFSET' and uses it to decide whether
an address is in kernel space.  But different architectures have
different 'PAGE_OFFSET' values, so this program cannot be used on all
platforms.

This commit instead checks the pointer returned by ksym_search() to
judge whether the address falls into kernel space, and removes the
macro 'PAGE_OFFSET' as it isn't used anymore.  As a result, this
program has no architecture dependency.

Signed-off-by: Leo Yan <leo@linaro.org>
---
 samples/bpf/sampleip_user.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/samples/bpf/sampleip_user.c b/samples/bpf/sampleip_user.c
index 4ed690b..0eea1b3 100644
--- a/samples/bpf/sampleip_user.c
+++ b/samples/bpf/sampleip_user.c
@@ -26,7 +26,6 @@
 #define DEFAULT_FREQ   99
 #define DEFAULT_SECS   5
 #define MAX_IPS		8192
-#define PAGE_OFFSET	0xffff880000000000
 
 static int nr_cpus;
 
@@ -107,14 +106,13 @@ static void print_ip_map(int fd)
/* sort and print */
qsort(counts, max, sizeof(struct ipcount), count_cmp);
for (i = 0; i < max; i++) {
-   if (counts[i].ip > PAGE_OFFSET) {
-   sym = ksym_search(counts[i].ip);
+   sym = ksym_search(counts[i].ip);
+   if (sym)
printf("0x%-17llx %-32s %u\n", counts[i].ip, sym->name,
   counts[i].count);
-   } else {
+   else
printf("0x%-17llx %-32s %u\n", counts[i].ip, "(user)",
   counts[i].count);
-   }
}
 
if (max == MAX_IPS) {
-- 
1.9.1



[PATCH bpf-next 5/5] samples/bpf: Handle NULL pointer returned by ksym_search()

2018-04-18 Thread Leo Yan
This commit handles a NULL pointer returned by ksym_search() by
directly printing the address's hexadecimal value; the change is
applied to the 'trace_event', 'spintest' and 'offwaketime' programs.

Signed-off-by: Leo Yan <leo@linaro.org>
---
 samples/bpf/offwaketime_user.c | 5 +++++
 samples/bpf/spintest_user.c    | 5 ++++-
 samples/bpf/trace_event_user.c | 5 +++++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/samples/bpf/offwaketime_user.c b/samples/bpf/offwaketime_user.c
index 512f87a..fce2113 100644
--- a/samples/bpf/offwaketime_user.c
+++ b/samples/bpf/offwaketime_user.c
@@ -27,6 +27,11 @@ static void print_ksym(__u64 addr)
if (!addr)
return;
sym = ksym_search(addr);
+   if (!sym) {
+   printf("%llx;", addr);
+   return;
+   }
+
if (PRINT_RAW_ADDR)
printf("%s/%llx;", sym->name, addr);
else
diff --git a/samples/bpf/spintest_user.c b/samples/bpf/spintest_user.c
index 3d73621..3140803 100644
--- a/samples/bpf/spintest_user.c
+++ b/samples/bpf/spintest_user.c
@@ -36,7 +36,10 @@ int main(int ac, char **argv)
		bpf_map_lookup_elem(map_fd[0], &next_key, &value);
assert(next_key == value);
sym = ksym_search(value);
-   printf(" %s", sym->name);
+   if (!sym)
+   printf(" %lx", value);
+   else
+   printf(" %s", sym->name);
key = next_key;
}
if (key)
diff --git a/samples/bpf/trace_event_user.c b/samples/bpf/trace_event_user.c
index 56f7a25..d2ab33e 100644
--- a/samples/bpf/trace_event_user.c
+++ b/samples/bpf/trace_event_user.c
@@ -33,6 +33,11 @@ static void print_ksym(__u64 addr)
if (!addr)
return;
sym = ksym_search(addr);
+   if (!sym) {
+   printf("%llx;", addr);
+   return;
+   }
+
printf("%s;", sym->name);
if (!strcmp(sym->name, "sys_read"))
sys_read_seen = true;
-- 
1.9.1



Re: [PATCH bpf-next v2] samples/bpf: Add program for CPU state statistics

2018-02-26 Thread Leo Yan
On Mon, Feb 26, 2018 at 11:26:52AM +0100, Daniel Borkmann wrote:
> On 02/26/2018 02:19 AM, Leo Yan wrote:

[...]

> > CPU states statistics:
> > state(ms)  cstate-0 cstate-1 cstate-2 pstate-0 pstate-1 pstate-2 pstate-3 pstate-4
> > CPU-0      767      6111     111863   561      31       756      853      190
> > CPU-1      241      10606    107956   484      125      646      990      85
> > CPU-2      413      19721    98735    636      84       696      757      89
> > CPU-3      84       11711    79989    17516    909      4811     5773     341
> > CPU-4      152      19610    98229    444      53       649      708      1283
> > CPU-5      185      8781     108697   666      91       671      677      1365
> > CPU-6      157      21964    95825    581      67       566      684      1284
> > CPU-7      125      15238    102704   398      20       665      786      1197
> > 
> > Cc: Daniel Lezcano <daniel.lezc...@linaro.org>
> > Cc: Vincent Guittot <vincent.guit...@linaro.org>
> > Signed-off-by: Leo Yan <leo@linaro.org>
> 
> Applied to bpf-next, thanks Leo!

Thanks for the help, Daniel.


[PATCH bpf-next v2] samples/bpf: Add program for CPU state statistics

2018-02-25 Thread Leo Yan
A CPU is active when it has running tasks, and the CPUFreq governor can
select different operating points (OPPs) according to the workload; we
use 'pstate' to denote a CPU state that has running tasks at one
specific OPP.  On the other hand, a CPU is idle when only the idle task
runs on it, and the CPUIdle governor can select one specific idle state
to power off hardware logic; we use 'cstate' to denote a CPU idle
state.

Based on the trace events 'cpu_idle' and 'cpu_frequency' we can compute
duration statistics for every state.  Every time a CPU enters or exits
an idle state, the trace event 'cpu_idle' is recorded; the trace event
'cpu_frequency' records CPU OPP changes, so it is easy to know how long
the CPU stays at a specific OPP, during which the CPU is not in any
idle state.

This patch utilizes the mentioned trace events for pstate and cstate
statistics.  To achieve more accurate profiling data, the program uses
the sequence below to ensure CPU running/idle time isn't missed:

- Before profiling, the user space program wakes up all CPUs once, to
  avoid missing accounting of time for CPUs staying in an idle state
  for a long time; the program forces 'scaling_max_freq' to the lowest
  frequency and then restores 'scaling_max_freq' to the highest
  frequency.  This ensures the frequency is set to the lowest
  frequency, and after the workload starts the frequency can easily be
  raised;

- The user space program reads map data and updates statistics every
  5s, the same as other sample bpf programs, to avoid large overhead
  introduced by the bpf program itself;

- When a signal is sent to terminate the program, the signal handler
  wakes up all CPUs, sets the lowest frequency and restores the highest
  frequency to 'scaling_max_freq'; this is exactly the same as the
  first step, so it avoids missing accounting of CPU pstate and cstate
  time during the last stage.  Finally it reports the latest
  statistics.

The program has been tested on a Hikey board with octa CA53 CPUs; below
is one example of the statistics result, with the format mainly
following Jesper Dangaard Brouer's suggestion.

Jesper suggested getting printf to pretty print with thousands
separators using %' and setlocale(LC_NUMERIC, "en_US").  Three
different arm64 GCC toolchains (5.4.0 20160609, 6.2.1 20161016, and
6.3.0 20170516) were tried, but none of them supports the printf flag
character %' on the arm64 platform, so the numbers are printed without
grouping.

CPU states statistics:
state(ms)  cstate-0 cstate-1 cstate-2 pstate-0 pstate-1 pstate-2 pstate-3 pstate-4
CPU-0      767      6111     111863   561      31       756      853      190
CPU-1      241      10606    107956   484      125      646      990      85
CPU-2      413      19721    98735    636      84       696      757      89
CPU-3      84       11711    79989    17516    909      4811     5773     341
CPU-4      152      19610    98229    444      53       649      708      1283
CPU-5      185      8781     108697   666      91       671      677      1365
CPU-6      157      21964    95825    581      67       566      684      1284
CPU-7      125      15238    102704   398      20       665      786      1197

Cc: Daniel Lezcano <daniel.lezc...@linaro.org>
Cc: Vincent Guittot <vincent.guit...@linaro.org>
Signed-off-by: Leo Yan <leo@linaro.org>
---
 samples/bpf/Makefile   |   4 +
 samples/bpf/cpustat_kern.c | 281 +
 samples/bpf/cpustat_user.c | 219 +++
 3 files changed, 504 insertions(+)
 create mode 100644 samples/bpf/cpustat_kern.c
 create mode 100644 samples/bpf/cpustat_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index ec3fc8d..2c2a587 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -43,6 +43,7 @@ hostprogs-y += xdp_redirect_cpu
 hostprogs-y += xdp_monitor
 hostprogs-y += xdp_rxq_info
 hostprogs-y += syscall_tp
+hostprogs-y += cpustat
 
 # Libbpf dependencies
 LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o
@@ -93,6 +94,7 @@ xdp_redirect_cpu-objs := bpf_load.o $(LIBBPF) xdp_redirect_cpu_user.o
 xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o
 xdp_rxq_info-objs := bpf_load.o $(LIBBPF) xdp_rxq_info_user.o
 syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
+cpustat-objs := bpf_load.o $(LIBBPF) cpustat_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -144,6 +146,7 @@ always += xdp_monitor_kern.o
 always += xdp_rxq_info_kern.o
 always += xdp2skb_meta_kern.o
 always += syscall_tp_kern.o
+always += cpustat_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
@@ -188,6 +191,7 @@ HOSTLOADLIBES_xdp_redirect_cpu += -lelf
 HOSTLO

Re: [PATCH] samples/bpf: Add program for CPU state statistics

2018-01-31 Thread Leo Yan
Hi Jesper,

On Wed, Jan 31, 2018 at 10:14:27AM +0100, Jesper Dangaard Brouer wrote:
> On Wed, 31 Jan 2018 02:29:59 +0800
> Leo Yan <leo@linaro.org> wrote:
> 
> > CPU 0
> > State: Duration(ms)  Distribution
> > cstate 0 : 47555|*   |
> > cstate 1 : 0||
> > cstate 2 : 0||
> > pstate 0 : 15239|*   |
> > pstate 1 : 1521 ||
> > pstate 2 : 3188 |*   |
> > pstate 3 : 1836 ||
> > pstate 4 : 94   ||
> > 
> > CPU 1
> > State: Duration(ms)  Distribution
> > cstate 0 : 87   ||
> > cstate 1 : 16264|**  |
> > cstate 2 : 50458|*** |
> > pstate 0 : 832  ||
> > pstate 1 : 131  ||
> > pstate 2 : 825  ||
> > pstate 3 : 787  ||
> > pstate 4 : 4||
> > 
> > CPU 2
> > State: Duration(ms)  Distribution
> > cstate 0 : 177  ||
> > cstate 1 : 9363 |*   |
> > cstate 2 : 55835|*** |
> > pstate 0 : 1468 ||
> > pstate 1 : 350  ||
> > pstate 2 : 1062 ||
> > pstate 3 : 1164 ||
> > pstate 4 : 7||
> 
> The output gets very long as the number of CPUs grow...
> What about using the following output:
> 
> state(ms)  cstate-0  cstate-1  cstate-2  pstate-0  pstate-1  pstate-2  pstate-3  pstate-4
> CPU-0        47,555         0         0    15,239     1,521     1,836     1,836        94
> CPU-1            87    16,264    50,458       832       131       825       787         4
> CPU-2           177     9,363    55,835     1,468       350     1,062     1,164         7
> 
> Look at the code samples/bpf/xdp_redirect_cpu_user.c for an example of
> how to align the columns, and the trick to get printf to pretty print
> with thousands separators: use %' and setlocale(LC_NUMERIC, "en_US").

Thanks a lot for the suggestion.  Using rows/columns looks good to me;
will change the code accordingly.

> P.S. net-next and bpf-next is closed at the moment.
>  Next time you submit read[1] and [2], especially howto indicate which
> tree (bpf vs. bpf-next) the patch is against, as this helps the
> workflow of the maintainers.

Yeah, will read the docs, follow the workflow, and send a new patch
targeting the next merge window.

Thanks,
Leo Yan

> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/Documentation/bpf/bpf_devel_QA.txt
> [2] Documentation/networking/netdev-FAQ.txt 
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer


[PATCH] samples/bpf: Add program for CPU state statistics

2018-01-30 Thread Leo Yan
: Duration(ms)  Distribution
cstate 0 : 95   ||
cstate 1 : 18377||
cstate 2 : 47609|*   |
pstate 0 : 1165 ||
pstate 1 : 243  ||
pstate 2 : 818  ||
pstate 3 : 1007 ||
pstate 4 : 9||

CPU 6
State: Duration(ms)  Distribution
cstate 0 : 102  ||
cstate 1 : 16629|**  |
cstate 2 : 49335|**  |
pstate 0 : 836  ||
pstate 1 : 253  ||
pstate 2 : 895  ||
pstate 3 : 1275 ||
pstate 4 : 6||

CPU 7
State: Duration(ms)  Distribution
cstate 0 : 88   ||
cstate 1 : 16070|**  |
cstate 2 : 50279|*** |
pstate 0 : 948  ||
pstate 1 : 214  ||
pstate 2 : 873  ||
pstate 3 : 952  ||
pstate 4 : 0||

Cc: Daniel Lezcano <daniel.lezc...@linaro.org>
Cc: Vincent Guittot <vincent.guit...@linaro.org>
Signed-off-by: Leo Yan <leo@linaro.org>
---
 samples/bpf/Makefile   |   4 +
 samples/bpf/cpustat_kern.c | 281 +
 samples/bpf/cpustat_user.c | 234 +
 3 files changed, 519 insertions(+)
 create mode 100644 samples/bpf/cpustat_kern.c
 create mode 100644 samples/bpf/cpustat_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index adeaa13..e5d747f 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -41,6 +41,7 @@ hostprogs-y += xdp_redirect_map
 hostprogs-y += xdp_redirect_cpu
 hostprogs-y += xdp_monitor
 hostprogs-y += syscall_tp
+hostprogs-y += cpustat
 
 # Libbpf dependencies
 LIBBPF := ../../tools/lib/bpf/bpf.o
@@ -89,6 +90,7 @@ xdp_redirect_map-objs := bpf_load.o $(LIBBPF) xdp_redirect_map_user.o
 xdp_redirect_cpu-objs := bpf_load.o $(LIBBPF) xdp_redirect_cpu_user.o
 xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o
 syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
+cpustat-objs := bpf_load.o $(LIBBPF) cpustat_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -137,6 +139,7 @@ always += xdp_redirect_map_kern.o
 always += xdp_redirect_cpu_kern.o
 always += xdp_monitor_kern.o
 always += syscall_tp_kern.o
+always += cpustat_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
@@ -179,6 +182,7 @@ HOSTLOADLIBES_xdp_redirect_map += -lelf
 HOSTLOADLIBES_xdp_redirect_cpu += -lelf
 HOSTLOADLIBES_xdp_monitor += -lelf
 HOSTLOADLIBES_syscall_tp += -lelf
+HOSTLOADLIBES_cpustat += -lelf
 
# Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
#  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
diff --git a/samples/bpf/cpustat_kern.c b/samples/bpf/cpustat_kern.c
new file mode 100644
index 000..68c84da
--- /dev/null
+++ b/samples/bpf/cpustat_kern.c
@@ -0,0 +1,281 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/version.h>
+#include <linux/ptrace.h>
+#include <uapi/linux/bpf.h>
+#include "bpf_helpers.h"
+
+/*
+ * The CPU number, cstate number and pstate number are based
+ * on 96boards Hikey with octa CA53 CPUs.
+ *
+ * Every CPU has three idle states for cstate:
+ *   WFI, CPU_OFF, CLUSTER_OFF
+ *
+ * Every CPU has 5 operating points:
+ *   208MHz, 432MHz, 729MHz, 960MHz, 1200MHz
+ *
+ * This code is based on these assumptions and other platforms
+ * need to adjust these definitions.
+ */
+#define MAX_CPU			8
+#define MAX_PSTATE_ENTRIES	5
+#define MAX_CSTATE_ENTRIES	3
+
+static int cpu_opps[] = { 208000, 432000, 729000, 960000, 1200000 };
+
+/*
+ * my_map structure is used to record cstate and pstate index and
+ * timestamp (Idx, Ts), when new event incoming we need to update
+ * combination for new state index and timestamp (Idx`, Ts`).
+ *
+ * Based on (Idx, Ts) and (Idx`, Ts`) we can calculate the time
+ * interval for the previous state: Duration(Idx) = Ts` - Ts.
+ *
+ * Every CPU has one below array for recording state index and
+ * timestamp, and record for cstate and pstate separately:
+ *
+ * +--