Re: [PATCH net] bpf: uninitialized variables in test code

2018-12-01 Thread Roman Gushchin

On Sat, Dec 01, 2018 at 11:28:46AM -0800, Alexei Starovoitov wrote:
> On Sat, Dec 01, 2018 at 07:13:50PM +0000, Roman Gushchin wrote:
> > On Fri, Nov 30, 2018 at 02:58:03PM -0800, Alexei Starovoitov wrote:
> > > On Thu, Nov 29, 2018 at 01:27:03PM +0300, Dan Carpenter wrote:
> > > > Smatch complains that if bpf_test_run() fails with -ENOMEM at the
> > > > beginning then the "duration" is uninitialized.  We then copy the
> > > > uninitialized variables to the user inside the bpf_test_finish()
> > > > function.  The functions require CAP_SYS_ADMIN so it's not really an
> > > > information leak.
> > > > 
> > > > Fixes: 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command")
> > > > Signed-off-by: Dan Carpenter 
> > > 
> > > That is incorrect fixes tag.
> > > It should be pointing to commit f42ee093be29 ("bpf/test_run: support 
> > > cgroup local storage")
> > > 
> > > bpf_test_run() can only return the value that bpf program returned.
> > > It cannot return -ENOMEM.
> > > That code needs to be refactored.
> > > I think the proper way for bpf_test_run() would be to return 0 or -ENOMEM
> > > and store bpf's retval into extra pointer.
> > > Proper checks need to be added in the callers (bpf_prog_test_run_skb, 
> > > etc).
> > 
> > Makes total sense. How about this patch?
> 
> Thanks for the quick fix!
> 
> > Thanks!
> > 
> > --
> > 
> > From a2832f56c621d7809da8d4196877fa01621055f5 Mon Sep 17 00:00:00 2001
> > From: Roman Gushchin 
> > Date: Sat, 1 Dec 2018 10:39:44 -0800
> > Subject: [PATCH bpf] bpf: refactor bpf_test_run() to separate own failures 
> > and
> >  test program result
> > 
> > After commit f42ee093be29 ("bpf/test_run: support cgroup local
> > storage") the bpf_test_run() function may fail with -ENOMEM, if
> > it's not possible to allocate memory for a cgroup local storage.
> > 
> > This error shouldn't be mixed with the return value of the testing
> > program. Let's add an additional argument with a pointer where to
> > store the testing program's result; and make bpf_test_run()
> > return either 0 or -ENOMEM.
> > 
> > Fixes: f42ee093be29 ("bpf/test_run: support cgroup local storage")
> > Reported-by: Dan Carpenter 
> > Suggested-by: Alexei Starovoitov 
> > Signed-off-by: Roman Gushchin 
> > Cc: Daniel Borkmann 
> > Cc: Alexei Starovoitov 
> > ---
> >  net/bpf/test_run.c | 21 +++--
> >  1 file changed, 15 insertions(+), 6 deletions(-)
> > 
> > diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> > index c89c22c49015..8bce7d8d00d9 100644
> > --- a/net/bpf/test_run.c
> > +++ b/net/bpf/test_run.c
> > @@ -28,12 +28,13 @@ static __always_inline u32 bpf_test_run_one(struct 
> > bpf_prog *prog, void *ctx,
> > return ret;
> >  }
> >  
> > -static u32 bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 
> > *time)
> > +static u32 bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 
> > *ret,
> > +   u32 *time)
> 
> may be 'int' return value?

Sure.
> 
> >  {
> > struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = { 0 };
> > enum bpf_cgroup_storage_type stype;
> > u64 time_start, time_spent = 0;
> > -   u32 ret = 0, i;
> > +   u32 i;
> >  
> > for_each_cgroup_storage_type(stype) {
> > storage[stype] = bpf_cgroup_storage_alloc(prog, stype);
> > @@ -49,7 +50,7 @@ static u32 bpf_test_run(struct bpf_prog *prog, void *ctx, 
> > u32 repeat, u32 *time)
> > repeat = 1;
> > time_start = ktime_get_ns();
> > for (i = 0; i < repeat; i++) {
> > -   ret = bpf_test_run_one(prog, ctx, storage);
> > +   *ret = bpf_test_run_one(prog, ctx, storage);
> > if (need_resched()) {
> > if (signal_pending(current))
> > break;
> > @@ -65,7 +66,7 @@ static u32 bpf_test_run(struct bpf_prog *prog, void *ctx, 
> > u32 repeat, u32 *time)
> > for_each_cgroup_storage_type(stype)
> > bpf_cgroup_storage_free(storage[stype]);
> >  
> > -   return ret;
> > +   return 0;
> >  }
> >  
> >  static int bpf_test_finish(const union bpf_attr *kattr,
> > @@ -165,7 +166,12 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const 
> > union bpf_attr *kattr,
> > __skb_push(skb, hh_l

Re: [PATCH net] bpf: uninitialized variables in test code

2018-12-01 Thread Roman Gushchin

On Fri, Nov 30, 2018 at 02:58:03PM -0800, Alexei Starovoitov wrote:
> On Thu, Nov 29, 2018 at 01:27:03PM +0300, Dan Carpenter wrote:
> > Smatch complains that if bpf_test_run() fails with -ENOMEM at the
> > beginning then the "duration" is uninitialized.  We then copy the
> > uninitialized variables to the user inside the bpf_test_finish()
> > function.  The functions require CAP_SYS_ADMIN so it's not really an
> > information leak.
> > 
> > Fixes: 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command")
> > Signed-off-by: Dan Carpenter 
> 
> That is incorrect fixes tag.
> It should be pointing to commit f42ee093be29 ("bpf/test_run: support cgroup 
> local storage")
> 
> bpf_test_run() can only return the value that bpf program returned.
> It cannot return -ENOMEM.
> That code needs to be refactored.
> I think the proper way for bpf_test_run() would be to return 0 or -ENOMEM
> and store bpf's retval into extra pointer.
> Proper checks need to be added in the callers (bpf_prog_test_run_skb, etc).

Makes total sense. How about this patch?

Thanks!

--

From a2832f56c621d7809da8d4196877fa01621055f5 Mon Sep 17 00:00:00 2001
From: Roman Gushchin 
Date: Sat, 1 Dec 2018 10:39:44 -0800
Subject: [PATCH bpf] bpf: refactor bpf_test_run() to separate own failures and
 test program result

After commit f42ee093be29 ("bpf/test_run: support cgroup local
storage") the bpf_test_run() function may fail with -ENOMEM, if
it's not possible to allocate memory for a cgroup local storage.

This error shouldn't be mixed with the return value of the testing
program. Let's add an additional argument with a pointer where to
store the testing program's result; and make bpf_test_run()
return either 0 or -ENOMEM.

Fixes: f42ee093be29 ("bpf/test_run: support cgroup local storage")
Reported-by: Dan Carpenter 
Suggested-by: Alexei Starovoitov 
Signed-off-by: Roman Gushchin 
Cc: Daniel Borkmann 
Cc: Alexei Starovoitov 
---
 net/bpf/test_run.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index c89c22c49015..8bce7d8d00d9 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -28,12 +28,13 @@ static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx,
 	return ret;
 }
 
-static u32 bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *time)
+static u32 bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *ret,
+			u32 *time)
 {
 	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = { 0 };
 	enum bpf_cgroup_storage_type stype;
 	u64 time_start, time_spent = 0;
-	u32 ret = 0, i;
+	u32 i;
 
 	for_each_cgroup_storage_type(stype) {
 		storage[stype] = bpf_cgroup_storage_alloc(prog, stype);
@@ -49,7 +50,7 @@ static u32 bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *time)
 		repeat = 1;
 	time_start = ktime_get_ns();
 	for (i = 0; i < repeat; i++) {
-		ret = bpf_test_run_one(prog, ctx, storage);
+		*ret = bpf_test_run_one(prog, ctx, storage);
 		if (need_resched()) {
 			if (signal_pending(current))
 				break;
@@ -65,7 +66,7 @@ static u32 bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *time)
 	for_each_cgroup_storage_type(stype)
 		bpf_cgroup_storage_free(storage[stype]);
 
-	return ret;
+	return 0;
 }
 
 static int bpf_test_finish(const union bpf_attr *kattr,
@@ -165,7 +166,12 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 		__skb_push(skb, hh_len);
 	if (is_direct_pkt_access)
 		bpf_compute_data_pointers(skb);
-	retval = bpf_test_run(prog, skb, repeat, &duration);
+	ret = bpf_test_run(prog, skb, repeat, &retval, &duration);
+	if (ret) {
+		kfree(data);
+		kfree(sk);
+		return ret;
+	}
 	if (!is_l2) {
 		if (skb_headroom(skb) < hh_len) {
 			int nhead = HH_DATA_ALIGN(hh_len - skb_headroom(skb));
@@ -212,11 +218,14 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 	rxqueue = __netif_get_rx_queue(current->nsproxy->net_ns->loopback_dev, 0);
 	xdp.rxq = &rxqueue->xdp_rxq;
 
-	retval = bpf_test_run(prog, &xdp, repeat, &duration);
+	ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration);
+	if (ret)
+		goto out;
 	if (xdp.data != data + XDP_PACKET_HEADROOM + NET_IP_ALIGN ||
 	    xdp.data_end != xdp.data + size)
 		size = xdp.data_end - xdp.data;
 	ret = bpf_test_finish(kattr, uattr, xdp.data, size, retval, duration);
+out:
 	kfree(data);
 	return ret;
 }
-- 
2.17.2
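
The calling convention described in the commit message above can be tried out in user space. What follows is only a minimal sketch, not the kernel code: run_test(), prog_fn, prog_returns_12() and the fail_alloc switch are hypothetical names invented for this illustration. The function reports its own failure as 0 or -ENOMEM and hands the program's result back through a separate pointer, so the two can no longer be confused.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a BPF program: takes a context, returns a u32. */
typedef uint32_t (*prog_fn)(void *ctx);

/*
 * Run prog 'repeat' times (at least once).
 * Returns 0 on success or -ENOMEM if the scratch storage cannot be
 * allocated. The program's own result is written to *retval, which is
 * left untouched when the function itself fails.
 * 'fail_alloc' is an illustration-only switch that simulates ENOMEM.
 */
static int run_test(prog_fn prog, void *ctx, uint32_t repeat,
		    uint32_t *retval, int fail_alloc)
{
	void *storage;
	uint32_t i;

	assert(prog != NULL && retval != NULL);

	storage = fail_alloc ? NULL : calloc(1, 64);
	if (!storage)
		return -ENOMEM;

	if (!repeat)
		repeat = 1;
	for (i = 0; i < repeat; i++)
		*retval = prog(ctx);

	free(storage);
	return 0;
}

/* Example program whose result is a plain number, not an error code. */
static uint32_t prog_returns_12(void *ctx)
{
	(void)ctx;
	return 12;
}
```

A caller checks the int return value first and only then looks at *retval, which is the shape the patch gives to bpf_prog_test_run_skb() and bpf_prog_test_run_xdp().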

Re: BUG: sleeping function called from invalid context at mm/slab.h:421

2018-11-13 Thread Roman Gushchin

On Tue, Nov 13, 2018 at 10:03:38PM +0530, Naresh Kamboju wrote:
> While running kernel selftests bpf test_cgroup_storage test this
> kernel BUG reported every time on all devices running Linux -next
> 4.20.0-rc2-next-20181113 (from 4.19.0-rc5-next-20180928).
> This kernel BUG log is from x86_64 machine.
> 
> Do you see at your end ?
> 
> [   73.047526] BUG: sleeping function called from invalid context at
> /srv/oe/build/tmp-rpb-glibc/work-shared/intel-corei7-64/kernel-source/mm/slab.h:421
> [   73.060915] in_atomic(): 1, irqs_disabled(): 0, pid: 3157, name:
> test_cgroup_sto
> [   73.068342] INFO: lockdep is turned off.
> [   73.072293] CPU: 2 PID: 3157 Comm: test_cgroup_sto Not tainted
> 4.20.0-rc2-next-20181113 #1
> [   73.080548] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
> 2.0b 07/27/2017
> [   73.088018] Call Trace:
> [   73.090463]  dump_stack+0x70/0xa5
> [   73.093783]  ___might_sleep+0x152/0x240
> [   73.097619]  __might_sleep+0x4a/0x80
> [   73.101191]  __kmalloc_node+0x1cf/0x2f0
> [   73.105031]  ? cgroup_storage_update_elem+0x46/0x90
> [   73.109909]  cgroup_storage_update_elem+0x46/0x90
> [   73.114608]  map_update_elem+0x4a1/0x4c0
> [   73.118534]  __x64_sys_bpf+0x124/0x280
> [   73.122286]  do_syscall_64+0x4f/0x190
> [   73.125952]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [   73.131004] RIP: 0033:0x7f46b93ea7f9
> [   73.134581] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
> 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
> 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f e6 2b 00 f7 d8 64 89
> 01 48
> [   73.153318] RSP: 002b:7fffc6595858 EFLAGS: 0206 ORIG_RAX:
> 0141
> [   73.160876] RAX: ffda RBX: 014a0260 RCX: 
> 7f46b93ea7f9
> [   73.167999] RDX: 0048 RSI: 7fffc65958a0 RDI: 
> 0002
> [   73.175124] RBP: 7fffc6595870 R08: 7fffc65958a0 R09: 
> 7fffc65958a0
> [   73.182246] R10: 7fffc65958a0 R11: 0206 R12: 
> 0003
> [   73.189369] R13: 0004 R14: 0005 R15: 
> 0006
> selftests: bpf: test_cgroup_storage

Hi Naresh!

Thank you for the report! Can you, please, try the following patch?

Thanks!

--

diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index c97a8f968638..d91710fb8360 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -139,8 +139,8 @@ static int cgroup_storage_update_elem(struct bpf_map *map, void *_key,
 		return -ENOENT;
 
 	new = kmalloc_node(sizeof(struct bpf_storage_buffer) +
-			   map->value_size, __GFP_ZERO | GFP_USER,
-			   map->numa_node);
+			   map->value_size, __GFP_ZERO | GFP_ATOMIC |
+			   __GFP_NOWARN, map->numa_node);
 	if (!new)
 		return -ENOMEM;

Re: [PATCH bpf 2/4] bpf: don't set id on after map lookup with ptr_to_map_val return

2018-10-31 Thread Roman Gushchin

On Thu, Nov 01, 2018 at 12:05:53AM +0100, Daniel Borkmann wrote:
> In the verifier there is no such semantics where registers with
> PTR_TO_MAP_VALUE type have an id assigned to them. This is only
> used in PTR_TO_MAP_VALUE_OR_NULL and later on nullified once the
> test against NULL has been pattern matched and type transformed
> into PTR_TO_MAP_VALUE.
> 
> Fixes: 3e6a4b3e0289 ("bpf/verifier: introduce BPF_PTR_TO_MAP_VALUE")
> Signed-off-by: Daniel Borkmann 
> Cc: Roman Gushchin 
> Acked-by: Alexei Starovoitov 

Looks good to me.
Acked-by: Roman Gushchin 

Thanks!

Re: [PATCH bpf] bpf: fix wrong helper enablement in cgroup local storage

2018-10-26 Thread Roman Gushchin

On Sat, Oct 27, 2018 at 12:49:02AM +0200, Daniel Borkmann wrote:
> Commit cd3394317653 ("bpf: introduce the bpf_get_local_storage()
> helper function") enabled the bpf_get_local_storage() helper also
> for BPF program types where it does not make sense to use them.
> 
> They have been added both in sk_skb_func_proto() and sk_msg_func_proto()
> even though both program types are not invoked in combination with
> cgroups, and neither through BPF_PROG_RUN_ARRAY(). In the latter the
> bpf_cgroup_storage_set() is set shortly before BPF program invocation.
> 
> Later, the helper bpf_get_local_storage() retrieves this prior set
> up per-cpu pointer and hands the buffer to the BPF program. The map
> argument in there solely retrieves the enum bpf_cgroup_storage_type
> from a local storage map associated with the program and based on the
> type returns either the global or per-cpu storage. However, there
> is no specific association between the program's map and the actual
> content in bpf_cgroup_storage[].
> 
> Meaning, any BPF program that would have been properly run from the
> cgroup side through BPF_PROG_RUN_ARRAY() where bpf_cgroup_storage_set()
> was performed, and that is later unloaded such that prog / maps are
> torn down will cause a use after free if that pointer is retrieved
> from programs that are not run through BPF_PROG_RUN_ARRAY() but have
> the cgroup local storage helper enabled in their func proto.
> 
> Let's just remove it from the two sock_map program types to fix it.
> Auditing through the types where this helper is enabled, it appears
> that these are the only ones where it was mistakenly allowed.
> 
> Fixes: cd3394317653 ("bpf: introduce the bpf_get_local_storage() helper 
> function")
> Signed-off-by: Daniel Borkmann 
> Cc: Roman Gushchin 
> Acked-by: John Fastabend 


Acked-by: Roman Gushchin 

Thanks, Daniel!

Re: [PATCH bpf-next] bpf: permit CGROUP_DEVICE programs accessing helper bpf_get_current_cgroup_id()

2018-09-28 Thread Roman Gushchin

On Thu, Sep 27, 2018 at 02:37:30PM -0700, Yonghong Song wrote:
> Currently, helper bpf_get_current_cgroup_id() is not permitted
> for CGROUP_DEVICE type of programs. If the helper is used
> in such cases, the verifier will log the following error:
> 
>   0: (bf) r6 = r1
>   1: (69) r7 = *(u16 *)(r6 +0)
>   2: (85) call bpf_get_current_cgroup_id#80
>   unknown func bpf_get_current_cgroup_id#80
> 
> The bpf_get_current_cgroup_id() is useful for CGROUP_DEVICE
> type of programs in order to customize action based on cgroup id.
> This patch added such a support.
> 
> Cc: Roman Gushchin 
> Signed-off-by: Yonghong Song 

Acked-by: Roman Gushchin 

Thanks, Yonghong!

Re: [PATCH bpf] tools/bpf: fix bpf selftest test_cgroup_storage failure

2018-08-17 Thread Roman Gushchin

On Fri, Aug 17, 2018 at 08:54:15AM -0700, Yonghong Song wrote:
> The bpf selftest test_cgroup_storage failed in one of
> our production test servers.
>   # sudo ./test_cgroup_storage
>   Failed to create map: Operation not permitted
> 
> It turns out this is due to insufficient locked memory
> with system default 16KB.
> 
> Similar to other self tests, let us arm the process
> with unlimited locked memory. With this change,
> the test passed.
>   # sudo ./test_cgroup_storage
>   test_cgroup_storage:PASS
> 
> Fixes: 68cfa3ac6b8d ("selftests/bpf: add a cgroup storage test")
> Cc: Roman Gushchin 
> Signed-off-by: Yonghong Song 
> ---
>  tools/testing/selftests/bpf/test_cgroup_storage.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/testing/selftests/bpf/test_cgroup_storage.c 
> b/tools/testing/selftests/bpf/test_cgroup_storage.c
> index dc83fb2d3f27..4e196e3bfecf 100644
> --- a/tools/testing/selftests/bpf/test_cgroup_storage.c
> +++ b/tools/testing/selftests/bpf/test_cgroup_storage.c
> @@ -5,6 +5,7 @@
>  #include 
>  #include 
>  
> +#include "bpf_rlimit.h"
>  #include "cgroup_helpers.h"
>  
>  char bpf_log_buf[BPF_LOG_BUF_SIZE];
> -- 
> 2.17.1
> 

Acked-by: Roman Gushchin 

Thank you, Yonghong!
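
For readers unfamiliar with the locked-memory limit mentioned in the quoted patch, here is a minimal user-space sketch of raising RLIMIT_MEMLOCK with setrlimit(). It is an illustration only and is not the contents of bpf_rlimit.h; raise_memlock_limit() and memlock_is_unlimited() are hypothetical helper names.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>

/*
 * Try to lift the locked-memory limit for the current process.
 * Returns 0 on success or a negative errno value (typically -EPERM when
 * the process lacks the privilege to raise its hard limit).
 */
static int raise_memlock_limit(void)
{
	struct rlimit r = { RLIM_INFINITY, RLIM_INFINITY };

	if (setrlimit(RLIMIT_MEMLOCK, &r))
		return -errno;
	return 0;
}

/* Report whether the soft RLIMIT_MEMLOCK limit is currently unlimited. */
static int memlock_is_unlimited(void)
{
	struct rlimit r;
	int err = getrlimit(RLIMIT_MEMLOCK, &r);

	/* getrlimit() with a valid resource and pointer is not expected to fail. */
	assert(err == 0);
	(void)err;
	return r.rlim_cur == RLIM_INFINITY;
}
```

A test would call raise_memlock_limit() before creating any maps. An unprivileged process may get -EPERM back, which is why the quoted runs use sudo.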

Re: [PATCH bpf] bpf: fix a rcu usage warning in bpf_prog_array_copy_core()

2018-08-15 Thread Roman Gushchin

On Wed, Aug 15, 2018 at 02:30:11PM -0700, Alexei Starovoitov wrote:
> On Tue, Aug 14, 2018 at 05:08:44PM -0700, Roman Gushchin wrote:
> > On Tue, Aug 14, 2018 at 04:59:45PM -0700, Alexei Starovoitov wrote:
> > > On Tue, Aug 14, 2018 at 11:01:12AM -0700, Yonghong Song wrote:
> > > > Commit 394e40a29788 ("bpf: extend bpf_prog_array to store pointers
> > > > to the cgroup storage") refactored the bpf_prog_array_copy_core()
> > > > to accommodate new structure bpf_prog_array_item which contains
> > > > bpf_prog array itself.
> > > > 
> > > > In the old code, we had
> > > >perf_event_query_prog_array():
> > > >  mutex_lock(...)
> > > >  bpf_prog_array_copy_call():
> > > >prog = rcu_dereference_check(array, 1)->progs
> > > >bpf_prog_array_copy_core(prog, ...)
> > > >  mutex_unlock(...)
> > > > 
> > > > With the above commit, we had
> > > >perf_event_query_prog_array():
> > > >  mutex_lock(...)
> > > >  bpf_prog_array_copy_call():
> > > >bpf_prog_array_copy_core(array, ...):
> > > >  item = rcu_dereference(array)->items;
> > > >  ...
> > > >  mutex_unlock(...)
> > > > 
> > > > The new code will trigger a lockdep rcu checking warning.
> > > > The fix is to change rcu_dereference() to rcu_dereference_check()
> > > > to prevent such a warning.
> > > > 
> > > > Reported-by: syzbot+6e72317008eef84a2...@syzkaller.appspotmail.com
> > > > Fixes: 394e40a29788 ("bpf: extend bpf_prog_array to store pointers to 
> > > > the cgroup storage")
> > > > Cc: Roman Gushchin 
> > > > Signed-off-by: Yonghong Song 
> > > 
> > > makes sense to me
> > > Acked-by: Alexei Starovoitov 
> > > 
> > > Roman, would you agree?
> > > 
> > 
> > rcu_dereference_check(<>, 1) always looks a bit strange to me,
> > but if it's the only reasonable way to silence the warning,
> > of course I'm fine with it.
> 
> do you have better suggestion?
> This patch is a fix for the regression introduced in your earlier patch,
> so I think the only fair path forward is either to Ack it or
> to send an alternative patch asap.
> 

As I said, I have nothing against it.

Acked-by: Roman Gushchin 

Thanks!

Re: [PATCH bpf] bpf: fix a rcu usage warning in bpf_prog_array_copy_core()

2018-08-14 Thread Roman Gushchin

On Tue, Aug 14, 2018 at 04:59:45PM -0700, Alexei Starovoitov wrote:
> On Tue, Aug 14, 2018 at 11:01:12AM -0700, Yonghong Song wrote:
> > Commit 394e40a29788 ("bpf: extend bpf_prog_array to store pointers
> > to the cgroup storage") refactored the bpf_prog_array_copy_core()
> > to accommodate new structure bpf_prog_array_item which contains
> > bpf_prog array itself.
> > 
> > In the old code, we had
> >perf_event_query_prog_array():
> >  mutex_lock(...)
> >  bpf_prog_array_copy_call():
> >prog = rcu_dereference_check(array, 1)->progs
> >bpf_prog_array_copy_core(prog, ...)
> >  mutex_unlock(...)
> > 
> > With the above commit, we had
> >perf_event_query_prog_array():
> >  mutex_lock(...)
> >  bpf_prog_array_copy_call():
> >bpf_prog_array_copy_core(array, ...):
> >  item = rcu_dereference(array)->items;
> >  ...
> >  mutex_unlock(...)
> > 
> > The new code will trigger a lockdep rcu checking warning.
> > The fix is to change rcu_dereference() to rcu_dereference_check()
> > to prevent such a warning.
> > 
> > Reported-by: syzbot+6e72317008eef84a2...@syzkaller.appspotmail.com
> > Fixes: 394e40a29788 ("bpf: extend bpf_prog_array to store pointers to the 
> > cgroup storage")
> > Cc: Roman Gushchin 
> > Signed-off-by: Yonghong Song 
> 
> makes sense to me
> Acked-by: Alexei Starovoitov 
> 
> Roman, would you agree?
> 

rcu_dereference_check(<>, 1) always looks a bit strange to me,
but if it's the only reasonable way to silence the warning,
of course I'm fine with it.

Thanks!

Re: [PATCH bpf-net 00/14] bpf: cgroup local storage

2018-06-28 Thread Roman Gushchin

On Thu, Jun 28, 2018 at 09:34:44AM -0700, Roman Gushchin wrote:
> This patchset implements cgroup local storage for bpf programs.
> The main idea is to provide a fast accessible memory for storing
> various per-cgroup data, e.g. number of transmitted packets.

Just noticed a typo in the subject: "bpf-net" :)
Will resend the patchset.
Sorry for confusion.

Thanks,
Roman

[PATCH bpf-net 10/14] bpftool: add support for CGROUP_STORAGE maps

2018-06-28 Thread Roman Gushchin

Add BPF_MAP_TYPE_CGROUP_STORAGE maps to the list
of map types that bpftool recognizes.

Signed-off-by: Roman Gushchin 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Jakub Kicinski 
Acked-by: Martin KaFai Lau 
---
 tools/bpf/bpftool/map.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 097b1a5e046b..154d258cdde3 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -67,6 +67,7 @@ static const char * const map_type_name[] = {
[BPF_MAP_TYPE_SOCKMAP]  = "sockmap",
[BPF_MAP_TYPE_CPUMAP]   = "cpumap",
[BPF_MAP_TYPE_SOCKHASH] = "sockhash",
+   [BPF_MAP_TYPE_CGROUP_STORAGE]   = "cgroup_storage",
 };
 
 static bool map_is_per_cpu(__u32 type)
-- 
2.14.4

[PATCH bpf-net 14/14] samples/bpf: extend test_cgrp2_attach2 test to use cgroup storage

2018-06-28 Thread Roman Gushchin

The test_cgrp2_attach test covers bpf cgroup attachment code well,
so let's re-use it for testing allocation/releasing of cgroup storage.

The extension is pretty straightforward: the bpf program will use
the cgroup storage to save the number of transmitted bytes.

Expected output:
  $ ./test_cgrp2_attach2
  Attached DROP prog. This ping in cgroup /foo should fail...
  ping: sendmsg: Operation not permitted
  Attached DROP prog. This ping in cgroup /foo/bar should fail...
  ping: sendmsg: Operation not permitted
  Attached PASS prog. This ping in cgroup /foo/bar should pass...
  Detached PASS from /foo/bar while DROP is attached to /foo.
  This ping in cgroup /foo/bar should fail...
  ping: sendmsg: Operation not permitted
  Attached PASS from /foo/bar and detached DROP from /foo.
  This ping in cgroup /foo/bar should pass...
  ### override:PASS
  ### multi:PASS

Signed-off-by: Roman Gushchin 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Acked-by: Martin KaFai Lau 
---
 samples/bpf/test_cgrp2_attach2.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/samples/bpf/test_cgrp2_attach2.c b/samples/bpf/test_cgrp2_attach2.c
index b453e6a161be..f682e0b8aa83 100644
--- a/samples/bpf/test_cgrp2_attach2.c
+++ b/samples/bpf/test_cgrp2_attach2.c
@@ -8,7 +8,8 @@
  *   information. The number of invocations of the program, which maps
  *   to the number of packets received, is stored to key 0. Key 1 is
  *   incremented on each iteration by the number of bytes stored in
- *   the skb.
+ *   the skb. The program also stores the number of received bytes
+ *   in the cgroup storage.
  *
  * - Attaches the new program to a cgroup using BPF_PROG_ATTACH
  *
@@ -21,12 +22,15 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 #include 
 #include 
 
 #include "bpf_insn.h"
+#include "bpf_rlimit.h"
 #include "cgroup_helpers.h"
 
 #define FOO"/foo"
@@ -205,6 +209,8 @@ static int map_fd = -1;
 
 static int prog_load_cnt(int verdict, int val)
 {
+   int cgroup_storage_fd;
+
if (map_fd < 0)
map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, 4, 8, 1, 0);
if (map_fd < 0) {
@@ -212,6 +218,13 @@ static int prog_load_cnt(int verdict, int val)
return -1;
}
 
+   cgroup_storage_fd = bpf_create_map(BPF_MAP_TYPE_CGROUP_STORAGE,
+   sizeof(struct bpf_cgroup_storage_key), 8, 0, 0);
+   if (cgroup_storage_fd < 0) {
+   printf("failed to create map '%s'\n", strerror(errno));
+   return -1;
+   }
+
struct bpf_insn prog[] = {
BPF_MOV32_IMM(BPF_REG_0, 0),
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 
4) = r0 */
@@ -222,6 +235,11 @@ static int prog_load_cnt(int verdict, int val)
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
BPF_MOV64_IMM(BPF_REG_1, val), /* r1 = 1 */
BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 
0, 0), /* xadd r0 += r1 */
+   BPF_LD_MAP_FD(BPF_REG_1, cgroup_storage_fd),
+   BPF_MOV64_IMM(BPF_REG_2, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, 
BPF_FUNC_get_local_storage),
+   BPF_MOV64_IMM(BPF_REG_1, val),
+   BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_W, BPF_REG_0, BPF_REG_1, 
0, 0),
BPF_MOV64_IMM(BPF_REG_0, verdict), /* r0 = verdict */
BPF_EXIT_INSN(),
};
@@ -237,6 +255,7 @@ static int prog_load_cnt(int verdict, int val)
printf("Output from verifier:\n%s\n---\n", bpf_log_buf);
return 0;
}
+   close(cgroup_storage_fd);
return ret;
 }
 
@@ -414,6 +433,12 @@ static int test_multiprog(void)
 int main(int argc, char **argv)
 {
int rc = 0;
+   struct rlimit r = {1024*1024, RLIM_INFINITY};
+
+   if (setrlimit(RLIMIT_MEMLOCK, &r)) {
+   log_err("Setrlimit(RLIMIT_MEMLOCK) failed");
+   return 1;
+   }
 
rc = test_foo_bar();
if (rc)
-- 
2.14.4

[PATCH bpf-net 07/14] bpf: don't allow create maps of cgroup local storages

2018-06-28 Thread Roman Gushchin

As there is a one-to-one relation between a bpf program
and a cgroup local storage map, there is no sense in
creating a map of cgroup local storage maps.

Forbid it explicitly to avoid possible side effects.

Signed-off-by: Roman Gushchin 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Acked-by: Martin KaFai Lau 
---
 kernel/bpf/map_in_map.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
index 1da574612bea..3bfbf4464416 100644
--- a/kernel/bpf/map_in_map.c
+++ b/kernel/bpf/map_in_map.c
@@ -23,7 +23,8 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
 * is a runtime binding.  Doing static check alone
 * in the verifier is not enough.
 */
-   if (inner_map->map_type == BPF_MAP_TYPE_PROG_ARRAY) {
+   if (inner_map->map_type == BPF_MAP_TYPE_PROG_ARRAY ||
+   inner_map->map_type == BPF_MAP_TYPE_CGROUP_STORAGE) {
fdput(f);
return ERR_PTR(-ENOTSUPP);
}
-- 
2.14.4

[PATCH bpf-net 11/14] bpf/test_run: support cgroup local storage

2018-06-28 Thread Roman Gushchin

Allocate a temporary cgroup storage to use for bpf program test runs.

Because the test program is not actually attached to a cgroup,
the storage is allocated manually just for the execution
of the bpf program.

If the program is executed multiple times, the storage is not zeroed
on each run, emulating multiple runs of the program, attached to
a real cgroup.

Signed-off-by: Roman Gushchin 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Acked-by: Martin KaFai Lau 
---
 net/bpf/test_run.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 68c3578343b4..74971a9b7cfb 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -11,12 +11,14 @@
 #include 
 #include 
 
-static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx)
+static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx,
+					    struct bpf_cgroup_storage *storage)
 {
 	u32 ret;
 
 	preempt_disable();
 	rcu_read_lock();
+	bpf_cgroup_storage_set(storage);
 	ret = BPF_PROG_RUN(prog, ctx);
 	rcu_read_unlock();
 	preempt_enable();
@@ -26,14 +28,19 @@ static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx)
 
 static u32 bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *time)
 {
+	struct bpf_cgroup_storage *storage = NULL;
 	u64 time_start, time_spent = 0;
 	u32 ret = 0, i;
 
+	storage = bpf_cgroup_storage_alloc(prog);
+	if (IS_ERR(storage))
+		return PTR_ERR(storage);
+
 	if (!repeat)
 		repeat = 1;
 	time_start = ktime_get_ns();
 	for (i = 0; i < repeat; i++) {
-		ret = bpf_test_run_one(prog, ctx);
+		ret = bpf_test_run_one(prog, ctx, storage);
 		if (need_resched()) {
 			if (signal_pending(current))
 				break;
@@ -46,6 +53,8 @@ static u32 bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *time)
 	do_div(time_spent, repeat);
 	*time = time_spent > U32_MAX ? U32_MAX : (u32)time_spent;
 
+	bpf_cgroup_storage_free(storage);
+
 	return ret;
 }
 
-- 
2.14.4
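
The context lines of the last hunk show how the reported duration is derived: the total time is divided by the repeat count and clamped to U32_MAX. A user-space sketch of that arithmetic follows, with the hypothetical name average_duration_ns() and with a repeat count of 0 treated as 1, as in the patch. It is an illustration, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Average a total run time over 'repeat' runs and clamp the result so it
 * fits into 32 bits. A repeat count of 0 is treated as a single run.
 */
static uint32_t average_duration_ns(uint64_t total_ns, uint32_t repeat)
{
	uint64_t avg;

	if (!repeat)
		repeat = 1;
	avg = total_ns / repeat;
	assert(avg <= total_ns); /* holds because repeat >= 1 */
	return avg > UINT32_MAX ? UINT32_MAX : (uint32_t)avg;
}
```

The clamp matters only for very long runs: an average above roughly 4.29 seconds (UINT32_MAX nanoseconds) would otherwise be truncated when stored in a 32-bit field.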

[PATCH bpf-net 13/14] selftests/bpf: add a cgroup storage test

2018-06-28 Thread Roman Gushchin

Implement a test to cover the cgroup storage functionality.
The test implements a bpf program which drops every second packet
by using the cgroup storage as a persistent storage.

The test also uses the userspace API to check the data
in the cgroup storage, alter it, and check that the loaded
and attached bpf program sees the update.

Expected output:
  $ ./test_cgroup_storage
  test_cgroup_storage:PASS

Signed-off-by: Roman Gushchin 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Acked-by: Martin KaFai Lau 
---
 tools/testing/selftests/bpf/Makefile  |   4 +-
 tools/testing/selftests/bpf/test_cgroup_storage.c | 130 ++
 2 files changed, 133 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/test_cgroup_storage.c

diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index 7a6214e9ae58..81f38623fc9f 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -22,7 +22,8 @@ $(TEST_CUSTOM_PROGS): $(OUTPUT)/%: %.c
 # Order correspond to 'make run_tests' order
 TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map 
test_progs \
test_align test_verifier_log test_dev_cgroup test_tcpbpf_user \
-   test_sock test_btf test_sockmap test_lirc_mode2_user get_cgroup_id_user
+   test_sock test_btf test_sockmap test_lirc_mode2_user get_cgroup_id_user 
\
+   test_cgroup_storage
 
 TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o 
test_obj_id.o \
test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o 
sockmap_parse_prog.o \
@@ -63,6 +64,7 @@ $(OUTPUT)/test_sock_addr: cgroup_helpers.c
 $(OUTPUT)/test_sockmap: cgroup_helpers.c
 $(OUTPUT)/test_progs: trace_helpers.c
 $(OUTPUT)/get_cgroup_id_user: cgroup_helpers.c
+$(OUTPUT)/test_cgroup_storage: cgroup_helpers.c
 
 .PHONY: force
 
diff --git a/tools/testing/selftests/bpf/test_cgroup_storage.c 
b/tools/testing/selftests/bpf/test_cgroup_storage.c
new file mode 100644
index ..0597943ce34b
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_cgroup_storage.c
@@ -0,0 +1,130 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "cgroup_helpers.h"
+
+char bpf_log_buf[BPF_LOG_BUF_SIZE];
+
+#define TEST_CGROUP "/test-bpf-cgroup-storage-buf/"
+
+int main(int argc, char **argv)
+{
+   struct bpf_insn prog[] = {
+   BPF_LD_MAP_FD(BPF_REG_1, 0), /* map fd */
+   BPF_MOV64_IMM(BPF_REG_2, 0), /* flags, not used */
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_get_local_storage),
+   BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_0, 0),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 1),
+   BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, 0),
+   BPF_ALU64_IMM(BPF_AND, BPF_REG_1, 0x1),
+   BPF_MOV64_REG(BPF_REG_0, BPF_REG_1),
+   BPF_EXIT_INSN(),
+   };
+   size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
+   int error = EXIT_FAILURE;
+   int map_fd, prog_fd, cgroup_fd;
+   struct bpf_cgroup_storage_key key;
+   unsigned long long value;
+
+   map_fd = bpf_create_map(BPF_MAP_TYPE_CGROUP_STORAGE, sizeof(key),
+   sizeof(value), 0, 0);
+   if (map_fd < 0) {
+   printf("Failed to create map: %s\n", strerror(errno));
+   goto out;
+   }
+
+   prog[0].imm = map_fd;
+   prog_fd = bpf_load_program(BPF_PROG_TYPE_CGROUP_SKB,
+  prog, insns_cnt, "GPL", 0,
+  bpf_log_buf, BPF_LOG_BUF_SIZE);
+   if (prog_fd < 0) {
+   printf("Failed to load bpf program: %s\n", bpf_log_buf);
+   goto out;
+   }
+
+   if (setup_cgroup_environment()) {
+   printf("Failed to setup cgroup environment\n");
+   goto err;
+   }
+
+   /* Create a cgroup, get fd, and join it */
+   cgroup_fd = create_and_get_cgroup(TEST_CGROUP);
+   if (!cgroup_fd) {
+   printf("Failed to create test cgroup\n");
+   goto err;
+   }
+
+   if (join_cgroup(TEST_CGROUP)) {
+   printf("Failed to join cgroup\n");
+   goto err;
+   }
+
+   /* Attach the bpf program */
+   if (bpf_prog_attach(prog_fd, cgroup_fd, BPF_CGROUP_INET_EGRESS, 0)) {
+   printf("Failed to attach bpf program\n");
+   goto err;
+   }
+
+   if (bpf_map_get_next_key(map_fd, NULL, &key)) {
+   printf("Failed to get the first key in cgroup storage\n");
+   goto err;
+   }
+
+   if (bpf_map_lookup_elem(map_fd, &key, &value)) {
+   printf("Failed to lookup cgroup storage\n");
+   goto err;
+   }

[PATCH bpf-net 08/14] bpf: introduce the bpf_get_local_storage() helper function

2018-06-28 Thread Roman Gushchin

The bpf_get_local_storage() helper function is used
to get a pointer to the bpf local storage from a bpf program.

It takes a pointer to a storage map and flags as arguments.
Right now it accepts only cgroup storage maps, and the flags
argument has to be 0. Later it can be extended to support
other types of local storage, e.g. thread local storage.

Signed-off-by: Roman Gushchin 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Acked-by: Martin KaFai Lau 
---
 include/linux/bpf.h  |  2 ++
 include/uapi/linux/bpf.h | 13 -
 kernel/bpf/cgroup.c  |  2 ++
 kernel/bpf/helpers.c | 20 
 kernel/bpf/verifier.c| 18 ++
 net/core/filter.c| 23 ++-
 6 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6d7e0dfc..1fdcf9d21b74 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -771,6 +771,8 @@ extern const struct bpf_func_proto bpf_sock_map_update_proto;
 extern const struct bpf_func_proto bpf_sock_hash_update_proto;
 extern const struct bpf_func_proto bpf_get_current_cgroup_id_proto;
 
+extern const struct bpf_func_proto bpf_get_local_storage_proto;
+
 /* Shared helpers among cBPF and eBPF. */
 void bpf_user_rnd_init_once(void);
 u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 7aa135e4c2f3..baf74db6c06e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2081,6 +2081,16 @@ union bpf_attr {
  * Return
  * A 64-bit integer containing the current cgroup id based
  * on the cgroup within which the current task is running.
+ *
+ * void* get_local_storage(void *map, u64 flags)
+ * Description
+ * Get the pointer to the local storage area.
+ * The type and the size of the local storage is defined
+ * by the *map* argument.
+ * The *flags* meaning is specific for each map type,
+ * and has to be 0 for cgroup local storage.
+ * Return
+ * Pointer to the local storage area.
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -2163,7 +2173,8 @@ union bpf_attr {
FN(rc_repeat),  \
FN(rc_keydown), \
FN(skb_cgroup_id),  \
-   FN(get_current_cgroup_id),
+   FN(get_current_cgroup_id),  \
+   FN(get_local_storage),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 14a1f6c94592..47d4519a6847 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -629,6 +629,8 @@ cgroup_dev_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_map_delete_elem_proto;
case BPF_FUNC_get_current_uid_gid:
return &bpf_get_current_uid_gid_proto;
+   case BPF_FUNC_get_local_storage:
+   return &bpf_get_local_storage_proto;
case BPF_FUNC_trace_printk:
if (capable(CAP_SYS_ADMIN))
return bpf_get_trace_printk_proto();
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 73065e2d23c2..ca17b4ed3ac9 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -193,4 +193,24 @@ const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
.gpl_only   = false,
.ret_type   = RET_INTEGER,
 };
+
+DECLARE_PER_CPU(void*, bpf_cgroup_storage);
+
+BPF_CALL_2(bpf_get_local_storage, struct bpf_map *, map, u64, flags)
+{
+   /* map and flags arguments are not used now,
+* but provide an ability to extend the API
+* for other types of local storages.
+* verifier checks that their values are correct.
+*/
+   return (u64)this_cpu_read(bpf_cgroup_storage);
+}
+
+const struct bpf_func_proto bpf_get_local_storage_proto = {
+   .func   = bpf_get_local_storage,
+   .gpl_only   = false,
+   .ret_type   = RET_PTR_TO_MAP_VALUE,
+   .arg1_type  = ARG_CONST_MAP_PTR,
+   .arg2_type  = ARG_ANYTHING,
+};
 #endif
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index cc0c7990f849..a0f5c26fffc1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2127,6 +2127,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
func_id != BPF_FUNC_current_task_under_cgroup)
goto error;
break;
+   case BPF_MAP_TYPE_CGROUP_STORAGE:
+   if (func_id != BPF_FUNC_get_local_storage)
+   goto error;
+   break;
/* devmap returns a pointer to a live net_device ifindex that we cannot
 * allow to be modified from bpf side. So do not allow lookup elements
 * for now.
@@ -2209,6 +2213,10

[PATCH bpf-net 03/14] bpf: pass a pointer to a cgroup storage using pcpu variable

2018-06-28 Thread Roman Gushchin

This commit introduces the bpf_cgroup_storage_set() helper,
which will be used to pass a pointer to a cgroup storage
to the bpf helper.

Signed-off-by: Roman Gushchin 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Acked-by: Martin KaFai Lau 
---
 include/linux/bpf-cgroup.h | 14 ++
 kernel/bpf/local_storage.c |  2 ++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index b4e2e42c1d2a..128fb0e39b4d 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -20,6 +20,8 @@ struct bpf_cgroup_storage;
 extern struct static_key_false cgroup_bpf_enabled_key;
 #define cgroup_bpf_enabled static_branch_unlikely(&cgroup_bpf_enabled_key)
 
+DECLARE_PER_CPU(void*, bpf_cgroup_storage);
+
 struct bpf_cgroup_storage_map;
 
 struct bpf_storage_buffer {
@@ -96,6 +98,17 @@ int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
 int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
  short access, enum bpf_attach_type type);
 
+static inline void bpf_cgroup_storage_set(struct bpf_cgroup_storage *storage)
+{
+   struct bpf_storage_buffer *buf;
+
+   if (!storage)
+   return;
+
+   buf = rcu_dereference(storage->buf);
+   this_cpu_write(bpf_cgroup_storage, &buf->data[0]);
+}
+
 struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(struct bpf_prog *prog);
 void bpf_cgroup_storage_free(struct bpf_cgroup_storage *storage);
 void bpf_cgroup_storage_link(struct bpf_cgroup_storage *storage,
@@ -223,6 +236,7 @@ struct cgroup_bpf {};
 static inline void cgroup_bpf_put(struct cgroup *cgrp) {}
 static inline int cgroup_bpf_inherit(struct cgroup *cgrp) { return 0; }
 
+static inline void bpf_cgroup_storage_set(struct bpf_cgroup_storage *storage) {}
 static inline int bpf_cgroup_storage_assign(struct bpf_prog *prog,
struct bpf_map *map) { return 0; }
 static inline void bpf_cgroup_storage_release(struct bpf_prog *prog,
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index 940889eda2c7..38810a712971 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -7,6 +7,8 @@
 #include 
 #include 
 
+DEFINE_PER_CPU(void*, bpf_cgroup_storage);
+
 #ifdef CONFIG_CGROUP_BPF
 
 struct bpf_cgroup_storage_map {
-- 
2.14.4

[PATCH bpf-net 09/14] bpf: sync bpf.h to tools/

2018-06-28 Thread Roman Gushchin

Sync cgroup storage related changes:
1) new BPF_MAP_TYPE_CGROUP_STORAGE map type
2) struct bpf_cgroup_storage_key definition
3) get_local_storage() helper

Signed-off-by: Roman Gushchin 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Acked-by: Martin KaFai Lau 
---
 tools/include/uapi/linux/bpf.h | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index e0b06784f227..06e81dda 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -75,6 +75,11 @@ struct bpf_lpm_trie_key {
__u8data[0];/* Arbitrary size */
 };
 
+struct bpf_cgroup_storage_key {
+   __u64   cgroup_inode_id;/* cgroup inode id */
+   __u32   attach_type;/* program attach type */
+};
+
 /* BPF syscall commands, see bpf(2) man-page for details. */
 enum bpf_cmd {
BPF_MAP_CREATE,
@@ -120,6 +125,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_CPUMAP,
BPF_MAP_TYPE_XSKMAP,
BPF_MAP_TYPE_SOCKHASH,
+   BPF_MAP_TYPE_CGROUP_STORAGE,
 };
 
 enum bpf_prog_type {
@@ -2157,7 +2163,8 @@ union bpf_attr {
FN(rc_repeat),  \
FN(rc_keydown), \
FN(skb_cgroup_id),  \
-   FN(get_current_cgroup_id),
+   FN(get_current_cgroup_id),  \
+   FN(get_local_storage),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
-- 
2.14.4

[PATCH bpf-net 04/14] bpf: allocate cgroup storage entries on attaching bpf programs

2018-06-28 Thread Roman Gushchin

If a bpf program is using cgroup local storage, allocate
a bpf_cgroup_storage structure automatically on attaching the program
to a cgroup and save the pointer into the corresponding bpf_prog_list
entry.
Similarly, release the cgroup local storage when the bpf program
is detached.

Signed-off-by: Roman Gushchin 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Acked-by: Martin KaFai Lau 
---
 include/linux/bpf-cgroup.h |  1 +
 kernel/bpf/cgroup.c| 28 ++--
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 128fb0e39b4d..25ba744d2364 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -41,6 +41,7 @@ struct bpf_cgroup_storage {
 struct bpf_prog_list {
struct list_head node;
struct bpf_prog *prog;
+   struct bpf_cgroup_storage *storage;
 };
 
 struct bpf_prog_array;
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index f7c00bd6f8e4..f0a809868f92 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -34,6 +34,8 @@ void cgroup_bpf_put(struct cgroup *cgrp)
list_for_each_entry_safe(pl, tmp, progs, node) {
list_del(&pl->node);
bpf_prog_put(pl->prog);
+   bpf_cgroup_storage_unlink(pl->storage);
+   bpf_cgroup_storage_free(pl->storage);
kfree(pl);
static_branch_dec(&cgroup_bpf_enabled_key);
}
@@ -189,6 +191,7 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
 {
struct list_head *progs = &cgrp->bpf.progs[type];
struct bpf_prog *old_prog = NULL;
+   struct bpf_cgroup_storage *storage, *old_storage = NULL;
struct cgroup_subsys_state *css;
struct bpf_prog_list *pl;
bool pl_was_allocated;
@@ -211,6 +214,10 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
if (prog_list_length(progs) >= BPF_CGROUP_MAX_PROGS)
return -E2BIG;
 
+   storage = bpf_cgroup_storage_alloc(prog);
+   if (IS_ERR(storage))
+   return -ENOMEM;
+
if (flags & BPF_F_ALLOW_MULTI) {
list_for_each_entry(pl, progs, node)
if (pl->prog == prog)
@@ -218,24 +225,33 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
return -EINVAL;
 
pl = kmalloc(sizeof(*pl), GFP_KERNEL);
-   if (!pl)
+   if (!pl) {
+   bpf_cgroup_storage_free(storage);
return -ENOMEM;
+   }
+
pl_was_allocated = true;
pl->prog = prog;
+   pl->storage = storage;
list_add_tail(&pl->node, progs);
} else {
if (list_empty(progs)) {
pl = kmalloc(sizeof(*pl), GFP_KERNEL);
-   if (!pl)
+   if (!pl) {
+   bpf_cgroup_storage_free(storage);
return -ENOMEM;
+   }
pl_was_allocated = true;
list_add_tail(&pl->node, progs);
} else {
pl = list_first_entry(progs, typeof(*pl), node);
old_prog = pl->prog;
+   old_storage = pl->storage;
+   bpf_cgroup_storage_unlink(old_storage);
pl_was_allocated = false;
}
pl->prog = prog;
+   pl->storage = storage;
}
 
cgrp->bpf.flags[type] = flags;
@@ -258,10 +274,13 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
}
 
static_branch_inc(&cgroup_bpf_enabled_key);
+   if (old_storage)
+   bpf_cgroup_storage_free(old_storage);
if (old_prog) {
bpf_prog_put(old_prog);
static_branch_dec(&cgroup_bpf_enabled_key);
}
+   bpf_cgroup_storage_link(storage, cgrp, type);
return 0;
 
 cleanup:
@@ -277,6 +296,9 @@ int __cgroup_bpf_attach(struct cgroup *cgrp, struct bpf_prog *prog,
 
/* and cleanup the prog list */
pl->prog = old_prog;
+   bpf_cgroup_storage_free(pl->storage);
+   pl->storage = old_storage;
+   bpf_cgroup_storage_link(old_storage, cgrp, type);
if (pl_was_allocated) {
list_del(&pl->node);
kfree(pl);
@@ -357,6 +379,8 @@ int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 
/* now can actually delete it from this cgroup list */
list_del(&pl->node);
+   bpf_cgroup_storage_unlink(pl->storage);
+   bpf_cgroup_storage_free(pl->storage);
kfree(pl);
if (list_empty(progs))
/* last program was detached, reset flags to zero */
-- 
2.14.4

[PATCH bpf-net 12/14] selftests/bpf: add verifier cgroup storage tests

2018-06-28 Thread Roman Gushchin

Add the following verifier tests to cover the cgroup storage
functionality:
1) valid access to the cgroup storage
2) invalid access: use regular hashmap instead of cgroup storage map
3) invalid access: use invalid map fd
4) invalid access: try to access memory after the cgroup storage
5) invalid access: try to access memory before the cgroup storage
6) invalid access: call get_local_storage() with non-zero flags

For tests 2)-6) check returned error strings.

Expected output:
  $ ./test_verifier
  #0/u add+sub+mul OK
  #0/p add+sub+mul OK
  #1/u DIV32 by 0, zero check 1 OK
  ...
  #280/p valid cgroup storage access OK
  #281/p invalid cgroup storage access 1 OK
  #282/p invalid cgroup storage access 2 OK
  #283/p invalid per-cgroup storage access 3 OK
  #284/p invalid cgroup storage access 4 OK
  #285/p invalid cgroup storage access 5 OK
  ...
  #649/p pass modified ctx pointer to helper, 2 OK
  #650/p pass modified ctx pointer to helper, 3 OK
  Summary: 901 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Roman Gushchin 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Acked-by: Martin KaFai Lau 
---
 tools/testing/selftests/bpf/bpf_helpers.h   |   2 +
 tools/testing/selftests/bpf/test_verifier.c | 123 +++-
 2 files changed, 124 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index f2f28b6c8915..ccd959fd940e 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -133,6 +133,8 @@ static int (*bpf_rc_keydown)(void *ctx, unsigned int protocol,
(void *) BPF_FUNC_rc_keydown;
 static unsigned long long (*bpf_get_current_cgroup_id)(void) =
(void *) BPF_FUNC_get_current_cgroup_id;
+static void *(*bpf_get_local_storage)(void *map, unsigned long long flags) =
+   (void *) BPF_FUNC_get_local_storage;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 2ecd27b670d7..7016fb2964a1 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -50,7 +50,7 @@
 
 #define MAX_INSNS  BPF_MAXINSNS
 #define MAX_FIXUPS 8
-#define MAX_NR_MAPS7
+#define MAX_NR_MAPS8
 #define POINTER_VALUE  0xcafe4all
 #define TEST_DATA_LEN  64
 
@@ -70,6 +70,7 @@ struct bpf_test {
int fixup_prog1[MAX_FIXUPS];
int fixup_prog2[MAX_FIXUPS];
int fixup_map_in_map[MAX_FIXUPS];
+   int fixup_cgroup_storage[MAX_FIXUPS];
const char *errstr;
const char *errstr_unpriv;
uint32_t retval;
@@ -4630,6 +4631,104 @@ static struct bpf_test tests[] = {
.result = REJECT,
.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
+   {
+   "valid cgroup storage access",
+   .insns = {
+   BPF_MOV64_IMM(BPF_REG_2, 0),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_get_local_storage),
+   BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_0, 0),
+   BPF_MOV64_REG(BPF_REG_0, BPF_REG_1),
+   BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 1),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_cgroup_storage = { 1 },
+   .result = ACCEPT,
+   .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+   },
+   {
+   "invalid cgroup storage access 1",
+   .insns = {
+   BPF_MOV64_IMM(BPF_REG_2, 0),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_get_local_storage),
+   BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_0, 0),
+   BPF_MOV64_REG(BPF_REG_0, BPF_REG_1),
+   BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 1),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map1 = { 1 },
+   .result = REJECT,
+   .errstr = "cannot pass map_type 1 into func bpf_get_local_storage",
+   .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+   },
+   {
+   "invalid cgroup storage access 2",
+   .insns = {
+   BPF_MOV64_IMM(BPF_REG_2, 0),
+   BPF_LD_MAP_FD(BPF_REG_1, 1),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_get_local_storage),
+   BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 1),
+   BPF_EXIT_INSN(),
+   },
+   .result = REJECT,
+   .errstr = "fd 1 is not pointing

[PATCH bpf-net 00/14] bpf: cgroup local storage

2018-06-28 Thread Roman Gushchin

This patchset implements cgroup local storage for bpf programs.
The main idea is to provide a fast accessible memory for storing
various per-cgroup data, e.g. number of transmitted packets.

Cgroup local storage looks like a special type of map to userspace,
and is accessible using generic bpf maps API for reading and
updating of the data. The (cgroup inode id, attachment type) pair
is used as a map key.

A user can't create new entries or destroy existing entries;
it happens automatically when a user attaches/detaches a bpf program
to a cgroup.

From a bpf program's point of view, cgroup storage is accessible
without lookup using the special get_local_storage() helper function.
It takes a map fd as an argument. It always returns a valid pointer
to the corresponding memory area.
To implement this lookup-free access, a pointer to the cgroup
storage is saved for each attachment of a bpf program to a cgroup,
if the program requires it. Before the program runs, the pointer is
stored in a special global per-cpu variable, which is accessible from
the get_local_storage() helper.

This patchset implements only cgroup local storage; however, the API
is intentionally made extensible to support other local storage types
in the future, e.g. thread local storage, socket local storage, etc.
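
The lookup-free access described above can be illustrated with a small
userspace sketch. It is not kernel code: a thread-local variable stands
in for the per-cpu variable, and storage_set(), get_local_storage(),
count_packet() and run_attached() are hypothetical names chosen for
illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's per-cpu pointer. _Thread_local only
 * approximates per-cpu semantics; this is an assumption of the sketch. */
static _Thread_local void *current_storage;

/* Hypothetical analogue of bpf_cgroup_storage_set(): the runner calls
 * it right before executing a program. */
static void storage_set(void *storage)
{
	if (!storage)
		return;
	current_storage = storage;
}

/* Hypothetical analogue of the helper: no lookup, just read the pointer. */
static void *get_local_storage(void)
{
	return current_storage;
}

/* A "program" that counts packets in its cgroup-local storage. */
static int count_packet(void)
{
	uint64_t *counter = get_local_storage();

	assert(counter != NULL);
	(*counter)++;
	return 1;
}

/* Runner: publish the storage pointer of this attachment, then run. */
static int run_attached(int (*prog)(void), void *storage)
{
	storage_set(storage);
	return prog();
}
```

Calling run_attached(count_packet, &a) twice and
run_attached(count_packet, &b) once leaves a at 2 and b at 1: the same
program body updates whichever storage was published for the current
attachment, without any map lookup.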

Patch (1) adds an ability to charge bpf maps for consuming memory
dynamically.
Patch (2) introduces cgroup storage maps.
Patch (3) implements a mechanism to pass cgroup storage pointer
to a bpf program.
Patch (4) implements allocation/releasing of cgroup local storage
on attaching/detaching of a bpf program to/from a cgroup.
Patch (5) extends bpf_prog_array to store cgroup storage pointers.
Patch (6) introduces the RET_PTR_TO_MAP_VALUE return type, required to
skip unnecessary NULL checks in bpf programs.
Patch (7) disables creation of maps of cgroup storage maps.
Patch (8) introduces the get_local_storage() helper.
Patch (9) syncs bpf.h to tools/.
Patch (10) adds cgroup storage maps support to bpftool.
Patch (11) adds support for testing programs which are using
cgroup storage without actually attaching them to cgroups.
Patches (12), (13) and (14) add the necessary tests.

Signed-off-by: Roman Gushchin 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Martin KaFai Lau 

Roman Gushchin (14):
  bpf: add ability to charge bpf maps memory dynamically
  bpf: introduce cgroup storage maps
  bpf: pass a pointer to a cgroup storage using pcpu variable
  bpf: allocate cgroup storage entries on attaching bpf programs
  bpf: extend bpf_prog_array to store pointers to the cgroup storage
  bpf/verifier: introduce BPF_PTR_TO_MAP_VALUE
  bpf: don't allow create maps of cgroup local storages
  bpf: introduce the bpf_get_local_storage() helper function
  bpf: sync bpf.h to tools/
  bpftool: add support for CGROUP_STORAGE maps
  bpf/test_run: support cgroup local storage
  selftests/bpf: add verifier cgroup storage tests
  selftests/bpf: add a cgroup storage test
  samples/bpf: extend test_cgrp2_attach2 test to use cgroup storage

 include/linux/bpf-cgroup.h|  53 
 include/linux/bpf.h   |  25 +-
 include/linux/bpf_types.h |   3 +
 include/uapi/linux/bpf.h  |  19 +-
 kernel/bpf/Makefile   |   1 +
 kernel/bpf/cgroup.c   |  54 +++-
 kernel/bpf/core.c |  76 ++---
 kernel/bpf/helpers.c  |  20 ++
 kernel/bpf/local_storage.c| 369 ++
 kernel/bpf/map_in_map.c   |   3 +-
 kernel/bpf/syscall.c  |  53 +++-
 kernel/bpf/verifier.c |  38 ++-
 net/bpf/test_run.c|  13 +-
 net/core/filter.c |  23 +-
 samples/bpf/test_cgrp2_attach2.c  |  27 +-
 tools/bpf/bpftool/map.c   |   1 +
 tools/include/uapi/linux/bpf.h|   9 +-
 tools/testing/selftests/bpf/Makefile  |   4 +-
 tools/testing/selftests/bpf/bpf_helpers.h |   2 +
 tools/testing/selftests/bpf/test_cgroup_storage.c | 130 
 tools/testing/selftests/bpf/test_verifier.c   | 123 +++-
 21 files changed, 965 insertions(+), 81 deletions(-)
 create mode 100644 kernel/bpf/local_storage.c
 create mode 100644 tools/testing/selftests/bpf/test_cgroup_storage.c

-- 
2.14.4

[PATCH bpf-net 06/14] bpf/verifier: introduce BPF_PTR_TO_MAP_VALUE

2018-06-28 Thread Roman Gushchin

BPF_MAP_TYPE_CGROUP_STORAGE maps are special in that
access from the bpf program side is lookup-free.
That means the result is guaranteed to be a valid
pointer to the cgroup storage; no NULL-check is required.

This patch introduces the RET_PTR_TO_MAP_VALUE return type,
which is required to make the verifier accept programs
that do not check the map value pointer for NULL.
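
A simplified model of this rule, as a sketch only: the enumerator names
mirror the ones in the diff below (SCALAR_VALUE is assumed to be what
mark_reg_unknown() produces), while r0_type_after_call() and
may_dereference() are hypothetical helpers that do not exist in the
verifier.

```c
#include <assert.h>
#include <stdbool.h>

/* Helper return types, named after the kernel enum. */
enum ret_type {
	RET_INTEGER,
	RET_VOID,
	RET_PTR_TO_MAP_VALUE,
	RET_PTR_TO_MAP_VALUE_OR_NULL,
};

/* Register states tracked for R0 in this toy model. */
enum reg_type {
	NOT_INIT,
	SCALAR_VALUE,
	PTR_TO_MAP_VALUE,
	PTR_TO_MAP_VALUE_OR_NULL,
};

/* Type assigned to R0 after a helper call, by helper return type. */
static enum reg_type r0_type_after_call(enum ret_type ret)
{
	switch (ret) {
	case RET_INTEGER:
		return SCALAR_VALUE;
	case RET_VOID:
		return NOT_INIT;
	case RET_PTR_TO_MAP_VALUE:
		return PTR_TO_MAP_VALUE;
	case RET_PTR_TO_MAP_VALUE_OR_NULL:
		return PTR_TO_MAP_VALUE_OR_NULL;
	}
	assert(!"unknown return type");
	return NOT_INIT;
}

/* A load through R0 is acceptable only once NULL has been ruled out. */
static bool may_dereference(enum reg_type type)
{
	return type == PTR_TO_MAP_VALUE;
}
```

In this model a helper declared with RET_PTR_TO_MAP_VALUE yields a
register that can be dereferenced immediately, whereas the _OR_NULL
variant would first need an explicit NULL check in the program.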

Signed-off-by: Roman Gushchin 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Acked-by: Martin KaFai Lau 
---
 include/linux/bpf.h   | 1 +
 kernel/bpf/verifier.c | 8 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 709354a0608a..6d7e0dfc 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -154,6 +154,7 @@ enum bpf_arg_type {
 enum bpf_return_type {
RET_INTEGER,/* function returns integer */
RET_VOID,   /* function doesn't return anything */
+   RET_PTR_TO_MAP_VALUE,   /* returns a pointer to map elem value */
RET_PTR_TO_MAP_VALUE_OR_NULL,   /* returns a pointer to map elem value or NULL */
 };
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index de097a642c3f..cc0c7990f849 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2545,8 +2545,12 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
mark_reg_unknown(env, regs, BPF_REG_0);
} else if (fn->ret_type == RET_VOID) {
regs[BPF_REG_0].type = NOT_INIT;
-   } else if (fn->ret_type == RET_PTR_TO_MAP_VALUE_OR_NULL) {
-   regs[BPF_REG_0].type = PTR_TO_MAP_VALUE_OR_NULL;
+   } else if (fn->ret_type == RET_PTR_TO_MAP_VALUE_OR_NULL ||
+  fn->ret_type == RET_PTR_TO_MAP_VALUE) {
+   if (fn->ret_type == RET_PTR_TO_MAP_VALUE)
+   regs[BPF_REG_0].type = PTR_TO_MAP_VALUE;
+   else
+   regs[BPF_REG_0].type = PTR_TO_MAP_VALUE_OR_NULL;
/* There is no offset yet applied, variable or fixed */
mark_reg_known_zero(env, regs, BPF_REG_0);
regs[BPF_REG_0].off = 0;
-- 
2.14.4

[PATCH bpf-net 05/14] bpf: extend bpf_prog_array to store pointers to the cgroup storage

2018-06-28 Thread Roman Gushchin

This patch converts bpf_prog_array from an array of prog pointers
to an array of struct bpf_prog_array_item elements.

This allows saving a cgroup storage pointer for each bpf program
effectively attached to a cgroup.
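
The resulting layout can be sketched in plain userspace C. This is an
illustration, not the kernel macro: struct prog_item, run_prog_array()
and the two sample programs are hypothetical, and the storage pointer is
passed as an argument here instead of through a per-cpu variable.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*prog_fn)(void *ctx, void *storage);

/* Hypothetical analogue of struct bpf_prog_array_item. */
struct prog_item {
	prog_fn prog;	/* NULL terminates the array */
	void *storage;	/* storage belonging to this attachment */
};

/* Run every program with its own storage. Verdicts are AND-ed, so a
 * single 0 makes the overall result 0. */
static unsigned int run_prog_array(const struct prog_item *item, void *ctx)
{
	unsigned int ret = 1;

	assert(item != NULL);
	for (; item->prog; item++)
		ret &= (unsigned int)item->prog(ctx, item->storage);
	return ret;
}

/* Sample program: bump the counter in its storage and allow. */
static int count_and_allow(void *ctx, void *storage)
{
	(void)ctx;
	++*(unsigned long *)storage;
	return 1;
}

/* Sample program: bump the counter in its storage and deny. */
static int count_and_deny(void *ctx, void *storage)
{
	(void)ctx;
	++*(unsigned long *)storage;
	return 0;
}
```

Keeping the program and its storage side by side in one element means
the runner needs no extra lookup to find the storage that belongs to
the program it is about to execute.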

Signed-off-by: Roman Gushchin 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Acked-by: Martin KaFai Lau 
---
 include/linux/bpf.h | 19 +-
 kernel/bpf/cgroup.c | 24 ++---
 kernel/bpf/core.c   | 76 +++--
 3 files changed, 66 insertions(+), 53 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4b3e42e5b6d0..709354a0608a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -348,9 +348,14 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
  * The 'struct bpf_prog_array *' should only be replaced with xchg()
  * since other cpus are walking the array of pointers in parallel.
  */
+struct bpf_prog_array_item {
+   struct bpf_prog *prog;
+   struct bpf_cgroup_storage *cgroup_storage;
+};
+
 struct bpf_prog_array {
struct rcu_head rcu;
-   struct bpf_prog *progs[0];
+   struct bpf_prog_array_item items[0];
 };
 
 struct bpf_prog_array __rcu *bpf_prog_array_alloc(u32 prog_cnt, gfp_t flags);
@@ -371,7 +376,8 @@ int bpf_prog_array_copy(struct bpf_prog_array __rcu *old_array,
 
 #define __BPF_PROG_RUN_ARRAY(array, ctx, func, check_non_null) \
({  \
-   struct bpf_prog **_prog, *__prog;   \
+   struct bpf_prog_array_item *_item;  \
+   struct bpf_prog *_prog; \
struct bpf_prog_array *_array;  \
u32 _ret = 1;   \
preempt_disable();  \
@@ -379,10 +385,11 @@ int bpf_prog_array_copy(struct bpf_prog_array __rcu *old_array,
_array = rcu_dereference(array);\
if (unlikely(check_non_null && !_array))\
goto _out;  \
-   _prog = _array->progs;  \
-   while ((__prog = READ_ONCE(*_prog))) {  \
-   _ret &= func(__prog, ctx);  \
-   _prog++;\
+   _item = &_array->items[0];  \
+   while ((_prog = READ_ONCE(_item->prog))) {  \
+   bpf_cgroup_storage_set(_item->cgroup_storage);  \
+   _ret &= func(_prog, ctx);   \
+   _item++;\
}   \
 _out:  \
rcu_read_unlock();  \
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index f0a809868f92..14a1f6c94592 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -117,16 +117,20 @@ static int compute_effective_progs(struct cgroup *cgrp,
cnt = 0;
p = cgrp;
do {
-   if (cnt == 0 || (p->bpf.flags[type] & BPF_F_ALLOW_MULTI))
-   list_for_each_entry(pl,
-   &p->bpf.progs[type], node) {
-   if (!pl->prog)
-   continue;
-   rcu_dereference_protected(progs, 1)->
-   progs[cnt++] = pl->prog;
-   }
-   p = cgroup_parent(p);
-   } while (p);
+   if (cnt > 0 && !(p->bpf.flags[type] & BPF_F_ALLOW_MULTI))
+   continue;
+
+   list_for_each_entry(pl, &p->bpf.progs[type], node) {
+   if (!pl->prog)
+   continue;
+
+   rcu_dereference_protected(progs, 1)->
+   items[cnt].prog = pl->prog;
+   rcu_dereference_protected(progs, 1)->
+   items[cnt].cgroup_storage = pl->storage;
+   cnt++;
+   }
+   } while ((p = cgroup_parent(p)));
 
*array = progs;
return 0;
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index a9e6c04d0f4a..145f44cb0cad 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1570,7 +1570,8 @@ struct bpf_prog_array __rcu *bpf_prog_array_alloc(u32 prog_cnt, gfp_t flags)
 {
if (prog_cnt)
return kzalloc(sizeof(struct bpf_prog_array) +
-  sizeof(struct bpf_prog *) * (prog_cnt + 1),
+  sizeof(struct bpf_prog_array_item) *
+  (prog_cnt + 1),
   flags);
 
return &empty_prog_array.hdr;
@@ -1584,43 +1585,45 @@ void bpf_prog_array_free(struct bpf_p

[PATCH bpf-net 02/14] bpf: introduce cgroup storage maps

2018-06-28 Thread Roman Gushchin

This commit introduces BPF_MAP_TYPE_CGROUP_STORAGE maps:
a special type of map that implements cgroup storage.

From the userspace point of view it's almost a generic
hash map with the (cgroup inode id, attachment type) pair
used as a key.

The only difference is that some operations are restricted:
  1) a user can't create new entries,
  2) a user can't remove existing entries.

The lookup from userspace is O(log(n)).
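
A minimal sketch of an ordered lookup on the (cgroup inode id,
attachment type) key follows. It is an assumption-laden illustration:
struct bpf_cgroup_storage below carries an rb_node, which suggests a
tree, but this sketch substitutes a binary search over a sorted array,
and struct storage_key, struct storage_entry, key_cmp() and
storage_lookup() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the (cgroup inode id, attachment type) pair used as a key. */
struct storage_key {
	uint64_t cgroup_inode_id;
	uint32_t attach_type;
};

struct storage_entry {
	struct storage_key key;
	uint64_t value;
};

/* Total order on keys: inode id first, then attachment type. */
static int key_cmp(const struct storage_key *a, const struct storage_key *b)
{
	if (a->cgroup_inode_id != b->cgroup_inode_id)
		return a->cgroup_inode_id < b->cgroup_inode_id ? -1 : 1;
	if (a->attach_type != b->attach_type)
		return a->attach_type < b->attach_type ? -1 : 1;
	return 0;
}

/* Binary search over entries sorted by key_cmp(): O(log(n)) steps. */
static struct storage_entry *storage_lookup(struct storage_entry *entries,
					    size_t n,
					    const struct storage_key *key)
{
	size_t lo = 0, hi = n;

	assert(entries != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int c = key_cmp(key, &entries[mid].key);

		if (c == 0)
			return &entries[mid];
		if (c < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return NULL;
}
```

Any structure that keeps entries ordered by such a comparison gives the
logarithmic lookup cost mentioned above; only the userspace path pays
it, since bpf programs reach their storage without a lookup.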

Signed-off-by: Roman Gushchin 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Acked-by: Martin KaFai Lau 
---
 include/linux/bpf-cgroup.h |  38 +
 include/linux/bpf.h|   1 +
 include/linux/bpf_types.h  |   3 +
 include/uapi/linux/bpf.h   |   6 +
 kernel/bpf/Makefile|   1 +
 kernel/bpf/local_storage.c | 367 +
 kernel/bpf/verifier.c  |  12 ++
 7 files changed, 428 insertions(+)
 create mode 100644 kernel/bpf/local_storage.c

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 975fb4cf1bb7..b4e2e42c1d2a 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -3,19 +3,39 @@
 #define _BPF_CGROUP_H
 
 #include 
+#include 
 #include 
 
 struct sock;
 struct sockaddr;
 struct cgroup;
 struct sk_buff;
+struct bpf_map;
+struct bpf_prog;
 struct bpf_sock_ops_kern;
+struct bpf_cgroup_storage;
 
 #ifdef CONFIG_CGROUP_BPF
 
 extern struct static_key_false cgroup_bpf_enabled_key;
 #define cgroup_bpf_enabled static_branch_unlikely(&cgroup_bpf_enabled_key)
 
+struct bpf_cgroup_storage_map;
+
+struct bpf_storage_buffer {
+   struct rcu_head rcu;
+   char data[0];
+};
+
+struct bpf_cgroup_storage {
+   struct bpf_storage_buffer *buf;
+   struct bpf_cgroup_storage_map *map;
+   struct bpf_cgroup_storage_key key;
+   struct list_head list;
+   struct rb_node node;
+   struct rcu_head rcu;
+};
+
 struct bpf_prog_list {
struct list_head node;
struct bpf_prog *prog;
@@ -76,6 +96,15 @@ int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
 int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
  short access, enum bpf_attach_type type);
 
+struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(struct bpf_prog *prog);
+void bpf_cgroup_storage_free(struct bpf_cgroup_storage *storage);
+void bpf_cgroup_storage_link(struct bpf_cgroup_storage *storage,
+struct cgroup *cgroup,
+enum bpf_attach_type type);
+void bpf_cgroup_storage_unlink(struct bpf_cgroup_storage *storage);
+int bpf_cgroup_storage_assign(struct bpf_prog *prog, struct bpf_map *map);
+void bpf_cgroup_storage_release(struct bpf_prog *prog, struct bpf_map *map);
+
 /* Wrappers for __cgroup_bpf_run_filter_skb() guarded by cgroup_bpf_enabled. */
 #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk, skb)\
 ({   \
@@ -194,6 +223,15 @@ struct cgroup_bpf {};
 static inline void cgroup_bpf_put(struct cgroup *cgrp) {}
 static inline int cgroup_bpf_inherit(struct cgroup *cgrp) { return 0; }
 
+static inline int bpf_cgroup_storage_assign(struct bpf_prog *prog,
+   struct bpf_map *map) { return 0; }
+static inline void bpf_cgroup_storage_release(struct bpf_prog *prog,
+ struct bpf_map *map) {}
+static inline struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(
+   struct bpf_prog *prog) { return 0; }
+static inline void bpf_cgroup_storage_free(
+   struct bpf_cgroup_storage *storage) {}
+
 #define cgroup_bpf_enabled (0)
 #define BPF_CGROUP_PRE_CONNECT_ENABLED(sk) (0)
 #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk,skb) ({ 0; })
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e4d684ce3f5e..4b3e42e5b6d0 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -281,6 +281,7 @@ struct bpf_prog_aux {
struct bpf_prog *prog;
struct user_struct *user;
u64 load_time; /* ns since boottime */
+   struct bpf_map *cgroup_storage;
char name[BPF_OBJ_NAME_LEN];
 #ifdef CONFIG_SECURITY
void *security;
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index c5700c2d5549..add08be53b6f 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -37,6 +37,9 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_PERF_EVENT_ARRAY, perf_event_array_map_ops)
 #ifdef CONFIG_CGROUPS
 BPF_MAP_TYPE(BPF_MAP_TYPE_CGROUP_ARRAY, cgroup_array_map_ops)
 #endif
+#ifdef CONFIG_CGROUP_BPF
+BPF_MAP_TYPE(BPF_MAP_TYPE_CGROUP_STORAGE, cgroup_storage_map_ops)
+#endif
 BPF_MAP_TYPE(BPF_MAP_TYPE_HASH, htab_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_HASH, htab_percpu_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_LRU_HASH, htab_lru_map_ops)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 59b19b6a40d7..7aa135e4c2f3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7

[PATCH bpf-net 01/14] bpf: add ability to charge bpf maps memory dynamically

2018-06-28 Thread Roman Gushchin

This commit extends the existing bpf map memory charging API
to support dynamic charging/uncharging.

This is required to account for memory used by maps
whose entries are all created dynamically after
map initialization.
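
The charge-then-roll-back pattern used by the new API can be sketched
in userspace with C11 atomics. This is an illustration under stated
assumptions: struct user_account, charge_pages() and uncharge_pages()
are hypothetical, the limit is passed in rather than read from
RLIMIT_MEMLOCK, and failure is reported as -1 where the patch returns
-EPERM.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the per-user locked page counter. */
struct user_account {
	atomic_long locked_pages;
};

/* Charge first, then roll back if the limit is exceeded.
 * Returns 0 on success and -1 on failure. */
static int charge_pages(struct user_account *user, long pages, long limit)
{
	assert(pages >= 0);
	/* atomic_fetch_add() returns the old value, so add pages again
	 * to obtain the value after the update. */
	if (atomic_fetch_add(&user->locked_pages, pages) + pages > limit) {
		atomic_fetch_sub(&user->locked_pages, pages);
		return -1;
	}
	return 0;
}

/* Give back pages that were charged earlier. */
static void uncharge_pages(struct user_account *user, long pages)
{
	atomic_fetch_sub(&user->locked_pages, pages);
}
```

Checking the value produced by the same atomic operation that performs
the addition avoids a separate read that could observe updates made by
other tasks in between.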

Signed-off-by: Roman Gushchin 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Acked-by: Martin KaFai Lau 
---
 include/linux/bpf.h  |  2 ++
 kernel/bpf/syscall.c | 53 +---
 2 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 7df32a3200f7..e4d684ce3f5e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -434,6 +434,8 @@ struct bpf_map * __must_check bpf_map_inc(struct bpf_map *map, bool uref);
 void bpf_map_put_with_uref(struct bpf_map *map);
 void bpf_map_put(struct bpf_map *map);
 int bpf_map_precharge_memlock(u32 pages);
+int bpf_map_charge_memlock(struct bpf_map *map, u32 pages);
+void bpf_map_uncharge_memlock(struct bpf_map *map, u32 pages);
 void *bpf_map_area_alloc(size_t size, int numa_node);
 void bpf_map_area_free(void *base);
 void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 35dc466641f2..e03aeeec01e0 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -181,32 +181,55 @@ int bpf_map_precharge_memlock(u32 pages)
return 0;
 }
 
-static int bpf_map_charge_memlock(struct bpf_map *map)
+static int bpf_charge_memlock(struct user_struct *user, u32 pages)
 {
-   struct user_struct *user = get_current_user();
-   unsigned long memlock_limit;
+   unsigned long memlock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 
-   memlock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+   if (atomic_long_add_return(pages, &user->locked_vm) > memlock_limit) {
+   atomic_long_sub(pages, &user->locked_vm);
+   return -EPERM;
+   }
+   return 0;
+}
 
-   atomic_long_add(map->pages, &user->locked_vm);
+static int bpf_map_init_memlock(struct bpf_map *map)
+{
+   struct user_struct *user = get_current_user();
+   int ret;
 
-   if (atomic_long_read(&user->locked_vm) > memlock_limit) {
-   atomic_long_sub(map->pages, &user->locked_vm);
+   ret = bpf_charge_memlock(user, map->pages);
+   if (ret) {
free_uid(user);
-   return -EPERM;
+   return ret;
}
map->user = user;
-   return 0;
+   return ret;
 }
 
-static void bpf_map_uncharge_memlock(struct bpf_map *map)
+static void bpf_map_release_memlock(struct bpf_map *map)
 {
struct user_struct *user = map->user;
-
-   atomic_long_sub(map->pages, &user->locked_vm);
+   atomic_long_sub(map->pages, &map->user->locked_vm);
free_uid(user);
 }
 
+int bpf_map_charge_memlock(struct bpf_map *map, u32 pages)
+{
+   int ret;
+
+   ret = bpf_charge_memlock(map->user, pages);
+   if (ret)
+   return ret;
+   map->pages += pages;
+   return ret;
+}
+
+void bpf_map_uncharge_memlock(struct bpf_map *map, u32 pages)
+{
+   atomic_long_sub(pages, &map->user->locked_vm);
+   map->pages -= pages;
+}
+
 static int bpf_map_alloc_id(struct bpf_map *map)
 {
int id;
@@ -256,7 +279,7 @@ static void bpf_map_free_deferred(struct work_struct *work)
 {
struct bpf_map *map = container_of(work, struct bpf_map, work);
 
-   bpf_map_uncharge_memlock(map);
+   bpf_map_release_memlock(map);
security_bpf_map_free(map);
/* implementation dependent freeing */
map->ops->map_free(map);
@@ -492,7 +515,7 @@ static int map_create(union bpf_attr *attr)
if (err)
goto free_map_nouncharge;
 
-   err = bpf_map_charge_memlock(map);
+   err = bpf_map_init_memlock(map);
if (err)
goto free_map_sec;
 
@@ -515,7 +538,7 @@ static int map_create(union bpf_attr *attr)
return err;
 
 free_map:
-   bpf_map_uncharge_memlock(map);
+   bpf_map_release_memlock(map);
 free_map_sec:
security_bpf_map_free(map);
 free_map_nouncharge:
-- 
2.14.4
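
The charge/uncharge split in the patch above can be modelled in userspace. This is only a minimal sketch under stated assumptions: struct user_acct, struct map_acct, charge(), map_charge() and map_uncharge() are hypothetical stand-ins for the kernel's user_struct, bpf_map and the memlock helpers, and C11 atomics replace atomic_long_t. It shows the "add first, check the limit, roll back on failure" pattern and how a map's page count grows and shrinks after creation.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the per-user accounting state. */
struct user_acct {
	atomic_long locked_pages;	/* pages currently charged to the user */
	long limit_pages;		/* stand-in for RLIMIT_MEMLOCK in pages */
};

/* Hypothetical stand-in for the per-map accounting state. */
struct map_acct {
	struct user_acct *user;
	long pages;			/* pages this map currently accounts for */
};

/* Add first, then compare against the limit; undo the add on failure. */
static int charge(struct user_acct *user, long pages)
{
	/* atomic_fetch_add() returns the old value, so add pages to get the new one */
	if (atomic_fetch_add(&user->locked_pages, pages) + pages > user->limit_pages) {
		atomic_fetch_sub(&user->locked_pages, pages);
		return -EPERM;
	}
	return 0;
}

/* Dynamic charge: called after map creation, when new entries appear. */
static int map_charge(struct map_acct *map, long pages)
{
	int ret = charge(map->user, pages);

	if (ret)
		return ret;
	map->pages += pages;
	return 0;
}

/* Dynamic uncharge: called when entries are released. */
static void map_uncharge(struct map_acct *map, long pages)
{
	assert(pages <= map->pages);
	atomic_fetch_sub(&map->user->locked_pages, pages);
	map->pages -= pages;
}
```

Because the add and the limit check are a single atomic read-modify-write, two concurrent callers cannot both slip under the limit, which the older "add, then read" sequence did not guarantee.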

[PATCH bpf] bpf: disable and restore preemption in __BPF_PROG_RUN_ARRAY

2018-04-23 Thread Roman Gushchin

Running bpf programs requires disabled preemption,
however at least some* of the BPF_PROG_RUN_ARRAY users
do not follow this rule.

To fix this bug, and also to make it not happen in the future,
let's add explicit preemption disabling/re-enabling
to the __BPF_PROG_RUN_ARRAY code.

* for example:
 [   17.624472] RIP: 0010:__cgroup_bpf_run_filter_sk+0x1c4/0x1d0
 ...
 [   17.640890]  inet6_create+0x3eb/0x520
 [   17.641405]  __sock_create+0x242/0x340
 [   17.641939]  __sys_socket+0x57/0xe0
 [   17.642370]  ? trace_hardirqs_off_thunk+0x1a/0x1c
 [   17.642944]  SyS_socket+0xa/0x10
 [   17.643357]  do_syscall_64+0x79/0x220
 [   17.643879]  entry_SYSCALL_64_after_hwframe+0x42/0xb7

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
---
 include/linux/bpf.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 486e65e3db26..dc586cc64bc2 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -351,6 +351,7 @@ int bpf_prog_array_copy(struct bpf_prog_array __rcu 
*old_array,
struct bpf_prog **_prog, *__prog;   \
struct bpf_prog_array *_array;  \
u32 _ret = 1;   \
+   preempt_disable();  \
rcu_read_lock();\
_array = rcu_dereference(array);\
if (unlikely(check_non_null && !_array))\
@@ -362,6 +363,7 @@ int bpf_prog_array_copy(struct bpf_prog_array __rcu 
*old_array,
}   \
 _out:  \
rcu_read_unlock();  \
+   preempt_enable_no_resched();\
_ret;   \
 })
 
-- 
2.14.3
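
The idea of the patch above, putting the disable/enable pair inside the runner so that no caller can forget it, can be sketched in userspace. Everything here is a hypothetical model: preempt_depth stands in for the preemption counter, run_prog_array() for the macro, and combining results with AND is an assumption of this sketch. The programs assert that they only ever run inside the protected region.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the preemption counter. */
static int preempt_depth;

static void model_preempt_disable(void)
{
	preempt_depth++;
}

static void model_preempt_enable(void)
{
	assert(preempt_depth > 0);
	preempt_depth--;
}

typedef unsigned int (*prog_fn)(void *ctx);

/*
 * Run every program in a NULL-terminated array and AND the results.
 * The disable/enable pair lives here, so every caller gets it for free.
 */
static unsigned int run_prog_array(prog_fn *array, void *ctx)
{
	unsigned int ret = 1;

	model_preempt_disable();
	if (array) {
		for (prog_fn *p = array; *p; p++)
			ret &= (*p)(ctx);
	}
	model_preempt_enable();
	return ret;
}

/* Sample programs: each checks that it runs with "preemption" disabled. */
static unsigned int prog_allow(void *ctx)
{
	(void)ctx;
	assert(preempt_depth > 0);
	return 1;
}

static unsigned int prog_deny(void *ctx)
{
	(void)ctx;
	assert(preempt_depth > 0);
	return 0;
}
```

The counter returns to zero on every path, including the empty-array case, which is the property the macro needs on its early-exit label as well.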

[PATCH net] Revert "defer call to mem_cgroup_sk_alloc()"

2018-02-02 Thread Roman Gushchin

On Fri, Feb 02, 2018 at 11:34:56AM -0800, Eric Dumazet wrote:
> On Fri, 2018-02-02 at 19:04 +0000, Roman Gushchin wrote:
> > On Fri, Feb 02, 2018 at 10:39:04AM -0800, Eric Dumazet wrote:
> > > On Fri, 2018-02-02 at 18:06 +0000, Roman Gushchin wrote:
> > > > 
> > > > Idk, how even we can hit it? And if so, what scary will happen?
> > > > 
> > > > If you prefer to have it there, I definitely can return it,
> > > > but I see no profit so far.
> > > 
> > > I was simply curious this was not mentioned in the changelog.
> > > 
> > > A revert is normally a true revert, modulo the changes needed by
> > > conflicts and possible changes.
> > > 
> > > I personally do not care of this BUG_ON(), I had not put it in the
> > > first place.
> > 
> > Technically it's not a true revert, but you're totally right.
> > Let me add a note to the commit description.
> > 
> > Are you ok with the rest?
> 
> Sure !
> 
> Thanks.

Hello, David!

Can you, please, pull the patch below?
It should be applied for 4.14+.

Thank you!

Roman

--

From a0a07f65a38105562bf424d7dc072a2bc4f1569e Mon Sep 17 00:00:00 2001
From: Roman Gushchin <g...@fb.com>
Date: Fri, 2 Feb 2018 15:26:57 +
Subject: [PATCH net] Revert "defer call to mem_cgroup_sk_alloc()"

This patch effectively reverts commit 9f1c2674b328 ("net: memcontrol:
defer call to mem_cgroup_sk_alloc()").

Moving mem_cgroup_sk_alloc() to the inet_csk_accept() completely breaks
memcg socket memory accounting, as packets received before memcg
pointer initialization are not accounted and are causing refcounting
underflow on socket release.

Actually the use-after-free problem was fixed by
commit c0576e397508 ("net: call cgroup_sk_alloc() earlier in
sk_clone_lock()") for the cgroup pointer.

So, let's revert it and call mem_cgroup_sk_alloc() just before
cgroup_sk_alloc(). This is safe, as we hold a reference to the socket
we're cloning, and it holds a reference to the memcg.

Also, let's drop the BUG_ON(mem_cgroup_is_root()) check from
mem_cgroup_sk_alloc(). I see no reason why bumping the root
memcg counter should cause a panic, and there is no realistic
way to hit it.

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Eric Dumazet <eduma...@google.com>
Cc: David S. Miller <da...@davemloft.net>
Cc: Johannes Weiner <han...@cmpxchg.org>
Cc: Tejun Heo <t...@kernel.org>
---
 mm/memcontrol.c | 14 ++
 net/core/sock.c |  5 +
 net/ipv4/inet_connection_sock.c |  1 -
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0ae2dc3a1748..0937f2c52c7d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5747,6 +5747,20 @@ void mem_cgroup_sk_alloc(struct sock *sk)
if (!mem_cgroup_sockets_enabled)
return;
 
+   /*
+* Socket cloning can throw us here with sk_memcg already
+* filled. It won't however, necessarily happen from
+* process context. So the test for root memcg given
+* the current task's memcg won't help us in this case.
+*
+* Respecting the original socket's memcg is a better
+* decision in this case.
+*/
+   if (sk->sk_memcg) {
+   css_get(&sk->sk_memcg->css);
+   return;
+   }
+
rcu_read_lock();
memcg = mem_cgroup_from_task(current);
if (memcg == root_mem_cgroup)
diff --git a/net/core/sock.c b/net/core/sock.c
index 1033f8ab0547..e50e7b3f2223 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1683,16 +1683,13 @@ struct sock *sk_clone_lock(const struct sock *sk, const 
gfp_t priority)
newsk->sk_dst_pending_confirm = 0;
newsk->sk_wmem_queued   = 0;
newsk->sk_forward_alloc = 0;
-
-   /* sk->sk_memcg will be populated at accept() time */
-   newsk->sk_memcg = NULL;
-
atomic_set(&newsk->sk_drops, 0);
newsk->sk_send_head = NULL;
newsk->sk_userlocks = sk->sk_userlocks & 
~SOCK_BINDPORT_LOCK;
atomic_set(&newsk->sk_zckey, 0);
 
sock_reset_flag(newsk, SOCK_DONE);
+   mem_cgroup_sk_alloc(newsk);
cgroup_sk_alloc(&newsk->sk_cgrp_data);
 
rcu_read_lock();
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 12410ec6f7f7..881ac6d046f2 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -475,7 +475,6 @@ struct sock *inet_csk_accept(struct sock *sk, int flags, 
int *err, bool kern)
}
spin_unlock_bh(&queue->fastopenq.lock);
}
-   mem_cgroup_sk_alloc(newsk);
 out:
release_sock(sk);
if (req)
-- 
2.14.3

Re: [PATCH net] Revert "defer call to mem_cgroup_sk_alloc()"

2018-02-02 Thread Roman Gushchin

On Fri, Feb 02, 2018 at 10:39:04AM -0800, Eric Dumazet wrote:
> On Fri, 2018-02-02 at 18:06 +0000, Roman Gushchin wrote:
> > 
> > Idk, how even we can hit it? And if so, what scary will happen?
> > 
> > If you prefer to have it there, I definitely can return it,
> > but I see no profit so far.
> 
> I was simply curious this was not mentioned in the changelog.
> 
> A revert is normally a true revert, modulo the changes needed by
> conflicts and possible changes.
> 
> I personally do not care of this BUG_ON(), I had not put it in the
> first place.

Technically it's not a true revert, but you're totally right.
Let me add a note to the commit description.

Are you ok with the rest?

Thanks!

Re: [PATCH net] Revert "defer call to mem_cgroup_sk_alloc()"

2018-02-02 Thread Roman Gushchin

On Fri, Feb 02, 2018 at 09:59:27AM -0800, Eric Dumazet wrote:
> On Fri, 2018-02-02 at 16:57 +0000, Roman Gushchin wrote:
> > This patch effectively reverts commit 9f1c2674b328 ("net: memcontrol:
> > defer call to mem_cgroup_sk_alloc()").
> > 
> > Moving mem_cgroup_sk_alloc() to the inet_csk_accept() completely breaks
> > memcg socket memory accounting, as packets received before memcg
> > pointer initialization are not accounted and are causing refcounting
> > underflow on socket release.
> > 
> > Actually the use-after-free problem was fixed by
> > commit c0576e397508 ("net: call cgroup_sk_alloc() earlier in
> > sk_clone_lock()") for the cgroup pointer.
> > 
> > So, let's revert it and call mem_cgroup_sk_alloc() just before
> > cgroup_sk_alloc(). This is safe, as we hold a reference to the socket
> > we're cloning, and it holds a reference to the memcg.
> > 
> > Signed-off-by: Roman Gushchin <g...@fb.com>
> > Cc: Eric Dumazet <eduma...@google.com>
> > Cc: David S. Miller <da...@davemloft.net>
> > Cc: Johannes Weiner <han...@cmpxchg.org>
> > Cc: Tejun Heo <t...@kernel.org>
> > ---
> >  mm/memcontrol.c | 14 ++
> >  net/core/sock.c |  5 +
> >  net/ipv4/inet_connection_sock.c |  1 -
> >  3 files changed, 15 insertions(+), 5 deletions(-)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 0ae2dc3a1748..0937f2c52c7d 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -5747,6 +5747,20 @@ void mem_cgroup_sk_alloc(struct sock *sk)
> > if (!mem_cgroup_sockets_enabled)
> > return;
> >  
> > +   /*
> > +* Socket cloning can throw us here with sk_memcg already
> > +* filled. It won't however, necessarily happen from
> > +* process context. So the test for root memcg given
> > +* the current task's memcg won't help us in this case.
> > +*
> > +* Respecting the original socket's memcg is a better
> > +* decision in this case.
> > +*/
> > +   if (sk->sk_memcg) {
> 
> Original commit had a BUG_ON(mem_cgroup_is_root(sk->sk_memcg));
> 
> I presume it is no longer useful ?

Idk, how even we can hit it? And if so, what scary will happen?

If you prefer to have it there, I definitely can return it,
but I see no profit so far.

Thanks!

[PATCH net] Revert "defer call to mem_cgroup_sk_alloc()"

2018-02-02 Thread Roman Gushchin

This patch effectively reverts commit 9f1c2674b328 ("net: memcontrol:
defer call to mem_cgroup_sk_alloc()").

Moving mem_cgroup_sk_alloc() to the inet_csk_accept() completely breaks
memcg socket memory accounting, as packets received before memcg
pointer initialization are not accounted and are causing refcounting
underflow on socket release.

Actually the use-after-free problem was fixed by
commit c0576e397508 ("net: call cgroup_sk_alloc() earlier in
sk_clone_lock()") for the cgroup pointer.

So, let's revert it and call mem_cgroup_sk_alloc() just before
cgroup_sk_alloc(). This is safe, as we hold a reference to the socket
we're cloning, and it holds a reference to the memcg.

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Eric Dumazet <eduma...@google.com>
Cc: David S. Miller <da...@davemloft.net>
Cc: Johannes Weiner <han...@cmpxchg.org>
Cc: Tejun Heo <t...@kernel.org>
---
 mm/memcontrol.c | 14 ++
 net/core/sock.c |  5 +
 net/ipv4/inet_connection_sock.c |  1 -
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0ae2dc3a1748..0937f2c52c7d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5747,6 +5747,20 @@ void mem_cgroup_sk_alloc(struct sock *sk)
if (!mem_cgroup_sockets_enabled)
return;
 
+   /*
+* Socket cloning can throw us here with sk_memcg already
+* filled. It won't however, necessarily happen from
+* process context. So the test for root memcg given
+* the current task's memcg won't help us in this case.
+*
+* Respecting the original socket's memcg is a better
+* decision in this case.
+*/
+   if (sk->sk_memcg) {
+   css_get(&sk->sk_memcg->css);
+   return;
+   }
+
rcu_read_lock();
memcg = mem_cgroup_from_task(current);
if (memcg == root_mem_cgroup)
diff --git a/net/core/sock.c b/net/core/sock.c
index 1033f8ab0547..e50e7b3f2223 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1683,16 +1683,13 @@ struct sock *sk_clone_lock(const struct sock *sk, const 
gfp_t priority)
newsk->sk_dst_pending_confirm = 0;
newsk->sk_wmem_queued   = 0;
newsk->sk_forward_alloc = 0;
-
-   /* sk->sk_memcg will be populated at accept() time */
-   newsk->sk_memcg = NULL;
-
atomic_set(&newsk->sk_drops, 0);
newsk->sk_send_head = NULL;
newsk->sk_userlocks = sk->sk_userlocks & 
~SOCK_BINDPORT_LOCK;
atomic_set(&newsk->sk_zckey, 0);
 
sock_reset_flag(newsk, SOCK_DONE);
+   mem_cgroup_sk_alloc(newsk);
cgroup_sk_alloc(&newsk->sk_cgrp_data);
 
rcu_read_lock();
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 12410ec6f7f7..881ac6d046f2 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -475,7 +475,6 @@ struct sock *inet_csk_accept(struct sock *sk, int flags, 
int *err, bool kern)
}
spin_unlock_bh(&queue->fastopenq.lock);
}
-   mem_cgroup_sk_alloc(newsk);
 out:
release_sock(sk);
if (req)
-- 
2.14.3

Re: [PATCH net] net: memcontrol: charge allocated memory after mem_cgroup_sk_alloc()

2018-02-01 Thread Roman Gushchin

On Thu, Feb 01, 2018 at 03:27:14PM -0800, Eric Dumazet wrote:
> Well, this memcg stuff is so confusing.
> 
> My recollection is that we had :
> 
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=d979a39d7242e0601bf9b60e89628fb8ac577179
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=75cb070960ade40fba5de32138390f3c85c90941
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=c0576e3975084d4699b7bfef578613fb8e1144f6
> 
> And commit a590b90d472f2c176c140576ee3ab44df7f67839 as well
> 
> Honestly bug was closed months ago for us, based on stack traces on the wild.
> 
> No C repro or whatever, but reproducing it would be a matter of
> having a TCP listener constantly doing a
> socket()/setsockopt(REUSEADDR)/bind()/listen()/close() in a loop,
> while connections are attempted to the listening port.

Oh, I see...

Then I think that we should move mem_cgroup_sk_alloc() back to the BH context,
where cgroup_sk_alloc() is, and repeat all the tricks to avoid copying
dead cgroup/memcg pointers. Do you agree?

I'll try to put together a patch and reproduce the issue.

Thanks!

Re: [PATCH net] net: memcontrol: charge allocated memory after mem_cgroup_sk_alloc()

2018-02-01 Thread Roman Gushchin

On Thu, Feb 01, 2018 at 01:17:56PM -0800, Eric Dumazet wrote:
> On Thu, Feb 1, 2018 at 12:22 PM, Roman Gushchin <g...@fb.com> wrote:
> > On Thu, Feb 01, 2018 at 10:16:55AM -0500, David Miller wrote:
> >> From: Roman Gushchin <g...@fb.com>
> >> Date: Wed, 31 Jan 2018 21:54:08 +
> >>
> >> > So I really start thinking that reverting 9f1c2674b328
> >> > ("net: memcontrol: defer call to mem_cgroup_sk_alloc()")
> >> > and fixing the original issue differently might be easier
> >> > and a proper way to go. Does it make sense?
> >>
> >> You'll need to work that out with Eric Dumazet who added the
> >> change in question which you think we should revert.
> >
> > Eric,
> >
> > can you, please, provide some details about the use-after-free problem
> > that you've fixed with commit 9f1c2674b328 ("net: memcontrol: defer call
> > to mem_cgroup_sk_alloc()")? Do you know how to reproduce it?
> >
> > Deferring mem_cgroup_sk_alloc() breaks socket memory accounting
> > and makes it much more fragile in general. So, I wonder, if there are
> > solutions for the use-after-free problem.
> >
> > Thank you!
> >
> > Roman
> 
> Unfortunately bug is not public (Google-Bug-Id 67556600 for Googlers
> following this thread )
> 
> Our kernel has a debug feature on percpu_ref_get_many() which detects
> the typical use-after-free problem of
> doing atomic_long_add(nr, &ref->count); while ref->count is 0, or
> memory already freed.
> 
> Bug was serious because css_put() will release the css a second time.
> 
> Stack trace looked like :
> 
> Oct  8 00:23:14 lphh23 kernel: [27239.568098]  
> [] dump_stack+0x4d/0x6c
> Oct  8 00:23:14 lphh23 kernel: [27239.568108]  [] ?
> cgroup_get+0x43/0x50
> Oct  8 00:23:14 lphh23 kernel: [27239.568114]  []
> warn_slowpath_common+0xac/0xc8
> Oct  8 00:23:14 lphh23 kernel: [27239.568117]  []
> warn_slowpath_null+0x1a/0x1c
> Oct  8 00:23:14 lphh23 kernel: [27239.568120]  []
> cgroup_get+0x43/0x50
> Oct  8 00:23:14 lphh23 kernel: [27239.568123]  []
> cgroup_sk_alloc+0x64/0x90

Hm, that looks strange... It's cgroup_sk_alloc(),
not mem_cgroup_sk_alloc(), which was removed by 9f1c2674b328.

I thought, that it's css_get() in mem_cgroup_sk_alloc(), which
you removed, but the stacktrace you've posted is different.

void mem_cgroup_sk_alloc(struct sock *sk) {
/*
 * Socket cloning can throw us here with sk_memcg already
 * filled. It won't however, necessarily happen from
 * process context. So the test for root memcg given
 * the current task's memcg won't help us in this case.
 *
 * Respecting the original socket's memcg is a better
 * decision in this case.
 */
if (sk->sk_memcg) {
BUG_ON(mem_cgroup_is_root(sk->sk_memcg));
>>> css_get(&sk->sk_memcg->css);
return;
}

Is it possible to reproduce the issue on an upstream kernel?
Any ideas of what can trigger it?

Btw, with the following patch applied (below) and cgroup v2 enabled,
the issue, which I'm talking about, can be reproduced in seconds after reboot
by doing almost any network activity. Just sshing to a machine is enough.
The corresponding warning will be printed to dmesg.

What is a proper way to fix the socket memory accounting in this case,
what do you think?

Thank you!

Roman

--

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index c51c589..55fb890 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -276,6 +276,8 @@ struct mem_cgroup {
struct list_head event_list;
spinlock_t event_list_lock;
 
+   atomic_t tcpcnt;
+
struct mem_cgroup_per_node *nodeinfo[0];
/* WARNING: nodeinfo must be the last member here */
 };
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 19eea69..c69ff04 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5623,6 +5623,8 @@ static int memory_stat_show(struct seq_file *m, void *v)
seq_printf(m, "workingset_nodereclaim %lu\n",
   stat[WORKINGSET_NODERECLAIM]);
 
+   seq_printf(m, "tcpcnt %d\n", atomic_read(&memcg->tcpcnt));
+
return 0;
 }
 
@@ -6139,6 +6141,8 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, 
unsigned int nr_pages)
 {
gfp_t gfp_mask = GFP_KERNEL;
 
+   atomic_add(nr_pages, &memcg->tcpcnt);
+
if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
struct page_counter *fail;
 
@@ -6171,6 +6175,11 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, 
unsigned int nr_pages)
  */
 void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
+   int v = atomic_sub_return(nr_pages, &memcg->tcpcnt);
+   if (v < 0) {
+   pr_info("@@@ %p %d \n", memcg, v);
+   }
+
if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
page_counter_uncharge(&memcg->tcpmem, nr_pages);
return;

Re: [PATCH net] net: memcontrol: charge allocated memory after mem_cgroup_sk_alloc()

2018-02-01 Thread Roman Gushchin

On Thu, Feb 01, 2018 at 10:16:55AM -0500, David Miller wrote:
> From: Roman Gushchin <g...@fb.com>
> Date: Wed, 31 Jan 2018 21:54:08 +
> 
> > So I really start thinking that reverting 9f1c2674b328
> > ("net: memcontrol: defer call to mem_cgroup_sk_alloc()")
> > and fixing the original issue differently might be easier
> > and a proper way to go. Does it make sense?
> 
> You'll need to work that out with Eric Dumazet who added the
> change in question which you think we should revert.

Eric,

can you, please, provide some details about the use-after-free problem
that you've fixed with commit 9f1c2674b328 ("net: memcontrol: defer call
to mem_cgroup_sk_alloc()")? Do you know how to reproduce it?

Deferring mem_cgroup_sk_alloc() breaks socket memory accounting
and makes it much more fragile in general. So, I wonder, if there are
solutions for the use-after-free problem.

Thank you!

Roman

Re: [PATCH net] net: memcontrol: charge allocated memory after mem_cgroup_sk_alloc()

2018-01-31 Thread Roman Gushchin

On Thu, Jan 25, 2018 at 12:03:02PM -0500, David Miller wrote:
> From: Roman Gushchin <g...@fb.com>
> Date: Thu, 25 Jan 2018 00:19:11 +
> 
> > @@ -476,6 +477,10 @@ struct sock *inet_csk_accept(struct sock *sk, int 
> > flags, int *err, bool kern)
> > spin_unlock_bh(&queue->fastopenq.lock);
> > }
> > mem_cgroup_sk_alloc(newsk);
> > +   amt = sk_memory_allocated(newsk);
> > +   if (amt && newsk->sk_memcg)
> > +   mem_cgroup_charge_skmem(newsk->sk_memcg, amt);
> > +
> 
> This looks confusing to me.
> 
> sk_memory_allocated() is the total amount of memory used by all
> sockets for a particular "struct proto", not just for that specific
> socket.
> 
> Maybe I don't understand how this socket memcg stuff works, but it
> seems like you should be looking instead at how much memory is
> allocated to this specific socket.

So, the patch below takes the per-socket charge into account,
and it _almost_ works: the css leak shrinks by a couple of orders
of magnitude, but it still exists. I believe the problem is
that we need additional synchronization for the sk_memcg and
sk_forward_alloc fields, and I'm really out of ideas on how
to do it without heavy artillery like introducing a new
field for the unaccounted memcg charge. As far as I can see, we
check the sk_memcg field without the socket lock, and we
set it from a concurrent context.
Most likely I'm missing something...

So I really start thinking that reverting 9f1c2674b328
("net: memcontrol: defer call to mem_cgroup_sk_alloc()")
and fixing the original issue differently might be easier
and a proper way to go. Does it make sense?

Thanks!

--

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 4ca46dc08e63..287de1501a30 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -476,6 +476,12 @@ struct sock *inet_csk_accept(struct sock *sk, int flags, 
int *err, bool kern)
spin_unlock_bh(&queue->fastopenq.lock);
}
mem_cgroup_sk_alloc(newsk);
+   if (mem_cgroup_sockets_enabled && newsk->sk_memcg) {
+   int amt = sk_mem_pages(newsk->sk_forward_alloc);
+   if (amt > 0)
+   mem_cgroup_charge_skmem(newsk->sk_memcg, amt);
+   }
+
 out:
release_sock(sk);
if (req)
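
For readers unfamiliar with the byte-to-page conversion used in the diff above, here is a small userspace sketch of the round-up arithmetic. It is an assumption of this sketch, not something stated in the thread, that the helper rounds a byte count up to whole accounting units of 4096 bytes; model_mem_pages(), MODEL_QUANTUM and MODEL_QUANTUM_SHIFT are hypothetical names chosen for illustration.

```c
#include <assert.h>

/* Assumed accounting unit: 4096 bytes, i.e. a shift of 12. */
#define MODEL_QUANTUM		4096
#define MODEL_QUANTUM_SHIFT	12

/* Round a non-negative byte count up to whole accounting units. */
static int model_mem_pages(int amt)
{
	assert(amt >= 0);
	return (amt + MODEL_QUANTUM - 1) >> MODEL_QUANTUM_SHIFT;
}
```

With this rounding any non-zero remainder costs a full unit, so a socket holding 4097 bytes would be charged two units, while an empty one is charged nothing.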

Re: [PATCH net] net: memcontrol: charge allocated memory after mem_cgroup_sk_alloc()

2018-01-25 Thread Roman Gushchin

On Thu, Jan 25, 2018 at 12:03:02PM -0500, David Miller wrote:
> From: Roman Gushchin <g...@fb.com>
> Date: Thu, 25 Jan 2018 00:19:11 +
> 
> > @@ -476,6 +477,10 @@ struct sock *inet_csk_accept(struct sock *sk, int 
> > flags, int *err, bool kern)
> > spin_unlock_bh(&queue->fastopenq.lock);
> > }
> > mem_cgroup_sk_alloc(newsk);
> > +   amt = sk_memory_allocated(newsk);
> > +   if (amt && newsk->sk_memcg)
> > +   mem_cgroup_charge_skmem(newsk->sk_memcg, amt);
> > +
> 
> This looks confusing to me.
> 
> sk_memory_allocated() is the total amount of memory used by all
> sockets for a particular "struct proto", not just for that specific
> socket.

Oh, I see...

> 
> Maybe I don't understand how this socket memcg stuff works, but it
> seems like you should be looking instead at how much memory is
> allocated to this specific socket.

Yes, this is what I wanted to do originally.
Let me find a proper way to do this.

Thank you!

Roman

[PATCH net] net: memcontrol: charge allocated memory after mem_cgroup_sk_alloc()

2018-01-24 Thread Roman Gushchin

We've caught several cgroup css refcounting issues on 4.15-rc7,
triggered from different release paths. We were using cgroup v2.
I added a temporary per-memcg sockmem atomic counter
and found that it sometimes falls below 0. This was easy
to reproduce, so I was able to bisect the problem.

It was introduced by the commit 9f1c2674b328 ("net: memcontrol:
defer call to mem_cgroup_sk_alloc()"), which moved
the mem_cgroup_sk_alloc() call from the BH context
into inet_csk_accept().

The problem is that all the memory allocated before
mem_cgroup_sk_alloc() is charged to the socket,
but not charged to the memcg. So, when we're releasing
the socket, we're uncharging more, than we've charged.

Fix this by charging the cgroup by the amount of already
allocated memory right after mem_cgroup_sk_alloc() in
inet_csk_accept().

Fixes: 9f1c2674b328 ("net: memcontrol: defer call to mem_cgroup_sk_alloc()")
Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Eric Dumazet <eduma...@google.com>
Cc: Johannes Weiner <han...@cmpxchg.org>
Cc: Tejun Heo <t...@kernel.org>
Cc: David S. Miller <da...@davemloft.net>
---
 net/ipv4/inet_connection_sock.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 4ca46dc08e63..f439162c2ea2 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -434,6 +434,7 @@ struct sock *inet_csk_accept(struct sock *sk, int flags, 
int *err, bool kern)
struct request_sock *req;
struct sock *newsk;
int error;
+   long amt;
 
lock_sock(sk);
 
@@ -476,6 +477,10 @@ struct sock *inet_csk_accept(struct sock *sk, int flags, 
int *err, bool kern)
spin_unlock_bh(&queue->fastopenq.lock);
}
mem_cgroup_sk_alloc(newsk);
+   amt = sk_memory_allocated(newsk);
+   if (amt && newsk->sk_memcg)
+   mem_cgroup_charge_skmem(newsk->sk_memcg, amt);
+
 out:
release_sock(sk);
if (req)
-- 
2.14.3
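
The imbalance described in this changelog can be reproduced with a toy model. This is a sketch under stated assumptions only: struct memcg_model, struct sock_model and the sock_*() helpers are hypothetical and merely mimic the rule "charge the memcg only if the socket already has one, uncharge everything on release".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-memcg socket memory counter. */
struct memcg_model {
	long sock_pages;
};

/* Hypothetical socket: memcg stays NULL until it is attached. */
struct sock_model {
	struct memcg_model *memcg;
	long pages;			/* pages charged to the socket itself */
};

static void sock_attach(struct sock_model *sk, struct memcg_model *memcg)
{
	sk->memcg = memcg;
}

/* Memory is always charged to the socket, but to the memcg only if set. */
static void sock_charge(struct sock_model *sk, long pages)
{
	assert(pages >= 0);
	sk->pages += pages;
	if (sk->memcg)
		sk->memcg->sock_pages += pages;
}

/* On release the whole socket charge is returned to the memcg. */
static void sock_release(struct sock_model *sk)
{
	if (sk->memcg)
		sk->memcg->sock_pages -= sk->pages;
	sk->pages = 0;
}
```

Charging 3 pages before the attach and 2 pages after it leaves the memcg counter at -3 after release, while attaching first keeps it balanced at 0.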

Re: [PATCH bpf-next] bpftool: recognize BPF_PROG_TYPE_CGROUP_DEVICE programs

2018-01-19 Thread Roman Gushchin

On Mon, Jan 15, 2018 at 07:32:01PM +, Quentin Monnet wrote:
> 2018-01-15 19:16 UTC+ ~ Roman Gushchin <g...@fb.com>
> > Bpftool doesn't recognize BPF_PROG_TYPE_CGROUP_DEVICE programs,
> > so the prog show command prints the numeric type value:
> >
> > $ bpftool prog show
> > 1: type 15  name bpf_prog1  tag ac9f93dbfd6d9b74
> > loaded_at Jan 15/07:58  uid 0
> > xlated 96B  jited 105B  memlock 4096B
> >
> > This patch defines the corresponding textual representation:
> >
> > $ bpftool prog show
> > 1: cgroup_device  name bpf_prog1  tag ac9f93dbfd6d9b74
> > loaded_at Jan 15/07:58  uid 0
> > xlated 96B  jited 105B  memlock 4096B
> >
> > Signed-off-by: Roman Gushchin <g...@fb.com>
> > Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
> > Cc: Quentin Monnet <quentin.mon...@netronome.com>
> > Cc: Daniel Borkmann <dan...@iogearbox.net>
> > Cc: Alexei Starovoitov <a...@kernel.org>
> > ---
> >  tools/bpf/bpftool/prog.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> > index c6a28be4665c..099e21cf1b5c 100644
> > --- a/tools/bpf/bpftool/prog.c
> > +++ b/tools/bpf/bpftool/prog.c
> > @@ -66,6 +66,7 @@ static const char * const prog_type_name[] = {
> > [BPF_PROG_TYPE_LWT_XMIT]= "lwt_xmit",
> > [BPF_PROG_TYPE_SOCK_OPS]= "sock_ops",
> > [BPF_PROG_TYPE_SK_SKB]  = "sk_skb",
> > +   [BPF_PROG_TYPE_CGROUP_DEVICE]   = "cgroup_device",
> >  };
> >  
> >  static void print_boot_time(__u64 nsecs, char *buf, unsigned int size)
> 
> Looks good, thanks Roman!
> Would you mind updating the map names as well? It seems the
> BPF_MAP_TYPE_CPUMAP is missing from the list in map.c.

Hello, Quentin!


Here is the patch.


Thanks!

--

From 16245383a894038a63cc1ad4b77629ba704aaa38 Mon Sep 17 00:00:00 2001
From: Roman Gushchin <g...@fb.com>
Date: Fri, 19 Jan 2018 14:07:38 +
Subject: [PATCH bpf-next] bpftool: recognize BPF_MAP_TYPE_CPUMAP maps

Add the BPF_MAP_TYPE_CPUMAP map type to the list
of map types recognized by bpftool and define the
corresponding text representation.

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Quentin Monnet <quentin.mon...@netronome.com>
Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Alexei Starovoitov <a...@kernel.org>
---
 tools/bpf/bpftool/map.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index a152c1a5c94c..f95fa67bb498 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -66,6 +66,7 @@ static const char * const map_type_name[] = {
[BPF_MAP_TYPE_HASH_OF_MAPS] = "hash_of_maps",
[BPF_MAP_TYPE_DEVMAP]   = "devmap",
[BPF_MAP_TYPE_SOCKMAP]  = "sockmap",
+   [BPF_MAP_TYPE_CPUMAP]   = "cpumap",
 };
 
 static unsigned int get_possible_cpus(void)
-- 
2.14.3

Re: [PATCH bpf-next] bpftool: recognize BPF_PROG_TYPE_CGROUP_DEVICE programs

2018-01-15 Thread Roman Gushchin

On Mon, Jan 15, 2018 at 07:32:01PM +, Quentin Monnet wrote:
> 2018-01-15 19:16 UTC+ ~ Roman Gushchin <g...@fb.com>
> > Bpftool doesn't recognize BPF_PROG_TYPE_CGROUP_DEVICE programs,
> > so the prog show command prints the numeric type value:
> >
> > $ bpftool prog show
> > 1: type 15  name bpf_prog1  tag ac9f93dbfd6d9b74
> > loaded_at Jan 15/07:58  uid 0
> > xlated 96B  jited 105B  memlock 4096B
> >
> > This patch defines the corresponding textual representation:
> >
> > $ bpftool prog show
> > 1: cgroup_device  name bpf_prog1  tag ac9f93dbfd6d9b74
> > loaded_at Jan 15/07:58  uid 0
> > xlated 96B  jited 105B  memlock 4096B
> >
> > Signed-off-by: Roman Gushchin <g...@fb.com>
> > Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
> > Cc: Quentin Monnet <quentin.mon...@netronome.com>
> > Cc: Daniel Borkmann <dan...@iogearbox.net>
> > Cc: Alexei Starovoitov <a...@kernel.org>
> > ---
> >  tools/bpf/bpftool/prog.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> > index c6a28be4665c..099e21cf1b5c 100644
> > --- a/tools/bpf/bpftool/prog.c
> > +++ b/tools/bpf/bpftool/prog.c
> > @@ -66,6 +66,7 @@ static const char * const prog_type_name[] = {
> > [BPF_PROG_TYPE_LWT_XMIT]= "lwt_xmit",
> > [BPF_PROG_TYPE_SOCK_OPS]= "sock_ops",
> > [BPF_PROG_TYPE_SK_SKB]  = "sk_skb",
> > +   [BPF_PROG_TYPE_CGROUP_DEVICE]   = "cgroup_device",
> >  };
> >  
> >  static void print_boot_time(__u64 nsecs, char *buf, unsigned int size)
> 
> Looks good, thanks Roman!
> Would you mind updating the map names as well? It seems the
> BPF_MAP_TYPE_CPUMAP is missing from the list in map.c.

Hello, Quentin!

Sure, I'll take a look.

Thanks!

[PATCH bpf-next] bpftool: recognize BPF_PROG_TYPE_CGROUP_DEVICE programs

2018-01-15 Thread Roman Gushchin

Bpftool doesn't recognize BPF_PROG_TYPE_CGROUP_DEVICE programs,
so the prog show command prints the numeric type value:

$ bpftool prog show
1: type 15  name bpf_prog1  tag ac9f93dbfd6d9b74
loaded_at Jan 15/07:58  uid 0
xlated 96B  jited 105B  memlock 4096B

This patch defines the corresponding textual representation:

$ bpftool prog show
1: cgroup_device  name bpf_prog1  tag ac9f93dbfd6d9b74
loaded_at Jan 15/07:58  uid 0
xlated 96B  jited 105B  memlock 4096B

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Quentin Monnet <quentin.mon...@netronome.com>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Alexei Starovoitov <a...@kernel.org>
---
 tools/bpf/bpftool/prog.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index c6a28be4665c..099e21cf1b5c 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -66,6 +66,7 @@ static const char * const prog_type_name[] = {
[BPF_PROG_TYPE_LWT_XMIT]= "lwt_xmit",
[BPF_PROG_TYPE_SOCK_OPS]= "sock_ops",
[BPF_PROG_TYPE_SK_SKB]  = "sk_skb",
+   [BPF_PROG_TYPE_CGROUP_DEVICE]   = "cgroup_device",
 };
 
 static void print_boot_time(__u64 nsecs, char *buf, unsigned int size)
-- 
2.14.3
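
For readers unfamiliar with the pattern this patch extends, here is a minimal, self-contained sketch of a name table built with designated initializers plus a numeric fallback. The `demo_*` and `DEMO_*` names are hypothetical stand-ins, not bpftool's actual identifiers; only the idea (print a string when the table knows the type, otherwise print "type <n>") is taken from the output shown in the commit message.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for a few enum bpf_prog_type values. */
enum demo_prog_type {
	DEMO_PROG_TYPE_UNSPEC,
	DEMO_PROG_TYPE_SOCKET_FILTER,
	DEMO_PROG_TYPE_KPROBE,
	DEMO_PROG_TYPE_CGROUP_DEVICE,
	DEMO_PROG_TYPE_NEWER,	/* exists in the enum, but not in the table */
};

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static const char * const demo_prog_type_name[] = {
	[DEMO_PROG_TYPE_UNSPEC]		= "unspec",
	[DEMO_PROG_TYPE_SOCKET_FILTER]	= "socket_filter",
	/* DEMO_PROG_TYPE_KPROBE is left out on purpose: its slot is NULL. */
	[DEMO_PROG_TYPE_CGROUP_DEVICE]	= "cgroup_device",
};

/* Write the textual type into buf, falling back to "type <n>". */
static const char *demo_prog_type_str(unsigned int type, char *buf, size_t size)
{
	if (type < DEMO_ARRAY_SIZE(demo_prog_type_name) &&
	    demo_prog_type_name[type])
		snprintf(buf, size, "%s", demo_prog_type_name[type]);
	else
		snprintf(buf, size, "type %u", type);
	return buf;
}
```

The NULL check matters because designated initializers leave unnamed slots zeroed, so a type that falls inside the array bounds but has no entry would otherwise be printed through a NULL pointer.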

[PATCH v3 bpf-next 1/2] tools/bpftool: use version from the kernel source tree

2017-12-27 Thread Roman Gushchin

Bpftool determines its own version based on the kernel
version, which is picked from the linux/version.h header.

It's strange to use the version of the installed kernel
headers; it makes much more sense to use the version
of the actual source tree, where the bpftool sources are.

Fix this by building the kernelversion target and using
the resulting string as the bpftool version.

Example:
before:

$ bpftool version
bpftool v4.14.6

after:
$ bpftool version
bpftool v4.15.0-rc3

$ bpftool version --json
{"version":"4.15.0-rc3"}

Signed-off-by: Roman Gushchin <g...@fb.com>
Reviewed-by: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
---
 tools/bpf/bpftool/Makefile |  3 +++
 tools/bpf/bpftool/main.c   | 13 ++---
 2 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index 3f17ad317512..f8f31a8d9269 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -23,6 +23,8 @@ endif
 
 LIBBPF = $(BPF_PATH)libbpf.a
 
+BPFTOOL_VERSION=$(shell make --no-print-directory -sC ../../.. kernelversion)
+
 $(LIBBPF): FORCE
$(Q)$(MAKE) -C $(BPF_DIR) OUTPUT=$(OUTPUT) $(OUTPUT)libbpf.a FEATURES_DUMP=$(FEATURE_DUMP_EXPORT)
 
@@ -38,6 +40,7 @@ CC = gcc
 CFLAGS += -O2
 CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow
 CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf -I$(srctree)/kernel/bpf/
+CFLAGS += -DBPFTOOL_VERSION='"$(BPFTOOL_VERSION)"'
 LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
 
 INSTALL ?= install
diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index ecd53ccf1239..3a0396d87c42 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -38,7 +38,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -95,21 +94,13 @@ static int do_help(int argc, char **argv)
 
 static int do_version(int argc, char **argv)
 {
-   unsigned int version[3];
-
-   version[0] = LINUX_VERSION_CODE >> 16;
-   version[1] = LINUX_VERSION_CODE >> 8 & 0xf;
-   version[2] = LINUX_VERSION_CODE & 0xf;
-
if (json_output) {
jsonw_start_object(json_wtr);
jsonw_name(json_wtr, "version");
-   jsonw_printf(json_wtr, "\"%u.%u.%u\"",
-version[0], version[1], version[2]);
+   jsonw_printf(json_wtr, "\"%s\"", BPFTOOL_VERSION);
jsonw_end_object(json_wtr);
} else {
-   printf("%s v%u.%u.%u\n", bin_name,
-  version[0], version[1], version[2]);
+   printf("%s v%s\n", bin_name, BPFTOOL_VERSION);
}
return 0;
 }
-- 
2.14.3
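
As a side note on the lines removed from do_version(): the sketch below, which is only an illustration and not part of the patch, decodes a packed version code both with 8-bit masks and with the 4-bit masks (0xf) used in the removed code. The `DEMO_KERNEL_VERSION` macro is assumed to pack fields the same way the kernel's KERNEL_VERSION() did at the time; all `demo_*` names are hypothetical.

```c
/* Assumed to match the packing of the kernel's KERNEL_VERSION() macro. */
#define DEMO_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

struct demo_version {
	unsigned int major, minor, patch;
};

/* Decode with full 8-bit masks for the minor and patch fields. */
static struct demo_version demo_decode(unsigned int code)
{
	struct demo_version v = {
		.major = code >> 16,
		.minor = (code >> 8) & 0xff,
		.patch = code & 0xff,
	};
	return v;
}

/* Decode the way the removed lines did, with 4-bit masks. */
static struct demo_version demo_decode_4bit(unsigned int code)
{
	struct demo_version v = {
		.major = code >> 16,
		.minor = (code >> 8) & 0xf,
		.patch = code & 0xf,
	};
	return v;
}
```

Both variants agree for 4.14.6, but the 4-bit variant would report a hypothetical 4.19.20 as 4.3.4. Neither can express a suffix such as "-rc3", which the string obtained from the kernelversion target carries for free.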

[PATCH v3 bpf-next 2/2] tools/bpftool: fix bpftool build with binutils >= 2.29

2017-12-27 Thread Roman Gushchin

Bpftool build is broken with binutils version 2.29 and later.
The cause is commit 003ca0fd2286 ("Refactor disassembler selection")
in the binutils repo, which changed the disassembler() function
signature.

Fix this by adding a new "feature" to the tools/build/features
infrastructure and making it responsible for deciding which
disassembler() function signature to use.

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
---
 tools/bpf/Makefile| 29 +++
 tools/bpf/bpf_jit_disasm.c|  7 ++
 tools/bpf/bpftool/Makefile| 24 +++
 tools/bpf/bpftool/jit_disasm.c|  7 ++
 tools/build/feature/Makefile  |  4 
 tools/build/feature/test-disassembler-four-args.c | 15 
 6 files changed, 86 insertions(+)
 create mode 100644 tools/build/feature/test-disassembler-four-args.c

diff --git a/tools/bpf/Makefile b/tools/bpf/Makefile
index 07a6697466ef..c8ec0ae16bf0 100644
--- a/tools/bpf/Makefile
+++ b/tools/bpf/Makefile
@@ -9,6 +9,35 @@ MAKE = make
 CFLAGS += -Wall -O2
 CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
 
+ifeq ($(srctree),)
+srctree := $(patsubst %/,%,$(dir $(CURDIR)))
+srctree := $(patsubst %/,%,$(dir $(srctree)))
+endif
+
+FEATURE_USER = .bpf
+FEATURE_TESTS = libbfd disassembler-four-args
+FEATURE_DISPLAY = libbfd disassembler-four-args
+
+check_feat := 1
+NON_CHECK_FEAT_TARGETS := clean bpftool_clean
+ifdef MAKECMDGOALS
+ifeq ($(filter-out $(NON_CHECK_FEAT_TARGETS),$(MAKECMDGOALS)),)
+  check_feat := 0
+endif
+endif
+
+ifeq ($(check_feat),1)
+ifeq ($(FEATURES_DUMP),)
+include $(srctree)/tools/build/Makefile.feature
+else
+include $(FEATURES_DUMP)
+endif
+endif
+
+ifeq ($(feature-disassembler-four-args), 1)
+CFLAGS += -DDISASM_FOUR_ARGS_SIGNATURE
+endif
+
 %.yacc.c: %.y
$(YACC) -o $@ -d $<
 
diff --git a/tools/bpf/bpf_jit_disasm.c b/tools/bpf/bpf_jit_disasm.c
index 75bf526a0168..30044bc4f389 100644
--- a/tools/bpf/bpf_jit_disasm.c
+++ b/tools/bpf/bpf_jit_disasm.c
@@ -72,7 +72,14 @@ static void get_asm_insns(uint8_t *image, size_t len, int opcodes)
 
disassemble_init_for_target();
 
+#ifdef DISASM_FOUR_ARGS_SIGNATURE
+   disassemble = disassembler(info.arch,
+  bfd_big_endian(bfdf),
+  info.mach,
+  bfdf);
+#else
disassemble = disassembler(bfdf);
+#endif
assert(disassemble);
 
do {
diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index f8f31a8d9269..2237bc43f71c 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -46,6 +46,30 @@ LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
 INSTALL ?= install
 RM ?= rm -f
 
+FEATURE_USER = .bpftool
+FEATURE_TESTS = libbfd disassembler-four-args
+FEATURE_DISPLAY = libbfd disassembler-four-args
+
+check_feat := 1
+NON_CHECK_FEAT_TARGETS := clean uninstall doc doc-clean doc-install doc-uninstall
+ifdef MAKECMDGOALS
+ifeq ($(filter-out $(NON_CHECK_FEAT_TARGETS),$(MAKECMDGOALS)),)
+  check_feat := 0
+endif
+endif
+
+ifeq ($(check_feat),1)
+ifeq ($(FEATURES_DUMP),)
+include $(srctree)/tools/build/Makefile.feature
+else
+include $(FEATURES_DUMP)
+endif
+endif
+
+ifeq ($(feature-disassembler-four-args), 1)
+CFLAGS += -DDISASM_FOUR_ARGS_SIGNATURE
+endif
+
 include $(wildcard *.d)
 
 all: $(OUTPUT)bpftool
diff --git a/tools/bpf/bpftool/jit_disasm.c b/tools/bpf/bpftool/jit_disasm.c
index 1551d3918d4c..57d32e8a1391 100644
--- a/tools/bpf/bpftool/jit_disasm.c
+++ b/tools/bpf/bpftool/jit_disasm.c
@@ -107,7 +107,14 @@ void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes)
 
disassemble_init_for_target();
 
+#ifdef DISASM_FOUR_ARGS_SIGNATURE
+   disassemble = disassembler(info.arch,
+  bfd_big_endian(bfdf),
+  info.mach,
+  bfdf);
+#else
disassemble = disassembler(bfdf);
+#endif
assert(disassemble);
 
if (json_output)
diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile
index 96982640fbf8..17f2c73fff8b 100644
--- a/tools/build/feature/Makefile
+++ b/tools/build/feature/Makefile
@@ -13,6 +13,7 @@ FILES=  \
  test-hello.bin \
  test-libaudit.bin  \
  test-libbfd.bin\
+ test-disassembler-four-args.bin\
  test-liberty.bin   \
  test-liberty-z.bin \
  test-cplus-demangle.bin\
@@ -188,6 +189,9 @@ $(OUTPUT)test-libpython-version.bin:
 $(OUTPUT)test-libbfd.bin:
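
The compile-time selection used in the two jit disassembler files can be sketched in isolation as follows. This is a mock: `struct demo_bfd`, `demo_disassembler()` and the other `demo_*` names are hypothetical stand-ins so that the example builds without binutils headers; only the `DISASM_FOUR_ARGS_SIGNATURE` macro name comes from the patch.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real types and disassembler() live in binutils. */
struct demo_bfd {
	int arch;
	int big_endian;
	unsigned long mach;
};

typedef int (*demo_disasm_fn)(const unsigned char *insn);

static int demo_print_insn(const unsigned char *insn)
{
	return insn ? 1 : 0;	/* pretend one byte was decoded */
}

#ifdef DISASM_FOUR_ARGS_SIGNATURE
/* Newer style: architecture details are passed explicitly. */
static demo_disasm_fn demo_disassembler(int arch, int big_endian,
					unsigned long mach,
					struct demo_bfd *abfd)
{
	(void)arch; (void)big_endian; (void)mach; (void)abfd;
	return demo_print_insn;
}
#else
/* Older style: everything is derived from the bfd handle. */
static demo_disasm_fn demo_disassembler(struct demo_bfd *abfd)
{
	(void)abfd;
	return demo_print_insn;
}
#endif

/* Callers go through one wrapper, so the #ifdef lives in a single place. */
static demo_disasm_fn demo_get_disassembler(struct demo_bfd *abfd)
{
#ifdef DISASM_FOUR_ARGS_SIGNATURE
	return demo_disassembler(abfd->arch, abfd->big_endian,
				 abfd->mach, abfd);
#else
	return demo_disassembler(abfd);
#endif
}
```

Building the sketch with and without `-DDISASM_FOUR_ARGS_SIGNATURE` exercises both branches. In the patch, that flag is set by the build system when the feature test program compiles successfully, rather than by comparing version numbers.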

Re: [PATCH v2 bpf-next 2/2] tools/bpftool: fix bpftool build with binutils >= 2.28

2017-12-27 Thread Roman Gushchin

On Tue, Dec 26, 2017 at 06:32:05PM -0800, Alexei Starovoitov wrote:
> On Fri, Dec 22, 2017 at 06:50:01PM +0000, Quentin Monnet wrote:
> > Hi Roman,
> > 
> > 2017-12-22 16:11 UTC+0000 ~ Roman Gushchin <g...@fb.com>
> > > Bpftool build is broken with binutils version 2.28 and later.
> > 
> > Could you check the binutils version? I believe it changed in 2.29
> > instead of 2.28. Could you update your commit log and subject
> > accordingly, please?

Yes, you're right. Thanks!

> > 
> > > The cause is commit 003ca0fd2286 ("Refactor disassembler selection")
> > > in the binutils repo, which changed the disassembler() function
> > > signature.
> > > 
> > > Fix this by adding a new "feature" to the tools/build/features
> > > infrastructure and make it responsible for decision which
> > > disassembler() function signature to use.
> > > 
> > > Signed-off-by: Roman Gushchin <g...@fb.com>
> > > Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
> > > Cc: Alexei Starovoitov <a...@kernel.org>
> > > Cc: Daniel Borkmann <dan...@iogearbox.net>
> > > ---
> > >  tools/bpf/Makefile| 29 
> > > +++
> > >  tools/bpf/bpf_jit_disasm.c|  7 ++
> > >  tools/bpf/bpftool/Makefile| 24 
> > > +++
> > >  tools/bpf/bpftool/jit_disasm.c|  7 ++
> > >  tools/build/feature/Makefile  |  4 
> > >  tools/build/feature/test-disassembler-four-args.c | 15 
> > >  6 files changed, 86 insertions(+)
> > >  create mode 100644 tools/build/feature/test-disassembler-four-args.c
> > > 
> > > diff --git a/tools/bpf/Makefile b/tools/bpf/Makefile
> > > index 07a6697466ef..c8ec0ae16bf0 100644
> > > --- a/tools/bpf/Makefile
> > > +++ b/tools/bpf/Makefile
> > > @@ -9,6 +9,35 @@ MAKE = make
> > >  CFLAGS += -Wall -O2
> > >  CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
> > >  
> > > +ifeq ($(srctree),)
> > > +srctree := $(patsubst %/,%,$(dir $(CURDIR)))
> > > +srctree := $(patsubst %/,%,$(dir $(srctree)))
> > > +endif
> > > +
> > > +FEATURE_USER = .bpf
> > > +FEATURE_TESTS = libbfd disassembler-four-args
> > > +FEATURE_DISPLAY = libbfd disassembler-four-args
> > 
> > Thanks for adding libbfd as I requested. However, you do not use it in
> > the Makefile to prevent compilation if the feature is not detected (see
> > "bpfdep" or "elfdep" in tools/lib/bpf/Makefile). Sorry, I should have
> > pointed it out in my previous review.
> > 
> > But actually, I have another issue related to the libbfd feature: since
> > commit 280e7c48c3b8 ("perf tools: fix BFD detection on opensuse") it
> > requires libiberty so that libbfd is correctly detected, but libiberty
> > is not needed on all distros (at least Ubuntu can have libbfd without
> > libiberty). Typically, detection fails on my setup, although I do have
> > libbfd installed. So forcing libbfd feature here may eventually force
> > users to install libraries they do not need to compile bpftool, which is
> > not what we want.
> > 
> > I do not have a clean workaround to suggest. Maybe have one
> > "libbfd-something" feature that tries to compile without libiberty, then
> > another one that tries with it, and compile the tools if at least one of
> > them succeeds. But it's probably for another patch series. In the
> > meantime, would you please simply remove libbfd detection here and
> > accept my apologies for suggesting to add it in the previous review?
> 
> I think since libbfd is already used by bpftool it's a good thing
> to add feature detection. Even if it's not perfect on some setups.

Agree, we can enhance it later.

> 
> Roman,
> I think you still need to do one more respin to address commit log nit?
> 

Sure, will send soon-ish.

Thanks!

[PATCH v2 bpf-next 2/2] tools/bpftool: fix bpftool build with binutils >= 2.28

2017-12-22 Thread Roman Gushchin

Bpftool build is broken with binutils version 2.28 and later.
The cause is commit 003ca0fd2286 ("Refactor disassembler selection")
in the binutils repo, which changed the disassembler() function
signature.

Fix this by adding a new "feature" to the tools/build/features
infrastructure and making it responsible for deciding which
disassembler() function signature to use.

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
---
 tools/bpf/Makefile| 29 +++
 tools/bpf/bpf_jit_disasm.c|  7 ++
 tools/bpf/bpftool/Makefile| 24 +++
 tools/bpf/bpftool/jit_disasm.c|  7 ++
 tools/build/feature/Makefile  |  4 
 tools/build/feature/test-disassembler-four-args.c | 15 
 6 files changed, 86 insertions(+)
 create mode 100644 tools/build/feature/test-disassembler-four-args.c

diff --git a/tools/bpf/Makefile b/tools/bpf/Makefile
index 07a6697466ef..c8ec0ae16bf0 100644
--- a/tools/bpf/Makefile
+++ b/tools/bpf/Makefile
@@ -9,6 +9,35 @@ MAKE = make
 CFLAGS += -Wall -O2
 CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
 
+ifeq ($(srctree),)
+srctree := $(patsubst %/,%,$(dir $(CURDIR)))
+srctree := $(patsubst %/,%,$(dir $(srctree)))
+endif
+
+FEATURE_USER = .bpf
+FEATURE_TESTS = libbfd disassembler-four-args
+FEATURE_DISPLAY = libbfd disassembler-four-args
+
+check_feat := 1
+NON_CHECK_FEAT_TARGETS := clean bpftool_clean
+ifdef MAKECMDGOALS
+ifeq ($(filter-out $(NON_CHECK_FEAT_TARGETS),$(MAKECMDGOALS)),)
+  check_feat := 0
+endif
+endif
+
+ifeq ($(check_feat),1)
+ifeq ($(FEATURES_DUMP),)
+include $(srctree)/tools/build/Makefile.feature
+else
+include $(FEATURES_DUMP)
+endif
+endif
+
+ifeq ($(feature-disassembler-four-args), 1)
+CFLAGS += -DDISASM_FOUR_ARGS_SIGNATURE
+endif
+
 %.yacc.c: %.y
$(YACC) -o $@ -d $<
 
diff --git a/tools/bpf/bpf_jit_disasm.c b/tools/bpf/bpf_jit_disasm.c
index 75bf526a0168..30044bc4f389 100644
--- a/tools/bpf/bpf_jit_disasm.c
+++ b/tools/bpf/bpf_jit_disasm.c
@@ -72,7 +72,14 @@ static void get_asm_insns(uint8_t *image, size_t len, int opcodes)
 
disassemble_init_for_target();
 
+#ifdef DISASM_FOUR_ARGS_SIGNATURE
+   disassemble = disassembler(info.arch,
+  bfd_big_endian(bfdf),
+  info.mach,
+  bfdf);
+#else
disassemble = disassembler(bfdf);
+#endif
assert(disassemble);
 
do {
diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index f8f31a8d9269..2237bc43f71c 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -46,6 +46,30 @@ LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
 INSTALL ?= install
 RM ?= rm -f
 
+FEATURE_USER = .bpftool
+FEATURE_TESTS = libbfd disassembler-four-args
+FEATURE_DISPLAY = libbfd disassembler-four-args
+
+check_feat := 1
+NON_CHECK_FEAT_TARGETS := clean uninstall doc doc-clean doc-install doc-uninstall
+ifdef MAKECMDGOALS
+ifeq ($(filter-out $(NON_CHECK_FEAT_TARGETS),$(MAKECMDGOALS)),)
+  check_feat := 0
+endif
+endif
+
+ifeq ($(check_feat),1)
+ifeq ($(FEATURES_DUMP),)
+include $(srctree)/tools/build/Makefile.feature
+else
+include $(FEATURES_DUMP)
+endif
+endif
+
+ifeq ($(feature-disassembler-four-args), 1)
+CFLAGS += -DDISASM_FOUR_ARGS_SIGNATURE
+endif
+
 include $(wildcard *.d)
 
 all: $(OUTPUT)bpftool
diff --git a/tools/bpf/bpftool/jit_disasm.c b/tools/bpf/bpftool/jit_disasm.c
index 1551d3918d4c..57d32e8a1391 100644
--- a/tools/bpf/bpftool/jit_disasm.c
+++ b/tools/bpf/bpftool/jit_disasm.c
@@ -107,7 +107,14 @@ void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes)
 
disassemble_init_for_target();
 
+#ifdef DISASM_FOUR_ARGS_SIGNATURE
+   disassemble = disassembler(info.arch,
+  bfd_big_endian(bfdf),
+  info.mach,
+  bfdf);
+#else
disassemble = disassembler(bfdf);
+#endif
assert(disassemble);
 
if (json_output)
diff --git a/tools/build/feature/Makefile b/tools/build/feature/Makefile
index 96982640fbf8..17f2c73fff8b 100644
--- a/tools/build/feature/Makefile
+++ b/tools/build/feature/Makefile
@@ -13,6 +13,7 @@ FILES=  \
  test-hello.bin \
  test-libaudit.bin  \
  test-libbfd.bin\
+ test-disassembler-four-args.bin\
  test-liberty.bin   \
  test-liberty-z.bin \
  test-cplus-demangle.bin\
@@ -188,6 +189,9 @@ $(OUTPUT)test-libpython-version.bin:
 $(OUTPUT)test-libbfd.bin:
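
The check_feat logic in the Makefile hunks above may be easier to follow when restated as ordinary code. The C sketch below is a paraphrase under assumptions, not something the build uses: `demo_check_feat()` and `demo_non_check_goals` are hypothetical names, and the goal list is copied from the bpftool Makefile's NON_CHECK_FEAT_TARGETS.

```c
#include <stddef.h>
#include <string.h>

/* Goals that do not compile anything, mirroring NON_CHECK_FEAT_TARGETS. */
static const char * const demo_non_check_goals[] = {
	"clean", "uninstall", "doc", "doc-clean", "doc-install", "doc-uninstall",
};

static int demo_is_non_check_goal(const char *goal)
{
	size_t n = sizeof(demo_non_check_goals) / sizeof(demo_non_check_goals[0]);
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(goal, demo_non_check_goals[i]) == 0)
			return 1;
	return 0;
}

/* Returns 1 when feature detection should run, 0 when it can be skipped. */
static int demo_check_feat(const char * const *goals, size_t ngoals)
{
	size_t i;

	if (ngoals == 0)		/* MAKECMDGOALS is empty: default build */
		return 1;
	for (i = 0; i < ngoals; i++)
		if (!demo_is_non_check_goal(goals[i]))
			return 1;	/* filter-out left something behind */
	return 0;
}
```

In other words, feature detection is skipped only when every goal on the command line is one of the listed housekeeping targets, so `make clean` stays fast while `make clean all` still probes for libbfd and the disassembler signature.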

[PATCH v2 bpf-next 1/2] tools/bpftool: use version from the kernel source tree

2017-12-22 Thread Roman Gushchin

Bpftool determines its own version based on the kernel
version, which is picked from the linux/version.h header.

It's strange to use the version of the installed kernel
headers; it makes much more sense to use the version
of the actual source tree, where the bpftool sources are.

Fix this by building the kernelversion target and using
the resulting string as the bpftool version.

Example:
before:

$ bpftool version
bpftool v4.14.6

after:
$ bpftool version
bpftool v4.15.0-rc3

$ bpftool version --json
{"version":"4.15.0-rc3"}

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
---
 tools/bpf/bpftool/Makefile |  3 +++
 tools/bpf/bpftool/main.c   | 13 ++---
 2 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index 3f17ad317512..f8f31a8d9269 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -23,6 +23,8 @@ endif
 
 LIBBPF = $(BPF_PATH)libbpf.a
 
+BPFTOOL_VERSION=$(shell make --no-print-directory -sC ../../.. kernelversion)
+
 $(LIBBPF): FORCE
$(Q)$(MAKE) -C $(BPF_DIR) OUTPUT=$(OUTPUT) $(OUTPUT)libbpf.a FEATURES_DUMP=$(FEATURE_DUMP_EXPORT)
 
@@ -38,6 +40,7 @@ CC = gcc
 CFLAGS += -O2
 CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow
 CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf -I$(srctree)/kernel/bpf/
+CFLAGS += -DBPFTOOL_VERSION='"$(BPFTOOL_VERSION)"'
 LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
 
 INSTALL ?= install
diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index ecd53ccf1239..3a0396d87c42 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -38,7 +38,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -95,21 +94,13 @@ static int do_help(int argc, char **argv)
 
 static int do_version(int argc, char **argv)
 {
-   unsigned int version[3];
-
-   version[0] = LINUX_VERSION_CODE >> 16;
-   version[1] = LINUX_VERSION_CODE >> 8 & 0xf;
-   version[2] = LINUX_VERSION_CODE & 0xf;
-
if (json_output) {
jsonw_start_object(json_wtr);
jsonw_name(json_wtr, "version");
-   jsonw_printf(json_wtr, "\"%u.%u.%u\"",
-version[0], version[1], version[2]);
+   jsonw_printf(json_wtr, "\"%s\"", BPFTOOL_VERSION);
jsonw_end_object(json_wtr);
} else {
-   printf("%s v%u.%u.%u\n", bin_name,
-  version[0], version[1], version[2]);
+   printf("%s v%s\n", bin_name, BPFTOOL_VERSION);
}
return 0;
 }
-- 
2.14.3
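
A short sketch of how a string macro supplied on the compiler command line ends up in the output may help here. In `-DBPFTOOL_VERSION='"$(BPFTOOL_VERSION)"'` the single quotes keep the shell from stripping the double quotes, so the macro expands to a C string literal. The fallback `#define` and the `demo_format_version()` helper below are assumptions added only so the example builds on its own; the patch itself always passes the flag and prints directly.

```c
#include <stdio.h>

/* Fallback so the sketch compiles without the -D flag (an assumption). */
#ifndef BPFTOOL_VERSION
#define BPFTOOL_VERSION "4.15.0-rc3"
#endif

/* Format either the JSON or the plain form of the version line into buf. */
static int demo_format_version(char *buf, size_t size,
			       const char *bin_name, int json)
{
	if (json)
		return snprintf(buf, size, "{\"version\":\"%s\"}",
				BPFTOOL_VERSION);
	return snprintf(buf, size, "%s v%s", bin_name, BPFTOOL_VERSION);
}
```

Because the version is now an opaque string, suffixes such as "-rc3" pass through unchanged, which the numeric `%u.%u.%u` format could not represent.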

Re: [RFC PATCH net-next] tools/bpftool: use version from the kernel source tree

2017-12-21 Thread Roman Gushchin

On Wed, Dec 20, 2017 at 01:52:18PM -0800, Jakub Kicinski wrote:
> On Wed, 20 Dec 2017 20:53:41 +0000, Roman Gushchin wrote:
> > On Wed, Dec 20, 2017 at 12:29:21PM -0800, Jakub Kicinski wrote:
> > > On Wed, 20 Dec 2017 20:19:43 +0000, Roman Gushchin wrote:  
> > > > Bpftool determines its own version based on the kernel
> > > > version, which is picked from the linux/version.h header.
> > > > 
> > > > It's strange to use the version of the installed kernel
> > > > headers, and makes much more sense to use the version
> > > > of the actual source tree, where bpftool sources are.
> > > > 
> > > > This patch adds $(srctree)/usr/include to the list
> > > > of include files, which causes bpftool to use the version
> > > > from the source tree.
> > > > 
> > > > Example:
> > > > before:
> > > > 
> > > > $ bpftool version
> > > > bpftool v4.14.6
> > > > 
> > > > after:
> > > > $ bpftool version
> > > > bpftool v4.15.0  
> > > 
> > > Thanks for the patch, this would indeed use some improvement.
> > > 
> > > How about we just run make to get the version like liblockdep does?
> > > 
> > > LIBLOCKDEP_VERSION=$(shell make --no-print-directory -sC ../../.. 
> > > kernelversion)
> > > 
> > > probably s@../../..@$(srctree)@
> > > 
> > > $(srctree)/usr/include is not going to be there for out-of-source builds. 
> > >  
> > 
> > Hm, why is it better? It's not only about the kernel version,
> > IMO it's generally better to use includes from the source tree,
> > rather than system-wide installed kernel headers.
> 
> Right I agree the kernel headers are preferred.  I'm not entirely sure
> why we don't use them, if it was OK to assume usr/ is there we wouldn't
> need the tools/include/uapi/ contraption.  Maybe Arnaldo could explain?
> 
> > I get the point about out-of-source builds, but do we support them in general?
> > How can I build bpftool outside of the kernel tree?
> > I've tried a bit, but failed.
> 
> This is what I do:
> 
> make -C tools/bpf/bpftool/ W=1 O=/tmp/builds/bpftool

This works perfectly with my patch:

$ make -C ~/linux/tools/bpf/ W=1 O=/home/guro/build/ --trace
<...>
echo '  CC   '/home/guro/build/main.o;gcc -O2 -W -Wall -Wextra 
-Wno-unused-parameter -Wshadow -D__EXPORTED_HEADERS__ 
-I/home/guro/linux/tools/include/uapi -I/home/guro/linux/tools/include 
-I/home/guro/linux/tools/lib/bpf -I/home/guro/linux/kernel/bpf/ 
-I/home/guro/linux/usr/include -DNEW_DISSASSEMBLER_SIGNATURE   -c -MMD -o 
/home/guro/build/main.o main.c
<...>
echo '  LINK '/home/guro/build/bpftool;gcc -O2 -W -Wall -Wextra 
-Wno-unused-parameter -Wshadow -D__EXPORTED_HEADERS__ 
-I/home/guro/linux/tools/include/uapi -I/home/guro/linux/tools/include 
-I/home/guro/linux/tools/lib/bpf -I/home/guro/linux/kernel/bpf/ 
-I/home/guro/linux/usr/include -DNEW_DISSASSEMBLER_SIGNATURE -o 
/home/guro/build/bpftool /home/guro/build/common.o /home/guro/build/cgroup.o 
/home/guro/build/main.o /home/guro/build/json_writer.o /home/guro/build/prog.o 
/home/guro/build/map.o /home/guro/build/jit_disasm.o /home/guro/build/disasm.o 
/home/guro/build/libbpf.a -lelf -lbfd -lopcodes /home/guro/build/libbpf.a
  LINK /home/guro/build/bpftool
make[1]: Leaving directory '/home/guro/linux/tools/bpf/bpftool'
make: Leaving directory '/home/guro/linux/tools/bpf'

$ ./build/bpftool version
./build/bpftool v4.15.0

Thanks!

Re: [RFC PATCH net-next] tools/bpftool: use version from the kernel source tree

2017-12-20 Thread Roman Gushchin

On Wed, Dec 20, 2017 at 12:26:30PM -0800, Yonghong Song wrote:
> 
> 
> On 12/20/17 12:19 PM, Roman Gushchin wrote:
> > Bpftool determines its own version based on the kernel
> > version, which is picked from the linux/version.h header.
> > 
> > It's strange to use the version of the installed kernel
> > headers, and makes much more sense to use the version
> > of the actual source tree, where bpftool sources are.
> > 
> > This patch adds $(srctree)/usr/include to the list
> > of include files, which causes bpftool to use the version
> > from the source tree.
> > 
> > Example:
> > before:
> > 
> > $ bpftool version
> > bpftool v4.14.6
> > 
> > after:
> > $ bpftool version
> > bpftool v4.15.0
> > 
> > Signed-off-by: Roman Gushchin <g...@fb.com>
> > Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
> > Cc: Alexei Starovoitov <a...@kernel.org>
> > Cc: Daniel Borkmann <dan...@iogearbox.net>
> > ---
> >   tools/bpf/bpftool/Makefile | 4 +++-
> >   1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
> > index 9c089cfa5f3f..6864d416c49e 100644
> > --- a/tools/bpf/bpftool/Makefile
> > +++ b/tools/bpf/bpftool/Makefile
> > @@ -37,7 +37,9 @@ CC = gcc
> >   CFLAGS += -O2
> >   CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow
> > -CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi 
> > -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf 
> > -I$(srctree)/kernel/bpf/
> > +CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi
> > +CFLAGS += -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf
> > +CFLAGS += -I$(srctree)/kernel/bpf/ -I$(srctree)/usr/include
> 
> -I$(srctree)/usr/include may not work if build directory is not the same as
> the source directory. You probably should use
> -I$(objtree)/usr/include?

$(objtree) is not defined there, so it doesn't work.
Tbh, I struggle to say if it's supposed to work there or not.

Thanks!

Re: [RFC PATCH net-next] tools/bpftool: use version from the kernel source tree

2017-12-20 Thread Roman Gushchin

On Wed, Dec 20, 2017 at 12:29:21PM -0800, Jakub Kicinski wrote:
> On Wed, 20 Dec 2017 20:19:43 +0000, Roman Gushchin wrote:
> > Bpftool determines its own version based on the kernel
> > version, which is picked from the linux/version.h header.
> > 
> > It's strange to use the version of the installed kernel
> > headers, and makes much more sense to use the version
> > of the actual source tree, where bpftool sources are.
> > 
> > This patch adds $(srctree)/usr/include to the list
> > of include files, which causes bpftool to use the version
> > from the source tree.
> > 
> > Example:
> > before:
> > 
> > $ bpftool version
> > bpftool v4.14.6
> > 
> > after:
> > $ bpftool version
> > bpftool v4.15.0
> 
> Thanks for the patch, this would indeed use some improvement.
> 
> How about we just run make to get the version like liblockdep does?
> 
> LIBLOCKDEP_VERSION=$(shell make --no-print-directory -sC ../../.. 
> kernelversion)
> 
> probably s@../../..@$(srctree)@
> 
> $(srctree)/usr/include is not going to be there for out-of-source builds.

Hm, why is it better? It's not only about the kernel version,
IMO it's generally better to use includes from the source tree,
rather than system-wide installed kernel headers.

I get the point about out-of-source builds, but do we support them in general?
How can I build bpftool outside of the kernel tree?
I've tried a bit, but failed.

Thank you!

[RFC PATCH net-next] tools/bpftool: use version from the kernel source tree

2017-12-20 Thread Roman Gushchin

Bpftool determines its own version based on the kernel
version, which is picked from the linux/version.h header.

It's strange to use the version of the installed kernel
headers, and makes much more sense to use the version
of the actual source tree, where bpftool sources are.

This patch adds $(srctree)/usr/include to the list
of include files, which causes bpftool to use the version
from the source tree.

Example:
before:

$ bpftool version
bpftool v4.14.6

after:
$ bpftool version
bpftool v4.15.0

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
---
 tools/bpf/bpftool/Makefile | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index 9c089cfa5f3f..6864d416c49e 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -37,7 +37,9 @@ CC = gcc
 
 CFLAGS += -O2
 CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow
-CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf -I$(srctree)/kernel/bpf/
+CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi
+CFLAGS += -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf
+CFLAGS += -I$(srctree)/kernel/bpf/ -I$(srctree)/usr/include
 LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
 
 INSTALL ?= install
-- 
2.14.3

Re: [RFC PATCH net-next] tools/bpf: fix build with binutils >= 2.28

2017-12-20 Thread Roman Gushchin

On Tue, Dec 19, 2017 at 04:22:51PM +0000, Quentin Monnet wrote:
> 2017-12-19 16:10 UTC+0000 ~ Roman Gushchin <g...@fb.com>
> > On Tue, Dec 19, 2017 at 03:57:02PM +0000, Quentin Monnet wrote:
> >> Hi Roman, thanks for working on this!
> >>
> >>
> >> I discussed this issue with Jakub recently, and one suggestion he had
> >> was to look in tools/build/feature to add a new "feature", by trying to
> >> compile short programs, for making the distinction between binutils
> >> versions. It probably requires more work, but could be more robust than
> >> parsing the version from the command line?
> > 
> > Hm, might be an option. Parsing readelf output is pretty ugly, I agree.
> > In general it feels more like a binutils issue, so we have to work around it
> > either way.
> > 
> > Is Jakub or someone else working on it?
> > 
> > Thanks!
> > 
> 
> Jakub isn't. On our side, I noticed last week that there was this change
> in binutils, and started to have a look at how these "features" work.
> But I have nothing that works so far, so feel free to tackle this.
> 
> Quentin

Hi Quentin!

Could you please check that the patch below works in your environment?

Thanks!
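
To illustrate why picking a fixed word out of the `readelf -v` banner is fragile, which is the concern behind the feature-test suggestion above, here is a small sketch. `demo_word()` is a hypothetical helper that mimics taking the n-th space-separated word; the banner strings in the usage note are assumed samples of what different distributions print, not output captured in this thread.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the n-th (1-based) space-separated word of s into out.
 * Returns out, or NULL if s has fewer than n words.
 */
static const char *demo_word(const char *s, int n, char *out, size_t size)
{
	if (size == 0)
		return NULL;

	while (*s) {
		size_t len;

		s += strspn(s, " ");		/* skip separators */
		len = strcspn(s, " ");		/* length of this word */
		if (len == 0)
			break;
		if (--n == 0) {
			if (len >= size)
				len = size - 1;	/* truncate to fit */
			memcpy(out, s, len);
			out[len] = '\0';
			return out;
		}
		s += len;
	}
	return NULL;
}
```

With an assumed banner of "GNU readelf version 2.27-24.fc26" the fourth word is the version, but with "GNU readelf (GNU Binutils for Ubuntu) 2.30" it is "Binutils". A probe that simply tries to compile a call with the new signature does not depend on the banner format at all.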

--


From b08deabf42e4c143b9e0eec8c49714e4d2c928e3 Mon Sep 17 00:00:00 2001
From: Roman Gushchin <g...@fb.com>
Date: Wed, 20 Dec 2017 13:27:32 +0000
Subject: [RFC PATCH net-next] tools/bpftool: fix bpftool build with binutils >= 2.28

Bpftool build is broken with binutils version 2.28 and later.
The cause is commit 003ca0fd2286 ("Refactor disassembler selection")
in the binutils repo, which changed the disassembler() function
signature.

Fix this by adding a new "feature" to the tools/build/features
infrastructure and making it responsible for deciding which
disassembler() function signature to use.

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
---
 tools/bpf/Makefile  | 18 ++
 tools/bpf/bpf_jit_disasm.c  |  7 +++
 tools/bpf/bpftool/Makefile  | 13 +
 tools/bpf/bpftool/jit_disasm.c  |  7 +++
 tools/build/feature/Makefile|  4 
 tools/build/feature/test-disassembler.c | 15 +++
 6 files changed, 64 insertions(+)
 create mode 100644 tools/build/feature/test-disassembler.c

diff --git a/tools/bpf/Makefile b/tools/bpf/Makefile
index 07a6697466ef..c62b3a311486 100644
--- a/tools/bpf/Makefile
+++ b/tools/bpf/Makefile
@@ -9,6 +9,24 @@ MAKE = make
 CFLAGS += -Wall -O2
 CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
 
+ifeq ($(srctree),)
+srctree := $(patsubst %/,%,$(dir $(CURDIR)))
+srctree := $(patsubst %/,%,$(dir $(srctree)))
+endif
+
+FEATURE_TESTS = disassembler
+FEATURE_DISPLAY = disassembler
+
+ifeq ($(FEATURES_DUMP),)
+include $(srctree)/tools/build/Makefile.feature
+else
+include $(FEATURES_DUMP)
+endif
+
+ifeq ($(feature-disassembler), 1)
+CFLAGS += -DNEW_DISSASSEMBLER_SIGNATURE
+endif
+
 %.yacc.c: %.y
$(YACC) -o $@ -d $<
 
diff --git a/tools/bpf/bpf_jit_disasm.c b/tools/bpf/bpf_jit_disasm.c
index 75bf526a0168..a5f4dbacdb11 100644
--- a/tools/bpf/bpf_jit_disasm.c
+++ b/tools/bpf/bpf_jit_disasm.c
@@ -72,7 +72,14 @@ static void get_asm_insns(uint8_t *image, size_t len, int opcodes)
 
disassemble_init_for_target();
 
+#ifdef NEW_DISSASSEMBLER_SIGNATURE
+   disassemble = disassembler(bfd_get_arch(bfdf),
+  bfd_big_endian(bfdf),
+  bfd_get_mach(bfdf),
+  bfdf);
+#else
disassemble = disassembler(bfdf);
+#endif
assert(disassemble);
 
do {
diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index 3f17ad317512..9c089cfa5f3f 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -43,6 +43,19 @@ LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
 INSTALL ?= install
 RM ?= rm -f
 
+FEATURE_TESTS = disassembler
+FEATURE_DISPLAY = disassembler
+
+ifeq ($(FEATURES_DUMP),)
+include $(srctree)/tools/build/Makefile.feature
+else
+include $(FEATURES_DUMP)
+endif
+
+ifeq ($(feature-disassembler), 1)
+CFLAGS += -DNEW_DISSASSEMBLER_SIGNATURE
+endif
+
 include $(wildcard *.d)
 
 all: $(OUTPUT)bpftool
diff --git a/tools/bpf/bpftool/jit_disasm.c b/tools/bpf/bpftool/jit_disasm.c
index 1551d3918d4c..8295e2f14ed7 100644
--- a/tools/bpf/bpftool/jit_disasm.c
+++ b/tools/bpf/bpftool/jit_disasm.c
@@ -107,7 +107,14 @@ void disasm_print_insn(unsigned char *image, ssize_t len, 
int opcodes)
 
disassemble_init_for_target();
 
+#ifdef NEW_DISSASSEMBLER_SIGNATURE
+   disassemble = disassembler(bfd_get_arch(bfdf),

Re: [RFC PATCH net-next] tools/bpf: fix build with binutils >= 2.28

2017-12-19 Thread Roman Gushchin

On Tue, Dec 19, 2017 at 03:57:02PM +0000, Quentin Monnet wrote:
> Hi Roman, thanks for working on this!
> 
> 2017-12-19 14:38 UTC+0000 ~ Roman Gushchin <g...@fb.com>
> > Bpftool build is broken with binutils version 2.28 and later.
> > The cause is commit 003ca0fd2286 ("Refactor disassembler selection")
> > in the binutils repo, which changed the disassembler() function
> > signature.
> >
> > Fix this by checking binutils version and use an appropriate
> > disassembler() signature.
> >
> > Signed-off-by: Roman Gushchin <g...@fb.com>
> > Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
> > Cc: Alexei Starovoitov <a...@kernel.org>
> > Cc: Daniel Borkmann <dan...@iogearbox.net>
> > ---
> >  tools/bpf/Makefile | 6 ++
> >  tools/bpf/bpf_jit_disasm.c | 7 +++
> >  tools/bpf/bpftool/Makefile | 6 ++
> >  tools/bpf/bpftool/jit_disasm.c | 5 +
> >  4 files changed, 24 insertions(+)
> >
> > diff --git a/tools/bpf/Makefile b/tools/bpf/Makefile
> > index 07a6697466ef..3fd32fd0daa1 100644
> > --- a/tools/bpf/Makefile
> > +++ b/tools/bpf/Makefile
> > @@ -6,8 +6,14 @@ LEX = flex
> >  YACC = bison
> >  MAKE = make
> >  
> > +BINUTILS_VER := $(word 4, $(shell readelf -v | head -n 1))
> > +BINUTILS_VER_MAJ := $(word 1, $(subst ., , $(subst -, , ${BINUTILS_VER})))
> > +BINUTILS_VER_MIN := $(word 2, $(subst ., , $(subst -, , ${BINUTILS_VER})))
> > +
> >  CFLAGS += -Wall -O2
> >  CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
> > +CFLAGS += -DBINUTILS_VER_MAJ=${BINUTILS_VER_MAJ}
> > +CFLAGS += -DBINUTILS_VER_MIN=${BINUTILS_VER_MIN}
> >  
> >  %.yacc.c: %.y
> > $(YACC) -o $@ -d $<
> > diff --git a/tools/bpf/bpf_jit_disasm.c b/tools/bpf/bpf_jit_disasm.c
> > index 75bf526a0168..3ef7c8bdc0f3 100644
> > --- a/tools/bpf/bpf_jit_disasm.c
> > +++ b/tools/bpf/bpf_jit_disasm.c
> > @@ -72,7 +72,14 @@ static void get_asm_insns(uint8_t *image, size_t len, 
> > int opcodes)
> >  
> > disassemble_init_for_target();
> >  
> > +#if (BINUTILS_VER_MAJ >= 2) && (BINUTILS_VER_MIN >= 28)
> > +   disassemble = disassembler(bfd_get_arch(bfdf),
> > +  bfd_big_endian(bfdf),
> > +  bfd_get_mach(bfdf),
> > +  bfdf);
> > +#else
> > disassemble = disassembler(bfdf);
> > +#endif
> > assert(disassemble);
> >  
> > do {
> > diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
> > index 3f17ad317512..94ad51bf14b5 100644
> > --- a/tools/bpf/bpftool/Makefile
> > +++ b/tools/bpf/bpftool/Makefile
> > @@ -40,6 +40,12 @@ CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow
> >  CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi 
> > -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf 
> > -I$(srctree)/kernel/bpf/
> >  LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
> >  
> > +BINUTILS_VER := $(word 4, $(shell readelf -v | head -n 1))
> 
> This does not seem to be portable. I tried that on Ubuntu and `readelf
> -v` returns "GNU readelf (GNU Binutils for Ubuntu) 2.26.1", and
> BINUTILS_VER catches "Binutils".
> 
> > +BINUTILS_VER_MAJ := $(word 1, $(subst ., , $(subst -, , ${BINUTILS_VER})))
> > +BINUTILS_VER_MIN := $(word 2, $(subst ., , $(subst -, , ${BINUTILS_VER})))
> > +CFLAGS += -DBINUTILS_VER_MAJ=${BINUTILS_VER_MAJ}
> > +CFLAGS += -DBINUTILS_VER_MIN=${BINUTILS_VER_MIN}
> > +
> >  INSTALL ?= install
> >  RM ?= rm -f
> >  
> > diff --git a/tools/bpf/bpftool/jit_disasm.c b/tools/bpf/bpftool/jit_disasm.c
> > index 1551d3918d4c..eaa7127e9eeb 100644
> > --- a/tools/bpf/bpftool/jit_disasm.c
> > +++ b/tools/bpf/bpftool/jit_disasm.c
> > @@ -107,7 +107,12 @@ void disasm_print_insn(unsigned char *image, ssize_t 
> > len, int opcodes)
> >  
> > disassemble_init_for_target();
> >  
> > +#if (BINUTILS_VER_MAJ >= 2) && (BINUTILS_VER_MIN >= 28)
> > +   disassemble = disassembler(bfd_get_arch(bfdf), bfd_big_endian(bfdf),
> > +  bfd_get_mach(bfdf), bfdf);
> > +#else
> > disassemble = disassembler(bfdf);
> > +#endif
> > assert(disassemble);
> >  
> > if (json_output)
> 
> I discussed this issue with Jakub recently, and one suggestion he had
> was to look in tools/build/feature to add a new "feature", by trying to
> compile short programs, for making the distinction between binutils
> versions. It probably requires more work, but could be more robust than
> parsing the version from the command line?

Hm, that might be an option. Parsing readelf output is pretty ugly, I agree.
In general it feels more like a binutils issue, so we have to work around it
either way.

Is Jakub or someone else working on it?

Thanks!
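
Editorial sketch (not part of the thread): the two weaknesses of the version
check discussed above can be illustrated in plain C. The helper names below
are hypothetical. The first function takes the version from the last token of
the `readelf -v` banner instead of the fourth word, so the Ubuntu banner
quoted by Quentin still parses. The second compares major and minor together,
assuming the 2.28 threshold stated in the patch description, so that a future
3.0 release would not fall back to the old signature.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract MAJOR.MINOR from the first line of
 * `readelf -v`. The version is read from the last whitespace-separated
 * token, so vendor text such as "(GNU Binutils for Ubuntu)" is skipped.
 */
bool parse_binutils_version(const char *line, int *major, int *minor)
{
	const char *end, *start;

	assert(line && major && minor);

	/* Trim trailing whitespace, including the newline. */
	end = line + strlen(line);
	while (end > line && isspace((unsigned char)end[-1]))
		end--;

	/* Walk back to the beginning of the last token. */
	start = end;
	while (start > line && !isspace((unsigned char)start[-1]))
		start--;

	/* Anything after MAJOR.MINOR (".1", "-13.fc27") is ignored. */
	return sscanf(start, "%d.%d", major, minor) == 2;
}

/* True when MAJOR.MINOR is at least 2.28 (threshold taken from the patch). */
bool has_new_disassembler_signature(int major, int minor)
{
	return major > 2 || (major == 2 && minor >= 28);
}
```

Even with these fixes the check still depends on the banner format of a
command-line tool, which is why the compile-test approach suggested by Jakub
is likely the more robust direction.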

[RFC PATCH net-next] tools/bpf: fix build with binutils >= 2.28

2017-12-19 Thread Roman Gushchin

Bpftool build is broken with binutils version 2.28 and later.
The cause is commit 003ca0fd2286 ("Refactor disassembler selection")
in the binutils repo, which changed the disassembler() function
signature.

Fix this by checking the binutils version and using the appropriate
disassembler() signature.

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
---
 tools/bpf/Makefile | 6 ++
 tools/bpf/bpf_jit_disasm.c | 7 +++
 tools/bpf/bpftool/Makefile | 6 ++
 tools/bpf/bpftool/jit_disasm.c | 5 +
 4 files changed, 24 insertions(+)

diff --git a/tools/bpf/Makefile b/tools/bpf/Makefile
index 07a6697466ef..3fd32fd0daa1 100644
--- a/tools/bpf/Makefile
+++ b/tools/bpf/Makefile
@@ -6,8 +6,14 @@ LEX = flex
 YACC = bison
 MAKE = make
 
+BINUTILS_VER := $(word 4, $(shell readelf -v | head -n 1))
+BINUTILS_VER_MAJ := $(word 1, $(subst ., , $(subst -, , ${BINUTILS_VER})))
+BINUTILS_VER_MIN := $(word 2, $(subst ., , $(subst -, , ${BINUTILS_VER})))
+
 CFLAGS += -Wall -O2
 CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
+CFLAGS += -DBINUTILS_VER_MAJ=${BINUTILS_VER_MAJ}
+CFLAGS += -DBINUTILS_VER_MIN=${BINUTILS_VER_MIN}
 
 %.yacc.c: %.y
$(YACC) -o $@ -d $<
diff --git a/tools/bpf/bpf_jit_disasm.c b/tools/bpf/bpf_jit_disasm.c
index 75bf526a0168..3ef7c8bdc0f3 100644
--- a/tools/bpf/bpf_jit_disasm.c
+++ b/tools/bpf/bpf_jit_disasm.c
@@ -72,7 +72,14 @@ static void get_asm_insns(uint8_t *image, size_t len, int 
opcodes)
 
disassemble_init_for_target();
 
+#if (BINUTILS_VER_MAJ >= 2) && (BINUTILS_VER_MIN >= 28)
+   disassemble = disassembler(bfd_get_arch(bfdf),
+  bfd_big_endian(bfdf),
+  bfd_get_mach(bfdf),
+  bfdf);
+#else
disassemble = disassembler(bfdf);
+#endif
assert(disassemble);
 
do {
diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index 3f17ad317512..94ad51bf14b5 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -40,6 +40,12 @@ CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow
 CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi 
-I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf -I$(srctree)/kernel/bpf/
 LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
 
+BINUTILS_VER := $(word 4, $(shell readelf -v | head -n 1))
+BINUTILS_VER_MAJ := $(word 1, $(subst ., , $(subst -, , ${BINUTILS_VER})))
+BINUTILS_VER_MIN := $(word 2, $(subst ., , $(subst -, , ${BINUTILS_VER})))
+CFLAGS += -DBINUTILS_VER_MAJ=${BINUTILS_VER_MAJ}
+CFLAGS += -DBINUTILS_VER_MIN=${BINUTILS_VER_MIN}
+
 INSTALL ?= install
 RM ?= rm -f
 
diff --git a/tools/bpf/bpftool/jit_disasm.c b/tools/bpf/bpftool/jit_disasm.c
index 1551d3918d4c..eaa7127e9eeb 100644
--- a/tools/bpf/bpftool/jit_disasm.c
+++ b/tools/bpf/bpftool/jit_disasm.c
@@ -107,7 +107,12 @@ void disasm_print_insn(unsigned char *image, ssize_t len, 
int opcodes)
 
disassemble_init_for_target();
 
+#if (BINUTILS_VER_MAJ >= 2) && (BINUTILS_VER_MIN >= 28)
+   disassemble = disassembler(bfd_get_arch(bfdf), bfd_big_endian(bfdf),
+  bfd_get_mach(bfdf), bfdf);
+#else
disassemble = disassembler(bfdf);
+#endif
assert(disassemble);
 
if (json_output)
-- 
2.14.3

[PATCH v4 net-next 4/4] bpftool: implement cgroup bpf operations

2017-12-13 Thread Roman Gushchin

This patch adds basic cgroup bpf operations to bpftool:
cgroup list, attach and detach commands.

Usage is described in the corresponding man pages,
and examples are provided.

Syntax:
$ bpftool cgroup list CGROUP
$ bpftool cgroup attach CGROUP ATTACH_TYPE PROG [ATTACH_FLAGS]
$ bpftool cgroup detach CGROUP ATTACH_TYPE PROG

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Martin KaFai Lau <ka...@fb.com>
Cc: Quentin Monnet <quentin.mon...@netronome.com>
Reviewed-by: David Ahern <dsah...@gmail.com>
---
 tools/bpf/bpftool/Documentation/bpftool-cgroup.rst | 118 
 tools/bpf/bpftool/Documentation/bpftool-map.rst|   2 +-
 tools/bpf/bpftool/Documentation/bpftool-prog.rst   |   2 +-
 tools/bpf/bpftool/Documentation/bpftool.rst|   6 +-
 tools/bpf/bpftool/cgroup.c | 307 +
 tools/bpf/bpftool/main.c   |   3 +-
 tools/bpf/bpftool/main.h   |   1 +
 7 files changed, 434 insertions(+), 5 deletions(-)
 create mode 100644 tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
 create mode 100644 tools/bpf/bpftool/cgroup.c

diff --git a/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst 
b/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
new file mode 100644
index ..45c71b1f682b
--- /dev/null
+++ b/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
@@ -0,0 +1,118 @@
+
+bpftool-cgroup
+
+---
+tool for inspection and simple manipulation of eBPF progs
+---
+
+:Manual section: 8
+
+SYNOPSIS
+
+
+   **bpftool** [*OPTIONS*] **cgroup** *COMMAND*
+
+   *OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { 
**-f** | **--bpffs** } }
+
+   *COMMANDS* :=
+   { **list** | **attach** | **detach** | **help** }
+
+MAP COMMANDS
+=
+
+|  **bpftool** **cgroup list** *CGROUP*
+|  **bpftool** **cgroup attach** *CGROUP* *ATTACH_TYPE* *PROG* 
[*ATTACH_FLAGS*]
+|  **bpftool** **cgroup detach** *CGROUP* *ATTACH_TYPE* *PROG*
+|  **bpftool** **cgroup help**
+|
+|  *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* }
+|  *ATTACH_TYPE* := { *ingress* | *egress* | *sock_create* | *sock_ops* | 
*device* }
+|  *ATTACH_FLAGS* := { *multi* | *override* }
+
+DESCRIPTION
+===
+   **bpftool cgroup list** *CGROUP*
+ List all programs attached to the cgroup *CGROUP*.
+
+ Output will start with program ID followed by attach type,
+ attach flags and program name.
+
+   **bpftool cgroup attach** *CGROUP* *ATTACH_TYPE* *PROG* [*ATTACH_FLAGS*]
+ Attach program *PROG* to the cgroup *CGROUP* with attach type
+ *ATTACH_TYPE* and optional *ATTACH_FLAGS*.
+
+ *ATTACH_FLAGS* can be one of: **override** if a sub-cgroup installs
+ some bpf program, the program in this cgroup yields to sub-cgroup
+ program; **multi** if a sub-cgroup installs some bpf program,
+ that cgroup program gets run in addition to the program in this
+ cgroup.
+
+ Only one program is allowed to be attached to a cgroup with
+ no attach flags or the **override** flag. Attaching another
+ program will release old program and attach the new one.
+
+ Multiple programs are allowed to be attached to a cgroup with
+ **multi**. They are executed in FIFO order (those that were
+ attached first, run first).
+
+ Non-default *ATTACH_FLAGS* are supported by kernel version 4.14
+ and later.
+
+ *ATTACH_TYPE* can be one of:
+ **ingress** ingress path of the inet socket (since 4.10);
+ **egress** egress path of the inet socket (since 4.10);
+ **sock_create** opening of an inet socket (since 4.10);
+ **sock_ops** various socket operations (since 4.12);
+ **device** device access (since 4.15).
+
+   **bpftool cgroup detach** *CGROUP* *ATTACH_TYPE* *PROG*
+ Detach *PROG* from the cgroup *CGROUP* and attach type
+ *ATTACH_TYPE*.
+
+   **bpftool cgroup help**
+ Print short help message.
+
+OPTIONS
+===
+   -h, --help
+ Print short generic help message (similar to **bpftool 
help**).
+
+   -v, --version
+ Print version number (similar to **bpftool version**).
+
+   -j, --json
+ Generate JSON output. For commands that cannot produce JSON, 
t

[PATCH v4 net-next 1/4] libbpf: add ability to guess program type based on section name

2017-12-13 Thread Roman Gushchin

The bpf_prog_load() function will guess the program type if it's not
specified explicitly. This functionality will be used to implement
loading of different programs without asking the user to specify
the program type. Its first user will be bpftool.

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Martin KaFai Lau <ka...@fb.com>
Cc: Quentin Monnet <quentin.mon...@netronome.com>
Cc: David Ahern <dsah...@gmail.com>
---
 tools/lib/bpf/libbpf.c | 51 ++
 1 file changed, 51 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 5aa45f89da93..205b7822fa0a 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1721,6 +1721,45 @@ BPF_PROG_TYPE_FNS(tracepoint, BPF_PROG_TYPE_TRACEPOINT);
 BPF_PROG_TYPE_FNS(xdp, BPF_PROG_TYPE_XDP);
 BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT);
 
+#define BPF_PROG_SEC(string, type) { string, sizeof(string), type }
+static const struct {
+   const char *sec;
+   size_t len;
+   enum bpf_prog_type prog_type;
+} section_names[] = {
+   BPF_PROG_SEC("socket",  BPF_PROG_TYPE_SOCKET_FILTER),
+   BPF_PROG_SEC("kprobe/", BPF_PROG_TYPE_KPROBE),
+   BPF_PROG_SEC("kretprobe/",  BPF_PROG_TYPE_KPROBE),
+   BPF_PROG_SEC("tracepoint/", BPF_PROG_TYPE_TRACEPOINT),
+   BPF_PROG_SEC("xdp", BPF_PROG_TYPE_XDP),
+   BPF_PROG_SEC("perf_event",  BPF_PROG_TYPE_PERF_EVENT),
+   BPF_PROG_SEC("cgroup/skb",  BPF_PROG_TYPE_CGROUP_SKB),
+   BPF_PROG_SEC("cgroup/sock", BPF_PROG_TYPE_CGROUP_SOCK),
+   BPF_PROG_SEC("cgroup/dev",  BPF_PROG_TYPE_CGROUP_DEVICE),
+   BPF_PROG_SEC("sockops", BPF_PROG_TYPE_SOCK_OPS),
+   BPF_PROG_SEC("sk_skb",  BPF_PROG_TYPE_SK_SKB),
+};
+#undef BPF_PROG_SEC
+
+static enum bpf_prog_type bpf_program__guess_type(struct bpf_program *prog)
+{
+   int i;
+
+   if (!prog->section_name)
+   goto err;
+
+   for (i = 0; i < ARRAY_SIZE(section_names); i++)
+   if (strncmp(prog->section_name, section_names[i].sec,
+   section_names[i].len) == 0)
+   return section_names[i].prog_type;
+
+err:
+   pr_warning("failed to guess program type based on section name %s\n",
+  prog->section_name);
+
+   return BPF_PROG_TYPE_UNSPEC;
+}
+
 int bpf_map__fd(struct bpf_map *map)
 {
return map ? map->fd : -EINVAL;
@@ -1832,6 +1871,18 @@ int bpf_prog_load(const char *file, enum bpf_prog_type 
type,
return -ENOENT;
}
 
+   /*
+* If type is not specified, try to guess it based on
+* section name.
+*/
+   if (type == BPF_PROG_TYPE_UNSPEC) {
+   type = bpf_program__guess_type(prog);
+   if (type == BPF_PROG_TYPE_UNSPEC) {
+   bpf_object__close(obj);
+   return -EINVAL;
+   }
+   }
+
bpf_program__set_type(prog, type);
err = bpf_object__load(obj);
if (err) {
-- 
2.14.3
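
Editorial sketch (not part of the patch): the section-name lookup above can be
tried out in isolation. One detail worth noting is that sizeof() on a string
literal counts the trailing NUL, so a table built with sizeof(string) makes
strncmp() compare the terminator as well, and a name such as "kprobe/sys_open"
would then not match the "kprobe/" entry. The version below subtracts one to
get a true prefix match. The enum and all names are stand-ins for
illustration, not the libbpf definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for enum bpf_prog_type; the values are illustrative only. */
enum prog_type_sketch {
	PROG_TYPE_UNSPEC,
	PROG_TYPE_SOCKET_FILTER,
	PROG_TYPE_KPROBE,
	PROG_TYPE_XDP,
	PROG_TYPE_CGROUP_SKB,
};

/* sizeof() includes the NUL byte: "xdp" is 4 bytes but a 3-character prefix. */
static_assert(sizeof("xdp") - 1 == 3, "prefix length excludes the NUL");

#define PROG_SEC(string, type) { string, sizeof(string) - 1, type }

static const struct {
	const char *sec;
	size_t len;
	enum prog_type_sketch prog_type;
} section_prefixes[] = {
	PROG_SEC("socket",     PROG_TYPE_SOCKET_FILTER),
	PROG_SEC("kprobe/",    PROG_TYPE_KPROBE),
	PROG_SEC("kretprobe/", PROG_TYPE_KPROBE),
	PROG_SEC("xdp",        PROG_TYPE_XDP),
	PROG_SEC("cgroup/skb", PROG_TYPE_CGROUP_SKB),
};

#undef PROG_SEC

/* Return the type of the first entry whose prefix matches, or UNSPEC. */
enum prog_type_sketch guess_type(const char *section_name)
{
	size_t i;

	if (!section_name)
		return PROG_TYPE_UNSPEC;

	for (i = 0; i < sizeof(section_prefixes) / sizeof(section_prefixes[0]); i++)
		if (strncmp(section_name, section_prefixes[i].sec,
			    section_prefixes[i].len) == 0)
			return section_prefixes[i].prog_type;

	return PROG_TYPE_UNSPEC;
}
```

With this table, guess_type("kprobe/sys_open") resolves to the kprobe type,
while an unknown or missing section name yields the UNSPEC value, which the
caller in the patch turns into -EINVAL.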

[PATCH v4 net-next 3/4] bpftool: implement prog load command

2017-12-13 Thread Roman Gushchin

Add the prog load command to load a bpf program from a specified
binary file and pin it to bpffs.

Usage description and examples are given in the corresponding man
page.

Syntax:
$ bpftool prog load OBJ FILE

FILE is a non-existing file on bpffs.

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Reviewed-by: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Martin KaFai Lau <ka...@fb.com>
Cc: Quentin Monnet <quentin.mon...@netronome.com>
Cc: David Ahern <dsah...@gmail.com>
---
 tools/bpf/bpftool/Documentation/bpftool-prog.rst | 10 +++-
 tools/bpf/bpftool/Documentation/bpftool.rst  |  2 +-
 tools/bpf/bpftool/common.c   | 71 +---
 tools/bpf/bpftool/main.h |  1 +
 tools/bpf/bpftool/prog.c | 29 +-
 5 files changed, 79 insertions(+), 34 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst 
b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index 36e8d1c3c40d..ffdb20e8280f 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -15,7 +15,7 @@ SYNOPSIS
*OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { 
**-f** | **--bpffs** } }
 
*COMMANDS* :=
-   { **show** | **dump xlated** | **dump jited** | **pin** | **help** }
+   { **show** | **dump xlated** | **dump jited** | **pin** | **load** | 
**help** }
 
 MAP COMMANDS
 =
@@ -24,6 +24,7 @@ MAP COMMANDS
 |  **bpftool** **prog dump xlated** *PROG* [{**file** *FILE* | 
**opcodes**}]
 |  **bpftool** **prog dump jited**  *PROG* [{**file** *FILE* | 
**opcodes**}]
 |  **bpftool** **prog pin** *PROG* *FILE*
+|  **bpftool** **prog load** *OBJ* *FILE*
 |  **bpftool** **prog help**
 |
 |  *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* }
@@ -57,6 +58,11 @@ DESCRIPTION
 
  Note: *FILE* must be located in *bpffs* mount.
 
+   **bpftool prog load** *OBJ* *FILE*
+ Load bpf program from binary *OBJ* and pin as *FILE*.
+
+ Note: *FILE* must be located in *bpffs* mount.
+
**bpftool prog help**
  Print short help message.
 
@@ -126,8 +132,10 @@ EXAMPLES
 |
 | **# mount -t bpf none /sys/fs/bpf/**
 | **# bpftool prog pin id 10 /sys/fs/bpf/prog**
+| **# bpftool prog load ./my_prog.o /sys/fs/bpf/prog2**
 | **# ls -l /sys/fs/bpf/**
 |   -rw--- 1 root root 0 Jul 22 01:43 prog
+|   -rw--- 1 root root 0 Jul 22 01:44 prog2
 
 **# bpftool prog dum jited pinned /sys/fs/bpf/prog opcodes**
 
diff --git a/tools/bpf/bpftool/Documentation/bpftool.rst 
b/tools/bpf/bpftool/Documentation/bpftool.rst
index 926c03d5a8da..f547a0c0aa34 100644
--- a/tools/bpf/bpftool/Documentation/bpftool.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool.rst
@@ -26,7 +26,7 @@ SYNOPSIS
| **pin** | **help** }
 
*PROG-COMMANDS* := { **show** | **dump jited** | **dump xlated** | 
**pin**
-   | **help** }
+   | **load** | **help** }
 
 DESCRIPTION
 ===
diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 2bd3b280e6dd..b62c94e3997a 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -163,13 +163,49 @@ int open_obj_pinned_any(char *path, enum bpf_obj_type 
exp_type)
return fd;
 }
 
-int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32))
+int do_pin_fd(int fd, const char *name)
 {
char err_str[ERR_MAX_LEN];
-   unsigned int id;
-   char *endptr;
char *file;
char *dir;
+   int err = 0;
+
+   err = bpf_obj_pin(fd, name);
+   if (!err)
+   goto out;
+
+   file = malloc(strlen(name) + 1);
+   strcpy(file, name);
+   dir = dirname(file);
+
+   if (errno != EPERM || is_bpffs(dir)) {
+   p_err("can't pin the object (%s): %s", name, strerror(errno));
+   goto out_free;
+   }
+
+   /* Attempt to mount bpffs, then retry pinning. */
+   err = mnt_bpffs(dir, err_str, ERR_MAX_LEN);
+   if (!err) {
+   err = bpf_obj_pin(fd, name);
+   if (err)
+   p_err("can't pin the object (%s): %s", name,
+ strerror(errno));
+   } else {
+   err_str[ERR_MAX_LEN - 1] = '\0';
+   p_err("can't mount BPF file system to pin the object (%s): %s",
+ name, err_str);
+   }
+
+out_free:
+   free(file);
+out:
+   return err;
+}
+
+int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32))
+{
+   unsigned int id;
+   char *endptr;
int err;
int fd;
 
@@ -195,35 +231,8 @@ int do_pin_any(int argc, char **argv, int 
(*get_fd_by_id)(__u32))
return -1;
}
 
-   err =

[PATCH v4 net-next 2/4] libbpf: prefer global symbols as bpf program name source

2017-12-13 Thread Roman Gushchin

Libbpf picks the name of the first symbol in the corresponding
elf section to use as a program name. But without taking the symbol's
scope into account, it may end up with some local label
as a program name. E.g.:

$ bpftool prog
1: type 15  name LBB0_10  tag 0390a5136ba23f5c
loaded_at Dec 07/17:22  uid 0
xlated 456B  not jited  memlock 4096B

Fix this by preferring global symbols as program name.

For instance:
$ bpftool prog
1: type 15  name bpf_prog1  tag 0390a5136ba23f5c
loaded_at Dec 07/17:26  uid 0
xlated 456B  not jited  memlock 4096B

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Martin KaFai Lau <ka...@fb.com>
Cc: Quentin Monnet <quentin.mon...@netronome.com>
Cc: David Ahern <dsah...@gmail.com>
---
 tools/lib/bpf/libbpf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 205b7822fa0a..65d0d0aff4fa 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -387,6 +387,8 @@ bpf_object__init_prog_names(struct bpf_object *obj)
continue;
if (sym.st_shndx != prog->idx)
continue;
+   if (GELF_ST_BIND(sym.st_info) != STB_GLOBAL)
+   continue;
 
name = elf_strptr(obj->efile.elf,
  obj->efile.strtabidx,
-- 
2.14.3
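
Editorial sketch (not part of the patch): the added check relies on the ELF
convention that a symbol's binding is stored in the upper four bits of
st_info. The hypothetical helper below applies the same two filters as the
hunk (section index, then binding) to a plain array of symbols. It uses the
ELF64_ST_BIND macro from glibc's <elf.h> rather than libelf's GELF_ST_BIND,
purely to avoid a library dependency in the example.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the index of the first STB_GLOBAL symbol
 * located in section `shndx`, or -1 when there is none.
 */
int first_global_symbol(const Elf64_Sym *syms, size_t nr_syms,
			unsigned int shndx)
{
	size_t i;

	assert(syms || nr_syms == 0);

	for (i = 0; i < nr_syms; i++) {
		/* Skip symbols that belong to other sections. */
		if (syms[i].st_shndx != shndx)
			continue;
		/* Skip local labels such as LBB0_10. */
		if (ELF64_ST_BIND(syms[i].st_info) != STB_GLOBAL)
			continue;
		return (int)i;
	}

	return -1;
}
```

A compiler-generated local label that precedes the function symbol in the
same section is passed over, which corresponds to the LBB0_10 versus
bpf_prog1 difference shown in the commit message.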

[PATCH v4 net-next 0/4] bpftool: cgroup bpf operations

2017-12-13 Thread Roman Gushchin

This patchset adds basic cgroup bpf operations to bpftool.

Right now there is no convenient way to perform these operations.
The /samples/bpf/load_sock_ops.c implements attach/detach operations,
but only for BPF_CGROUP_SOCK_OPS programs. Bps (part of bcc) implements
bpf introspection, but lacks anything cgroup-specific.

I find having a tool to perform these basic operations in the kernel tree
very useful, as it can be used in the corresponding bpf documentation
without creating additional dependencies. And bpftool seems to be
the right tool to extend with such functionality.

v4:
  - ATTACH_FLAGS and ATTACH_TYPE are listed and described in docs and usage
  - ATTACH_FLAG names converted to "multi" and "override"
  - do_attach() recognizes ATTACH_FLAG abbreviations, e.g "mul"
  - Local variables sorted ("reverse Christmas tree")
  - Unknown attach flag values are never truncated

v3:
  - SRC replaced with OBJ in prog load docs
  - Output unknown attach type in hex
  - License header in SPDX format
  - Minor style fixes (e.g. variable reordering)

v2:
  - Added prog load operations
  - All cgroup operations are looking like bpftool cgroup 
  - All cgroup-related stuff is moved to a separate file
  - Added support for attach flags
  - Added support for attaching/detaching programs by id, pinned name, etc
  - Changed cgroup detach arguments order
  - Added empty json output for successful programs
  - Style fixed: includes order, strncmp and macros, error handling
  - Added man pages

v1:
  https://lwn.net/Articles/740366/

Roman Gushchin (4):
  libbpf: add ability to guess program type based on section name
  libbpf: prefer global symbols as bpf program name source
  bpftool: implement prog load command
  bpftool: implement cgroup bpf operations

 tools/bpf/bpftool/Documentation/bpftool-cgroup.rst | 118 
 tools/bpf/bpftool/Documentation/bpftool-map.rst|   2 +-
 tools/bpf/bpftool/Documentation/bpftool-prog.rst   |  12 +-
 tools/bpf/bpftool/Documentation/bpftool.rst|   8 +-
 tools/bpf/bpftool/cgroup.c | 307 +
 tools/bpf/bpftool/common.c |  71 ++---
 tools/bpf/bpftool/main.c   |   3 +-
 tools/bpf/bpftool/main.h   |   2 +
 tools/bpf/bpftool/prog.c   |  29 +-
 tools/lib/bpf/libbpf.c |  53 
 10 files changed, 566 insertions(+), 39 deletions(-)
 create mode 100644 tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
 create mode 100644 tools/bpf/bpftool/cgroup.c

-- 
2.14.3
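
Editorial sketch (not part of the series): the v4 changelog mentions that
attach flags are named "multi" and "override" and that abbreviations such as
"mul" are accepted. A minimal way to express that kind of parsing is shown
below. The function names are hypothetical, and the two flag macros are local
copies intended to mirror BPF_F_ALLOW_OVERRIDE and BPF_F_ALLOW_MULTI from
<linux/bpf.h> so that the example builds without kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Local stand-ins for the UAPI attach flags (assumed values). */
#define SKETCH_F_ALLOW_OVERRIDE (1U << 0)
#define SKETCH_F_ALLOW_MULTI    (1U << 1)

/* True when `arg` is a non-empty prefix of `word`, e.g. "mul" for "multi". */
bool is_abbreviation(const char *arg, const char *word)
{
	size_t len = strlen(arg);

	return len > 0 && len <= strlen(word) && strncmp(arg, word, len) == 0;
}

/* Set the bit for one ATTACH_FLAGS argument; false if it is not recognized. */
bool parse_attach_flag(const char *arg, unsigned int *flags)
{
	assert(arg && flags);

	if (is_abbreviation(arg, "multi"))
		*flags |= SKETCH_F_ALLOW_MULTI;
	else if (is_abbreviation(arg, "override"))
		*flags |= SKETCH_F_ALLOW_OVERRIDE;
	else
		return false;

	return true;
}
```

Because the two names share no leading characters, any non-empty prefix is
unambiguous. An unrecognized argument leaves the flags untouched, so the
caller can report the error without cleaning up.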

Re: [PATCH] selftests: bpf: Adding config fragment CONFIG_CGROUP_BPF=y

2017-12-11 Thread Roman Gushchin

Hi Naresh,

Looks good!

Thanks!

On Tue, Dec 12, 2017 at 12:55:23AM +0530, Naresh Kamboju wrote:
> CONFIG_CGROUP_BPF=y is required for test_dev_cgroup test case.
> 
> Signed-off-by: Naresh Kamboju 
> ---
>  tools/testing/selftests/bpf/config | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/testing/selftests/bpf/config 
> b/tools/testing/selftests/bpf/config
> index 52d53ed..9d48973 100644
> --- a/tools/testing/selftests/bpf/config
> +++ b/tools/testing/selftests/bpf/config
> @@ -3,3 +3,4 @@ CONFIG_BPF_SYSCALL=y
>  CONFIG_NET_CLS_BPF=m
>  CONFIG_BPF_EVENTS=y
>  CONFIG_TEST_BPF=m
> +CONFIG_CGROUP_BPF=y
> -- 
> 2.7.4
>

Re: [PATCH v2 net-next 4/4] bpftool: implement cgroup bpf operations

2017-12-09 Thread Roman Gushchin

On Fri, Dec 08, 2017 at 03:39:43PM +0000, Quentin Monnet wrote:
> 2017-12-08 14:12 UTC+0000 ~ Roman Gushchin <g...@fb.com>
> > On Fri, Dec 08, 2017 at 10:34:16AM +0000, Quentin Monnet wrote:
> >> 2017-12-07 18:39 UTC+0000 ~ Roman Gushchin <g...@fb.com>
> >>> This patch adds basic cgroup bpf operations to bpftool:
> >>> cgroup list, attach and detach commands.
> >>>
> >>> Usage is described in the corresponding man pages,
> >>> and examples are provided.
> > [...]
> >>> +MAP COMMANDS
> >>> +=
> >>> +
> >>> +|**bpftool** **cgroup list** *CGROUP*
> >>> +|**bpftool** **cgroup attach** *CGROUP* *ATTACH_TYPE* *PROG* 
> >>> [*ATTACH_FLAGS*]
> >>> +|**bpftool** **cgroup detach** *CGROUP* *ATTACH_TYPE* *PROG*
> >>> +|**bpftool** **cgroup help**
> >>> +|
> >>> +|*PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** 
> >>> *PROG_TAG* }
> >>
> >> Could you please give the different possible values for ATTACH_TYPE and
> >> ATTACH_FLAGS, and provide some documentation for the flags?
> > 
> > I intentionally didn't include the list of possible values, as it depends
> > on the exact kernel version, and other bpftool docs are carefully avoiding
> > specifying such things.
> 
> Do they? As far as I can tell the only other bpftool command that uses
> flags is the `bpftool map update`, and it does specify the possible
> values for UPDATE_FLAGS (and document them) in the man page.

You are right about UPDATE_FLAGS, but at the same time we do
not describe bpf program attributes in prog show:
  **bpftool prog show** [*PROG*]
  Show information about loaded programs.  If *PROG* is
  specified show information only about given program, otherwise
  list all programs currently loaded on the system.

  Output will start with program ID followed by program type and
  zero or more named attributes (depending on kernel version).

I think that ATTACH_TYPE is actually similar to PROGRAM_TYPE, because
it will likely be extended in future kernel versions.
So we should probably support specifying it in numeric form too.

ATTACH_FLAGS are probably less volatile and are unlikely to be extended often,
so we can describe them in the docs and add a note about the kernel version
the next time a new flag is added.

Anyway, I don't see any big problem in documenting the current ATTACH_FLAG
and ATTACH_TYPE sets, if we think that it's a good way forward.

Thanks!

[PATCH v3 net-next 0/4] bpftool: cgroup bpf operations

2017-12-08 Thread Roman Gushchin

This patchset adds basic cgroup bpf operations to bpftool.

Right now there is no convenient way to perform these operations.
The /samples/bpf/load_sock_ops.c implements attach/detach operations,
but only for BPF_CGROUP_SOCK_OPS programs. Bps (part of bcc) implements
bpf introspection, but lacks anything cgroup-specific.

I find having a tool to perform these basic operations in the kernel tree
very useful, as it can be used in the corresponding bpf documentation
without creating additional dependencies. And bpftool seems to be
the right tool to extend with such functionality.

v3:
  - SRC replaced with OBJ in prog load docs
  - Output unknown attach type in hex
  - License header in SPDX format
  - Minor style fixes (e.g. variable reordering)

v2:
  - Added prog load operations
  - All cgroup operations are looking like bpftool cgroup 
  - All cgroup-related stuff is moved to a separate file
  - Added support for attach flags
  - Added support for attaching/detaching programs by id, pinned name, etc
  - Changed cgroup detach arguments order
  - Added empty json output for successful programs
  - Style fixed: includes order, strncmp and macros, error handling
  - Added man pages

v1:
  https://lwn.net/Articles/740366/

Roman Gushchin (4):
  libbpf: add ability to guess program type based on section name
  libbpf: prefer global symbols as bpf program name source
  bpftool: implement prog load command
  bpftool: implement cgroup bpf operations

 tools/bpf/bpftool/Documentation/bpftool-cgroup.rst |  92 +++
 tools/bpf/bpftool/Documentation/bpftool-map.rst|   2 +-
 tools/bpf/bpftool/Documentation/bpftool-prog.rst   |  12 +-
 tools/bpf/bpftool/Documentation/bpftool.rst|   8 +-
 tools/bpf/bpftool/cgroup.c | 300 +
 tools/bpf/bpftool/common.c |  71 ++---
 tools/bpf/bpftool/main.c   |   3 +-
 tools/bpf/bpftool/main.h   |   2 +
 tools/bpf/bpftool/prog.c   |  29 +-
 tools/lib/bpf/libbpf.c |  53 
 10 files changed, 533 insertions(+), 39 deletions(-)
 create mode 100644 tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
 create mode 100644 tools/bpf/bpftool/cgroup.c

-- 
2.14.3

[PATCH v3 net-next 3/4] bpftool: implement prog load command

2017-12-08 Thread Roman Gushchin

Add the prog load command to load a bpf program from a specified
binary file and pin it to bpffs.

Usage description and examples are given in the corresponding man
page.

Syntax:
$ bpftool prog load OBJ FILE

FILE is a non-existing file on bpffs.

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Reviewed-by: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Martin KaFai Lau <ka...@fb.com>
Cc: Quentin Monnet <quentin.mon...@netronome.com>
Cc: David Ahern <dsah...@gmail.com>
---
 tools/bpf/bpftool/Documentation/bpftool-prog.rst | 10 +++-
 tools/bpf/bpftool/Documentation/bpftool.rst  |  2 +-
 tools/bpf/bpftool/common.c   | 71 +---
 tools/bpf/bpftool/main.h |  1 +
 tools/bpf/bpftool/prog.c | 29 +-
 5 files changed, 79 insertions(+), 34 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst 
b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index 36e8d1c3c40d..ffdb20e8280f 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -15,7 +15,7 @@ SYNOPSIS
*OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { 
**-f** | **--bpffs** } }
 
*COMMANDS* :=
-   { **show** | **dump xlated** | **dump jited** | **pin** | **help** }
+   { **show** | **dump xlated** | **dump jited** | **pin** | **load** | 
**help** }
 
 MAP COMMANDS
 =
@@ -24,6 +24,7 @@ MAP COMMANDS
 |  **bpftool** **prog dump xlated** *PROG* [{**file** *FILE* | 
**opcodes**}]
 |  **bpftool** **prog dump jited**  *PROG* [{**file** *FILE* | 
**opcodes**}]
 |  **bpftool** **prog pin** *PROG* *FILE*
+|  **bpftool** **prog load** *OBJ* *FILE*
 |  **bpftool** **prog help**
 |
 |  *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* }
@@ -57,6 +58,11 @@ DESCRIPTION
 
  Note: *FILE* must be located in *bpffs* mount.
 
+   **bpftool prog load** *OBJ* *FILE*
+ Load bpf program from binary *OBJ* and pin as *FILE*.
+
+ Note: *FILE* must be located in *bpffs* mount.
+
**bpftool prog help**
  Print short help message.
 
@@ -126,8 +132,10 @@ EXAMPLES
 |
 | **# mount -t bpf none /sys/fs/bpf/**
 | **# bpftool prog pin id 10 /sys/fs/bpf/prog**
+| **# bpftool prog load ./my_prog.o /sys/fs/bpf/prog2**
 | **# ls -l /sys/fs/bpf/**
 |   -rw--- 1 root root 0 Jul 22 01:43 prog
+|   -rw--- 1 root root 0 Jul 22 01:44 prog2
 
 **# bpftool prog dum jited pinned /sys/fs/bpf/prog opcodes**
 
diff --git a/tools/bpf/bpftool/Documentation/bpftool.rst 
b/tools/bpf/bpftool/Documentation/bpftool.rst
index 926c03d5a8da..f547a0c0aa34 100644
--- a/tools/bpf/bpftool/Documentation/bpftool.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool.rst
@@ -26,7 +26,7 @@ SYNOPSIS
| **pin** | **help** }
 
*PROG-COMMANDS* := { **show** | **dump jited** | **dump xlated** | 
**pin**
-   | **help** }
+   | **load** | **help** }
 
 DESCRIPTION
 ===
diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 2bd3b280e6dd..b62c94e3997a 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -163,13 +163,49 @@ int open_obj_pinned_any(char *path, enum bpf_obj_type 
exp_type)
return fd;
 }
 
-int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32))
+int do_pin_fd(int fd, const char *name)
 {
char err_str[ERR_MAX_LEN];
-   unsigned int id;
-   char *endptr;
char *file;
char *dir;
+   int err = 0;
+
+   err = bpf_obj_pin(fd, name);
+   if (!err)
+   goto out;
+
+   file = malloc(strlen(name) + 1);
+   strcpy(file, name);
+   dir = dirname(file);
+
+   if (errno != EPERM || is_bpffs(dir)) {
+   p_err("can't pin the object (%s): %s", name, strerror(errno));
+   goto out_free;
+   }
+
+   /* Attempt to mount bpffs, then retry pinning. */
+   err = mnt_bpffs(dir, err_str, ERR_MAX_LEN);
+   if (!err) {
+   err = bpf_obj_pin(fd, name);
+   if (err)
+   p_err("can't pin the object (%s): %s", name,
+ strerror(errno));
+   } else {
+   err_str[ERR_MAX_LEN - 1] = '\0';
+   p_err("can't mount BPF file system to pin the object (%s): %s",
+ name, err_str);
+   }
+
+out_free:
+   free(file);
+out:
+   return err;
+}
+
+int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32))
+{
+   unsigned int id;
+   char *endptr;
int err;
int fd;
 
@@ -195,35 +231,8 @@ int do_pin_any(int argc, char **argv, int 
(*get_fd_by_id)(__u32))
return -1;
}
 
-   err =

[PATCH v3 net-next 2/4] libbpf: prefer global symbols as bpf program name source

2017-12-08 Thread Roman Gushchin

Libbpf picks the name of the first symbol in the corresponding
ELF section to use as a program name. But because it does not take the
symbol's scope into account, it may end up with some local label
as a program name. E.g.:

$ bpftool prog
1: type 15  name LBB0_10  tag 0390a5136ba23f5c
loaded_at Dec 07/17:22  uid 0
xlated 456B  not jited  memlock 4096B

Fix this by preferring global symbols as program name.

For instance:
$ bpftool prog
1: type 15  name bpf_prog1  tag 0390a5136ba23f5c
loaded_at Dec 07/17:26  uid 0
xlated 456B  not jited  memlock 4096B

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Martin KaFai Lau <ka...@fb.com>
Cc: Quentin Monnet <quentin.mon...@netronome.com>
Cc: David Ahern <dsah...@gmail.com>
---
 tools/lib/bpf/libbpf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 205b7822fa0a..65d0d0aff4fa 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -387,6 +387,8 @@ bpf_object__init_prog_names(struct bpf_object *obj)
continue;
if (sym.st_shndx != prog->idx)
continue;
+   if (GELF_ST_BIND(sym.st_info) != STB_GLOBAL)
+   continue;
 
name = elf_strptr(obj->efile.elf,
  obj->efile.strtabidx,
-- 
2.14.3
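
The binding check in the hunk above can be tried in isolation. Below is a
minimal sketch, assuming a plain array of Elf64_Sym entries instead of the
libelf iteration that libbpf uses; first_global_sym() is a hypothetical
helper for illustration, not a libbpf function, and it uses ELF64_ST_BIND
from <elf.h> where the patch uses GELF_ST_BIND.

```c
#include <elf.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libbpf): return the index of the first
 * symbol that belongs to section `shndx` and has global binding, or -1 if
 * there is none.
 */
int first_global_sym(const Elf64_Sym *syms, size_t nr, unsigned int shndx)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (syms[i].st_shndx != shndx)
			continue;
		/* Skip local labels such as the compiler-generated LBB0_10. */
		if (ELF64_ST_BIND(syms[i].st_info) != STB_GLOBAL)
			continue;
		return (int)i;
	}

	return -1;
}
```

The binding lives in the upper four bits of st_info, so a local label and a
global function in the same section differ only in that field.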

Re: [PATCH v2 net-next 4/4] bpftool: implement cgroup bpf operations

2017-12-08 Thread Roman Gushchin

On Fri, Dec 08, 2017 at 02:56:15PM +0100, Philippe Ombredanne wrote:
> On Fri, Dec 8, 2017 at 11:34 AM, Quentin Monnet
> <quentin.mon...@netronome.com> wrote:
> > 2017-12-07 18:39 UTC+0000 ~ Roman Gushchin <g...@fb.com>
> >> This patch adds basic cgroup bpf operations to bpftool:
> >> cgroup list, attach and detach commands.
> 
> [...]
> >> --- /dev/null
> >> +++ b/tools/bpf/bpftool/cgroup.c
> >> @@ -0,0 +1,305 @@
> >> +/*
> >> + * Copyright (C) 2017 Facebook
> >> + *
> >> + * This program is free software; you can redistribute it and/or
> >> + * modify it under the terms of the GNU General Public License
> >> + * as published by the Free Software Foundation; either version
> >> + * 2 of the License, or (at your option) any later version.
> >> + *
> >> + *
> >> + */
> >> +
> 
> Roman,
> Have you considered using the simpler and new SPDX ids instead? e.g.:
> 
> // SPDX-License-Identifier: GPL-2.0+
> // Copyright (C) 2017 Facebook
> // Author: Roman Gushchin <g...@fb.com>
> 
> This would boost your code/comments ratio nicely IMHO.

Thanks, applied to v3!

[PATCH v3 net-next 4/4] bpftool: implement cgroup bpf operations

2017-12-08 Thread Roman Gushchin

This patch adds basic cgroup bpf operations to bpftool:
cgroup list, attach and detach commands.

Usage is described in the corresponding man pages,
and examples are provided.

Syntax:
$ bpftool cgroup list CGROUP
$ bpftool cgroup attach CGROUP ATTACH_TYPE PROG [ATTACH_FLAGS]
$ bpftool cgroup detach CGROUP ATTACH_TYPE PROG

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Martin KaFai Lau <ka...@fb.com>
Cc: Quentin Monnet <quentin.mon...@netronome.com>
Reviewed-by: David Ahern <dsah...@gmail.com>
---
 tools/bpf/bpftool/Documentation/bpftool-cgroup.rst |  92 +++
 tools/bpf/bpftool/Documentation/bpftool-map.rst|   2 +-
 tools/bpf/bpftool/Documentation/bpftool-prog.rst   |   2 +-
 tools/bpf/bpftool/Documentation/bpftool.rst|   6 +-
 tools/bpf/bpftool/cgroup.c | 300 +
 tools/bpf/bpftool/main.c   |   3 +-
 tools/bpf/bpftool/main.h   |   1 +
 7 files changed, 401 insertions(+), 5 deletions(-)
 create mode 100644 tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
 create mode 100644 tools/bpf/bpftool/cgroup.c

diff --git a/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst b/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
new file mode 100644
index 000000000000..61ded613aee1
--- /dev/null
+++ b/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
@@ -0,0 +1,92 @@
+
+bpftool-cgroup
+
+---
+tool for inspection and simple manipulation of eBPF progs
+---
+
+:Manual section: 8
+
+SYNOPSIS
+
+
+   **bpftool** [*OPTIONS*] **cgroup** *COMMAND*
+
+   *OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { 
**-f** | **--bpffs** } }
+
+   *COMMANDS* :=
+   { **list** | **attach** | **detach** | **help** }
+
+MAP COMMANDS
+=
+
+|  **bpftool** **cgroup list** *CGROUP*
+|  **bpftool** **cgroup attach** *CGROUP* *ATTACH_TYPE* *PROG* 
[*ATTACH_FLAGS*]
+|  **bpftool** **cgroup detach** *CGROUP* *ATTACH_TYPE* *PROG*
+|  **bpftool** **cgroup help**
+|
+|  *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* }
+
+DESCRIPTION
+===
+   **bpftool cgroup list** *CGROUP*
+ List all programs attached to the cgroup *CGROUP*.
+
+ Output will start with program ID followed by attach type,
+ attach flags and program name.
+
+   **bpftool cgroup attach** *CGROUP* *ATTACH_TYPE* *PROG* [*ATTACH_FLAGS*]
+ Attach program *PROG* to the cgroup *CGROUP* with attach type
+ *ATTACH_TYPE* and optional *ATTACH_FLAGS*.
+
+   **bpftool cgroup detach** *CGROUP* *ATTACH_TYPE* *PROG*
+ Detach *PROG* from the cgroup *CGROUP* and attach type
+ *ATTACH_TYPE*.
+
+   **bpftool prog help**
+ Print short help message.
+
+OPTIONS
+===
+   -h, --help
+ Print short generic help message (similar to **bpftool 
help**).
+
+   -v, --version
+ Print version number (similar to **bpftool version**).
+
+   -j, --json
+ Generate JSON output. For commands that cannot produce JSON, 
this
+ option has no effect.
+
+   -p, --pretty
+ Generate human-readable JSON output. Implies **-j**.
+
+   -f, --bpffs
+ Show file names of pinned programs.
+
+EXAMPLES
+
+|
+| **# mount -t bpf none /sys/fs/bpf/**
+| **# mkdir /sys/fs/cgroup/test.slice**
+| **# bpftool prog load ./device_cgroup.o /sys/fs/bpf/prog**
+| **# bpftool cgroup attach /sys/fs/cgroup/test.slice/ device id 1 
allow_multi**
+
+**# bpftool cgroup list /sys/fs/cgroup/test.slice/**
+
+::
+
+ID   AttachType  AttachFlags Name
+1device  allow_multi bpf_prog1
+
+|
+| **# bpftool cgroup detach /sys/fs/cgroup/test.slice/ device id 1**
+| **# bpftool cgroup list /sys/fs/cgroup/test.slice/**
+
+::
+
+ID   AttachType  AttachFlags Name
+
+SEE ALSO
+
+   **bpftool**\ (8), **bpftool-prog**\ (8), **bpftool-map**\ (8)
diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
index 9f51a268eb06..421cabc417e6 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
@@ -128,4 +128,4 @@ EXAMPLES
 
 SEE ALSO
 
-   **bpftool**\ (8), **bpftool-prog**\ (8)
+   **bpftool**\ (8), **bpftool-prog**\ (8), **bpftool-cgroup**\ (8)
diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst 
b/tools/bpf/bpftool/Documentation/bpftool-prog.r

[PATCH v3 net-next 1/4] libbpf: add ability to guess program type based on section name

2017-12-08 Thread Roman Gushchin

The bpf_prog_load() function will guess the program type if it's not
specified explicitly. This functionality will be used to implement
loading of different programs without asking the user to specify
the program type. Its first user will be bpftool.

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Martin KaFai Lau <ka...@fb.com>
Cc: Quentin Monnet <quentin.mon...@netronome.com>
Cc: David Ahern <dsah...@gmail.com>
---
 tools/lib/bpf/libbpf.c | 51 ++
 1 file changed, 51 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 5aa45f89da93..205b7822fa0a 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1721,6 +1721,45 @@ BPF_PROG_TYPE_FNS(tracepoint, BPF_PROG_TYPE_TRACEPOINT);
 BPF_PROG_TYPE_FNS(xdp, BPF_PROG_TYPE_XDP);
 BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT);
 
+#define BPF_PROG_SEC(string, type) { string, sizeof(string), type }
+static const struct {
+   const char *sec;
+   size_t len;
+   enum bpf_prog_type prog_type;
+} section_names[] = {
+   BPF_PROG_SEC("socket",  BPF_PROG_TYPE_SOCKET_FILTER),
+   BPF_PROG_SEC("kprobe/", BPF_PROG_TYPE_KPROBE),
+   BPF_PROG_SEC("kretprobe/",  BPF_PROG_TYPE_KPROBE),
+   BPF_PROG_SEC("tracepoint/", BPF_PROG_TYPE_TRACEPOINT),
+   BPF_PROG_SEC("xdp", BPF_PROG_TYPE_XDP),
+   BPF_PROG_SEC("perf_event",  BPF_PROG_TYPE_PERF_EVENT),
+   BPF_PROG_SEC("cgroup/skb",  BPF_PROG_TYPE_CGROUP_SKB),
+   BPF_PROG_SEC("cgroup/sock", BPF_PROG_TYPE_CGROUP_SOCK),
+   BPF_PROG_SEC("cgroup/dev",  BPF_PROG_TYPE_CGROUP_DEVICE),
+   BPF_PROG_SEC("sockops", BPF_PROG_TYPE_SOCK_OPS),
+   BPF_PROG_SEC("sk_skb",  BPF_PROG_TYPE_SK_SKB),
+};
+#undef BPF_PROG_SEC
+
+static enum bpf_prog_type bpf_program__guess_type(struct bpf_program *prog)
+{
+   int i;
+
+   if (!prog->section_name)
+   goto err;
+
+   for (i = 0; i < ARRAY_SIZE(section_names); i++)
+   if (strncmp(prog->section_name, section_names[i].sec,
+   section_names[i].len) == 0)
+   return section_names[i].prog_type;
+
+err:
+   pr_warning("failed to guess program type based on section name %s\n",
+  prog->section_name);
+
+   return BPF_PROG_TYPE_UNSPEC;
+}
+
 int bpf_map__fd(struct bpf_map *map)
 {
return map ? map->fd : -EINVAL;
@@ -1832,6 +1871,18 @@ int bpf_prog_load(const char *file, enum bpf_prog_type 
type,
return -ENOENT;
}
 
+   /*
+* If type is not specified, try to guess it based on
+* section name.
+*/
+   if (type == BPF_PROG_TYPE_UNSPEC) {
+   type = bpf_program__guess_type(prog);
+   if (type == BPF_PROG_TYPE_UNSPEC) {
+   bpf_object__close(obj);
+   return -EINVAL;
+   }
+   }
+
bpf_program__set_type(prog, type);
err = bpf_object__load(obj);
if (err) {
-- 
2.14.3
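
The table lookup above can be sketched outside of libbpf as follows. This is
an illustration under stated assumptions, not the patch itself: the enum
values, the PROG_SEC macro and guess_type() are stand-ins invented for the
example. One deliberate difference: the patch stores sizeof(string), which
counts the terminating NUL, so strncmp() only reports a match when the whole
section name is identical to the table entry. The sketch stores
sizeof(str) - 1, assuming that a prefix match is what is wanted for entries
such as "kprobe/".

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel's enum bpf_prog_type; values are illustrative. */
enum prog_type {
	PROG_UNSPEC,
	PROG_SOCKET_FILTER,
	PROG_KPROBE,
	PROG_XDP,
	PROG_CGROUP_DEVICE,
};

/* sizeof(str) - 1 drops the terminating NUL, so strncmp() tests a prefix. */
#define PROG_SEC(str, type) { str, sizeof(str) - 1, type }

static const struct {
	const char *sec;
	size_t len;
	enum prog_type type;
} sections[] = {
	PROG_SEC("socket",     PROG_SOCKET_FILTER),
	PROG_SEC("kprobe/",    PROG_KPROBE),
	PROG_SEC("xdp",        PROG_XDP),
	PROG_SEC("cgroup/dev", PROG_CGROUP_DEVICE),
};

/* Hypothetical helper: map an ELF section name to a program type. */
enum prog_type guess_type(const char *section_name)
{
	size_t i;

	if (!section_name)
		return PROG_UNSPEC;

	for (i = 0; i < sizeof(sections) / sizeof(sections[0]); i++)
		if (strncmp(section_name, sections[i].sec,
			    sections[i].len) == 0)
			return sections[i].type;

	return PROG_UNSPEC;
}
```

With the prefix length, a section named "kprobe/sys_write" resolves to the
kprobe type, while an unknown name falls through to PROG_UNSPEC and the
caller can reject the load, as bpf_prog_load() does with -EINVAL.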

Re: [PATCH v2 net-next 4/4] bpftool: implement cgroup bpf operations

2017-12-08 Thread Roman Gushchin

On Thu, Dec 07, 2017 at 02:23:06PM -0800, Jakub Kicinski wrote:
> On Thu, 7 Dec 2017 18:39:09 +0000, Roman Gushchin wrote:
> > This patch adds basic cgroup bpf operations to bpftool:
> > cgroup list, attach and detach commands.
> > 
> > Usage is described in the corresponding man pages,
> > and examples are provided.
> > 
> > Syntax:
> > $ bpftool cgroup list CGROUP
> > $ bpftool cgroup attach CGROUP ATTACH_TYPE PROG [ATTACH_FLAGS]
> > $ bpftool cgroup detach CGROUP ATTACH_TYPE PROG
> > 
> > Signed-off-by: Roman Gushchin <g...@fb.com>
> 
> Looks good, a few very minor nits/questions below.
> 
> > diff --git a/tools/bpf/bpftool/cgroup.c b/tools/bpf/bpftool/cgroup.c
> > new file mode 100644
> > index 000000000000..88d67f74313f
> > --- /dev/null
> > +++ b/tools/bpf/bpftool/cgroup.c
> > @@ -0,0 +1,305 @@
> > +/*
> > + * Copyright (C) 2017 Facebook
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version
> > + * 2 of the License, or (at your option) any later version.
> > + *
> > + * Author: Roman Gushchin <g...@fb.com>
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include 
> > +
> > +#include "main.h"
> > +
> > +static const char * const attach_type_strings[] = {
> > +   [BPF_CGROUP_INET_INGRESS] = "ingress",
> > +   [BPF_CGROUP_INET_EGRESS] = "egress",
> > +   [BPF_CGROUP_INET_SOCK_CREATE] = "sock_create",
> > +   [BPF_CGROUP_SOCK_OPS] = "sock_ops",
> > +   [BPF_CGROUP_DEVICE] = "device",
> > +   [__MAX_BPF_ATTACH_TYPE] = NULL,
> > +};
> > +
> > +static enum bpf_attach_type parse_attach_type(const char *str)
> > +{
> > +   enum bpf_attach_type type;
> > +
> > +   for (type = 0; type < __MAX_BPF_ATTACH_TYPE; type++) {
> > +   if (attach_type_strings[type] &&
> > +   strcmp(str, attach_type_strings[type]) == 0)
> 
> Here and for matching flags you use straight strcmp(), not our locally
> defined is_prefix(), is this intentional?  is_prefix() allows
> abbreviations, like in iproute2 commands.  E.g. this would work:

Fixed in v3.

> 
> # bpftool cg att /sys/fs/cgroup/test.slice/ dev id 1 allow_multi
> 
> > +   return type;
> > +   }
> > +
> > +   return __MAX_BPF_ATTACH_TYPE;
> > +}
> 
> > +static int list_attached_bpf_progs(int cgroup_fd, enum bpf_attach_type 
> > type)
> > +{
> > +   __u32 attach_flags;
> > +   __u32 prog_ids[1024] = {0};
> > +   __u32 prog_cnt, iter;
> > +   char *attach_flags_str;
> > +   int ret;
> 
> nit: could you reorder the variables so they're listed longest to
>  shortest (reverse christmas tree)?
> 
> > +   prog_cnt = ARRAY_SIZE(prog_ids);
> > +   ret = bpf_prog_query(cgroup_fd, type, 0, &attach_flags, prog_ids,
> > +                        &prog_cnt);
> > +   if (ret)
> > +   return ret;
> > +
> > +   if (prog_cnt == 0)
> > +   return 0;
> > +
> > +   switch (attach_flags) {
> > +   case BPF_F_ALLOW_MULTI:
> > +   attach_flags_str = "allow_multi";
> > +   break;
> > +   case BPF_F_ALLOW_OVERRIDE:
> > +   attach_flags_str = "allow_override";
> > +   break;
> > +   case 0:
> > +   attach_flags_str = "";
> > +   break;
> > +   default:
> > +   attach_flags_str = "unknown";
> 
> nit: would it make sense to perhaps print flags in hex format in this
>  case?
> 
> > +   }
> > +
> > +   for (iter = 0; iter < prog_cnt; iter++)
> > +   list_bpf_prog(prog_ids[iter], attach_type_strings[type],
> > + attach_flags_str);
> > +
> > +   return 0;
> > +}
> 
> > +static int do_attach(int argc, char **argv)
> > +{
> > +   int cgroup_fd, prog_fd;
> > +   enum bpf_attach_type attach_type;
> > +   int attach_flags = 0;
> > +   int i;
> > +   int ret = -1;
> > +
> > +   if (argc < 4) {
> > +   p_err("too few parameters for cgroup attach\n");
> > +   goto exit;
> > +   }
> > +
> > +   cgroup_fd = open(argv[0], O_RDONLY);
> > +   if (cgroup_fd < 0) {
> > +
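
Two of the review comments above (abbreviated keywords via is_prefix(), and
printing unknown flags in hex) can be sketched as follows. This is only an
illustration: the enum, the flag macros and all three functions are local
stand-ins written for this example. The is_prefix() here is not bpftool's
own helper, whose body is not shown in this thread, and the flag values are
assumed to mirror BPF_F_ALLOW_OVERRIDE and BPF_F_ALLOW_MULTI.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Illustrative stand-ins for the kernel's enum bpf_attach_type. */
enum attach_type {
	ATTACH_INGRESS,
	ATTACH_EGRESS,
	ATTACH_SOCK_CREATE,
	ATTACH_SOCK_OPS,
	ATTACH_DEVICE,
	ATTACH_MAX,
};

/* Assumed to mirror BPF_F_ALLOW_OVERRIDE and BPF_F_ALLOW_MULTI. */
#define FLAG_ALLOW_OVERRIDE (1U << 0)
#define FLAG_ALLOW_MULTI    (1U << 1)

static const char * const attach_type_names[ATTACH_MAX] = {
	[ATTACH_INGRESS]     = "ingress",
	[ATTACH_EGRESS]      = "egress",
	[ATTACH_SOCK_CREATE] = "sock_create",
	[ATTACH_SOCK_OPS]    = "sock_ops",
	[ATTACH_DEVICE]      = "device",
};

/* True when pfx is a non-empty leading substring of str. */
bool is_prefix(const char *pfx, const char *str)
{
	size_t len;

	if (!pfx || !str)
		return false;
	len = strlen(pfx);
	return len > 0 && strncmp(str, pfx, len) == 0;
}

/* Accept abbreviations such as "dev"; the first table match wins. */
enum attach_type parse_attach_type(const char *str)
{
	int type;

	for (type = 0; type < ATTACH_MAX; type++)
		if (is_prefix(str, attach_type_names[type]))
			return (enum attach_type)type;

	return ATTACH_MAX;
}

/* Known flags get a name; anything else is printed in hex into buf. */
const char *attach_flags_str(unsigned int flags, char *buf, size_t size)
{
	switch (flags) {
	case FLAG_ALLOW_MULTI:
		return "allow_multi";
	case FLAG_ALLOW_OVERRIDE:
		return "allow_override";
	case 0:
		return "";
	default:
		snprintf(buf, size, "0x%x", flags);
		return buf;
	}
}
```

Because the first match wins, an ambiguous abbreviation such as "sock"
resolves to whichever entry comes first in the table; a stricter parser
could reject ambiguous input instead.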

Re: [PATCH v2 net-next 4/4] bpftool: implement cgroup bpf operations

2017-12-08 Thread Roman Gushchin

On Fri, Dec 08, 2017 at 10:34:16AM +0000, Quentin Monnet wrote:
> 2017-12-07 18:39 UTC+0000 ~ Roman Gushchin <g...@fb.com>
> > This patch adds basic cgroup bpf operations to bpftool:
> > cgroup list, attach and detach commands.
> > 
> > Usage is described in the corresponding man pages,
> > and examples are provided.
[...]
> > +MAP COMMANDS
> > +=
> > +
> > +|  **bpftool** **cgroup list** *CGROUP*
> > +|  **bpftool** **cgroup attach** *CGROUP* *ATTACH_TYPE* *PROG* 
> > [*ATTACH_FLAGS*]
> > +|  **bpftool** **cgroup detach** *CGROUP* *ATTACH_TYPE* *PROG*
> > +|  **bpftool** **cgroup help**
> > +|
> > +|  *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* }
> 
> Could you please give the different possible values for ATTACH_TYPE and
> ATTACH_FLAGS, and provide some documentation for the flags?

I intentionally didn't include the list of possible values, as it depends
on the exact kernel version, and the other bpftool docs carefully avoid
specifying such things.

It would be nice to have a way to ask the kernel which bpf program types,
attach types, etc. it provides; but I'm not sure that hardcoding them in
the bpftool docs is a good idea.

> 
> > +
> > +DESCRIPTION
> > +===
> > +   **bpftool cgroup list** *CGROUP*
> > + List all programs attached to the cgroup *CGROUP*.
> > +
> > + Output will start with program ID followed by attach type,
> > + attach flags and program name.
> > +
> > +   **bpftool cgroup attach** *CGROUP* *ATTACH_TYPE* *PROG* [*ATTACH_FLAGS*]
> > + Attach program *PROG* to the cgroup *CGROUP* with attach type
> > + *ATTACH_TYPE* and optional *ATTACH_FLAGS*.
[...]
> > +
> > +   attach_type = parse_attach_type(argv[1]);
> > +   if (attach_type == __MAX_BPF_ATTACH_TYPE) {
> > +   p_err("invalid attach type");
> > +   goto exit_cgroup;
> > +   }
> > +
> > +   argc -= 2;
> > +   argv = &argv[2];
> > +   prog_fd = prog_parse_fd(&argc, &argv);
> > +   if (prog_fd < 0)
> > +   goto exit_cgroup;
> > +
> > +   if (bpf_prog_detach2(prog_fd, cgroup_fd, attach_type)) {
> > +   p_err("failed to attach program");
> 
> Failed to *detach* instead of “attach”.

Fixed.

> 
> > +   goto exit_prog;
> > +   }
> > +
> > +   if (json_output)
> > +   jsonw_null(json_wtr);
> > +
> > +   ret = 0;
> > +
> > +exit_prog:
> > +   close(prog_fd);
> > +exit_cgroup:
> > +   close(cgroup_fd);
> > +exit:
> > +   return ret;
> > +}
> 
> […]
> 
> Very nice work on this v2, thanks a lot!
> Quentin

Thank you for reviewing!

[PATCH v2 net-next 4/4] bpftool: implement cgroup bpf operations

2017-12-07 Thread Roman Gushchin

This patch adds basic cgroup bpf operations to bpftool:
cgroup list, attach and detach commands.

Usage is described in the corresponding man pages,
and examples are provided.

Syntax:
$ bpftool cgroup list CGROUP
$ bpftool cgroup attach CGROUP ATTACH_TYPE PROG [ATTACH_FLAGS]
$ bpftool cgroup detach CGROUP ATTACH_TYPE PROG

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Martin KaFai Lau <ka...@fb.com>
Cc: Quentin Monnet <quentin.mon...@netronome.com>
Cc: David Ahern <dsah...@gmail.com>
---
 tools/bpf/bpftool/Documentation/bpftool-cgroup.rst |  92 +++
 tools/bpf/bpftool/Documentation/bpftool-map.rst|   2 +-
 tools/bpf/bpftool/Documentation/bpftool-prog.rst   |   2 +-
 tools/bpf/bpftool/Documentation/bpftool.rst|   6 +-
 tools/bpf/bpftool/cgroup.c | 305 +
 tools/bpf/bpftool/main.c   |   3 +-
 tools/bpf/bpftool/main.h   |   1 +
 7 files changed, 406 insertions(+), 5 deletions(-)
 create mode 100644 tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
 create mode 100644 tools/bpf/bpftool/cgroup.c

diff --git a/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst b/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
new file mode 100644
index 000000000000..61ded613aee1
--- /dev/null
+++ b/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
@@ -0,0 +1,92 @@
+
+bpftool-cgroup
+
+---
+tool for inspection and simple manipulation of eBPF progs
+---
+
+:Manual section: 8
+
+SYNOPSIS
+
+
+   **bpftool** [*OPTIONS*] **cgroup** *COMMAND*
+
+   *OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { 
**-f** | **--bpffs** } }
+
+   *COMMANDS* :=
+   { **list** | **attach** | **detach** | **help** }
+
+MAP COMMANDS
+=
+
+|  **bpftool** **cgroup list** *CGROUP*
+|  **bpftool** **cgroup attach** *CGROUP* *ATTACH_TYPE* *PROG* 
[*ATTACH_FLAGS*]
+|  **bpftool** **cgroup detach** *CGROUP* *ATTACH_TYPE* *PROG*
+|  **bpftool** **cgroup help**
+|
+|  *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* }
+
+DESCRIPTION
+===
+   **bpftool cgroup list** *CGROUP*
+ List all programs attached to the cgroup *CGROUP*.
+
+ Output will start with program ID followed by attach type,
+ attach flags and program name.
+
+   **bpftool cgroup attach** *CGROUP* *ATTACH_TYPE* *PROG* [*ATTACH_FLAGS*]
+ Attach program *PROG* to the cgroup *CGROUP* with attach type
+ *ATTACH_TYPE* and optional *ATTACH_FLAGS*.
+
+   **bpftool cgroup detach** *CGROUP* *ATTACH_TYPE* *PROG*
+ Detach *PROG* from the cgroup *CGROUP* and attach type
+ *ATTACH_TYPE*.
+
+   **bpftool prog help**
+ Print short help message.
+
+OPTIONS
+===
+   -h, --help
+ Print short generic help message (similar to **bpftool 
help**).
+
+   -v, --version
+ Print version number (similar to **bpftool version**).
+
+   -j, --json
+ Generate JSON output. For commands that cannot produce JSON, 
this
+ option has no effect.
+
+   -p, --pretty
+ Generate human-readable JSON output. Implies **-j**.
+
+   -f, --bpffs
+ Show file names of pinned programs.
+
+EXAMPLES
+
+|
+| **# mount -t bpf none /sys/fs/bpf/**
+| **# mkdir /sys/fs/cgroup/test.slice**
+| **# bpftool prog load ./device_cgroup.o /sys/fs/bpf/prog**
+| **# bpftool cgroup attach /sys/fs/cgroup/test.slice/ device id 1 
allow_multi**
+
+**# bpftool cgroup list /sys/fs/cgroup/test.slice/**
+
+::
+
+ID   AttachType  AttachFlags Name
+1device  allow_multi bpf_prog1
+
+|
+| **# bpftool cgroup detach /sys/fs/cgroup/test.slice/ device id 1**
+| **# bpftool cgroup list /sys/fs/cgroup/test.slice/**
+
+::
+
+ID   AttachType  AttachFlags Name
+
+SEE ALSO
+
+   **bpftool**\ (8), **bpftool-prog**\ (8), **bpftool-map**\ (8)
diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
index 9f51a268eb06..421cabc417e6 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
@@ -128,4 +128,4 @@ EXAMPLES
 
 SEE ALSO
 
-   **bpftool**\ (8), **bpftool-prog**\ (8)
+   **bpftool**\ (8), **bpftool-prog**\ (8), **bpftool-cgroup**\ (8)
diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index 827

[PATCH v2 net-next 3/4] bpftool: implement prog load command

2017-12-07 Thread Roman Gushchin

Add the prog load command to load a bpf program from a specified
binary file and pin it to bpffs.

Usage description and examples are given in the corresponding man
page.

Syntax:
$ bpftool prog load SOURCE_FILE FILE

FILE is a non-existing file on bpffs.

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Martin KaFai Lau <ka...@fb.com>
Cc: Quentin Monnet <quentin.mon...@netronome.com>
Cc: David Ahern <dsah...@gmail.com>
---
 tools/bpf/bpftool/Documentation/bpftool-prog.rst | 10 +++-
 tools/bpf/bpftool/Documentation/bpftool.rst  |  2 +-
 tools/bpf/bpftool/common.c   | 71 +---
 tools/bpf/bpftool/main.h |  1 +
 tools/bpf/bpftool/prog.c | 31 ++-
 5 files changed, 81 insertions(+), 34 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index 36e8d1c3c40d..827b415f8ab6 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -15,7 +15,7 @@ SYNOPSIS
*OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { 
**-f** | **--bpffs** } }
 
*COMMANDS* :=
-   { **show** | **dump xlated** | **dump jited** | **pin** | **help** }
+   { **show** | **dump xlated** | **dump jited** | **pin** | **load** | 
**help** }
 
 MAP COMMANDS
 =
@@ -24,6 +24,7 @@ MAP COMMANDS
 |  **bpftool** **prog dump xlated** *PROG* [{**file** *FILE* | 
**opcodes**}]
 |  **bpftool** **prog dump jited**  *PROG* [{**file** *FILE* | 
**opcodes**}]
 |  **bpftool** **prog pin** *PROG* *FILE*
+|  **bpftool** **prog load** *SRC* *FILE*
 |  **bpftool** **prog help**
 |
 |  *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* }
@@ -57,6 +58,11 @@ DESCRIPTION
 
  Note: *FILE* must be located in *bpffs* mount.
 
+   **bpftool prog load** *SRC* *FILE*
+ Load bpf program from binary *SRC* and pin as *FILE*.
+
+ Note: *FILE* must be located in *bpffs* mount.
+
**bpftool prog help**
  Print short help message.
 
@@ -126,8 +132,10 @@ EXAMPLES
 |
 | **# mount -t bpf none /sys/fs/bpf/**
 | **# bpftool prog pin id 10 /sys/fs/bpf/prog**
+| **# bpftool prog load ./my_prog.o /sys/fs/bpf/prog2**
 | **# ls -l /sys/fs/bpf/**
 |   -rw--- 1 root root 0 Jul 22 01:43 prog
+|   -rw--- 1 root root 0 Dec 07 17:23 prog2
 
 **# bpftool prog dum jited pinned /sys/fs/bpf/prog opcodes**
 
diff --git a/tools/bpf/bpftool/Documentation/bpftool.rst b/tools/bpf/bpftool/Documentation/bpftool.rst
index 926c03d5a8da..f547a0c0aa34 100644
--- a/tools/bpf/bpftool/Documentation/bpftool.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool.rst
@@ -26,7 +26,7 @@ SYNOPSIS
| **pin** | **help** }
 
*PROG-COMMANDS* := { **show** | **dump jited** | **dump xlated** | 
**pin**
-   | **help** }
+   | **load** | **help** }
 
 DESCRIPTION
 ===
diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 2bd3b280e6dd..b62c94e3997a 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -163,13 +163,49 @@ int open_obj_pinned_any(char *path, enum bpf_obj_type 
exp_type)
return fd;
 }
 
-int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32))
+int do_pin_fd(int fd, const char *name)
 {
char err_str[ERR_MAX_LEN];
-   unsigned int id;
-   char *endptr;
char *file;
char *dir;
+   int err = 0;
+
+   err = bpf_obj_pin(fd, name);
+   if (!err)
+   goto out;
+
+   file = malloc(strlen(name) + 1);
+   strcpy(file, name);
+   dir = dirname(file);
+
+   if (errno != EPERM || is_bpffs(dir)) {
+   p_err("can't pin the object (%s): %s", name, strerror(errno));
+   goto out_free;
+   }
+
+   /* Attempt to mount bpffs, then retry pinning. */
+   err = mnt_bpffs(dir, err_str, ERR_MAX_LEN);
+   if (!err) {
+   err = bpf_obj_pin(fd, name);
+   if (err)
+   p_err("can't pin the object (%s): %s", name,
+ strerror(errno));
+   } else {
+   err_str[ERR_MAX_LEN - 1] = '\0';
+   p_err("can't mount BPF file system to pin the object (%s): %s",
+ name, err_str);
+   }
+
+out_free:
+   free(file);
+out:
+   return err;
+}
+
+int do_pin_any(int argc, char **argv, int (*get_fd_by_id)(__u32))
+{
+   unsigned int id;
+   char *endptr;
int err;
int fd;
 
@@ -195,35 +231,8 @@ int do_pin_any(int argc, char **argv, int 
(*get_fd_by_id)(__u32))
return -1;
}
 
-   err =
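
do_pin_fd() above copies the path before calling dirname(), since dirname()
is allowed to modify its argument, but it does not check the malloc()
result. A minimal sketch of the same copy-then-dirname step with that check
added is shown below; parent_dir() is a hypothetical helper written for this
example and is not part of bpftool.

```c
#include <libgen.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a heap-allocated copy of the directory part
 * of `path`, or NULL if an allocation fails. The caller frees the result.
 */
char *parent_dir(const char *path)
{
	size_t len = strlen(path) + 1;
	char *copy, *dir, *result;

	copy = malloc(len);
	if (!copy)
		return NULL;
	memcpy(copy, path, len);

	/* dirname() may modify `copy` and may return a pointer into it. */
	dir = dirname(copy);

	/* Duplicate the result so it stays valid after `copy` is freed. */
	len = strlen(dir) + 1;
	result = malloc(len);
	if (result)
		memcpy(result, dir, len);

	free(copy);
	return result;
}
```

For a pin path such as /sys/fs/bpf/prog this yields /sys/fs/bpf, which is
the directory the patch passes to is_bpffs() and mnt_bpffs().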

[PATCH v2 net-next 2/4] libbpf: prefer global symbols as bpf program name source

2017-12-07 Thread Roman Gushchin

Libbpf picks the name of the first symbol in the corresponding
ELF section to use as a program name. But because it does not take the
symbol's scope into account, it may end up with some local label
as a program name. E.g.:

$ bpftool prog
1: type 15  name LBB0_10  tag 0390a5136ba23f5c
loaded_at Dec 07/17:22  uid 0
xlated 456B  not jited  memlock 4096B

Fix this by preferring global symbols as program name.

For instance:
$ bpftool prog
1: type 15  name bpf_prog1  tag 0390a5136ba23f5c
loaded_at Dec 07/17:26  uid 0
xlated 456B  not jited  memlock 4096B

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Martin KaFai Lau <ka...@fb.com>
Cc: Quentin Monnet <quentin.mon...@netronome.com>
Cc: David Ahern <dsah...@gmail.com>
---
 tools/lib/bpf/libbpf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 205b7822fa0a..65d0d0aff4fa 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -387,6 +387,8 @@ bpf_object__init_prog_names(struct bpf_object *obj)
continue;
if (sym.st_shndx != prog->idx)
continue;
+   if (GELF_ST_BIND(sym.st_info) != STB_GLOBAL)
+   continue;
 
name = elf_strptr(obj->efile.elf,
  obj->efile.strtabidx,
-- 
2.14.3

[PATCH v2 net-next 1/4] libbpf: add ability to guess program type based on section name

2017-12-07 Thread Roman Gushchin

The bpf_prog_load() function will guess the program type if it's not
specified explicitly. This functionality will be used to implement
loading of different programs without asking the user to specify
the program type. Its first user will be bpftool.

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Martin KaFai Lau <ka...@fb.com>
Cc: Quentin Monnet <quentin.mon...@netronome.com>
Cc: David Ahern <dsah...@gmail.com>
---
 tools/lib/bpf/libbpf.c | 51 ++
 1 file changed, 51 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 5aa45f89da93..205b7822fa0a 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1721,6 +1721,45 @@ BPF_PROG_TYPE_FNS(tracepoint, BPF_PROG_TYPE_TRACEPOINT);
 BPF_PROG_TYPE_FNS(xdp, BPF_PROG_TYPE_XDP);
 BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT);
 
+#define BPF_PROG_SEC(string, type) { string, sizeof(string), type }
+static const struct {
+   const char *sec;
+   size_t len;
+   enum bpf_prog_type prog_type;
+} section_names[] = {
+   BPF_PROG_SEC("socket",  BPF_PROG_TYPE_SOCKET_FILTER),
+   BPF_PROG_SEC("kprobe/", BPF_PROG_TYPE_KPROBE),
+   BPF_PROG_SEC("kretprobe/",  BPF_PROG_TYPE_KPROBE),
+   BPF_PROG_SEC("tracepoint/", BPF_PROG_TYPE_TRACEPOINT),
+   BPF_PROG_SEC("xdp", BPF_PROG_TYPE_XDP),
+   BPF_PROG_SEC("perf_event",  BPF_PROG_TYPE_PERF_EVENT),
+   BPF_PROG_SEC("cgroup/skb",  BPF_PROG_TYPE_CGROUP_SKB),
+   BPF_PROG_SEC("cgroup/sock", BPF_PROG_TYPE_CGROUP_SOCK),
+   BPF_PROG_SEC("cgroup/dev",  BPF_PROG_TYPE_CGROUP_DEVICE),
+   BPF_PROG_SEC("sockops", BPF_PROG_TYPE_SOCK_OPS),
+   BPF_PROG_SEC("sk_skb",  BPF_PROG_TYPE_SK_SKB),
+};
+#undef BPF_PROG_SEC
+
+static enum bpf_prog_type bpf_program__guess_type(struct bpf_program *prog)
+{
+   int i;
+
+   if (!prog->section_name)
+   goto err;
+
+   for (i = 0; i < ARRAY_SIZE(section_names); i++)
+   if (strncmp(prog->section_name, section_names[i].sec,
+   section_names[i].len) == 0)
+   return section_names[i].prog_type;
+
+err:
+   pr_warning("failed to guess program type based on section name %s\n",
+  prog->section_name);
+
+   return BPF_PROG_TYPE_UNSPEC;
+}
+
 int bpf_map__fd(struct bpf_map *map)
 {
return map ? map->fd : -EINVAL;
@@ -1832,6 +1871,18 @@ int bpf_prog_load(const char *file, enum bpf_prog_type 
type,
return -ENOENT;
}
 
+   /*
+* If type is not specified, try to guess it based on
+* section name.
+*/
+   if (type == BPF_PROG_TYPE_UNSPEC) {
+   type = bpf_program__guess_type(prog);
+   if (type == BPF_PROG_TYPE_UNSPEC) {
+   bpf_object__close(obj);
+   return -EINVAL;
+   }
+   }
+
bpf_program__set_type(prog, type);
err = bpf_object__load(obj);
if (err) {
-- 
2.14.3

[PATCH v2 net-next 0/4] bpftool: cgroup bpf operations

2017-12-07 Thread Roman Gushchin

This patchset adds basic cgroup bpf operations to bpftool.

Right now there is no convenient way to perform these operations.
The /samples/bpf/load_sock_ops.c implements attach/detach operations,
but only for BPF_CGROUP_SOCK_OPS programs. Bps (part of bcc) implements
bpf introspection, but lacks any cgroup-specific functionality.

I find having a tool to perform these basic operations in the kernel tree
very useful, as it can be used in the corresponding bpf documentation
without creating additional dependencies. And bpftool seems to be
the right tool to extend with such functionality.

v2:
  - Added prog load operations
  - All cgroup operations are looking like bpftool cgroup 
  - All cgroup-related stuff is moved to a separate file
  - Added support for attach flags
  - Added support for attaching/detaching programs by id, pinned name, etc
  - Changed cgroup detach arguments order
  - Added empty json output for successful programs
  - Style fixes: includes order, strncmp and macros, error handling
  - Added man pages

v1:
  https://lwn.net/Articles/740366/

Roman Gushchin (4):
  libbpf: add ability to guess program type based on section name
  libbpf: prefer global symbols as bpf program name source
  bpftool: implement prog load command
  bpftool: implement cgroup bpf operations

 tools/bpf/bpftool/Documentation/bpftool-cgroup.rst |  92 +++
 tools/bpf/bpftool/Documentation/bpftool-map.rst|   2 +-
 tools/bpf/bpftool/Documentation/bpftool-prog.rst   |  12 +-
 tools/bpf/bpftool/Documentation/bpftool.rst|   8 +-
 tools/bpf/bpftool/cgroup.c | 305 +
 tools/bpf/bpftool/common.c |  71 ++---
 tools/bpf/bpftool/main.c   |   3 +-
 tools/bpf/bpftool/main.h   |   2 +
 tools/bpf/bpftool/prog.c   |  31 ++-
 tools/lib/bpf/libbpf.c |  53 
 10 files changed, 540 insertions(+), 39 deletions(-)
 create mode 100644 tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
 create mode 100644 tools/bpf/bpftool/cgroup.c

-- 
2.14.3

Re: [PATCH net-next 1/5] libbpf: add ability to guess program type based on section name

2017-12-04 Thread Roman Gushchin

On Mon, Dec 04, 2017 at 01:12:33PM +, Quentin Monnet wrote:
> 2017-12-04 12:34 UTC+ ~ Roman Gushchin <g...@fb.com>
> > On Fri, Dec 01, 2017 at 02:46:06PM -0800, Jakub Kicinski wrote:
> >> On Fri, 1 Dec 2017 10:22:57 +, Quentin Monnet wrote:
> >>> Thanks Roman!
> >>> One comment in-line.
> >>>
> >>> 2017-11-30 13:42 UTC+ ~ Roman Gushchin <g...@fb.com>
> >>>> The bpf_prog_load() function will guess program type if it's not
> >>>> specified explicitly. This functionality will be used to implement
> >>>> loading of different programs without asking a user to specify
> >>>> the program type. In first order it will be used by bpftool.
> >>>>
> >>>> Signed-off-by: Roman Gushchin <g...@fb.com>
> >>>> Cc: Alexei Starovoitov <a...@kernel.org>
> >>>> Cc: Daniel Borkmann <dan...@iogearbox.net>
> >>>> Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
> >>>> ---
> >>>>  tools/lib/bpf/libbpf.c | 47 
> >>>> +++
> >>>>  1 file changed, 47 insertions(+)
> >>>>
> >>>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> >>>> index 5aa45f89da93..9f2410beaa18 100644
> >>>> --- a/tools/lib/bpf/libbpf.c
> >>>> +++ b/tools/lib/bpf/libbpf.c
> >>>> @@ -1721,6 +1721,41 @@ BPF_PROG_TYPE_FNS(tracepoint, 
> >>>> BPF_PROG_TYPE_TRACEPOINT);
> >>>>  BPF_PROG_TYPE_FNS(xdp, BPF_PROG_TYPE_XDP);
> >>>>  BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT);
> >>>>  
> >>>> +static enum bpf_prog_type bpf_program__guess_type(struct bpf_program 
> >>>> *prog)
> >>>> +{
> >>>> +if (!prog->section_name)
> >>>> +goto err;
> >>>> +
> >>>> +if (strncmp(prog->section_name, "socket", 6) == 0)
> >>>> +return BPF_PROG_TYPE_SOCKET_FILTER;
> >>>> +if (strncmp(prog->section_name, "kprobe/", 7) == 0)
> >>>> +return BPF_PROG_TYPE_KPROBE;
> >>>> +if (strncmp(prog->section_name, "kretprobe/", 10) == 0)
> >>>> +return BPF_PROG_TYPE_KPROBE;
> >>>> +if (strncmp(prog->section_name, "tracepoint/", 11) == 0)
> >>>> +return BPF_PROG_TYPE_TRACEPOINT;
> >>>> +if (strncmp(prog->section_name, "xdp", 3) == 0)
> >>>> +return BPF_PROG_TYPE_XDP;
> >>>> +if (strncmp(prog->section_name, "perf_event", 10) == 0)
> >>>> +return BPF_PROG_TYPE_PERF_EVENT;
> >>>> +if (strncmp(prog->section_name, "cgroup/skb", 10) == 0)
> >>>> +return BPF_PROG_TYPE_CGROUP_SKB;
> >>>> +if (strncmp(prog->section_name, "cgroup/sock", 11) == 0)
> >>>> +return BPF_PROG_TYPE_CGROUP_SOCK;
> >>>> +if (strncmp(prog->section_name, "cgroup/dev", 10) == 0)
> >>>> +return BPF_PROG_TYPE_CGROUP_DEVICE;
> >>>> +if (strncmp(prog->section_name, "sockops", 7) == 0)
> >>>> +return BPF_PROG_TYPE_SOCK_OPS;
> >>>> +if (strncmp(prog->section_name, "sk_skb", 6) == 0)
> >>>> +return BPF_PROG_TYPE_SK_SKB;  
> >>>
> >>> I do not really like these hard-coded lengths, maybe we could work out
> >>> something nicer with a bit of pre-processing work? Perhaps something like:
> >>>
> >>> #define SOCKET_FILTER_SEC_PREFIX "socket"
> >>> #define KPROBE_SEC_PREFIX "kprobe/"
> >>> […]
> >>>
> >>> #define TRY_TYPE(string, __TYPE)  \
> >>>   do {\
> >>>   if (!strncmp(string, __TYPE ## _SEC_PREFIX, \
> >>>sizeof(__TYPE ## _SEC_PREFIX)))\
> >>>   return BPF_PROG_TYPE_ ## __TYPE;\
> >>>   } while(0);
> >>
> >> I like the suggestion, but I think return and goto statements hiding
> >> inside macros are slightly frowned upon in the ne

Re: [PATCH net-next 1/5] libbpf: add ability to guess program type based on section name

2017-12-04 Thread Roman Gushchin

On Fri, Dec 01, 2017 at 02:46:06PM -0800, Jakub Kicinski wrote:
> On Fri, 1 Dec 2017 10:22:57 +, Quentin Monnet wrote:
> > Thanks Roman!
> > One comment in-line.
> > 
> > 2017-11-30 13:42 UTC+ ~ Roman Gushchin <g...@fb.com>
> > > The bpf_prog_load() function will guess program type if it's not
> > > specified explicitly. This functionality will be used to implement
> > > loading of different programs without asking a user to specify
> > > the program type. In first order it will be used by bpftool.
> > > 
> > > Signed-off-by: Roman Gushchin <g...@fb.com>
> > > Cc: Alexei Starovoitov <a...@kernel.org>
> > > Cc: Daniel Borkmann <dan...@iogearbox.net>
> > > Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
> > > ---
> > >  tools/lib/bpf/libbpf.c | 47 
> > > +++
> > >  1 file changed, 47 insertions(+)
> > > 
> > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > > index 5aa45f89da93..9f2410beaa18 100644
> > > --- a/tools/lib/bpf/libbpf.c
> > > +++ b/tools/lib/bpf/libbpf.c
> > > @@ -1721,6 +1721,41 @@ BPF_PROG_TYPE_FNS(tracepoint, 
> > > BPF_PROG_TYPE_TRACEPOINT);
> > >  BPF_PROG_TYPE_FNS(xdp, BPF_PROG_TYPE_XDP);
> > >  BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT);
> > >  
> > > +static enum bpf_prog_type bpf_program__guess_type(struct bpf_program 
> > > *prog)
> > > +{
> > > + if (!prog->section_name)
> > > + goto err;
> > > +
> > > + if (strncmp(prog->section_name, "socket", 6) == 0)
> > > + return BPF_PROG_TYPE_SOCKET_FILTER;
> > > + if (strncmp(prog->section_name, "kprobe/", 7) == 0)
> > > + return BPF_PROG_TYPE_KPROBE;
> > > + if (strncmp(prog->section_name, "kretprobe/", 10) == 0)
> > > + return BPF_PROG_TYPE_KPROBE;
> > > + if (strncmp(prog->section_name, "tracepoint/", 11) == 0)
> > > + return BPF_PROG_TYPE_TRACEPOINT;
> > > + if (strncmp(prog->section_name, "xdp", 3) == 0)
> > > + return BPF_PROG_TYPE_XDP;
> > > + if (strncmp(prog->section_name, "perf_event", 10) == 0)
> > > + return BPF_PROG_TYPE_PERF_EVENT;
> > > + if (strncmp(prog->section_name, "cgroup/skb", 10) == 0)
> > > + return BPF_PROG_TYPE_CGROUP_SKB;
> > > + if (strncmp(prog->section_name, "cgroup/sock", 11) == 0)
> > > + return BPF_PROG_TYPE_CGROUP_SOCK;
> > > + if (strncmp(prog->section_name, "cgroup/dev", 10) == 0)
> > > + return BPF_PROG_TYPE_CGROUP_DEVICE;
> > > + if (strncmp(prog->section_name, "sockops", 7) == 0)
> > > + return BPF_PROG_TYPE_SOCK_OPS;
> > > + if (strncmp(prog->section_name, "sk_skb", 6) == 0)
> > > + return BPF_PROG_TYPE_SK_SKB;  
> > 
> > I do not really like these hard-coded lengths, maybe we could work out
> > something nicer with a bit of pre-processing work? Perhaps something like:
> > 
> > #define SOCKET_FILTER_SEC_PREFIX "socket"
> > #define KPROBE_SEC_PREFIX "kprobe/"
> > […]
> > 
> > #define TRY_TYPE(string, __TYPE)\
> > do {\
> > if (!strncmp(string, __TYPE ## _SEC_PREFIX, \
> >  sizeof(__TYPE ## _SEC_PREFIX)))\
> > return BPF_PROG_TYPE_ ## __TYPE;\
> > } while(0);
> 
> I like the suggestion, but I think return and goto statements hiding
> inside macros are slightly frowned upon in the netdev.  Perhaps just 
> a macro that wraps the strncmp() with sizeof would be enough?  Without
> the return inside?

Hm, I'm not sure that using macros here makes the code much easier to read.
Maybe we can use just strcmp() instead?
As we compare with hardcoded strings, there is no real difference.
Something like this:

if (!strcmp("socket", prog->section_name))
return BPF_PROG_TYPE_SOCKET_FILTER;
if (!strcmp("kprobe/", prog->section_name))
return BPF_PROG_TYPE_KPROBE;
if (!strcmp("kretprobe/", prog->section_name))
return BPF_PROG_TYPE_KPROBE;
if (!strcmp("tracepoint/", prog->section_name))
return BPF_PROG_TYPE_TRACEPOINT;
if (!strcmp("xdp", prog->section_name))
return BPF_PROG_TYPE_XDP;
if (!strcmp("perf_event", prog->section_name))
return BPF_PROG_TYPE_PERF_EVENT;
if (!strcmp("cgroup/skb", prog->section_name))
return BPF_PROG_TYPE_CGROUP_SKB;
if (!strcmp("cgroup/sock", prog->section_name))
return BPF_PROG_TYPE_CGROUP_SOCK;
if (!strcmp("cgroup/dev", prog->section_name))
return BPF_PROG_TYPE_CGROUP_DEVICE;
if (!strcmp("sockops", prog->section_name))
return BPF_PROG_TYPE_SOCK_OPS;
if (!strcmp("sk_skb", prog->section_name))
return BPF_PROG_TYPE_SK_SKB;

Thanks!
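
One thing that may be worth checking before switching: strcmp() only
matches when the whole section name equals the string, while the original
strncmp() calls accept any name that starts with it. A minimal sketch of
the two behaviours follows; the helper names are made up for the example.

```c
#include <stdbool.h>
#include <string.h>

/* Prefix match, as in the original patch: only the first
 * strlen(prefix) bytes of the section name are compared. */
static bool sec_has_prefix(const char *sec, const char *prefix)
{
	return strncmp(sec, prefix, strlen(prefix)) == 0;
}

/* Exact match, as in the strcmp() variant: the whole name must
 * be identical. */
static bool sec_equals(const char *sec, const char *name)
{
	return strcmp(sec, name) == 0;
}
```

With a section named "kprobe/sys_open" (a hypothetical name), the prefix
check against "kprobe/" succeeds and the exact check fails, so the two
variants would only be interchangeable for sections whose names carry no
suffix.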

Re: [PATCH net-next 0/5] bpftool: cgroup bpf operations

2017-12-01 Thread Roman Gushchin

On Thu, Nov 30, 2017 at 07:04:54PM -0800, Jakub Kicinski wrote:
> Hi Roman!
> 
> On Thu, 30 Nov 2017 13:42:57 +0000, Roman Gushchin wrote:
> > This patchset adds basic cgroup bpf operations to bpftool.
> > 
> > Right now there is no convenient way to perform these operations.
> > The /samples/bpf/load_sock_ops.c implements attach/detacg operations,
> > but only for BPF_CGROUP_SOCK_OPS programs. Bps (part of bcc) implements
> > bpf introspection, but lacks any cgroup-related specific.
> > 
> > I find having a tool to perform these basic operations in the kernel tree
> > very useful, as it can be used in the corresponding bpf documentation
> > without creating additional dependencies. And bpftool seems to be
> > a right tool to extend with such functionality.
> 
> Could you place your code in a new file and add a new "object level"?
> I.e. 
> bpftool cgroup list 
> bpftool cgroup attach ...
> bpftool cgroup help
> etc?  Note that you probably want the list to be first, so if someone
> types "bpftool cg" it runs list by default.
> 
> Does it make sense to support pinned files and specifying programs by
> id?  I used the "id"/"pinned" keywords so that users can choose to use
> either.  Perhaps you should at least prefix the file to with "file"?
> So:
> $ bpftool cgattach file ./mybpfprog.o /sys/fs/cgroup/user.slice/ ingress
> $ bpftool cgattach id 19 /sys/fs/cgroup/user.slice/ ingress
> $ bpftool cgattach pin /bpf/prog /sys/fs/cgroup/user.slice/ ingress
> Would this make sense?
> 
> Smaller nits on the coding style:
>  - please try to run checkpatch, perhaps you did, but some people
>forget tools are in the kernel tree :)
>  - please keep includes in alphabetical order;
>  - please keep variable declarations in functions ordered longest to
>shortest, if that's impossible because of dependency between
>initializers - move the initializers to the code.
> 
> Please also don't forget to update/create new man page.

Ok, I'll try to address these comments in v2.

Thank you!
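
A rough sketch of how the "file"/"id"/"pinned" keywords suggested above
could be parsed. This is only an illustration under assumptions:
struct prog_spec, enum prog_src and parse_prog_spec() are hypothetical
names and are not part of bpftool.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical description of where a program comes from. */
enum prog_src { PROG_SRC_FILE, PROG_SRC_ID, PROG_SRC_PINNED };

struct prog_spec {
	enum prog_src src;
	const char *path;	/* valid for PROG_SRC_FILE and PROG_SRC_PINNED */
	unsigned int id;	/* valid for PROG_SRC_ID */
};

/* Parse "file PATH", "id N" or "pinned PATH" from argv.
 * Returns the number of arguments consumed, or -1 on error. */
static int parse_prog_spec(int argc, char **argv, struct prog_spec *spec)
{
	unsigned long id;
	char *end;

	if (argc < 2)
		return -1;

	if (strcmp(argv[0], "file") == 0) {
		spec->src = PROG_SRC_FILE;
		spec->path = argv[1];
	} else if (strcmp(argv[0], "pinned") == 0) {
		spec->src = PROG_SRC_PINNED;
		spec->path = argv[1];
	} else if (strcmp(argv[0], "id") == 0) {
		/* strtoul() allows rejecting trailing garbage. */
		errno = 0;
		id = strtoul(argv[1], &end, 0);
		if (errno || end == argv[1] || *end != '\0' ||
		    id > 0xffffffffUL)
			return -1;
		spec->src = PROG_SRC_ID;
		spec->id = (unsigned int)id;
	} else {
		return -1;
	}

	return 2;
}
```

Returning the number of consumed arguments lets the caller continue with
the cgroup path and attach type that follow the program specification.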

Re: [PATCH net-next 3/5] bpftool: implement cgattach command

2017-11-30 Thread Roman Gushchin

On Thu, Nov 30, 2017 at 09:17:17AM -0700, David Ahern wrote:
> On 11/30/17 6:43 AM, Roman Gushchin wrote:
> > @@ -75,12 +80,13 @@ static int do_help(int argc, char **argv)
> > fprintf(stderr,
> > "Usage: %s [OPTIONS] OBJECT { COMMAND | help }\n"
> > "   %s batch file FILE\n"
> > +   "   %s cgattach FILE CGROUP TYPE\n"
> 
> Can you change the order to:
> + "   %s cgattach CGROUP TYPE FILE\n"
> 
> Makes for better consistency with the detach command in the next patch:
> + "   %s cgdetach CGROUP TYPE ID\n"
> 
> 

Good point.

I'll fix this and will add support for attach_flags in v2.

Thanks!

[PATCH net-next 0/5] bpftool: cgroup bpf operations

2017-11-30 Thread Roman Gushchin

This patchset adds basic cgroup bpf operations to bpftool.

Right now there is no convenient way to perform these operations.
The /samples/bpf/load_sock_ops.c sample implements attach/detach operations,
but only for BPF_CGROUP_SOCK_OPS programs. Bps (part of bcc) implements
bpf introspection, but lacks any cgroup-specific functionality.

I find having a tool to perform these basic operations in the kernel tree
very useful, as it can be used in the corresponding bpf documentation
without creating additional dependencies. And bpftool seems to be
the right tool to extend with such functionality.

Roman Gushchin (5):
  libbpf: add ability to guess program type based on section name
  libbpf: prefer global symbols as bpf program name source
  bpftool: implement cgattach command
  bpftool: implement cgdetach command
  bpftool: implement cglist command

 tools/bpf/bpftool/main.c | 209 ++-
 tools/lib/bpf/libbpf.c   |  49 +++
 2 files changed, 257 insertions(+), 1 deletion(-)

-- 
2.14.3

[PATCH net-next 1/5] libbpf: add ability to guess program type based on section name

2017-11-30 Thread Roman Gushchin

The bpf_prog_load() function will guess the program type if it's not
specified explicitly. This functionality will be used to implement
loading of different programs without asking the user to specify
the program type. Its first user will be bpftool.

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
---
 tools/lib/bpf/libbpf.c | 47 +++
 1 file changed, 47 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 5aa45f89da93..9f2410beaa18 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1721,6 +1721,41 @@ BPF_PROG_TYPE_FNS(tracepoint, BPF_PROG_TYPE_TRACEPOINT);
 BPF_PROG_TYPE_FNS(xdp, BPF_PROG_TYPE_XDP);
 BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT);
 
+static enum bpf_prog_type bpf_program__guess_type(struct bpf_program *prog)
+{
+   if (!prog->section_name)
+   goto err;
+
+   if (strncmp(prog->section_name, "socket", 6) == 0)
+   return BPF_PROG_TYPE_SOCKET_FILTER;
+   if (strncmp(prog->section_name, "kprobe/", 7) == 0)
+   return BPF_PROG_TYPE_KPROBE;
+   if (strncmp(prog->section_name, "kretprobe/", 10) == 0)
+   return BPF_PROG_TYPE_KPROBE;
+   if (strncmp(prog->section_name, "tracepoint/", 11) == 0)
+   return BPF_PROG_TYPE_TRACEPOINT;
+   if (strncmp(prog->section_name, "xdp", 3) == 0)
+   return BPF_PROG_TYPE_XDP;
+   if (strncmp(prog->section_name, "perf_event", 10) == 0)
+   return BPF_PROG_TYPE_PERF_EVENT;
+   if (strncmp(prog->section_name, "cgroup/skb", 10) == 0)
+   return BPF_PROG_TYPE_CGROUP_SKB;
+   if (strncmp(prog->section_name, "cgroup/sock", 11) == 0)
+   return BPF_PROG_TYPE_CGROUP_SOCK;
+   if (strncmp(prog->section_name, "cgroup/dev", 10) == 0)
+   return BPF_PROG_TYPE_CGROUP_DEVICE;
+   if (strncmp(prog->section_name, "sockops", 7) == 0)
+   return BPF_PROG_TYPE_SOCK_OPS;
+   if (strncmp(prog->section_name, "sk_skb", 6) == 0)
+   return BPF_PROG_TYPE_SK_SKB;
+
+err:
+   pr_warning("failed to guess program type based on section name %s\n",
+  prog->section_name);
+
+   return BPF_PROG_TYPE_UNSPEC;
+}
+
 int bpf_map__fd(struct bpf_map *map)
 {
return map ? map->fd : -EINVAL;
@@ -1832,6 +1867,18 @@ int bpf_prog_load(const char *file, enum bpf_prog_type 
type,
return -ENOENT;
}
 
+   /*
+* If type is not specified, try to guess it based on
+* section name.
+*/
+   if (type == BPF_PROG_TYPE_UNSPEC) {
+   type = bpf_program__guess_type(prog);
+   if (type == BPF_PROG_TYPE_UNSPEC) {
+   bpf_object__close(obj);
+   return -EINVAL;
+   }
+   }
+
bpf_program__set_type(prog, type);
err = bpf_object__load(obj);
if (err) {
-- 
2.14.3

[PATCH net-next 5/5] bpftool: implement cglist command

2017-11-30 Thread Roman Gushchin

Implement the cglist command to list the bpf programs attached
to the given cgroup.

Example:
$ ./bpftool cgattach dev_cgroup.o /sys/fs/cgroup/user.slice/ device
$ ./bpftool cglist /sys/fs/cgroup/user.slice/
ID       AttachType      Name
1        device          bpf_prog1

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Martin KaFai Lau <ka...@fb.com>
---
 tools/bpf/bpftool/main.c | 82 +++-
 1 file changed, 81 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index 77fcc1a0bd5d..8a48f6a32adc 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -82,12 +82,13 @@ static int do_help(int argc, char **argv)
"   %s batch file FILE\n"
"   %s cgattach FILE CGROUP TYPE\n"
"   %s cgdetach CGROUP TYPE ID\n"
+   "   %s cglist CGROUP\n"
"   %s version\n"
"\n"
"   OBJECT := { prog | map }\n"
"   " HELP_SPEC_OPTIONS "\n"
"",
-   bin_name, bin_name, bin_name, bin_name, bin_name);
+   bin_name, bin_name, bin_name, bin_name, bin_name, bin_name);
 
return 0;
 }
@@ -168,6 +169,7 @@ void fprint_hex(FILE *f, void *arg, unsigned int n, const 
char *sep)
 static int do_batch(int argc, char **argv);
 static int do_cgattach(int argc, char **argv);
 static int do_cgdetach(int argc, char **argv);
+static int do_cglist(int argc, char **argv);
 
 static const struct cmd cmds[] = {
{ "help",   do_help },
@@ -176,6 +178,7 @@ static const struct cmd cmds[] = {
{ "map",do_map },
{ "cgattach",   do_cgattach },
{ "cgdetach",   do_cgdetach },
+   { "cglist", do_cglist },
{ "version",do_version },
{ 0 }
 };
@@ -386,6 +389,83 @@ static int do_cgdetach(int argc, char **argv)
return 0;
 }
 
+static int do_cglist(int argc, char **argv)
+{
+   enum bpf_attach_type type;
+   int prog_fd, cgroup_fd;
+   __u32 attach_flags;
+   __u32 prog_ids[1024] = {0};
+   __u32 prog_cnt, iter;
+   int ret = -1;
+   struct bpf_prog_info info = {};
+   __u32 info_len = sizeof(info);
+
+   if (argc < 1) {
+   p_err("too few parameters for cglist\n");
+   return -1;
+   } else if (argc > 1) {
+   p_err("too many parameters for cglist\n");
+   return -1;
+   }
+
+   cgroup_fd = open(argv[0], O_RDONLY);
+   if (cgroup_fd < 0) {
+   p_err("can't open cgroup %s\n", argv[0]);
+   return -1;
+   }
+
+   if (json_output)
+   jsonw_start_array(json_wtr);
+
+   for (type = 0; type < __MAX_BPF_ATTACH_TYPE; type++) {
+   prog_cnt = ARRAY_SIZE(prog_ids);
+   if (bpf_prog_query(cgroup_fd, type, 0, &attach_flags, prog_ids,
+  &prog_cnt))
+   continue;
+
+   ret = 0;
+
+   if (prog_cnt == 0)
+   continue;
+
+   if (!json_output)
+   printf("%-8s %-15s %-15s\n", "ID", "AttachType",
+  "Name");
+
+   for (iter = 0; iter < prog_cnt; iter++) {
+   prog_fd = bpf_prog_get_fd_by_id(prog_ids[iter]);
+   if (prog_fd < 0)
+   continue;
+
+   if (bpf_obj_get_info_by_fd(prog_fd, &info, &info_len)) {
+   close(prog_fd);
+   continue;
+   }
+
+   if (json_output) {
+   jsonw_start_object(json_wtr);
+   jsonw_uint_field(json_wtr, "id", info.id);
+   jsonw_string_field(json_wtr, "attach_type",
+  attach_type_strings[type]);
+   jsonw_string_field(json_wtr, "name", info.name);
+   jsonw_end_object(json_wtr);
+   } else {
+   printf("%-8u %-15s %-15s\n", info.id,
+  attach_type_strings[type], info.name);
+   }
+
+   close(prog_fd);
+   }
+   }
+
+   if (json_output)
+   jsonw_end_array(json_wtr);
+
+   close(cgroup_fd);
+
+   return ret;
+}
+
 int main(int argc, char **argv)
 {
static const struct option options[] = {
-- 
2.14.3

[PATCH net-next 4/5] bpftool: implement cgdetach command

2017-11-30 Thread Roman Gushchin

Implement the cgdetach command, which detaches a bpf program
from a cgroup. It takes the cgroup path, the attach type and
the program id as arguments.

Example:
$ ./bpftool cgdetach /sys/fs/cgroup/user.slice/ device 1

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Martin KaFai Lau <ka...@fb.com>
---
 tools/bpf/bpftool/main.c | 50 +++-
 1 file changed, 49 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index 8eb3b9bf5bb2..77fcc1a0bd5d 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -81,12 +81,13 @@ static int do_help(int argc, char **argv)
"Usage: %s [OPTIONS] OBJECT { COMMAND | help }\n"
"   %s batch file FILE\n"
"   %s cgattach FILE CGROUP TYPE\n"
+   "   %s cgdetach CGROUP TYPE ID\n"
"   %s version\n"
"\n"
"   OBJECT := { prog | map }\n"
"   " HELP_SPEC_OPTIONS "\n"
"",
-   bin_name, bin_name, bin_name, bin_name);
+   bin_name, bin_name, bin_name, bin_name, bin_name);
 
return 0;
 }
@@ -166,6 +167,7 @@ void fprint_hex(FILE *f, void *arg, unsigned int n, const 
char *sep)
 
 static int do_batch(int argc, char **argv);
 static int do_cgattach(int argc, char **argv);
+static int do_cgdetach(int argc, char **argv);
 
 static const struct cmd cmds[] = {
{ "help",   do_help },
@@ -173,6 +175,7 @@ static const struct cmd cmds[] = {
{ "prog",   do_prog },
{ "map",do_map },
{ "cgattach",   do_cgattach },
+   { "cgdetach",   do_cgdetach },
{ "version",do_version },
{ 0 }
 };
@@ -338,6 +341,51 @@ static int do_cgattach(int argc, char **argv)
return 0;
 }
 
+static int do_cgdetach(int argc, char **argv)
+{
+   int prog_fd, cgroup_fd;
+   enum bpf_attach_type attach_type;
+
+   if (argc < 3) {
+   p_err("too few parameters for cgdetach\n");
+   return -1;
+   } else if (argc > 3) {
+   p_err("too many parameters for cgdetach\n");
+   return -1;
+   }
+
+   cgroup_fd = open(argv[0], O_RDONLY);
+   if (cgroup_fd < 0) {
+   p_err("can't open cgroup %s\n", argv[0]);
+   return -1;
+   }
+
+   attach_type = parse_attach_type(argv[1]);
+   if (attach_type == __MAX_BPF_ATTACH_TYPE) {
+   close(cgroup_fd);
+   p_err("Invalid attach type");
+   return -1;
+   }
+
+   prog_fd = bpf_prog_get_fd_by_id(atoi(argv[2]));
+   if (prog_fd < 0) {
+   p_err("invalid program id\n");
+   return -1;
+   }
+
+   if (bpf_prog_detach2(prog_fd, cgroup_fd, attach_type)) {
+   close(prog_fd);
+   close(cgroup_fd);
+   p_err("Failed to detach program");
+   return -1;
+   }
+
+   close(prog_fd);
+   close(cgroup_fd);
+
+   return 0;
+}
+
 int main(int argc, char **argv)
 {
static const struct option options[] = {
-- 
2.14.3
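
Note that in do_cgdetach() above, the path where bpf_prog_get_fd_by_id()
fails returns without closing cgroup_fd. One common way to avoid this kind
of leak is goto-based unwinding. The sketch below only models the control
flow: fake_open() and fake_close() are hypothetical stand-ins for open(),
bpf_prog_get_fd_by_id() and close(), used so that leaked descriptors can
be counted.

```c
/* Number of descriptors currently "open" in this model. */
static int open_fds;

/* Hypothetical stand-in for a call that returns a descriptor. */
static int fake_open(int ok)
{
	if (!ok)
		return -1;
	open_fds++;
	return 3;
}

/* Hypothetical stand-in for close(). */
static void fake_close(int fd)
{
	(void)fd;
	open_fds--;
}

/* Same shape as do_cgdetach(), with a single unwinding path. */
static int detach(int cgroup_ok, int prog_ok, int detach_ok)
{
	int cgroup_fd, prog_fd, ret = -1;

	cgroup_fd = fake_open(cgroup_ok);
	if (cgroup_fd < 0)
		return -1;

	prog_fd = fake_open(prog_ok);
	if (prog_fd < 0)
		goto out_cgroup;	/* cgroup_fd is released below */

	if (!detach_ok)
		goto out_prog;

	ret = 0;
out_prog:
	fake_close(prog_fd);
out_cgroup:
	fake_close(cgroup_fd);
	return ret;
}
```

Each label releases exactly what was acquired before the jump, so a new
failure path cannot forget a descriptor.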

[PATCH net-next 3/5] bpftool: implement cgattach command

2017-11-30 Thread Roman Gushchin

This patch adds the cgattach command to bpftool.
It allows loading a bpf program from a binary file and attaching
it to a given cgroup.

Example:
$ bpftool cgattach ./mybpfprog.o /sys/fs/cgroup/user.slice/ ingress

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
Cc: Martin KaFai Lau <ka...@fb.com>
---
 tools/bpf/bpftool/main.c | 81 +++-
 1 file changed, 80 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index d6e4762170a4..8eb3b9bf5bb2 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -41,9 +41,14 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
+#include 
+#include 
 
 #include 
+#include 
 
 #include "main.h"
 
@@ -75,12 +80,13 @@ static int do_help(int argc, char **argv)
fprintf(stderr,
"Usage: %s [OPTIONS] OBJECT { COMMAND | help }\n"
"   %s batch file FILE\n"
+   "   %s cgattach FILE CGROUP TYPE\n"
"   %s version\n"
"\n"
"   OBJECT := { prog | map }\n"
"   " HELP_SPEC_OPTIONS "\n"
"",
-   bin_name, bin_name, bin_name);
+   bin_name, bin_name, bin_name, bin_name);
 
return 0;
 }
@@ -159,12 +165,14 @@ void fprint_hex(FILE *f, void *arg, unsigned int n, const 
char *sep)
 }
 
 static int do_batch(int argc, char **argv);
+static int do_cgattach(int argc, char **argv);
 
 static const struct cmd cmds[] = {
{ "help",   do_help },
{ "batch",  do_batch },
{ "prog",   do_prog },
{ "map",do_map },
+   { "cgattach",   do_cgattach },
{ "version",do_version },
{ 0 }
 };
@@ -259,6 +267,77 @@ static int do_batch(int argc, char **argv)
return err;
 }
 
+static const char * const attach_type_strings[] = {
+   [BPF_CGROUP_INET_INGRESS] = "ingress",
+   [BPF_CGROUP_INET_EGRESS] = "egress",
+   [BPF_CGROUP_INET_SOCK_CREATE] = "sock_create",
+   [BPF_CGROUP_SOCK_OPS] = "sock_ops",
+   [BPF_CGROUP_DEVICE] = "device",
+   [__MAX_BPF_ATTACH_TYPE] = NULL,
+};
+
+enum bpf_attach_type parse_attach_type(const char *str)
+{
+   enum bpf_attach_type type;
+
+   for (type = 0; type < __MAX_BPF_ATTACH_TYPE; type++) {
+   if (attach_type_strings[type] &&
+   strcmp(str, attach_type_strings[type]) == 0)
+   return type;
+   }
+
+   return __MAX_BPF_ATTACH_TYPE;
+}
+
+static int do_cgattach(int argc, char **argv)
+{
+   struct bpf_object *obj;
+   int prog_fd, cgroup_fd;
+   enum bpf_attach_type attach_type;
+
+   if (argc < 3) {
+   p_err("too few parameters for cgattach\n");
+   return -1;
+   } else if (argc > 3) {
+   p_err("too many parameters for cgattach\n");
+   return -1;
+   }
+
+   if (bpf_prog_load(argv[0], BPF_PROG_TYPE_UNSPEC, &obj, &prog_fd))
+   return -1;
+
+   cgroup_fd = open(argv[1], O_RDONLY);
+   if (cgroup_fd < 0) {
+   bpf_object__close(obj);
+   close(prog_fd);
+   p_err("can't open cgroup %s\n", argv[1]);
+   return -1;
+   }
+
+   attach_type = parse_attach_type(argv[2]);
+   if (attach_type == __MAX_BPF_ATTACH_TYPE) {
+   bpf_object__close(obj);
+   close(prog_fd);
+   close(cgroup_fd);
+   p_err("Invalid attach type\n");
+   return -1;
+   }
+
+   if (bpf_prog_attach(prog_fd, cgroup_fd, attach_type, 0)) {
+   bpf_object__close(obj);
+   close(prog_fd);
+   close(cgroup_fd);
+   p_err("Failed to attach program");
+   return -1;
+   }
+
+   bpf_object__close(obj);
+   close(prog_fd);
+   close(cgroup_fd);
+
+   return 0;
+}
+
 int main(int argc, char **argv)
 {
static const struct option options[] = {
-- 
2.14.3
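
The NULL check in parse_attach_type() above matters because a table built
with designated initializers leaves every unnamed slot NULL. A standalone
sketch of the same lookup follows; the enum below is made up for the
example (it is not the kernel's enum bpf_attach_type) and deliberately
contains one value without a name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical attach types; ATTACH_RESERVED has no string. */
enum attach_type {
	ATTACH_INGRESS,
	ATTACH_EGRESS,
	ATTACH_RESERVED,
	ATTACH_DEVICE,
	MAX_ATTACH_TYPE,
};

/* Slots without an initializer are implicitly NULL. */
static const char * const attach_type_strings[MAX_ATTACH_TYPE] = {
	[ATTACH_INGRESS] = "ingress",
	[ATTACH_EGRESS]  = "egress",
	[ATTACH_DEVICE]  = "device",
};

/* Return the matching type, or MAX_ATTACH_TYPE if str is unknown. */
static enum attach_type parse_attach_type(const char *str)
{
	int type;

	for (type = 0; type < MAX_ATTACH_TYPE; type++) {
		/* Skip holes in the table before calling strcmp(). */
		if (attach_type_strings[type] &&
		    strcmp(str, attach_type_strings[type]) == 0)
			return (enum attach_type)type;
	}

	return MAX_ATTACH_TYPE;
}
```

Using the "one past the last value" enumerator as the error result keeps
the return type an enum, at the cost of callers having to compare against
it explicitly.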

[PATCH net-next 2/5] libbpf: prefer global symbols as bpf program name source

2017-11-30 Thread Roman Gushchin

Libbpf picks the name of the first symbol in the corresponding
elf section to use as a program name. But without taking the symbol's
scope into account it may end up with some local label
as a program name. E.g.:

$ bpftool cglist /sys/fs/cgroup/system.slice/tmp.mount/
  16 device  LBB0_10

Fix this by preferring global symbols as program name.
For instance:

$ bpftool cglist /sys/fs/cgroup/system.slice/tmp.mount/
  17 device  bpf_prog1

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: Martin KaFai Lau <ka...@fb.com>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicin...@netronome.com>
---
 tools/lib/bpf/libbpf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 9f2410beaa18..5191afd46556 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -387,6 +387,8 @@ bpf_object__init_prog_names(struct bpf_object *obj)
continue;
if (sym.st_shndx != prog->idx)
continue;
+   if (GELF_ST_BIND(sym.st_info) != STB_GLOBAL)
+   continue;
 
name = elf_strptr(obj->efile.elf,
  obj->efile.strtabidx,
-- 
2.14.3
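
The selection rule can be illustrated outside of libbpf. The sketch below
is an assumption-laden model: it uses ELF64_ST_BIND() from the Linux
<elf.h> header instead of libelf's GELF_ST_BIND() to avoid the libelf
dependency, and first_global_sym() is a hypothetical helper, not a libbpf
function.

```c
#include <elf.h>
#include <stddef.h>

/* Return the index of the first global symbol located in section
 * shndx, or -1 if there is none. Local symbols, such as compiler
 * generated labels, are skipped. */
static int first_global_sym(const Elf64_Sym *syms, size_t n,
			    Elf64_Half shndx)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (syms[i].st_shndx != shndx)
			continue;
		if (ELF64_ST_BIND(syms[i].st_info) != STB_GLOBAL)
			continue;
		return (int)i;
	}

	return -1;
}
```

The binding lives in the upper four bits of st_info, so a local label that
happens to come first in the symbol table is simply passed over.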

[PATCH v3 net-next 0/5] eBPF-based device cgroup controller

2017-11-05 Thread Roman Gushchin

This patchset introduces an eBPF-based device controller for cgroup v2.

Patches (1) and (2) are preparatory work required to share some code
  with the existing device controller implementation.
Patch (3) is the main patch, which introduces a new bpf prog type
  and all necessary infrastructure.
Patch (4) moves cgroup_helpers.c/h so that patch (5) can use them.
Patch (5) implements an example eBPF program which controls access
  to device files and the corresponding userspace test.

v3:
  Renamed constants introduced by patch (3) to BPF_DEVCG_*

v2:
  Added patch (1).

v1:
  https://lkml.org/lkml/2017/11/1/363

Roman Gushchin (5):
  device_cgroup: add DEVCG_ prefix to ACC_* and DEV_* constants
  device_cgroup: prepare code for bpf-based device controller
  bpf, cgroup: implement eBPF-based device controller for cgroup v2
  bpf: move cgroup_helpers from samples/bpf/ to
tools/testing/selftesting/bpf/
  selftests/bpf: add a test for device cgroup controller

 include/linux/bpf-cgroup.h | 15 
 include/linux/bpf_types.h  |  3 +
 include/linux/device_cgroup.h  | 67 +++-
 include/uapi/linux/bpf.h   | 15 
 kernel/bpf/cgroup.c| 67 
 kernel/bpf/syscall.c   |  7 ++
 kernel/bpf/verifier.c  |  1 +
 samples/bpf/Makefile   |  5 +-
 security/device_cgroup.c   | 91 ++---
 tools/include/uapi/linux/bpf.h | 15 
 tools/testing/selftests/bpf/Makefile   |  6 +-
 .../testing/selftests}/bpf/cgroup_helpers.c|  0
 .../testing/selftests}/bpf/cgroup_helpers.h|  0
 tools/testing/selftests/bpf/dev_cgroup.c   | 60 ++
 tools/testing/selftests/bpf/test_dev_cgroup.c  | 93 ++
 15 files changed, 369 insertions(+), 76 deletions(-)
 rename {samples => tools/testing/selftests}/bpf/cgroup_helpers.c (100%)
 rename {samples => tools/testing/selftests}/bpf/cgroup_helpers.h (100%)
 create mode 100644 tools/testing/selftests/bpf/dev_cgroup.c
 create mode 100644 tools/testing/selftests/bpf/test_dev_cgroup.c

-- 
2.13.6

[PATCH v3 net-next 3/5] bpf, cgroup: implement eBPF-based device controller for cgroup v2

2017-11-05 Thread Roman Gushchin

Cgroup v2 lacks the device controller provided by cgroup v1.
This patch adds a new eBPF program type which, in combination
with the previously added ability to attach multiple eBPF programs
to a cgroup, will provide similar functionality, but with some
additional flexibility.

This patch introduces a BPF_PROG_TYPE_CGROUP_DEVICE program type.
A program takes major and minor device numbers, device type
(block/character) and access type (mknod/read/write) as parameters
and returns an integer which defines if the operation should be
allowed or terminated with -EPERM.
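
To make the decision flow concrete, here is a plain userspace model of
such a policy. It is not an eBPF program: dev_policy() and
check_dev_permission() are hypothetical names, the DEVCG_* values are the
ones from patch 2/5, and the convention that a nonzero verdict means
"allow" is an assumption made for the example.

```c
#include <errno.h>
#include <stdint.h>

/* Constants as defined in patch 2/5 of this series. */
#define DEVCG_ACC_MKNOD 1
#define DEVCG_ACC_READ  2
#define DEVCG_ACC_WRITE 4
#define DEVCG_DEV_BLOCK 1
#define DEVCG_DEV_CHAR  2

/* Hypothetical policy: allow read and write (but not mknod) on the
 * character device 1:5 and deny everything else.
 * Returns nonzero to allow, 0 to deny (assumed convention). */
static int dev_policy(short dev_type, uint32_t major, uint32_t minor,
		      short access)
{
	if (dev_type != DEVCG_DEV_CHAR || major != 1 || minor != 5)
		return 0;

	return (access & DEVCG_ACC_MKNOD) == 0;
}

/* Map the policy verdict to what the caller of the check sees. */
static int check_dev_permission(short dev_type, uint32_t major,
				uint32_t minor, short access)
{
	return dev_policy(dev_type, major, minor, access) ? 0 : -EPERM;
}
```

The policy only sees the four parameters listed above, which is what lets
it be expressed as a small self-contained function.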

Signed-off-by: Roman Gushchin <g...@fb.com>
Acked-by: Alexei Starovoitov <a...@kernel.org>
Acked-by: Tejun Heo <t...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
---
 include/linux/bpf-cgroup.h | 15 ++
 include/linux/bpf_types.h  |  3 ++
 include/linux/device_cgroup.h  |  8 -
 include/uapi/linux/bpf.h   | 15 ++
 kernel/bpf/cgroup.c| 67 ++
 kernel/bpf/syscall.c   |  7 +
 kernel/bpf/verifier.c  |  1 +
 tools/include/uapi/linux/bpf.h | 15 ++
 8 files changed, 130 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 87a7db9feb38..a7f16e0f8d68 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -67,6 +67,9 @@ int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
 struct bpf_sock_ops_kern *sock_ops,
 enum bpf_attach_type type);
 
+int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
+ short access, enum bpf_attach_type type);
+
 /* Wrappers for __cgroup_bpf_run_filter_skb() guarded by cgroup_bpf_enabled. */
 #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk, skb)\
 ({   \
@@ -112,6 +115,17 @@ int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
}  \
__ret; \
 })
+
+#define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type, major, minor, access)\
+({   \
+   int __ret = 0;\
+   if (cgroup_bpf_enabled)   \
+   __ret = __cgroup_bpf_check_dev_permission(type, major, minor, \
+ access, \
+ BPF_CGROUP_DEVICE); \
+ \
+   __ret;\
+})
 #else
 
 struct cgroup_bpf {};
@@ -122,6 +136,7 @@ static inline int cgroup_bpf_inherit(struct cgroup *cgrp) { 
return 0; }
 #define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk,skb) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET_SOCK(sk) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type,major,minor,access) ({ 0; })
 
 #endif /* CONFIG_CGROUP_BPF */
 
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 53c5b9ad7220..978c1d9c9383 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -19,6 +19,9 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_KPROBE, kprobe)
 BPF_PROG_TYPE(BPF_PROG_TYPE_TRACEPOINT, tracepoint)
 BPF_PROG_TYPE(BPF_PROG_TYPE_PERF_EVENT, perf_event)
 #endif
+#ifdef CONFIG_CGROUP_BPF
+BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev)
+#endif
 
 BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)
diff --git a/include/linux/device_cgroup.h b/include/linux/device_cgroup.h
index 2d93d7ecd479..8557efe096dc 100644
--- a/include/linux/device_cgroup.h
+++ b/include/linux/device_cgroup.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #include 
+#include 
 
 #define DEVCG_ACC_MKNOD 1
 #define DEVCG_ACC_READ  2
@@ -19,10 +20,15 @@ static inline int __devcgroup_check_permission(short type, 
u32 major, u32 minor,
 { return 0; }
 #endif
 
-#ifdef CONFIG_CGROUP_DEVICE
+#if defined(CONFIG_CGROUP_DEVICE) || defined(CONFIG_CGROUP_BPF)
 static inline int devcgroup_check_permission(short type, u32 major, u32 minor,
 short access)
 {
+   int rc = BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type, major, minor, access);
+
+   if (rc)
+   return -EPERM;
+
return __devcgroup_check_permission(type, major, minor, access);
 }
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index a9820677c2ff..d581407bb2dc 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -132,6 +132

[PATCH v3 net-next 2/5] device_cgroup: prepare code for bpf-based device controller

2017-11-05 Thread Roman Gushchin

This is a non-functional change that prepares the device cgroup code
for adding an eBPF-based controller for cgroup v2.

The patch performs the following changes:
1) __devcgroup_inode_permission() and devcgroup_inode_mknod()
   are moved to device_cgroup.h and converted into static inline functions.
2) __devcgroup_check_permission() is exported.
3) A devcgroup_check_permission() wrapper is introduced, to be used
   by both the existing and the new bpf-based implementations.

Signed-off-by: Roman Gushchin <g...@fb.com>
Acked-by: Tejun Heo <t...@kernel.org>
Acked-by: Alexei Starovoitov <a...@kernel.org>
---
 include/linux/device_cgroup.h | 61 ---
 security/device_cgroup.c  | 47 ++---
 2 files changed, 59 insertions(+), 49 deletions(-)

diff --git a/include/linux/device_cgroup.h b/include/linux/device_cgroup.h
index cdbc344a92e4..2d93d7ecd479 100644
--- a/include/linux/device_cgroup.h
+++ b/include/linux/device_cgroup.h
@@ -1,17 +1,70 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #include 
 
+#define DEVCG_ACC_MKNOD 1
+#define DEVCG_ACC_READ  2
+#define DEVCG_ACC_WRITE 4
+#define DEVCG_ACC_MASK (DEVCG_ACC_MKNOD | DEVCG_ACC_READ | DEVCG_ACC_WRITE)
+
+#define DEVCG_DEV_BLOCK 1
+#define DEVCG_DEV_CHAR  2
+#define DEVCG_DEV_ALL   4  /* this represents all devices */
+
+#ifdef CONFIG_CGROUP_DEVICE
+extern int __devcgroup_check_permission(short type, u32 major, u32 minor,
+   short access);
+#else
+static inline int __devcgroup_check_permission(short type, u32 major, u32 
minor,
+  short access)
+{ return 0; }
+#endif
+
 #ifdef CONFIG_CGROUP_DEVICE
-extern int __devcgroup_inode_permission(struct inode *inode, int mask);
-extern int devcgroup_inode_mknod(int mode, dev_t dev);
+static inline int devcgroup_check_permission(short type, u32 major, u32 minor,
+short access)
+{
+   return __devcgroup_check_permission(type, major, minor, access);
+}
+
 static inline int devcgroup_inode_permission(struct inode *inode, int mask)
 {
+   short type, access = 0;
+
if (likely(!inode->i_rdev))
return 0;
-   if (!S_ISBLK(inode->i_mode) && !S_ISCHR(inode->i_mode))
+
+   if (S_ISBLK(inode->i_mode))
+   type = DEVCG_DEV_BLOCK;
+   else if (S_ISCHR(inode->i_mode))
+   type = DEVCG_DEV_CHAR;
+   else
return 0;
-   return __devcgroup_inode_permission(inode, mask);
+
+   if (mask & MAY_WRITE)
+   access |= DEVCG_ACC_WRITE;
+   if (mask & MAY_READ)
+   access |= DEVCG_ACC_READ;
+
+   return devcgroup_check_permission(type, imajor(inode), iminor(inode),
+ access);
 }
+
+static inline int devcgroup_inode_mknod(int mode, dev_t dev)
+{
+   short type;
+
+   if (!S_ISBLK(mode) && !S_ISCHR(mode))
+   return 0;
+
+   if (S_ISBLK(mode))
+   type = DEVCG_DEV_BLOCK;
+   else
+   type = DEVCG_DEV_CHAR;
+
+   return devcgroup_check_permission(type, MAJOR(dev), MINOR(dev),
+ DEVCG_ACC_MKNOD);
+}
+
 #else
 static inline int devcgroup_inode_permission(struct inode *inode, int mask)
 { return 0; }
diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index 968c21557ba7..c65b39bafdfe 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -15,15 +15,6 @@
 #include 
 #include 
 
-#define DEVCG_ACC_MKNOD 1
-#define DEVCG_ACC_READ  2
-#define DEVCG_ACC_WRITE 4
-#define DEVCG_ACC_MASK (DEVCG_ACC_MKNOD | DEVCG_ACC_READ | DEVCG_ACC_WRITE)
-
-#define DEVCG_DEV_BLOCK 1
-#define DEVCG_DEV_CHAR  2
-#define DEVCG_DEV_ALL   4  /* this represents all devices */
-
 static DEFINE_MUTEX(devcgroup_mutex);
 
 enum devcg_behavior {
@@ -810,8 +801,8 @@ struct cgroup_subsys devices_cgrp_subsys = {
  *
  * returns 0 on success, -EPERM case the operation is not permitted
  */
-static int __devcgroup_check_permission(short type, u32 major, u32 minor,
-   short access)
+int __devcgroup_check_permission(short type, u32 major, u32 minor,
+short access)
 {
struct dev_cgroup *dev_cgroup;
bool rc;
@@ -833,37 +824,3 @@ static int __devcgroup_check_permission(short type, u32 
major, u32 minor,
 
return 0;
 }
-
-int __devcgroup_inode_permission(struct inode *inode, int mask)
-{
-   short type, access = 0;
-
-   if (S_ISBLK(inode->i_mode))
-   type = DEVCG_DEV_BLOCK;
-   if (S_ISCHR(inode->i_mode))
-   type = DEVCG_DEV_CHAR;
-   if (mask & MAY_WRITE)
-   access |= DEVCG_ACC_WRITE;
-   if (mask & MAY_READ)
-   access |= DEVCG_ACC_READ;
-
-   return __devcgroup_check_permission(type, i

[PATCH v3 net-next 4/5] bpf: move cgroup_helpers from samples/bpf/ to tools/testing/selftesting/bpf/

2017-11-05 Thread Roman Gushchin

The purpose of this move is to use these files in bpf tests.

Signed-off-by: Roman Gushchin <g...@fb.com>
Acked-by: Alexei Starovoitov <a...@kernel.org>
Acked-by: Tejun Heo <t...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
---
 samples/bpf/Makefile  | 5 +++--
 tools/testing/selftests/bpf/Makefile  | 2 +-
 {samples => tools/testing/selftests}/bpf/cgroup_helpers.c | 0
 {samples => tools/testing/selftests}/bpf/cgroup_helpers.h | 0
 4 files changed, 4 insertions(+), 3 deletions(-)
 rename {samples => tools/testing/selftests}/bpf/cgroup_helpers.c (100%)
 rename {samples => tools/testing/selftests}/bpf/cgroup_helpers.h (100%)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 6a9321ec348a..5994075b080d 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -46,6 +46,7 @@ hostprogs-y += syscall_tp
 
 # Libbpf dependencies
 LIBBPF := ../../tools/lib/bpf/bpf.o
+CGROUP_HELPERS := ../../tools/testing/selftests/bpf/cgroup_helpers.o
 
 test_lru_dist-objs := test_lru_dist.o $(LIBBPF)
 sock_example-objs := sock_example.o $(LIBBPF)
@@ -69,13 +70,13 @@ map_perf_test-objs := bpf_load.o $(LIBBPF) 
map_perf_test_user.o
 test_overhead-objs := bpf_load.o $(LIBBPF) test_overhead_user.o
 test_cgrp2_array_pin-objs := $(LIBBPF) test_cgrp2_array_pin.o
 test_cgrp2_attach-objs := $(LIBBPF) test_cgrp2_attach.o
-test_cgrp2_attach2-objs := $(LIBBPF) test_cgrp2_attach2.o cgroup_helpers.o
+test_cgrp2_attach2-objs := $(LIBBPF) test_cgrp2_attach2.o $(CGROUP_HELPERS)
 test_cgrp2_sock-objs := $(LIBBPF) test_cgrp2_sock.o
 test_cgrp2_sock2-objs := bpf_load.o $(LIBBPF) test_cgrp2_sock2.o
 xdp1-objs := bpf_load.o $(LIBBPF) xdp1_user.o
 # reuse xdp1 source intentionally
 xdp2-objs := bpf_load.o $(LIBBPF) xdp1_user.o
-test_current_task_under_cgroup-objs := bpf_load.o $(LIBBPF) cgroup_helpers.o \
+test_current_task_under_cgroup-objs := bpf_load.o $(LIBBPF) $(CGROUP_HELPERS) \
   test_current_task_under_cgroup_user.o
 trace_event-objs := bpf_load.o $(LIBBPF) trace_event_user.o
 sampleip-objs := bpf_load.o $(LIBBPF) sampleip_user.o
diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index 4f0734aa6e93..9fbb02638198 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -23,7 +23,7 @@ TEST_PROGS := test_kmod.sh test_xdp_redirect.sh 
test_xdp_meta.sh
 
 include ../lib.mk
 
-BPFOBJ := $(OUTPUT)/libbpf.a
+BPFOBJ := $(OUTPUT)/libbpf.a $(OUTPUT)/cgroup_helpers.c
 
 $(TEST_GEN_PROGS): $(BPFOBJ)
 
diff --git a/samples/bpf/cgroup_helpers.c 
b/tools/testing/selftests/bpf/cgroup_helpers.c
similarity index 100%
rename from samples/bpf/cgroup_helpers.c
rename to tools/testing/selftests/bpf/cgroup_helpers.c
diff --git a/samples/bpf/cgroup_helpers.h 
b/tools/testing/selftests/bpf/cgroup_helpers.h
similarity index 100%
rename from samples/bpf/cgroup_helpers.h
rename to tools/testing/selftests/bpf/cgroup_helpers.h
-- 
2.13.6

[PATCH v3 net-next 5/5] selftests/bpf: add a test for device cgroup controller

2017-11-05 Thread Roman Gushchin

Add a test for device cgroup controller.

The test loads a simple bpf program which logs all
device access attempts using trace_printk() and forbids
all operations except operations with /dev/zero and
/dev/urandom.

Then the test creates and joins a test cgroup, and attaches
the bpf program to it.

Then it tries to perform some simple device operations
and checks the result:

  create /dev/null (should fail)
  create /dev/zero (should pass)
  copy data from /dev/urandom to /dev/zero (should pass)
  copy data from /dev/urandom to /dev/full (should fail)
  copy data from /dev/random to /dev/zero (should fail)
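
The expected outcomes listed above follow from the allow-list in the test program. A minimal userspace sketch of that decision, assuming the usual Linux device numbers (1:3 /dev/null, 1:5 /dev/zero, 1:7 /dev/full, 1:8 /dev/random, 1:9 /dev/urandom) and using hypothetical sketch_* names rather than the real BPF context:

```c
#include <assert.h>

/* Mirror the DEVCG_DEV_* values from patch (2); the names are stand-ins. */
#define SKETCH_DEV_BLOCK 1
#define SKETCH_DEV_CHAR  2

/* Returns 1 if access is allowed, 0 if it is forbidden. */
int sketch_dev_allowed(short type, unsigned int major, unsigned int minor)
{
	/* Only character devices with major number 1 are considered. */
	if (major != 1 || type != SKETCH_DEV_CHAR)
		return 0;

	switch (minor) {
	case 5: /* 1:5 /dev/zero */
	case 9: /* 1:9 /dev/urandom */
		return 1;
	}

	return 0;
}

/* Checks the verdicts behind the five operations listed above. */
void sketch_selfcheck(void)
{
	assert(sketch_dev_allowed(SKETCH_DEV_CHAR, 1, 3) == 0); /* /dev/null */
	assert(sketch_dev_allowed(SKETCH_DEV_CHAR, 1, 5) == 1); /* /dev/zero */
	assert(sketch_dev_allowed(SKETCH_DEV_CHAR, 1, 9) == 1); /* /dev/urandom */
	assert(sketch_dev_allowed(SKETCH_DEV_CHAR, 1, 7) == 0); /* /dev/full */
	assert(sketch_dev_allowed(SKETCH_DEV_CHAR, 1, 8) == 0); /* /dev/random */
}
```

A copy passes only when both the source and the destination are allowed, which is why the /dev/urandom to /dev/full and /dev/random to /dev/zero copies are expected to fail.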

Signed-off-by: Roman Gushchin <g...@fb.com>
Acked-by: Alexei Starovoitov <a...@kernel.org>
Acked-by: Tejun Heo <t...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
---
 tools/testing/selftests/bpf/Makefile  |  4 +-
 tools/testing/selftests/bpf/dev_cgroup.c  | 60 +
 tools/testing/selftests/bpf/test_dev_cgroup.c | 93 +++
 3 files changed, 155 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/dev_cgroup.c
 create mode 100644 tools/testing/selftests/bpf/test_dev_cgroup.c

diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index 9fbb02638198..333a48655ee0 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -13,11 +13,11 @@ CFLAGS += -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(GENDIR) 
$(GENFLAGS) -I../../../i
 LDLIBS += -lcap -lelf
 
 TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map 
test_progs \
-   test_align test_verifier_log
+   test_align test_verifier_log test_dev_cgroup
 
 TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o 
test_obj_id.o \
test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o 
sockmap_parse_prog.o \
-   sockmap_verdict_prog.o
+   sockmap_verdict_prog.o dev_cgroup.o
 
 TEST_PROGS := test_kmod.sh test_xdp_redirect.sh test_xdp_meta.sh
 
diff --git a/tools/testing/selftests/bpf/dev_cgroup.c 
b/tools/testing/selftests/bpf/dev_cgroup.c
new file mode 100644
index ..ce41a3475f27
--- /dev/null
+++ b/tools/testing/selftests/bpf/dev_cgroup.c
@@ -0,0 +1,60 @@
+/* Copyright (c) 2017 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include "bpf_helpers.h"
+
+SEC("cgroup/dev")
+int bpf_prog1(struct bpf_cgroup_dev_ctx *ctx)
+{
+   short type = ctx->access_type & 0xFFFF;
+#ifdef DEBUG
+   short access = ctx->access_type >> 16;
+   char fmt[] = "  %d:%d    \n";
+
+   switch (type) {
+   case BPF_DEVCG_DEV_BLOCK:
+   fmt[0] = 'b';
+   break;
+   case BPF_DEVCG_DEV_CHAR:
+   fmt[0] = 'c';
+   break;
+   default:
+   fmt[0] = '?';
+   break;
+   }
+
+   if (access & BPF_DEVCG_ACC_READ)
+   fmt[8] = 'r';
+
+   if (access & BPF_DEVCG_ACC_WRITE)
+   fmt[9] = 'w';
+
+   if (access & BPF_DEVCG_ACC_MKNOD)
+   fmt[10] = 'm';
+
+   bpf_trace_printk(fmt, sizeof(fmt), ctx->major, ctx->minor);
+#endif
+
+   /* Allow access to /dev/zero and /dev/urandom.
+* Forbid everything else.
+*/
+   if (ctx->major != 1 || type != BPF_DEVCG_DEV_CHAR)
+   return 0;
+
+   switch (ctx->minor) {
+   case 5: /* 1:5 /dev/zero */
+   case 9: /* 1:9 /dev/urandom */
+   return 1;
+   }
+
+   return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+__u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/tools/testing/selftests/bpf/test_dev_cgroup.c 
b/tools/testing/selftests/bpf/test_dev_cgroup.c
new file mode 100644
index ..02c85d6c89b0
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_dev_cgroup.c
@@ -0,0 +1,93 @@
+/* Copyright (c) 2017 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "cgroup_helpers.h"
+
+#define DEV_CGROUP_PROG "./dev_cgroup.o"
+
+#define TEST_CGROUP "test-bpf-based-device-cgroup/"
+
+int main(int argc, char **argv)
+{
+   struct bpf_object *obj;
+   int error = EXIT_FAILURE;
+   int prog_fd, cgroup_fd;
+   __u32 prog_cnt;
+
+   if (bpf_prog_load(DEV_CGROUP_PROG, BPF_PROG_TYPE_CGROUP_DEVICE,
+ &obj, &prog_fd)) {
+   printf("Failed to load DEV_CGROUP program\n");
+

[PATCH v3 net-next 1/5] device_cgroup: add DEVCG_ prefix to ACC_* and DEV_* constants

2017-11-05 Thread Roman Gushchin

Rename device type and access type constants defined in
security/device_cgroup.c by adding the DEVCG_ prefix.

The reason behind this renaming is to make them global namespace
friendly, as they will be moved to the corresponding header file
by following patches.

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: David S. Miller <da...@davemloft.net>
Cc: Tejun Heo <t...@kernel.org>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
---
 security/device_cgroup.c | 72 
 1 file changed, 36 insertions(+), 36 deletions(-)

diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index 5ef7e5240563..968c21557ba7 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -15,14 +15,14 @@
 #include 
 #include 
 
-#define ACC_MKNOD 1
-#define ACC_READ  2
-#define ACC_WRITE 4
-#define ACC_MASK (ACC_MKNOD | ACC_READ | ACC_WRITE)
+#define DEVCG_ACC_MKNOD 1
+#define DEVCG_ACC_READ  2
+#define DEVCG_ACC_WRITE 4
+#define DEVCG_ACC_MASK (DEVCG_ACC_MKNOD | DEVCG_ACC_READ | DEVCG_ACC_WRITE)
 
-#define DEV_BLOCK 1
-#define DEV_CHAR  2
-#define DEV_ALL   4  /* this represents all devices */
+#define DEVCG_DEV_BLOCK 1
+#define DEVCG_DEV_CHAR  2
+#define DEVCG_DEV_ALL   4  /* this represents all devices */
 
 static DEFINE_MUTEX(devcgroup_mutex);
 
@@ -246,21 +246,21 @@ static void set_access(char *acc, short access)
 {
int idx = 0;
memset(acc, 0, ACCLEN);
-   if (access & ACC_READ)
+   if (access & DEVCG_ACC_READ)
acc[idx++] = 'r';
-   if (access & ACC_WRITE)
+   if (access & DEVCG_ACC_WRITE)
acc[idx++] = 'w';
-   if (access & ACC_MKNOD)
+   if (access & DEVCG_ACC_MKNOD)
acc[idx++] = 'm';
 }
 
 static char type_to_char(short type)
 {
-   if (type == DEV_ALL)
+   if (type == DEVCG_DEV_ALL)
return 'a';
-   if (type == DEV_CHAR)
+   if (type == DEVCG_DEV_CHAR)
return 'c';
-   if (type == DEV_BLOCK)
+   if (type == DEVCG_DEV_BLOCK)
return 'b';
return 'X';
 }
@@ -287,10 +287,10 @@ static int devcgroup_seq_show(struct seq_file *m, void *v)
 * This way, the file remains as a "whitelist of devices"
 */
if (devcgroup->behavior == DEVCG_DEFAULT_ALLOW) {
-   set_access(acc, ACC_MASK);
+   set_access(acc, DEVCG_ACC_MASK);
set_majmin(maj, ~0);
set_majmin(min, ~0);
-   seq_printf(m, "%c %s:%s %s\n", type_to_char(DEV_ALL),
+   seq_printf(m, "%c %s:%s %s\n", type_to_char(DEVCG_DEV_ALL),
   maj, min, acc);
} else {
list_for_each_entry_rcu(ex, &devcgroup->exceptions, list) {
@@ -309,10 +309,10 @@ static int devcgroup_seq_show(struct seq_file *m, void *v)
 /**
  * match_exception - iterates the exception list trying to find a complete 
match
  * @exceptions: list of exceptions
- * @type: device type (DEV_BLOCK or DEV_CHAR)
+ * @type: device type (DEVCG_DEV_BLOCK or DEVCG_DEV_CHAR)
  * @major: device file major number, ~0 to match all
  * @minor: device file minor number, ~0 to match all
- * @access: permission mask (ACC_READ, ACC_WRITE, ACC_MKNOD)
+ * @access: permission mask (DEVCG_ACC_READ, DEVCG_ACC_WRITE, DEVCG_ACC_MKNOD)
  *
  * It is considered a complete match if an exception is found that will
  * contain the entire range of provided parameters.
@@ -325,9 +325,9 @@ static bool match_exception(struct list_head *exceptions, 
short type,
struct dev_exception_item *ex;
 
list_for_each_entry_rcu(ex, exceptions, list) {
-   if ((type & DEV_BLOCK) && !(ex->type & DEV_BLOCK))
+   if ((type & DEVCG_DEV_BLOCK) && !(ex->type & DEVCG_DEV_BLOCK))
continue;
-   if ((type & DEV_CHAR) && !(ex->type & DEV_CHAR))
+   if ((type & DEVCG_DEV_CHAR) && !(ex->type & DEVCG_DEV_CHAR))
continue;
if (ex->major != ~0 && ex->major != major)
continue;
@@ -344,10 +344,10 @@ static bool match_exception(struct list_head *exceptions, 
short type,
 /**
  * match_exception_partial - iterates the exception list trying to find a 
partial match
  * @exceptions: list of exceptions
- * @type: device type (DEV_BLOCK or DEV_CHAR)
+ * @type: device type (DEVCG_DEV_BLOCK or DEVCG_DEV_CHAR)
  * @major: device file major number, ~0 to match all
  * @minor: device file minor number, ~0 to match all
- * @access: permission mask (ACC_READ, ACC_WRITE, ACC_MKNOD)
+ * @access: permission mask (DEVCG_ACC_READ, DEVCG_ACC_WRITE, DEVCG_ACC_MKNOD)
  *
  * It is considered a partial match if an exception's range is found to
  * contain *any* of the device

Re: [PATCH v3 net-next 1/5] device_cgroup: add DEVCG_ prefix to ACC_* and DEV_* constants

2017-11-02 Thread Roman Gushchin

On Thu, Nov 02, 2017 at 10:54:12AM -0700, Joe Perches wrote:
> On Thu, 2017-11-02 at 13:15 -0400, Roman Gushchin wrote:
> > Rename device type and access type constants defined in
> > security/device_cgroup.c by adding the DEVCG_ prefix.
> > 
> > The reason behind this renaming is to make them global namespace
> > friendly, as they will be moved to the corresponding header file
> > by following patches.
> []
> > diff --git a/security/device_cgroup.c b/security/device_cgroup.c
> []
> > @@ -14,14 +14,14 @@
> >  #include 
> >  #include 
> >  
> > -#define ACC_MKNOD 1
> > -#define ACC_READ  2
> > -#define ACC_WRITE 4
> > -#define ACC_MASK (ACC_MKNOD | ACC_READ | ACC_WRITE)
> > +#define DEVCG_ACC_MKNOD 1
> > +#define DEVCG_ACC_READ  2
> > +#define DEVCG_ACC_WRITE 4
> > +#define DEVCG_ACC_MASK (DEVCG_ACC_MKNOD | DEVCG_ACC_READ | DEVCG_ACC_WRITE)
> 
> trivia:
> 
> major and minor are u32 but all the
> type and access uses seem to be "short"
> 
> Perhaps u16 (or __u16 if uapi public) instead?

It has been this way for a while, and it doesn't seem to be related to this patchset.
So, I'd prefer to change this in a separate patch.

Thanks!

[PATCH v3 net-next 0/5] eBPF-based device cgroup controller

2017-11-02 Thread Roman Gushchin

This patchset introduces an eBPF-based device controller for cgroup v2.

Patches (1) and (2) are preparatory work required to share some code
  with the existing device controller implementation.
Patch (3) is the main patch, which introduces a new bpf prog type
  and all necessary infrastructure.
Patch (4) moves cgroup_helpers.c/h so that patch (5) can use them.
Patch (5) implements an example eBPF program which controls access
  to device files, and a corresponding userspace test.

v3:
  Renamed constants introduced by patch (3) to BPF_DEVCG_*

v2:
  Added patch (1).

v1:
  https://lkml.org/lkml/2017/11/1/363

Roman Gushchin (5):
  device_cgroup: add DEVCG_ prefix to ACC_* and DEV_* constants
  device_cgroup: prepare code for bpf-based device controller
  bpf, cgroup: implement eBPF-based device controller for cgroup v2
  bpf: move cgroup_helpers from samples/bpf/ to
tools/testing/selftesting/bpf/
  selftests/bpf: add a test for device cgroup controller

 include/linux/bpf-cgroup.h | 15 
 include/linux/bpf_types.h  |  3 +
 include/linux/device_cgroup.h  | 67 +++-
 include/uapi/linux/bpf.h   | 15 
 kernel/bpf/cgroup.c| 67 
 kernel/bpf/syscall.c   |  7 ++
 kernel/bpf/verifier.c  |  1 +
 samples/bpf/Makefile   |  5 +-
 security/device_cgroup.c   | 91 ++---
 tools/include/uapi/linux/bpf.h | 15 
 tools/testing/selftests/bpf/Makefile   |  6 +-
 .../testing/selftests}/bpf/cgroup_helpers.c|  0
 .../testing/selftests}/bpf/cgroup_helpers.h|  0
 tools/testing/selftests/bpf/dev_cgroup.c   | 60 ++
 tools/testing/selftests/bpf/test_dev_cgroup.c  | 93 ++
 15 files changed, 369 insertions(+), 76 deletions(-)
 rename {samples => tools/testing/selftests}/bpf/cgroup_helpers.c (100%)
 rename {samples => tools/testing/selftests}/bpf/cgroup_helpers.h (100%)
 create mode 100644 tools/testing/selftests/bpf/dev_cgroup.c
 create mode 100644 tools/testing/selftests/bpf/test_dev_cgroup.c

-- 
2.13.6

[PATCH v3 net-next 3/5] bpf, cgroup: implement eBPF-based device controller for cgroup v2

2017-11-02 Thread Roman Gushchin

Cgroup v2 lacks the device controller provided by cgroup v1.
This patch adds a new eBPF program type which, in combination
with the previously added ability to attach multiple eBPF programs
to a cgroup, provides similar functionality with some
additional flexibility.

This patch introduces the BPF_PROG_TYPE_CGROUP_DEVICE program type.
A program takes major and minor device numbers, device type
(block/character) and access type (mknod/read/write) as parameters
and returns an integer which defines whether the operation should be
allowed or terminated with -EPERM.
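
How the parameters and the verdict fit together can be sketched in userspace C. The layout here is an assumption inferred from the selftest in patch (5), which reads the device type from the low 16 bits of access_type and the access bits from the high 16 bits. The sketch_* names are hypothetical and this is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Pack device type (low 16 bits) and access bits (high 16 bits). */
uint32_t sketch_pack_access_type(uint16_t dev_type, uint16_t access)
{
	return ((uint32_t)access << 16) | dev_type;
}

/* Recover the device type the way the selftest program does. */
uint16_t sketch_unpack_type(uint32_t access_type)
{
	return access_type & 0xFFFF;
}

/* Recover the access bits the way the selftest program does. */
uint16_t sketch_unpack_access(uint32_t access_type)
{
	return access_type >> 16;
}

/* Assumed convention: non-zero verdict allows, zero ends in -EPERM. */
int sketch_verdict_to_errno(int prog_ret)
{
	return prog_ret ? 0 : -EPERM;
}

/* Round-trip check: a char device (2) with read|mknod access (2|1). */
void sketch_selfcheck(void)
{
	uint32_t packed = sketch_pack_access_type(2, 2 | 1);

	assert(sketch_unpack_type(packed) == 2);
	assert(sketch_unpack_access(packed) == 3);
	assert(sketch_verdict_to_errno(1) == 0);
	assert(sketch_verdict_to_errno(0) == -EPERM);
}
```

Packing both values into one 32-bit field keeps the program context small; whether the kernel side builds the field exactly this way is not shown in the truncated diff below.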

Signed-off-by: Roman Gushchin <g...@fb.com>
Acked-by: Alexei Starovoitov <a...@kernel.org>
Acked-by: Tejun Heo <t...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
---
 include/linux/bpf-cgroup.h | 15 ++
 include/linux/bpf_types.h  |  3 ++
 include/linux/device_cgroup.h  |  8 -
 include/uapi/linux/bpf.h   | 15 ++
 kernel/bpf/cgroup.c| 67 ++
 kernel/bpf/syscall.c   |  7 +
 kernel/bpf/verifier.c  |  1 +
 tools/include/uapi/linux/bpf.h | 15 ++
 8 files changed, 130 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 359b6f5d3d90..d77cefb3fe99 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -66,6 +66,9 @@ int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
 struct bpf_sock_ops_kern *sock_ops,
 enum bpf_attach_type type);
 
+int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
+ short access, enum bpf_attach_type type);
+
 /* Wrappers for __cgroup_bpf_run_filter_skb() guarded by cgroup_bpf_enabled. */
 #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk, skb)\
 ({   \
@@ -111,6 +114,17 @@ int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
}  \
__ret; \
 })
+
+#define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type, major, minor, access)\
+({   \
+   int __ret = 0;\
+   if (cgroup_bpf_enabled)   \
+   __ret = __cgroup_bpf_check_dev_permission(type, major, minor, \
+ access, \
+ BPF_CGROUP_DEVICE); \
+ \
+   __ret;\
+})
 #else
 
 struct cgroup_bpf {};
@@ -121,6 +135,7 @@ static inline int cgroup_bpf_inherit(struct cgroup *cgrp) { 
return 0; }
 #define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk,skb) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET_SOCK(sk) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type,major,minor,access) ({ 0; })
 
 #endif /* CONFIG_CGROUP_BPF */
 
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 36418ad43245..963a97ee4b7c 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -18,6 +18,9 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_KPROBE, kprobe)
 BPF_PROG_TYPE(BPF_PROG_TYPE_TRACEPOINT, tracepoint)
 BPF_PROG_TYPE(BPF_PROG_TYPE_PERF_EVENT, perf_event)
 #endif
+#ifdef CONFIG_CGROUP_BPF
+BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev)
+#endif
 
 BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)
diff --git a/include/linux/device_cgroup.h b/include/linux/device_cgroup.h
index 1e42d33accbf..90245a70d940 100644
--- a/include/linux/device_cgroup.h
+++ b/include/linux/device_cgroup.h
@@ -1,4 +1,5 @@
 #include 
+#include 
 
 #define DEVCG_ACC_MKNOD 1
 #define DEVCG_ACC_READ  2
@@ -18,10 +19,15 @@ static inline int __devcgroup_check_permission(short type, 
u32 major, u32 minor,
 { return 0; }
 #endif
 
-#ifdef CONFIG_CGROUP_DEVICE
+#if defined(CONFIG_CGROUP_DEVICE) || defined(CONFIG_CGROUP_BPF)
 static inline int devcgroup_check_permission(short type, u32 major, u32 minor,
 short access)
 {
+   int rc = BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type, major, minor, access);
+
+   if (rc)
+   return -EPERM;
+
return __devcgroup_check_permission(type, major, minor, access);
 }
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0b7b54d898bd..ea905863a033 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -131,6 +131,7 @@ enum bpf_prog_type {

[PATCH v3 net-next 2/5] device_cgroup: prepare code for bpf-based device controller

2017-11-02 Thread Roman Gushchin

This is a non-functional change that prepares the device cgroup code
for adding an eBPF-based controller for cgroup v2.

The patch performs the following changes:
1) __devcgroup_inode_permission() and devcgroup_inode_mknod()
   are moved to device_cgroup.h and converted into static inline functions.
2) __devcgroup_check_permission() is exported.
3) A devcgroup_check_permission() wrapper is introduced, to be used
   by both the existing and the new bpf-based implementations.

Signed-off-by: Roman Gushchin <g...@fb.com>
Acked-by: Tejun Heo <t...@kernel.org>
Acked-by: Alexei Starovoitov <a...@kernel.org>
---
 include/linux/device_cgroup.h | 61 ---
 security/device_cgroup.c  | 47 ++---
 2 files changed, 59 insertions(+), 49 deletions(-)

diff --git a/include/linux/device_cgroup.h b/include/linux/device_cgroup.h
index 8b64221b432b..1e42d33accbf 100644
--- a/include/linux/device_cgroup.h
+++ b/include/linux/device_cgroup.h
@@ -1,16 +1,69 @@
 #include 
 
+#define DEVCG_ACC_MKNOD 1
+#define DEVCG_ACC_READ  2
+#define DEVCG_ACC_WRITE 4
+#define DEVCG_ACC_MASK (DEVCG_ACC_MKNOD | DEVCG_ACC_READ | DEVCG_ACC_WRITE)
+
+#define DEVCG_DEV_BLOCK 1
+#define DEVCG_DEV_CHAR  2
+#define DEVCG_DEV_ALL   4  /* this represents all devices */
+
+#ifdef CONFIG_CGROUP_DEVICE
+extern int __devcgroup_check_permission(short type, u32 major, u32 minor,
+   short access);
+#else
+static inline int __devcgroup_check_permission(short type, u32 major, u32 
minor,
+  short access)
+{ return 0; }
+#endif
+
 #ifdef CONFIG_CGROUP_DEVICE
-extern int __devcgroup_inode_permission(struct inode *inode, int mask);
-extern int devcgroup_inode_mknod(int mode, dev_t dev);
+static inline int devcgroup_check_permission(short type, u32 major, u32 minor,
+short access)
+{
+   return __devcgroup_check_permission(type, major, minor, access);
+}
+
 static inline int devcgroup_inode_permission(struct inode *inode, int mask)
 {
+   short type, access = 0;
+
if (likely(!inode->i_rdev))
return 0;
-   if (!S_ISBLK(inode->i_mode) && !S_ISCHR(inode->i_mode))
+
+   if (S_ISBLK(inode->i_mode))
+   type = DEVCG_DEV_BLOCK;
+   else if (S_ISCHR(inode->i_mode))
+   type = DEVCG_DEV_CHAR;
+   else
return 0;
-   return __devcgroup_inode_permission(inode, mask);
+
+   if (mask & MAY_WRITE)
+   access |= DEVCG_ACC_WRITE;
+   if (mask & MAY_READ)
+   access |= DEVCG_ACC_READ;
+
+   return devcgroup_check_permission(type, imajor(inode), iminor(inode),
+ access);
 }
+
+static inline int devcgroup_inode_mknod(int mode, dev_t dev)
+{
+   short type;
+
+   if (!S_ISBLK(mode) && !S_ISCHR(mode))
+   return 0;
+
+   if (S_ISBLK(mode))
+   type = DEVCG_DEV_BLOCK;
+   else
+   type = DEVCG_DEV_CHAR;
+
+   return devcgroup_check_permission(type, MAJOR(dev), MINOR(dev),
+ DEVCG_ACC_MKNOD);
+}
+
 #else
 static inline int devcgroup_inode_permission(struct inode *inode, int mask)
 { return 0; }
diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index 76cc0cbbb10d..c54692208dcb 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -14,15 +14,6 @@
 #include 
 #include 
 
-#define DEVCG_ACC_MKNOD 1
-#define DEVCG_ACC_READ  2
-#define DEVCG_ACC_WRITE 4
-#define DEVCG_ACC_MASK (DEVCG_ACC_MKNOD | DEVCG_ACC_READ | DEVCG_ACC_WRITE)
-
-#define DEVCG_DEV_BLOCK 1
-#define DEVCG_DEV_CHAR  2
-#define DEVCG_DEV_ALL   4  /* this represents all devices */
-
 static DEFINE_MUTEX(devcgroup_mutex);
 
 enum devcg_behavior {
@@ -809,8 +800,8 @@ struct cgroup_subsys devices_cgrp_subsys = {
  *
  * returns 0 on success, -EPERM case the operation is not permitted
  */
-static int __devcgroup_check_permission(short type, u32 major, u32 minor,
-   short access)
+int __devcgroup_check_permission(short type, u32 major, u32 minor,
+short access)
 {
struct dev_cgroup *dev_cgroup;
bool rc;
@@ -832,37 +823,3 @@ static int __devcgroup_check_permission(short type, u32 
major, u32 minor,
 
return 0;
 }
-
-int __devcgroup_inode_permission(struct inode *inode, int mask)
-{
-   short type, access = 0;
-
-   if (S_ISBLK(inode->i_mode))
-   type = DEVCG_DEV_BLOCK;
-   if (S_ISCHR(inode->i_mode))
-   type = DEVCG_DEV_CHAR;
-   if (mask & MAY_WRITE)
-   access |= DEVCG_ACC_WRITE;
-   if (mask & MAY_READ)
-   access |= DEVCG_ACC_READ;
-
-   return __devcgroup_check_permission(type, imajor(inode), iminor(inode),
-   access)

[PATCH v3 net-next 4/5] bpf: move cgroup_helpers from samples/bpf/ to tools/testing/selftesting/bpf/

2017-11-02 Thread Roman Gushchin

The purpose of this move is to use these files in bpf tests.

Signed-off-by: Roman Gushchin <g...@fb.com>
Acked-by: Alexei Starovoitov <a...@kernel.org>
Acked-by: Tejun Heo <t...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
---
 samples/bpf/Makefile  | 5 +++--
 tools/testing/selftests/bpf/Makefile  | 2 +-
 {samples => tools/testing/selftests}/bpf/cgroup_helpers.c | 0
 {samples => tools/testing/selftests}/bpf/cgroup_helpers.h | 0
 4 files changed, 4 insertions(+), 3 deletions(-)
 rename {samples => tools/testing/selftests}/bpf/cgroup_helpers.c (100%)
 rename {samples => tools/testing/selftests}/bpf/cgroup_helpers.h (100%)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index ea2b9e6135f3..adb1e5dba1ea 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -45,6 +45,7 @@ hostprogs-y += syscall_tp
 
 # Libbpf dependencies
 LIBBPF := ../../tools/lib/bpf/bpf.o
+CGROUP_HELPERS := ../../tools/testing/selftests/bpf/cgroup_helpers.o
 
 test_lru_dist-objs := test_lru_dist.o $(LIBBPF)
 sock_example-objs := sock_example.o $(LIBBPF)
@@ -68,13 +69,13 @@ map_perf_test-objs := bpf_load.o $(LIBBPF) 
map_perf_test_user.o
 test_overhead-objs := bpf_load.o $(LIBBPF) test_overhead_user.o
 test_cgrp2_array_pin-objs := $(LIBBPF) test_cgrp2_array_pin.o
 test_cgrp2_attach-objs := $(LIBBPF) test_cgrp2_attach.o
-test_cgrp2_attach2-objs := $(LIBBPF) test_cgrp2_attach2.o cgroup_helpers.o
+test_cgrp2_attach2-objs := $(LIBBPF) test_cgrp2_attach2.o $(CGROUP_HELPERS)
 test_cgrp2_sock-objs := $(LIBBPF) test_cgrp2_sock.o
 test_cgrp2_sock2-objs := bpf_load.o $(LIBBPF) test_cgrp2_sock2.o
 xdp1-objs := bpf_load.o $(LIBBPF) xdp1_user.o
 # reuse xdp1 source intentionally
 xdp2-objs := bpf_load.o $(LIBBPF) xdp1_user.o
-test_current_task_under_cgroup-objs := bpf_load.o $(LIBBPF) cgroup_helpers.o \
+test_current_task_under_cgroup-objs := bpf_load.o $(LIBBPF) $(CGROUP_HELPERS) \
   test_current_task_under_cgroup_user.o
 trace_event-objs := bpf_load.o $(LIBBPF) trace_event_user.o
 sampleip-objs := bpf_load.o $(LIBBPF) sampleip_user.o
diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index 2e7880ea0add..36c34f0218a3 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -22,7 +22,7 @@ TEST_PROGS := test_kmod.sh test_xdp_redirect.sh 
test_xdp_meta.sh
 
 include ../lib.mk
 
-BPFOBJ := $(OUTPUT)/libbpf.a
+BPFOBJ := $(OUTPUT)/libbpf.a $(OUTPUT)/cgroup_helpers.c
 
 $(TEST_GEN_PROGS): $(BPFOBJ)
 
diff --git a/samples/bpf/cgroup_helpers.c 
b/tools/testing/selftests/bpf/cgroup_helpers.c
similarity index 100%
rename from samples/bpf/cgroup_helpers.c
rename to tools/testing/selftests/bpf/cgroup_helpers.c
diff --git a/samples/bpf/cgroup_helpers.h 
b/tools/testing/selftests/bpf/cgroup_helpers.h
similarity index 100%
rename from samples/bpf/cgroup_helpers.h
rename to tools/testing/selftests/bpf/cgroup_helpers.h
-- 
2.13.6

[PATCH v3 net-next 1/5] device_cgroup: add DEVCG_ prefix to ACC_* and DEV_* constants

2017-11-02 Thread Roman Gushchin

Rename device type and access type constants defined in
security/device_cgroup.c by adding the DEVCG_ prefix.

The reason behind this renaming is to make them global namespace
friendly, as they will be moved to the corresponding header file
by following patches.

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: David S. Miller <da...@davemloft.net>
Cc: Tejun Heo <t...@kernel.org>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
---
 security/device_cgroup.c | 72 
 1 file changed, 36 insertions(+), 36 deletions(-)

diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index 03c1652c9a1f..76cc0cbbb10d 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -14,14 +14,14 @@
 #include 
 #include 
 
-#define ACC_MKNOD 1
-#define ACC_READ  2
-#define ACC_WRITE 4
-#define ACC_MASK (ACC_MKNOD | ACC_READ | ACC_WRITE)
+#define DEVCG_ACC_MKNOD 1
+#define DEVCG_ACC_READ  2
+#define DEVCG_ACC_WRITE 4
+#define DEVCG_ACC_MASK (DEVCG_ACC_MKNOD | DEVCG_ACC_READ | DEVCG_ACC_WRITE)
 
-#define DEV_BLOCK 1
-#define DEV_CHAR  2
-#define DEV_ALL   4  /* this represents all devices */
+#define DEVCG_DEV_BLOCK 1
+#define DEVCG_DEV_CHAR  2
+#define DEVCG_DEV_ALL   4  /* this represents all devices */
 
 static DEFINE_MUTEX(devcgroup_mutex);
 
@@ -245,21 +245,21 @@ static void set_access(char *acc, short access)
 {
int idx = 0;
memset(acc, 0, ACCLEN);
-   if (access & ACC_READ)
+   if (access & DEVCG_ACC_READ)
acc[idx++] = 'r';
-   if (access & ACC_WRITE)
+   if (access & DEVCG_ACC_WRITE)
acc[idx++] = 'w';
-   if (access & ACC_MKNOD)
+   if (access & DEVCG_ACC_MKNOD)
acc[idx++] = 'm';
 }
 
 static char type_to_char(short type)
 {
-   if (type == DEV_ALL)
+   if (type == DEVCG_DEV_ALL)
return 'a';
-   if (type == DEV_CHAR)
+   if (type == DEVCG_DEV_CHAR)
return 'c';
-   if (type == DEV_BLOCK)
+   if (type == DEVCG_DEV_BLOCK)
return 'b';
return 'X';
 }
@@ -286,10 +286,10 @@ static int devcgroup_seq_show(struct seq_file *m, void *v)
 * This way, the file remains as a "whitelist of devices"
 */
if (devcgroup->behavior == DEVCG_DEFAULT_ALLOW) {
-   set_access(acc, ACC_MASK);
+   set_access(acc, DEVCG_ACC_MASK);
set_majmin(maj, ~0);
set_majmin(min, ~0);
-   seq_printf(m, "%c %s:%s %s\n", type_to_char(DEV_ALL),
+   seq_printf(m, "%c %s:%s %s\n", type_to_char(DEVCG_DEV_ALL),
   maj, min, acc);
} else {
list_for_each_entry_rcu(ex, &devcgroup->exceptions, list) {
@@ -308,10 +308,10 @@ static int devcgroup_seq_show(struct seq_file *m, void *v)
 /**
  * match_exception - iterates the exception list trying to find a complete match
  * @exceptions: list of exceptions
- * @type: device type (DEV_BLOCK or DEV_CHAR)
+ * @type: device type (DEVCG_DEV_BLOCK or DEVCG_DEV_CHAR)
  * @major: device file major number, ~0 to match all
  * @minor: device file minor number, ~0 to match all
- * @access: permission mask (ACC_READ, ACC_WRITE, ACC_MKNOD)
+ * @access: permission mask (DEVCG_ACC_READ, DEVCG_ACC_WRITE, DEVCG_ACC_MKNOD)
  *
  * It is considered a complete match if an exception is found that will
  * contain the entire range of provided parameters.
@@ -324,9 +324,9 @@ static bool match_exception(struct list_head *exceptions, short type,
struct dev_exception_item *ex;
 
list_for_each_entry_rcu(ex, exceptions, list) {
-   if ((type & DEV_BLOCK) && !(ex->type & DEV_BLOCK))
+   if ((type & DEVCG_DEV_BLOCK) && !(ex->type & DEVCG_DEV_BLOCK))
continue;
-   if ((type & DEV_CHAR) && !(ex->type & DEV_CHAR))
+   if ((type & DEVCG_DEV_CHAR) && !(ex->type & DEVCG_DEV_CHAR))
continue;
if (ex->major != ~0 && ex->major != major)
continue;
@@ -343,10 +343,10 @@ static bool match_exception(struct list_head *exceptions, short type,
 /**
  * match_exception_partial - iterates the exception list trying to find a partial match
  * @exceptions: list of exceptions
- * @type: device type (DEV_BLOCK or DEV_CHAR)
+ * @type: device type (DEVCG_DEV_BLOCK or DEVCG_DEV_CHAR)
  * @major: device file major number, ~0 to match all
  * @minor: device file minor number, ~0 to match all
- * @access: permission mask (ACC_READ, ACC_WRITE, ACC_MKNOD)
+ * @access: permission mask (DEVCG_ACC_READ, DEVCG_ACC_WRITE, DEVCG_ACC_MKNOD)
  *
  * It is considered a partial match if an exception's range is found to
  * contain *any* of the device

[PATCH v3 net-next 5/5] selftests/bpf: add a test for device cgroup controller

2017-11-02 Thread Roman Gushchin

Add a test for device cgroup controller.

The test loads a simple bpf program which logs all
device access attempts using trace_printk() and forbids
all operations except operations with /dev/zero and
/dev/urandom.

Then the test creates and joins a test cgroup, and attaches
the bpf program to it.

Then it tries to perform some simple device operations
and checks the result:

  create /dev/null (should fail)
  create /dev/zero (should pass)
  copy data from /dev/urandom to /dev/zero (should pass)
  copy data from /dev/urandom to /dev/full (should fail)
  copy data from /dev/random to /dev/zero (should fail)

Signed-off-by: Roman Gushchin <g...@fb.com>
Acked-by: Alexei Starovoitov <a...@kernel.org>
Acked-by: Tejun Heo <t...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
---
 tools/testing/selftests/bpf/Makefile  |  4 +-
 tools/testing/selftests/bpf/dev_cgroup.c  | 60 +
 tools/testing/selftests/bpf/test_dev_cgroup.c | 93 +++
 3 files changed, 155 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/dev_cgroup.c
 create mode 100644 tools/testing/selftests/bpf/test_dev_cgroup.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 36c34f0218a3..64ba3684a4f4 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -12,11 +12,11 @@ CFLAGS += -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(GENDIR) 
$(GENFLAGS) -I../../../i
 LDLIBS += -lcap -lelf
 
 TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \
-   test_align test_verifier_log
+   test_align test_verifier_log test_dev_cgroup
 
 TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test_obj_id.o \
test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o sockmap_parse_prog.o \
-   sockmap_verdict_prog.o
+   sockmap_verdict_prog.o dev_cgroup.o
 
 TEST_PROGS := test_kmod.sh test_xdp_redirect.sh test_xdp_meta.sh
 
diff --git a/tools/testing/selftests/bpf/dev_cgroup.c b/tools/testing/selftests/bpf/dev_cgroup.c
new file mode 100644
index ..ce41a3475f27
--- /dev/null
+++ b/tools/testing/selftests/bpf/dev_cgroup.c
@@ -0,0 +1,60 @@
+/* Copyright (c) 2017 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include "bpf_helpers.h"
+
+SEC("cgroup/dev")
+int bpf_prog1(struct bpf_cgroup_dev_ctx *ctx)
+{
+   short type = ctx->access_type & 0xFFFF;
+#ifdef DEBUG
+   short access = ctx->access_type >> 16;
+   char fmt[] = "  %d:%d\n";
+
+   switch (type) {
+   case BPF_DEVCG_DEV_BLOCK:
+   fmt[0] = 'b';
+   break;
+   case BPF_DEVCG_DEV_CHAR:
+   fmt[0] = 'c';
+   break;
+   default:
+   fmt[0] = '?';
+   break;
+   }
+
+   if (access & BPF_DEVCG_ACC_READ)
+   fmt[8] = 'r';
+
+   if (access & BPF_DEVCG_ACC_WRITE)
+   fmt[9] = 'w';
+
+   if (access & BPF_DEVCG_ACC_MKNOD)
+   fmt[10] = 'm';
+
+   bpf_trace_printk(fmt, sizeof(fmt), ctx->major, ctx->minor);
+#endif
+
+   /* Allow access to /dev/zero and /dev/urandom.
+* Forbid everything else.
+*/
+   if (ctx->major != 1 || type != BPF_DEVCG_DEV_CHAR)
+   return 0;
+
+   switch (ctx->minor) {
+   case 5: /* 1:5 /dev/zero */
+   case 9: /* 1:9 /dev/urandom */
+   return 1;
+   }
+
+   return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+__u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/tools/testing/selftests/bpf/test_dev_cgroup.c b/tools/testing/selftests/bpf/test_dev_cgroup.c
new file mode 100644
index ..02c85d6c89b0
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_dev_cgroup.c
@@ -0,0 +1,93 @@
+/* Copyright (c) 2017 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "cgroup_helpers.h"
+
+#define DEV_CGROUP_PROG "./dev_cgroup.o"
+
+#define TEST_CGROUP "test-bpf-based-device-cgroup/"
+
+int main(int argc, char **argv)
+{
+   struct bpf_object *obj;
+   int error = EXIT_FAILURE;
+   int prog_fd, cgroup_fd;
+   __u32 prog_cnt;
+
+   if (bpf_prog_load(DEV_CGROUP_PROG, BPF_PROG_TYPE_CGROUP_DEVICE,
+ &obj, &prog_fd)) {
+   printf("Failed to load DEV_CGROUP program\n");
+

Re: [PATCH v2 net-next 3/5] bpf, cgroup: implement eBPF-based device controller for cgroup v2

2017-11-02 Thread Roman Gushchin

On Thu, Nov 02, 2017 at 08:11:07AM -0700, Alexei Starovoitov wrote:
> On 11/2/17 7:54 AM, Roman Gushchin wrote:
> > +#define DEV_BPF_ACC_MKNOD  (1ULL << 0)
> > +#define DEV_BPF_ACC_READ   (1ULL << 1)
> > +#define DEV_BPF_ACC_WRITE  (1ULL << 2)
> > +
> > +#define DEV_BPF_DEV_BLOCK  (1ULL << 0)
> > +#define DEV_BPF_DEV_CHAR   (1ULL << 1)
> > +
> 
> all macros in bpf.h start with BPF_
> To be consistent with the rest can you rename above to BPF_DEVCG_.. ?

Sure. Will post v3.

[PATCH v2 net-next 0/5] eBPF-based device cgroup controller

2017-11-02 Thread Roman Gushchin

This patchset introduces an eBPF-based device controller for cgroup v2.

Patches (1) and (2) are preparatory work required to share some code
  with the existing device controller implementation.
Patch (3) is the main patch, which introduces a new bpf prog type
  and all necessary infrastructure.
Patch (4) moves cgroup_helpers.c/h so that patch (5) can use them.
Patch (5) implements an example eBPF program which controls access
  to device files, and a corresponding userspace test.

v2:
  Added patch (1).

v1:
  https://lkml.org/lkml/2017/11/1/363

Roman Gushchin (5):
  device_cgroup: add DEVCG_ prefix to ACC_* and DEV_* constants
  device_cgroup: prepare code for bpf-based device controller
  bpf, cgroup: implement eBPF-based device controller for cgroup v2
  bpf: move cgroup_helpers from samples/bpf/ to
tools/testing/selftesting/bpf/
  selftests/bpf: add a test for device cgroup controller

 include/linux/bpf-cgroup.h | 15 
 include/linux/bpf_types.h  |  3 +
 include/linux/device_cgroup.h  | 67 +++-
 include/uapi/linux/bpf.h   | 15 
 kernel/bpf/cgroup.c| 67 
 kernel/bpf/syscall.c   |  7 ++
 kernel/bpf/verifier.c  |  1 +
 samples/bpf/Makefile   |  5 +-
 security/device_cgroup.c   | 91 ++---
 tools/include/uapi/linux/bpf.h | 15 
 tools/testing/selftests/bpf/Makefile   |  6 +-
 .../testing/selftests}/bpf/cgroup_helpers.c|  0
 .../testing/selftests}/bpf/cgroup_helpers.h|  0
 tools/testing/selftests/bpf/dev_cgroup.c   | 60 ++
 tools/testing/selftests/bpf/test_dev_cgroup.c  | 93 ++
 15 files changed, 369 insertions(+), 76 deletions(-)
 rename {samples => tools/testing/selftests}/bpf/cgroup_helpers.c (100%)
 rename {samples => tools/testing/selftests}/bpf/cgroup_helpers.h (100%)
 create mode 100644 tools/testing/selftests/bpf/dev_cgroup.c
 create mode 100644 tools/testing/selftests/bpf/test_dev_cgroup.c

-- 
2.13.6

[PATCH v2 net-next 2/5] device_cgroup: prepare code for bpf-based device controller

2017-11-02 Thread Roman Gushchin

This is a non-functional change to prepare the device cgroup code
for adding an eBPF-based controller for cgroup v2.

The patch performs the following changes:
1) __devcgroup_inode_permission() and devcgroup_inode_mknod()
   are moved to device_cgroup.h and converted into static inline functions.
2) __devcgroup_check_permission() is exported.
3) A devcgroup_check_permission() wrapper is introduced to be used
   by both the existing and the new bpf-based implementations.

Signed-off-by: Roman Gushchin <g...@fb.com>
Acked-by: Tejun Heo <t...@kernel.org>
Acked-by: Alexei Starovoitov <a...@kernel.org>
---
 include/linux/device_cgroup.h | 61 ---
 security/device_cgroup.c  | 47 ++---
 2 files changed, 59 insertions(+), 49 deletions(-)

diff --git a/include/linux/device_cgroup.h b/include/linux/device_cgroup.h
index 8b64221b432b..1e42d33accbf 100644
--- a/include/linux/device_cgroup.h
+++ b/include/linux/device_cgroup.h
@@ -1,16 +1,69 @@
 #include 
 
+#define DEVCG_ACC_MKNOD 1
+#define DEVCG_ACC_READ  2
+#define DEVCG_ACC_WRITE 4
+#define DEVCG_ACC_MASK (DEVCG_ACC_MKNOD | DEVCG_ACC_READ | DEVCG_ACC_WRITE)
+
+#define DEVCG_DEV_BLOCK 1
+#define DEVCG_DEV_CHAR  2
+#define DEVCG_DEV_ALL   4  /* this represents all devices */
+
+#ifdef CONFIG_CGROUP_DEVICE
+extern int __devcgroup_check_permission(short type, u32 major, u32 minor,
+   short access);
+#else
+static inline int __devcgroup_check_permission(short type, u32 major, u32 minor,
+  short access)
+{ return 0; }
+#endif
+
 #ifdef CONFIG_CGROUP_DEVICE
-extern int __devcgroup_inode_permission(struct inode *inode, int mask);
-extern int devcgroup_inode_mknod(int mode, dev_t dev);
+static inline int devcgroup_check_permission(short type, u32 major, u32 minor,
+short access)
+{
+   return __devcgroup_check_permission(type, major, minor, access);
+}
+
 static inline int devcgroup_inode_permission(struct inode *inode, int mask)
 {
+   short type, access = 0;
+
if (likely(!inode->i_rdev))
return 0;
-   if (!S_ISBLK(inode->i_mode) && !S_ISCHR(inode->i_mode))
+
+   if (S_ISBLK(inode->i_mode))
+   type = DEVCG_DEV_BLOCK;
+   else if (S_ISCHR(inode->i_mode))
+   type = DEVCG_DEV_CHAR;
+   else
return 0;
-   return __devcgroup_inode_permission(inode, mask);
+
+   if (mask & MAY_WRITE)
+   access |= DEVCG_ACC_WRITE;
+   if (mask & MAY_READ)
+   access |= DEVCG_ACC_READ;
+
+   return devcgroup_check_permission(type, imajor(inode), iminor(inode),
+ access);
 }
+
+static inline int devcgroup_inode_mknod(int mode, dev_t dev)
+{
+   short type;
+
+   if (!S_ISBLK(mode) && !S_ISCHR(mode))
+   return 0;
+
+   if (S_ISBLK(mode))
+   type = DEVCG_DEV_BLOCK;
+   else
+   type = DEVCG_DEV_CHAR;
+
+   return devcgroup_check_permission(type, MAJOR(dev), MINOR(dev),
+ DEVCG_ACC_MKNOD);
+}
+
 #else
 static inline int devcgroup_inode_permission(struct inode *inode, int mask)
 { return 0; }
diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index 76cc0cbbb10d..c54692208dcb 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -14,15 +14,6 @@
 #include 
 #include 
 
-#define DEVCG_ACC_MKNOD 1
-#define DEVCG_ACC_READ  2
-#define DEVCG_ACC_WRITE 4
-#define DEVCG_ACC_MASK (DEVCG_ACC_MKNOD | DEVCG_ACC_READ | DEVCG_ACC_WRITE)
-
-#define DEVCG_DEV_BLOCK 1
-#define DEVCG_DEV_CHAR  2
-#define DEVCG_DEV_ALL   4  /* this represents all devices */
-
 static DEFINE_MUTEX(devcgroup_mutex);
 
 enum devcg_behavior {
@@ -809,8 +800,8 @@ struct cgroup_subsys devices_cgrp_subsys = {
  *
  * returns 0 on success, -EPERM case the operation is not permitted
  */
-static int __devcgroup_check_permission(short type, u32 major, u32 minor,
-   short access)
+int __devcgroup_check_permission(short type, u32 major, u32 minor,
+short access)
 {
struct dev_cgroup *dev_cgroup;
bool rc;
@@ -832,37 +823,3 @@ static int __devcgroup_check_permission(short type, u32 major, u32 minor,
 
return 0;
 }
-
-int __devcgroup_inode_permission(struct inode *inode, int mask)
-{
-   short type, access = 0;
-
-   if (S_ISBLK(inode->i_mode))
-   type = DEVCG_DEV_BLOCK;
-   if (S_ISCHR(inode->i_mode))
-   type = DEVCG_DEV_CHAR;
-   if (mask & MAY_WRITE)
-   access |= DEVCG_ACC_WRITE;
-   if (mask & MAY_READ)
-   access |= DEVCG_ACC_READ;
-
-   return __devcgroup_check_permission(type, imajor(inode), iminor(inode),
-   access)

[PATCH v2 net-next 3/5] bpf, cgroup: implement eBPF-based device controller for cgroup v2

2017-11-02 Thread Roman Gushchin

Cgroup v2 lacks the device controller provided by cgroup v1.
This patch adds a new eBPF program type which, in combination
with the previously added ability to attach multiple eBPF programs
to a cgroup, provides similar functionality with some
additional flexibility.

This patch introduces a BPF_PROG_TYPE_CGROUP_DEVICE program type.
A program takes major and minor device numbers, device type
(block/character) and access type (mknod/read/write) as parameters
and returns an integer which defines if the operation should be
allowed or terminated with -EPERM.

Signed-off-by: Roman Gushchin <g...@fb.com>
Acked-by: Alexei Starovoitov <a...@kernel.org>
Acked-by: Tejun Heo <t...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
---
 include/linux/bpf-cgroup.h | 15 ++
 include/linux/bpf_types.h  |  3 ++
 include/linux/device_cgroup.h  |  8 -
 include/uapi/linux/bpf.h   | 15 ++
 kernel/bpf/cgroup.c| 67 ++
 kernel/bpf/syscall.c   |  7 +
 kernel/bpf/verifier.c  |  1 +
 tools/include/uapi/linux/bpf.h | 15 ++
 8 files changed, 130 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 359b6f5d3d90..d77cefb3fe99 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -66,6 +66,9 @@ int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
 struct bpf_sock_ops_kern *sock_ops,
 enum bpf_attach_type type);
 
+int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
+ short access, enum bpf_attach_type type);
+
 /* Wrappers for __cgroup_bpf_run_filter_skb() guarded by cgroup_bpf_enabled. */
 #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk, skb)\
 ({   \
@@ -111,6 +114,17 @@ int __cgroup_bpf_run_filter_sock_ops(struct sock *sk,
}  \
__ret; \
 })
+
+#define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type, major, minor, access)\
+({   \
+   int __ret = 0;\
+   if (cgroup_bpf_enabled)   \
+   __ret = __cgroup_bpf_check_dev_permission(type, major, minor, \
+ access, \
+ BPF_CGROUP_DEVICE); \
+ \
+   __ret;\
+})
 #else
 
 struct cgroup_bpf {};
@@ -121,6 +135,7 @@ static inline int cgroup_bpf_inherit(struct cgroup *cgrp) { return 0; }
 #define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk,skb) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET_SOCK(sk) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type,major,minor,access) ({ 0; })
 
 #endif /* CONFIG_CGROUP_BPF */
 
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 36418ad43245..963a97ee4b7c 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -18,6 +18,9 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_KPROBE, kprobe)
 BPF_PROG_TYPE(BPF_PROG_TYPE_TRACEPOINT, tracepoint)
 BPF_PROG_TYPE(BPF_PROG_TYPE_PERF_EVENT, perf_event)
 #endif
+#ifdef CONFIG_CGROUP_BPF
+BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev)
+#endif
 
 BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)
diff --git a/include/linux/device_cgroup.h b/include/linux/device_cgroup.h
index 1e42d33accbf..90245a70d940 100644
--- a/include/linux/device_cgroup.h
+++ b/include/linux/device_cgroup.h
@@ -1,4 +1,5 @@
 #include 
+#include 
 
 #define DEVCG_ACC_MKNOD 1
 #define DEVCG_ACC_READ  2
@@ -18,10 +19,15 @@ static inline int __devcgroup_check_permission(short type, u32 major, u32 minor,
 { return 0; }
 #endif
 
-#ifdef CONFIG_CGROUP_DEVICE
+#if defined(CONFIG_CGROUP_DEVICE) || defined(CONFIG_CGROUP_BPF)
 static inline int devcgroup_check_permission(short type, u32 major, u32 minor,
 short access)
 {
+   int rc = BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(type, major, minor, access);
+
+   if (rc)
+   return -EPERM;
+
return __devcgroup_check_permission(type, major, minor, access);
 }
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0b7b54d898bd..4deb3c9acbb2 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -131,6 +131,7 @@ enum bpf_prog_type {

[PATCH v2 net-next 5/5] selftests/bpf: add a test for device cgroup controller

2017-11-02 Thread Roman Gushchin

Add a test for device cgroup controller.

The test loads a simple bpf program which logs all
device access attempts using trace_printk() and forbids
all operations except operations with /dev/zero and
/dev/urandom.

Then the test creates and joins a test cgroup, and attaches
the bpf program to it.

Then it tries to perform some simple device operations
and checks the result:

  create /dev/null (should fail)
  create /dev/zero (should pass)
  copy data from /dev/urandom to /dev/zero (should pass)
  copy data from /dev/urandom to /dev/full (should fail)
  copy data from /dev/random to /dev/zero (should fail)

Signed-off-by: Roman Gushchin <g...@fb.com>
Acked-by: Alexei Starovoitov <a...@kernel.org>
Acked-by: Tejun Heo <t...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
---
 tools/testing/selftests/bpf/Makefile  |  4 +-
 tools/testing/selftests/bpf/dev_cgroup.c  | 60 +
 tools/testing/selftests/bpf/test_dev_cgroup.c | 93 +++
 3 files changed, 155 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/dev_cgroup.c
 create mode 100644 tools/testing/selftests/bpf/test_dev_cgroup.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 36c34f0218a3..64ba3684a4f4 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -12,11 +12,11 @@ CFLAGS += -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(GENDIR) 
$(GENFLAGS) -I../../../i
 LDLIBS += -lcap -lelf
 
 TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \
-   test_align test_verifier_log
+   test_align test_verifier_log test_dev_cgroup
 
 TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test_obj_id.o \
test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o sockmap_parse_prog.o \
-   sockmap_verdict_prog.o
+   sockmap_verdict_prog.o dev_cgroup.o
 
 TEST_PROGS := test_kmod.sh test_xdp_redirect.sh test_xdp_meta.sh
 
diff --git a/tools/testing/selftests/bpf/dev_cgroup.c b/tools/testing/selftests/bpf/dev_cgroup.c
new file mode 100644
index ..f15d5befa099
--- /dev/null
+++ b/tools/testing/selftests/bpf/dev_cgroup.c
@@ -0,0 +1,60 @@
+/* Copyright (c) 2017 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include "bpf_helpers.h"
+
+SEC("cgroup/dev")
+int bpf_prog1(struct bpf_cgroup_dev_ctx *ctx)
+{
+   short type = ctx->access_type & 0xFFFF;
+#ifdef DEBUG
+   short access = ctx->access_type >> 16;
+   char fmt[] = "  %d:%d\n";
+
+   switch (type) {
+   case DEV_BPF_DEV_BLOCK:
+   fmt[0] = 'b';
+   break;
+   case DEV_BPF_DEV_CHAR:
+   fmt[0] = 'c';
+   break;
+   default:
+   fmt[0] = '?';
+   break;
+   }
+
+   if (access & DEV_BPF_ACC_READ)
+   fmt[8] = 'r';
+
+   if (access & DEV_BPF_ACC_WRITE)
+   fmt[9] = 'w';
+
+   if (access & DEV_BPF_ACC_MKNOD)
+   fmt[10] = 'm';
+
+   bpf_trace_printk(fmt, sizeof(fmt), ctx->major, ctx->minor);
+#endif
+
+   /* Allow access to /dev/zero and /dev/urandom.
+* Forbid everything else.
+*/
+   if (ctx->major != 1 || type != DEV_BPF_DEV_CHAR)
+   return 0;
+
+   switch (ctx->minor) {
+   case 5: /* 1:5 /dev/zero */
+   case 9: /* 1:9 /dev/urandom */
+   return 1;
+   }
+
+   return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+__u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/tools/testing/selftests/bpf/test_dev_cgroup.c b/tools/testing/selftests/bpf/test_dev_cgroup.c
new file mode 100644
index ..02c85d6c89b0
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_dev_cgroup.c
@@ -0,0 +1,93 @@
+/* Copyright (c) 2017 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "cgroup_helpers.h"
+
+#define DEV_CGROUP_PROG "./dev_cgroup.o"
+
+#define TEST_CGROUP "test-bpf-based-device-cgroup/"
+
+int main(int argc, char **argv)
+{
+   struct bpf_object *obj;
+   int error = EXIT_FAILURE;
+   int prog_fd, cgroup_fd;
+   __u32 prog_cnt;
+
+   if (bpf_prog_load(DEV_CGROUP_PROG, BPF_PROG_TYPE_CGROUP_DEVICE,
+ &obj, &prog_fd)) {
+   printf("Failed to load DEV_CGROUP program\n");
+

[PATCH v2 net-next 1/5] device_cgroup: add DEVCG_ prefix to ACC_* and DEV_* constants

2017-11-02 Thread Roman Gushchin

Rename device type and access type constants defined in
security/device_cgroup.c by adding the DEVCG_ prefix.

The reason behind this renaming is to make them global namespace
friendly, as they will be moved to the corresponding header file
by following patches.

Signed-off-by: Roman Gushchin <g...@fb.com>
Cc: David S. Miller <da...@davemloft.net>
Cc: Tejun Heo <t...@kernel.org>
Cc: Alexei Starovoitov <a...@kernel.org>
Cc: Daniel Borkmann <dan...@iogearbox.net>
---
 security/device_cgroup.c | 72 
 1 file changed, 36 insertions(+), 36 deletions(-)

diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index 03c1652c9a1f..76cc0cbbb10d 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -14,14 +14,14 @@
 #include 
 #include 
 
-#define ACC_MKNOD 1
-#define ACC_READ  2
-#define ACC_WRITE 4
-#define ACC_MASK (ACC_MKNOD | ACC_READ | ACC_WRITE)
+#define DEVCG_ACC_MKNOD 1
+#define DEVCG_ACC_READ  2
+#define DEVCG_ACC_WRITE 4
+#define DEVCG_ACC_MASK (DEVCG_ACC_MKNOD | DEVCG_ACC_READ | DEVCG_ACC_WRITE)
 
-#define DEV_BLOCK 1
-#define DEV_CHAR  2
-#define DEV_ALL   4  /* this represents all devices */
+#define DEVCG_DEV_BLOCK 1
+#define DEVCG_DEV_CHAR  2
+#define DEVCG_DEV_ALL   4  /* this represents all devices */
 
 static DEFINE_MUTEX(devcgroup_mutex);
 
@@ -245,21 +245,21 @@ static void set_access(char *acc, short access)
 {
int idx = 0;
memset(acc, 0, ACCLEN);
-   if (access & ACC_READ)
+   if (access & DEVCG_ACC_READ)
acc[idx++] = 'r';
-   if (access & ACC_WRITE)
+   if (access & DEVCG_ACC_WRITE)
acc[idx++] = 'w';
-   if (access & ACC_MKNOD)
+   if (access & DEVCG_ACC_MKNOD)
acc[idx++] = 'm';
 }
 
 static char type_to_char(short type)
 {
-   if (type == DEV_ALL)
+   if (type == DEVCG_DEV_ALL)
return 'a';
-   if (type == DEV_CHAR)
+   if (type == DEVCG_DEV_CHAR)
return 'c';
-   if (type == DEV_BLOCK)
+   if (type == DEVCG_DEV_BLOCK)
return 'b';
return 'X';
 }
@@ -286,10 +286,10 @@ static int devcgroup_seq_show(struct seq_file *m, void *v)
 * This way, the file remains as a "whitelist of devices"
 */
if (devcgroup->behavior == DEVCG_DEFAULT_ALLOW) {
-   set_access(acc, ACC_MASK);
+   set_access(acc, DEVCG_ACC_MASK);
set_majmin(maj, ~0);
set_majmin(min, ~0);
-   seq_printf(m, "%c %s:%s %s\n", type_to_char(DEV_ALL),
+   seq_printf(m, "%c %s:%s %s\n", type_to_char(DEVCG_DEV_ALL),
   maj, min, acc);
} else {
list_for_each_entry_rcu(ex, &devcgroup->exceptions, list) {
@@ -308,10 +308,10 @@ static int devcgroup_seq_show(struct seq_file *m, void *v)
 /**
  * match_exception - iterates the exception list trying to find a complete match
  * @exceptions: list of exceptions
- * @type: device type (DEV_BLOCK or DEV_CHAR)
+ * @type: device type (DEVCG_DEV_BLOCK or DEVCG_DEV_CHAR)
  * @major: device file major number, ~0 to match all
  * @minor: device file minor number, ~0 to match all
- * @access: permission mask (ACC_READ, ACC_WRITE, ACC_MKNOD)
+ * @access: permission mask (DEVCG_ACC_READ, DEVCG_ACC_WRITE, DEVCG_ACC_MKNOD)
  *
  * It is considered a complete match if an exception is found that will
  * contain the entire range of provided parameters.
@@ -324,9 +324,9 @@ static bool match_exception(struct list_head *exceptions, short type,
struct dev_exception_item *ex;
 
list_for_each_entry_rcu(ex, exceptions, list) {
-   if ((type & DEV_BLOCK) && !(ex->type & DEV_BLOCK))
+   if ((type & DEVCG_DEV_BLOCK) && !(ex->type & DEVCG_DEV_BLOCK))
continue;
-   if ((type & DEV_CHAR) && !(ex->type & DEV_CHAR))
+   if ((type & DEVCG_DEV_CHAR) && !(ex->type & DEVCG_DEV_CHAR))
continue;
if (ex->major != ~0 && ex->major != major)
continue;
@@ -343,10 +343,10 @@ static bool match_exception(struct list_head *exceptions, short type,
 /**
  * match_exception_partial - iterates the exception list trying to find a partial match
  * @exceptions: list of exceptions
- * @type: device type (DEV_BLOCK or DEV_CHAR)
+ * @type: device type (DEVCG_DEV_BLOCK or DEVCG_DEV_CHAR)
  * @major: device file major number, ~0 to match all
  * @minor: device file minor number, ~0 to match all
- * @access: permission mask (ACC_READ, ACC_WRITE, ACC_MKNOD)
+ * @access: permission mask (DEVCG_ACC_READ, DEVCG_ACC_WRITE, DEVCG_ACC_MKNOD)
  *
  * It is considered a partial match if an exception's range is found to
  * contain *any* of the device
