Re: [PATCH net-next v2 2/4] cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY

2016-06-23 Thread Martin KaFai Lau
On Thu, Jun 23, 2016 at 11:50:08PM +0200, Daniel Borkmann wrote:
> On 06/23/2016 11:26 PM, Martin KaFai Lau wrote:
> >We are still hatching out how to set this up in production. However, the
> >situation is similar to removing the pinned file.
s/pinned file/pinned cgroup-array/

> I presume you mean removing the last BPF program holding a reference on
> the cgroup array map.
Yes

> (Any user space visibility like struct files given
> from the anon inode and pinnings are tracked via uref, btw, which is
> needed to break possible complex dependencies among tail called programs.)
Yep. Understood on prog_array use case.

Thanks,
-- Martin


Re: [PATCH net-next v2 2/4] cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY

2016-06-23 Thread Daniel Borkmann

On 06/23/2016 11:26 PM, Martin KaFai Lau wrote:

On Thu, Jun 23, 2016 at 11:42:31AM +0200, Daniel Borkmann wrote:

Hi Martin,

[ sorry to jump late in here, on pto currently ]

Thanks for reviewing.


Could you describe a bit more with regards to pinning maps and how this
should interact with cgroups? The two specialized array maps we have (tail
calls, perf events) have fairly complicated semantics for when to clean up
map slots (see commits c9da161c6517ba1, 3b1efb196eee45b2f0c4).

How is this managed with cgroups? Once a cgroup fd is placed into a map and
the user removes the cgroup, will this be prevented due to 'being busy', or
will the cgroup live further as long as a program is running with a cgroup
map entry (but the cgroup itself is not visible from user space in any way
anymore)?

Having a cgroup ptr stored in the bpf_map will not stop the user from
removing the cgroup (by rmdir /mnt/cgroup2/tc/test_cgrp).


Right.


The cgroup ptr stored in the bpf_map holds a refcnt which answer the
second part.


Yep, clear.


The situation is similar to the netfilter usecase in
commit 38c4597e4bf ("netfilter: implement xt_cgroup cgroup2 path match")


I presume it's a valid use case to pin a cgroup map, put fds into it and
remove the pinned file expecting to continue to match on it, right? So
lifetime is really until last prog using a cgroup map somewhere gets removed
(even if not accessible from user space anymore, meaning no prog has fd and
pinned file was removed).

Yes.

We are still hatching out how to set this up in production. However, the
situation is similar to removing the pinned file.


I presume you mean removing the last BPF program holding a reference on
the cgroup array map. (Any user space visibility like struct files given
from the anon inode and pinnings are tracked via uref, btw, which is
needed to break possible complex dependencies among tail called programs.)
But dropping cgroup ref at latest when the last map ref is dropped as you
currently do seems fine. It makes cgroup array maps effectively no different
from plain regular array maps.


We probably will not use tc and pin a bpf_map to do that.  Instead,
one process will setup eveything (e.g. create the cgroup, pouplate the
cgroup map, load the bpf to egress) and then go away.


Yep, that seems a valid case as well, both use cases (pinned and non-pinned)
should be fine with your code then.

Thanks,
Daniel


Re: [PATCH net-next v2 2/4] cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY

2016-06-23 Thread Daniel Borkmann

On 06/23/2016 11:13 PM, Tejun Heo wrote:

Hello,

On Thu, Jun 23, 2016 at 11:42:31AM +0200, Daniel Borkmann wrote:

I presume it's a valid use case to pin a cgroup map, put fds into it and
remove the pinned file expecting to continue to match on it, right? So
lifetime is really until last prog using a cgroup map somewhere gets removed
(even if not accessible from user space anymore, meaning no prog has fd and
pinned file was removed).


Yeap, from what I can see, the cgroup will stay around (even if it
gets deleted) as long as the bpf rule using it is around and that's
completely fine from cgroup side.


Ok, thanks for confirming!


Thanks.


Re: [PATCH net-next v2 2/4] cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY

2016-06-23 Thread Martin KaFai Lau
On Thu, Jun 23, 2016 at 11:42:31AM +0200, Daniel Borkmann wrote:
> Hi Martin,
>
> [ sorry to jump late in here, on pto currently ]
Thanks for reviewing.

> Could you describe a bit more with regards to pinning maps and how this
> should interact with cgroups? The two specialized array maps we have (tail
> calls, perf events) have fairly complicated semantics for when to clean up
> map slots (see commits c9da161c6517ba1, 3b1efb196eee45b2f0c4).
>
> How is this managed with cgroups? Once a cgroup fd is placed into a map and
> the user removes the cgroup, will this be prevented due to 'being busy', or
> will the cgroup live further as long as a program is running with a cgroup
> map entry (but the cgroup itself is not visible from user space in any way
> anymore)?
Having a cgroup ptr stored in the bpf_map will not stop the user from
removing the cgroup (by rmdir /mnt/cgroup2/tc/test_cgrp).

The cgroup ptr stored in the bpf_map holds a refcnt which answer the
second part.

The situation is similar to the netfilter usecase in
commit 38c4597e4bf ("netfilter: implement xt_cgroup cgroup2 path match")

>
> I presume it's a valid use case to pin a cgroup map, put fds into it and
> remove the pinned file expecting to continue to match on it, right? So
> lifetime is really until last prog using a cgroup map somewhere gets removed
> (even if not accessible from user space anymore, meaning no prog has fd and
> pinned file was removed).
Yes.

We are still hatching out how to set this up in production. However, the
situation is similar to removing the pinned file.
We probably will not use tc and pin a bpf_map to do that.  Instead,
one process will setup eveything (e.g. create the cgroup, pouplate the
cgroup map, load the bpf to egress) and then go away.

I don't think we need a prog fd to remove the bpf prog.

>
> I assume that using struct file here doesn't make sense (commit 
> e03e7ee34fdd1c3)
> either, right?
No. I don't think so. We eventually need a cgroup from the 'struct file'.


Re: [PATCH net-next v2 2/4] cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY

2016-06-23 Thread Tejun Heo
Hello,

On Thu, Jun 23, 2016 at 11:42:31AM +0200, Daniel Borkmann wrote:
> I presume it's a valid use case to pin a cgroup map, put fds into it and
> remove the pinned file expecting to continue to match on it, right? So
> lifetime is really until last prog using a cgroup map somewhere gets removed
> (even if not accessible from user space anymore, meaning no prog has fd and
> pinned file was removed).

Yeap, from what I can see, the cgroup will stay around (even if it
gets deleted) as long as the bpf rule using it is around and that's
completely fine from cgroup side.

Thanks.

-- 
tejun


Re: [PATCH net-next v2 2/4] cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY

2016-06-23 Thread Daniel Borkmann

Hi Martin,

[ sorry to jump late in here, on pto currently ]

On 06/22/2016 11:17 PM, Martin KaFai Lau wrote:

Add a BPF_MAP_TYPE_CGROUP_ARRAY and its bpf_map_ops's implementations.
To update an element, the caller is expected to obtain a cgroup2 backed
fd by open(cgroup2_dir) and then update the array with that fd.

Signed-off-by: Martin KaFai Lau 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Tejun Heo 
Acked-by: Alexei Starovoitov 


Could you describe a bit more with regards to pinning maps and how this
should interact with cgroups? The two specialized array maps we have (tail
calls, perf events) have fairly complicated semantics for when to clean up
map slots (see commits c9da161c6517ba1, 3b1efb196eee45b2f0c4).

How is this managed with cgroups? Once a cgroup fd is placed into a map and
the user removes the cgroup, will this be prevented due to 'being busy', or
will the cgroup live further as long as a program is running with a cgroup
map entry (but the cgroup itself is not visible from user space in any way
anymore)?

I presume it's a valid use case to pin a cgroup map, put fds into it and
remove the pinned file expecting to continue to match on it, right? So
lifetime is really until last prog using a cgroup map somewhere gets removed
(even if not accessible from user space anymore, meaning no prog has fd and
pinned file was removed).

I assume that using struct file here doesn't make sense (commit e03e7ee34fdd1c3)
either, right?

[...]

+#ifdef CONFIG_CGROUPS
+static void *cgroup_fd_array_get_ptr(struct bpf_map *map,
+struct file *map_file /* not used */,
+int fd)
+{
+   return cgroup_get_from_fd(fd);
+}
+
+static void cgroup_fd_array_put_ptr(void *ptr)
+{
+   /* cgroup_put free cgrp after a rcu grace period */
+   cgroup_put(ptr);


Yeah, as long as this respects freeing after RCU grace period, it's fine
like this ...


+}
+
+static void cgroup_fd_array_free(struct bpf_map *map)
+{
+   bpf_fd_array_map_clear(map);
+   fd_array_map_free(map);
+}
+
+static const struct bpf_map_ops cgroup_array_ops = {
+   .map_alloc = fd_array_map_alloc,
+   .map_free = cgroup_fd_array_free,
+   .map_get_next_key = array_map_get_next_key,
+   .map_lookup_elem = fd_array_map_lookup_elem,
+   .map_delete_elem = fd_array_map_delete_elem,
+   .map_fd_get_ptr = cgroup_fd_array_get_ptr,
+   .map_fd_put_ptr = cgroup_fd_array_put_ptr,
+};
+
+static struct bpf_map_type_list cgroup_array_type __read_mostly = {
+   .ops = _array_ops,
+   .type = BPF_MAP_TYPE_CGROUP_ARRAY,
+};
+
+static int __init register_cgroup_array_map(void)
+{
+   bpf_register_map_type(_array_type);
+   return 0;
+}
+late_initcall(register_cgroup_array_map);
+#endif
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c23a4e93..cac13f1 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -393,7 +393,8 @@ static int map_update_elem(union bpf_attr *attr)
} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
err = bpf_percpu_array_update(map, key, value, attr->flags);
} else if (map->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY ||
-  map->map_type == BPF_MAP_TYPE_PROG_ARRAY) {
+  map->map_type == BPF_MAP_TYPE_PROG_ARRAY ||
+  map->map_type == BPF_MAP_TYPE_CGROUP_ARRAY) {
rcu_read_lock();
err = bpf_fd_array_map_update_elem(map, f.file, key, value,
   attr->flags);





[PATCH net-next v2 2/4] cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY

2016-06-22 Thread Martin KaFai Lau
Add a BPF_MAP_TYPE_CGROUP_ARRAY and its bpf_map_ops's implementations.
To update an element, the caller is expected to obtain a cgroup2 backed
fd by open(cgroup2_dir) and then update the array with that fd.

Signed-off-by: Martin KaFai Lau 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Tejun Heo 
Acked-by: Alexei Starovoitov 
---
 include/uapi/linux/bpf.h |  1 +
 kernel/bpf/arraymap.c| 43 +++
 kernel/bpf/syscall.c |  3 ++-
 3 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 406459b..ef4e386 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -84,6 +84,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_PERCPU_HASH,
BPF_MAP_TYPE_PERCPU_ARRAY,
BPF_MAP_TYPE_STACK_TRACE,
+   BPF_MAP_TYPE_CGROUP_ARRAY,
 };
 
 enum bpf_prog_type {
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 5af3073..aacd40b 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -539,3 +539,46 @@ static int __init register_perf_event_array_map(void)
return 0;
 }
 late_initcall(register_perf_event_array_map);
+
+#ifdef CONFIG_CGROUPS
+static void *cgroup_fd_array_get_ptr(struct bpf_map *map,
+struct file *map_file /* not used */,
+int fd)
+{
+   return cgroup_get_from_fd(fd);
+}
+
+static void cgroup_fd_array_put_ptr(void *ptr)
+{
+   /* cgroup_put free cgrp after a rcu grace period */
+   cgroup_put(ptr);
+}
+
+static void cgroup_fd_array_free(struct bpf_map *map)
+{
+   bpf_fd_array_map_clear(map);
+   fd_array_map_free(map);
+}
+
+static const struct bpf_map_ops cgroup_array_ops = {
+   .map_alloc = fd_array_map_alloc,
+   .map_free = cgroup_fd_array_free,
+   .map_get_next_key = array_map_get_next_key,
+   .map_lookup_elem = fd_array_map_lookup_elem,
+   .map_delete_elem = fd_array_map_delete_elem,
+   .map_fd_get_ptr = cgroup_fd_array_get_ptr,
+   .map_fd_put_ptr = cgroup_fd_array_put_ptr,
+};
+
+static struct bpf_map_type_list cgroup_array_type __read_mostly = {
+   .ops = _array_ops,
+   .type = BPF_MAP_TYPE_CGROUP_ARRAY,
+};
+
+static int __init register_cgroup_array_map(void)
+{
+   bpf_register_map_type(_array_type);
+   return 0;
+}
+late_initcall(register_cgroup_array_map);
+#endif
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c23a4e93..cac13f1 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -393,7 +393,8 @@ static int map_update_elem(union bpf_attr *attr)
} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
err = bpf_percpu_array_update(map, key, value, attr->flags);
} else if (map->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY ||
-  map->map_type == BPF_MAP_TYPE_PROG_ARRAY) {
+  map->map_type == BPF_MAP_TYPE_PROG_ARRAY ||
+  map->map_type == BPF_MAP_TYPE_CGROUP_ARRAY) {
rcu_read_lock();
err = bpf_fd_array_map_update_elem(map, f.file, key, value,
   attr->flags);
-- 
2.5.1