Re: [PATCH net-next v2 2/4] cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY
On Thu, Jun 23, 2016 at 11:50:08PM +0200, Daniel Borkmann wrote: > On 06/23/2016 11:26 PM, Martin KaFai Lau wrote: > >We are still hatching out how to set this up in production. However, the > >situation is similar to removing the pinned file. s/pinned file/pinned cgroup-array/ > I presume you mean removing the last BPF program holding a reference on > the cgroup array map. Yes > (Any user space visibility like struct files given > from the anon inode and pinnings are tracked via uref, btw, which is > needed to break possible complex dependencies among tail called programs.) Yep. Understood on prog_array use case. Thanks, -- Martin
Re: [PATCH net-next v2 2/4] cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY
On 06/23/2016 11:26 PM, Martin KaFai Lau wrote: On Thu, Jun 23, 2016 at 11:42:31AM +0200, Daniel Borkmann wrote: Hi Martin, [ sorry to jump late in here, on pto currently ] Thanks for reviewing. Could you describe a bit more with regards to pinning maps and how this should interact with cgroups? The two specialized array maps we have (tail calls, perf events) have fairly complicated semantics for when to clean up map slots (see commits c9da161c6517ba1, 3b1efb196eee45b2f0c4). How is this managed with cgroups? Once a cgroup fd is placed into a map and the user removes the cgroup, will this be prevented due to 'being busy', or will the cgroup live further as long as a program is running with a cgroup map entry (but the cgroup itself is not visible from user space in any way anymore)? Having a cgroup ptr stored in the bpf_map will not stop the user from removing the cgroup (by rmdir /mnt/cgroup2/tc/test_cgrp). Right. The cgroup ptr stored in the bpf_map holds a refcnt which answer the second part. Yep, clear. The situation is similar to the netfilter usecase in commit 38c4597e4bf ("netfilter: implement xt_cgroup cgroup2 path match") I presume it's a valid use case to pin a cgroup map, put fds into it and remove the pinned file expecting to continue to match on it, right? So lifetime is really until last prog using a cgroup map somewhere gets removed (even if not accessible from user space anymore, meaning no prog has fd and pinned file was removed). Yes. We are still hatching out how to set this up in production. However, the situation is similar to removing the pinned file. I presume you mean removing the last BPF program holding a reference on the cgroup array map. (Any user space visibility like struct files given from the anon inode and pinnings are tracked via uref, btw, which is needed to break possible complex dependencies among tail called programs.) But dropping cgroup ref at latest when the last map ref is dropped as you currently do seems fine. It makes cgroup array maps effectively no different from plain regular array maps. We probably will not use tc and pin a bpf_map to do that. Instead, one process will setup eveything (e.g. create the cgroup, pouplate the cgroup map, load the bpf to egress) and then go away. Yep, that seems a valid case as well, both use cases (pinned and non-pinned) should be fine with your code then. Thanks, Daniel
Re: [PATCH net-next v2 2/4] cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY
On 06/23/2016 11:13 PM, Tejun Heo wrote: Hello, On Thu, Jun 23, 2016 at 11:42:31AM +0200, Daniel Borkmann wrote: I presume it's a valid use case to pin a cgroup map, put fds into it and remove the pinned file expecting to continue to match on it, right? So lifetime is really until last prog using a cgroup map somewhere gets removed (even if not accessible from user space anymore, meaning no prog has fd and pinned file was removed). Yeap, from what I can see, the cgroup will stay around (even if it gets deleted) as long as the bpf rule using it is around and that's completely fine from cgroup side. Ok, thanks for confirming! Thanks.
Re: [PATCH net-next v2 2/4] cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY
On Thu, Jun 23, 2016 at 11:42:31AM +0200, Daniel Borkmann wrote: > Hi Martin, > > [ sorry to jump late in here, on pto currently ] Thanks for reviewing. > Could you describe a bit more with regards to pinning maps and how this > should interact with cgroups? The two specialized array maps we have (tail > calls, perf events) have fairly complicated semantics for when to clean up > map slots (see commits c9da161c6517ba1, 3b1efb196eee45b2f0c4). > > How is this managed with cgroups? Once a cgroup fd is placed into a map and > the user removes the cgroup, will this be prevented due to 'being busy', or > will the cgroup live further as long as a program is running with a cgroup > map entry (but the cgroup itself is not visible from user space in any way > anymore)? Having a cgroup ptr stored in the bpf_map will not stop the user from removing the cgroup (by rmdir /mnt/cgroup2/tc/test_cgrp). The cgroup ptr stored in the bpf_map holds a refcnt which answer the second part. The situation is similar to the netfilter usecase in commit 38c4597e4bf ("netfilter: implement xt_cgroup cgroup2 path match") > > I presume it's a valid use case to pin a cgroup map, put fds into it and > remove the pinned file expecting to continue to match on it, right? So > lifetime is really until last prog using a cgroup map somewhere gets removed > (even if not accessible from user space anymore, meaning no prog has fd and > pinned file was removed). Yes. We are still hatching out how to set this up in production. However, the situation is similar to removing the pinned file. We probably will not use tc and pin a bpf_map to do that. Instead, one process will setup eveything (e.g. create the cgroup, pouplate the cgroup map, load the bpf to egress) and then go away. I don't think we need a prog fd to remove the bpf prog. > > I assume that using struct file here doesn't make sense (commit > e03e7ee34fdd1c3) > either, right? No. I don't think so. We eventually need a cgroup from the 'struct file'.
Re: [PATCH net-next v2 2/4] cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY
Hello, On Thu, Jun 23, 2016 at 11:42:31AM +0200, Daniel Borkmann wrote: > I presume it's a valid use case to pin a cgroup map, put fds into it and > remove the pinned file expecting to continue to match on it, right? So > lifetime is really until last prog using a cgroup map somewhere gets removed > (even if not accessible from user space anymore, meaning no prog has fd and > pinned file was removed). Yeap, from what I can see, the cgroup will stay around (even if it gets deleted) as long as the bpf rule using it is around and that's completely fine from cgroup side. Thanks. -- tejun
Re: [PATCH net-next v2 2/4] cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY
Hi Martin, [ sorry to jump late in here, on pto currently ] On 06/22/2016 11:17 PM, Martin KaFai Lau wrote: Add a BPF_MAP_TYPE_CGROUP_ARRAY and its bpf_map_ops's implementations. To update an element, the caller is expected to obtain a cgroup2 backed fd by open(cgroup2_dir) and then update the array with that fd. Signed-off-by: Martin KaFai LauCc: Alexei Starovoitov Cc: Daniel Borkmann Cc: Tejun Heo Acked-by: Alexei Starovoitov Could you describe a bit more with regards to pinning maps and how this should interact with cgroups? The two specialized array maps we have (tail calls, perf events) have fairly complicated semantics for when to clean up map slots (see commits c9da161c6517ba1, 3b1efb196eee45b2f0c4). How is this managed with cgroups? Once a cgroup fd is placed into a map and the user removes the cgroup, will this be prevented due to 'being busy', or will the cgroup live further as long as a program is running with a cgroup map entry (but the cgroup itself is not visible from user space in any way anymore)? I presume it's a valid use case to pin a cgroup map, put fds into it and remove the pinned file expecting to continue to match on it, right? So lifetime is really until last prog using a cgroup map somewhere gets removed (even if not accessible from user space anymore, meaning no prog has fd and pinned file was removed). I assume that using struct file here doesn't make sense (commit e03e7ee34fdd1c3) either, right? [...] +#ifdef CONFIG_CGROUPS +static void *cgroup_fd_array_get_ptr(struct bpf_map *map, +struct file *map_file /* not used */, +int fd) +{ + return cgroup_get_from_fd(fd); +} + +static void cgroup_fd_array_put_ptr(void *ptr) +{ + /* cgroup_put free cgrp after a rcu grace period */ + cgroup_put(ptr); Yeah, as long as this respects freeing after RCU grace period, it's fine like this ... +} + +static void cgroup_fd_array_free(struct bpf_map *map) +{ + bpf_fd_array_map_clear(map); + fd_array_map_free(map); +} + +static const struct bpf_map_ops cgroup_array_ops = { + .map_alloc = fd_array_map_alloc, + .map_free = cgroup_fd_array_free, + .map_get_next_key = array_map_get_next_key, + .map_lookup_elem = fd_array_map_lookup_elem, + .map_delete_elem = fd_array_map_delete_elem, + .map_fd_get_ptr = cgroup_fd_array_get_ptr, + .map_fd_put_ptr = cgroup_fd_array_put_ptr, +}; + +static struct bpf_map_type_list cgroup_array_type __read_mostly = { + .ops = _array_ops, + .type = BPF_MAP_TYPE_CGROUP_ARRAY, +}; + +static int __init register_cgroup_array_map(void) +{ + bpf_register_map_type(_array_type); + return 0; +} +late_initcall(register_cgroup_array_map); +#endif diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index c23a4e93..cac13f1 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -393,7 +393,8 @@ static int map_update_elem(union bpf_attr *attr) } else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) { err = bpf_percpu_array_update(map, key, value, attr->flags); } else if (map->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || - map->map_type == BPF_MAP_TYPE_PROG_ARRAY) { + map->map_type == BPF_MAP_TYPE_PROG_ARRAY || + map->map_type == BPF_MAP_TYPE_CGROUP_ARRAY) { rcu_read_lock(); err = bpf_fd_array_map_update_elem(map, f.file, key, value, attr->flags);
[PATCH net-next v2 2/4] cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY
Add a BPF_MAP_TYPE_CGROUP_ARRAY and its bpf_map_ops's implementations. To update an element, the caller is expected to obtain a cgroup2 backed fd by open(cgroup2_dir) and then update the array with that fd. Signed-off-by: Martin KaFai LauCc: Alexei Starovoitov Cc: Daniel Borkmann Cc: Tejun Heo Acked-by: Alexei Starovoitov --- include/uapi/linux/bpf.h | 1 + kernel/bpf/arraymap.c| 43 +++ kernel/bpf/syscall.c | 3 ++- 3 files changed, 46 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 406459b..ef4e386 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -84,6 +84,7 @@ enum bpf_map_type { BPF_MAP_TYPE_PERCPU_HASH, BPF_MAP_TYPE_PERCPU_ARRAY, BPF_MAP_TYPE_STACK_TRACE, + BPF_MAP_TYPE_CGROUP_ARRAY, }; enum bpf_prog_type { diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c index 5af3073..aacd40b 100644 --- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -539,3 +539,46 @@ static int __init register_perf_event_array_map(void) return 0; } late_initcall(register_perf_event_array_map); + +#ifdef CONFIG_CGROUPS +static void *cgroup_fd_array_get_ptr(struct bpf_map *map, +struct file *map_file /* not used */, +int fd) +{ + return cgroup_get_from_fd(fd); +} + +static void cgroup_fd_array_put_ptr(void *ptr) +{ + /* cgroup_put free cgrp after a rcu grace period */ + cgroup_put(ptr); +} + +static void cgroup_fd_array_free(struct bpf_map *map) +{ + bpf_fd_array_map_clear(map); + fd_array_map_free(map); +} + +static const struct bpf_map_ops cgroup_array_ops = { + .map_alloc = fd_array_map_alloc, + .map_free = cgroup_fd_array_free, + .map_get_next_key = array_map_get_next_key, + .map_lookup_elem = fd_array_map_lookup_elem, + .map_delete_elem = fd_array_map_delete_elem, + .map_fd_get_ptr = cgroup_fd_array_get_ptr, + .map_fd_put_ptr = cgroup_fd_array_put_ptr, +}; + +static struct bpf_map_type_list cgroup_array_type __read_mostly = { + .ops = _array_ops, + .type = BPF_MAP_TYPE_CGROUP_ARRAY, +}; + +static int __init register_cgroup_array_map(void) +{ + bpf_register_map_type(_array_type); + return 0; +} +late_initcall(register_cgroup_array_map); +#endif diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index c23a4e93..cac13f1 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -393,7 +393,8 @@ static int map_update_elem(union bpf_attr *attr) } else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) { err = bpf_percpu_array_update(map, key, value, attr->flags); } else if (map->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || - map->map_type == BPF_MAP_TYPE_PROG_ARRAY) { + map->map_type == BPF_MAP_TYPE_PROG_ARRAY || + map->map_type == BPF_MAP_TYPE_CGROUP_ARRAY) { rcu_read_lock(); err = bpf_fd_array_map_update_elem(map, f.file, key, value, attr->flags); -- 2.5.1