subject:"\[PATCH v3 2\/6\] cgroup\: add support for eBPF programs"

Re: [PATCH v3 2/6] cgroup: add support for eBPF programs

2016-09-05 Thread Alexei Starovoitov


On 9/5/16 2:40 PM, Sargun Dhillon wrote:

On Mon, Sep 05, 2016 at 04:49:26PM +0200, Daniel Mack wrote:

Hi,

On 08/30/2016 01:04 AM, Sargun Dhillon wrote:

On Fri, Aug 26, 2016 at 09:58:48PM +0200, Daniel Mack wrote:

This patch adds two sets of eBPF program pointers to struct cgroup.
One for such that are directly pinned to a cgroup, and one for such
that are effective for it.

To illustrate the logic behind that, assume the following example
cgroup hierarchy.

   A - B - C
 \ D - E

If only B has a program attached, it will be effective for B, C, D
and E. If D then attaches a program itself, that will be effective for
both D and E, and the program in B will only affect B and C. Only one
program of a given type is effective for a cgroup.


How does this work when running and orchestrator within an orchestrator? The
Docker in Docker / Mesos in Mesos use case, where the top level orchestrator is
observing the traffic, and there is an orchestrator within that also need to run
it.

In this case, I'd like to run E's filter, then if it returns 0, D's, and B's,
and so on.


Running multiple programs was an idea I had in one of my earlier drafts,
but after some discussion, I refrained from it again because potentially
walking the cgroup hierarchy on every packet is just too expensive.


I think you're correct here. Maybe this is something I do with the LSM-attached
filters, and not for skb filters. Do you think there might be a way to opt-in to
this option?


Is it possible to allow this, either by flattening out the
datastructure (copy a ref to the bpf programs to C and E) or
something similar?


That would mean we carry a list of eBPF program pointers of dynamic
size. IOW, the deeper inside the cgroup hierarchy, the bigger the list,
so it can store a reference to all programs of all of its ancestor.

While I think that would be possible, even at some later point, I'd
really like to avoid it for the sake of simplicity.

Is there any reason why this can't be done in userspace? Compile a
program X for A, and overload it with Y, with Y doing the same than X
but add some extra checks? Note that all users of the bpf(2) syscall API
will need CAP_NET_ADMIN anyway, so there is no delegation to
unprivileged sub-orchestators or anything alike really.


One of the use-cases that's becoming more and more common are
containers-in-containers. In this, you have a privileged container that's
running something like build orchestration, and you want to do macro-isolation
(say limit access to only that tennant's infrastructure). Then, when the build
orchestrator runs a build, it may want to monitor, and further isolate the tasks
that run in the build job. This is a side-effect of composing different
container technologies. Typically you use one system for images, then another
for orchestration, and the actual program running inside of it can also leverage
containerization.

Example:
K8s->Docker->Jenkins Agent->Jenkins Build Job


frankly I don't buy this argument, since above
and other 'examples' of container-in-container look
fake to me. There is a ton work to be done for such
scheme to be even remotely feasible. The cgroup+bpf
stuff would be the last on my list to 'fix' for such
deployments. I don't think we should worry about it
at present.

Re: [PATCH v3 2/6] cgroup: add support for eBPF programs

2016-09-05 Thread Sargun Dhillon

On Mon, Sep 05, 2016 at 04:49:26PM +0200, Daniel Mack wrote:
> Hi,
> 
> On 08/30/2016 01:04 AM, Sargun Dhillon wrote:
> > On Fri, Aug 26, 2016 at 09:58:48PM +0200, Daniel Mack wrote:
> >> This patch adds two sets of eBPF program pointers to struct cgroup.
> >> One for such that are directly pinned to a cgroup, and one for such
> >> that are effective for it.
> >>
> >> To illustrate the logic behind that, assume the following example
> >> cgroup hierarchy.
> >>
> >>   A - B - C
> >> \ D - E
> >>
> >> If only B has a program attached, it will be effective for B, C, D
> >> and E. If D then attaches a program itself, that will be effective for
> >> both D and E, and the program in B will only affect B and C. Only one
> >> program of a given type is effective for a cgroup.
> >>
> > How does this work when running and orchestrator within an orchestrator? 
> > The 
> > Docker in Docker / Mesos in Mesos use case, where the top level 
> > orchestrator is 
> > observing the traffic, and there is an orchestrator within that also need 
> > to run 
> > it.
> > 
> > In this case, I'd like to run E's filter, then if it returns 0, D's, and 
> > B's, 
> > and so on.
> 
> Running multiple programs was an idea I had in one of my earlier drafts,
> but after some discussion, I refrained from it again because potentially
> walking the cgroup hierarchy on every packet is just too expensive.
>
I think you're correct here. Maybe this is something I do with the LSM-attached 
filters, and not for skb filters. Do you think there might be a way to opt-in 
to 
this option? 

> > Is it possible to allow this, either by flattening out the
> > datastructure (copy a ref to the bpf programs to C and E) or
> > something similar?
> 
> That would mean we carry a list of eBPF program pointers of dynamic
> size. IOW, the deeper inside the cgroup hierarchy, the bigger the list,
> so it can store a reference to all programs of all of its ancestor.
> 
> While I think that would be possible, even at some later point, I'd
> really like to avoid it for the sake of simplicity.
> 
> Is there any reason why this can't be done in userspace? Compile a
> program X for A, and overload it with Y, with Y doing the same than X
> but add some extra checks? Note that all users of the bpf(2) syscall API
> will need CAP_NET_ADMIN anyway, so there is no delegation to
> unprivileged sub-orchestators or anything alike really.

One of the use-cases that's becoming more and more common are 
containers-in-containers. In this, you have a privileged container that's 
running something like build orchestration, and you want to do macro-isolation 
(say limit access to only that tennant's infrastructure). Then, when the build 
orchestrator runs a build, it may want to monitor, and further isolate the 
tasks 
that run in the build job. This is a side-effect of composing different 
container technologies. Typically you use one system for images, then another 
for orchestration, and the actual program running inside of it can also 
leverage 
containerization.

Example:
K8s->Docker->Jenkins Agent->Jenkins Build Job

There's also a differentiation of ownership in each of these systems. I would 
really not require a middleware system that all my software has to talk to, 
because sometimes I'm taking off the shelf software (Jenkins), and porting it 
to 
containers. I think one of the pieces that's led to the success of cgroups is 
the straightforward API, and ease of use (and it's getting even easier in v2).

It's perfectly fine to give the lower level tasks CAP_NET_ADMIN, because we use 
something like seccomp-bpf plus some of the work I've been doing with the LSM 
to 
prevent the sub-orchestrators from accidentally blowing away the system. 
Usually, we trust these orchestrators (internal users), so it's more of a 
precautionary measure as opposed to a true security measure.

Also, rewriting BPF programs, although pretty straightforward sounds like a 
pain 
to do in userspace, even with a helper. If we were to take peoples programs and 
chain them together via tail call, or similar, I can imagine where rewriting a 
program might push you over the instruction limit.
> 
> 
> Thanks,
> Daniel
>

Re: [PATCH v3 2/6] cgroup: add support for eBPF programs

2016-09-05 Thread Daniel Mack

Hi,

On 08/30/2016 01:04 AM, Sargun Dhillon wrote:
> On Fri, Aug 26, 2016 at 09:58:48PM +0200, Daniel Mack wrote:
>> This patch adds two sets of eBPF program pointers to struct cgroup.
>> One for such that are directly pinned to a cgroup, and one for such
>> that are effective for it.
>>
>> To illustrate the logic behind that, assume the following example
>> cgroup hierarchy.
>>
>>   A - B - C
>> \ D - E
>>
>> If only B has a program attached, it will be effective for B, C, D
>> and E. If D then attaches a program itself, that will be effective for
>> both D and E, and the program in B will only affect B and C. Only one
>> program of a given type is effective for a cgroup.
>>
> How does this work when running and orchestrator within an orchestrator? The 
> Docker in Docker / Mesos in Mesos use case, where the top level orchestrator 
> is 
> observing the traffic, and there is an orchestrator within that also need to 
> run 
> it.
> 
> In this case, I'd like to run E's filter, then if it returns 0, D's, and B's, 
> and so on.

Running multiple programs was an idea I had in one of my earlier drafts,
but after some discussion, I refrained from it again because potentially
walking the cgroup hierarchy on every packet is just too expensive.

> Is it possible to allow this, either by flattening out the
> datastructure (copy a ref to the bpf programs to C and E) or
> something similar?

That would mean we carry a list of eBPF program pointers of dynamic
size. IOW, the deeper inside the cgroup hierarchy, the bigger the list,
so it can store a reference to all programs of all of its ancestor.

While I think that would be possible, even at some later point, I'd
really like to avoid it for the sake of simplicity.

Is there any reason why this can't be done in userspace? Compile a
program X for A, and overload it with Y, with Y doing the same than X
but add some extra checks? Note that all users of the bpf(2) syscall API
will need CAP_NET_ADMIN anyway, so there is no delegation to
unprivileged sub-orchestators or anything alike really.

Thanks,
Daniel

Re: [PATCH v3 2/6] cgroup: add support for eBPF programs

2016-09-05 Thread Daniel Mack

On 08/30/2016 12:42 AM, Daniel Borkmann wrote:
> On 08/26/2016 09:58 PM, Daniel Mack wrote:
>> This patch adds two sets of eBPF program pointers to struct cgroup.
>> One for such that are directly pinned to a cgroup, and one for such
>> that are effective for it.
>>
>> To illustrate the logic behind that, assume the following example
>> cgroup hierarchy.
>>
>>A - B - C
>>  \ D - E
>>
>> If only B has a program attached, it will be effective for B, C, D
>> and E. If D then attaches a program itself, that will be effective for
>> both D and E, and the program in B will only affect B and C. Only one
>> program of a given type is effective for a cgroup.
>>
>> Attaching and detaching programs will be done through the bpf(2)
>> syscall. For now, ingress and egress inet socket filtering are the
>> only supported use-cases.
>>
>> Signed-off-by: Daniel Mack 
> [...]
>> +void __cgroup_bpf_update(struct cgroup *cgrp,
>> + struct cgroup *parent,
>> + struct bpf_prog *prog,
>> + enum bpf_attach_type type)
>> +{
>> +struct bpf_prog *old_prog, *effective;
>> +struct cgroup_subsys_state *pos;
>> +
>> +old_prog = xchg(cgrp->bpf.prog + type, prog);
>> +
>> +if (prog)
>> +static_branch_inc(_bpf_enabled_key);
>> +
>> +if (old_prog) {
>> +bpf_prog_put(old_prog);
>> +static_branch_dec(_bpf_enabled_key);
>> +}
>> +
>> +effective = (!prog && parent) ?
>> +rcu_dereference_protected(parent->bpf.effective[type],
>> +  lockdep_is_held(_mutex)) :
>> +prog;
>> +
>> +css_for_each_descendant_pre(pos, >self) {
>> +struct cgroup *desc = container_of(pos, struct cgroup, self);
>> +
>> +/* skip the subtree if the descendant has its own program */
>> +if (desc->bpf.prog[type] && desc != cgrp)
>> +pos = css_rightmost_descendant(pos);
>> +else
>> +rcu_assign_pointer(desc->bpf.effective[type],
>> +   effective);
>> +}
> 
> Shouldn't the old_prog reference only be released right here at the end
> instead of above (otherwise this could race)?

Yes, that's right. Will change as well. Thanks for spotting!


Daniel

Re: [PATCH v3 2/6] cgroup: add support for eBPF programs

2016-09-05 Thread Daniel Mack

Hi Alexei,

On 08/27/2016 02:03 AM, Alexei Starovoitov wrote:
> On Fri, Aug 26, 2016 at 09:58:48PM +0200, Daniel Mack wrote:
>> This patch adds two sets of eBPF program pointers to struct cgroup.
>> One for such that are directly pinned to a cgroup, and one for such
>> that are effective for it.
>>
>> To illustrate the logic behind that, assume the following example
>> cgroup hierarchy.
>>
>>   A - B - C
>> \ D - E
>>
>> If only B has a program attached, it will be effective for B, C, D
>> and E. If D then attaches a program itself, that will be effective for
>> both D and E, and the program in B will only affect B and C. Only one
>> program of a given type is effective for a cgroup.
>>
>> Attaching and detaching programs will be done through the bpf(2)
>> syscall. For now, ingress and egress inet socket filtering are the
>> only supported use-cases.
>>
>> Signed-off-by: Daniel Mack 
> ...
>> +css_for_each_descendant_pre(pos, >self) {
>> +struct cgroup *desc = container_of(pos, struct cgroup, self);
>> +
>> +/* skip the subtree if the descendant has its own program */
>> +if (desc->bpf.prog[type] && desc != cgrp)
> 
> is desc != cgrp really needed?
> I thought css_for_each_descendant_pre() shouldn't walk itself
> or I'm missing how it works.

Hmm, no - that check is necessary in my tests, and also according to the
documentation:

/**
 * css_for_each_descendant_pre - pre-order walk of a css's descendants
 * @pos: the css * to use as the loop cursor
 * @root: css whose descendants to walk
 *
 * Walk @root's descendants.  @root is included in the iteration and the
 * first node to be visited.  Must be called under rcu_read_lock().
 *


Daniel

Re: [PATCH v3 2/6] cgroup: add support for eBPF programs

2016-08-29 Thread Sargun Dhillon

On Fri, Aug 26, 2016 at 09:58:48PM +0200, Daniel Mack wrote:
> This patch adds two sets of eBPF program pointers to struct cgroup.
> One for such that are directly pinned to a cgroup, and one for such
> that are effective for it.
> 
> To illustrate the logic behind that, assume the following example
> cgroup hierarchy.
> 
>   A - B - C
> \ D - E
> 
> If only B has a program attached, it will be effective for B, C, D
> and E. If D then attaches a program itself, that will be effective for
> both D and E, and the program in B will only affect B and C. Only one
> program of a given type is effective for a cgroup.
> 
How does this work when running and orchestrator within an orchestrator? The 
Docker in Docker / Mesos in Mesos use case, where the top level orchestrator is 
observing the traffic, and there is an orchestrator within that also need to 
run 
it.

In this case, I'd like to run E's filter, then if it returns 0, D's, and B's, 
and so on. Is it possible to allow this, either by flattening out the 
datastructure (copy a ref to the bpf programs to C and E) or something similar?


> Attaching and detaching programs will be done through the bpf(2)
> syscall. For now, ingress and egress inet socket filtering are the
> only supported use-cases.
> 
> Signed-off-by: Daniel Mack 
> ---
>  include/linux/bpf-cgroup.h  |  70 +++
>  include/linux/cgroup-defs.h |   4 ++
>  init/Kconfig|  12 
>  kernel/bpf/Makefile |   1 +
>  kernel/bpf/cgroup.c | 165 
> 
>  kernel/cgroup.c |  18 +
>  6 files changed, 270 insertions(+)
>  create mode 100644 include/linux/bpf-cgroup.h
>  create mode 100644 kernel/bpf/cgroup.c
> 
> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> new file mode 100644
> index 000..a5a25c1
> --- /dev/null
> +++ b/include/linux/bpf-cgroup.h
> @@ -0,0 +1,70 @@
> +#ifndef _BPF_CGROUP_H
> +#define _BPF_CGROUP_H
> +
> +#include 
> +#include 
> +
> +struct sock;
> +struct cgroup;
> +struct sk_buff;
> +
> +#ifdef CONFIG_CGROUP_BPF
> +
> +extern struct static_key_false cgroup_bpf_enabled_key;
> +#define cgroup_bpf_enabled static_branch_unlikely(_bpf_enabled_key)
> +
> +struct cgroup_bpf {
> + /*
> +  * Store two sets of bpf_prog pointers, one for programs that are
> +  * pinned directly to this cgroup, and one for those that are effective
> +  * when this cgroup is accessed.
> +  */
> + struct bpf_prog *prog[__MAX_BPF_ATTACH_TYPE];
> + struct bpf_prog *effective[__MAX_BPF_ATTACH_TYPE];
> +};
> +
> +void cgroup_bpf_put(struct cgroup *cgrp);
> +void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent);
> +
> +void __cgroup_bpf_update(struct cgroup *cgrp,
> +  struct cgroup *parent,
> +  struct bpf_prog *prog,
> +  enum bpf_attach_type type);
> +
> +/* Wrapper for __cgroup_bpf_update() protected by cgroup_mutex */
> +void cgroup_bpf_update(struct cgroup *cgrp,
> +struct bpf_prog *prog,
> +enum bpf_attach_type type);
> +
> +int __cgroup_bpf_run_filter(struct sock *sk,
> + struct sk_buff *skb,
> + enum bpf_attach_type type);
> +
> +/* Wrapper for __cgroup_bpf_run_filter() guarded by cgroup_bpf_enabled */
> +static inline int cgroup_bpf_run_filter(struct sock *sk,
> + struct sk_buff *skb,
> + enum bpf_attach_type type)
> +{
> + if (cgroup_bpf_enabled)
> + return __cgroup_bpf_run_filter(sk, skb, type);
> +
> + return 0;
> +}
> +
> +#else
> +
> +struct cgroup_bpf {};
> +static inline void cgroup_bpf_put(struct cgroup *cgrp) {}
> +static inline void cgroup_bpf_inherit(struct cgroup *cgrp,
> +   struct cgroup *parent) {}
> +
> +static inline int cgroup_bpf_run_filter(struct sock *sk,
> + struct sk_buff *skb,
> + enum bpf_attach_type type)
> +{
> + return 0;
> +}
> +
> +#endif /* CONFIG_CGROUP_BPF */
> +
> +#endif /* _BPF_CGROUP_H */
> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
> index 5b17de6..861b467 100644
> --- a/include/linux/cgroup-defs.h
> +++ b/include/linux/cgroup-defs.h
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #ifdef CONFIG_CGROUPS
>  
> @@ -300,6 +301,9 @@ struct cgroup {
>   /* used to schedule release agent */
>   struct work_struct release_agent_work;
>  
> + /* used to store eBPF programs */
> + struct cgroup_bpf bpf;
> +
>   /* ids of the ancestors at each level including self */
>   int ancestor_ids[];
>  };
> diff --git a/init/Kconfig b/init/Kconfig
> index cac3f09..5a89c83 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1144,6 +1144,18 @@ config CGROUP_PERF
>  
>

Re: [PATCH v3 2/6] cgroup: add support for eBPF programs

2016-08-29 Thread Daniel Borkmann


On 08/26/2016 09:58 PM, Daniel Mack wrote:

This patch adds two sets of eBPF program pointers to struct cgroup.
One for such that are directly pinned to a cgroup, and one for such
that are effective for it.

To illustrate the logic behind that, assume the following example
cgroup hierarchy.

   A - B - C
 \ D - E

If only B has a program attached, it will be effective for B, C, D
and E. If D then attaches a program itself, that will be effective for
both D and E, and the program in B will only affect B and C. Only one
program of a given type is effective for a cgroup.

Attaching and detaching programs will be done through the bpf(2)
syscall. For now, ingress and egress inet socket filtering are the
only supported use-cases.

Signed-off-by: Daniel Mack 

[...]

+void __cgroup_bpf_update(struct cgroup *cgrp,
+struct cgroup *parent,
+struct bpf_prog *prog,
+enum bpf_attach_type type)
+{
+   struct bpf_prog *old_prog, *effective;
+   struct cgroup_subsys_state *pos;
+
+   old_prog = xchg(cgrp->bpf.prog + type, prog);
+
+   if (prog)
+   static_branch_inc(_bpf_enabled_key);
+
+   if (old_prog) {
+   bpf_prog_put(old_prog);
+   static_branch_dec(_bpf_enabled_key);
+   }
+
+   effective = (!prog && parent) ?
+   rcu_dereference_protected(parent->bpf.effective[type],
+ lockdep_is_held(_mutex)) :
+   prog;
+
+   css_for_each_descendant_pre(pos, >self) {
+   struct cgroup *desc = container_of(pos, struct cgroup, self);
+
+   /* skip the subtree if the descendant has its own program */
+   if (desc->bpf.prog[type] && desc != cgrp)
+   pos = css_rightmost_descendant(pos);
+   else
+   rcu_assign_pointer(desc->bpf.effective[type],
+  effective);
+   }


Shouldn't the old_prog reference only be released right here at the end
instead of above (otherwise this could race)?


+}

Re: [PATCH v3 2/6] cgroup: add support for eBPF programs

2016-08-26 Thread Alexei Starovoitov

On Fri, Aug 26, 2016 at 09:58:48PM +0200, Daniel Mack wrote:
> This patch adds two sets of eBPF program pointers to struct cgroup.
> One for such that are directly pinned to a cgroup, and one for such
> that are effective for it.
> 
> To illustrate the logic behind that, assume the following example
> cgroup hierarchy.
> 
>   A - B - C
> \ D - E
> 
> If only B has a program attached, it will be effective for B, C, D
> and E. If D then attaches a program itself, that will be effective for
> both D and E, and the program in B will only affect B and C. Only one
> program of a given type is effective for a cgroup.
> 
> Attaching and detaching programs will be done through the bpf(2)
> syscall. For now, ingress and egress inet socket filtering are the
> only supported use-cases.
> 
> Signed-off-by: Daniel Mack 
...
> + css_for_each_descendant_pre(pos, >self) {
> + struct cgroup *desc = container_of(pos, struct cgroup, self);
> +
> + /* skip the subtree if the descendant has its own program */
> + if (desc->bpf.prog[type] && desc != cgrp)

is desc != cgrp really needed?
I thought css_for_each_descendant_pre() shouldn't walk itself
or I'm missing how it works.

> + pos = css_rightmost_descendant(pos);
> + else
> + rcu_assign_pointer(desc->bpf.effective[type],
> +effective);
> + }
> +}
> +

[PATCH v3 2/6] cgroup: add support for eBPF programs

2016-08-26 Thread Daniel Mack

This patch adds two sets of eBPF program pointers to struct cgroup.
One for such that are directly pinned to a cgroup, and one for such
that are effective for it.

To illustrate the logic behind that, assume the following example
cgroup hierarchy.

  A - B - C
\ D - E

If only B has a program attached, it will be effective for B, C, D
and E. If D then attaches a program itself, that will be effective for
both D and E, and the program in B will only affect B and C. Only one
program of a given type is effective for a cgroup.

Attaching and detaching programs will be done through the bpf(2)
syscall. For now, ingress and egress inet socket filtering are the
only supported use-cases.

Signed-off-by: Daniel Mack 
---
 include/linux/bpf-cgroup.h  |  70 +++
 include/linux/cgroup-defs.h |   4 ++
 init/Kconfig|  12 
 kernel/bpf/Makefile |   1 +
 kernel/bpf/cgroup.c | 165 
 kernel/cgroup.c |  18 +
 6 files changed, 270 insertions(+)
 create mode 100644 include/linux/bpf-cgroup.h
 create mode 100644 kernel/bpf/cgroup.c

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
new file mode 100644
index 000..a5a25c1
--- /dev/null
+++ b/include/linux/bpf-cgroup.h
@@ -0,0 +1,70 @@
+#ifndef _BPF_CGROUP_H
+#define _BPF_CGROUP_H
+
+#include 
+#include 
+
+struct sock;
+struct cgroup;
+struct sk_buff;
+
+#ifdef CONFIG_CGROUP_BPF
+
+extern struct static_key_false cgroup_bpf_enabled_key;
+#define cgroup_bpf_enabled static_branch_unlikely(_bpf_enabled_key)
+
+struct cgroup_bpf {
+   /*
+* Store two sets of bpf_prog pointers, one for programs that are
+* pinned directly to this cgroup, and one for those that are effective
+* when this cgroup is accessed.
+*/
+   struct bpf_prog *prog[__MAX_BPF_ATTACH_TYPE];
+   struct bpf_prog *effective[__MAX_BPF_ATTACH_TYPE];
+};
+
+void cgroup_bpf_put(struct cgroup *cgrp);
+void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent);
+
+void __cgroup_bpf_update(struct cgroup *cgrp,
+struct cgroup *parent,
+struct bpf_prog *prog,
+enum bpf_attach_type type);
+
+/* Wrapper for __cgroup_bpf_update() protected by cgroup_mutex */
+void cgroup_bpf_update(struct cgroup *cgrp,
+  struct bpf_prog *prog,
+  enum bpf_attach_type type);
+
+int __cgroup_bpf_run_filter(struct sock *sk,
+   struct sk_buff *skb,
+   enum bpf_attach_type type);
+
+/* Wrapper for __cgroup_bpf_run_filter() guarded by cgroup_bpf_enabled */
+static inline int cgroup_bpf_run_filter(struct sock *sk,
+   struct sk_buff *skb,
+   enum bpf_attach_type type)
+{
+   if (cgroup_bpf_enabled)
+   return __cgroup_bpf_run_filter(sk, skb, type);
+
+   return 0;
+}
+
+#else
+
+struct cgroup_bpf {};
+static inline void cgroup_bpf_put(struct cgroup *cgrp) {}
+static inline void cgroup_bpf_inherit(struct cgroup *cgrp,
+ struct cgroup *parent) {}
+
+static inline int cgroup_bpf_run_filter(struct sock *sk,
+   struct sk_buff *skb,
+   enum bpf_attach_type type)
+{
+   return 0;
+}
+
+#endif /* CONFIG_CGROUP_BPF */
+
+#endif /* _BPF_CGROUP_H */
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 5b17de6..861b467 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_CGROUPS
 
@@ -300,6 +301,9 @@ struct cgroup {
/* used to schedule release agent */
struct work_struct release_agent_work;
 
+   /* used to store eBPF programs */
+   struct cgroup_bpf bpf;
+
/* ids of the ancestors at each level including self */
int ancestor_ids[];
 };
diff --git a/init/Kconfig b/init/Kconfig
index cac3f09..5a89c83 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1144,6 +1144,18 @@ config CGROUP_PERF
 
  Say N if unsure.
 
+config CGROUP_BPF
+   bool "Support for eBPF programs attached to cgroups"
+   depends on BPF_SYSCALL && SOCK_CGROUP_DATA
+   help
+ Allow attaching eBPF programs to a cgroup using the bpf(2)
+ syscall command BPF_PROG_ATTACH.
+
+ In which context these programs are accessed depends on the type
+ of attachment. For instance, programs that are attached using
+ BPF_ATTACH_TYPE_CGROUP_INET_INGRESS will be executed on the
+ ingress path of inet sockets.
+
 config CGROUP_DEBUG
bool "Example controller"
default n
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index eed911d..b22256b 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@

Re: [PATCH v3 2/6] cgroup: add support for eBPF programs

Re: [PATCH v3 2/6] cgroup: add support for eBPF programs

Re: [PATCH v3 2/6] cgroup: add support for eBPF programs

Re: [PATCH v3 2/6] cgroup: add support for eBPF programs

Re: [PATCH v3 2/6] cgroup: add support for eBPF programs

Re: [PATCH v3 2/6] cgroup: add support for eBPF programs

Re: [PATCH v3 2/6] cgroup: add support for eBPF programs

Re: [PATCH v3 2/6] cgroup: add support for eBPF programs

[PATCH v3 2/6] cgroup: add support for eBPF programs

9 matches

Site Navigation

Mail list logo

Footer information