Re: [RFC PATCH 4/5] net: filter: run cgroup eBPF programs

2016-08-25 Thread Tejun Heo
Hello, Sargun.

On Sun, Aug 21, 2016 at 01:14:22PM -0700, Sargun Dhillon wrote:
> So, casually looking at this patch, it looks like you're relying on
> sock_cgroup_data, which only points to the default hierarchy. If someone uses
> net_prio or net_classid, cgroup_sk_alloc_disable is called, and this won't
> work anymore.

The requirement there comes from the network side.  In short, davem
(rightfully) doesn't want further proliferation of cgroup association
fields in struct sock.  It makes sense for network control too, as it's
schizophrenic to have different associations depending on the specific
controller.  Also, the v2 requirement shouldn't really get in the way,
as it can be mounted as just another hierarchy along with other v1
hierarchies.

> Any ideas on how to work around that? Does it make sense to add another
> pointer to sock_cgroup_data, or at least a warning when allocation is disabled?

cgroup already warns when the association gets disabled due to usage
of netcls and netprio.  We probably want to update the warning message
to include bpf too.
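
A minimal sketch of what that could look like, following the shape of the
existing helper in kernel/cgroup.c (the bpf mention in the message is the
only new part, and its wording is illustrative, not a proposed patch):

	void cgroup_sk_alloc_disable(void)
	{
		if (cgroup_sk_alloc_disabled)
			return;
		/* one-time notice: once net_prio/net_cls are used, cgroup2
		 * socket matching -- and with it cgroup bpf filtering --
		 * stops working for new sockets */
		pr_info("cgroup: disabling cgroup2 socket matching due to net_prio or net_cls activation; cgroup bpf programs will not run\n");
		cgroup_sk_alloc_disabled = true;
	}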

Thanks.

-- 
tejun


Re: [RFC PATCH 4/5] net: filter: run cgroup eBPF programs

2016-08-21 Thread Sargun Dhillon
On Wed, Aug 17, 2016 at 04:00:47PM +0200, Daniel Mack wrote:
> If CONFIG_CGROUP_BPF is enabled, and the cgroup associated with the
> receiving socket has eBPF programs installed, run them from
> sk_filter_trim_cap().
> 
> eBPF programs used in this context are expected to either return 1 to
> let the packet pass, or != 1 to drop it. The programs have access to
> the full skb, including the MAC headers.
> 
> This patch only implements the call site for ingress packets.
> 
> Signed-off-by: Daniel Mack 
> ---
>  net/core/filter.c | 44 
>  1 file changed, 44 insertions(+)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index c5d8332..a1dd94b 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -52,6 +52,44 @@
>  #include 
>  #include 
>  
> +#ifdef CONFIG_CGROUP_BPF
> +static int sk_filter_cgroup_bpf(struct sock *sk, struct sk_buff *skb,
> + enum bpf_attach_type type)
> +{
> + struct sock_cgroup_data *skcd = &sk->sk_cgrp_data;
> + struct cgroup *cgrp = sock_cgroup_ptr(skcd);
> + struct bpf_prog *prog;
> + int ret = 0;
> +
> + rcu_read_lock();
> +
> + switch (type) {
> + case BPF_ATTACH_TYPE_CGROUP_EGRESS:
> + prog = rcu_dereference(cgrp->bpf_egress);
> + break;
> + case BPF_ATTACH_TYPE_CGROUP_INGRESS:
> + prog = rcu_dereference(cgrp->bpf_ingress);
> + break;
> + default:
> + WARN_ON_ONCE(1);
> + ret = -EINVAL;
> + break;
> + }
> +
> + if (prog) {
> + unsigned int offset = skb->data - skb_mac_header(skb);
> +
> + __skb_push(skb, offset);
> + ret = bpf_prog_run_clear_cb(prog, skb) > 0 ? 0 : -EPERM;
> + __skb_pull(skb, offset);
> + }
> +
> + rcu_read_unlock();
> +
> + return ret;
> +}
> +#endif /* !CONFIG_CGROUP_BPF */
> +
>  /**
>   *   sk_filter_trim_cap - run a packet through a socket filter
>   *   @sk: sock associated with &sk_buff
> @@ -78,6 +116,12 @@ int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap)
>   if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC))
>   return -ENOMEM;
>  
> +#ifdef CONFIG_CGROUP_BPF
> + err = sk_filter_cgroup_bpf(sk, skb, BPF_ATTACH_TYPE_CGROUP_INGRESS);
> + if (err)
> + return err;
> +#endif
> +
>   err = security_sock_rcv_skb(sk, skb);
>   if (err)
>   return err;
> -- 
> 2.5.5
> 

So, casually looking at this patch, it looks like you're relying on
sock_cgroup_data, which only points to the default hierarchy. If someone uses
net_prio or net_classid, cgroup_sk_alloc_disable is called, and this won't work
anymore.

Any ideas on how to work around that? Does it make sense to add another pointer 
to sock_cgroup_data, or at least a warning when allocation is disabled?
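
For context on why a second pointer is awkward: sock_cgroup_data packs all
of its state into a single 8-byte union, roughly as below (a sketch along
the lines of include/linux/cgroup-defs.h, with comments added here):

	struct sock_cgroup_data {
		union {
	#ifdef __LITTLE_ENDIAN
			struct {
				u8	is_data;	/* 1: prioidx/classid valid, 0: cgroup ptr */
				u8	padding;
				u16	prioidx;	/* net_prio state */
				u32	classid;	/* net_cls state */
			} __packed;
	#else
			struct {
				u32	classid;
				u16	prioidx;
				u8	padding;
				u8	is_data;
			} __packed;
	#endif
			u64	val;	/* either the packed fields or a v2 cgroup pointer */
		};
	};

The v2 cgroup pointer and the net_prio/net_cls state share the same eight
bytes, which is why touching either v1 controller forces
cgroup_sk_alloc_disable(). A second pointer would grow every struct sock,
which is exactly the proliferation davem objected to.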


Re: [RFC PATCH 4/5] net: filter: run cgroup eBPF programs

2016-08-17 Thread Alexei Starovoitov
On Wed, Aug 17, 2016 at 11:20:29AM -0700, Alexei Starovoitov wrote:
> On Wed, Aug 17, 2016 at 04:00:47PM +0200, Daniel Mack wrote:
> > If CONFIG_CGROUP_BPF is enabled, and the cgroup associated with the
> > receiving socket has eBPF programs installed, run them from
> > sk_filter_trim_cap().
> > 
> > eBPF programs used in this context are expected to either return 1 to
> > let the packet pass, or != 1 to drop it. The programs have access to
> > the full skb, including the MAC headers.
> > 
> > This patch only implements the call site for ingress packets.
> > 
> > Signed-off-by: Daniel Mack 
> > ---
> >  net/core/filter.c | 44 
> >  1 file changed, 44 insertions(+)
> > 
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index c5d8332..a1dd94b 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -52,6 +52,44 @@
> >  #include 
> >  #include 
> >  
> > +#ifdef CONFIG_CGROUP_BPF
> > +static int sk_filter_cgroup_bpf(struct sock *sk, struct sk_buff *skb,
> > +   enum bpf_attach_type type)
> > +{
> > +   struct sock_cgroup_data *skcd = &sk->sk_cgrp_data;
> > +   struct cgroup *cgrp = sock_cgroup_ptr(skcd);
> > +   struct bpf_prog *prog;
> > +   int ret = 0;
> > +
> > +   rcu_read_lock();
> > +
> > +   switch (type) {
> > +   case BPF_ATTACH_TYPE_CGROUP_EGRESS:
> > +   prog = rcu_dereference(cgrp->bpf_egress);
> > +   break;
> > +   case BPF_ATTACH_TYPE_CGROUP_INGRESS:
> > +   prog = rcu_dereference(cgrp->bpf_ingress);
> > +   break;
> > +   default:
> > +   WARN_ON_ONCE(1);
> > +   ret = -EINVAL;
> > +   break;
> > +   }
> > +
> > +   if (prog) {
> 
> I really like how in this version of the patches the per-packet cost
> when this feature is off became a single load+cmp.
> Please move
> + struct cgroup *cgrp = sock_cgroup_ptr(skcd);
> into if (prog) {..}
> to make sure it's actually a single load.
> The compiler cannot avoid that load when it's placed at the top.

sorry. brain fart. it is two loads. scratch that.
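
To spell out the retraction: the cgrp pointer is needed before the prog
pointer can be loaded at all, so the idle fast path is two dependent loads
no matter where the sock_cgroup_ptr() call sits, roughly:

	/* per-packet work when no program is attached:
	 *   load 1: skcd->val inside sock_cgroup_ptr(), to get the cgroup
	 *   load 2: cgrp->bpf_ingress (or ->bpf_egress), NULL when idle
	 * plus the compare-and-branch on that NULL check */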

> > +   unsigned int offset = skb->data - skb_mac_header(skb);
> > +
> > +   __skb_push(skb, offset);
> > +   ret = bpf_prog_run_clear_cb(prog, skb) > 0 ? 0 : -EPERM;
> 
> that doesn't match the commit log. The above '> 0' makes sense to me though.
> If we want to do it for 1 only, we have to define it in uapi/bpf.h
> as an action code, so we can extend to 2, 3 in the future if necessary.
> 
> It also has to be bpf_prog_run_save_cb() (as sk_filter_trim_cap() does)
> instead of bpf_prog_run_clear_cb().
> See commit ff936a04e5f2 ("bpf: fix cb access in socket filter programs")
> 
> > +   __skb_pull(skb, offset);
> > +   }
> > +
> > +   rcu_read_unlock();
> > +
> > +   return ret;
> > +}
> > +#endif /* !CONFIG_CGROUP_BPF */
> > +
> >  /**
> >   * sk_filter_trim_cap - run a packet through a socket filter
> >   * @sk: sock associated with &sk_buff
> > @@ -78,6 +116,12 @@ int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap)
> > if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC))
> > return -ENOMEM;
> >  
> > +#ifdef CONFIG_CGROUP_BPF
> > +   err = sk_filter_cgroup_bpf(sk, skb, BPF_ATTACH_TYPE_CGROUP_INGRESS);
> > +   if (err)
> > +   return err;
> > +#endif
> > +
> > err = security_sock_rcv_skb(sk, skb);
> > if (err)
> > return err;
> > -- 
> > 2.5.5
> > 


Re: [RFC PATCH 4/5] net: filter: run cgroup eBPF programs

2016-08-17 Thread Alexei Starovoitov
On Wed, Aug 17, 2016 at 04:00:47PM +0200, Daniel Mack wrote:
> If CONFIG_CGROUP_BPF is enabled, and the cgroup associated with the
> receiving socket has eBPF programs installed, run them from
> sk_filter_trim_cap().
> 
> eBPF programs used in this context are expected to either return 1 to
> let the packet pass, or != 1 to drop it. The programs have access to
> the full skb, including the MAC headers.
> 
> This patch only implements the call site for ingress packets.
> 
> Signed-off-by: Daniel Mack 
> ---
>  net/core/filter.c | 44 
>  1 file changed, 44 insertions(+)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index c5d8332..a1dd94b 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -52,6 +52,44 @@
>  #include 
>  #include 
>  
> +#ifdef CONFIG_CGROUP_BPF
> +static int sk_filter_cgroup_bpf(struct sock *sk, struct sk_buff *skb,
> + enum bpf_attach_type type)
> +{
> + struct sock_cgroup_data *skcd = &sk->sk_cgrp_data;
> + struct cgroup *cgrp = sock_cgroup_ptr(skcd);
> + struct bpf_prog *prog;
> + int ret = 0;
> +
> + rcu_read_lock();
> +
> + switch (type) {
> + case BPF_ATTACH_TYPE_CGROUP_EGRESS:
> + prog = rcu_dereference(cgrp->bpf_egress);
> + break;
> + case BPF_ATTACH_TYPE_CGROUP_INGRESS:
> + prog = rcu_dereference(cgrp->bpf_ingress);
> + break;
> + default:
> + WARN_ON_ONCE(1);
> + ret = -EINVAL;
> + break;
> + }
> +
> + if (prog) {

I really like how in this version of the patches the per-packet cost
when this feature is off became a single load+cmp.
Please move
+   struct cgroup *cgrp = sock_cgroup_ptr(skcd);
into if (prog) {..}
to make sure it's actually a single load.
The compiler cannot avoid that load when it's placed at the top.

> + unsigned int offset = skb->data - skb_mac_header(skb);
> +
> + __skb_push(skb, offset);
> + ret = bpf_prog_run_clear_cb(prog, skb) > 0 ? 0 : -EPERM;

that doesn't match the commit log. The above '> 0' makes sense to me though.
If we want to do it for 1 only, we have to define it in uapi/bpf.h
as an action code, so we can extend to 2, 3 in the future if necessary.
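
A sketch of what those action codes might look like in uapi/linux/bpf.h,
plus a trivial program written against them (every name here is
illustrative only, nothing below is from the patch set):

	#include <linux/bpf.h>		/* struct __sk_buff */

	/* hypothetical action codes for cgroup-attached programs */
	enum bpf_cgroup_action {
		BPF_CGROUP_DROP	= 0,	/* drop the packet */
		BPF_CGROUP_PASS	= 1,	/* let the packet through */
		/* 2, 3, ... reserved for future actions */
	};

	/* trivial cgroup ingress filter: drop oversized frames */
	int ingress_filter(struct __sk_buff *skb)
	{
		return skb->len > 1514 ? BPF_CGROUP_DROP : BPF_CGROUP_PASS;
	}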

It also has to be bpf_prog_run_save_cb() (as sk_filter_trim_cap() does)
instead of bpf_prog_run_clear_cb().
See commit ff936a04e5f2 ("bpf: fix cb access in socket filter programs")
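
For reference, the difference as of that commit: bpf_prog_run_save_cb()
saves the protocol-owned skb->cb[] area, zeroes it for the program, and
restores it afterwards (all skipped when the program never touches cb),
while bpf_prog_run_clear_cb() only zeroes and never restores, which would
clobber state such as tcp_skb_cb at this call site. The call would then
read:

	ret = bpf_prog_run_save_cb(prog, skb) > 0 ? 0 : -EPERM;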

> + __skb_pull(skb, offset);
> + }
> +
> + rcu_read_unlock();
> +
> + return ret;
> +}
> +#endif /* !CONFIG_CGROUP_BPF */
> +
>  /**
>   *   sk_filter_trim_cap - run a packet through a socket filter
>   *   @sk: sock associated with &sk_buff
> @@ -78,6 +116,12 @@ int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap)
>   if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC))
>   return -ENOMEM;
>  
> +#ifdef CONFIG_CGROUP_BPF
> + err = sk_filter_cgroup_bpf(sk, skb, BPF_ATTACH_TYPE_CGROUP_INGRESS);
> + if (err)
> + return err;
> +#endif
> +
>   err = security_sock_rcv_skb(sk, skb);
>   if (err)
>   return err;
> -- 
> 2.5.5
> 


Re: [RFC PATCH 4/5] net: filter: run cgroup eBPF programs

2016-08-17 Thread Tejun Heo
On Wed, Aug 17, 2016 at 04:36:02PM +0200, Daniel Mack wrote:
> On 08/17/2016 04:23 PM, Tejun Heo wrote:
> > On Wed, Aug 17, 2016 at 04:00:47PM +0200, Daniel Mack wrote:
> >> @@ -78,6 +116,12 @@ int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap)
> >>if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC))
> >>return -ENOMEM;
> >>  
> >> +#ifdef CONFIG_CGROUP_BPF
> >> +  err = sk_filter_cgroup_bpf(sk, skb, BPF_ATTACH_TYPE_CGROUP_INGRESS);
> >> +  if (err)
> >> +  return err;
> >> +#endif
> > 
> > Hmm... where does egress hook into?
> 
> As stated in the cover letter, that's an open question on my list.

lol sorry about that.  :)

-- 
tejun


Re: [RFC PATCH 4/5] net: filter: run cgroup eBPF programs

2016-08-17 Thread Daniel Mack
On 08/17/2016 04:23 PM, Tejun Heo wrote:
> On Wed, Aug 17, 2016 at 04:00:47PM +0200, Daniel Mack wrote:
>> @@ -78,6 +116,12 @@ int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap)
>>  if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC))
>>  return -ENOMEM;
>>  
>> +#ifdef CONFIG_CGROUP_BPF
>> +err = sk_filter_cgroup_bpf(sk, skb, BPF_ATTACH_TYPE_CGROUP_INGRESS);
>> +if (err)
>> +return err;
>> +#endif
> 
> Hmm... where does egress hook into?

As stated in the cover letter, that's an open question on my list.


Thanks,
Daniel


Re: [RFC PATCH 4/5] net: filter: run cgroup eBPF programs

2016-08-17 Thread Tejun Heo
On Wed, Aug 17, 2016 at 04:00:47PM +0200, Daniel Mack wrote:
> @@ -78,6 +116,12 @@ int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap)
>   if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC))
>   return -ENOMEM;
>  
> +#ifdef CONFIG_CGROUP_BPF
> + err = sk_filter_cgroup_bpf(sk, skb, BPF_ATTACH_TYPE_CGROUP_INGRESS);
> + if (err)
> + return err;
> +#endif

Hmm... where does egress hook into?

Thanks.

-- 
tejun